Similar articles
20 similar articles found (search time: 15 ms)
1.
In the analysis of data generated by change-point processes, one critical challenge is to determine the number of change-points. The classic Bayes information criterion (BIC) statistic does not work well here because of irregularities in the likelihood function. By asymptotic approximation of the Bayes factor, we derive a modified BIC for the model of Brownian motion with changing drift. The modified BIC is similar to the classic BIC in the sense that the first term consists of the log likelihood, but it differs in the terms that penalize for model dimension. As an example of application, this new statistic is used to analyze array-based comparative genomic hybridization (array-CGH) data. Array-CGH measures the number of chromosome copies at each genome location of a cell sample, and is useful for finding the regions of genome deletion and amplification in tumor cells. The modified BIC performs well compared to existing methods in accurately choosing the number of regions of changed copy number. Unlike existing methods, it does not rely on tuning parameters or intensive computing. Thus it is impartial and easier to understand and to use.
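The classic BIC baseline that this abstract contrasts with can be sketched for a piecewise-constant Gaussian mean model: penalize the log likelihood by the number of free parameters times log(n). Everything below (function names, the toy signal, the parameter count) is my own illustrative sketch, not the authors' modified criterion, which replaces this simple dimension penalty.

```python
import math

def gaussian_bic(rss, n, k):
    # classic BIC for a piecewise-constant Gaussian mean model:
    # n*log(RSS/n) + (free parameters)*log(n); here the parameters are
    # k change-point locations plus (k+1) segment means
    return n * math.log(rss / n) + (2 * k + 1) * math.log(n)

def rss_piecewise(y, breaks):
    # residual sum of squares with a constant mean within each segment
    rss, edges = 0.0, [0] + list(breaks) + [len(y)]
    for a, b in zip(edges, edges[1:]):
        seg = y[a:b]
        m = sum(seg) / len(seg)
        rss += sum((v - m) ** 2 for v in seg)
    return rss

# toy signal with one mean shift at index 50, plus tiny deterministic
# jitter so RSS never hits exactly zero (log would blow up)
y = [0.0] * 50 + [3.0] * 50
y = [v + 0.01 * ((i * 37) % 7 - 3) for i, v in enumerate(y)]

bic0 = gaussian_bic(rss_piecewise(y, []), len(y), 0)
bic1 = gaussian_bic(rss_piecewise(y, [50]), len(y), 1)
print(bic1 < bic0)  # the one-change-point model should win
```

With a clear mean shift, the drop in RSS dwarfs the extra penalty, so the one-change-point model is preferred; the paper's contribution is a sharper penalty for exactly this irregular setting.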

2.
Saville BR  Herring AH 《Biometrics》2009,65(2):369-376
Deciding which predictor effects may vary across subjects is a difficult issue. Standard model selection criteria and test procedures are often inappropriate for comparing models with different numbers of random effects due to constraints on the parameter space of the variance components. Testing on the boundary of the parameter space changes the asymptotic distribution of some classical test statistics and causes problems in approximating Bayes factors. We propose a simple approach for testing random effects in the linear mixed model using Bayes factors. We scale each random effect to the residual variance and introduce a parameter that controls the relative contribution of each random effect free of the scale of the data. We integrate out the random effects and the variance components using closed-form solutions. The resulting integrals needed to calculate the Bayes factor are low-dimensional integrals lacking variance components and can be efficiently approximated with Laplace's method. We propose a default prior distribution on the parameter controlling the contribution of each random effect and conduct simulations to show that our method has good properties for model selection problems. Finally, we illustrate our methods on data from a clinical trial of patients with bipolar disorder and on data from an environmental study of water disinfection by-products and male reproductive outcomes.
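Laplace's method, which the abstract relies on for the remaining low-dimensional integrals, approximates an integral of exp(n·f(θ)) by a Gaussian integral around the mode θ̂: exp(n·f(θ̂))·sqrt(2π/(n·|f″(θ̂)|)). The sketch below is a generic one-dimensional demonstration on an integrand with a known closed form, not the paper's mixed-model integrals.

```python
import math

def laplace_approx(f, fpp, theta_hat, n):
    # Laplace's method: ∫ exp(n f(θ)) dθ ≈ exp(n f(θ̂)) * sqrt(2π / (n |f''(θ̂)|)),
    # where θ̂ maximizes f and f''(θ̂) < 0
    return math.exp(n * f(theta_hat)) * math.sqrt(2 * math.pi / (n * abs(fpp(theta_hat))))

# test integrand: exp(n (log θ - θ)) on (0, ∞); the mode is θ̂ = 1
f = lambda t: math.log(t) - t
fpp = lambda t: -1.0 / t ** 2
n = 20
approx = laplace_approx(f, fpp, 1.0, n)
exact = math.factorial(n) / n ** (n + 1)  # ∫ θ^n e^(-nθ) dθ = n! / n^(n+1)
print(abs(approx / exact - 1) < 0.01)  # within 1% already at n = 20
```

The relative error here is essentially the error of Stirling's formula, shrinking like 1/(12n), which is why the method is accurate even for modest sample sizes.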

3.
Ball RD 《Genetics》2007,177(4):2399-2416
We calculate posterior probabilities for candidate genes as a function of genomic location. Posterior probabilities for quantitative trait loci (QTL) presence in a small interval are calculated using a Bayesian model-selection approach based on the Bayesian information criterion (BIC) and used to combine QTL colocation information with sequence-specific evidence, e.g., from differential expression and/or association studies. Our method takes into account uncertainty in estimation of number and locations of QTL and estimated map position. Posterior probabilities for QTL presence were calculated for simulated data with n = 100, 300, and 1200 QTL progeny and compared with interval mapping and composite-interval mapping. Candidate genes that mapped to QTL regions had substantially larger posterior probabilities. Among candidates with a given Bayes factor, those that map near a QTL are more promising for further investigation with association studies and functional testing or for use in marker-aided selection. The BIC is shown to correspond very closely to Bayes factors for linear models with a nearly noninformative Zellner prior for the simulated QTL data with n ≥ 100. It is shown how to modify the BIC to use a subjective prior for the QTL effects.

4.
Under the model of independent test statistics, we propose a two-parameter family of Bayes multiple testing procedures. The two parameters can be viewed as tuning parameters. Using the Benjamini–Hochberg step-up procedure for controlling false discovery rate as a baseline for conservativeness, we choose the tuning parameters to compromise between the operating characteristics of that procedure and a less conservative procedure that focuses on alternatives that a priori might be considered likely or meaningful. The Bayes procedures do not have the theoretical and practical shortcomings of the popular stepwise procedures. In terms of the number of mistakes, simulations for two examples indicate that over a large segment of the parameter space, the Bayes procedure is preferable to the step-up procedure. Another desirable feature of the procedures is that they are computationally feasible for any number of hypotheses.
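The Benjamini–Hochberg step-up procedure used as the baseline here is standard and easy to state: sort the m p-values, find the largest rank k with p(k) ≤ k·q/m, and reject the k smallest. A minimal sketch (the function name and example p-values are mine):

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure: reject the k smallest
    p-values, where k = max{i : p_(i) <= i*q/m}; controls the false
    discovery rate at level q under independence."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):  # rank runs 1..m
        if pvals[i] <= rank * q / m:
            k = rank
    rejected = set(order[:k])
    return [i in rejected for i in range(m)]

pvals = [0.001, 0.008, 0.039, 0.041, 0.06, 0.74, 0.9]
print(benjamini_hochberg(pvals, q=0.05))
```

Note the step-up character: even a p-value above its own threshold is rejected if some larger p-value clears its threshold, which is what makes the procedure less conservative than step-down alternatives.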

5.
Ripatti S  Palmgren J 《Biometrics》2000,56(4):1016-1022
There exists a growing literature on the estimation of gamma distributed multiplicative shared frailty models. There is, however, often a need to model more complicated frailty structures, but attempts to extend gamma frailties run into complications. Motivated by hip replacement data with a more complicated dependence structure, we propose a model based on multiplicative frailties with a multivariate log-normal joint distribution. We give a justification and an estimation procedure for this generally structured frailty model, which is a generalization of the one presented by McGilchrist (1993, Biometrics 49, 221-225). The estimation is based on Laplace approximation of the likelihood function. This leads to estimating equations based on a penalized fixed effects partial likelihood, where the marginal distribution of the frailty terms determines the penalty term. The tuning parameters of the penalty function, i.e., the frailty variances, are estimated by maximizing an approximate profile likelihood. The performance of the approximation is evaluated by simulation, and the frailty model is fitted to the hip replacement data.

6.
The penalized least squares approach with smoothly clipped absolute deviation penalty has been consistently demonstrated to be an attractive regression shrinkage and selection method. It not only automatically and consistently selects the important variables, but also produces estimators which are as efficient as the oracle estimator. However, these attractive features depend on appropriate choice of the tuning parameter. We show that the commonly used generalized cross-validation cannot select the tuning parameter satisfactorily, with a nonignorable overfitting effect in the resulting model. In addition, we propose a BIC tuning parameter selector, which is shown to be able to identify the true model consistently. Simulation studies are presented to support theoretical findings, and an empirical example is given to illustrate its use in the Female Labor Supply data.
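The idea of a BIC tuning-parameter selector can be sketched without the SCAD machinery: score each candidate penalty level by n·log(RSS/n) + df·log(n), with df the number of nonzero coefficients, and keep the minimizer. The sketch below substitutes soft-thresholding under an orthonormal design (where it equals the lasso) for SCAD; the numbers, grid, and function names are all my own illustration, not the paper's selector.

```python
import math

def soft(z, lam):
    # soft-thresholding; with orthonormal predictors this is the lasso solution
    return math.copysign(max(abs(z) - lam, 0.0), z)

def bic_select(beta_ols, rss_ols, n, lambdas):
    """Score each penalty level by BIC = n*log(RSS/n) + df*log(n), where
    df = number of nonzero coefficients; return the best lambda and fit.
    Under an orthonormal design, RSS(lam) = RSS_ols + ||beta_ols - b(lam)||^2."""
    best = None
    for lam in lambdas:
        b = [soft(z, lam) for z in beta_ols]
        rss = rss_ols + sum((z - bj) ** 2 for z, bj in zip(beta_ols, b))
        df = sum(bj != 0 for bj in b)
        bic = n * math.log(rss / n) + df * math.log(n)
        if best is None or bic < best[0]:
            best = (bic, lam, b)
    return best[1], best[2]

# toy problem: OLS estimates (3.0, 0.1) where the true coefficients are (3, 0)
lam, b = bic_select([3.0, 0.1], 0.09, 4, [0.0, 0.05, 0.1, 0.15, 0.2])
print(lam, b)  # BIC zeroes out the spurious second coefficient
```

The log(n) penalty per active coefficient is what drives consistent support recovery; generalized cross-validation's lighter effective penalty is the source of the overfitting the abstract describes.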

7.
Recent interest in cancer research focuses on predicting patients' survival by investigating gene expression profiles based on microarray analysis. We propose a doubly penalized Buckley-James method for the semiparametric accelerated failure time model to relate high-dimensional genomic data to censored survival outcomes, which uses the elastic-net penalty that is a mixture of L1- and L2-norm penalties. Similar to the elastic-net method for a linear regression model with uncensored data, the proposed method performs automatic gene selection and parameter estimation, where highly correlated genes are able to be selected (or removed) together. The two-dimensional tuning parameter is determined by generalized cross-validation. The proposed method is evaluated by simulations and applied to the Michigan squamous cell lung carcinoma study.

8.
Huang J  Harrington D 《Biometrics》2002,58(4):781-791
The Cox proportional hazards model is often used for estimating the association between covariates and a potentially censored failure time, and the corresponding partial likelihood estimators are used for the estimation and prediction of relative risk of failure. However, partial likelihood estimators are unstable and have large variance when collinearity exists among the explanatory variables or when the number of failures is not much greater than the number of covariates of interest. A penalized (log) partial likelihood is proposed to give more accurate relative risk estimators. We show that asymptotically there always exists a penalty parameter for the penalized partial likelihood that reduces mean squared estimation error for log relative risk, and we propose a resampling method to choose the penalty parameter. Simulations and an example show that the bootstrap-selected penalized partial likelihood estimators can, in some instances, have smaller bias than the partial likelihood estimators and have smaller mean squared estimation and prediction errors of log relative risk. These methods are illustrated with a data set in multiple myeloma from the Eastern Cooperative Oncology Group.

9.
Recent interest in cancer research focuses on predicting patients' survival by investigating gene expression profiles based on microarray analysis. We propose a doubly penalized Buckley–James method for the semiparametric accelerated failure time model to relate high-dimensional genomic data to censored survival outcomes, which uses the elastic-net penalty that is a mixture of L1- and L2-norm penalties. Similar to the elastic-net method for a linear regression model with uncensored data, the proposed method performs automatic gene selection and parameter estimation, where highly correlated genes are able to be selected (or removed) together. The two-dimensional tuning parameter is determined by generalized cross-validation. The proposed method is evaluated by simulations and applied to the Michigan squamous cell lung carcinoma study.

10.
Roy J  Daniels MJ 《Biometrics》2008,64(2):538-545
In this article we consider the problem of fitting pattern mixture models to longitudinal data when there are many unique dropout times. We propose a marginally specified latent class pattern mixture model. The marginal mean is assumed to follow a generalized linear model, whereas the mean conditional on the latent class and random effects is specified separately. Because the dimension of the parameter vector of interest (the marginal regression coefficients) does not depend on the assumed number of latent classes, we propose to treat the number of latent classes as a random variable. We specify a prior distribution for the number of classes, and calculate (approximate) posterior model probabilities. In order to avoid the complications with implementing a fully Bayesian model, we propose a simple approximation to these posterior probabilities. The ideas are illustrated using data from a longitudinal study of depression in HIV-infected women.

11.
A common problem in molecular phylogenetics is choosing a model of DNA substitution that does a good job of explaining the DNA sequence alignment without introducing superfluous parameters. A number of methods have been used to choose among a small set of candidate substitution models, such as the likelihood ratio test, the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and Bayes factors. Current implementations of any of these criteria suffer from the limitation that only a small set of models are examined, or that the test does not allow easy comparison of non-nested models. In this article, we expand the pool of candidate substitution models to include all possible time-reversible models. This set includes seven models that have already been described. We show how Bayes factors can be calculated for these models using reversible jump Markov chain Monte Carlo, and apply the method to 16 DNA sequence alignments. For each data set, we compare the model with the best Bayes factor to the best models chosen using AIC and BIC. We find that the best model under any of these criteria is not necessarily the most complicated one; models with an intermediate number of substitution types typically do best. Moreover, almost all of the models that are chosen as best do not constrain a transition rate to be the same as a transversion rate, suggesting that it is the transition/transversion rate bias that plays the largest role in determining which models are selected. Importantly, the reversible jump Markov chain Monte Carlo algorithm described here allows estimation of phylogeny (and other phylogenetic model parameters) to be performed while accounting for uncertainty in the model of DNA substitution.

12.
Model averaging is gaining popularity among ecologists for making inference and predictions. Methods for combining models include Bayesian model averaging (BMA) and Akaike's Information Criterion (AIC) model averaging. BMA can be implemented with different prior model weights, including the Kullback–Leibler prior associated with AIC model averaging, but it is unclear how the prior model weight affects model results in a predictive context. Here, we implemented BMA using the Bayesian Information Criterion (BIC) approximation to Bayes factors for building predictive models of bird abundance and occurrence in the Chihuahuan Desert of New Mexico. We examined how model predictive ability differed across four prior model weights, and how averaged coefficient estimates, standard errors, and coefficients' posterior probabilities varied for 16 bird species. We also compared the predictive ability of BMA models to a best single-model approach. Overall, Occam's prior of parsimony provided the best predictive models; the Kullback–Leibler prior, in contrast, favored complex models of lower predictive ability. BMA performed better than a best single-model approach independently of the prior model weight for 6 out of 16 species. For 6 other species, the choice of the prior model weight affected whether BMA was better than the best single-model approach. Our results demonstrate that parsimonious priors may be favorable over priors that favor complexity for making predictions. The approach we present has direct applications in ecology for better predicting patterns of species' abundance and occurrence.
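The BIC approximation to Bayes factors turns model averaging into a one-liner: with equal prior model weights, the approximate posterior probability of model i is proportional to exp(−BIC_i/2). A minimal sketch (the BIC values are invented for illustration):

```python
import math

def bic_model_weights(bics):
    """Approximate posterior model probabilities from BIC values, assuming
    equal prior model weights: w_i ∝ exp(-0.5 * (BIC_i - min BIC)).
    Subtracting the minimum first avoids overflow/underflow."""
    b0 = min(bics)
    raw = [math.exp(-0.5 * (b - b0)) for b in bics]
    s = sum(raw)
    return [r / s for r in raw]

# three candidate models; lower BIC gets higher weight
weights = bic_model_weights([100.0, 102.0, 110.0])
print([round(w, 3) for w in weights])  # → [0.727, 0.268, 0.005]
```

A model-averaged prediction is then just the weight-sum of the individual models' predictions, and a non-uniform prior model weight (the subject of this study) simply multiplies each raw term before normalizing.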

13.
A Bayesian procedure is developed for the selection of concomitant variables in survival models. The variables are selected in a step-up procedure according to the criterion of maximum expected likelihood, where the expectation is over the prior parameter space. Prior knowledge of the influence of these covariates on patient prognosis is incorporated into the analysis. The step-up procedure is stopped when the Bayes factor in favor of omitting the variable selected in a particular step exceeds a specified value. The resulting model with the selected variables is fitted using Bayes estimates of the coefficients. This technique is applied to Hodgkin's disease data from a large Cooperative Clinical Trial Group and the results are compared to the results from the classical likelihood selection procedure.

14.
Nonparametric mixed effects models for unequally sampled noisy curves
Rice JA  Wu CO 《Biometrics》2001,57(1):253-259
We propose a method of analyzing collections of related curves in which the individual curves are modeled as spline functions with random coefficients. The method is applicable when the individual curves are sampled at variable and irregularly spaced points. This produces a low-rank, low-frequency approximation to the covariance structure, which can be estimated naturally by the EM algorithm. Smooth curves for individual trajectories are constructed as best linear unbiased predictor (BLUP) estimates, combining data from that individual and the entire collection. This framework leads naturally to methods for examining the effects of covariates on the shapes of the curves. We use model selection techniques (Akaike information criterion (AIC), Bayesian information criterion (BIC), and cross-validation) to select the number of breakpoints for the spline approximation. We believe that the methodology we propose provides a simple, flexible, and computationally efficient means of functional data analysis.

15.
A fundamental issue in quantitative trait locus (QTL) mapping is to determine the plausibility of the presence of a QTL at a given genome location. Bayesian analysis offers an attractive way of testing alternative models (here, QTL vs. no-QTL) via the Bayes factor. There have been several numerical approaches to computing the Bayes factor, mostly based on Markov chain Monte Carlo (MCMC), but these strategies are subject to numerical or stability problems. We propose a simple and stable approach to calculating the Bayes factor between nested models. The procedure is based on a reparameterization of a variance component model in terms of intra-class correlation. The Bayes factor can then be easily calculated from the output of an MCMC scheme by averaging conditional densities at the null intra-class correlation. We studied the performance of the method using simulation. We applied this approach to QTL analysis in an outbred population. We also compared it with the likelihood ratio test and we analyzed its stability. Estimates from the simulations were very close to the simulated parameter values. The posterior probability of the QTL model increases as the QTL effect does. The location of the QTL was also correctly obtained. The use of meta-analysis is suggested from the properties of the Bayes factor.

16.
Most statistical methods for censored survival data assume there is no dependence between the lifetime and censoring mechanisms, an assumption which is often doubtful in practice. In this paper we study a parametric model which allows for dependence in terms of a parameter delta and a bias function B(t, theta). We propose a sensitivity analysis on the estimate of the parameter of interest for small values of delta. This parameter measures the dependence between the lifetime and the censoring mechanisms. Its size can be interpreted in terms of a correlation coefficient between the two mechanisms. A medical example suggests that even a small degree of dependence between the failure and censoring processes can have a noticeable effect on the analysis.

17.
Baierl A  Bogdan M  Frommlet F  Futschik A 《Genetics》2006,173(3):1693-1703
A modified version (mBIC) of the Bayesian Information Criterion (BIC) has been previously proposed for backcross designs to locate multiple interacting quantitative trait loci. In this article, we extend the method to intercross designs. We also propose two modifications of the mBIC. First we investigate a two-stage procedure in the spirit of empirical Bayes methods involving an adaptive (i.e., data-based) choice of the penalty. The purpose of the second modification is to increase the power of detecting epistasis effects at loci where main effects have already been detected. We investigate the proposed methods by computer simulations under a wide range of realistic genetic models, with nonequidistant marker spacings and missing data. In the case of large intermarker distances we use imputations according to Haley and Knott regression to reduce the distance between searched positions to not more than 10 cM. Haley and Knott regression is also used to handle missing data. The simulation study as well as real data analyses demonstrates good properties of the proposed method of QTL detection.

18.
Zhang B  Betensky RA 《Human genetics》2006,119(6):642-648
We consider the problem of accurate classification of family relationship in the presence of laboratory error without parental data. We first propose an adjusted version of the test statistic proposed by Ehm and Wagner based on summation over a large number of genetic markers. We then propose use of the Bayes factor as a classification rule. We prove theoretically that the Bayes factor is the optimal classification rule in that the total classification error is minimized. We show via simulations that both the adjusted Ehm and Wagner method and the Bayes factor classification rule reduce misclassification errors, and that the Bayes factor classification rule is robust against under-estimation or over-estimation of laboratory errors. For monozygotic twins versus dizygotic twins, the correct classification rate of the Bayes rule is over 99%. For full-siblings versus half-siblings, the Bayes factor classification rule generally outperforms Ehm and Wagner's method (in Am J Hum Genet 62:181–188, 1998), especially when the full-sibling proportion is high. Electronic supplementary material is available for this article and is accessible to authorized users.

19.
Bayesian adaptive sequence alignment algorithms
The selection of a scoring matrix and gap penalty parameters continues to be an important problem in sequence alignment. We describe here an algorithm, the 'Bayes block aligner', which bypasses this requirement. Instead of requiring a fixed set of parameter settings, this algorithm returns the Bayesian posterior probability for the number of gaps and for the scoring matrices in any series of interest. Furthermore, instead of returning the single best alignment for the chosen parameter settings, this algorithm returns the posterior distribution of all alignments considering the full range of gapping and scoring matrices selected, weighting each in proportion to its probability based on the data. We compared the Bayes aligner with the popular Smith–Waterman algorithm with parameter settings from the literature which had been optimized for the identification of structural neighbors, and found that the Bayes aligner correctly identified more structural neighbors. In a detailed examination of the alignment of a pair of kinase sequences and a pair of GTPase sequences, we illustrate the algorithm's potential to identify subsequences that are conserved to different degrees. In addition, this example shows that the Bayes aligner returns an alignment-free assessment of the distance between a pair of sequences.

20.
We propose a Bayesian method for testing molecular clock hypotheses for use with aligned sequence data from multiple taxa. Our method utilizes a nonreversible nucleotide substitution model to avoid the necessity of specifying either a known tree relating the taxa or an outgroup for rooting the tree. We employ reversible jump Markov chain Monte Carlo to sample from the posterior distribution of the phylogenetic model parameters and conduct hypothesis testing using Bayes factors, the ratio of the posterior to prior odds of competing models. Here, the Bayes factors reflect the relative support of the sequence data for equal rates of evolutionary change between taxa versus unequal rates, averaged over all possible phylogenetic parameters, including the tree and root position. As the molecular clock model is a restriction of the more general unequal rates model, we use the Savage-Dickey ratio to estimate the Bayes factors. The Savage-Dickey ratio provides a convenient approach to calculating Bayes factors in favor of sharp hypotheses. Critical to calculating the Savage-Dickey ratio is a determination of the prior induced on the modeling restrictions. We demonstrate our method on a well-studied mtDNA sequence data set consisting of nine primates. We find strong support against a global molecular clock, but do find support for a local clock among the anthropoids. We provide mathematical derivations of the induced priors on branch length restrictions assuming equally likely trees. These derivations also have more general applicability to the examination of prior assumptions in Bayesian phylogenetics.
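The Savage-Dickey ratio says that for a sharp restriction θ = θ₀ of a larger model, the Bayes factor in favor of the restriction equals the posterior density at θ₀ divided by the prior density at θ₀. The sketch below verifies this in a conjugate-normal toy model, where both the densities and the marginal likelihoods are available in closed form; the numbers are arbitrary and the setup is not the phylogenetic model of the abstract.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# conjugate-normal toy model: prior theta ~ N(0, 1), ybar | theta ~ N(theta, s2/n)
n, s2, ybar, theta0 = 10, 1.0, 0.5, 0.0

# posterior of theta given ybar (standard conjugate update)
post_prec = n / s2 + 1.0
post_mean = (n / s2) * ybar / post_prec

# Savage-Dickey: BF(null/alt) = posterior density at theta0 / prior density at theta0
bf_sd = normal_pdf(theta0, post_mean, 1.0 / post_prec) / normal_pdf(theta0, 0.0, 1.0)

# direct Bayes factor from the marginal likelihoods of ybar under each model
bf_direct = normal_pdf(ybar, theta0, s2 / n) / normal_pdf(ybar, 0.0, 1.0 + s2 / n)

print(abs(bf_sd - bf_direct) < 1e-12)  # the two routes agree
```

In MCMC practice the posterior density at θ₀ is estimated from the sampler output (e.g., by averaging conditional densities at θ₀), which is exactly what makes the ratio convenient for sharp hypotheses like a molecular clock.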


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号