Similar literature
1.
Nearly all current Bayesian phylogenetic applications rely on Markov chain Monte Carlo (MCMC) methods to approximate the posterior distribution for trees and other parameters of the model. These approximations are only reliable if Markov chains adequately converge and sample from the joint posterior distribution. Although several studies of phylogenetic MCMC convergence exist, these have focused on simulated data sets or select empirical examples. Therefore, much that is considered common knowledge about MCMC in empirical systems derives from a relatively small family of analyses under ideal conditions. To address this, we present an overview of commonly applied phylogenetic MCMC diagnostics and an assessment of patterns of these diagnostics across more than 18,000 empirical analyses. Many analyses appeared to perform well, and failures in convergence were most likely to be detected using the average standard deviation of split frequencies, a diagnostic that compares topologies among independent chains. Different diagnostics yielded different information about failed convergence, demonstrating that multiple diagnostics must be employed to reliably detect problems. The number of taxa and average branch lengths in analyses have clear impacts on MCMC performance, with more taxa and shorter branches leading to more difficult convergence. We show that the usage of models that include both Γ-distributed among-site rate variation and a proportion of invariable sites is not broadly problematic for MCMC convergence but is also unnecessary. Changes to heating and the usage of model-averaged substitution models can both offer improved convergence in some cases, but neither is a panacea.
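
The diagnostic named in this abstract, the average standard deviation of split frequencies, can be illustrated with a minimal Python sketch. It assumes each independent run has already been reduced to a list of sampled trees, each tree stored as a set of splits (frozensets of taxon names). The function names, the 0.1 frequency cutoff, the use of the population standard deviation, and the toy data are all assumptions made for illustration; they are not taken from the paper, and phylogenetic software may differ in these details.

```python
from collections import Counter
from statistics import pstdev


def split_frequencies(trees):
    """Return the fraction of sampled trees that contain each split."""
    counts = Counter(split for tree in trees for split in tree)
    return {split: n / len(trees) for split, n in counts.items()}


def asdsf(runs, min_freq=0.1):
    """Average, over splits, of the standard deviation of split frequencies across runs.

    Splits whose frequency is below min_freq in every run are ignored
    (the cutoff value is an assumption of this sketch).
    """
    freqs = [split_frequencies(trees) for trees in runs]
    all_splits = set().union(*freqs)
    deviations = []
    for split in all_splits:
        values = [f.get(split, 0.0) for f in freqs]
        if max(values) >= min_freq:
            deviations.append(pstdev(values))
    return sum(deviations) / len(deviations) if deviations else 0.0


# Toy data: two runs, four sampled trees each, one informative split per tree.
AB, AC = frozenset("AB"), frozenset("AC")
run1 = [{AB}, {AB}, {AB}, {AC}]
run2 = [{AB}, {AB}, {AC}, {AC}]
value = asdsf([run1, run2])
```

In the toy data the split AB has frequency 0.75 in the first run and 0.5 in the second, so the two runs disagree and the statistic is positive; two identical runs would give exactly zero.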

2.
This paper introduces a flexible and adaptive nonparametric method for estimating the association between multiple covariates and power spectra of multiple time series. The proposed approach uses a Bayesian sum of trees model to capture complex dependencies and interactions between covariates and the power spectrum, which are often observed in studies of biomedical time series. Local power spectra corresponding to terminal nodes within trees are estimated nonparametrically using Bayesian penalized linear splines. The trees are considered to be random and fit using a Bayesian backfitting Markov chain Monte Carlo (MCMC) algorithm that sequentially considers tree modifications via reversible-jump MCMC techniques. For high-dimensional covariates, a sparsity-inducing Dirichlet hyperprior on tree splitting proportions is considered, which provides sparse estimation of covariate effects and efficient variable selection. By averaging over the posterior distribution of trees, the proposed method can recover both smooth and abrupt changes in the power spectrum across multiple covariates. Empirical performance is evaluated via simulations to demonstrate the proposed method's ability to accurately recover complex relationships and interactions. The proposed methodology is used to study gait maturation in young children by evaluating age-related changes in power spectra of stride interval time series in the presence of other covariates.

3.
The software tool PBEAM provides a parallel implementation of BEAM, the first algorithm for large-scale epistatic interaction mapping, including genome-wide studies with hundreds of thousands of markers. BEAM describes markers and their interactions with a Bayesian partitioning model and computes the posterior probability of each marker set via Markov chain Monte Carlo (MCMC). PBEAM takes advantage of simulating multiple Markov chains simultaneously. With n CPUs, this design can reduce execution time by a factor of approximately n. The implementation of PBEAM is based on MPI libraries.

Availability

PBEAM is available for download at http://bioinfo.au.tsinghua.edu.cn/pbeam/

4.
Summary. We compare two Monte Carlo (MC) procedures, sequential importance sampling (SIS) and Markov chain Monte Carlo (MCMC), for making Bayesian inferences about the unknown states and parameters of state–space models for animal populations. The procedures were applied to both simulated and real pup count data for the British grey seal metapopulation, as well as to simulated data for a Chinook salmon population. The MCMC implementation was based on tailor-made proposal distributions combined with analytical integration of some of the states and parameters. SIS was implemented in a more generic fashion. For the same computing time, MCMC tended to yield posterior distributions with less MC variation across different runs of the algorithm than the SIS implementation, with the exception, in the seal model, of some states and one of the parameters that mixed quite slowly. The efficiency of the SIS sampler increased greatly when unknown parameters in the observation model were integrated out analytically. We consider that a careful implementation of MCMC for cases where data are informative relative to the priors sets the gold standard, but that SIS samplers are a viable alternative that can be programmed more quickly. Our SIS implementation is particularly competitive in situations where the data are relatively uninformative; in other cases, SIS may require substantially more computer power than an efficient implementation of MCMC to achieve the same level of MC error.

5.
Estimating species trees using multiple-allele DNA sequence data
Several techniques, such as concatenation and consensus methods, are available for combining data from multiple loci to produce a single statement of phylogenetic relationships. However, when multiple alleles are sampled from individual species, it becomes more challenging to estimate relationships at the level of species, either because concatenation becomes inappropriate due to conflicts among individual gene trees, or because the species from which multiple alleles have been sampled may not form monophyletic groups in the estimated tree. We propose a Bayesian hierarchical model to reconstruct species trees from multiple-allele, multilocus sequence data, building on a recently proposed method for estimating species trees from single-allele multilocus data. A two-step Markov chain Monte Carlo (MCMC) algorithm is adopted to estimate the posterior distribution of the species tree. The model is applied to estimate the posterior distribution of species trees for two multiple-allele datasets: yeast (Saccharomyces) and birds (Manacus, manakins). The estimates of the species trees using our method are consistent with those inferred from other methods and genetic markers, but in contrast to other species tree methods, it provides credible regions for the species tree. The Bayesian approach described here provides a powerful framework for statistical testing and integration of population genetics and phylogenetics.

6.
We consider inference for demographic models and parameters based upon postprocessing the output of an MCMC method that generates samples of genealogical trees (from the posterior distribution for a specific prior distribution of the genealogy). This approach has the advantage of taking account of the uncertainty in the inference for the tree when making inferences about the demographic model and can be computationally efficient in terms of reanalyzing data under a wide variety of models. We consider a (simulation-consistent) estimate of the likelihood for variable population size models, which uses importance sampling, and propose two new approximate likelihoods, one for migration models and one for continuous spatial models.

7.
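Parallel programming techniques can effectively improve the execution efficiency of algorithms. In this paper, the CPU's single-instruction, multiple-data extension instruction set (Streaming SIMD Extensions, SSE) and multicore parallel programming are each used to optimize the pulse coupled neural network (PCNN) segmentation algorithm in order to reduce its running time. Experimental results show that both SSE and multicore parallel programming greatly accelerate the PCNN segmentation algorithm and effectively improve its execution efficiency. To some extent this addresses the method's heavy computational load and long running time, and it gives the method potential value for medical image processing.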

8.
A key element of a successful Markov chain Monte Carlo (MCMC) inference is the programming and run performance of the Markov chain. However, the explicit use of quality assessments of the MCMC simulations (convergence diagnostics) in phylogenetics is still uncommon. Here, we present a simple tool that uses the output from MCMC simulations and visualizes a number of properties of primary interest in a Bayesian phylogenetic analysis, such as convergence rates of posterior split probabilities and branch lengths. Graphical exploration of the output from phylogenetic MCMC simulations gives intuitive and often crucial information on the success and reliability of the analysis. The tool presented here complements convergence diagnostics already available in other software packages primarily designed for other applications of MCMC. Importantly, the common practice of using trace-plots of a single parameter or summary statistic, such as the likelihood score of sampled trees, can be misleading for assessing the success of a phylogenetic MCMC simulation. AVAILABILITY: The program is available as source under the GNU General Public License and as a web application at http://ceb.scs.fsu.edu/awty.
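
One simple way to look at how the estimated posterior probability of a split settles down over a run is to track its running frequency along the chain. The Python sketch below is only an illustration of that idea and is not the tool's own code; the function name and the representation of each sampled tree as a set of splits are assumptions.

```python
def cumulative_split_frequency(trees, split):
    """Running fraction of sampled trees that contain `split`, one value per sample."""
    hits = 0
    running = []
    for n, tree in enumerate(trees, start=1):
        if split in tree:
            hits += 1
        running.append(hits / n)
    return running


# Toy chain of four sampled trees, each holding a single informative split.
AB, AC = frozenset("AB"), frozenset("AC")
chain = [{AB}, {AC}, {AB}, {AB}]
running = cumulative_split_frequency(chain, AB)
```

Plotting such a sequence for several splits, and for several independent runs, shows whether the estimates have stabilized and whether the runs agree, which a trace of the likelihood alone does not reveal.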

9.
Linkage analysis of quantitative trait loci in multiple line crosses
Yi N, Xu S. Genetica 2002, 114(3): 217-230
Simple line crosses, for example, backcross and F2, are commonly used in mapping quantitative trait loci (QTL). However, these simple crosses are rarely used alone in commercial plant breeding; rather, crosses involving multiple inbred lines, or several simple crosses connected by shared inbred lines, may be common in plant breeding. Mapping QTL using crosses of multiple lines is more relevant to plant breeding. Unfortunately, current statistical methods and computer programs of QTL mapping are all designed for simple line crosses or multiple line crosses but under a regular mating system. It is not straightforward to extend the existing methods to handle multiple line crosses under irregular and complicated mating designs. The major hurdle comes from irregular inbreeding, multiple generations, and multiple alleles. In this study, we develop a Bayesian method implemented via the Markov chain Monte Carlo (MCMC) algorithm for mapping QTL using complicated multiple line crosses. With the MCMC algorithm, we are able to draw a complete path of the gene flow from founder alleles to their descendants via a recursive process. This has greatly simplified the problem caused by irregular mating and inbreeding in the mapping population. Adopting the reversible jump MCMC algorithm, we are able to simultaneously search for multiple QTL along the genome. We can even infer the posterior distribution of the number of QTL, one of the most important parameters in QTL study. Application of the new MCMC based QTL mapping procedure is demonstrated using two different mating designs. Design I involves two inbred lines and their derived F1, F2, and BC populations. Design II is a half-diallel cross involving three inbred lines. The two designs appear different, but can be handled with the same robust computer program.

10.
Bayesian adaptive Markov chain Monte Carlo estimation of genetic parameters
Accurate and fast estimation of genetic parameters that underlie quantitative traits using mixed linear models with additive and dominance effects is of great importance in both natural and breeding populations. Here, we propose a new fast adaptive Markov chain Monte Carlo (MCMC) sampling algorithm for the estimation of genetic parameters in the linear mixed model with several random effects. In the learning phase of our algorithm, we use the hybrid Gibbs sampler to learn the covariance structure of the variance components. In the second phase of the algorithm, we use this covariance structure to formulate an effective proposal distribution for a Metropolis-Hastings algorithm, which uses a likelihood function in which the random effects have been integrated out. Compared with the hybrid Gibbs sampler, the new algorithm had better mixing properties and was approximately twice as fast to run. Our new algorithm was able to detect different modes in the posterior distribution. In addition, the posterior mode estimates from the adaptive MCMC method were close to the REML (residual maximum likelihood) estimates. Moreover, our exponential prior for inverse variance components was vague and enabled the estimated mode of the posterior variance to be practically zero, which was in agreement with the support from the likelihood (in the case of no dominance). The method performance is illustrated using simulated data sets with replicates and field data in barley.

11.
MOTIVATION: We present a statistical method for detecting recombination, whose objective is to accurately locate the recombinant breakpoints in DNA sequence alignments of small numbers of taxa (4 or 5). Our approach explicitly models the sequence of phylogenetic tree topologies along a multiple sequence alignment. Inference under this model is done in a Bayesian way, using Markov chain Monte Carlo (MCMC). The algorithm returns the site-dependent posterior probability of each tree topology, which is used for detecting recombinant regions and locating their breakpoints. RESULTS: The method was tested on one synthetic and three real DNA sequence alignments, where it was found to outperform the established detection methods PLATO, RECPARS, and TOPAL.

12.
Multigene sequence data have great potential for elucidating important and interesting evolutionary processes, but statistical methods for extracting information from such data remain limited. Although various biological processes may cause different genes to have different genealogical histories (and hence different tree topologies), we also may expect that the number of distinct topologies among a set of genes is relatively small compared with the number of possible topologies. Therefore evidence about the tree topology for one gene should influence our inferences of the tree topology on a different gene, but to what extent? In this paper, we present a new approach for modeling and estimating concordance among a set of gene trees given aligned molecular sequence data. Our approach introduces a one-parameter probability distribution to describe the prior distribution of concordance among gene trees. We describe a novel 2-stage Markov chain Monte Carlo (MCMC) method that first obtains independent Bayesian posterior probability distributions for individual genes using standard methods. These posterior distributions are then used as input for a second MCMC procedure that estimates a posterior distribution of gene-to-tree maps (GTMs). The posterior distribution of GTMs can then be summarized to provide revised posterior probability distributions for each gene (taking account of concordance) and to allow estimation of the proportion of the sampled genes for which any given clade is true (the sample-wide concordance factor). Further, under the assumption that the sampled genes are drawn randomly from a genome of known size, we show how one can obtain an estimate, with credibility intervals, on the proportion of the entire genome for which a clade is true (the genome-wide concordance factor). We demonstrate the method on a set of 106 genes from 8 yeast species.

13.
Markov chain Monte Carlo (MCMC) is a methodology that is gaining widespread use in the phylogenetics community and is central to phylogenetic software packages such as MrBayes. An important issue for users of MCMC methods is how to select appropriate values for adjustable parameters such as the length of the Markov chain or chains, the sampling density, the proposal mechanism, and, if Metropolis-coupled MCMC is being used, the number of heated chains and their temperatures. Although some parameter settings have been examined in detail in the literature, others are frequently chosen with more regard to computational time or personal experience with other data sets. Such choices may lead to inadequate sampling of tree space or an inefficient use of computational resources. We performed a detailed study of convergence and mixing for 70 randomly selected, putatively orthologous protein sets with different sizes and taxonomic compositions. Replicated runs from multiple random starting points permit a more rigorous assessment of convergence, and we developed two novel statistics, delta and epsilon, for this purpose. Although likelihood values invariably stabilized quickly, adequate sampling of the posterior distribution of tree topologies took considerably longer. Our results suggest that multimodality is common for data sets with 30 or more taxa and that this results in slow convergence and mixing. However, we also found that the pragmatic approach of combining data from several short, replicated runs into a "metachain" to estimate bipartition posterior probabilities provided good approximations, and that such estimates were no worse in approximating a reference posterior distribution than those obtained using a single long run of the same length as the metachain. Precision appears to be best when heated Markov chains have low temperatures, whereas chains with high temperatures appear to sample trees with high posterior probabilities only rarely.
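
The role of heated chains and their temperatures in Metropolis-coupled MCMC can be illustrated on a toy problem. The Python sketch below is a minimal example under stated assumptions and is not the study's setup: the target is a bimodal one-dimensional density rather than a tree posterior, chain i is flattened by the power 1 / (1 + temps[i]) (a common incremental heating convention, assumed here), and the function names, step size, and temperature values are all hypothetical.

```python
import math
import random


def log_target(x):
    """Log density (up to a constant) of an equal mixture of N(-4, 1) and N(4, 1)."""
    a = -0.5 * (x + 4.0) ** 2
    b = -0.5 * (x - 4.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def accept(rng, log_ratio):
    """Metropolis acceptance test for a log acceptance ratio."""
    return rng.random() < math.exp(min(0.0, log_ratio))


def mc3(n_iter, temps=(0.0, 0.5, 1.0), step=1.0, seed=1):
    """Metropolis-coupled MCMC; returns the samples of the cold chain (temps[0] == 0)."""
    rng = random.Random(seed)
    betas = [1.0 / (1.0 + t) for t in temps]  # beta == 1 is the unheated chain
    states = [0.0] * len(betas)
    cold_samples = []
    for _ in range(n_iter):
        # Random-walk Metropolis update within each chain on its flattened target.
        for i, beta in enumerate(betas):
            proposal = states[i] + rng.gauss(0.0, step)
            if accept(rng, beta * (log_target(proposal) - log_target(states[i]))):
                states[i] = proposal
        # Propose swapping the states of one randomly chosen pair of adjacent chains.
        i = rng.randrange(len(betas) - 1)
        j = i + 1
        log_swap = (betas[i] - betas[j]) * (log_target(states[j]) - log_target(states[i]))
        if accept(rng, log_swap):
            states[i], states[j] = states[j], states[i]
        cold_samples.append(states[0])
    return cold_samples
```

Higher temperatures flatten the target more, so hot chains cross between modes easily but spend little time in regions of high posterior probability, and swaps with the cold chain are then accepted less often. This is the trade-off behind the abstract's remark about high temperatures.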

14.
We have developed simulated annealing algorithms to solve the problem of multiple sequence alignment. The algorithm was shown to give the optimal solution as confirmed by the rigorous dynamic programming algorithm for three-sequence alignment. To overcome long execution times for simulated annealing, we utilized a parallel computer. A sequential algorithm, a simple parallel algorithm and the temperature parallel algorithm were tested on a problem. The results were compared with the result obtained by a conventional tree-based algorithm where alignments were merged by two-way dynamic programming. Every annealing algorithm produced a better energy value than the conventional algorithm. The best energy value, which probably represents the optimal solution, was reached within a reasonable time by both of the parallel annealing algorithms. We consider the temperature parallel algorithm of simulated annealing to be the most suitable for finding the optimal multiple sequence alignment because the algorithm does not require any scheduling for optimization. The algorithm is also useful for refining multiple alignments obtained by other heuristic methods.

15.
Increasingly, large data sets pose a challenge for computationally intensive phylogenetic methods such as Bayesian Markov chain Monte Carlo (MCMC). Here, we investigate the performance of common MCMC proposal distributions in terms of median and variance of run time to convergence on 11 data sets. We introduce two new Metropolized Gibbs Samplers for moving through "tree space." MCMC simulation using these new proposals shows faster average run time and dramatically improved predictability in performance, with a 20-fold reduction in the variance of the time to estimate the posterior distribution to a given accuracy. We also introduce conditional clade probabilities and demonstrate that they provide a superior means of approximating tree topology posterior probabilities from samples recorded during MCMC.
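
The idea of approximating a topology's posterior probability from conditional clade frequencies can be sketched as follows. This is one plausible reading of the approach, not the authors' implementation: rooted binary trees are assumed to be nested pairs of taxon names, and a tree's probability is estimated as the product, over its internal nodes, of how often the node's clade was divided into those two child clades among the sampled trees that contain the clade. All function names are hypothetical.

```python
from collections import Counter


def clade(tree):
    """Set of taxa below a node; leaves are strings, internal nodes are pairs."""
    if isinstance(tree, str):
        return frozenset([tree])
    left, right = tree
    return clade(left) | clade(right)


def splits(tree):
    """Yield (parent clade, unordered pair of child clades) for every internal node."""
    if isinstance(tree, str):
        return
    left, right = tree
    yield clade(tree), frozenset([clade(left), clade(right)])
    yield from splits(left)
    yield from splits(right)


def ccp_estimator(sample):
    """Build a function that estimates tree probabilities from a sample of trees."""
    clade_counts = Counter()
    split_counts = Counter()
    for tree in sample:
        for parent, children in splits(tree):
            clade_counts[parent] += 1
            split_counts[(parent, children)] += 1

    def probability(tree):
        p = 1.0
        for parent, children in splits(tree):
            if clade_counts[parent] == 0:
                return 0.0  # clade never observed in the sample
            p *= split_counts[(parent, children)] / clade_counts[parent]
        return p

    return probability


# Toy sample of four trees over taxa A-D.
t1 = (("A", "B"), ("C", "D"))
t2 = ((("A", "B"), "C"), "D")
t3 = ((("A", "C"), "B"), "D")
prob = ccp_estimator([t1, t1, t2, t3])
```

With larger samples, an estimator of this kind can assign nonzero probability to topologies that were never sampled whole, provided each of their clade subdivisions was observed, which simple topology counts cannot do.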

16.
Vogl C, Xu S. Genetics 2000, 155(3): 1439-1447
In line-crossing experiments, deviations from Mendelian segregation ratios are usually observed for some markers. We hypothesize that these deviations are caused by one or more segregation-distorting loci (SDL) linked to the markers. We develop both a maximum-likelihood (ML) method and a Bayesian method to map SDL using molecular markers. The ML mapping is implemented via an EM algorithm and the Bayesian method is performed via Markov chain Monte Carlo (MCMC). The Bayesian mapping is computationally more intensive than the ML mapping but can handle more complicated models such as multiple SDL and a variable number of SDL. Both methods are applied to a set of simulated data and real data from a cross of two Scots pine trees.

17.
Detection-nondetection data are often used to investigate species range dynamics using Bayesian occupancy models which rely on the use of Markov chain Monte Carlo (MCMC) methods to sample from the posterior distribution of the parameters of the model. In this article we develop two Variational Bayes (VB) approximations to the posterior distribution of the parameters of a single-season site occupancy model which uses logistic link functions to model the probability of species occurrence at sites and of species detection probabilities. This task is accomplished through the development of iterative algorithms that do not use MCMC methods. Simulations and small practical examples demonstrate the effectiveness of the proposed technique. We specifically show that (under certain circumstances) the variational distributions can provide accurate approximations to the true posterior distributions of the parameters of the model when the number of visits per site (K) is as low as three and that the accuracy of the approximations improves as K increases. We also show that the methodology can be used to obtain the posterior distribution of the predictive distribution of the proportion of sites occupied (PAO).

18.
Summary. We examine situations where interest lies in the conditional association between outcome and exposure variables, given potential confounding variables. Concern arises that some potential confounders may not be measured accurately, whereas others may not be measured at all. Some form of sensitivity analysis might be employed, to assess how this limitation in available data impacts inference. A Bayesian approach to sensitivity analysis is straightforward in concept: a prior distribution is formed to encapsulate plausible relationships between unobserved and observed variables, and posterior inference about the conditional exposure–disease relationship then follows. In practice, though, it can be challenging to form such a prior distribution in both a realistic and simple manner. Moreover, it can be difficult to develop an attendant Markov chain Monte Carlo (MCMC) algorithm that will work effectively on a posterior distribution arising from a highly nonidentified model. In this article, a simple prior distribution for acknowledging both poorly measured and unmeasured confounding variables is developed. It requires that only a small number of hyperparameters be set by the user. Moreover, a particular computational approach for posterior inference is developed, because application of MCMC in a standard manner is seen to be ineffective in this problem.

19.
We describe a novel model and algorithm for simultaneously estimating multiple molecular sequence alignments and the phylogenetic trees that relate the sequences. Unlike current techniques that base phylogeny estimates on a single estimate of the alignment, we take alignment uncertainty into account by considering all possible alignments. Furthermore, because the alignment and phylogeny are constructed simultaneously, a guide tree is not needed. This sidesteps the problem in which alignments created by progressive alignment are biased toward the guide tree used to generate them. Joint estimation also allows us to model rate variation between sites when estimating the alignment and to use the evidence in shared insertion/deletions (indels) to group sister taxa in the phylogeny. Our indel model makes use of affine gap penalties and considers indels of multiple letters. We make the simplifying assumption that the indel process is identical on all branches. As a result, the probability of a gap is independent of branch length. We use a Markov chain Monte Carlo (MCMC) method to sample from the posterior of the joint model, estimating the most probable alignment and tree and their support simultaneously. We describe a new MCMC transition kernel that improves our algorithm's mixing efficiency, allowing the MCMC chains to converge even when started from arbitrary alignments. Our software implementation can estimate alignment uncertainty and we describe a method for summarizing this uncertainty in a single plot.

20.
Yi N, Xu S. Genetics 2001, 157(4): 1759-1771
Quantitative trait loci (QTL) are easily studied in a biallelic system. Such a system requires the cross of two inbred lines presumably fixed for alternative alleles of the QTL. However, development of inbred lines can be time consuming and cost ineffective for species with long generation intervals and severe inbreeding depression. In addition, restriction of the investigation to a biallelic system can sometimes be misleading because many potentially important allelic interactions do not have a chance to be expressed and thus fail to be detected. A complicated mating design involving multiple alleles mimics the actual breeding system. However, it is difficult to develop the statistical model and algorithm using the classical maximum-likelihood method. In this study, we investigate the application of a Bayesian method implemented via the Markov chain Monte Carlo (MCMC) algorithm to QTL mapping under arbitrarily complicated mating designs. We develop the method under a mixed-model framework where the genetic values of founder alleles are treated as random and the nongenetic effects are treated as fixed. With the MCMC algorithm, we first draw the gene flows from the founders to the descendants for each QTL and then draw samples of the genetic parameters. Finally, we are able to simultaneously infer the posterior distribution of the number, the additive and dominance variances, and the chromosomal locations of all identified QTL.
