Found 20 similar articles; search time: 0 ms
1.
Phillips MJ 《Gene》2009,441(1-2):132-140
Despite recent methodological advances in inferring the time-scale of biological evolution from molecular data, the fundamental question of whether our substitution models are sufficiently well specified to accurately estimate branch-lengths has received little attention. I examine this implicit assumption of all molecular dating methods, on a vertebrate mitochondrial protein-coding dataset. Comparison with analyses in which the data are RY-coded (AG --> R; CT --> Y) suggests that even rates-across-sites maximum likelihood greatly under-compensates for multiple substitutions among the standard (ACGT) NT-coded data, which has been subject to greater phylogenetic signal erosion. Accordingly, the fossil record indicates that branch-lengths inferred from the NT-coded data translate into divergence time overestimates when calibrated from deeper in the tree. Intriguingly, RY-coding led to the opposite result. The underlying NT and RY substitution model misspecifications likely relate respectively to "hidden" rate heterogeneity and changes in substitution processes across the tree, for which I provide simulated examples. Given the magnitude of the inferred molecular dating errors, branch-length estimation biases may partly explain current conflicts with some palaeontological dating estimates.
2.
Zifei Han Qiang Zhang Min Wang Keying Ye Ming-Hui Chen 《Biometrical journal. Biometrische Zeitschrift》2023,65(5):2200194
The power prior has been widely used to discount the amount of information borrowed from historical data in the design and analysis of clinical trials. It is realized by raising the likelihood function of the historical data to a power parameter δ, which quantifies the heterogeneity between the historical and the new study. In a fully Bayesian approach, a natural extension is to assign a hyperprior to δ such that the posterior of δ can reflect the degree of similarity between the historical and current data. To comply with the likelihood principle, an extra normalizing factor needs to be calculated and such prior is known as the normalized power prior. However, the normalizing factor involves an integral of a prior multiplied by a fractional likelihood and needs to be computed repeatedly over different δ during the posterior sampling. This makes its use prohibitive in practice for most elaborate models. This work provides an efficient framework to implement the normalized power prior in clinical studies. It bypasses the aforementioned efforts by sampling from the power prior with δ = 0 and δ = 1 only. Such a posterior sampling procedure can facilitate the use of a random δ with adaptive borrowing capability in general models. The numerical efficiency of the proposed method is illustrated via extensive simulation studies, a toxicological study, and an oncology study.
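For orientation, the quantities described in this abstract can be written out as follows. This is a sketch in commonly used notation, not the authors' own: the symbols $D_0$ (historical data), $D$ (current data), $\pi_0$ (initial priors) and $C(\delta)$ (normalizing factor) are assumptions introduced here.

```latex
% Power prior: historical likelihood raised to the power \delta, 0 <= \delta <= 1
\pi(\theta \mid D_0, \delta) \;\propto\; L(\theta \mid D_0)^{\delta}\, \pi_0(\theta)

% Normalizing factor: integral of the prior times the fractional likelihood
C(\delta) \;=\; \int L(\theta \mid D_0)^{\delta}\, \pi_0(\theta)\, d\theta

% Normalized power prior with a hyperprior \pi_0(\delta) on \delta
\pi(\theta, \delta \mid D_0) \;=\; \frac{L(\theta \mid D_0)^{\delta}\, \pi_0(\theta)}{C(\delta)}\; \pi_0(\delta)

% Joint posterior after observing the current data D
\pi(\theta, \delta \mid D, D_0) \;\propto\; L(\theta \mid D)\, \frac{L(\theta \mid D_0)^{\delta}\, \pi_0(\theta)}{C(\delta)}\; \pi_0(\delta)
```

The computational burden the abstract refers to comes from $C(\delta)$, which in general has no closed form and would otherwise have to be re-evaluated for every value of δ visited during posterior sampling.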
3.
Background
Two central problems in computational biology are the determination of the alignment and phylogeny of a set of biological sequences. The traditional approach to this problem is to first build a multiple alignment of these sequences, followed by a phylogenetic reconstruction step based on this multiple alignment. However, alignment and phylogenetic inference are fundamentally interdependent, and ignoring this fact leads to biased and overconfident estimations. Whether the main interest be in sequence alignment or phylogeny, a major goal of computational biology is the co-estimation of both.
4.
We describe a novel model and algorithm for simultaneously estimating multiple molecular sequence alignments and the phylogenetic trees that relate the sequences. Unlike current techniques that base phylogeny estimates on a single estimate of the alignment, we take alignment uncertainty into account by considering all possible alignments. Furthermore, because the alignment and phylogeny are constructed simultaneously, a guide tree is not needed. This sidesteps the problem in which alignments created by progressive alignment are biased toward the guide tree used to generate them. Joint estimation also allows us to model rate variation between sites when estimating the alignment and to use the evidence in shared insertion/deletions (indels) to group sister taxa in the phylogeny. Our indel model makes use of affine gap penalties and considers indels of multiple letters. We make the simplifying assumption that the indel process is identical on all branches. As a result, the probability of a gap is independent of branch length. We use a Markov chain Monte Carlo (MCMC) method to sample from the posterior of the joint model, estimating the most probable alignment and tree and their support simultaneously. We describe a new MCMC transition kernel that improves our algorithm's mixing efficiency, allowing the MCMC chains to converge even when started from arbitrary alignments. Our software implementation can estimate alignment uncertainty and we describe a method for summarizing this uncertainty in a single plot.
5.
Background
In recent years there has been a trend of leaving the strict molecular clock in order to infer dating of speciations and other evolutionary events. Explicit modeling of substitution rates and divergence times makes formulation of informative prior distributions for branch lengths possible. Models with birth-death priors on tree branching and auto-correlated or iid substitution rates among lineages have been proposed, enabling simultaneous inference of substitution rates and divergence times. This problem has, however, mainly been analysed in the Markov chain Monte Carlo (MCMC) framework, an approach requiring computation times of hours or days when applied to large phylogenies.
6.
Bayesian logistic regression using a perfect phylogeny (Total citations: 1; self-citations: 0; citations by others: 1)
Haplotype data capture the genetic variation among individuals in a population and among populations. An understanding of this variation and the ancestral history of haplotypes is important in genetic association studies of complex disease. We introduce a method for detecting associations between disease and haplotypes in a candidate gene region or candidate block with little or no recombination. A perfect phylogeny demonstrates the evolutionary relationship between single-nucleotide polymorphisms (SNPs) in the haplotype blocks. Our approach extends the logic regression technique of Ruczinski and others (2003) to a Bayesian framework, and constrains the model space to that of a perfect phylogeny. Environmental factors, as well as their interactions with SNPs, may be incorporated into the regression framework. We demonstrate our method on simulated data from a coalescent model, as well as data from a candidate gene study of sarcoidosis.
7.
Species trees from gene trees: reconstructing Bayesian posterior distributions of a species phylogeny using estimated gene tree distributions (Total citations: 5; self-citations: 0; citations by others: 5)
The desire to infer the evolutionary history of a group of species should be more viable now that a considerable amount of multilocus molecular data is available. However, the current molecular phylogenetic paradigm still reconstructs gene trees to represent the species tree. Further, commonly used methods of combining data, such as the concatenation method, are known to be inconsistent in some circumstances. In this paper, we propose a Bayesian hierarchical model to estimate the phylogeny of a group of species using multiple estimated gene tree distributions, such as those that arise in a Bayesian analysis of DNA sequence data. Our model employs substitution models used in traditional phylogenetics but also uses coalescent theory to explain genealogical signals from species trees to gene trees and from gene trees to sequence data, thereby forming a complete stochastic model to estimate gene trees, species trees, ancestral population sizes, and species divergence times simultaneously. Our model is founded on the assumption that gene trees, even of unlinked loci, are correlated due to being derived from a single species tree and therefore should be estimated jointly. We apply the method to two multilocus data sets of DNA sequences. The estimates of the species tree topology and divergence times appear to be robust to the prior of the population size, whereas the estimates of effective population sizes are sensitive to the prior used in the analysis. These analyses also suggest that the model is superior to the concatenation method in fitting these data sets and thus provides a more realistic assessment of the variability in the distribution of the species tree that may have produced the molecular information at hand. Future improvements of our model and algorithm should include consideration of other factors that can cause discordance of gene trees and species trees, such as horizontal transfer or gene duplication.
8.
Cytochrome b and Bayesian inference of whale phylogeny (Total citations: 2; self-citations: 0; citations by others: 2)
In the mid-1990s cytochrome b and other mitochondrial DNA data reinvigorated cetacean phylogenetics by proposing many novel and provocative hypotheses of cetacean relationships. These results sparked a revision and reanalysis of morphological datasets, and the collection of new nuclear DNA data from numerous loci. Some of the most controversial mitochondrial hypotheses have now become benchmark clades, corroborated with nuclear DNA and morphological data; others have been resolved in favor of more traditional views. That major conflicts in cetacean phylogeny are disappearing is encouraging. However, most recent papers aim specifically to resolve higher-level conflicts by adding characters, at the cost of densely sampling taxa to resolve lower-level relationships. No molecular study to date has included more than 33 cetaceans. More detailed molecular phylogenies will provide better tools for evolutionary studies. Until more genes are available for a high number of taxa, can we rely on readily available single gene mitochondrial data? Here, we estimate the phylogeny of 66 cetacean taxa and 24 outgroups based on Cytb sequences. We judge the reliability of our phylogeny based on the recovery of several deep-level benchmark clades. A Bayesian phylogenetic analysis recovered all benchmark clades and for the first time supported Odontoceti monophyly based exclusively on analysis of a single mitochondrial gene. The results recover the monophyly of all but one of the family-level taxa within Cetacea, and of most recently proposed super- and subfamilies. In contrast, parsimony never recovered all benchmark clades and was sensitive to a priori weighting decisions. These results provide the most detailed phylogeny of Cetacea to date and highlight the utility of both Bayesian methodology in general, and of Cytb in cetacean phylogenetics. They furthermore suggest that dense taxon sampling, like dense character sampling, can overcome problems in phylogenetic reconstruction.
9.
Monte Carlo methods have received much attention in the recent literature of phylogeny analysis. However, the conventional Markov chain Monte Carlo algorithms, such as the Metropolis–Hastings algorithm, tend to get trapped in a local mode in simulating from the posterior distribution of phylogenetic trees, rendering the inference ineffective. In this paper, we apply an advanced Monte Carlo algorithm, the stochastic approximation Monte Carlo (SAMC) algorithm, to Bayesian phylogeny analysis. Our method is compared with two popular Bayesian phylogeny software packages, BAMBE and MrBayes, on simulated and real datasets. The numerical results indicate that our method outperforms BAMBE and MrBayes. Among the three methods, SAMC produces the consensus trees with the highest similarity to the true trees and the model parameter estimates with the smallest mean square errors, while costing the least CPU time.
10.
Only recently has Bayesian inference of phylogeny been proposed. The method is now a practical alternative to the other methods; indeed, the method appears to possess advantages over the other methods in terms of ability to use complex models of evolution, ease of interpretation of the results, and computational efficiency. However, the method should be used cautiously. The results of a Bayesian analysis should be examined with respect to the sensitivity of the results to the priors used and the reliability of the Markov chain Monte Carlo approximation of the probabilities of trees.
11.
Rannala B 《Systematic biology》2002,51(5):754-760
Methods for Bayesian inference of phylogeny using DNA sequences based on Markov chain Monte Carlo (MCMC) techniques allow the incorporation of arbitrarily complex models of the DNA substitution process, and other aspects of evolution. This has increased the realism of models, potentially improving the accuracy of the methods, and is largely responsible for their recent popularity. Another consequence of the increased complexity of models in Bayesian phylogenetics is that these models have, in several cases, become overparameterized. In such cases, some parameters of the model are not identifiable; different combinations of nonidentifiable parameters lead to the same likelihood, making it impossible to decide among the potential parameter values based on the data. Overparameterized models can also slow the rate of convergence of MCMC algorithms due to large negative correlations among parameters in the posterior probability distribution. Functions of parameters can sometimes be found, in overparameterized models, that are identifiable, and inferences based on these functions are legitimate. Examples are presented of overparameterized models that have been proposed in the context of several Bayesian methods for inferring the relative ages of nodes in a phylogeny when the substitution rate evolves over time.
12.
13.
Xiao-Guang Yang 《Biologia》2009,64(4):811-818
The phylogeny of Cetacea (whales, dolphins, porpoises) has long attracted the interest of biologists and has been investigated by many researchers based on different datasets. However, some phylogenetic relationships within Cetacea still remain controversial. In this study, Bayesian analyses were performed to infer the phylogeny of 25 representative species within Cetacea based on their mitochondrial genomes for the first time. The analyses recovered the clades resolved by the previous studies and strongly supported most of the current cetacean classifications, such as the monophyly of Odontoceti (toothed whales) and Mysticeti (baleen whales). The analyses provided a reliable and comprehensive phylogeny of Cetacea which can provide a foundation for further exploration of cetacean ecology, conservation and biology. The results also showed that: (i) the mitochondrial genomes were very informative for inferring phylogeny of Cetacea; and (ii) the Bayesian analyses outperformed other phylogenetic methods in inferring mitochondrial genome-based phylogeny of Cetacea.
14.
SUMMARY: BAli-Phy is a Bayesian posterior sampler that employs Markov chain Monte Carlo to explore the joint space of alignment and phylogeny given molecular sequence data. Simultaneous estimation eliminates bias toward inaccurate alignment guide-trees, employs more sophisticated substitution models during alignment and automatically utilizes information in shared insertion/deletions to help infer phylogenies. AVAILABILITY: Software is available for download at http://www.biomath.ucla.edu/msuchard/bali-phy.
15.
16.
17.
18.
Yang Z 《Philosophical transactions of the Royal Society of London. Series B, Biological sciences》2008,363(1512):4031-4039
The Bayesian method of phylogenetic inference often produces high posterior probabilities (PPs) for trees or clades, even when the trees are clearly incorrect. The problem appears to be mainly due to large sizes of molecular datasets and to the large-sample properties of Bayesian model selection and its sensitivity to the prior when several of the models under comparison are nearly equally correct (or nearly equally wrong) and are of the same dimension. A previous suggestion to alleviate the problem is to let the internal branch lengths in the tree become increasingly small in the prior with the increase in the data size so that the bifurcating trees are increasingly star-like. In particular, if the internal branch lengths are assigned the exponential prior, the prior mean $\mu_0$ should approach zero faster than $1/\sqrt{n}$ but more slowly than $1/n$, where $n$ is the sequence length. This paper examines the usefulness of this data size-dependent prior using a dataset of the mitochondrial protein-coding genes from the baleen whales, with the prior mean fixed at $\mu_0 = 0.1\,n^{-2/3}$. In this dataset, phylogeny reconstruction is sensitive to the assumed evolutionary model, species sampling and the type of data (DNA or protein sequences), but Bayesian inference using the default prior attaches high PPs for conflicting phylogenetic relationships. The data size-dependent prior alleviates the problem to some extent, giving weaker support for unstable relationships. This prior may be useful in reducing apparent conflicts in the results of Bayesian analysis or in making the method less sensitive to model violations.
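The rate condition in this abstract can be checked numerically. The sketch below evaluates the prior mean 0.1 n^(-2/3) stated in the abstract for a few illustrative sequence lengths and shows that it shrinks relative to 1/sqrt(n) while growing relative to 1/n. The function names and the chosen values of n are assumptions made for illustration only.

```python
import math


def prior_mean(n, scale=0.1, exponent=-2.0 / 3.0):
    """Prior mean of the internal branch lengths for sequence length n."""
    return scale * n ** exponent


def scaled_by_bounds(n):
    """Return (mu0 * sqrt(n), mu0 * n) for sequence length n.

    If mu0 shrinks faster than 1/sqrt(n), the first value decreases with n;
    if mu0 shrinks more slowly than 1/n, the second value increases with n.
    """
    mu0 = prior_mean(n)
    return mu0 * math.sqrt(n), mu0 * n


if __name__ == "__main__":
    # Illustrative sequence lengths, not taken from the paper.
    for n in (1_000, 10_000, 100_000):
        vs_sqrt, vs_linear = scaled_by_bounds(n)
        print(f"n={n:>7}  mu0={prior_mean(n):.6f}  "
              f"mu0*sqrt(n)={vs_sqrt:.4f}  mu0*n={vs_linear:.4f}")
```

Any exponent strictly between -1 and -1/2 would satisfy the same two-sided condition; -2/3 is the value reported in the abstract.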
19.
20.
Kei Fujikawa Satoshi Teramukai Isao Yokota Takashi Daimon 《Biometrical journal. Biometrische Zeitschrift》2020,62(2):330-338
Basket trials simultaneously evaluate the effect of one or more drugs on a defined biomarker, genetic alteration, or molecular target in a variety of disease subtypes, often called strata. A conventional approach for analyzing such trials is an independent analysis of each of the strata. This analysis is inefficient as it lacks the power to detect the effect of drugs in each stratum. To address these issues, various designs for basket trials have been proposed, centering on designs using Bayesian hierarchical models. In this article, we propose a novel Bayesian basket trial design that incorporates predictive sample size determination, early termination for inefficacy and efficacy, and the borrowing of information across strata. The borrowing of information is based on the similarity between the posterior distributions of the response probability. In general, Bayesian hierarchical models have many distributional assumptions along with multiple parameters. By contrast, our method has prior distributions for response probability and two parameters for similarity of distributions. The proposed design is easier to implement and less computationally demanding than other Bayesian basket designs. Through a simulation with various scenarios, our proposed design is compared with other designs including one that does not borrow information and one that uses a Bayesian hierarchical model.