Similar literature
Found 20 similar articles (search time: 15 ms)
1.
A Bayesian analysis of metazoan mitochondrial genome arrangements   Total citations: 1, self-citations: 0, citations by others: 1
Genome arrangements are a potentially powerful source of information to infer evolutionary relationships among distantly related taxa. Mitochondrial genome arrangements may be especially informative about metazoan evolutionary relationships because (1) nearly all animals have the same set of definitively homologous mitochondrial genes, (2) mitochondrial genome rearrangement events are rare relative to changes in sequences, and (3) the number of possible mitochondrial genome arrangements is huge, making convergent evolution of genome arrangements appear highly unlikely. In previous studies, phylogenetic evidence in genome arrangement data is nearly always used in a qualitative fashion: the support in favor of clades with similar or identical genome arrangements is considered to be quite strong, but is not quantified. The purpose of this article is to quantify the uncertainty among the relationships of metazoan phyla on the basis of mitochondrial genome arrangements while incorporating prior knowledge of the monophyly of various groups from other sources. The work we present here differs from our previous work in the statistics literature in that (1) we incorporate prior information on classifications of metazoans at the phylum level, (2) we describe several advances in our computational approach, and (3) we analyze a much larger data set (87 taxa) that consists of each unique, complete mitochondrial genome arrangement with a full complement of 37 genes that were present in the NCBI (National Center for Biotechnology Information) database at a recent date. In addition, we analyze a subset of 28 of these 87 taxa for which the non-tRNA mitochondrial genomes are unique where the assumption of our inversion-only model of rearrangement is more plausible. We present summaries of Bayesian posterior distributions of tree topology on the basis of these two data sets.

2.
Several stochastic models of character change, when implemented in a maximum likelihood framework, are known to give a correspondence between the maximum parsimony method and the method of maximum likelihood. One such model has an independently estimated branch-length parameter for each site and each branch of the phylogenetic tree. This model, the no-common-mechanism model, has many parameters, and, in fact, the number of parameters increases as fast as the alignment is extended. We take a Bayesian approach to the no-common-mechanism model and place independent gamma prior probability distributions on the branch-length parameters. We were able to analytically integrate over the branch lengths, and this allowed us to implement an efficient Markov chain Monte Carlo method for exploring the space of phylogenetic trees. We were able to reliably estimate the posterior probabilities of clades for phylogenetic trees of up to 500 sequences. However, the Bayesian approach to the problem, at least as implemented here with an independent prior on the length of each branch, does not tame the behavior of the branch-length parameters. The integrated likelihood appears to be a simple rescaling of the parsimony score for a tree, and the marginal posterior probability distribution of the length of a branch is dependent upon how the maximum parsimony method reconstructs the characters at the interior nodes of the tree. The method we describe, however, is of potential importance in the analysis of morphological character data and also for improving the behavior of Markov chain Monte Carlo methods implemented for models in which sites share a common branch-length parameter.

3.
Recent years have witnessed a proliferation of quantitative methods for biogeographic inference. In particular, novel parametric approaches represent exciting new opportunities for the study of range evolution. Here, we review a selection of current methods for biogeographic analysis and discuss their respective properties. These methods include generalized parsimony approaches, weighted ancestral area analysis, dispersal-vicariance analysis, the dispersal–extinction–cladogenesis model and other maximum likelihood approaches, and Bayesian stochastic mapping of ancestral ranges, including a novel approach to inferring range evolution in the context of island biogeography. Some of these methods were developed specifically for problems of ancestral range reconstruction, whereas others were designed for more general problems of character state reconstruction and subsequently applied to the study of ancestral ranges. Methods for reconstructing ancestral history on a phylogenetic tree differ not only in the types of ancestral range states that are allowed, but also in the various historical events that may change the ancestral ranges. We explore how the form of allowed ancestral ranges and allowed transitions can both affect the outcome of ancestral range estimation. Finally, we mention some promising avenues for future work in the development of model-based approaches to biogeographic analysis.

4.
In Bayesian divergence time estimation methods, incorporating calibrating information from the fossil record is commonly done by assigning prior densities to ancestral nodes in the tree. Calibration prior densities are typically parametric distributions offset by minimum age estimates provided by the fossil record. Specification of the parameters of calibration densities requires the user to quantify his or her prior knowledge of the age of the ancestral node relative to the age of its calibrating fossil. The values of these parameters can, potentially, result in biased estimates of node ages if they lead to overly informative prior distributions. Accordingly, determining parameter values that lead to adequate prior densities is not straightforward. In this study, I present a hierarchical Bayesian model for calibrating divergence time analyses with multiple fossil age constraints. This approach applies a Dirichlet process prior as a hyperprior on the parameters of calibration prior densities. Specifically, this model assumes that the rate parameters of exponential prior distributions on calibrated nodes are distributed according to a Dirichlet process, whereby the rate parameters are clustered into distinct parameter categories. Both simulated and biological data are analyzed to evaluate the performance of the Dirichlet process hyperprior. Compared with fixed exponential prior densities, the hierarchical Bayesian approach results in more accurate and precise estimates of internal node ages. When this hyperprior is applied using Markov chain Monte Carlo methods, the ages of calibrated nodes are sampled from mixtures of exponential distributions and uncertainty in the values of calibration density parameters is taken into account.
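
The fixed calibration density that this abstract uses as its baseline, an exponential distribution offset by a fossil's minimum age, can be sketched as below. This is only an illustration under stated assumptions, not the author's implementation: the function names and the example values are hypothetical, and the Dirichlet process hyperprior on the rate parameter is not modelled.

```python
import math


def offset_exponential_logpdf(node_age, fossil_min_age, rate):
    """Log density of an exponential prior shifted to start at the fossil minimum age.

    All three arguments are hypothetical names chosen for this sketch.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    excess = node_age - fossil_min_age
    if excess < 0:
        # The node cannot be younger than its calibrating fossil.
        return float("-inf")
    return math.log(rate) - rate * excess


def prior_mean_age(fossil_min_age, rate):
    """Prior expected node age: the offset plus the exponential mean 1/rate."""
    return fossil_min_age + 1.0 / rate
```

A larger `rate` concentrates the prior close to the fossil age, which is how a fixed rate can make the prior overly informative; for instance, `prior_mean_age(10.0, 0.5)` places the prior mean two time units above a fossil dated at 10.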

5.
Zhu X, Zhang S, Tang H, Cooper R. Human Genetics, 2006, 120(3): 431-445
Several disease-mapping methods have been proposed recently, which use the information generated by recent admixture of populations from historically distinct geographic origins. These methods include both classic likelihood and Bayesian approaches. In this study we directly maximize the likelihood function from the hidden Markov model for admixture mapping using the EM algorithm, allowing for uncertainty in model parameters, such as the allele frequencies in the parental populations. We determined the robustness of the proposed method by examining the ancestral allele frequency estimate and individual marker-location specific ancestry when the data were generated by different population admixture models and no learning sample was used. The proposed method outperforms a widely used Bayesian MCMC strategy for data generated from various population admixture models. The multipoint information content for ancestry was derived based on the map provided by Smith et al. (2004) and the associated statistical power was calculated. We examined the distribution of admixture LD across the genome for both real and simulated data and established a threshold for genome-wide significance applicable to admixture mapping studies. The software ADMIXPROGRAM for performing admixture mapping is available from the authors.

6.
We develop a Bayesian analysis based on two different Jeffreys priors for the Student-t regression model with unknown degrees of freedom. It is typically difficult to estimate the number of degrees of freedom: improper prior distributions may lead to improper posterior distributions, whereas proper prior distributions may dominate the analysis. We show that Bayesian analysis with either of the two considered Jeffreys priors provides a proper posterior distribution. Finally, we show that Bayesian estimators based on Jeffreys analysis compare favourably to other Bayesian estimators based on priors previously proposed in the literature.

7.
Adaptive evolution at the molecular level can be studied by detecting convergent and parallel evolution at the amino acid sequence level. For a set of homologous protein sequences, the ancestral amino acids at all interior nodes of the phylogenetic tree of the proteins can be statistically inferred. The amino acid sites that have experienced convergent or parallel changes on independent evolutionary lineages can then be identified by comparing the amino acids at the beginning and end of each lineage. At present, the efficiency of the methods of ancestral sequence inference in identifying convergent and parallel changes is unknown. More seriously, when we identify convergent or parallel changes, it is unclear whether these changes are attributable to random chance. For these reasons, claims of convergent and parallel evolution at the amino acid sequence level have been disputed. We have conducted computer simulations to assess the efficiencies of the parsimony and Bayesian methods of ancestral sequence inference in identifying convergent and parallel-change sites. Our results showed that the Bayesian method performs better than the parsimony method in identifying parallel changes, and both methods are inefficient in identifying convergent changes. However, the Bayesian method is recommended for estimating the number of convergent-change sites because it gives a conservative estimate. We have developed statistical tests for examining whether the observed numbers of convergent and parallel changes are due to random chance. As an example, we reanalyzed the stomach lysozyme sequences of foregut fermenters and found that parallel evolution is statistically significant, whereas convergent evolution is not well supported.

8.
Interspecific morphological variation in animal genitalia has long attracted the attention of evolutionary biologists because of the role genital form may play in the generation and/or maintenance of species boundaries. Here we examine the origin and evolution of genital variation in rodents of the muroid genus Neotoma. We test the hypothesis that a relatively rare genital form has evolved only once in Neotoma. We use four mitochondrial and four nuclear markers to evaluate this hypothesis by establishing a phylogenetic framework in which to examine genital evolution. We find intron seven of the beta-fibrinogen gene to be a highly informative nuclear marker for the levels of differentiation that characterize Neotoma, with this locus evolving at a rate slower than cytochrome b but faster than 12S. We estimate phylogenetic relationships within Neotoma using both maximum parsimony and maximum likelihood-based Bayesian methods. Our Bayesian and parsimony reconstructions differ in significant ways, but we show that our parsimony analysis may be influenced by long-branch attraction. Furthermore, our estimate of Neotoma phylogeny remains consistent across various data partitioning strategies in the Bayesian analyses. Using ancestral state reconstruction, we find support for the monophyly of taxa that possess the relatively rare genital form. However, we also find support for the independent evolution of the common genital form and discuss possible underlying developmental shifts that may have contributed to our observed patterns of morphological evolution.

9.
Kenneth Lange. Genetica, 1995, 96(1-2): 107-117
The Dirichlet distribution provides a convenient conjugate prior for Bayesian analyses involving multinomial proportions. In particular, allele frequency estimation can be carried out with a Dirichlet prior. If data from several distinct populations are available, then the parameters characterizing the Dirichlet prior can be estimated by maximum likelihood and then used for allele frequency estimation in each of the separate populations. This empirical Bayes procedure tends to moderate extreme multinomial estimates based on sample proportions. The Dirichlet distribution can also be employed to model the contributions from different ancestral populations in computing forensic match probabilities. If the ancestral populations are in genetic equilibrium, then the product rule for computing match probabilities is valid conditional on the ancestral contributions to a typical person of the reference population. This fact facilitates computation of match probabilities and tight upper bounds to match probabilities. Editor's comments: The author continues the formal Bayesian analysis introduced by Gjertson & Morris in this volume. He invokes Dirichlet distributions, and so brings rigor to the discussion of the effects of population structure on match probabilities. The increased computational burden this approach entails should not be regarded as a hindrance.
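
The conjugate update behind allele frequency estimation with a Dirichlet prior can be sketched as below. This is a minimal illustration, not the procedure from the article: the function name and the example counts are hypothetical, and the prior parameters are simply supplied by hand rather than estimated by maximum likelihood across populations.

```python
def dirichlet_posterior_mean(allele_counts, prior_alpha):
    """Posterior mean allele frequencies under a Dirichlet prior.

    With multinomial counts n_i and prior parameters alpha_i, the posterior
    is Dirichlet(alpha_i + n_i), whose mean is
    (alpha_i + n_i) / (sum(alpha) + sum(n)).
    """
    if len(allele_counts) != len(prior_alpha):
        raise ValueError("counts and prior parameters must have equal length")
    if any(a <= 0 for a in prior_alpha):
        raise ValueError("Dirichlet parameters must be positive")
    total = sum(allele_counts) + sum(prior_alpha)
    return [(n + a) / total for n, a in zip(allele_counts, prior_alpha)]


# Hypothetical sample: one allele was never observed among 20 gene copies.
counts = [0, 3, 17]
posterior = dirichlet_posterior_mean(counts, [1.0, 1.0, 1.0])
```

In this example the sample proportion of the unobserved allele is exactly zero, while its posterior mean is 1/23, which illustrates how the prior moderates extreme sample-based estimates.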

10.
Group testing, also known as pooled testing, and inverse sampling are both widely used methods of data collection when the goal is to estimate a small proportion. Taking a Bayesian approach, we consider the new problem of estimating disease prevalence from group testing when inverse (negative binomial) sampling is used. Using different distributions to incorporate prior knowledge of disease incidence and different loss functions, we derive closed form expressions for posterior distributions and resulting point and credible interval estimators. We then evaluate our new estimators, on Bayesian and classical grounds, and apply our methods to a West Nile Virus data set.

11.
The Bryaceae are a large cosmopolitan moss family including genera of significant morphological and taxonomic complexity. Phylogenetic relationships within the Bryaceae were reconstructed based on DNA sequence data from all three genomic compartments. In addition, maximum parsimony and Bayesian inference were employed to reconstruct ancestral character states of 38 morphological plus four habitat characters and eight insertion/deletion events. The recovered phylogenetic patterns are generally in accord with previous phylogenies based on chloroplast DNA sequence data and three major clades are identified. The first clade comprises Bryum bornholmense, B. rubens, B. caespiticium, and Plagiobryum. This corroborates the hypothesis suggested by previous studies that several Bryum species are more closely related to Plagiobryum than to the core Bryum species. The second clade includes Acidodontium, Anomobryum, and Haplodontium, while the third clade contains the core Bryum species plus Imbribryum. Within the latter clade, B. subapiculatum and B. tenuisetum form the sister clade to Imbribryum. Reconstructions of ancestral character states under maximum parsimony and Bayesian inference suggest fourteen morphological synapomorphies for the ingroup and synapomorphies are detected for most clades within the ingroup. Maximum parsimony and Bayesian reconstructions of ancestral character states are mostly congruent although Bayesian inference shows that the posterior probability of ancestral character states may decrease dramatically when node support is taken into account. Bayesian inference also indicates that reconstructions may be ambiguous at internal nodes for highly polymorphic characters.

12.
The phylogenetic inference of ancestral protein sequences is a powerful technique for the study of molecular evolution, but any conclusions drawn from such studies are only as good as the accuracy of the reconstruction method. Every inference method leads to errors in the ancestral protein sequence, resulting in potentially misleading estimates of the ancestral protein's properties. To assess the accuracy of ancestral protein reconstruction methods, we performed computational population evolution simulations featuring near-neutral evolution under purifying selection, speciation, and divergence using an off-lattice protein model where fitness depends on the ability to be stable in a specified target structure. We were thus able to compare the thermodynamic properties of the true ancestral sequences with the properties of “ancestral sequences” inferred by maximum parsimony, maximum likelihood, and Bayesian methods. Surprisingly, we found that methods such as maximum parsimony and maximum likelihood that reconstruct a “best guess” amino acid at each position overestimate thermostability, while a Bayesian method that sometimes chooses less-probable residues from the posterior probability distribution does not. Maximum likelihood and maximum parsimony apparently tend to eliminate variants at a position that are slightly detrimental to structural stability simply because such detrimental variants are less frequent. Other properties of ancestral proteins might be similarly overestimated. This suggests that ancestral reconstruction studies require greater care to come to credible conclusions regarding functional evolution. Inferred functional patterns that mimic reconstruction bias should be reevaluated.

13.
To understand patterns and processes of the diversification of life, we require an accurate understanding of taxon interrelationships. Recent studies have suggested that analyses of morphological character data using the Bayesian and maximum likelihood Mk model provide phylogenies of higher accuracy compared to parsimony methods. This has proved controversial, particularly studies simulating morphology-data under Markov models that assume shared branch lengths for characters, as it is claimed this leads to bias favouring the Bayesian or maximum likelihood Mk model over parsimony models which do not explicitly make this assumption. We avoid these potential issues by employing a simulation protocol in which character states are randomly assigned to tips, but datasets are constrained to an empirically realistic distribution of homoplasy as measured by the consistency index. Datasets were analysed with equal weights and implied weights parsimony, and the maximum likelihood and Bayesian Mk model. We find that consistent (low homoplasy) datasets render method choice largely irrelevant, as all methods perform well with high consistency (low homoplasy) datasets, but the largest discrepancies in accuracy occur with low consistency datasets (high homoplasy). In such cases, the Bayesian Mk model is significantly more accurate than alternative models and implied weights parsimony never significantly outperforms the Bayesian Mk model. When poorly supported branches are collapsed, the Bayesian Mk model recovers trees with higher resolution compared to other methods. As it is not possible to assess homoplasy independently of a tree estimate, the Bayesian Mk model emerges as the most reliable approach for categorical morphological analyses.
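
The consistency index used in this abstract to measure homoplasy is conventionally the minimum number of character-state changes a dataset could require divided by the number it actually requires on a given tree. The sketch below assumes that both step counts have already been obtained per character (for example from a parsimony program); the function name and the example numbers are hypothetical and are not taken from the study.

```python
def consistency_index(min_steps, observed_steps):
    """Ensemble consistency index: sum of minimum steps / sum of observed steps.

    min_steps[i] is the fewest changes character i could need on any tree
    (its number of observed states minus one); observed_steps[i] is the
    number of changes it needs on the tree being evaluated.
    """
    if len(min_steps) != len(observed_steps):
        raise ValueError("both lists must describe the same characters")
    if any(s < m for m, s in zip(min_steps, observed_steps)):
        raise ValueError("observed steps cannot be below the minimum")
    total_observed = sum(observed_steps)
    if total_observed == 0:
        raise ValueError("at least one character must change on the tree")
    return sum(min_steps) / total_observed


# Hypothetical three-character dataset: 4 minimum steps, 8 observed steps.
ci = consistency_index([1, 1, 2], [1, 3, 4])
```

A value of 1.0 means no homoplasy on that tree, and lower values mean more; because the observed steps depend on the tree, the index cannot be computed without a tree estimate, which is the point the abstract makes in its final sentence.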

14.
Bayesian estimation of ancestral character states on phylogenies   Total citations: 17, self-citations: 0, citations by others: 17
Biologists frequently attempt to infer the character states at ancestral nodes of a phylogeny from the distribution of traits observed in contemporary organisms. Because phylogenies are normally inferences from data, it is desirable to account for the uncertainty in estimates of the tree and its branch lengths when making inferences about ancestral states or other comparative parameters. Here we present a general Bayesian approach for testing comparative hypotheses across statistically justified samples of phylogenies, focusing on the specific issue of reconstructing ancestral states. The method uses Markov chain Monte Carlo techniques for sampling phylogenetic trees and for investigating the parameters of a statistical model of trait evolution. We describe how to combine information about the uncertainty of the phylogeny with uncertainty in the estimate of the ancestral state. Our approach does not constrain the sample of trees only to those that contain the ancestral node or nodes of interest, and we show how to reconstruct ancestral states of uncertain nodes using a most-recent-common-ancestor approach. We illustrate the methods with data on ribonuclease evolution in the Artiodactyla. Software implementing the methods (BayesMultiState) is available from the authors.

15.
Establishing hypotheses of relationships is a critical prerequisite for any macroevolutionary analysis, but different approaches exist for achieving this goal. Amongst palaeontologists using morphological data the Bayesian approach is increasingly preferred over parsimony, but this shift also alters the way we think about samples of trees. Here we revisit stratigraphic congruence as a comparator between Bayesian and parsimony samples, but in a new visual context: treespace. Such spaces represent an ordination of unique topologies that can also be extended to create a ‘landscape’ where altitude represents some comparative measure (here congruence with stratigraphy). By co-opting existing visualization tools and applying them to a meta-analysis of 128 cladistic data sets we show that there is no consistent favouring of either Bayesian or parsimony according to stratigraphic congruence metrics, and further that empirical treespace visualizations suggest a complex variety of topological landscapes. We conclude by arguing that treespaces should become a standard exploratory tool in phylogenetic analysis.

16.
The dominance of angiosperms has played a direct role in the diversification of insects, especially Coleoptera. The shift to angiosperm feeding from other diets is likely to have increased the rate of speciation in Phytophaga. However, Phytophaga is only one of many hyperdiverse lineages of beetles and studies of host-shift proliferation have been somewhat limited to groups that primitively feed on plants. We have studied the diet-diverse beetle family Erotylidae (Cucujoidea) to determine if diet is correlated with high diversification rates and morphological evolution by first reconstructing ancestral diets and then testing for associations between diet and species number and diet and ovipositor type. A Bayesian phylogenetic analysis of morphological data that was previously published in Leschen (2003, Pages 1-108 in Fauna of New Zealand, 47; 53 terminal taxa and 1 outgroup, 120 adult characters and 1 diet character) yielded results that are similar to the parsimony analyses of Leschen (2003). Ancestral state reconstructions based on Bayesian and parsimony inference were largely congruent and both reconstructed microfungal feeding (the diet of the outgroup Biphyllidae) at the root of the Erotylidae tree. Shifts among microfungal, saprophagous, and phytophagous diets were most frequent. The largest numbers of species are contained in lineages that are macrofungal feeders (subfamily Erotylinae) and phytophagous (derived Languriinae), although the Bayesian posterior predictive tests of character state correlation were unable to detect any significant associations. Ovipositor morphology correlated with diet (i.e., acute forms were associated with phytophagy and unspecialized forms were associated with a mixture of diets). 
Although there is a general trend to increased species number associated with the shift from microfungal feeding to phytophagy (based on character mapping and mainly restricted to shifts in Languriinae), there is a large radiation of taxa feeding on macrofungi. Cycad feeding is scattered in more deeply diverged taxa and may have preceded the evolution of angiosperm feeding in some groups. Preliminary analysis of diet mapped onto higher beetle phylogenies suggests that about half of the major Coleoptera lineages may have had fungus-feeding ancestors. We discuss the roles of stochastic models and prior distributions of the reconstruction of ancestral character states in the context of the current data.

17.
Multigene sequence data have great potential for elucidating important and interesting evolutionary processes, but statistical methods for extracting information from such data remain limited. Although various biological processes may cause different genes to have different genealogical histories (and hence different tree topologies), we also may expect that the number of distinct topologies among a set of genes is relatively small compared with the number of possible topologies. Therefore evidence about the tree topology for one gene should influence our inferences of the tree topology on a different gene, but to what extent? In this paper, we present a new approach for modeling and estimating concordance among a set of gene trees given aligned molecular sequence data. Our approach introduces a one-parameter probability distribution to describe the prior distribution of concordance among gene trees. We describe a novel 2-stage Markov chain Monte Carlo (MCMC) method that first obtains independent Bayesian posterior probability distributions for individual genes using standard methods. These posterior distributions are then used as input for a second MCMC procedure that estimates a posterior distribution of gene-to-tree maps (GTMs). The posterior distribution of GTMs can then be summarized to provide revised posterior probability distributions for each gene (taking account of concordance) and to allow estimation of the proportion of the sampled genes for which any given clade is true (the sample-wide concordance factor). Further, under the assumption that the sampled genes are drawn randomly from a genome of known size, we show how one can obtain an estimate, with credibility intervals, on the proportion of the entire genome for which a clade is true (the genome-wide concordance factor). We demonstrate the method on a set of 106 genes from 8 yeast species.

18.
Missing data are commonly thought to impede a resolved or accurate reconstruction of phylogenetic relationships, and probabilistic analysis techniques are increasingly viewed as less vulnerable to the negative effects of data incompleteness than parsimony analyses. We test both assumptions empirically by conducting parsimony and Bayesian analyses on an approximately 1.5 × 10^6-cell (27 965 characters × 52 species) mustelid–procyonid molecular supermatrix with 62.7% missing entries. Contrary to the first assumption, phylogenetic relationships inferred from our analyses are fully (Bayesian) or almost fully (parsimony) resolved topologically with mostly strong support and also largely in accord with prior molecular estimations of mustelid and procyonid phylogeny derived with parsimony, Bayesian, and other probabilistic analysis techniques from smaller but complete or nearly complete data sets. Contrary to the second assumption, we found no compelling evidence in support of a relationship between the inferior performance of parsimony and taxon incompleteness (i.e. the proportion of missing character data for a taxon), although we found evidence for a connection between the inferior performance of parsimony and character incompleteness (i.e. no overlap in character data between some taxa). The relatively good performance of our analyses may be related to the large number of sampled characters, so that most taxa (even highly incomplete ones) are represented by a sufficient number of characters allowing both approaches to resolve their relationships. © The Willi Hennig Society 2009.

19.
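[Objective] The typical single-chromosome mitochondrial genome of animals has fragmented into multiple mitochondrial minichromosomes in the genus Hoplopleura. This study aims to infer the ancestral mitochondrial karyotype of the genus Hoplopleura by sequencing the mitochondrial genome of Hoplopleura pacifica. [Methods] The fragmented mitochondrial genome of Hoplopleura pacifica was sequenced using Illumina HiSeq X Ten high-throughput sequencing technology...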

20.
We tested whether it is beneficial for the accuracy of phylogenetic inference to sample characters that are evolving under different sets of parameters, using both Bayesian MCMC (Markov chain Monte Carlo) and parsimony approaches. We examined differential rates of evolution among characters, differential character-state frequencies and character-state space, and differential relative branch lengths among characters. We also compared the relative performance of parsimony and Bayesian analyses by progressively incorporating more of these heterogeneous parameters and progressively increasing the severity of this heterogeneity. Bayesian analyses performed better than parsimony when heterogeneous simulation parameters were incorporated into the substitution model. However, parsimony outperformed Bayesian MCMC when heterogeneous simulation parameters were not incorporated into the Bayesian substitution model. The higher the rate of evolution simulated, the better parsimony performed relative to Bayesian analyses. Bayesian and parsimony analyses converged in their performance as the number of simulated heterogeneous model parameters increased. Up to a point, rate heterogeneity among sites was generally advantageous for phylogenetic inference using both approaches. In contrast, branch-length heterogeneity was generally disadvantageous for phylogenetic inference using both parsimony and Bayesian approaches. Parsimony was found to be more conservative than Bayesian analyses, in that it resolved fewer incorrect clades.
© The Willi Hennig Society 2006.
