Found 20 similar articles (search time: 15 ms)
1.
A New Method of Inference of Ancestral Nucleotide and Amino Acid Sequences (total citations: 44; self-citations: 2; citations by others: 42)
A statistical method was developed for reconstructing the nucleotide or amino acid sequences of extinct ancestors, given the phylogeny and sequences of the extant species. A model of nucleotide or amino acid substitution was employed to analyze data of the present-day sequences, and maximum likelihood estimates of parameters such as branch lengths were used to compare the posterior probabilities of assignments of character states (nucleotides or amino acids) to interior nodes of the tree; the assignment having the highest probability was the best reconstruction at the site. The lysozyme c sequences of six mammals were analyzed by using the likelihood and parsimony methods. The new likelihood-based method was found to be superior to the parsimony method. The probability that the amino acids for all interior nodes at a site reconstructed by the new method are correct was calculated to be 0.91, 0.86, and 0.73 for all, variable, and parsimony-informative sites, respectively, whereas the corresponding probabilities for the parsimony method were 0.84, 0.76, and 0.51, respectively. The probability that an amino acid in an ancestral sequence is correctly reconstructed by the likelihood analysis ranged from 91.3 to 98.7% for the four ancestral sequences.
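As a rough illustration of the posterior comparison this abstract describes, the sketch below scores each candidate nucleotide at one interior node and normalizes the scores. It assumes the Jukes-Cantor (JC69) model, a single interior node joined directly to three tips, and branch lengths that are already known. None of these specifics come from the abstract, and the function names are hypothetical.

```python
import math

STATES = "ACGT"


def jc69_prob(a, b, t):
    """JC69 probability of observing b at the end of a branch of length t that starts in a."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if a == b else 0.25 - 0.25 * decay


def ancestral_posterior(tips, branch_lengths):
    """Posterior probability of each state at one interior node joined directly to every tip."""
    joint = {}
    for state in STATES:
        score = 0.25  # prior: JC69 equilibrium frequency (an assumption of this sketch)
        for tip, t in zip(tips, branch_lengths):
            score *= jc69_prob(state, tip, t)
        joint[state] = score
    total = sum(joint.values())
    return {state: joint[state] / total for state in STATES}


# Three tips observed at one site, each on a branch of length 0.1 (illustrative values).
posterior = ancestral_posterior(["A", "A", "G"], [0.1, 0.1, 0.1])
best_state = max(posterior, key=posterior.get)
```

The state with the highest posterior is taken as the reconstruction at the site, and the posterior itself serves as a measure of confidence in that assignment.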
2.
Estimating evolution of temporal sequence changes: a practical approach to inferring ancestral developmental sequences and sequence heterochrony (total citations: 2; self-citations: 0; citations by others: 2)
Developmental biology often yields data in a temporal context. Temporal data in phylogenetic systematics has important uses in the field of evolutionary developmental biology and, in general, comparative biology. The evolution of temporal sequences, specifically developmental sequences, has proven difficult to examine due to the highly variable temporal progression of development. Issues concerning the analysis of temporal sequences and problems with current methods of analysis are discussed. We present here an algorithm to infer ancestral temporal sequences, quantify sequence heterochronies, and estimate pseudoreplicate consensus support for sequence changes using Parsimov-based genetic inference [PGi]. Real temporal developmental sequence data sets are used to compare PGi with currently used approaches, and PGi is shown to be the most efficient, accurate, and practical method to examine biological data and infer ancestral states on a phylogeny. The method is also expandable to address further issues in developmental evolution, namely modularity.
3.
B. Edwin Blaisdell 《Journal of molecular evolution》1985,22(1):69-81
Summary: The course of evolutionary change in DNA sequences has been modeled as a Markov process. The Markov process was represented by discrete time matrix methods. The parameters of the Markov transition matrices were estimated by least-squares direct-search optimization of the fit of the calculated divergence matrix to that observed for two aligned sequences. The Markov process corrected for multiple and parallel substitutions of bases at the same site. The method avoided the incorrect assumption of all previously described methods that the divergence between two present-day sequences is twice the divergence of either from the common and unknown ancestral sequence. The three previous methods were shown to be equivalent. The present method also avoided the undesirable assumptions that sequence composition has not changed with time and that the substitution rates in the two descendant lineages were the same. It permitted simultaneous estimation of ancestral sequence composition and, if applicable, of different substitution rates for the two descendant lineages, provided the total number of estimated parameters was less than 16. Properties of the Markov chain were discussed. It was proved for symmetric substitution matrices that all elements of the equilibrium divergence matrix equal 1/16, and that the total difference in the divergence matrix at epoch k equals the total change in the common substitution matrix at epoch 2k for all values of k. It was shown how to resolve an ambiguity in the assignment of two different substitution rates to the two descendant lineages when four or more similar sequences are available. The method was applied to the divergence matrix for codon site 3 for the mouse and rabbit beta-globins. This observed divergence matrix was significantly asymmetric and required at least two different substitution rates. This result could be achieved only by using different asymmetric substitution matrices for the two lineages.
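The 1/16 equilibrium result can be checked numerically. The sketch below assumes one plausible formulation of the divergence matrix, D[i][j] = sum over ancestral bases a of pi[a] * (P_left^k)[a][i] * (P_right^k)[a][j], that is, the joint frequency of base i in one descendant and base j in the other after k steps. This formulation, the example matrix, and the ancestral composition are illustrative assumptions and are not taken from the paper.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def divergence_matrix(pi, p_left, p_right, steps):
    """Joint base frequencies in two descendants after `steps` discrete Markov steps."""
    n = len(pi)
    left = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    right = [row[:] for row in left]
    for _ in range(steps):
        left = matmul(left, p_left)
        right = matmul(right, p_right)
    return [
        [sum(pi[a] * left[a][i] * right[a][j] for a in range(n)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical symmetric one-step substitution matrix (each row sums to 1).
P = [
    [0.90, 0.05, 0.03, 0.02],
    [0.05, 0.90, 0.02, 0.03],
    [0.03, 0.02, 0.90, 0.05],
    [0.02, 0.03, 0.05, 0.90],
]
ancestral_composition = [0.4, 0.3, 0.2, 0.1]  # deliberately non-uniform

equilibrium = divergence_matrix(ancestral_composition, P, P, 500)
```

Because a symmetric stochastic matrix is doubly stochastic, its powers converge to a matrix with every entry 1/4, so every entry of the divergence matrix tends to 1/16 regardless of the ancestral composition.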
4.
Molecular evolution of intergenic DNA in higher primates: pattern of DNA changes, molecular clock, and evolution of repetitive sequences (total citations: 7; self-citations: 1; citations by others: 6)
A 3.1-kb intergenic DNA fragment located between the psi beta-globin and
delta-globin genes in the beta-globin gene cluster was cloned from gorilla,
orangutan, rhesus monkey, and spider monkey, and the nucleotide sequence of
each fragment was determined. The phylogeny of these four sequences,
together with two previously published allelic sequences from humans and
one from chimpanzee, was constructed, and the accumulation of mutations in
the region was analyzed. The sites of base substitutions are not evenly
distributed within the region: two Alu repeats have accumulated 0.21 ± 0.02
substitutions/site with 0.15 ± 0.008 substitutions/site in the remainder of
the fragment. The occurrence of substitutions at neighboring sites is more
frequent than would be expected if they were independent. The observed
excesses disappear when ancestral -CG- dinucleotide sites are excluded. The
phylogenetic relationships of the sequences indicate that the human
sequence shares a most recent coancestor with the chimpanzee sequence. The
data also show that great apes have accumulated fewer mutations in this
part of the genome than has the rhesus monkey. The relative rates of
accumulation of 12 kinds of nucleotide substitution in the region during
primate evolution are asymmetric in the DNA strands. From these rates of
accumulation, the origin of a simple stretch of sequence near the 3' end of
the 3.1-kb fragment was deduced to be a sequence comprising 50% T and 50% C
on one strand. The two oppositely oriented Alu sequences in the 3.1-kb
region were inserted at their present positions before the divergence of
the New-World monkeys from other lineages. Our analysis shows that the
nucleotide sequences of the two Alu repeats in spider monkey are
unexpectedly similar both to each other and to the deduced ancestral
sequence of Alu repeats. The data suggest that there has been some type of
recombinational event between the spider monkey Alu repeats but that it was
not a simple gene conversion.
5.
Gerdine F O Sanson Silvia Y Kawashita Adriana Brunstein Marcelo R S Briones 《Molecular biology and evolution》2002,19(2):170-178
A known phylogeny was generated using a four-step serial bifurcate PCR method. The ancestor sequence (SSU rDNA) evolved in vitro for 280 nested PCR cycles, and the resulting 15 ancestor and 16 terminal sequences (2,238 bp each) were determined. Parsimony, distance, and maximum likelihood analysis of the terminal sequences reconstructed the topology of the real phylogeny and branch lengths accurately. Divergence dates and ancestor sequences were estimated with very small error, particularly at the base of the phylogeny, mostly due to insertion and deletion changes. The substitution patterns along the known phylogeny are not described by reversible models, and accordingly, the probability substitution matrix, based on the observed substitutions from ancestor to terminal nodes along the known phylogeny, was calculated. This approach is an extension of previous studies using bacteriophage serial propagation, because here mutations were allowed to occur neutrally rather than by addition of a mutagenic agent, which produced biased mutational changes. These results provide for the first time biochemical experimental support for phylogenies, divergence date estimates, and an irreversible substitution model based on neutrally evolving DNA sequences. The substitution preferences observed here (A to G and T to C) are consistent with the high G+C content of the Thermus aquaticus genome. This suggests, at least in part, that the method here described, which explores the high Taq DNA polymerase error rate, simulates the evolution of a DNA segment in a thermophilic organism. These organisms include the bacterial rod T. aquaticus and several Archaea, and thus, the method and data set described here may well contribute new insights about the genome evolution of these organisms.
6.
Reconstructing phylogeny is a crucial target of contemporary biology, now commonly approached through computerized analysis of genetic sequence data. In angiosperms, despite recent progress at the ordinal level, many relationships between families remain unclear. Here we take a case study from Lamiales, an angiosperm order in which interfamilial relationships have so far proved particularly problematic. We examine the effect of changing one factor, the quantity of sequence data analyzed, on phylogeny reconstruction in this group. We use simulation to estimate a priori the sequence data that would be needed to resolve an accurate, supported phylogeny of Lamiales. We investigate the effect of increasing the length of sequence data analyzed, the rate of substitution in the sequences used, and of combining gene partitions. This method could be a valuable technique for planning systematic investigations in other problematic groups. Our results suggest that increasing sequence length is a better way to improve support, resolution, and accuracy than employing sequences with a faster substitution rate. Indeed, the latter may in some cases have detrimental effects on phylogeny reconstruction. Further molecular sequencing (of at least 10,000 bp) should result in a fully resolved and supported phylogeny of Lamiales, but at present the problematic aspects of this tree model remain.
7.
Inferring pattern and process: maximum-likelihood implementation of a nonhomogeneous model of DNA sequence evolution for phylogenetic analysis (total citations: 22; self-citations: 14; citations by others: 8)
A nonhomogeneous, nonstationary stochastic model of DNA sequence evolution
allowing varying equilibrium G + C contents among lineages is devised in
order to deal with sequences of unequal base compositions. A
maximum-likelihood implementation of this model for phylogenetic analyses
allows handling of a reasonable number of sequences. The relevance of the
model and the accuracy of parameter estimates are theoretically and
empirically assessed, using real or simulated data sets. Overall, a
significant amount of information about past evolutionary modes can be
extracted from DNA sequences, suggesting that process (rates of distinct
kinds of nucleotide substitutions) and pattern (the evolutionary tree) can
be simultaneously inferred. G + C contents at ancestral nodes are quite
accurately estimated. The new method appears to be useful for phylogenetic
reconstruction when base composition varies among compared sequences. It
may also be suitable for molecular evolution studies.
8.
We present a molecular phylogeny for the genus Hemileuca (Saturniidae), based on 624 bp of mitochondrial cytochrome oxidase I (COI) and 932 bp of the nuclear gene elongation factor 1 alpha (EF1alpha). Combined analysis of both gene sequences increased resolution and supported most of the phylogenetic relationships suggested by separate analysis of each gene. However, a maximum parsimony (MP) model for just COI sequence from one sample of most taxa produced a phylogeny incongruent with EF1alpha and combined dataset analyses under either MP or ML models. Time of year and time of day during which adult moths fly corresponded strongly with the phylogeny. Although most Hemileuca are diurnal, ancestral Hemileuca probably were nocturnal, fall-flying insects. The two-gene molecular phylogeny suggests that wing morphology is frequently homoplastic. There was no correlation between the primary larval hostplants and phylogenetic placement of taxa. No phylogenetic pattern of specialization was evident for single hostplant families across the genus. Our results suggest that phenological behavioral characters may be more conserved than the wing morphology characters that are more commonly used to infer phylogenetic relationships in Lepidoptera. Inclusion of a molecular component in the re-evaluation of systematic data is likely to alter prior assumptions of phylogenetic relationships in groups where such potentially homoplastic characters have been used.
9.
A method is proposed to optimize molecular sequence data that does not employ multiple sequence alignment. This method treats entire homologous contiguous stretches of sequence data as individual characters. This sequence is treated as the homologous unit employed in phylogeny reconstruction. The sets of specific sequences exhibited by the terminal taxa constitute the character states. The number of states is then less than or equal to the number of unique sequences (or homologous fragments) exhibited by the data. A matrix of transformation costs is created to relate the states to one another. The cells of this matrix are defined as the minimum transformation cost between each pair of states based on insertion–deletion and base substitution costs. The diagnosis of a topology then follows existing dynamic programming techniques, with the number of states greatly expanded. Since the possible sequences reconstructed at nodes are limited to those exhibited by the terminals, cladograms constructed in this way may be longer than those of other methods in that they require a greater number of weighted evolutionary events. Example data, the effects of missing data, restricted ancestors, and putative long-branch attraction are discussed.
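The transformation cost matrix described in this abstract could be built along the following lines. This is only a sketch: the substitution cost of 1, the insertion–deletion cost of 2, the example sequences, and the function names are all assumptions made for illustration, and the pairwise cost is computed with a standard edit-distance recurrence rather than with whatever procedure the paper itself uses.

```python
def edit_cost(a, b, sub_cost=1, indel_cost=2):
    """Minimum cost of transforming sequence a into b with weighted substitutions and indels."""
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * indel_cost]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (0 if a[i - 1] == b[j - 1] else sub_cost)
            cur.append(min(diagonal, prev[j] + indel_cost, cur[j - 1] + indel_cost))
        prev = cur
    return prev[-1]


def transformation_costs(terminal_sequences, sub_cost=1, indel_cost=2):
    """Treat each unique terminal sequence as a character state and cost every pair."""
    states = sorted(set(terminal_sequences))
    matrix = [[edit_cost(s, t, sub_cost, indel_cost) for t in states] for s in states]
    return states, matrix


# Four terminals, two of which share the same sequence, so only three states result.
states, costs = transformation_costs(["ACGT", "ACGA", "ACT", "ACGT"])
```

The resulting matrix can then be supplied as a step matrix to a standard dynamic programming pass over the tree, with candidate ancestral states restricted to the observed sequences.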
10.
Models of codon substitution have been commonly used to compare protein-coding DNA sequences and are particularly effective in detecting signals of natural selection acting on the protein. Their utility in reconstructing molecular phylogenies and in dating species divergences has not been explored. Codon models naturally accommodate synonymous and nonsynonymous substitutions, which occur at very different rates and may be informative for recent and ancient divergences, respectively. Thus codon models may be expected to make an efficient use of phylogenetic information in protein-coding DNA sequences. Here we applied codon models to 106 protein-coding genes from eight yeast species to reconstruct phylogenies using the maximum likelihood method, in comparison with nucleotide- and amino acid-based analyses. The results appeared to confirm that expectation. Nucleotide-based analyses, under simplistic substitution models, were efficient in recovering recent divergences, whereas amino acid-based analyses performed better at recovering deep divergences. Codon models appeared to combine the advantages of amino acid and nucleotide data and had good performance at recovering both recent and deep divergences. Estimation of relative species divergence times using amino acid and codon models suggested that translation of gene sequences into proteins led to information loss ranging from 30% for deep nodes to 66% for recent nodes. Although computational burden makes codon models unfeasible for tree search in large data sets, we suggest that they may be useful for comparing candidate trees. Nucleotide models that accommodate the differences in evolutionary dynamics at the three codon positions also performed well, at much less computational cost. We discuss the relationship between a model's fit to data and its utility in phylogeny reconstruction and caution against use of overly complex substitution models.
11.
Polymorphism of the nucleotide sequences encoding 149 amino acids of linked major histocompatibility complex (Mhc) class II β1 and β2 peptides, and of the intervening intron (548–773 base pairs), was examined within and among seven Pacific salmon (Oncorhynchus) species. Levels of nucleotide diversity were higher for the B1 sequence than for B2 or the intron in comparisons both within and between species. For the codons of the peptide binding region of the B1 sequence, the level of nonsynonymous nucleotide substitution (dN) exceeded the level of synonymous substitution (dS) by a factor of ten for within-species comparisons, and by a factor of four for between-species comparisons. The excess of dN indicates that balancing selection maintains diversity at this salmonid Mhc class II locus, as is common for Mhc loci in other vertebrates. Levels of nucleotide diversity for both the exon and intron sequences were greater among than within species, and there were numerous species-specific nucleotides present in both the coding and noncoding regions. Thus, neighbor-joining analysis of both the intron and exon regions provided phylogenies in which the sequences clustered strongly by species. There was little evidence of shared ancestral (trans-species) polymorphism in the exon phylogeny, and the intron phylogeny depicted standard relationships among the Pacific salmon species. The lack of shared allelic B1 lineages in these closely related species may result from severe bottlenecks that occurred during speciation or during the ice ages that glaciated the rim of the north Pacific Ocean approximately every 100 000 years in the Pleistocene. The nucleotide sequence data reported in this paper have been submitted to the GenBank nucleotide sequence database and have been assigned the accession numbers U34692-U34720.
12.
MICHAEL HEADS 《Biological journal of the Linnean Society. Linnean Society of London》2009,98(4):757-774
The present study illustrates a method for analysing the biogeography of a group that is based on the group's phylogeny but does not invoke founder dispersal or centre of origin. The case studies presented include groups from many different parts of the world, but most are from the south‐west Pacific. The idea that basal groups are ancestral is not valid as a generalization. Neither the basal group, nor the oldest fossil represents the centre of origin, the time of origin or the ancestral ecology. Basal groups comprise less diverse sister groups and their distributions occur around centres of differentiation in already widespread ancestors, and not centres of origin for the whole group. Thus, the sequence of nodes in a phylogeny may indicate the spatial sequence of differentiation in a widespread ancestor rather than a series of founder dispersal events. Allocation of clades to a priori geographic areas, such as the continents, in the initial stages of biogeographic analysis has often involved incorrect assumptions of sympatry. This has led to the idea that the ‘areas of sympatry’ were centres of origin. Areas other than those defined by the taxa themselves need not be used in analysis. The fossil‐calibrated molecular clock, with dates transmogrified from minimum to maximum dates, has been used to test for vicariance. Recent work in population genetics, however, indicates that allopatry is caused by vicariance rather than founder dispersal, and so vicariance can instead be used to test the clock. Deriving evolutionary chronology by calibrating spatial vicariance in molecular clades with associated tectonic events is more reasonable than relying on the fossil record to give maximum (absolute) dates.
13.
Reliable inference of ancestral sequences can be critical to identifying both patterns and causes of molecular evolution. Robustness of ancestral inference is often assumed among closely related species, but tests of this assumption have been limited. Here, we examine the performance of inference methods for data simulated under scenarios of codon bias evolution within the Drosophila melanogaster subgroup. Genome sequence data for multiple, closely related species within this subgroup make it an important system for studying molecular evolutionary genetics. The effects of asymmetric and lineage-specific substitution rates (i.e., varying levels of codon usage bias and departures from equilibrium) on the reliability of ancestral codon usage were investigated. Maximum parsimony inference, which has been widely employed in analyses of Drosophila codon bias evolution, was compared to an approach that attempts to account for uncertainty in ancestral inference by weighting ancestral reconstructions by their posterior probabilities. The latter approach employs maximum likelihood estimation of rate and base composition parameters. For equilibrium and most non-equilibrium scenarios that were investigated, the probabilistic method appears to generate reliable ancestral codon bias inferences for molecular evolutionary studies within the D. melanogaster subgroup. These reconstructions are more reliable than parsimony inference, especially when codon usage is strongly skewed. However, inference biases are considerable for both methods under particular departures from stationarity (i.e., when adaptive evolution is prevalent). Reliability of inference can be sensitive to branch lengths, asymmetry in substitution rates, and the locations and nature of lineage-specific processes within a gene tree. Inference reliability, even among closely related species, can be strongly affected by (potentially unknown) patterns of molecular evolution in lineages ancestral to those of interest.
14.
This article describes complete mitochondrial DNA displacement loop sequences from 32 Japanese Black cattle and the analysis of these data in conjunction with previously published sequences from African, European, and Indian subjects. The origins of North East Asian domesticated cattle are unclear. The earliest domestic cattle in the region were Bos taurus and may have been domesticated from local wild cattle (aurochsen; B. primigenius), or perhaps had an origin in migrants from the early domestic center of the Near East. In phylogenetic analyses, taurine sequences form a dense tree with a center consisting of intermingled European and Japanese sequences with one group of Japanese and another of all African sequences, each forming distinct clusters at extremes of the phylogeny. This topology and calibrated levels of sequence divergence suggest that the clusters may represent three different strains of ancestral aurochs, adopted at geographically and temporally separate stages of the domestication process. Unlike Africa, half of Japanese cattle sequences are topologically intermingled with the European variants. This suggests an interchange of variants that may be ancient, perhaps a legacy of the first introduction of domesticates to East Asia.
15.
Phylogenetic interpretation of ontogenetic change: sorting out the actual and artefactual in an empirical case study of centrarchid fishes (total citations: 1; self-citations: 0; citations by others: 1)
PAULA M. MABEE 《Zoological Journal of the Linnean Society》1993,107(3):175-291
Hypothesized relationships between ontogenetic and phylogenetic change in morphological characters were empirically tested in centrarchid fishes by comparing observed patterns of character development with patterns of character evolution as inferred from a representative phylogenetic hypothesis. This phylogeny was based on 56–61 morphological characters that were polarized by outgroup comparison. Through these comparisons, evolutionary changes in character ontogeny were categorized in one of eight classes (terminal addition, terminal deletion, terminal substitution, non-terminal addition, non-terminal deletion, non-terminal substitution, ontogenetic reversal and substitution). The relative frequencies of each of these classes provided an empirical basis from which assumptions underlying hypothesized relationships between ontogeny and phylogeny were tested. In order to test hypothesized relationships between ontogeny and phylogeny that involve assumptions about the relative frequencies of terminal change (e.g. the use of ontogeny as a homology criterion), two additional phylogenies were generated in which terminal addition and terminal deletion were maximized and minimized for all characters. Character state change interpreted from these phylogenies thus represents the maxima and minima of the frequency range of terminal addition and terminal deletion for the 8.7 × 10^36 trees possible for centrarchids. It was found for these data that terminal change accounts for c. 75% of the character state change. This suggests either that early ontogeny is conserved in evolution or that interpretation and classification of evolutionary changes in ontogeny is biased in part by the way that characters are recognized, delimited and coded.
It was found that ontogenetic interpretation is influenced by two levels of homology decision: an initial decision involving delimitation of the character (the ontogenetic sequence), and the subsequent recognition of homologous components of developmental sequences. Recognition of phylogenetic homology among individual components of developmental sequences is necessary for interpretation of evolutionary changes in ontogeny as either terminal or non-terminal. If development is the primary criterion applied in recognizing individual homologies among parts of ontogenetic sequences, the only possible interpretation of phylogenetic differences is that of terminal change. If homologies of the components cannot be ascertained, recognition of the homology of the developmental sequence as a whole will result in the interpretation of evolutionary differences as substitutions. Particularly when the objective of a study is to discover how ontogeny has evolved, criteria in addition to ontogeny must be used to recognize homology. Interpretation is also dependent upon delimitation within an ontogenetic sequence. This is in part a function of the way that an investigator ‘sees’ and codes characters. Binary and multistate characters influence interpretation differently and predictably. The use of ontogeny for determining phylogenetic polarity as previously proposed rests on the assumptions that ancestral ontogenies are conserved and that character evolution occurs predominantly through terminal addition. It was found for these data that terminal addition may comprise a maximum of 51.9% of the total character state change. It is concluded that the ontogenetic criterion is not a reliable indicator of phylogenetic polarity. Process and pattern data are collected simultaneously by those engaged in comparative morphological studies of development. The set of alternative explanatory processes is limited in the process of observing development. 
These form necessary starting points for the research of developmental biologists. Separating ‘empirical’ results from interpretational influences requires awareness of potential biases in the course of character selection, coding and interpretation. Consideration of the interpretational problems involved in identifying and classifying phylogenetic changes in ontogeny leads to a re-evaluation of the purpose, usefulness and information conveyed by the current classification system. It is recommended that alternative classification schemes be pursued.
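The tree count quoted in this abstract, roughly 8.7 × 10^36, is of the size given by the standard formula for the number of unrooted, fully bifurcating trees on n labelled taxa, (2n − 5)!!. The sketch below evaluates that formula. That the figure corresponds to n = 30 terminal taxa is an inference from the magnitude of the number, not something the abstract states, and the function name is hypothetical.

```python
def unrooted_tree_count(n_taxa):
    """Number of unrooted, fully bifurcating trees on n labelled taxa: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (the leading factor 1 is implicit).
    for odd in range(3, 2 * n_taxa - 4, 2):
        count *= odd
    return count


trees_for_30_taxa = unrooted_tree_count(30)
```

Python integers have arbitrary precision, so the exact count is returned; formatting it in scientific notation makes the comparison with the quoted figure straightforward.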
16.
Application of DNA in molecular phylogenetic studies of birds (total citations: 1; self-citations: 0; citations by others: 1)
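DNA techniques commonly used in molecular phylogenetic studies of birds include DNA hybridization, RFLP, and DNA sequence analysis. DNA hybridization was once applied to birds on a large scale and gave rise to a new avian classification system. In RFLP analyses of birds, the most frequently used target sequence is mitochondrial DNA. DNA sequence analysis is considered the most effective and reliable method for molecular phylogenetic research. Mitochondrial genes are the most widely used in DNA sequence analysis, but because of their inherent limitations, many researchers have in recent years turned to nuclear genes and combined mitochondrial and nuclear genes in phylogenetic studies. The nuclear genes currently most used in avian molecular phylogenetics are single-copy nuclear DNA (scnDNA); their introns are suitable for phylogenetic studies at intermediate taxonomic levels, whereas their exons are used mainly at higher taxonomic levels. Beyond problems with the molecular markers themselves, avian molecular phylogenetic studies also face methodological problems, including the choice of molecular markers, sample size, and data processing. Future studies of avian molecular phylogeny should pay more attention to the standardization of methods.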
17.
Bayesian estimation of ancestral character states on phylogenies (total citations: 17; self-citations: 0; citations by others: 17)
Biologists frequently attempt to infer the character states at ancestral nodes of a phylogeny from the distribution of traits observed in contemporary organisms. Because phylogenies are normally inferences from data, it is desirable to account for the uncertainty in estimates of the tree and its branch lengths when making inferences about ancestral states or other comparative parameters. Here we present a general Bayesian approach for testing comparative hypotheses across statistically justified samples of phylogenies, focusing on the specific issue of reconstructing ancestral states. The method uses Markov chain Monte Carlo techniques for sampling phylogenetic trees and for investigating the parameters of a statistical model of trait evolution. We describe how to combine information about the uncertainty of the phylogeny with uncertainty in the estimate of the ancestral state. Our approach does not constrain the sample of trees only to those that contain the ancestral node or nodes of interest, and we show how to reconstruct ancestral states of uncertain nodes using a most-recent-common-ancestor approach. We illustrate the methods with data on ribonuclease evolution in the Artiodactyla. Software implementing the methods (BayesMultiState) is available from the authors.
18.
The small subunit ribosomal RNA gene (srDNA) has been used extensively for phylogenetic analyses. One common assumption in these analyses is that substitution rates are biased toward transitions. We have developed a simple method for estimating relative rates of base change that does not assume rate constancy and takes into account base composition biases in different structures and taxa. We have applied this method to srDNA sequences from taxa with a noncontroversial phylogeny to measure relative rates of evolution in various structural regions of srRNA and relative rates of the different transitions and transversions. We find that: (1) the long single-stranded regions of the RNA molecule evolve slowest, (2) biases in base composition associated with structure and phylogenetic position exist, and (3) the srDNAs studied lack a consistent transition/transversion bias. We have made suggestions based on these findings for refinement of phylogenetic analyses using srDNA data.
19.
Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites (total citations: 31; self-citations: 24; citations by others: 7)
Felsenstein's maximum-likelihood approach for inferring phylogeny from DNA
sequences assumes that the rate of nucleotide substitution is constant over
different nucleotide sites. This assumption is sometimes unrealistic, as
has been revealed by analysis of real sequence data. In the present paper
Felsenstein's method is extended to the case where substitution rates over
sites are described by the gamma distribution. A numerical example is
presented to show that the method fits the data better than do previous
models.
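To show what gamma-distributed rates do to a substitution model, the sketch below averages a transition probability over a gamma distribution of site rates with mean 1 and shape alpha. It assumes the Jukes-Cantor (JC69) model purely for brevity, since the abstract does not name the substitution model; the closed form follows from the gamma moment-generating function, E[exp(-s r)] = (1 + s/alpha)^(-alpha). Function names and parameter values are illustrative.

```python
import math


def jc69_same(t):
    """JC69 probability that a site shows the same base after branch length t (one rate)."""
    return 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)


def jc69_gamma_same(t, alpha):
    """Same probability averaged over site rates r ~ Gamma(shape=alpha, mean=1)."""
    return 0.25 + 0.75 * (1.0 + 4.0 * t / (3.0 * alpha)) ** (-alpha)


def jc69_gamma_diff(t, alpha):
    """Probability of changing to one specific other base under the rate-averaged model."""
    return (1.0 - jc69_gamma_same(t, alpha)) / 3.0


constant_rate = jc69_same(0.5)
strong_heterogeneity = jc69_gamma_same(0.5, alpha=0.5)
weak_heterogeneity = jc69_gamma_same(0.5, alpha=1e6)
```

A small alpha means strong rate variation, which leaves more sites unchanged than a single-rate model predicts for the same branch length; as alpha grows, the averaged probability approaches the constant-rate value.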
20.
Species trees from gene trees: reconstructing Bayesian posterior distributions of a species phylogeny using estimated gene tree distributions (total citations: 5; self-citations: 0; citations by others: 5)
The desire to infer the evolutionary history of a group of species should be more viable now that a considerable amount of multilocus molecular data is available. However, the current molecular phylogenetic paradigm still reconstructs gene trees to represent the species tree. Further, commonly used methods of combining data, such as the concatenation method, are known to be inconsistent in some circumstances. In this paper, we propose a Bayesian hierarchical model to estimate the phylogeny of a group of species using multiple estimated gene tree distributions, such as those that arise in a Bayesian analysis of DNA sequence data. Our model employs substitution models used in traditional phylogenetics but also uses coalescent theory to explain genealogical signals from species trees to gene trees and from gene trees to sequence data, thereby forming a complete stochastic model to estimate gene trees, species trees, ancestral population sizes, and species divergence times simultaneously. Our model is founded on the assumption that gene trees, even of unlinked loci, are correlated due to being derived from a single species tree and therefore should be estimated jointly. We apply the method to two multilocus data sets of DNA sequences. The estimates of the species tree topology and divergence times appear to be robust to the prior of the population size, whereas the estimates of effective population sizes are sensitive to the prior used in the analysis. These analyses also suggest that the model is superior to the concatenation method in fitting these data sets and thus provides a more realistic assessment of the variability in the distribution of the species tree that may have produced the molecular information at hand. Future improvements of our model and algorithm should include consideration of other factors that can cause discordance of gene trees and species trees, such as horizontal transfer or gene duplication.