Similar literature
 Found 20 similar articles (search time: 125 ms)
1.
Accuracy of estimated phylogenetic trees from molecular data   (Total citations: 27; self-citations: 0; citations by others: 27)
The accuracies and efficiencies of three different methods of making phylogenetic trees from gene frequency data were examined by using computer simulation. The methods examined are UPGMA, Farris' (1972) method, and Tateno et al.'s (1982) modified Farris method. In the computer simulation eight species (or populations) were assumed to evolve according to a given model tree, and the evolutionary changes of allele frequencies were followed by using the infinite-allele model. At the end of the simulated evolution five genetic distance measures (Nei's standard and minimum distances, Rogers' distance, Cavalli-Sforza's f theta, and the modified Cavalli-Sforza distance) were computed for all pairs of species, and the distance matrix obtained for each distance measure was used for reconstructing a phylogenetic tree. The phylogenetic tree obtained was then compared with the model tree. The results obtained indicate that in all tree-making methods examined the accuracies of both the topology and branch lengths of a reconstructed tree (rooted tree) are very low when the number of loci used is less than 20 but gradually increase with increasing number of loci. When the expected number of gene substitutions (M) for the shortest branch is 0.1 or more per locus and 30 or more loci are used, the topological error as measured by the distortion index (dT) is not great, but the probability of obtaining the correct topology (P) is less than 0.5 even with 60 loci. When M is as small as 0.004, P is substantially lower. In obtaining a good topology (small dT and high P) UPGMA and the modified Farris method generally show a better performance than the Farris method. The poor performance of the Farris method is observed even when Rogers' distance which obeys the triangle inequality is used. The main reason for this seems to be that the Farris method often gives overestimates of branch lengths. For estimating the expected branch lengths of the true tree UPGMA shows the best performance. 
For this purpose Nei's standard distance gives a better result than the others because of its linear relationship with the number of gene substitutions. Rogers' or Cavalli-Sforza's distance gives a phylogenetic tree in which the parts near the root are condensed and the other parts are elongated. It is recommended that more than 30 loci, including both polymorphic and monomorphic loci, be used for making phylogenetic trees. The conclusions from this study seem to apply also to data on nucleotide differences obtained by the restriction enzyme techniques.
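Two of the distance measures named in this abstract can be sketched in a few lines. The block below is a minimal illustration, not the authors' simulation code: it uses the textbook definitions of Nei's standard distance, D = -ln(J_xy / sqrt(J_x J_y)), and Rogers' distance, and the function names and input layout (one list of allele frequencies per locus, alleles in the same order in both populations) are assumptions made for the example.

```python
import math


def nei_standard_distance(freqs_x, freqs_y):
    """Nei's standard genetic distance between two populations.

    freqs_x, freqs_y: one list of allele frequencies per locus (hypothetical
    input layout); alleles must be listed in the same order in both.
    """
    n_loci = len(freqs_x)
    # Average homozygosity-like sums over all loci, monomorphic ones included.
    j_x = sum(sum(p * p for p in locus) for locus in freqs_x) / n_loci
    j_y = sum(sum(q * q for q in locus) for locus in freqs_y) / n_loci
    j_xy = sum(
        sum(p * q for p, q in zip(lx, ly)) for lx, ly in zip(freqs_x, freqs_y)
    ) / n_loci
    return -math.log(j_xy / math.sqrt(j_x * j_y))


def rogers_distance(freqs_x, freqs_y):
    """Rogers' distance: mean over loci of sqrt(0.5 * sum((x_i - y_i)^2))."""
    n_loci = len(freqs_x)
    per_locus = (
        math.sqrt(0.5 * sum((p - q) ** 2 for p, q in zip(lx, ly)))
        for lx, ly in zip(freqs_x, freqs_y)
    )
    return sum(per_locus) / n_loci
```

Nei's distance is undefined when the two populations share no alleles at any locus (J_xy = 0), so a real pipeline would need to guard that case.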

2.
Slatkin M, Pollack JL. Genetics 2006, 172(3): 1979-1984
The gene genealogies of two linked loci in three species are analyzed using a series of Markov chain models. We calculate the probability that the gene tree of one locus is concordant with the species tree, given that the gene tree of the other locus is concordant. We define a threshold value of the recombination rate, r*, to be the rate for which the difference between the conditional probability of concordance and its asymptotic value is reduced to 5% of the initial difference. We find that, although r* depends in a complicated way on the times of speciation and effective population sizes, it is always relatively small, <10/N4, where N4 is the effective size of the species represented by the internal branch of the species tree. Consequently, the concordance of gene trees of neutral loci with the species tree is expected to be on roughly the same length scale on the chromosome as the extent of significant linkage disequilibrium within species unless the effective size of contemporary populations is very different from the effective sizes of their ancestral populations. Both balancing selection and selective sweeps can result in much longer genomic regions having concordant gene trees.

3.
Molecular evolutionary processes modify DNA over time, creating both newly derived substitutions shared by related descendant lineages (phylogenetic signal) and “false” similarities which confound phylogenetic reconstruction (homoplasy). However, some types of DNA regions, for example those containing tandem duplicate repeats, are preferentially subject to homoplasy-inducing processes such as sporadically occurring concerted evolution and DNA insertion/deletion. This added level of homoplasic “noise” can make DNA regions with repeats less reliable in phylogenetic reconstruction than those without repeats. Most molecular datasets which distinguish among African hominoids support a human-chimpanzee clade; the most notable exception is from the involucrin gene. However, phylogenetic resolution supporting a chimpanzee-gorilla clade is based entirely on involucrin DNA repeat regions. This is problematic because (1) involucrin repeats are difficult to align, and published alignments are contradictory; (2) involucrin repeats are subject to DNA insertion/deletion; (3) gorillas are polymorphic in that some do not have repeats reported to be synapomorphies linking chimpanzees and gorillas. Gene tree/species tree conflicts can occur due to the sorting of ancestrally polymorphic alleles during speciation. Because hominoid females transfer between groups, mitochondrial and nuclear gene flow occur to the same extent, and the probability of conflict between mitochondrial and nuclear gene trees is theoretically low. When hominoid intraspecific mitochondrial variability is taken into account [based on cytochrome oxidase subunit II (COII) gene sequences], humans and chimpanzees are most closely related, showing the same relative degree of separation from gorillas as when single individuals representing species are analyzed. Conflicting molecular phylogenies can be explained in terms of molecular evolutionary processes and sorting of ancient polymorphisms. 
This perspective can enhance our understanding of hominoid molecular phylogenies. © 1994 Wiley-Liss, Inc.

4.
Although microsatellites or simple sequence repeats (SSRs) have become a popular tool in genetic mapping and gene flow studies, their utility is limited due to the paucity of information about DNA sequences in plants. We tested the utility of microsatellite markers characterized for the tropical tree Pithecellobium elegans as a genetic tool for related species. The results indicate that SSR loci are conserved among closely related species, and SSR primers developed for P. elegans could be successfully used as a genetic tool in several species of the tribe Ingeae. This study indicates that there is high potential for the transfer of SSR markers among closely related taxa, circumventing laborious cloning and screening procedures involved in characterizing SSR loci for many species.

5.
The concordance of gene trees and species trees is reconsidered in detail, allowing for samples of arbitrary size to be taken from the species. A sense of concordance for gene tree and species tree topologies is clarified, such that if the "collapsed gene tree" produced by a gene tree has the same topology as the species tree, the gene tree is said to be topologically concordant with the species tree. The term speciodendric is introduced to refer to genes whose trees are topologically concordant with species trees. For a given three-species topology, probabilities of each of the three possible collapsed gene tree topologies are given, as are probabilities of monophyletic concordance and concordance in the sense of N. Takahata (1989), Genetics 122, 957-966. Increasing the sample size is found to increase the probability of topological concordance, but a limit exists on how much the topological concordance probability can be increased. Suggested sample sizes beyond which this probability can be increased only minimally are given. The results are discussed in terms of implications for molecular studies of phylogenetics and speciation.
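For orientation, the simplest case of the three-species problem can be written down directly. The sketch below covers only the classic setting of one gene copy sampled per species, not the arbitrary sample sizes treated in the abstract: with an internal branch of T = t / (2N) coalescent units, the concordant topology has probability 1 - (2/3)e^(-T) and each discordant topology has probability (1/3)e^(-T). The function name and the choice of t in generations and N as a diploid effective size are assumptions made for the example.

```python
import math


def three_species_topology_probs(t_generations, n_effective):
    """Gene tree topology probabilities for three species, one copy each.

    Returns (P_concordant, P_each_discordant). Assumes a diploid population
    of constant effective size n_effective along the internal branch, which
    lasts t_generations generations (hypothetical parameter names).
    """
    big_t = t_generations / (2.0 * n_effective)  # branch length in coalescent units
    # If the two sister lineages fail to coalesce on the internal branch
    # (probability e^-T), all three topologies are equally likely.
    p_each_discordant = math.exp(-big_t) / 3.0
    return 1.0 - 2.0 * p_each_discordant, p_each_discordant
```

With an internal branch of length zero all three topologies are equally likely, and the concordant probability approaches 1 as the branch grows long relative to 2N.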

6.
Accuracy of phylogenetic trees estimated from DNA sequence data   (Total citations: 4; self-citations: 1; citations by others: 3)
The relative merits of four different tree-making methods in obtaining the correct topology were studied by using computer simulation. The methods studied were the unweighted pair-group method with arithmetic mean (UPGMA), Fitch and Margoliash's (FM) method, the distance Wagner (DW) method, and Tateno et al.'s modified Farris (MF) method. An ancestral DNA sequence was assumed to evolve into eight sequences following a given model tree. Both constant and varying rates of nucleotide substitution were considered. Once the DNA sequences for the eight extant species were obtained, phylogenetic trees were constructed by using corrected (d) and uncorrected (p) nucleotide substitutions per site. The topologies of the trees obtained were then compared with that of the model tree. The results obtained can be summarized as follows: (1) The probability of obtaining the correct rooted or unrooted tree is low unless a large number of nucleotide differences exists between different sequences. (2) When the number of nucleotide substitutions per sequence is small or moderately large, the FM, DW, and MF methods show a better performance than UPGMA in recovering the correct topology. The former group of methods is particularly good for obtaining the correct unrooted tree. (3) When the number of substitutions per sequence is large, UPGMA is at least as good as the other methods, particularly for obtaining the correct rooted tree. (4) When the rate of nucleotide substitution varies with evolutionary lineage, the FM, DW, and MF methods show a better performance in obtaining the correct topology than UPGMA, except when a rooted tree is to be produced from data with a large number of nucleotide substitutions per sequence. (ABSTRACT TRUNCATED AT 250 WORDS)
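Of the four methods compared here, UPGMA is the simplest to state, and a compact sketch may help readers who have not seen it. The version below is an illustrative implementation under stated assumptions, not the code used in the study: the function name, the nested-tuple tree representation, and the dense-matrix input are all choices made for the example.

```python
def upgma(names, dist):
    """Cluster taxa with UPGMA.

    names: list of taxon labels; dist: symmetric distance matrix (list of lists).
    Returns (tree, root_height), where tree is a nested tuple of labels.
    """
    n = len(names)
    # Active clusters: id -> (subtree, number of leaves).
    clusters = {i: (names[i], 1) for i in range(n)}
    # Pairwise distances between active clusters, keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    height = 0.0
    while len(clusters) > 1:
        # Join the closest pair of clusters.
        (a, b), d_ab = min(d.items(), key=lambda kv: kv[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted mean, i.e. the plain average over all leaf pairs.
        new_d = {}
        for c in clusters:
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            new_d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        d.update(new_d)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        height = d_ab / 2.0  # the new node sits halfway between the two clusters
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree, height
```

Because every join places the new node at half the inter-cluster distance, the result is always ultrametric, which is consistent with the abstract's finding that UPGMA does worse when substitution rates vary among lineages.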

7.
A gene tree is an evolutionary reconstruction of the genealogical history of the genetic variation found in a sample of homologous genes or DNA regions that have experienced little or no recombination. Gene trees have the potential of straddling the interface between intra- and interspecific evolution. It is precisely at this interface that the process of speciation occurs, and gene trees can therefore be used as a powerful tool to probe this interface. One application is to infer species status. The cohesion species is defined as an evolutionary lineage or set of lineages with genetic exchangeability and/or ecological interchangeability. This species concept can be phrased in terms of null hypotheses that can be tested rigorously and objectively by using gene trees. First, an overlay of geography upon the gene tree is used to test the null hypothesis that the sample is from a single evolutionary lineage. This phase of testing can indicate that the sampled organisms are indeed from a single lineage and therefore a single cohesion species. In other cases, this null hypothesis is not rejected due to a lack of power or inadequate sampling. Alternatively, this null hypothesis can be rejected because two or more lineages are in the sample. The test can identify lineages even when hybridization and lineage sorting occur. Only when this null hypothesis is rejected is there the potential for more than one cohesion species. Although all cohesion species are evolutionary lineages, not all evolutionary lineages are cohesion species. Therefore, if the first null hypothesis is rejected, a second null hypothesis is tested that all lineages are genetically exchangeable and/or ecologically interchangeable. This second test is accomplished by direct contrasts of previously identified lineages or by overlaying reproductive and/or ecological data upon the gene tree and testing for significant transitions that are concordant with the previously identified lineages. 
Only when this second null hypothesis is rejected is a lineage elevated to the status of cohesion species. By using gene trees in this manner, species can be identified with objective, a priori criteria with an inference procedure that automatically yields much insight into the process of speciation. When one or more of the null hypotheses cannot be rejected, this procedure also provides specific guidance for future work that will be needed to judge species status.

8.
Determining the factors promoting speciation is a major task in ecological and evolutionary research and can be aided by phylogeographic analysis. The Qinling–Daba Mountains (QDM) located in central China form an important geographic barrier between southern subtropical and northern temperate regions, and exhibit complex topography as well as climatic and ecological diversity. Surprisingly, few phylogeographic analyses and studies of plant speciation in this region have been conducted. To address this issue, we investigated the genetic divergence and evolutionary histories of three closely related tree peony species (Paeonia qiui, P. jishanensis, and P. rockii) endemic to the QDM. Forty populations of the three tree peony species were genotyped using 22 nuclear simple sequence repeat markers (nSSRs) and three chloroplast DNA sequences to assess genetic structure and phylogenetic relationships, supplemented by morphological characterization and ecological niche modeling (ENM). Morphological and molecular genetic analyses showed the three species to be clearly differentiated from each other. In addition, coalescent analyses using DIYABC conducted on nSSR variation indicated that the species diverged from each other in the late Pleistocene, while ENM suggested they occupied a larger area during the Last Glacial Maximum (LGM) than at present. The combined genetic evidence from nuclear and chloroplast DNA and the results of ENM indicate that each species persisted through the late Pleistocene in multiple refugia in the Qinling, Daba, and Taihang Mountains with divergence favored by restricted gene flow caused by geographic isolation, ecological divergence, and limited pollen and seed dispersal. Our study contributes to a growing understanding of the origin and population structure of tree peonies and provides insights into the high level of plant endemism present in the Qinling–Daba Mountains of Central China.

9.
Liu L, Pearl DK. Systematic Biology 2007, 56(3): 504-514
The desire to infer the evolutionary history of a group of species should be more viable now that a considerable amount of multilocus molecular data is available. However, the current molecular phylogenetic paradigm still reconstructs gene trees to represent the species tree. Further, commonly used methods of combining data, such as the concatenation method, are known to be inconsistent in some circumstances. In this paper, we propose a Bayesian hierarchical model to estimate the phylogeny of a group of species using multiple estimated gene tree distributions, such as those that arise in a Bayesian analysis of DNA sequence data. Our model employs substitution models used in traditional phylogenetics but also uses coalescent theory to explain genealogical signals from species trees to gene trees and from gene trees to sequence data, thereby forming a complete stochastic model to estimate gene trees, species trees, ancestral population sizes, and species divergence times simultaneously. Our model is founded on the assumption that gene trees, even of unlinked loci, are correlated due to being derived from a single species tree and therefore should be estimated jointly. We apply the method to two multilocus data sets of DNA sequences. The estimates of the species tree topology and divergence times appear to be robust to the prior of the population size, whereas the estimates of effective population sizes are sensitive to the prior used in the analysis. These analyses also suggest that the model is superior to the concatenation method in fitting these data sets and thus provides a more realistic assessment of the variability in the distribution of the species tree that may have produced the molecular information at hand. Future improvements of our model and algorithm should include consideration of other factors that can cause discordance of gene trees and species trees, such as horizontal transfer or gene duplication.

10.
The evolutionary history of morphological structures generally is equated with that of the taxa that carry them. It is argued here that, analogous to genes, developmental genetic pathways underlying morphological structures may be subject to developmental evolutionary changes that result, for instance, in duplication (serial homology analogous to gene duplication and paralogy). Entities that undergo evolution are expected to be related to each other as a tree. Just as with molecular evolution, "structure trees" and species trees sometimes may be incongruent, with implications for morphological homology concepts. Detection of structure trees through morphological evolutionary analyses may point to an entity that is maintained through evolution, possibly in part because it is a developmentally integrated structure ("individualized"). This idea is illustrated in a morphological evolutionary analysis of leaf primordia. These analyses suggest that leaf primordia in monocots and close relatives are related to each other as a tree and, therefore, are developmentally integrated, evolving entities. Among monocot primordia this tree structure breaks down, and it is concluded that there is no entity, the "monocot leaf primordium." However, one group of primordia is identified within monocots that have uniform characteristics and that are well represented by model species maize and rice. Such analyses of structure trees can facilitate the extrapolation and interpretation of results from molecular developmental and other comparative studies.

11.
We review recent models to estimate phylogenetic trees under the multispecies coalescent. Although the distinction between gene trees and species trees has come to the fore of phylogenetics, only recently have methods been developed that explicitly estimate species trees. Of the several factors that can cause gene tree heterogeneity and discordance with the species tree, deep coalescence due to random genetic drift in branches of the species tree has been modeled most thoroughly. Bayesian approaches to estimating species trees utilize two likelihood functions, one of which has been widely used in traditional phylogenetics and involves the model of nucleotide substitution, and the second of which is less familiar to phylogeneticists and involves the probability distribution of gene trees given a species tree. Other recent parametric and nonparametric methods for estimating species trees involve parsimony criteria, summary statistics, supertree and consensus methods. Species tree approaches are an appropriate goal for systematics, appear to work well in some cases where concatenation can be misleading, and suggest that sampling many independent loci will be paramount. Such methods can also be challenging to implement because of the complexity of the models and computational time. In addition, further elaboration of the simplest of coalescent models will be required to incorporate commonly known issues such as deviation from the molecular clock, gene flow and other genetic forces.

12.
Patterns of genetic variation provide insight into the evolutionary history of a species. Mouse (Mus musculus) is a good model for this purpose. Here we present the analysis of genealogies of 21 nuclear loci and one mitochondrial DNA region in M. musculus based on our nucleotide sequences of nine inbred strains from three M. musculus subspecies (musculus, domesticus, and castaneus) and one M. spicilegus strain as an outgroup. The mitochondrial DNA gene genealogy of those strains confirmed the introgression pattern of one musculus strain. When all the nuclear DNA data were concatenated to produce a phylogenetic tree of nine strains, musculus and domesticus strains formed monophyletic clusters with each other, while the two castaneus strains were paraphyletic. When each DNA region was treated independently, the phylogenetic networks revealed a non-negligibly high level of subspecies admixture and the mosaic nature of their genome. Estimation of ancestral and derived population sizes and migration rates suggests the effects of ancestral polymorphism and gene flow on the pattern of genetic variation of the current subspecies. Gene genealogies of Fut4 and Dfy loci also suggested the existence of gene flow between M. musculus and M. spicilegus or other distant species.

13.
Given a gene tree and a species tree, a coalescent history is a list of the branches of the species tree on which coalescences in the gene tree take place. Each pair consisting of a gene tree topology and a species tree topology has some number of possible coalescent histories. Here we show that, for each n≥7, there exist a species tree topology S and a gene tree topology G≠S, both with n leaves, for which the number of coalescent histories exceeds the corresponding number of coalescent histories when the species tree topology is S and the gene tree topology is also S. This result has the interpretation that the gene tree topology G discordant with the species tree topology S can be produced by the evolutionary process in more ways than can the gene tree topology that matches the species tree topology, providing further insight into the surprising combinatorial properties of gene trees that arise from their joint consideration with species trees.
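The baseline quantity in this abstract, the number of coalescent histories when the gene tree topology matches the species tree topology, can be counted with a short recursion. The sketch below handles the matching case only (it does not reproduce the discordant counts the paper is about), and the function name and nested-tuple tree encoding are assumptions made for the example. The idea: a coalescence may sit on the branch directly above its own node or on any ancestral branch, but never below the coalescences of its children.

```python
from functools import lru_cache


def count_matching_histories(tree):
    """Count coalescent histories when the gene tree topology equals `tree`.

    `tree` is a rooted binary topology as nested 2-tuples; leaves are strings.
    """

    @lru_cache(maxsize=None)
    def ways(node, m):
        # m = number of species-tree branches available to the coalescence at
        # `node`: the branch directly above it plus m - 1 ancestral branches.
        if isinstance(node, str):
            return 1  # a leaf has no coalescence to place
        left, right = node
        # Placing this coalescence on the j-th available branch (j = 1 is the
        # branch directly above `node`) leaves each child subtree with j + 1
        # admissible branches: its own branch plus the j branches up to here.
        return sum(ways(left, j + 1) * ways(right, j + 1) for j in range(1, m + 1))

    # The root coalescence can only occur on the branch above the root.
    return ways(tree, 1)
```

For caterpillar trees with 3, 4, and 5 leaves this gives 2, 5, and 14, and for the balanced four-leaf tree it gives 4, which can be checked by hand by listing the admissible branches for each coalescence.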

14.
Mathematical consequences of the genealogical species concept   (Total citations: 16; self-citations: 0; citations by others: 16)
A genealogical species is defined as a basal group of organisms whose members are all more closely related to each other than they are to any organisms outside the group ("exclusivity"), and which contains no exclusive group within it. In practice, a pair of species is so defined when phylogenies of alleles from a sample of loci show them to be reciprocally monophyletic at all or some specified fraction of the loci. We investigate the length of time it takes to attain this status when an ancestral population divides into two descendant populations of equal size with no gene exchange, and when genetic drift and mutation are the only evolutionary forces operating. The number of loci used has a substantial effect on the probability of observing reciprocal monophyly at different times after population separation, with very long times needed to observe complete reciprocal monophyly for a large number of loci. In contrast, the number of alleles sampled per locus has a relatively small effect on the probability of reciprocal monophyly. Because a single mitochondrial or chloroplast locus becomes reciprocally monophyletic much faster than does a single nuclear locus, it is not advisable to use mitochondrial and chloroplast DNA to recognize genealogical species for long periods after population divergence. Using a weaker criterion of assigning genealogical species status when more than 50% of sampled nuclear loci show reciprocal monophyly, genealogical species status depends much less on the number of sampled loci, and is attained at roughly 4-7 N generations after populations are isolated, where N is the historically effective population size of each descendant. If genealogical species status is defined as more than 95% of sampled nuclear loci showing reciprocal monophyly, this status is attained after roughly 9-12 N generations.

15.
The distribution of genetic variants in plant populations is strongly affected both by current patterns of microevolutionary forces, such as gene flow and selection, and by the phylogenetic history of populations and species. Understanding the interplay of shared history and current evolutionary events is particularly confounding in plants due to the reticulating nature of gene exchange between diverging lineages. Certain gene sequences provide historically ordered neutral molecular variation that can be converted to gene genealogies which trace the evolutionary relationships among haplotypes (alleles). Gene genealogies can be used to understand the evolution of specific DNA sequences and relate sequence variation to plant phenotype. For example, in a study of the RPS2 gene in Arabidopsis thaliana, resistant phenotypes clustered in one portion of the gene tree. The field of phylogeography examines the distribution of allele genealogies in an explicit geographical context and, when coupled with a nested clade analysis, can provide insight into historical processes such as range expansion, gene flow, and genetic drift. A phylogeographical approach offers insight into practical issues as well. Here we show how haplotype trees can address the origins of invasive plants, one of the greatest global threats to biodiversity. A study of the geographical diversity of haplotypes in invasive Phragmites populations in the United States indicates that invasiveness is due to the colonization and spread of distinct genotypes from Europe (Saltonstall 2002). Likewise, a phylogeographical analysis of Tamarix populations indicates that hybridization events between formerly isolated species of Eurasia have produced the most common genotype of the second-worst invasive plant species in the United States.

16.
Gene trees are evolutionary trees representing the ancestry of genes sampled from multiple populations. Species trees represent populations of individuals (each with many genes) splitting into new populations or species. The coalescent process, which models ancestry of gene copies within populations, is often used to model the probability distribution of gene trees given a fixed species tree. This multispecies coalescent model provides a framework for phylogeneticists to infer species trees from gene trees using maximum likelihood or Bayesian approaches. Because the coalescent models a branching process over time, all trees are typically assumed to be rooted in this setting. Often, however, gene trees inferred by traditional phylogenetic methods are unrooted. We investigate probabilities of unrooted gene trees under the multispecies coalescent model. We show that when there are four species with one gene sampled per species, the distribution of unrooted gene tree topologies identifies the unrooted species tree topology and some, but not all, information in the species tree edges (branch lengths). The location of the root on the species tree is not identifiable in this situation. However, for 5 or more species with one gene sampled per species, we show that the distribution of unrooted gene tree topologies identifies the rooted species tree topology and all its internal branch lengths. The length of any pendant branch leading to a leaf of the species tree is also identifiable for any species from which more than one gene is sampled.

17.
Because of the stochastic way in which lineages sort during speciation, gene trees may differ in topology from each other and from species trees. Surprisingly, assuming that genetic lineages follow a coalescent model of within-species evolution, we find that for any species tree topology with five or more species, there exist branch lengths for which gene tree discordance is so common that the most likely gene tree topology to evolve along the branches of a species tree differs from the species phylogeny. This counterintuitive result implies that in combining data on multiple loci, the straightforward procedure of using the most frequently observed gene tree topology as an estimate of the species tree topology can be asymptotically guaranteed to produce an incorrect estimate. We conclude with suggestions that can aid in overcoming this new obstacle to accurate genomic inference of species phylogenies.

18.
This paper studies gene trees in subdivided populations which are constructed as perfect phylogenies from the pattern of mutations in a sample of DNA sequences and presents a new recursion for the probability distribution of such gene trees. The underlying evolutionary model is the coalescent process in a subdivided population. The infinitely-many-sites model of mutation is assumed. Ancestral inference questions that are discussed are maximum likelihood estimation of migration and mutation rates; detection of population growth by likelihood techniques; determining the distribution of the time to the most recent common ancestor of a sample of sequences; determining the distribution of the age of the mutations on the gene tree; determining in which subpopulation the most recent common ancestor of all the sequences was; determining subpopulation ancestors, where they were, and times to them; and determining in which subpopulations mutations occurred. The computational technique used, due to Griffiths and Tavaré, is a computer-intensive Markov chain simulation that simulates gene trees conditional on the topology implied by the mutation pattern in the sample of DNA sequences. The software GENETREE, which implements these ancestral inference techniques, is available.

19.
Paralogy is a pervasive problem in trying to use nuclear gene sequences to infer species phylogenies. One strategy for dealing with this problem is to infer species phylogenies from gene trees using reconciled trees, rather than directly from the sequences themselves. In this approach, the optimal species tree is the tree that requires the fewest gene duplications to be invoked. Because reconciled trees can distinguish orthologous from paralogous sequences, there is no need to do this prior to the analysis. Multiple gene trees can be analyzed simultaneously; however, the problem of nonuniform gene sampling raises practical problems which are discussed. In this paper the technique is applied to phylogenies for nine vertebrate genes (aldolase, alpha-fetoprotein, lactate dehydrogenase, prolactin, rhodopsin, trypsinogen, tyrosinase, vasopressin, and Wnt-7). The resulting species tree shows much similarity with currently accepted vertebrate relationships.
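The duplication count that this criterion minimizes can be illustrated for a single gene tree and a single candidate species tree. The sketch below uses the standard least-common-ancestor (LCA) mapping: each gene-tree node is mapped to the LCA of the species below it, and a node is counted as a duplication when it maps to the same species-tree node as one of its children. This is a generic illustration rather than the software used in the paper; the function name and nested-tuple encoding are assumptions, and the search over candidate species trees is not shown.

```python
def count_duplications(gene_tree, species_tree):
    """Count gene duplications implied by LCA reconciliation.

    Both trees are nested 2-tuples. Species-tree leaves are unique species
    names; gene-tree leaves carry the name of the species they came from.
    """
    # Index the species tree: parent and depth of every node (subtrees are
    # unique because species names are unique, so tuples work as keys).
    parent, depth = {}, {}

    def index(node, par, d):
        parent[node], depth[node] = par, d
        if not isinstance(node, str):
            for child in node:
                index(child, node, d + 1)

    index(species_tree, None, 0)

    def lca(u, v):
        # Walk the deeper node upward until the two paths meet.
        while u != v:
            if depth[u] >= depth[v]:
                u = parent[u]
            else:
                v = parent[v]
        return u

    duplications = 0

    def mapping(node):
        nonlocal duplications
        if isinstance(node, str):
            return node  # a gene leaf maps to the species it was sampled from
        left, right = (mapping(child) for child in node)
        here = lca(left, right)
        # A node mapping to the same place as a child cannot be a speciation.
        if here == left or here == right:
            duplications += 1
        return here

    mapping(gene_tree)
    return duplications
```

A gene tree congruent with the species tree needs no duplications, while a gene tree that groups species in conflict with it forces at least one, which is why the count can rank candidate species trees.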

20.
We present a novel distance-based algorithm for evolutionary tree reconstruction. Our algorithm reconstructs the topology of a tree with n leaves in O(n^2) time using O(n) working space. In the general Markov model of evolution, the algorithm recovers the topology successfully with (1 - o(1)) probability from sequences with polynomial length in n. Moreover, for almost all trees, our algorithm achieves the same success probability on polylogarithmic sample sizes. The theoretical results are supported by simulation experiments involving trees with 500, 1,895, and 3,135 leaves. The topologies of the trees are recovered with high success from 2,000 bp DNA sequences.
