Similar articles
 20 similar articles found (search time: 15 ms)
1.
The concordance of gene trees and species trees is reconsidered in detail, allowing for samples of arbitrary size to be taken from the species. A sense of concordance for gene tree and species tree topologies is clarified, such that if the "collapsed gene tree" produced by a gene tree has the same topology as the species tree, the gene tree is said to be topologically concordant with the species tree. The term speciodendric is introduced to refer to genes whose trees are topologically concordant with species trees. For a given three-species topology, probabilities of each of the three possible collapsed gene tree topologies are given, as are probabilities of monophyletic concordance and concordance in the sense of N. Takahata (1989), Genetics 122, 957-966. Increasing the sample size is found to increase the probability of topological concordance, but a limit exists on how much the topological concordance probability can be increased. Suggested sample sizes beyond which this probability can be increased only minimally are given. The results are discussed in terms of implications for molecular studies of phylogenetics and speciation.

2.
Relationships between gene trees and species trees   Cited by: 49 (self-citations: 10; citations by others: 39)
It is well known that a phylogenetic tree (gene tree) constructed from DNA sequences for a genetic locus does not necessarily agree with the tree that represents the actual evolutionary pathway of the species involved (species tree). One of the important factors that cause this difference is genetic polymorphism in the ancestral species. Under the assumption of neutral mutations, this problem can be studied by evaluating the probability (P) that a gene tree has the same topology as that of the species tree. When one gene (allele) is used from each of the species involved, the probability can be expressed as a simple function of T_i = t_i/(2N), where t_i is the evolutionary time measured in generations for the i-th internodal branch of the species tree and N is the effective population size. When any of the T_i values is less than 1, the probability P becomes considerably less than 1.0. This probability cannot be substantially increased by increasing the number of alleles sampled from a locus. To increase the probability, one has to use DNA sequences from many different loci that have evolved independently of each other.
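To make the dependence on T = t/(2N) concrete, here is a minimal sketch for the simplest case of three species with one allele sampled from each. The abstract does not spell out the formula; the expression P = 1 - (2/3)exp(-T) used below is the standard neutral coalescent result for that case and is an assumption brought in for illustration. The function name and the example values are hypothetical.

```python
import math


def concordance_probability(t_generations, effective_size):
    """Probability that a three-species gene tree matches the species tree.

    Assumes one allele per species and neutral evolution, and uses the
    standard coalescent result P = 1 - (2/3) * exp(-T) with T = t / (2N),
    where t is the length of the single internal branch in generations.
    """
    T = t_generations / (2.0 * effective_size)
    return 1.0 - (2.0 / 3.0) * math.exp(-T)


if __name__ == "__main__":
    N = 10_000  # hypothetical effective population size
    for T in (0.1, 0.5, 1.0, 2.0, 5.0):
        t = T * 2 * N  # internal branch length in generations
        print(f"T = {T:>4}: P = {concordance_probability(t, N):.3f}")
```

Under this formula, P falls toward 1/3 as T approaches zero, which is the value expected if the three possible topologies were equally likely.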

3.
Spatial genetic structure (SGS) of plants mainly depends on the effective population size and gene dispersal. Maternally inherited loci are expected to have higher genetic differentiation between populations and more intensive SGS within populations than biparentally inherited loci because of smaller effective population sizes and fewer opportunities for gene dispersal in the maternally inherited loci. We investigated biparentally inherited nuclear genotypes and maternally inherited chloroplast haplotypes of microsatellites in 17 tree populations of three wild cherry species under different conditions of tree distribution and seed dispersal. As expected, interpopulation genetic differentiation was 6–9 times higher in chloroplast haplotypes than in nuclear genotypes. This difference indicated that pollen flow exceeded seed flow between populations by a factor of 4–7. However, no difference between nuclear and chloroplast loci was detected in within‐population SGS intensity due to their substantial variation among the populations. The SGS intensity tended to increase as trees became more aggregated, suggesting that tree aggregation biased pollen and seed dispersal toward shorter distances. The loss of effective seed dispersers, Asian black bears, did not affect the SGS intensity, probably because of mitigation of the bear loss by other vertebrate dispersers and too few tree generations after the bear loss to alter SGS. The findings suggest that SGS is more variable at smaller spatial scales due to various ecological factors in local populations.

4.
Liu L, Pearl DK. Systematic Biology, 2007, 56(3): 504-514
The desire to infer the evolutionary history of a group of species should be more viable now that a considerable amount of multilocus molecular data is available. However, the current molecular phylogenetic paradigm still reconstructs gene trees to represent the species tree. Further, commonly used methods of combining data, such as the concatenation method, are known to be inconsistent in some circumstances. In this paper, we propose a Bayesian hierarchical model to estimate the phylogeny of a group of species using multiple estimated gene tree distributions, such as those that arise in a Bayesian analysis of DNA sequence data. Our model employs substitution models used in traditional phylogenetics but also uses coalescent theory to explain genealogical signals from species trees to gene trees and from gene trees to sequence data, thereby forming a complete stochastic model to estimate gene trees, species trees, ancestral population sizes, and species divergence times simultaneously. Our model is founded on the assumption that gene trees, even of unlinked loci, are correlated due to being derived from a single species tree and therefore should be estimated jointly. We apply the method to two multilocus data sets of DNA sequences. The estimates of the species tree topology and divergence times appear to be robust to the prior of the population size, whereas the estimates of effective population sizes are sensitive to the prior used in the analysis. These analyses also suggest that the model is superior to the concatenation method in fitting these data sets and thus provides a more realistic assessment of the variability in the distribution of the species tree that may have produced the molecular information at hand. Future improvements of our model and algorithm should include consideration of other factors that can cause discordance of gene trees and species trees, such as horizontal transfer or gene duplication.

5.
Kuo CH, Avise JC. Genetica, 2005, 124(2-3): 179-186
Computer simulations were used to investigate population conditions under which phylogeographic breaks in gene genealogies can be interpreted with confidence to infer the existence and location of historical barriers to gene flow in continuously distributed, low-dispersal species. We generated collections of haplotypic gene trees under a variety of demographic scenarios and analyzed them with regard to salient genealogical breaks in their spatial patterns. In the first part of the analysis, we estimated the frequency in which the spatial location of the deepest phylogeographic break between successive pairs of populations along a linear habitat coincided with a spatial physical barrier to dispersal. Results confirm previous reports that individual gene trees can show ‘haphazard’ phylogeographic discontinuities even in the absence of historical barriers to gene flow. In the second part of the analysis, we assessed the probability that pairs of gene genealogies from a set of population samples agree upon the location of a geographical barrier. Our findings extend earlier reports by demonstrating that spatially concordant phylogeographic breaks across independent neutral loci normally emerge only in the presence of longstanding historical barriers to gene flow. Genealogical concordance across multiple loci thus becomes a deciding criterion by which to distinguish between stochastic and deterministic causation in accounting for phylogeographic discontinuities in continuously distributed species.

6.
The relationship between speciation times and the corresponding times of gene divergence is of interest in phylogenetic inference as a means of understanding the past evolutionary dynamics of populations and of estimating the timing of speciation events. It has long been recognized that gene divergence times might substantially pre-date speciation events. Although the distribution of the difference between these has previously been studied for the case of two populations, this distribution has not been explicitly computed for larger species phylogenies. Here we derive a simple method for computing this distribution for trees of arbitrary size. A two-stage procedure is proposed which (i) considers the probability distribution of the time from the speciation event at the root of the species tree to the gene coalescent time conditionally on the number of gene lineages available at the root; and (ii) calculates the probability mass function for the number of gene lineages at the root. This two-stage approach dramatically simplifies numerical analysis, because in the first step the conditional distribution does not depend on an underlying species tree, while in the second step the pattern of gene coalescence prior to the species tree root is irrelevant. In addition, the algorithm provides intuition concerning the properties of the distribution with respect to the various features of the underlying species tree. The methodology is complemented by developing probabilistic formulae and software, written in R. The method and software are tested on five-taxon species trees with varying levels of symmetry. The examples demonstrate that more symmetric species trees tend to have larger mean coalescent times and are more likely to have a unimodal gamma-like distribution with a long right tail, while asymmetric trees tend to have smaller mean coalescent times with an exponential-like distribution. In addition, species trees with longer branches generally have shorter mean coalescent times, with branches closest to the root of the tree being most influential.
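As a rough illustration of the building block behind stage (ii), the sketch below computes the probability mass function for the number of gene lineages that remain at the top of a single branch. It is not the authors' R software. It assumes the standard closed-form coalescent result for i lineages reducing to j lineages over a branch of length T, with T in coalescent units such that each pair of lineages coalesces at rate 1. The function names are hypothetical, and a full species tree would require combining such terms across branches.

```python
import math


def prob_lineages(i, j, T):
    """Probability that i lineages have coalesced down to exactly j after time T.

    T is in coalescent units (k lineages coalesce at total rate k(k-1)/2).
    Uses the standard closed-form sum over k = j..i.
    """
    if not 1 <= j <= i:
        return 0.0
    total = 0.0
    for k in range(j, i + 1):
        # Product over y = 0..k-1 of (j + y)(i - y) / (i + y).
        prod = 1.0
        for y in range(k):
            prod *= (j + y) * (i - y) / (i + y)
        coeff = (2 * k - 1) * (-1) ** (k - j) / (
            math.factorial(j) * math.factorial(k - j) * (j + k - 1)
        )
        total += math.exp(-k * (k - 1) * T / 2.0) * coeff * prod
    return total


def lineage_pmf(i, T):
    """PMF over the number of surviving lineages (1..i) at the top of a branch."""
    return [prob_lineages(i, j, T) for j in range(1, i + 1)]


if __name__ == "__main__":
    for j, p in enumerate(lineage_pmf(5, 0.7), start=1):
        print(f"{j} lineage(s) remain with probability {p:.4f}")
```

For two lineages the formula reduces to the familiar 1 - exp(-T) for coalescence within the branch, which gives a quick sanity check.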

7.
Rannala B, Yang Z. Genetics, 2003, 164(4): 1645-1656
The effective population sizes of ancestral as well as modern species are important parameters in models of population genetics and human evolution. The commonly used method for estimating ancestral population sizes, based on counting mismatches between the species tree and the inferred gene trees, is highly biased as it ignores uncertainties in gene tree reconstruction. In this article, we develop a Bayes method for simultaneous estimation of the species divergence times and current and ancestral population sizes. The method uses DNA sequence data from multiple loci and extracts information about conflicts among gene tree topologies and coalescent times to estimate ancestral population sizes. The topology of the species tree is assumed known. A Markov chain Monte Carlo algorithm is implemented to integrate over uncertain gene trees and branch lengths (or coalescence times) at each locus as well as species divergence times. The method can handle any species tree and allows different numbers of sequences at different loci. We apply the method to published noncoding DNA sequences from the human and the great apes. There are strong correlations between posterior estimates of speciation times and ancestral population sizes. With the use of an informative prior for the human-chimpanzee divergence date, the population size of the common ancestor of the two species is estimated to be approximately 20,000, with a 95% credibility interval (8000, 40,000). Our estimates, however, are affected by model assumptions as well as data quality. We suggest that reliable estimates have yet to await more data and more realistic models.

8.
Molecular phylogenetics has entered a new era in which species trees are estimated from a collection of gene trees using methods that accommodate their heterogeneity and discordance with the species tree. Empirical evaluation of species trees is necessary to assess the performance (i.e., accuracy and precision) of these methods with real data, which consists of gene genealogies likely shaped by different historical and demographic processes. We analyzed 20 loci for 16 species of the South American lizards of the Liolaemus darwinii species group and reconstructed a species tree with *BEAST, then compared the performance of this method under different sampling strategies of loci, individuals, and sequence lengths. We found an increase in the accuracy and precision of species trees with the number of loci, but for any number of loci, accuracy decreased substantially only when using a single individual per species or 25% of the full sequence length (~147 bp). In addition, locus "informativeness" was an important factor in the accuracy/precision of species trees when using a few loci, but it became increasingly irrelevant with additional loci. Our empirical results combined with the previous simulation studies suggest that there is an optimal range of sampling effort of loci, individuals, and sequence lengths for a given speciation history and information content of the data. Future studies should be directed toward further assessment of other factors that can impact performance of species trees, including gene flow, locus "informativeness," tree shape, missing data, and errors in species delimitation.

9.
Multigene sequence data have great potential for elucidating important and interesting evolutionary processes, but statistical methods for extracting information from such data remain limited. Although various biological processes may cause different genes to have different genealogical histories (and hence different tree topologies), we also may expect that the number of distinct topologies among a set of genes is relatively small compared with the number of possible topologies. Therefore evidence about the tree topology for one gene should influence our inferences of the tree topology on a different gene, but to what extent? In this paper, we present a new approach for modeling and estimating concordance among a set of gene trees given aligned molecular sequence data. Our approach introduces a one-parameter probability distribution to describe the prior distribution of concordance among gene trees. We describe a novel 2-stage Markov chain Monte Carlo (MCMC) method that first obtains independent Bayesian posterior probability distributions for individual genes using standard methods. These posterior distributions are then used as input for a second MCMC procedure that estimates a posterior distribution of gene-to-tree maps (GTMs). The posterior distribution of GTMs can then be summarized to provide revised posterior probability distributions for each gene (taking account of concordance) and to allow estimation of the proportion of the sampled genes for which any given clade is true (the sample-wide concordance factor). Further, under the assumption that the sampled genes are drawn randomly from a genome of known size, we show how one can obtain an estimate, with credibility intervals, on the proportion of the entire genome for which a clade is true (the genome-wide concordance factor). We demonstrate the method on a set of 106 genes from 8 yeast species.

10.
We consider gene trees in three species for which the species tree is known. We show that population subdivision in ancestral species can lead to asymmetry in the frequencies of the two gene trees not concordant with the species tree and, if subdivision is extreme, cause one of the nonconcordant gene trees to be more probable than the concordant gene tree. Although published data for the human-chimp-gorilla clade and for three species of Drosophila show asymmetry consistent with our model, sequencing error could also account for observed patterns. We show that substantial levels of persistent ancestral subdivision are needed to account for the observed levels of asymmetry found in these two studies.

11.
Gene trees are evolutionary trees representing the ancestry of genes sampled from multiple populations. Species trees represent populations of individuals (each with many genes) splitting into new populations or species. The coalescent process, which models ancestry of gene copies within populations, is often used to model the probability distribution of gene trees given a fixed species tree. This multispecies coalescent model provides a framework for phylogeneticists to infer species trees from gene trees using maximum likelihood or Bayesian approaches. Because the coalescent models a branching process over time, all trees are typically assumed to be rooted in this setting. Often, however, gene trees inferred by traditional phylogenetic methods are unrooted. We investigate probabilities of unrooted gene trees under the multispecies coalescent model. We show that when there are four species with one gene sampled per species, the distribution of unrooted gene tree topologies identifies the unrooted species tree topology and some, but not all, information in the species tree edges (branch lengths). The location of the root on the species tree is not identifiable in this situation. However, for five or more species with one gene sampled per species, we show that the distribution of unrooted gene tree topologies identifies the rooted species tree topology and all its internal branch lengths. The length of any pendant branch leading to a leaf of the species tree is also identifiable for any species from which more than one gene is sampled.

12.
Development of methods for estimating species trees from multilocus data is a current challenge in evolutionary biology. We propose a method for estimating the species tree topology and branch lengths using approximate Bayesian computation (ABC). The method takes as data a sample of observed rooted gene tree topologies, and then iterates through the following sequence of steps: First, a randomly selected species tree is used to compute the distribution of rooted gene tree topologies. This distribution is then compared to the observed gene topology frequencies, and if the fit between the observed and the predicted distributions is close enough, the proposed species tree is retained. Repeating this many times leads to a collection of retained species trees that are then used to form the estimate of the overall species tree. We test the performance of the method, which we call ST-ABC, using both simulated and empirical data. The simulation study examines both symmetric and asymmetric species trees over a range of branch lengths and sample sizes. The results from the simulation study show that the model performs very well, giving accurate estimates for both the topology and the branch lengths across the conditions studied, and that a sample size of 25 loci appears to be adequate for the method. Further, we apply the method to two empirical cases: a 4-taxon data set for primates and a 7-taxon data set for yeast. In both cases, we find that estimates obtained with ST-ABC agree with previous studies. The method provides efficient estimation of the species tree, and does not require sequence data, but rather the observed distribution of rooted gene topologies without branch lengths. Therefore, this method is a useful alternative to other currently available methods for species tree estimation.
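The propose, compare, and retain loop described above can be sketched for a toy three-taxon case. This is not the ST-ABC implementation; it is a minimal rejection-ABC illustration under stated assumptions: one gene per species, the standard three-taxon coalescent probabilities for gene tree topologies, a uniform prior on topology and on the internal branch length, and an L1 distance with a fixed tolerance. All names, priors, the tolerance, and the observed counts are hypothetical.

```python
import math
import random
from collections import Counter

TOPOLOGIES = ("((A,B),C)", "((A,C),B)", "((B,C),A)")


def gene_tree_distribution(species_topology, T):
    """Predicted rooted gene tree topology frequencies for three taxa.

    T is the internal branch length in coalescent units. The matching
    topology has probability 1 - (2/3)exp(-T); each other has (1/3)exp(-T).
    """
    discordant = math.exp(-T) / 3.0
    return {
        g: (1.0 - 2.0 * discordant) if g == species_topology else discordant
        for g in TOPOLOGIES
    }


def abc_species_tree(observed_counts, n_proposals=20000, tolerance=0.05,
                     max_T=5.0, seed=1):
    """Rejection ABC: keep proposed species trees whose predicted gene tree
    frequencies lie within `tolerance` (L1 distance) of the observed ones."""
    rng = random.Random(seed)
    n = sum(observed_counts.values())
    observed = {g: observed_counts.get(g, 0) / n for g in TOPOLOGIES}
    retained = []
    for _ in range(n_proposals):
        topology = rng.choice(TOPOLOGIES)   # assumed uniform prior on topology
        T = rng.uniform(0.0, max_T)         # assumed uniform prior on branch length
        predicted = gene_tree_distribution(topology, T)
        distance = sum(abs(observed[g] - predicted[g]) for g in TOPOLOGIES)
        if distance <= tolerance:
            retained.append((topology, T))
    if not retained:
        return None, None, 0
    best = Counter(t for t, _ in retained).most_common(1)[0][0]
    lengths = [T for t, T in retained if t == best]
    return best, sum(lengths) / len(lengths), len(retained)


if __name__ == "__main__":
    # Hypothetical observed topology counts from 100 loci.
    counts = {"((A,B),C)": 80, "((A,C),B)": 10, "((B,C),A)": 10}
    topology, mean_T, kept = abc_species_tree(counts)
    print(topology, mean_T, kept)
```

The point estimate here is the most frequently retained topology together with the mean retained branch length; other summaries of the retained set would be equally reasonable in a sketch like this.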

13.
Takahata N. Genetics, 1989, 122(4): 957-966
A genealogical relationship among genes at a locus (gene tree) sampled from three related populations was examined with special reference to population relatedness (population tree). A phylogenetically informative event in a gene tree constructed from nucleotide differences consists of interspecific coalescences of genes in each of which two genes sampled from different populations are descended from a common ancestor. The consistency probability between gene and population trees in which they are topologically identical was formulated in terms of interspecific coalescences. It was found that the consistency probability thus derived substantially increases as the sample size of genes increases, unless the divergence time of populations is very long compared to population sizes. Hence, there are cases where large samples at a locus are very useful in inferring a population tree.

14.
Numerous simulation studies have investigated the accuracy of phylogenetic inference of gene trees under maximum parsimony, maximum likelihood, and Bayesian techniques. The relative accuracy of species tree inference methods under simulation has received less study. The number of analytical techniques available for inferring species trees is increasing rapidly, and in this paper, we compare the performance of several species tree inference techniques at estimating recent species divergences using computer simulation. Simulating gene trees within species trees of different shapes and with varying tree lengths (T) and population sizes, and evolving sequences on those gene trees, allows us to determine how phylogenetic accuracy changes in relation to different levels of deep coalescence and phylogenetic signal. When the probability of discordance between the gene trees and the species tree is high (i.e., T is small and/or the population size is large), Bayesian species tree inference using the multispecies coalescent (BEST) outperforms other methods. The performance of all methods improves as the total length of the species tree is increased, which reflects the combined benefits of decreasing the probability of discordance between species trees and gene trees and gaining more accurate estimates for gene trees. Decreasing the probability of deep coalescences by reducing the population size also leads to accuracy gains for most methods. Increasing the number of loci from 10 to 100 improves accuracy under difficult demographic scenarios (i.e., coalescent units ≤ 4N(e)), but 10 loci are adequate for estimating the correct species tree in cases where deep coalescence is limited or absent. In general, the correlation between the phylogenetic accuracy and the posterior probability values obtained from BEST is high, although posterior probabilities are overestimated when the prior distribution for the population size is misspecified.

15.
We propose a method of analysing genetic data to obtain separate estimates of the size (N(p)) and migration rate (m(p)) for the sampled populations, without precise prior knowledge of mutation rates at each locus (μ(L)). The effects of migration and mutation can be distinguished because high migration has the effect of reducing genetic differentiation across all loci, whereas a high mutation rate will only affect the locus in question. The method also takes account of any differences between the spectra of immigrant alleles and of new mutant alleles. If the genetic data come from a range of population sizes, and the loci have a range of mutation rates, it is possible to estimate the relative sizes of the different N(p) values, and likewise the m(p) and the μ(L). Microsatellite loci may also be particularly appropriate because loci with a high mutation rate can reach mutation-drift-migration equilibrium more quickly, and because the spectra of mutants arriving in a population can be particularly distinct from the immigrants. We demonstrate this principle using a microsatellite data set from Mauritian skinks. The method identifies low gene flow between a putative new species and populations of its sister species, whereas the differentiation of two other populations is attributed to small population size. These distinct interpretations were not readily apparent from conventional measures of genetic differentiation and gene diversity. When the method is evaluated using simulated data sets, it correctly distinguishes low gene flow from small population size. Loci that are not at mutation-migration-drift equilibrium can distort the parameter estimates slightly. We discuss strategies for detecting and overcoming this effect.

16.
Lineage, or true ‘species’, trees may differ from gene trees because of stochastic processes in molecular evolution leading to gene‐tree heterogeneity. Problems with inferring species trees because of excessive incomplete lineage sorting may be exacerbated in lineages with rapid diversification or recent divergences necessitating the use of multiple loci and individuals. Many recent multilocus studies that investigate divergence times identify lineage splitting to be more recent than single‐locus studies, forcing the revision of biogeographic scenarios driving divergence. Here, we use 21 nuclear loci from regional populations to re‐evaluate hypotheses identified in an mtDNA phylogeographic study of the Brown Creeper (Certhia americana), as well as identify processes driving divergence. Nuclear phylogeographic analyses identified hierarchical genetic structure, supporting a basal split at approximately 32°N latitude, splitting northern and southern populations, with mixed patterns of genealogical concordance and discordance between data sets within the major lineages. Coalescent‐based analyses identify isolation, with little to no gene flow, as the primary driver of divergence between lineages. Recent isolation appears to have caused genetic bottlenecks in populations in the Sierra Madre Oriental and coastal mountain ranges of California, which may be targets for conservation concerns.

17.
We review recent models to estimate phylogenetic trees under the multispecies coalescent. Although the distinction between gene trees and species trees has come to the fore of phylogenetics, only recently have methods been developed that explicitly estimate species trees. Of the several factors that can cause gene tree heterogeneity and discordance with the species tree, deep coalescence due to random genetic drift in branches of the species tree has been modeled most thoroughly. Bayesian approaches to estimating species trees utilize two likelihood functions, one of which has been widely used in traditional phylogenetics and involves the model of nucleotide substitution, and the second of which is less familiar to phylogeneticists and involves the probability distribution of gene trees given a species tree. Other recent parametric and nonparametric methods for estimating species trees involve parsimony criteria, summary statistics, supertree and consensus methods. Species tree approaches are an appropriate goal for systematics, appear to work well in some cases where concatenation can be misleading, and suggest that sampling many independent loci will be paramount. Such methods can also be challenging to implement because of the complexity of the models and computational time. In addition, further elaboration of the simplest of coalescent models will be required to incorporate commonly known issues such as deviation from the molecular clock, gene flow and other genetic forces.

18.
Mathematical consequences of the genealogical species concept   Cited by: 16 (self-citations: 0; citations by others: 16)
A genealogical species is defined as a basal group of organisms whose members are all more closely related to each other than they are to any organisms outside the group ("exclusivity"), and which contains no exclusive group within it. In practice, a pair of species is so defined when phylogenies of alleles from a sample of loci show them to be reciprocally monophyletic at all or some specified fraction of the loci. We investigate the length of time it takes to attain this status when an ancestral population divides into two descendant populations of equal size with no gene exchange, and when genetic drift and mutation are the only evolutionary forces operating. The number of loci used has a substantial effect on the probability of observing reciprocal monophyly at different times after population separation, with very long times needed to observe complete reciprocal monophyly for a large number of loci. In contrast, the number of alleles sampled per locus has a relatively small effect on the probability of reciprocal monophyly. Because a single mitochondrial or chloroplast locus becomes reciprocally monophyletic much faster than does a single nuclear locus, it is not advisable to use mitochondrial and chloroplast DNA to recognize genealogical species for long periods after population divergence. Using a weaker criterion of assigning genealogical species status when more than 50% of sampled nuclear loci show reciprocal monophyly, genealogical species status depends much less on the number of sampled loci, and is attained at roughly 4-7 N generations after populations are isolated, where N is the historically effective population size of each descendant. If genealogical species status is defined as more than 95% of sampled nuclear loci showing reciprocal monophyly, this status is attained after roughly 9-12 N generations.

19.
Takezaki N, Nei M. Genetics, 2008, 178(1): 385-392
Microsatellite DNA loci or short tandem repeats (STRs) are abundant in eukaryotic genomes and are often used for constructing phylogenetic trees of closely related populations or species. These phylogenetic trees are usually constructed by using some genetic distance measure based on allele frequency data, and there are many distance measures that have been proposed for this purpose. In the past the efficiencies of these distance measures in constructing phylogenetic trees have been studied mathematically or by computer simulations. Recently, however, allele frequencies of 783 STR loci have been compiled from various human populations. We have therefore used these empirical data to investigate the relative efficiencies of different distance measures in constructing phylogenetic trees. The results showed that (1) the probability of obtaining the correct branching pattern of a tree (PC) is generally highest for DA distance; (2) FST*, standard genetic distance (DS), and FST/(1-FST) give similar PC-values, FST* being slightly better than the other two; and (3) (δμ)² shows PC-values much lower than the other distance measures. To have reasonably high PC-values for trees similar to ours, at least 30 loci with a minimum of 15 individuals are required when DA distance is used.

20.
Because of the stochastic way in which lineages sort during speciation, gene trees may differ in topology from each other and from species trees. Surprisingly, assuming that genetic lineages follow a coalescent model of within-species evolution, we find that for any species tree topology with five or more species, there exist branch lengths for which gene tree discordance is so common that the most likely gene tree topology to evolve along the branches of a species tree differs from the species phylogeny. This counterintuitive result implies that in combining data on multiple loci, the straightforward procedure of using the most frequently observed gene tree topology as an estimate of the species tree topology can be asymptotically guaranteed to produce an incorrect estimate. We conclude with suggestions that can aid in overcoming this new obstacle to accurate genomic inference of species phylogenies.
