Similar articles
 20 similar articles found (search time: 667 ms)
1.
We present moments and likelihood methods that estimate a DNA substitution rate from a group of closely related sister species pairs separated at an assumed time, and we test these methods with simulations. The methods also estimate ancestral population size and can test whether there is a significant difference among the ancestral population sizes of the sister species pairs. Estimates presented in the literature often ignore the ancestral coalescent prior to speciation and therefore should be biased upward. The simulations show that both methods yield accurate estimates given sample sizes of five or more species pairs and that better likelihood estimates are obtained if there is no significant difference among ancestral population sizes. The model presented here indicates that the larger than expected variation found in multitaxa datasets can be explained by variation in the ancestral coalescence and the Poisson mutation process. In this context, observed variation can often be accounted for by variation in ancestral population sizes rather than invoking variation in other parameters, such as divergence time or mutation rate. The methods are applied to data from two groups of species pairs (sea urchins and Alpheus snapping shrimp) that are thought to have separated by the rise of Panama three million years ago.
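The upward bias mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard neutral expectation that two lineages, one sampled from each sister species, coalesce on average 2Ne generations before the population split (diploid nuclear locus). The function names and all numerical values are hypothetical and are not taken from the paper.

```python
def expected_divergence(mu, tau, ne):
    """Expected substitutions per site between one sequence from each sister species.

    mu  : substitution rate per site per generation (assumed value)
    tau : generations since the population split (assumed value)
    ne  : ancestral effective population size, diploid (assumed value)
    """
    # Gene divergence predates population divergence by 2*Ne generations on average.
    expected_coalescence_time = tau + 2.0 * ne
    # Substitutions accumulate along both lineages back to the common ancestor.
    return 2.0 * mu * expected_coalescence_time


def naive_rate(d, tau):
    """Rate estimate that ignores the ancestral coalescent (divergence / 2*tau)."""
    return d / (2.0 * tau)


mu_true = 1e-8      # hypothetical true rate
tau = 3_000_000     # hypothetical split time in generations
ne = 500_000        # hypothetical ancestral effective size

d = expected_divergence(mu_true, tau, ne)
mu_naive = naive_rate(d, tau)
bias_factor = mu_naive / mu_true   # equals 1 + 2*ne/tau under these assumptions
print(f"naive / true rate = {bias_factor:.4f}")
```

Under these assumptions the naive estimate overshoots by the factor 1 + 2Ne/tau, so the bias grows with ancestral population size and shrinks with older splits.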

2.
Girod C  Vitalis R  Leblois R  Fréville H 《Genetics》2011,188(1):165-179
Reconstructing the demographic history of populations is a central issue in evolutionary biology. Using likelihood-based methods coupled with Monte Carlo simulations, it is now possible to reconstruct past changes in population size from genetic data. Using simulated data sets under various demographic scenarios, we evaluate the statistical performance of Msvar, a full-likelihood Bayesian method that infers past demographic change from microsatellite data. Our simulation tests show that Msvar is very efficient at detecting population declines and expansions, provided the event is neither too weak nor too recent. We further show that Msvar outperforms two moment-based methods (the M-ratio test and Bottleneck) for detecting population size changes, whatever the time and the severity of the event. The same trend emerges from a compilation of empirical studies. The latest version of Msvar provides estimates of the current and the ancestral population size and the time since the population started changing in size. We show that, in the absence of prior knowledge, Msvar provides little information on the mutation rate, which results in biased estimates and/or wide credibility intervals for each of the demographic parameters. However, scaling the population size parameters with the mutation rate and scaling the time with current population size, as coalescent theory requires, significantly improves the quality of the estimates for contraction but not for expansion scenarios. Finally, our results suggest that Msvar is robust to moderate departures from a strict stepwise mutation model.

3.
I J Wilson  D J Balding 《Genetics》1998,150(1):499-510
Ease and accuracy of typing, together with high levels of polymorphism and widespread distribution in the genome, make microsatellite (or short tandem repeat) loci an attractive potential source of information about both population histories and evolutionary processes. However, microsatellite data are difficult to interpret, in particular because of the frequency of back-mutations. Stochastic models for the underlying genetic processes can be specified, but in the past they have been too complicated for direct analysis. Recent developments in stochastic simulation methodology now allow direct inference about both historical events, such as genealogical coalescence times, and evolutionary parameters, such as mutation rates. A feature of the Markov chain Monte Carlo (MCMC) algorithm that we propose here is that the likelihood computations are simplified by treating the (unknown) ancestral allelic states as auxiliary parameters. We illustrate the algorithm by analyzing microsatellite samples simulated under the model. Our results suggest that a single microsatellite usually does not provide enough information for useful inferences, but that several completely linked microsatellites can be informative about some aspects of genealogical history and evolutionary processes. We also reanalyze data from a previously published human Y chromosome microsatellite study, finding evidence for an effective population size for human Y chromosomes in the low thousands and a recent time since their most recent common ancestor: the 95% interval runs from approximately 15,000 to 130,000 years, with most likely values around 30,000 years.

4.
Molecular methods as applied to the biogeography of single species (phylogeography) or multiple codistributed species (comparative phylogeography) have been productively and extensively used to elucidate common historical features in the diversification of the Earth's biota. However, only recently have methods for estimating population divergence times or their confidence limits while taking into account the critical effects of genetic polymorphism in ancestral species become available, and earlier methods for doing so are underutilized. We review models that address the crucial distinction between the gene divergence, the parameter that is typically recovered in molecular phylogeographic studies, and the population divergence, which is in most cases the parameter of interest and will almost always postdate the gene divergence. Assuming that population sizes of ancestral species are distributed similarly to those of extant species, we show that phylogeographic studies in vertebrates suggest that divergence of alleles in ancestral species can comprise from less than 10% to over 50% of the total divergence between sister species, suggesting that the problem of ancestral polymorphism in dating population divergence can be substantial. The variance in the number of substitutions (among loci for a given species or among species for a given gene) resulting from the stochastic nature of DNA change is generally smaller than the variance due to substitutions along allelic lines whose coalescence times vary due to genetic drift in the ancestral population. Whereas the former variance can be reduced by further DNA sequencing at a single locus, the latter cannot. 
Contrary to phylogeographic intuition, dating population divergence times when allelic lines have achieved reciprocal monophyly is in some ways more challenging than when allelic lines have not achieved monophyly, because in the former case critical data on ancestral population size provided by residual ancestral polymorphism is lost. In the former case differences in coalescence time between species pairs can in principle be explained entirely by differences in ancestral population size without resorting to explanations involving differences in divergence time. Furthermore, the confidence limits on population divergence times are severely underestimated when those for number of substitutions per site in the DNA sequences examined are used as a proxy. This uncertainty highlights the importance of multilocus data in estimating population divergence times; multilocus data can in principle distinguish differences in coalescence time (T) resulting from differences in population divergence time and differences in T due to differences in ancestral population sizes and will reduce the confidence limits on the estimates. We analyze the contribution of ancestral population size (theta) to T and the effect of uncertainty in theta on estimates of population divergence (tau) for single loci under reciprocal monophyly using a simple Bayesian extension of Takahata and Satta's and Yang's recent coalescent methods. The confidence limits on tau decrease when the range over which ancestral population size theta is assumed to be distributed decreases and when tau increases; they generally exclude zero when tau/(4Ne) > 1. We also apply a maximum-likelihood method to several single and multilocus data sets. With multilocus data, the criterion for excluding tau = 0 is roughly that l tau/(4Ne) > 1, where l is the number of loci. 
Our analyses corroborate recent suggestions that increasing the number of loci is critical to decreasing the uncertainty in estimates of population divergence time.

5.
We investigate the expected coalescent in populations growing exponentially. The distribution of expected times to coalescence events may show a linear relationship with the number of ancestral lineages, when the latter is subjected to the "epidemic transformation". However, in a number of viral populations, upward curves are created when the epidemically transformed number of ancestral lineages is plotted against time. We consider possible causes of such upward curves. These include the possibility that a curved line is created through a transformation failure due to a sample size that is too large. We suggest a new formula for predicting such failure. The second cause is a population size increasing at an accelerating rate. However, the combination of recent coalescent events and an upward curve is created by an accelerating population increase only under restricted conditions. Specifically, such a pattern is expected only when, were population growth not to have accelerated, the transformation would have failed anyway. The third cause of nonlinearity arises in the estimated coalescent, as distinct from the real coalescent, if the mutation rate is small. However, coalescence times estimated from data typically give a straight line following epidemic transformation, but the rate of exponential increase, or r value, will be underestimated.

6.
An importance-sampling method is presented for computing the likelihood of the configuration of population genetic data under general assumptions about population history and transitions among states. The configuration of the data is the number of chromosomes sampled that are in each of a finite set of states. Transitions among states are governed by a Markov chain with transition probabilities dependent on one or more parameters. The method assumes that the joint distribution of coalescence times of the underlying gene genealogy is independent of the genetic state of each lineage. Given a set of coalescence times, the probability that a pair of lineages is chosen to coalesce in each replicate is proportional to the contribution that the coalescence event makes to the probability of the data. This method can be applied to gene genealogies generated by the neutral coalescent process and to genealogies generated by other processes, such as a linear birth-death process which provides a good approximation to the dynamics of low-frequency alleles. Two applications are described. In the first, the fit of allele frequencies at two microsatellite loci sampled in a Sardinian population to the one-step mutation model is tested. The one-step model is rejected for one locus but not for the other. The second application is to low-frequency alleles in a geographically subdivided population. The geographic location is the allelic state, and the alleles are assumed to be sufficiently rare that their dynamics can be approximated by a linear birth-death process in which the birth and death rates are independent of geographic location. The analysis of eight low-frequency allozyme alleles found in the glaucous-winged gull, Larus glaucescens, illustrates how geographically restricted dispersal can be detected.

7.
Genome-scale sequence data have become increasingly available in phylogenetic studies for understanding the evolutionary histories of species. However, it is challenging to develop probabilistic models to account for heterogeneity of phylogenomic data. The multispecies coalescent model describes gene trees as independent random variables generated from a coalescence process occurring along the lineages of the species tree. Since the multispecies coalescent model allows gene trees to vary across genes, coalescent-based methods have been widely used to account for heterogeneous gene trees in phylogenomic data analysis. In this paper, we summarize and evaluate the performance of coalescent-based methods for estimating species trees from genome-scale sequence data. We investigate the effects of deep coalescence and mutation on the performance of species tree estimation methods. We found that the coalescent-based methods perform well in estimating species trees for a large number of genes, regardless of the degree of deep coalescence and mutation. The performance of the coalescent methods is negatively correlated with the lengths of internal branches of the species tree.

8.
A number of methods commonly used in landscape genetics use an analogy to electrical resistance on a network to describe and fit barriers to movement across the landscape using genetic distance data. These are motivated by a mathematical equivalence between electrical resistance between two nodes of a network and the ‘commute time’, which is the mean time for a random walk on that network to leave one node, visit the other, and return. However, genetic data are more accurately modelled by a different quantity, the coalescence time. Here, we describe the differences between resistance distance and coalescence time, and explore the consequences for inference. We implemented a Bayesian method to infer effective movement rates and population sizes under both these models, and found that inference using commute times could produce misleading results in the presence of biased gene flow. We then used forwards‐time simulation with continuous geography to demonstrate that coalescence‐based inference remains more accurate than resistance‐based methods on realistic data, but difficulties highlight the need for methods that explicitly model continuous, heterogeneous geography.

9.
Graham J  Thompson EA 《Genetics》2000,156(1):375-384
In disequilibrium mapping from data on a rare allele, interest may focus on the ancestry of a random sample of current descendants of a mutation. The mutation is assumed to have been introduced into the population as a single copy a known time ago and to have reached a given copy number within the population. Theory has been developed to describe the ancestral distribution under arbitrary patterns of population expansion. Further results permit convenient realization of the ancestry for a random sample of copies of a rare allele within populations of constant size or within populations growing or shrinking at constant exponential rate. In this article, we present an efficient approximate method for realizing coalescence times under more general patterns of population growth. We also apply diagnostics, checking the age of the mutation. In the course of the derivation, some additional insight is gained into the dynamics of the descendants of the mutation.

10.
A 3-kb region encompassing the beta-globin gene has been analyzed for allelic sequence polymorphism in nine populations from Africa, Asia, and Europe. A unique gene tree was constructed from 326 sequences of 349 in the total sample. New maximum-likelihood methods for analyzing gene trees on the basis of coalescence theory have been used. The most recent common ancestor of the beta-globin gene tree is a sequence found only in Africa and estimated to have arisen approximately 800,000 years ago. There is no evidence for an exponential expansion out of a bottlenecked founding population, and an effective population size of approximately 10,000 has been maintained. Modest differences in levels of beta-globin diversity between Africa and Asia are better explained by greater African effective population size than by greater time depth. There may have been a reduction of Asian effective population size in recent evolutionary history. Characteristically Asian ancestry is estimated to be older than 200,000 years, suggesting that the ancestral hominid population at this time was widely dispersed across Africa and Asia. Patterns of beta-globin diversity suggest extensive worldwide late Pleistocene gene flow and are not easily reconciled with a unidirectional migration out of Africa 100,000 years ago and total replacement of archaic populations in Asia.

11.
N. Takahata  M. Nei 《Genetics》1990,124(4):967-978
To explain the long-term persistence of polymorphic alleles (trans-specific polymorphism) at the major histocompatibility complex (MHC) loci in rodents and primates, a computer simulation study was conducted on the coalescence time of different alleles sampled under various forms of selection. At the same time, average heterozygosity, the number of alleles in a sample, and the rate of codon substitution were examined to explain the mechanism of maintenance of polymorphism at the MHC loci. The results obtained are as follows. (1) The coalescence time for neutral alleles is too short to explain the trans-specific polymorphism at the MHC loci. (2) Under overdominant selection, the coalescence time can be tens of millions of years, depending on the parameter values used. The average heterozygosity and the number of alleles observed are also high enough to explain MHC polymorphism. (3) The pathogen adaptation model proposed by Snell is incapable of explaining MHC polymorphism, since the coalescence time for this model is too short and the expected heterozygosity and the expected number of alleles are too small. (4) From the mathematical point of view, the minority advantage model of frequency-dependent selection is capable of explaining a high degree of polymorphism and trans-specific polymorphism. (5) The molecular mimicry hypothesis also gives a sufficiently long coalescence time when the mutation rate is low in the host but very high in the parasite. However, the expected heterozygosity and the expected number of alleles tend to be too small. (6) Consideration of the molecular mechanism of the function of MHC molecules and other biological observations suggest that the most important factor for the maintenance of MHC polymorphism is overdominant selection. However, some experiments are necessary to distinguish between the overdominance and frequency-dependent selection hypotheses.

12.
F. Rousset 《Genetics》1996,142(4):1357-1362
Expected values of Wright's F-statistics are functions of probabilities of identity in state. These values may be quite different under an infinite allele model and under stepwise mutation processes such as those occurring at microsatellite loci. However, a relationship between the probability of identity in state in stepwise mutation models and the distribution of coalescence times can be deduced from the relationship between probabilities of identity by descent and the distribution of coalescence times. The values of F(IS) and F(ST) can be computed using this property. Examination of the conditional probability of identity in state given some coalescence time and of the distribution of coalescence times is also useful for explaining the properties of F(IS) and F(ST) at high mutation rate loci, as shown here in an island model of population structure.

13.
We use variation at a set of eight human Y chromosome microsatellite loci to investigate the demographic history of the Y chromosome. Instead of assuming a population of constant size, as in most of the previous work on the Y chromosome, we consider a model which permits a period of recent population growth. We show that for most of the populations in our sample this model fits the data far better than a model with no growth. We estimate the demographic parameters of this model for each population and also the time to the most recent common ancestor. Since there is some uncertainty about the details of the microsatellite mutation process, we consider several plausible mutation schemes and estimate the variance in mutation size simultaneously with the demographic parameters of interest. Our finding of a recent common ancestor (probably in the last 120,000 years), coupled with a strong signal of demographic expansion in all populations, suggests either a recent human expansion from a small ancestral population, or natural selection acting on the Y chromosome.

14.
By far the greatest challenge for diversity studies is to characterize the diversity of prokaryotes, which probably encompasses billions of species, most of which are unculturable. Recent advances in theory and analysis have focused on multi-locus approaches and on combined analysis of molecular and ecological data. However, broad environmental surveys of bacterial diversity still rely on single-locus data, notably 16S ribosomal DNA, and little other detailed information. Evolutionary methods of delimiting species from single-locus data alone need to consider population genetic and macroevolutionary theories for the expected levels of interspecific and intraspecific variation. We discuss the use of a recent evolutionary method, based on the theory of coalescence within independently evolving populations, compared with a traditional approach that uses a fixed threshold divergence to delimit species.

15.
How many generations ago did the common ancestor of all present-day individuals live, and how does inbreeding affect this estimate? The number of ancestors within family trees determines the timing of the most recent common ancestor of humanity. However, mating is often non-random and inbreeding is ubiquitous in natural populations. Rates of pedigree growth are found for multiple types of inbreeding. These data are then combined with models of global population structure to estimate biparental coalescence times. When pedigrees for regular systems of mating are constructed, the growth rates of inbred populations contain Fibonacci n-step constants. The timing of the most recent common ancestor depends on global population structure, the mean rate of pedigree growth, mean fitness, and current population size. Inbreeding reduces the number of ancestors in a pedigree, pushing back global common ancestry times. These results are consistent with the remarkable findings of previous studies: all humanity shares common ancestry in the recent past.
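For readers unfamiliar with the Fibonacci n-step constants mentioned in this abstract, the sketch below computes them numerically as the limiting ratio of consecutive terms of the n-step Fibonacci sequence (each term is the sum of the previous n terms). How these constants enter the pedigree growth rates is not specified in the abstract, so the code only illustrates the constants themselves; the function name is hypothetical.

```python
def n_step_fibonacci_constant(n, iterations=200):
    """Approximate the limiting ratio of consecutive n-step Fibonacci numbers.

    n = 2 gives the golden ratio, n = 3 the tribonacci constant, and so on.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    # Seed window: n-1 zeros followed by a one.
    window = [0] * (n - 1) + [1]
    ratio = 1.0
    for _ in range(iterations):
        next_term = sum(window)          # each term is the sum of the previous n
        ratio = next_term / window[-1]   # ratio of consecutive terms
        window = window[1:] + [next_term]
    return ratio


for n in (2, 3, 4):
    print(n, round(n_step_fibonacci_constant(n), 6))
```

The constants increase with n and stay below 2, the per-generation growth factor of an unconstrained biparental pedigree in which the number of ancestors doubles each generation.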

16.
The sea urchin Diadema antillarum was the most important herbivore on Caribbean reefs until 1983, when mass mortality reduced its populations by more than 97%. Knowledge of its past demography is essential to reconstruct reef ecology as it was before human impact, which has been implicated as having caused high pre-mortality Diadema abundance. To determine the history of its population size, we sequenced the ATPase 6 and 8 region of mitochondrial DNA from populations in the Caribbean and in the eastern Atlantic (which was not affected by the mass mortality), as well as from the eastern Pacific D. mexicanum. The Caribbean population harbours an order of magnitude more molecular diversity than those of the eastern Pacific or the eastern Atlantic and, despite the recent mass mortality, its DNA sequences bear the genetic signature of a previous population expansion. By estimating mutation rates from divergence between D. antillarum and D. mexicanum, which were separated at a known time by the Isthmus of Panama, and by using estimates of effective population size derived from mismatch distributions and a maximum likelihood coalescence algorithm, we date the expansion as having occurred no more recently than 100 000 years before the present. Thus, Diadema was abundant in the Caribbean long before humans could have affected ecological processes; the genetic data contain no evidence of a recent, anthropogenically caused, population increase.

17.
Meuwissen TH  Goddard ME 《Genetics》2007,176(4):2551-2560
A novel multipoint method, based on an approximate coalescence approach, to analyze multiple linked markers is presented. Unlike other approximate coalescence methods, it considers all markers simultaneously but only two haplotypes at a time. We demonstrate the use of this method for linkage disequilibrium (LD) mapping of QTL and estimation of effective population size. The method estimates identity-by-descent (IBD) probabilities between pairs of marker haplotypes. Both LD and combined linkage and LD mapping rely on such IBD probabilities. The method is approximate in that it considers only the information on a pair of haplotypes, whereas a full modeling of the coalescence process would simultaneously consider all haplotypes. However, full coalescence modeling is computationally feasible only for a few linked markers. Using simulations of the coalescence process, the method is shown to give almost unbiased estimates of the effective population size. Compared to direct marker and haplotype association analyses, IBD-based QTL mapping clearly showed higher power to detect a QTL and a more realistic confidence interval for its position. The modeling of LD could be extended to estimate other LD-related parameters such as recombination rates.

18.
19.
The prediction of identity by descent (IBD) probabilities is essential for all methods that map quantitative trait loci (QTL). The IBD probabilities may be predicted from marker genotypes and/or pedigree information. Here, a method is presented that predicts IBD probabilities at a given chromosomal location given data on a haplotype of markers spanning that position. The method is based on a simplification of the coalescence process, and assumes that the number of generations since the base population and the effective population size are known, although effective size may be estimated from the data. The probability that two gametes are IBD at a particular locus increases as the number of markers surrounding the locus with identical alleles increases. This effect is more pronounced when effective population size is high. Hence as effective population size increases, the IBD probabilities become more sensitive to the marker data, which should favour finer scale mapping of the QTL. The IBD probability prediction method was developed for the situation where the pedigree of the animals was unknown (i.e. all information came from the marker genotypes), and the situation where, say, T generations of unknown pedigree are followed by some generations where pedigree and marker genotypes are known.

20.
Several methods have been developed to estimate the parental contributions in the genetic pool of an admixed population. Some pair-comparisons have been performed on real data but, to date, no systematic comparison of a large number of methods has been attempted. In this study, we performed a simulated data-based comparison of six of the most cited methods in the literature of the last 20 years. Five of these methods use allele frequencies and differ in the statistical treatment of the data. The last one also considers the degree of molecular divergence by estimating the coalescence times. Comparisons are based on the frequency at which the method can be applied, the bias and the mean square error of the estimation, and the frequency at which the true value is within the confidence interval. Finally, each method was applied to a real data set of variously introgressed honeybee populations. In optimal conditions (highly differentiated parental populations, recent hybridization event), all methods perform equally well. When conditions are not optimal, the methods perform differently, but no method is always better or worse than all others. Some guidelines are given for the choice of the method.


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号