Similar Documents
20 similar documents found.
1.
Markovtsova L  Marjoram P  Tavaré S 《Genetics》2000,156(3):1427-1436
We describe a Markov chain Monte Carlo approach for assessing the role of site-to-site rate variation in the analysis of within-population samples of DNA sequences using the coalescent. Our framework is a Bayesian one. We discuss methods for assessing the goodness-of-fit of these models, as well as problems concerning the separate estimation of effective population size and mutation rate. Using a mitochondrial data set for illustration, we show that ancestral inference concerning coalescence times can be dramatically affected if rate variation is ignored.

2.
Rannala B  Yang Z 《Genetics》2003,164(4):1645-1656
The effective population sizes of ancestral as well as modern species are important parameters in models of population genetics and human evolution. The commonly used method for estimating ancestral population sizes, based on counting mismatches between the species tree and the inferred gene trees, is highly biased as it ignores uncertainties in gene tree reconstruction. In this article, we develop a Bayes method for simultaneous estimation of the species divergence times and current and ancestral population sizes. The method uses DNA sequence data from multiple loci and extracts information about conflicts among gene tree topologies and coalescent times to estimate ancestral population sizes. The topology of the species tree is assumed known. A Markov chain Monte Carlo algorithm is implemented to integrate over uncertain gene trees and branch lengths (or coalescence times) at each locus as well as species divergence times. The method can handle any species tree and allows different numbers of sequences at different loci. We apply the method to published noncoding DNA sequences from the human and the great apes. There are strong correlations between posterior estimates of speciation times and ancestral population sizes. With the use of an informative prior for the human-chimpanzee divergence date, the population size of the common ancestor of the two species is estimated to be approximately 20,000, with a 95% credibility interval (8000, 40,000). Our estimates, however, are affected by model assumptions as well as data quality. We suggest that reliable estimates must await more data and more realistic models.

3.
Wang J 《Genetics》2006,173(3):1679-1692
A variety of estimators have been developed to use genetic marker information in inferring the admixture proportions (parental contributions) of a hybrid population. The majority of these estimators used allele frequency data, ignored molecular information that is available in markers such as microsatellites and DNA sequences, and assumed that no mutations have occurred since the admixture event. As a result, these estimators may fail to deliver an estimate or give rather poor estimates when admixture is ancient and thus mutations are not negligible. A previous molecular estimator based its inference of admixture proportions on the average coalescent times between pairs of genes taken from within and between populations. In this article I propose an estimator that considers the entire genealogy of all of the sampled genes and infers admixture proportions from the numbers of segregating sites in DNA sequence samples. By considering the genealogy of all sequences rather than pairs of sequences, this new estimator also allows the joint estimation of other interesting parameters in the admixture model, such as admixture time, divergence time, population size, and mutation rate. Comparative analyses of simulated data indicate that the new coalescent estimator generally yields better estimates of admixture proportions than the previous molecular estimator, especially when the parental populations are not highly differentiated. It also gives reasonably accurate estimates of other admixture parameters. A human mtDNA sequence data set was analyzed to demonstrate the method, and the analysis results are discussed and compared with those from previous studies.

4.
While welcoming the comment of Ho et al. (2015), we find little that undermines the strength of our criticism, and it appears they have misunderstood our central argument. Here we respond in order to reiterate that we are (i) generally critical of much of the evidence presented in support of the time-dependent molecular rate (TDMR) hypothesis and (ii) specifically critical of estimates of μ derived from tip-dated sequences that exaggerate the importance of purifying selection as an explanation for TDMR over extended timescales. In response to assertions put forward by Ho et al. (2015), we use panmictic coalescent simulations of temporal data to explore a fundamental assumption for tip-dated tree shape and associated mutation rate estimates, and the appropriateness and utility of the date randomization test. The results reveal problems for the joint estimation of tree topology, effective population size and μ with tip-dated sequences using BEAST. In these simulations, BEAST consistently recovers incorrect tree topologies that are consistent with the substantial overestimation of μ and underestimation of effective population size. Data generated from lower effective population sizes were less likely to fail the date randomization test yet still resulted in substantially upwardly biased estimates of rates, bringing previous estimates of μ from temporally sampled DNA sequences into question. We find that our general criticisms of both the hypothesis of time-dependent molecular evolution and Bayesian methods to estimate μ from temporally sampled DNA sequences are further reinforced.

5.
The Genealogy of Samples in Models with Selection (total citations: 1; self-citations: 0; citations by others: 1)
C. Neuhauser  S. M. Krone 《Genetics》1997,145(2):519-534
We introduce the genealogy of a random sample of genes taken from a large haploid population that evolves according to random reproduction with selection and mutation. Without selection, the genealogy is described by Kingman's well-known coalescent process. In the selective case, the genealogy of the sample is embedded in a graph with a coalescing and branching structure. We describe this graph, called the ancestral selection graph, and point out differences and similarities with Kingman's coalescent. We present simulations for a two-allele model with symmetric mutation in which one of the alleles has a selective advantage over the other. We find that when the allele frequencies in the population are already in equilibrium, then the genealogy does not differ much from the neutral case. This is supported by rigorous results. Furthermore, we describe the ancestral selection graph for other selective models with finitely many selection classes, such as the K-allele models, infinitely-many-alleles models, DNA sequence models, and infinitely-many-sites models, and briefly discuss the diploid case.

6.
We analyze patterns of genetic variability of populations in the presence of a large seedbank with the help of a new coalescent structure called the seedbank coalescent. This ancestral process appears naturally as a scaling limit of the genealogy of large populations that sustain seedbanks, if the seedbank size and individual dormancy times are of the same order as those of the active population. Mutations appear as Poisson processes on the active lineages and potentially at reduced rate also on the dormant lineages. The presence of "dormant" lineages leads to qualitatively altered times to the most recent common ancestor and nonclassical patterns of genetic diversity. To illustrate this we provide a Wright–Fisher model with a seedbank component and mutation, motivated by recent models of microbial dormancy, whose genealogy can be described by the seedbank coalescent. Based on our coalescent model, we derive recursions for the expectation and variance of the time to most recent common ancestor, number of segregating sites, pairwise differences, and singletons. Estimates (obtained by simulations) of the distributions of commonly employed distance statistics, in the presence and absence of a seedbank, are compared. The effect of a seedbank on the expected site-frequency spectrum is also investigated using simulations. Our results indicate that the presence of a large seedbank considerably alters the distribution of some distance statistics, as well as the site-frequency spectrum. Thus, one should be able to detect from genetic data the presence of a large seedbank in natural populations.

7.
The multispecies coalescent (MSC) model accommodates both species divergences and within-species coalescence and provides a natural framework for phylogenetic analysis of genomic data when the gene trees vary across the genome. The MSC model implemented in the program bpp assumes a molecular clock and the Jukes–Cantor model, and is suitable for analyzing genomic data from closely related species. Here we extend our implementation to more general substitution models and relaxed clocks to allow the rate to vary among species. The MSC-with-relaxed-clock model allows the estimation of species divergence times and ancestral population sizes using genomic sequences sampled from contemporary species when the strict clock assumption is violated, and provides a simulation framework for evaluating species tree estimation methods. We conducted simulations and analyzed two real datasets to evaluate the utility of the new models. We confirm that the clock-JC model is adequate for inference of shallow trees with closely related species, but it is important to account for clock violation for distant species. Our simulation suggests that there is valuable phylogenetic information in the gene-tree branch lengths even if the molecular clock assumption is seriously violated, and the relaxed-clock models implemented in bpp are able to extract such information. Our Markov chain Monte Carlo algorithms suffer from mixing problems when used for species tree estimation under the relaxed clock and we discuss possible improvements. We conclude that the new models are currently most effective for estimating population parameters such as species divergence times when the species tree is fixed.
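The bpp abstract above notes that the original implementation assumed the Jukes–Cantor (JC69) substitution model. As a minimal illustration of that model only (not of bpp's code), the sketch below converts the observed proportion p of differing sites between two aligned sequences into the JC69 distance d = −(3/4) ln(1 − 4p/3), the expected number of substitutions per site. The function names are hypothetical.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    diffs = sum(a != b for a, b in zip(seq_a, seq_b))
    return diffs / len(seq_a)


def jc69_distance(p: float) -> float:
    """JC69-corrected distance (expected substitutions per site) from p-distance p.

    The correction diverges as p approaches 3/4, the saturation level under JC69.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 correction requires 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    p = p_distance("ACGTACGTAC", "ACGTACGTTC")  # 1 difference in 10 sites
    print(round(jc69_distance(p), 4))  # prints 0.1073
```

The corrected distance (0.1073) exceeds the raw proportion (0.1) because JC69 accounts for multiple substitutions hitting the same site.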

8.
Consequences of recombination on traditional phylogenetic analysis (total citations: 38; self-citations: 0; citations by others: 38)
Schierup MH  Hein J 《Genetics》2000,156(2):879-891
We investigate the shape of a phylogenetic tree reconstructed from sequences evolving under the coalescent with recombination. The motivation is that evolutionary inferences are often made from phylogenetic trees reconstructed from population data even though recombination may well occur (mtDNA or viral sequences) or does occur (nuclear sequences). We investigate the size and direction of biases when a single tree is reconstructed ignoring recombination. Standard software (PHYLIP) was used to construct the best phylogenetic tree from sequences simulated under the coalescent with recombination. With recombination present, the length of terminal branches and the total branch length are larger, and the time to the most recent common ancestor smaller, than for a tree reconstructed from sequences evolving with no recombination. The effects are pronounced even for small levels of recombination that may not be immediately detectable in a data set. The phylogenies when recombination is present superficially resemble phylogenies for sequences from an exponentially growing population. However, exponential growth has a different effect on statistics such as Tajima's D. Furthermore, ignoring recombination leads to a large overestimation of the substitution rate heterogeneity and the loss of the molecular clock. These results are discussed in relation to viral and mtDNA data sets.
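This abstract distinguishes recombination from exponential growth by their effect on Tajima's D. For reference, the sketch below computes D from an aligned sample using Tajima's standard normalizing constants. D compares the mean number of pairwise differences π with S/a₁ (S segregating sites) and scales the difference by an estimate of its standard deviation. The sketch assumes each polymorphic column is one segregating site (an infinitely-many-sites reading). It is not part of the paper's PHYLIP-based analysis, and the function name is hypothetical.

```python
import math
from itertools import combinations


def tajimas_d(seqs: list[str]) -> float:
    """Tajima's D for an alignment of n >= 4 equal-length sequences.

    S counts polymorphic columns (infinitely-many-sites assumption) and
    pi is the mean number of pairwise differences.
    """
    n = len(seqs)
    if n < 4:
        raise ValueError("the variance terms vanish for n < 4")
    seg_sites = sum(len(set(column)) > 1 for column in zip(*seqs))
    if seg_sites == 0:
        raise ValueError("D is undefined without segregating sites")
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Normalizing constants from Tajima's derivation.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)


if __name__ == "__main__":
    # One singleton mutation among four sequences.
    print(round(tajimas_d(["A", "A", "A", "T"]), 4))  # prints -0.6124
```

A negative D, as in this single-singleton example, signals an excess of rare variants relative to the neutral constant-size expectation, the pattern typically produced by population growth.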

9.
Due to genetic variation in the ancestor of two populations or two species, the divergence time for DNA sequences from two populations varies along the genome. Within a genomic segment that no recombination event has split apart, all bases share the same divergence time, because they share a most recent common ancestor. The size of these segments of constant divergence depends on the recombination rate, but also on the speciation time, the effective population size of the ancestral population, and demographic effects and selection. Thus, inference of these parameters may be possible if we can decode the divergence times along a genomic alignment. Here, we present a new hidden Markov model that infers the changing divergence (coalescence) times along the genome alignment using a coalescent framework, in order to estimate the speciation time, the recombination rate, and the ancestral effective population size. The model is efficient enough to allow inference on whole-genome data sets. We first investigate the power and consistency of the model with coalescent simulations and then apply it to the whole-genome sequences of the two orangutan subspecies, Bornean (P. p. pygmaeus) and Sumatran (P. p. abelii) orangutans, from the Orangutan Genome Project. We estimate the speciation time between the two subspecies to be … thousand years ago and the effective population size of the ancestral orangutan species to be …, consistent with recent results based on smaller data sets. We also report a negative correlation between chromosome size and ancestral effective population size, which we interpret as a signature of recombination increasing the efficacy of selection.
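The hidden Markov model in this abstract treats the divergence time at each alignment position as a hidden state that changes at recombination breakpoints. The sketch below is not the authors' model. It is a toy version built on our own assumptions:

* a few discretized divergence times serve as the hidden states;
* a single probability `switch_prob` of jumping to another state between adjacent sites stands in for a recombination model;
* a mismatch has probability 1 − exp(−2μt) at divergence time t, meaning at least one mutation falls on the two branches of total length 2t, with back-mutation ignored.

It evaluates the likelihood of a 0/1 mismatch sequence with the scaled forward algorithm. All names and parameter values are hypothetical.

```python
import math


def forward_loglik(obs, times, init, switch_prob, mu):
    """Log-likelihood of a 0/1 mismatch sequence under a toy coalescence-time HMM.

    obs: 0 where the two genomes agree at a site, 1 where they differ.
    times: discretized divergence (coalescence) times; one hidden state each.
    init: initial state probabilities, summing to 1.
    switch_prob: chance of jumping to a uniformly chosen other state between
        adjacent sites (a crude stand-in for a recombination breakpoint).
    mu: mutation rate per site per unit time.
    """
    k = len(times)
    # A mismatch is seen if at least one mutation hits the two branches (total length 2t).
    p_diff = [1.0 - math.exp(-2.0 * mu * t) for t in times]

    def emit(state, x):
        return p_diff[state] if x == 1 else 1.0 - p_diff[state]

    def trans(i, j):
        if i == j:
            return 1.0 - switch_prob
        return switch_prob / (k - 1)

    # Forward recursion with per-site rescaling to avoid numerical underflow.
    alpha = [init[s] * emit(s, obs[0]) for s in range(k)]
    scale = sum(alpha)
    loglik = math.log(scale)
    alpha = [a / scale for a in alpha]
    for x in obs[1:]:
        alpha = [emit(j, x) * sum(alpha[i] * trans(i, j) for i in range(k))
                 for j in range(k)]
        scale = sum(alpha)
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik


if __name__ == "__main__":
    obs = [0, 0, 1, 0, 0, 0, 1, 1, 0, 0]
    print(forward_loglik(obs, times=[0.5, 1.0, 2.0], init=[1 / 3] * 3,
                         switch_prob=0.05, mu=0.1))
```

Estimating parameters would wrap an optimizer or sampler around this likelihood. Real coalescent HMMs derive their transition probabilities from the coalescent with recombination instead of using one uniform switch probability.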

10.
The study of sequence diversity under phylogenetic models is now classic. Theoretical studies of diversity under the Kingman coalescent appeared shortly after the introduction of the coalescent. In this paper we revisit this topic under the multispecies coalescent, an extension of the single population model to multiple populations. We derive exact formulas for the sequence dissimilarity of two sequences drawn at random under a basic multispecies setup. The multispecies model uses three parameters—the species tree birth rate under the pure birth process (Yule), the species effective population size and the mutation rate. We also discuss the effects of relaxing some of the model assumptions.

11.
The Kingman coalescent, which has become the foundation for a wide range of theoretical as well as empirical studies, was derived as an approximation of the Wright-Fisher (WF) model. The approximation relies heavily on the assumption that the population size is large and the sample size is much smaller than the population size. Whether the sample size is too large relative to the population size is rarely questioned in practice when applying statistical methods based on the Kingman coalescent. Since the WF model is the most widely used population genetics model of reproduction, it is desirable to develop a coalescent framework for the WF model that can be used whenever there are concerns about the accuracy of the Kingman coalescent as an approximation. This paper describes the exact coalescent theory for the WF model and develops a simulation algorithm, which is then used, together with an analytical approach, to study the properties of the exact coalescent and its differences from the Kingman coalescent. We show that the Kingman coalescent differs from the exact coalescent by: (1) shorter waiting times between successive coalescent events; (2) different probabilities of observing a given topological relationship among sequences in a sample; and (3) slightly smaller tree length in the genealogy of a large sample. On the other hand, there is little difference in the age of the most recent common ancestor (MRCA) of the sample. The exact coalescent makes up for its longer waiting times between successive coalescent events by allowing multiple coalescences at the same time. The most significant difference among the summary statistics examined is the sum of the lengths of external branches, which can be more than 10% larger for the exact coalescent than for the Kingman coalescent.
As a whole, the Kingman coalescent is a remarkably accurate approximation to the exact coalescent for sample and population sizes falling considerably outside the region that was originally anticipated.
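The TMRCA and external-branch comparisons above can be explored with a short simulation, but only for the Kingman side. The sketch below does not implement the exact Wright–Fisher coalescent developed in the paper. It measures time in coalescent units in which each pair of lineages merges at rate 1, so with k lineages the waiting time to the next merger is exponential with rate k(k − 1)/2. The simulated means are checked against the standard Kingman results E[TMRCA] = 2(1 − 1/n) and E[total external branch length] = 2. Function names are hypothetical.

```python
import random


def simulate_kingman(n: int, rng: random.Random) -> tuple[float, float]:
    """Simulate one Kingman genealogy for n >= 2 sampled lineages.

    Returns (tmrca, external_length) in coalescent time units.
    """
    # True marks a lineage that is still an original sample (external) branch.
    is_leaf = [True] * n
    t = 0.0
    external = 0.0
    while len(is_leaf) > 1:
        k = len(is_leaf)
        wait = rng.expovariate(k * (k - 1) / 2)
        t += wait
        external += wait * sum(is_leaf)  # only leaf lineages add external length
        # Merge two distinct lineages chosen uniformly at random.
        i, j = rng.sample(range(k), 2)
        for idx in sorted((i, j), reverse=True):
            is_leaf.pop(idx)
        is_leaf.append(False)  # the merged lineage is an internal branch
    return t, external


def mean_stats(n: int, reps: int, seed: int = 1) -> tuple[float, float]:
    """Monte Carlo means of TMRCA and total external branch length."""
    rng = random.Random(seed)
    tot_t = tot_ext = 0.0
    for _ in range(reps):
        t, ext = simulate_kingman(n, rng)
        tot_t += t
        tot_ext += ext
    return tot_t / reps, tot_ext / reps


if __name__ == "__main__":
    n = 10
    mean_t, mean_ext = mean_stats(n, 20000)
    print(f"mean TMRCA {mean_t:.3f} (expected {2 * (1 - 1 / n):.3f})")
    print(f"mean external length {mean_ext:.3f} (expected 2.000)")
```

An exact-coalescent comparison would replace the exponential waiting times and pairwise mergers with generation-by-generation sampling of Wright–Fisher parents, which allows several mergers in a single generation.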

12.
This paper studies gene trees in subdivided populations, constructed as perfect phylogenies from the pattern of mutations in a sample of DNA sequences, and presents a new recursion for the probability distribution of such gene trees. The underlying evolutionary model is the coalescent process in a subdivided population, and the infinitely-many-sites model of mutation is assumed. The ancestral inference questions discussed are: maximum likelihood estimation of migration and mutation rates; detection of population growth by likelihood techniques; the distribution of the time to the most recent common ancestor of a sample of sequences; the distribution of the ages of the mutations on the gene tree; the subpopulation in which the most recent common ancestor of all the sequences lived; subpopulation ancestors, where they were, and the times to them; and the subpopulations in which mutations occurred. The computational technique used, due to Griffiths and Tavaré, is a computer-intensive Markov chain simulation that simulates gene trees conditional on the topology implied by the mutation pattern in the sample of DNA sequences. The software GENETREE, which implements these ancestral inference techniques, is available.
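Under the infinitely-many-sites model with the ancestral type known, a set of segregating sites fits a single rooted gene tree (a perfect phylogeny) exactly when, for every pair of sites, the sets of sequences carrying the mutant type are either nested or disjoint. The sketch below checks that condition, which must hold before a gene tree of the kind described above can be built. It is not the Griffiths–Tavaré recursion or GENETREE. The 0/1 coding (0 = ancestral, 1 = mutant) and the function name are assumptions for illustration.

```python
from itertools import combinations


def is_perfect_phylogeny(matrix: list[list[int]]) -> bool:
    """Return True if a 0/1 haplotype matrix (rows = sequences, columns = sites,
    0 = ancestral, 1 = mutant) fits a rooted perfect phylogeny."""
    if not matrix:
        return True
    n_sites = len(matrix[0])
    # For each site, the set of sequences carrying the mutant type.
    carriers = [frozenset(r for r, row in enumerate(matrix) if row[c] == 1)
                for c in range(n_sites)]
    for a, b in combinations(carriers, 2):
        nested = a <= b or b <= a
        disjoint = not (a & b)
        if not (nested or disjoint):
            # Both sites are on a shared lineage and on separate lineages,
            # which needs recurrent mutation (or recombination).
            return False
    return True


if __name__ == "__main__":
    sample = [
        [0, 0, 0],
        [1, 0, 0],
        [1, 1, 0],
        [0, 0, 1],
    ]
    print(is_perfect_phylogeny(sample))  # prints True
```

When the check passes, the nesting relation among the carrier sets directly gives the gene tree's clades.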

13.
Y X Fu  R Chakraborty 《Genetics》1998,150(1):487-497
Minisatellites and microsatellites are short tandemly repetitive sequences dispersed in eukaryotic genomes, many of which are highly polymorphic due to copy number variation of the repeats. Because mutations change the copy numbers of the repeat sequences in a generalized stepwise fashion, stepwise mutation models are widely used for studying the dynamics of these loci. We propose a minimum chi-square (MCS) method for simultaneous estimation of all the parameters in a stepwise mutation model and the ancestral allelic type of a sample. The MCS estimator requires knowing the mean number of alleles of a given size in a sample, which can be estimated using Monte Carlo samples generated by a coalescent algorithm. The method is applied to samples of seven (CA)n repeat loci from eight human populations and one chimpanzee population. The estimated parameter values suggest a general tendency for microsatellite alleles to expand in size, because (1) each mutation has a slight tendency to increase size and (2) the mean size increase is larger than the mean size decrease for a mutation. Our estimates also suggest that most of these CA-repeat loci evolve according to multistep rather than single-step mutation models. We also introduce several quantities for measuring the quality of the estimation of ancestral allelic type, and it appears that the majority of the estimated ancestral allelic types are reasonably accurate. Implications of our analysis and potential extensions of the method are discussed.
Since the discovery that a large number of loci with tandemly repeated sequences in humans and many other eukaryote species are highly polymorphic because of copy number variation of the repeats among individuals (Jeffreys 1985; Litt and Luty 1989; Weber and May 1989), allele size data from such loci have rapidly become the dominant source of genetic markers for genome mapping, forensic testing, and population studies.
Loci with repeat units longer than 5 bp are generally referred to as minisatellite or variable number tandem repeat loci, and those with repeat units of 2 to 5 bp as microsatellite or short tandem repeat loci (Tautz 1993). Because mutations change the copy number at such loci in a stepwise fashion, the rapid accumulation of population samples from minisatellite and microsatellite loci has revived interest in the stepwise mutation model (SMM), which was popular in the 1970s.
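The MCS estimator needs the expected number of sampled alleles of each size, which the abstract says can be estimated from Monte Carlo samples generated by a coalescent algorithm. The sketch below covers that simulation step under assumptions of our own, not the paper's. The genealogy is a Kingman tree in which each pair of lineages merges at rate 1. Mutations fall on each branch as a Poisson process with rate θ/2. The mutation model is the simplest single-step symmetric SMM, where each mutation adds or removes one repeat with equal probability; the generalized multistep models fitted in the paper would replace only that ±1 step. Sizes are reported as offsets from the ancestral allele, and all function names are hypothetical.

```python
import random
from collections import Counter


def n_mutations(rng: random.Random, length: float, rate: float) -> int:
    """Number of events of a Poisson process with the given rate on a branch."""
    if rate <= 0.0:
        return 0
    count, t = 0, rng.expovariate(rate)
    while t < length:
        count += 1
        t += rng.expovariate(rate)
    return count


def smm_sample(n: int, theta: float, rng: random.Random) -> list[int]:
    """Allele sizes (offsets from the ancestral allele) for n sampled genes under a
    Kingman genealogy with single-step symmetric stepwise mutation."""
    node_time = [0.0] * n          # leaves 0..n-1 are sampled at time 0
    children = {}                  # internal node -> (child, child)
    active, t = list(range(n)), 0.0
    while len(active) > 1:
        k = len(active)
        t += rng.expovariate(k * (k - 1) / 2)
        i, j = rng.sample(range(k), 2)
        node = len(node_time)
        node_time.append(t)
        children[node] = (active[i], active[j])
        active = [x for idx, x in enumerate(active) if idx not in (i, j)] + [node]
    # Walk from the root (the ancestral allele, offset 0) down to the leaves.
    size = {active[0]: 0}
    stack = [active[0]]
    while stack:
        parent = stack.pop()
        for child in children.get(parent, ()):
            steps = n_mutations(rng, node_time[parent] - node_time[child], theta / 2)
            size[child] = size[parent] + sum(rng.choice((-1, 1)) for _ in range(steps))
            stack.append(child)
    return [size[leaf] for leaf in range(n)]


def mean_allele_counts(n: int, theta: float, reps: int, seed: int = 1) -> dict[int, float]:
    """Monte Carlo estimate of the expected number of sampled alleles at each offset."""
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(reps):
        totals.update(smm_sample(n, theta, rng))
    return {offset: count / reps for offset, count in sorted(totals.items())}


if __name__ == "__main__":
    print(mean_allele_counts(n=20, theta=2.0, reps=5000))
```

A minimum chi-square fit would then compare observed allele-size counts with these simulated expectations across candidate parameter values and ancestral types.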

14.
We investigate the expected coalescent in exponentially growing populations. The distribution of expected times to coalescence events may show a linear relationship with the number of ancestral lineages when the latter is subjected to the "epidemic transformation". However, in a number of viral populations, upward curves appear when the epidemically transformed number of ancestral lineages is plotted against time. We consider possible causes of such upward curves. The first is that a curved line arises through a transformation failure due to a sample size that is too large; we suggest a new formula for predicting such failure. The second cause is a population size increasing at an accelerating rate. However, the combination of recent coalescent events and an upward curve is created by an accelerating population increase only under restricted conditions. Specifically, such a pattern is expected only when, had population growth not accelerated, the transformation would have failed anyway. The third cause of nonlinearity arises in the estimated coalescent, as distinct from the real coalescent, if the mutation rate is small. However, coalescence times estimated from data typically give a straight line following epidemic transformation, but the rate of exponential increase, or r value, will be underestimated.

15.
A simple nonparametric test for population structure was applied to temporally spaced samples of HIV-1 sequences from the gag-pol region within two chronically infected individuals. The results show that temporal structure can be detected for samples separated by about 22 months or more. The performance of the method, which was originally proposed to detect geographic structure, was tested for temporally spaced samples using neutral coalescent simulations. Simulations showed that the method is robust to variation in sample sizes and mutation rates and to the presence or absence of recombination, and that its power to detect temporal structure is high. By comparing levels of temporal structure in simulations with the levels observed in real data, we estimate the effective intra-individual population size of HIV-1 to be between 10³ and 10⁴ viruses, in agreement with some previous estimates. Using this estimate and a simple measure of sequence diversity, we estimate an effective neutral mutation rate of about 5 × 10⁻⁶ per site per generation in the gag-pol region. The definition and interpretation of estimates of such "effective" population parameters are discussed.

16.
Structured coalescent processes are derived for the finite island model under a migration mechanism that conserves the subpopulation sizes. The underlying population model is a modified Moran model in which the reproducing individual can have very many offspring with some probability. Convergence to a structured coalescent process results when assuming that migration follows a coalescent timescale which can be much shorter than the usual Wright–Fisher timescale. Three different limit processes are possible depending on the coalescent timescale, two of which allow multiple mergers of ancestral lines. The expected time to most recent common ancestor, and the expected total size of the genealogy, of balanced and unbalanced samples can be very similar, even when migration is low, if the coalescent process allows multiple mergers. The expected total size increases almost linearly with sample size in some cases. The results have implications for inference about genetic population structure.

17.
We propose a model-based approach that uses multiple gene trees to estimate the species tree. The coalescent process requires that gene divergences occur earlier than species divergences when there is any polymorphism in the ancestral species. Under this scenario, speciation times are restricted to be smaller than the corresponding gene split times. The maximum tree (MT) is the tree with the largest possible speciation times in the space of species trees restricted by the available gene trees. If all populations have the same population size, the MT is the maximum likelihood estimate of the species tree. It can be shown that the MT is a consistent estimator of the species tree even when it is built from estimates of the true gene trees, provided the gene tree estimates are statistically consistent. The MT converges in probability to the true species tree at an exponential rate.
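The maximum-tree idea can be sketched directly. For each pair of species, take the smallest coalescence time observed between their gene copies across all loci; by the constraint stated above, the speciation time separating that pair can be no older than this bound. Single-linkage clustering of the resulting matrix gives the largest ultrametric tree that respects every pairwise bound (the subdominant ultrametric). If we read the MT definition correctly, that tree is the maximum tree, and the sketch below builds it that way. The input format (one dict per locus keyed by species pairs), the nested-tuple output and the function names are assumptions for illustration, not the authors' software.

```python
from itertools import combinations


def min_coalescence_times(loci):
    """loci: one dict per locus mapping frozenset({sp1, sp2}) to the earliest
    coalescence time between gene copies of sp1 and sp2 at that locus.
    Returns the minimum of each pair's time over all loci."""
    bounds = {}
    for locus in loci:
        for pair, t in locus.items():
            bounds[pair] = min(t, bounds.get(pair, float("inf")))
    return bounds


def maximum_tree(species, bounds):
    """Single-linkage clustering of species under the pairwise bounds.

    Returns (nested-tuple topology, list of (speciation time, clade)) in merge order.
    """
    clusters = [(name, frozenset([name])) for name in species]
    merges = []
    while len(clusters) > 1:
        best = None
        for (i, (_, members_a)), (j, (_, members_b)) in combinations(enumerate(clusters), 2):
            # A clade split can be no older than the youngest cross-clade bound.
            height = min(bounds[frozenset((x, y))] for x in members_a for y in members_b)
            if best is None or height < best[0]:
                best = (height, i, j)
        height, i, j = best
        (tree_a, members_a), (tree_b, members_b) = clusters[i], clusters[j]
        merged = ((tree_a, tree_b), members_a | members_b)
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
        merges.append((height, merged[1]))
    return clusters[0][0], merges


if __name__ == "__main__":
    def pair(a, b):
        return frozenset((a, b))

    loci = [
        {pair("A", "B"): 1.0, pair("A", "C"): 3.0, pair("B", "C"): 3.0},
        {pair("A", "B"): 0.8, pair("A", "C"): 2.5, pair("B", "C"): 2.0},
    ]
    tree, merges = maximum_tree(["A", "B", "C"], min_coalescence_times(loci))
    print(tree)                    # prints ('C', ('A', 'B'))
    print([h for h, _ in merges])  # prints [0.8, 2.0]
```

In this toy input, the A–B split is pinned at 0.8 by the second locus, and the root at 2.0 by the B–C time at the same locus.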

18.
Seo TK  Thorne JL  Hasegawa M  Kishino H 《Genetics》2002,160(4):1283-1293
Using pseudomaximum-likelihood approaches to phylogenetic inference and coalescent theory, we develop a computationally tractable method of estimating effective population size from serially sampled viral data. We show that the variance of the maximum-likelihood estimator of effective population size depends on the serial sampling design only because internal node times on a coalescent genealogy can be better estimated with some designs than with others. Given the internal node times and the number of sequences sampled, the variance of the maximum-likelihood estimator is independent of the serial sampling design. We then estimate the effective size of the HIV-1 population within nine hosts. If we assume that the mutation rate is 2.5 × 10⁻⁵ substitutions/generation and is the same in all patients, estimated generation lengths vary from 0.73 to 2.43 days/generation and the mean (1.47) is similar to the generation lengths estimated by other researchers. If we assume that generation length is 1.47 days and is the same in all patients, mutation rate estimates vary from 1.52 × 10⁻⁵ to 5.02 × 10⁻⁵. Our results indicate that effective viral population size and evolutionary rate per year are negatively correlated among HIV-1 patients.
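The two ranges reported in this abstract are linked by simple arithmetic if the serially sampled data constrain only the substitution rate per day, r = μ/g (mutation rate per generation μ divided by generation length g in days). That assumption is ours, not stated in the abstract. Under it, fixing μ = 2.5 × 10⁻⁵ gives g = μ/r, and fixing g = 1.47 days gives μ = 1.47 r. The sketch below checks that the extreme generation lengths map onto the extreme mutation rates; the function names are hypothetical.

```python
MU_FIXED = 2.5e-5   # substitutions per generation, as assumed in the abstract
G_FIXED = 1.47      # days per generation, as assumed in the abstract


def per_day_rate(gen_length_days: float) -> float:
    """Substitutions per day implied by a generation length when mu = MU_FIXED."""
    return MU_FIXED / gen_length_days


def mu_at_fixed_generation(gen_length_days: float) -> float:
    """Mutation rate per generation implied by the same per-day rate when the
    generation length is instead fixed at G_FIXED days."""
    return per_day_rate(gen_length_days) * G_FIXED


if __name__ == "__main__":
    for g in (0.73, 2.43):  # extremes of the reported generation lengths
        print(f"g = {g:.2f} days -> mu = {mu_at_fixed_generation(g):.3e}")
```

It prints roughly 5.03 × 10⁻⁵ for g = 0.73 and 1.51 × 10⁻⁵ for g = 2.43, close to the reported 5.02 × 10⁻⁵ and 1.52 × 10⁻⁵. The small gaps presumably reflect rounding of the reported values.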

19.
Wilkins JF 《Genetics》2004,168(4):2227-2244
This article presents an analysis of a model of isolation by distance in a continuous, two-dimensional habitat. An approximate expression is derived for the distribution of coalescence times for a pair of sequences sampled from specific locations in a rectangular habitat. Results are qualitatively similar to previous analyses of isolation by distance, but account explicitly for the location of samples relative to the habitat boundaries. A separation-of-timescales approach takes advantage of the fact that the sampling locations affect only the recent coalescent behavior. When the population size is larger than the number of generations required for a lineage to cross the habitat range, the long-term genealogical process is reasonably well described by Kingman's coalescent with time rescaled by the effective population size. This long-term effective population size is affected by the local dispersal behavior as well as the geometry of the habitat. When the population size is smaller than the time required to cross the habitat, deep branches in the genealogy are longer than would be expected under the standard neutral coalescent, similar to the pattern expected for a panmictic population whose population size was larger in the past.

20.
The multispecies coalescent model provides a natural framework for species tree estimation accounting for gene-tree conflicts. Although a number of species tree methods under the multispecies coalescent have been suggested and evaluated using simulation, their statistical properties remain poorly understood. Here, we use mathematical analysis aided by computer simulation to examine the identifiability, consistency, and efficiency of different species tree methods in the case of three species and three sequences under the molecular clock. We consider four major species-tree methods including concatenation, two-step, independent-sites maximum likelihood, and maximum likelihood. We develop approximations that predict that the probit transform of the species tree estimation error decreases linearly with the square root of the number of loci. Even in this simplest case, major differences exist among the methods. Full-likelihood methods are considerably more efficient than summary methods such as concatenation and two-step. They also provide estimates of important parameters such as species divergence times and ancestral population sizes, whereas these parameters are not identifiable by summary methods. Our results highlight the need to improve the statistical efficiency of summary methods and the computational efficiency of full-likelihood methods of species tree estimation.
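In the simplest case analysed here (three species, one sequence each, molecular clock), the multispecies coalescent gives gene-tree topology probabilities in closed form. If τ is the internal branch of the species tree in coalescent units, a gene tree matches the species tree with probability 1 − (2/3)e^(−τ), and each of the two discordant topologies has probability (1/3)e^(−τ). The sketch below uses these probabilities to compute exactly how often a majority vote over L independent, correctly known gene-tree topologies recovers the species tree. Majority vote is a simple summary rule chosen here for illustration. It is not claimed to be any of the four methods compared in the paper, and treating gene trees as known ignores reconstruction error. Function names are hypothetical.

```python
import math


def topology_probs(tau: float) -> tuple[float, float, float]:
    """Gene-tree topology probabilities for three species, one gene copy each,
    under the multispecies coalescent. tau is the species tree's internal branch
    length in coalescent units. Returns (matching, discordant, discordant)."""
    p_disc = math.exp(-tau) / 3.0
    return 1.0 - 2.0 * p_disc, p_disc, p_disc


def majority_vote_accuracy(tau: float, n_loci: int) -> float:
    """Exact probability that the most frequent topology among n_loci independent,
    correctly reconstructed gene trees is the species tree (ties broken at random)."""
    p0, p1, p2 = topology_probs(tau)
    accuracy = 0.0
    for c0 in range(n_loci + 1):
        for c1 in range(n_loci - c0 + 1):
            c2 = n_loci - c0 - c1
            # Multinomial probability of the count vector (c0, c1, c2).
            prob = (math.comb(n_loci, c0) * math.comb(n_loci - c0, c1)
                    * p0**c0 * p1**c1 * p2**c2)
            top = max(c0, c1, c2)
            if c0 == top:
                # Split credit evenly among tied topologies.
                accuracy += prob / (c0, c1, c2).count(top)
    return accuracy


if __name__ == "__main__":
    for loci in (1, 10, 100):
        print(loci, round(majority_vote_accuracy(0.1, loci), 3))
```

With a short internal branch (τ = 0.1), a single locus recovers the species tree only about 40% of the time, and accuracy climbs as loci are added.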


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417