Similar articles
1.
Maximum Likelihood Estimation of Population Parameters
Y. X. Fu  W. H. Li 《Genetics》1993,134(4):1261-1270
One of the most important parameters in population genetics is θ = 4N_eμ, where N_e is the effective population size and μ is the rate of mutation per gene per generation. We study two related problems, using the maximum likelihood method and the theory of coalescence. One problem is the potential improvement of accuracy in estimating the parameter θ over existing methods, and the other is the estimation of the parameter λ, which is the ratio of two θ's. The minimum variances of estimates of the parameter θ are derived under two idealized situations. These minimum variances serve as the lower bounds of the variances of all possible estimates of θ in practice. We then show that Watterson's estimate of θ based on the number of segregating sites is asymptotically an optimal estimate of θ. However, for a finite sample of sequences, substantial improvement over Watterson's estimate is possible when θ is large. The maximum likelihood estimate of λ = θ_1/θ_2 is obtained and the properties of the estimate are discussed.
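As a point of reference for the estimator named above, here is a minimal sketch of Watterson's estimate of θ in its standard textbook form: the number of segregating sites S divided by a_n = 1 + 1/2 + ... + 1/(n-1). The function and argument names are illustrative and do not come from the paper.

```python
def watterson_theta(num_segregating_sites: int, sample_size: int) -> float:
    """Watterson's estimate of theta: S / a_n, with a_n = sum_{i=1}^{n-1} 1/i."""
    if sample_size < 2:
        raise ValueError("need at least two sequences")
    # a_n is the (n-1)-th harmonic number
    a_n = sum(1.0 / i for i in range(1, sample_size))
    return num_segregating_sites / a_n
```

For example, `watterson_theta(11, 4)` divides 11 by a_4 = 11/6.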

2.
Y. X. Fu 《Genetics》1997,146(4):1489-1499
A coalescent theory for a sample of DNA sequences from a partially selfing diploid population and an algorithm for simulating such samples are developed in this article. Approximate formulas are given for the expectation and the variance of the number of segregating sites in a sample of k sequences from n individuals. Several new estimators of the important parameters θ = 4Nμ and the selfing rate s, where N and μ are, respectively, the effective population size and the mutation rate per sequence per generation, are proposed and their sampling properties are studied.

3.
A Coalescent Estimator of the Population Recombination Rate
J. Hey  J. Wakeley 《Genetics》1997,145(3):833-846
Population genetic models often use a population recombination parameter 4Nc, where N is the effective population size and c is the recombination rate per generation. In many ways 4Nc is comparable to 4Nu, the population mutation rate. Both combine genome level and population level processes, and together they describe the rate of production of genetic variation in a population. However, 4Nc is more difficult to estimate. For a population sample of DNA sequences, historical recombination can only be detected if polymorphisms exist, and even then most recombination events are not detectable. This paper describes an estimator of 4Nc, hereafter designated γ (gamma), that was developed using a coalescent model for a sample of four DNA sequences with recombination. The reliability of γ was assessed using multiple coalescent simulations. In general γ has low to moderate bias, and the reliability of γ is comparable to, though lower than, that of a widely used estimator of 4Nu. If there exists an independent estimate of the recombination rate (per generation, per base pair), γ can be used to estimate the effective population size or the neutral mutation rate.
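The last sentence amounts to solving γ = 4Nc for N. A minimal sketch, assuming that γ refers to the whole sequenced region and that the per-region recombination rate c is the per-base-pair rate times the region length; both assumptions, and all names below, are illustrative and not taken from the paper.

```python
def effective_size_from_gamma(gamma_hat: float,
                              recomb_rate_per_bp: float,
                              num_base_pairs: int) -> float:
    """Solve gamma = 4 * N * c for N, with c = per-bp rate * region length (assumed)."""
    c = recomb_rate_per_bp * num_base_pairs
    if c <= 0:
        raise ValueError("recombination rate must be positive")
    return gamma_hat / (4.0 * c)
```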

4.
We describe a forward-time haploid reproduction model with a constant population size that includes life history characteristics common to many marine organisms. We develop coalescent approximations for sample gene genealogies under this model and use these to predict patterns of genetic variation. Depending on the behavior of the underlying parameters of the model, the approximations are coalescent processes with simultaneous multiple mergers or Kingman’s coalescent. Using simulations, we apply our model to data from the Pacific oyster and show that our model predicts the observed data very well. We also show that a fact which holds for Kingman’s coalescent and also for general coalescent trees (that the most-frequent allele at a biallelic locus is likely to be the ancestral allele) is not true for our model. Our work suggests that the power to detect a “sweepstakes effect” in a sample of DNA sequences from marine organisms depends on the sample size.

5.
Recombination is a fundamental evolutionary force. Therefore the population recombination rate ρ plays an important role in the analysis of population genetic data; however, it is notoriously difficult to estimate. This difficulty applies both to the accuracy of commonly used estimates and to the computational effort required to obtain them. Some particularly popular methods are based on approximations to the likelihood. They require considerably less computational effort than the full-likelihood method, with not much less accuracy. Nevertheless, the computation of these approximate estimates can still be very time consuming, in particular when the sample size is large. Although auxiliary quantities for composite likelihood estimates can be computed in advance and stored in tables, these tables need to be recomputed if either the sample size or the mutation rate θ changes. Here we introduce a new method based on regression combined with boosting as a model selection technique. For large samples, it requires much less computational effort than other approximate methods, while providing similar levels of accuracy. Notably, for a sample of hundreds or thousands of individuals, the estimate of ρ using regression can be obtained on a single personal computer within a couple of minutes, while other methods may need a couple of days or months (or even years). When the sample size is smaller (n ≤ 50), our new method remains computationally efficient but produces biased estimates. We expect the new estimates to be helpful when analyzing large samples and/or many loci with possibly different mutation rates.

6.
The serial coalescent extends traditional coalescent theory to include genealogies in which not all individuals were sampled at the same time. Inference in this framework is powerful because population size and evolutionary rate may be estimated independently. However, when the sequences in question are affected by selection acting at many sites, the genealogies may differ significantly from their neutral expectation, and inference of demographic parameters may become inaccurate. I demonstrate that this inaccuracy is severe when the mutation rate and strength of selection are jointly large, and I develop a new likelihood calculation that, while approximate, improves the accuracy of population size estimates. When used in a Bayesian parameter estimation context, the new calculation allows for estimation of the shape of the pairwise coalescent rate function and can be used to detect the presence of selection acting at many sites in a sequence. Using the new method, I investigate two sets of dengue virus sequences from Puerto Rico and Thailand, and show that both genealogies are likely to have been distorted by selection.

7.
The Coalescent Process with Selfing
M. Nordborg  P. Donnelly 《Genetics》1997,146(3):1185-1195
A method for estimating the selfing rate using DNA sequence data was recently proposed by Milligan. Unfortunately, a number of errors make interpretation of his results problematic. In the present paper we first show how the usual coalescent process can be adapted to models that include selfing, and then use this result to find moment estimators as well as the likelihood surface for the selfing rate, s, and the scaled mutation rate, θ. We conclude that, regardless of the method used, large sample sizes are necessary to estimate s with any degree of certainty, and that the estimate is always highly sensitive to recent changes in the true value.

8.
F. Tajima 《Genetics》1996,143(3):1457-1465
The expectations of the average number of nucleotide differences per site (π), the proportion of segregating sites (s), the minimum number of mutations per site (s*) and some other quantities were derived under the finite site models with and without rate variation among sites, where the finite site models include Jukes and Cantor's model, the equal-input model and Kimura's model. As a model of rate variation, the gamma distribution was used. The results indicate that if the distribution parameter α is small, the effect of rate variation on these quantities is substantial, so that θ is substantially underestimated by estimates based on the infinite site model, where θ = 4Nv, N is the effective population size and v is the mutation rate per site per generation. New methods for estimating θ are also presented, which are based on the finite site models with and without rate variation. Using these methods, underestimation can be corrected.
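To make the two observed quantities concrete, the following sketch computes the raw π and s from a small alignment, with no correction for multiple hits or rate variation (which is what the paper's finite site methods address). The function name and the input format, a list of equal-length strings, are assumptions made for illustration.

```python
from itertools import combinations

def polymorphism_summaries(seqs):
    """Return (pi, s): mean pairwise differences per site and the proportion
    of segregating sites, both uncorrected for multiple hits."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(x) != length for x in seqs):
        raise ValueError("sequences must be aligned and non-empty")
    # number of differing positions for every unordered pair of sequences
    pair_diffs = [sum(a != b for a, b in zip(x, y))
                  for x, y in combinations(seqs, 2)]
    pi = sum(pair_diffs) / len(pair_diffs) / length
    # a site is segregating if its column holds more than one state
    s = sum(len(set(col)) > 1 for col in zip(*seqs)) / length
    return pi, s
```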

9.
Strobeck C 《Genetics》1987,117(1):149-153
Unbiased estimates of θ = 4Nμ in a random mating population can be based on either the number of alleles or the average number of nucleotide differences in a sample. However, if there is population structure and the sample is drawn from a single subpopulation, these two estimates of θ behave differently. The expected number of alleles in a sample is an increasing function of the migration rates, whereas the expected average number of nucleotide differences is shown to be independent of the migration rates and equal to 4N_Tμ for a general model of population structure which includes both the island model and the circular stepping-stone model. This contrast in the behavior of these two estimates of θ is used as the basis of a test for population subdivision. Using a Monte-Carlo simulation developed so that independent samples from a single subpopulation could be obtained quickly, this test is shown to be a useful method to determine if there is population subdivision.

10.
K. Misawa  F. Tajima 《Genetics》1997,147(4):1959-1964
Knowing the amount of DNA polymorphism is essential to understand the mechanism of maintaining DNA polymorphism in a natural population. The amount of DNA polymorphism can be measured by the average number of nucleotide differences per site (π), the proportion of segregating (polymorphic) sites (s) and the minimum number of mutations per site (s*). Since the latter two quantities depend on the sample size, θ is often used as a measure of the amount of DNA polymorphism, where θ = 4Nμ, N is the effective population size and μ is the neutral mutation rate per site per generation. It is known that θ estimated from π, s and s* under the infinite site model can be biased when the mutation rate varies among sites. We have therefore developed new methods for estimating θ under the finite site model. Using computer simulations, it has been shown that the new methods give almost unbiased estimates even when the mutation rate varies among sites substantially. Furthermore, we have also developed new statistics for testing neutrality by modifying Tajima's D statistic. Computer simulations suggest that the new test statistics can be used even when the mutation rate varies among sites.
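For orientation, this is a sketch of the standard, unmodified Tajima's D under the infinite site model, which the new statistics take as their starting point. It is not the modified statistic of this paper, whose form the abstract does not give. Inputs are per-sequence totals (mean pairwise differences k and number of segregating sites S); the names are illustrative.

```python
from math import sqrt

def tajimas_d(mean_pairwise_diffs: float,
              num_segregating_sites: int,
              sample_size: int) -> float:
    """Standard Tajima's D: (k - S/a1) / sqrt(e1*S + e2*S*(S-1))."""
    n, S, k = sample_size, num_segregating_sites, mean_pairwise_diffs
    if n < 4 or S < 1:
        # for n <= 3 the variance term is zero, so D is undefined
        raise ValueError("need n >= 4 sequences and S >= 1")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))
```

D is zero when the two estimates of θ (k and S/a1) agree, positive when k is larger, and negative when it is smaller.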

11.
Y. X. Fu 《Genetics》1994,138(4):1375-1386
Mutations resulting in segregating sites of a sample of DNA sequences can be classified by size and type, and the frequencies of mutations of different sizes and types can be inferred from the sample. A framework for estimating the essential parameter θ = 4Nμ utilizing the frequencies of mutations of various sizes and types is developed in this paper, where N is the effective size of a population and μ is the mutation rate per sequence per generation. The framework is a combination of coalescent theory, the general linear model and Monte-Carlo integration, which leads to two new estimators θ_ξ and θ_η as well as a general Watterson's estimator θ_K and a general Tajima's estimator θ_π. The greatest strength of the framework is that it can be used under a variety of population models. The properties of the framework and the four estimators θ_K, θ_π, θ_ξ and θ_η are investigated under three important population models: the neutral Wright-Fisher model, the neutral model with recombination and the neutral Wright's finite-islands model. Under all these models, it is shown that θ_ξ is the best estimator among the four even when the recombination rate or migration rate has to be estimated. Under the neutral Wright-Fisher model, it is shown that the new estimator θ_ξ has a variance close to a lower bound of the variances of all unbiased estimators of θ, which suggests that θ_ξ is a very efficient estimator.

12.
The genealogical relationship of human, chimpanzee, and gorilla varies along the genome. We develop a hidden Markov model (HMM) that incorporates this variation and relate the model parameters to population genetics quantities such as speciation times and ancestral population sizes. Our HMM is an analytically tractable approximation to the coalescent process with recombination, and in simulations we see no apparent bias in the HMM estimates. We apply the HMM to four autosomal contiguous human–chimp–gorilla–orangutan alignments comprising a total of 1.9 million base pairs. We find a very recent speciation time of human–chimp (4.1 ± 0.4 million years), and fairly large ancestral effective population sizes (65,000 ± 30,000 for the human–chimp ancestor and 45,000 ± 10,000 for the human–chimp–gorilla ancestor). Furthermore, around 50% of the human genome coalesces with chimpanzee after speciation with gorilla. We also consider 250,000 base pairs of X-chromosome alignments and find an effective population size much smaller than 75% of the autosomal effective population sizes. Finally, we find that the rate of transitions between different genealogies correlates well with the region-wide present-day human recombination rate, but does not correlate with the fine-scale recombination rates and recombination hot spots, suggesting that the latter are evolutionarily transient.

13.
One of the central problems in mathematical genetics is the inference of evolutionary parameters of a population (such as the mutation rate) based on the observed genetic types in a finite DNA sample. If the population model under consideration is in the domain of attraction of the classical Fleming-Viot process, such as the Wright-Fisher or the Moran model, then the standard means to describe its genealogy is Kingman's coalescent. For this coalescent process, powerful inference methods are well-established. An important feature of the above class of models is, roughly speaking, that the number of offspring of each individual is small when compared to the total population size, and hence all ancestral collisions are binary only. Recently, more general population models have been studied, in particular in the domain of attraction of so-called generalised Lambda-Fleming-Viot processes, as well as their (dual) genealogies, given by the so-called Lambda-coalescents, which allow multiple collisions. Moreover, Eldon and Wakeley (Genetics 172:2621-2633, 2006) provide evidence that such more general coalescents might actually be more adequate to describe real populations with extreme reproductive behaviour, in particular many marine species. In this paper, we extend methods of Ethier and Griffiths (Ann Probab 15(2):515-545, 1987) and Griffiths and Tavaré (Theor Pop Biol 46:131-159, 1994a, Stat Sci 9:307-319, 1994b, Philos Trans Roy Soc Lond Ser B 344:403-410, 1994c, Math Biosci 12:77-98, 1995) to obtain a likelihood based inference method for general Lambda-coalescents. In particular, we obtain a method to compute (approximate) likelihood surfaces for the observed type probabilities of a given sample. We argue that within the (vast) family of Lambda-coalescents, the parametrisable sub-family of Beta(2 − α, α)-coalescents, with α in (1, 2], is of particular relevance. We illustrate our method using simulated datasets, thus obtaining maximum-likelihood estimators of mutation and demographic parameters.
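To illustrate what "multiple collisions" means for the Beta(2 − α, α) sub-family, the sketch below evaluates the rate at which a given group of k out of b lineages merges, using the standard Lambda-coalescent formula λ(b, k) = B(k − α, b − k + α) / B(2 − α, α), where B is the Beta function. This is background for the model class, not the likelihood method of the paper. The code covers only 1 < α < 2, because the Beta measure degenerates at the Kingman endpoint α = 2. Function names are illustrative.

```python
from math import exp, lgamma

def log_beta(a: float, b: float) -> float:
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_coalescent_rate(b: int, k: int, alpha: float) -> float:
    """Rate at which a fixed set of k out of b lineages merges
    in the Beta(2 - alpha, alpha)-coalescent, for 1 < alpha < 2."""
    if not 1.0 < alpha < 2.0:
        raise ValueError("alpha must lie strictly between 1 and 2")
    if not 2 <= k <= b:
        raise ValueError("need 2 <= k <= b")
    return exp(log_beta(k - alpha, b - k + alpha) - log_beta(2.0 - alpha, alpha))
```

With two lineages the pair merges at rate 1, which fixes the time scale; the rates also satisfy the consistency relation λ(b, k) = λ(b+1, k) + λ(b+1, k+1).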

14.
Wiuf C  Hein J 《Genetics》1999,151(3):1217-1228
In this article we discuss the ancestry of sequences sampled from the coalescent with recombination with constant population size 2N. We have studied a number of variables based on simulations of sample histories, and some analytical results are derived. Consider the leftmost nucleotide in the sequences. We show that the number of nucleotides sharing a most recent common ancestor (MRCA) with the leftmost nucleotide is approximately log(1 + 4NLr)/(4Nr) when two sequences are compared, where L denotes sequence length in nucleotides, and r the recombination rate between any two neighboring nucleotides per generation. For larger samples, the number of nucleotides sharing a MRCA with the leftmost nucleotide decreases and becomes almost independent of 4NLr. Further, we show that a segment of the sequences sharing a MRCA consists on average of 3/(8Nr) nucleotides when two sequences are compared, and that this decreases toward 1/(4Nr) nucleotides when the whole population is sampled. A measure of the correlation between the genealogies of two nucleotides on two sequences is introduced. We show analytically that even when the nucleotides are separated by a large genetic distance but share a MRCA, the genealogies will show only little correlation. This is surprising, because the time until the two nucleotides shared a MRCA is reciprocal to the genetic distance. In simulations, the mean time until all positions in the sample have found a MRCA increases logarithmically with increasing sequence length and is considerably lower than a theoretically predicted upper bound. On the basis of simulations, it turns out that important properties of the coalescent with recombination of the whole population are reflected in the properties of a small sample.
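The three closed-form approximations quoted above are easy to evaluate numerically. A minimal sketch, with N, L and r as defined in the abstract and a natural logarithm assumed; the function names are illustrative.

```python
from math import log

def shared_mrca_nucleotides(N: float, L: float, r: float) -> float:
    """Approximate number of nucleotides sharing an MRCA with the leftmost
    nucleotide when two sequences are compared: log(1 + 4NLr) / (4Nr)."""
    return log(1.0 + 4.0 * N * L * r) / (4.0 * N * r)

def mean_segment_length_pair(N: float, r: float) -> float:
    """Mean length of a segment sharing an MRCA, two sequences: 3 / (8Nr)."""
    return 3.0 / (8.0 * N * r)

def mean_segment_length_population(N: float, r: float) -> float:
    """Limit of the mean segment length when the whole population
    is sampled: 1 / (4Nr)."""
    return 1.0 / (4.0 * N * r)
```

With N = 10,000 and r = 1e-8, for instance, the mean segment lengths come out at 3,750 and 2,500 nucleotides respectively.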

15.
The inference of demographic history from genome data is hindered by a lack of efficient computational approaches. In particular, it has proved difficult to exploit the information contained in the distribution of genealogies across the genome. We have previously shown that the generating function (GF) of genealogies can be used to analytically compute likelihoods of demographic models from configurations of mutations in short sequence blocks (Lohse et al. 2011). Although the GF has a simple, recursive form, the size of such likelihood calculations explodes quickly with the number of individuals and applications of this framework have so far been mainly limited to small samples (pairs and triplets) for which the GF can be written by hand. Here we investigate several strategies for exploiting the inherent symmetries of the coalescent. In particular, we show that the GF of genealogies can be decomposed into a set of equivalence classes that allows likelihood calculations from nontrivial samples. Using this strategy, we automated blockwise likelihood calculations for a general set of demographic scenarios in Mathematica. These histories may involve population size changes, continuous migration, discrete divergence, and admixture between multiple populations. To give a concrete example, we calculate the likelihood for a model of isolation with migration (IM), assuming two diploid samples without phase and outgroup information. We demonstrate the new inference scheme with an analysis of two individual butterfly genomes from the sister species Heliconius melpomene rosina and H. cydno.

16.
H. W. Deng  Y. X. Fu 《Genetics》1996,144(3):1271-1281
Multiple hits at some sites of human mitochondrial DNA sequences suggest that the commonly assumed infinite-sites model can be violated. Under the neutral Wright-Fisher model without recombination and population subdivision, we investigated, by computer simulations, the effect of multiple hits on the estimation of the essential parameter θ = 4N_eμ by Fu's UPBLUE procedure. We found that with moderate mutation rate heterogeneity, UPBLUE performs very well in terms of unbiasedness and efficiency. Under extreme mutation rate heterogeneity, if the sample size is reasonably large (e.g., >60), UPBLUE is still very satisfactory; otherwise we developed a new correction equation. Given knowledge of the degree of mutation rate heterogeneity, the performance of UPBLUE with the new correction equation was tested to be fairly satisfactory: there is almost no bias and the sampling variance is only slightly higher than the theoretical minimum variance. Thus, with an appropriate correction, UPBLUE is relatively robust to multiple hits. In genealogies reconstructed by UPGMA, we found that the total length of branches directly linked to the tips is underestimated, and that of branches far away from the tips tends to be overestimated, while the total length of all branches is not biased.

17.
A Likelihood Approach to Populations Samples of Microsatellite Alleles
R. Nielsen 《Genetics》1997,146(2):711-716
This paper presents a likelihood approach to population samples of microsatellite alleles. A Markov chain recursion method previously published by Griffiths and Tavaré is applied to estimate the likelihood function under different models of microsatellite evolution. The method presented can be applied to estimate a fundamental population genetics parameter θ as well as parameters of the mutational model. The new likelihood estimator provides a better estimator of θ in terms of the mean square error than previous approaches. Furthermore, it is demonstrated how the method may easily be applied to test models of microsatellite evolution. In particular it is shown how to compare a one-step model of microsatellite evolution to a multi-step model by a likelihood ratio test.

18.
19.
Epithelio–mesenchymal interactions during kidney organogenesis are disrupted in integrin α8β1-deficient mice. However, the known ligands for integrin α8β1 (fibronectin, vitronectin, and tenascin-C) are not appropriately localized to mediate all α8β1 functions in the kidney. Using a method of general utility for determining the distribution of unknown integrin ligands in situ and biochemical characterization of these ligands, we identified osteopontin (OPN) as a ligand for α8β1. We have coexpressed the extracellular domains of the mouse α8 and β1 integrin subunits as a soluble heterodimer with one subunit fused to alkaline phosphatase (AP) and have used the α8β1-AP chimera as a histochemical reagent on sections of mouse embryos. Ligand localization with α8β1-AP in developing bone and kidney was observed to overlap with the distribution of OPN. In “far Western” blots of mouse embryonic protein extracts, bands were detected with sizes corresponding to fibronectin, vitronectin, and unknown proteins, one of which was identical in size to OPN. In a solid-phase binding assay we demonstrated that purified OPN binds specifically to α8β1-AP. Cell adhesion assays using K562 cells expressing α8β1 were used to confirm this result. Together with a recent report that anti-OPN antibodies disrupt kidney morphogenesis, our results suggest that interactions between OPN and integrin α8β1 may help regulate kidney development and other morphogenetic processes.

20.
Quantifying epidemiological dynamics is crucial for understanding and forecasting the spread of an epidemic. The coalescent and the birth-death model are used interchangeably to infer epidemiological parameters from the genealogical relationships of the pathogen population under study, which in turn are inferred from the pathogen genetic sequencing data. To compare the performance of these widely applied models, we performed a simulation study. We simulated phylogenetic trees under the constant rate birth-death model and the coalescent model with a deterministic exponentially growing infected population. For each tree, we re-estimated the epidemiological parameters using both a birth-death and a coalescent based method, implemented as an MCMC procedure in BEAST v2.0. In our analyses that estimate the growth rate of an epidemic based on simulated birth-death trees, the point estimates such as the maximum a posteriori/maximum likelihood estimates are not very different. However, the estimates of uncertainty are very different. The birth-death model had a higher coverage than the coalescent model, i.e. contained the true value in the highest posterior density (HPD) interval more often (2–13% vs. 31–75% error). The coverage of the coalescent decreases with decreasing basic reproductive ratio and increasing sampling probability of infected individuals. We hypothesize that the biases in the coalescent are due to the assumption of deterministic rather than stochastic population size changes. Both methods performed reasonably well when analyzing trees simulated under the coalescent. The methods can also identify other key epidemiological parameters as long as one of the parameters is fixed to its true value.
In summary, when using genetic data to estimate epidemic dynamics, our results suggest that the birth-death method will be less sensitive to population fluctuations of early outbreaks than the coalescent method that assumes a deterministic exponentially growing infected population.
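Coverage, as used above, is the fraction of simulated replicates whose HPD interval contains the true parameter value; the quoted "error" is one minus that fraction. A minimal sketch of the bookkeeping, assuming the intervals have already been extracted from the MCMC output as (lower, upper) pairs; the function name and input format are illustrative and are not part of BEAST.

```python
def hpd_coverage(intervals, true_value: float) -> float:
    """Fraction of (lower, upper) intervals that contain true_value."""
    if not intervals:
        raise ValueError("need at least one interval")
    hits = sum(lower <= true_value <= upper for lower, upper in intervals)
    return hits / len(intervals)
```

For a nominal 95% HPD interval, a well-calibrated method should give coverage near 0.95, that is, an error near 5%.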
