Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Maximum Likelihood Estimation of Population Parameters   Cited by: 10 (self-citations: 5; citations by others: 5)
Y. X. Fu, W. H. Li. Genetics 1993, 134(4): 1261-1270
One of the most important parameters in population genetics is θ = 4N_eμ, where N_e is the effective population size and μ is the rate of mutation per gene per generation. We study two related problems, using the maximum likelihood method and the theory of coalescence. One problem is the potential improvement of accuracy in estimating the parameter θ over existing methods and the other is the estimation of the parameter λ, which is the ratio of two θ's. The minimum variances of estimates of the parameter θ are derived under two idealized situations. These minimum variances serve as the lower bounds of the variances of all possible estimates of θ in practice. We then show that Watterson's estimate of θ based on the number of segregating sites is asymptotically an optimal estimate of θ. However, for a finite sample of sequences, substantial improvement over Watterson's estimate is possible when θ is large. The maximum likelihood estimate of λ = θ_1/θ_2 is obtained and the properties of the estimate are discussed.
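
For readers unfamiliar with the baseline the abstract above compares against, here is a minimal sketch of Watterson's estimate of θ from the number of segregating sites. It uses the standard textbook form θ_W = S / a_n with a_n = 1 + 1/2 + ... + 1/(n-1); the function names are illustrative and are not taken from the paper.

```python
def harmonic_coefficient(sample_size):
    """Return a_n = sum_{i=1}^{n-1} 1/i for a sample of n sequences."""
    return sum(1.0 / i for i in range(1, sample_size))


def watterson_theta(num_segregating_sites, sample_size):
    """Watterson's estimate of theta: S divided by a_n.

    Assumes the neutral infinite-sites setting; names are hypothetical.
    """
    if sample_size < 2:
        raise ValueError("need at least two sequences")
    if num_segregating_sites < 0:
        raise ValueError("the number of segregating sites cannot be negative")
    return num_segregating_sites / harmonic_coefficient(sample_size)
```

For example, `watterson_theta(10, 5)` divides 10 segregating sites by a_5 = 25/12. The estimate is per locus if S is counted over the whole locus.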

2.
A Coalescent Estimator of the Population Recombination Rate   Cited by: 42 (self-citations: 10; citations by others: 32)
J. Hey, J. Wakeley. Genetics 1997, 145(3): 833-846
Population genetic models often use a population recombination parameter 4Nc, where N is the effective population size and c is the recombination rate per generation. In many ways 4Nc is comparable to 4Nu, the population mutation rate. Both combine genome level and population level processes, and together they describe the rate of production of genetic variation in a population. However, 4Nc is more difficult to estimate. For a population sample of DNA sequences, historical recombination can only be detected if polymorphisms exist, and even then most recombination events are not detectable. This paper describes an estimator of 4Nc, hereafter designated γ (gamma), that was developed using a coalescent model for a sample of four DNA sequences with recombination. The reliability of γ was assessed using multiple coalescent simulations. In general γ has low to moderate bias, and the reliability of γ is comparable to, though less than, that of a widely used estimator of 4Nu. If there exists an independent estimate of the recombination rate (per generation, per base pair), γ can be used to estimate the effective population size or the neutral mutation rate.

3.
M. K. Kuhner, J. Yamato, J. Felsenstein. Genetics 1995, 140(4): 1421-1430
We present a new way to make a maximum likelihood estimate of the parameter 4N_eμ (effective population size times mutation rate per site, or θ) based on a population sample of molecular sequences. We use a Metropolis-Hastings Markov chain Monte Carlo method to sample genealogies in proportion to the product of their likelihood with respect to the data and their prior probability with respect to a coalescent distribution. A specific value of θ must be chosen to generate the coalescent distribution, but the resulting trees can be used to evaluate the likelihood at other values of θ, generating a likelihood curve. This procedure concentrates sampling on those genealogies that contribute most of the likelihood, allowing estimation of meaningful likelihood curves based on relatively small samples. The method can potentially be extended to cases involving varying population size, recombination, and migration.
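
The reuse of trees sampled at one driving value θ_0 to evaluate the likelihood at other values of θ can be sketched as an importance-sampling average. The sketch below assumes the standard coalescent prior density for a genealogy, (2/θ)^(n-1) · exp(-Σ k(k-1)t_k/θ), with interval lengths t_k measured in mutational units; the data structure (a dict mapping the number of lineages k to t_k) and all names are hypothetical, and the genealogies would in practice come from the MCMC sampler rather than be supplied by hand.

```python
import math


def log_coalescent_prior(intervals, theta):
    """Log prior density of one genealogy under the coalescent.

    intervals: dict mapping k (number of lineages, 2..n) to the length t_k
    of the interval during which k lineages exist (mutational time units).
    """
    return sum(
        math.log(2.0 / theta) - k * (k - 1) * t_k / theta
        for k, t_k in intervals.items()
    )


def relative_likelihood(theta, theta0, sampled_genealogies):
    """Estimate L(theta) / L(theta0) from genealogies sampled at theta0.

    Averages the prior ratio P(G | theta) / P(G | theta0) over the samples.
    """
    ratios = [
        math.exp(log_coalescent_prior(g, theta) - log_coalescent_prior(g, theta0))
        for g in sampled_genealogies
    ]
    return sum(ratios) / len(ratios)
```

Evaluating `relative_likelihood` over a grid of θ values traces out a relative likelihood curve; the estimate is expected to be reliable mainly for θ close to the driving value θ_0.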

4.
A Phylogenetic Estimator of Effective Population Size or Mutation Rate   Cited by: 17 (self-citations: 7; citations by others: 10)
Y. X. Fu. Genetics 1994, 136(2): 685-692
A new estimator of the essential parameter θ = 4N_eμ from DNA polymorphism data is developed under the neutral Wright-Fisher model without recombination and population subdivision, where N_e is the effective population size and μ is the mutation rate per locus per generation. The new estimator has a variance only slightly larger than the minimum variance of all possible unbiased estimators of the parameter and is substantially smaller than that of any existing estimator. The high efficiency of the new estimator is achieved by making full use of phylogenetic information in a sample of DNA sequences from a population. An example of estimating θ by the new method is presented using the mitochondrial sequences from an American Indian population.

5.
Y. X. Fu. Genetics 1994, 138(4): 1375-1386
Mutations resulting in segregating sites of a sample of DNA sequences can be classified by size and type and the frequencies of mutations of different sizes and types can be inferred from the sample. A framework for estimating the essential parameter θ = 4Nμ utilizing the frequencies of mutations of various sizes and types is developed in this paper, where N is the effective size of a population and μ is the mutation rate per sequence per generation. The framework is a combination of coalescent theory, general linear model and Monte-Carlo integration, which leads to two new estimators θ_ξ and θ_η as well as a general Watterson's estimator θ_K and a general Tajima's estimator θ_π. The greatest strength of the framework is that it can be used under a variety of population models. The properties of the framework and the four estimators θ_K, θ_π, θ_ξ and θ_η are investigated under three important population models: the neutral Wright-Fisher model, the neutral model with recombination and the neutral Wright's finite-islands model. Under all these models, it is shown that θ_ξ is the best estimator among the four even when recombination rate or migration rate has to be estimated. Under the neutral Wright-Fisher model, it is shown that the new estimator θ_ξ has a variance close to a lower bound of variances of all unbiased estimators of θ, which suggests that θ_ξ is a very efficient estimator.

6.
In this paper we present a method for estimating population divergence times by maximum likelihood in models without mutation. The maximum-likelihood estimator is compared to a commonly applied estimator based on Wright's F_ST statistic. Simulations suggest that the maximum-likelihood estimator is less biased and has a lower variance than the F_ST-based estimator. The maximum-likelihood estimator provides a statistical framework for the analysis of population history given genetic data. We demonstrate how maximum-likelihood estimates of the branching pattern of divergence of multiple populations may be obtained. We also describe how the method may be applied to test hypotheses such as whether populations have maintained equal population sizes. We illustrate the method by applying it to two previously published sets of human restriction fragment length polymorphism (RFLP) data.

7.
The Coalescent Process with Selfing   Cited by: 9 (self-citations: 4; citations by others: 5)
M. Nordborg, P. Donnelly. Genetics 1997, 146(3): 1185-1195
A method for estimating the selfing rate using DNA sequence data was recently proposed by Milligan. Unfortunately, a number of errors make interpretation of his results problematic. In the present paper we first show how the usual coalescent process can be adapted to models that include selfing, and then use this result to find moment estimators as well as the likelihood surface for the selfing rate, s, and the scaled mutation rate, θ. We conclude that, regardless of the method used, large sample sizes are necessary to estimate s with any degree of certainty, and that the estimate is always highly sensitive to recent changes in the true value.

8.
H. Bovenhuis, J. I. Weller. Genetics 1994, 137(1): 267-280
Maximum likelihood methodology was used to estimate effects of both a marker gene and a linked quantitative trait locus (QTL) on quantitative traits in a segregating population. Two alleles were assumed for the QTL. In addition to the effects of genotypes at both loci on the mean of the quantitative trait, recombination frequency between the loci, frequency of the QTL alleles and the residual standard deviation were also estimated. Thus six parameters were estimated in addition to the marker genotype means. The statistical model was tested on simulated data, and used to estimate direct and linked effects of the milk protein genes, β-lactoglobulin, κ-casein, and β-casein, on milk, fat, and protein production and fat and protein percent in the Dutch dairy cattle population. β-Lactoglobulin had significant direct effects on milk yield and fat percent. κ-Casein had significant direct effects on milk yield, protein percent and fat yield. β-Casein had significant direct effects on milk yield, fat and protein percent and fat and protein yield. Linked QTL with significant effects on fat percent were found for κ-casein and β-casein. Since the β-casein and κ-casein genes are closely linked, it is likely that the same QTL was detected for those two markers. Further, a QTL with a significant effect on fat yield was found to be linked to κ-casein and a QTL with a significant effect on protein yield was linked to β-lactoglobulin.

9.
C. Strobeck. Genetics 1987, 117(1): 149-153
Unbiased estimates of θ = 4Nµ in a random mating population can be based on either the number of alleles or the average number of nucleotide differences in a sample. However, if there is population structure and the sample is drawn from a single subpopulation, these two estimates of θ behave differently. The expected number of alleles in a sample is an increasing function of the migration rates, whereas the expected average number of nucleotide differences is shown to be independent of the migration rates and equal to 4N_Tµ for a general model of population structure which includes both the island model and the circular stepping-stone model. This contrast in the behavior of these two estimates of θ is used as the basis of a test for population subdivision. Using a Monte-Carlo simulation developed so that independent samples from a single subpopulation could be obtained quickly, this test is shown to be a useful method to determine if there is population subdivision.

10.
Analyses of evolution and maintenance of quantitative genetic variation depend on the mutation models assumed. Currently two polygenic mutation models have been used in theoretical analyses. One is the random walk mutation model and the other is the house-of-cards mutation model. Although in the short term the two models give similar results for the evolution of neutral genetic variation within and between populations, the predictions of the changes of the variation are qualitatively different in the long term. In this paper a more general mutation model, called the regression mutation model, is proposed to bridge the gap between the two models. The model regards the regression coefficient, γ, of the effect of an allele after mutation on the effect of the allele before mutation as a parameter. When γ = 1 or 0, the model becomes the random walk model or the house-of-cards model, respectively. The additive genetic variances within and between populations are formulated for this mutation model, and some insights are gained by looking at the changes of the genetic variances as γ changes. The effects of γ on the statistical test of selection for quantitative characters during macroevolution are also discussed. The results suggest that the random walk mutation model should not be interpreted as a null hypothesis of neutrality for testing against alternative hypotheses of selection during macroevolution because it can potentially allocate too much variation for the change of population means under neutrality.
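
The role of γ in the regression mutation model can be illustrated with a small simulation. This is only a sketch: it assumes the allelic effect after mutation is γ times the effect before mutation plus an independent normal deviate (the normal distribution, the starting effect of zero, and all names are assumptions made for illustration, not details from the abstract).

```python
import random


def mutate_effect(effect_before, gamma, mutation_sd, rng):
    """Return gamma * effect_before + epsilon, with epsilon ~ N(0, mutation_sd^2).

    gamma = 1 gives a random walk; gamma = 0 gives house-of-cards behaviour,
    where the new effect does not depend on the old one.
    """
    return gamma * effect_before + rng.gauss(0.0, mutation_sd)


def simulate_lineage(num_mutations, gamma, mutation_sd=1.0, seed=0):
    """Track the allelic effect along one lineage through successive mutations."""
    rng = random.Random(seed)
    effect = 0.0  # assumed starting effect
    history = [effect]
    for _ in range(num_mutations):
        effect = mutate_effect(effect, gamma, mutation_sd, rng)
        history.append(effect)
    return history
```

Comparing `simulate_lineage(1000, 1.0)` with `simulate_lineage(1000, 0.0)` over many seeds shows the long-term contrast the abstract describes: the spread of effects keeps growing under the random walk but stays bounded under house-of-cards.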

11.
Recombination is a fundamental evolutionary force. Therefore the population recombination rate ρ plays an important role in the analysis of population genetic data; however, it is notoriously difficult to estimate. This difficulty applies both to the accuracy of commonly used estimates and to the computational effort required to obtain them. Some particularly popular methods are based on approximations to the likelihood. They require considerably less computational effort than the full-likelihood method, with little loss of accuracy. Nevertheless, the computation of these approximate estimates can still be very time consuming, in particular when the sample size is large. Although auxiliary quantities for composite likelihood estimates can be computed in advance and stored in tables, these tables need to be recomputed if either the sample size or the mutation rate θ changes. Here we introduce a new method based on regression combined with boosting as a model selection technique. For large samples, it requires much less computational effort than other approximate methods, while providing similar levels of accuracy. Notably, for a sample of hundreds or thousands of individuals, the estimate of ρ using regression can be obtained on a single personal computer within a couple of minutes while other methods may need a couple of days or months (or even years). When the sample size is smaller (n ≤ 50), our new method remains computationally efficient but produces biased estimates. We expect the new estimates to be helpful when analyzing large samples and/or many loci with possibly different mutation rates.

12.
Substitution Processes in Molecular Evolution. III. Deleterious Alleles   Cited by: 7 (self-citations: 4; citations by others: 3)
J. H. Gillespie. Genetics 1994, 138(3): 943-952
The substitution processes for various models of deleterious alleles are examined using computer simulations and mathematical analyses. Most of the work focuses on the house-of-cards model, which is a popular model of deleterious allele evolution. The rate of substitution is shown to be a concave function of the strength of selection as measured by α = 2Nσ, where N is the population size and σ is the standard deviation of fitness. For α<1, the house-of-cards model is essentially a neutral model; for α>4, the model ceases to evolve. The stagnation for large α may be understood by appealing to the theory of records. The house-of-cards model evolves to a state where the vast majority of all mutations are deleterious, but precisely one-half of those mutations that fix are deleterious (the other half are advantageous). Thus, the model is not a model of exclusively deleterious evolution as is frequently claimed. It is argued that there are no biologically reasonable models of molecular evolution where the vast majority of all substitutions are deleterious. Other models examined include the exponential and gamma shift models, the Hartl-Dykhuizen-Dean (HDD) model, and the optimum model. Of all those examined, only the optimum and HDD models appear to be reasonable candidates for silent evolution. None of the models are viewed as good candidates for protein evolution, as none are both biologically reasonable and exhibit the variability in substitutions commonly observed in protein sequence data.

13.
Interruptions of microsatellite sequences impact genome evolution and can alter disease manifestation. However, human polymorphism levels at interrupted microsatellites (iMSs) are not known at a genome-wide scale, and the pathways for gaining interruptions are poorly understood. Using the 1000 Genomes Phase-1 variant call set, we interrogated mono-, di-, tri-, and tetranucleotide repeats up to 10 units in length. We detected ∼26,000–40,000 iMSs within each of four human population groups (African, European, East Asian, and American). We identified population-specific iMSs within exonic regions, and discovered that known disease-associated iMSs contain alleles present at differing frequencies among the populations. By analyzing longer microsatellites in primate genomes, we demonstrate that single interruptions result in a genome-wide average two- to six-fold reduction in microsatellite mutability, as compared with perfect microsatellites. Centrally located interruptions lowered mutability dramatically, by two to three orders of magnitude. Using a biochemical approach, we tested directly whether the mutability of a specific iMS is lower because of decreased DNA polymerase strand slippage errors. Modeling the adenomatous polyposis coli tumor suppressor gene sequence, we observed that a single base substitution interruption reduced strand slippage error rates five- to 50-fold, relative to a perfect repeat, during synthesis by DNA polymerases α, β, or η. Computationally, we demonstrate that iMSs arise primarily by base substitution mutations within individual human genomes. Our biochemical survey of human DNA polymerase α, β, δ, κ, and η error rates within certain microsatellites suggests that interruptions are created most frequently by low fidelity polymerases. Our combined computational and biochemical results demonstrate that iMSs are abundant in human genomes and are sources of population-specific genetic variation that may affect genome stability. 
The genome-wide identification of iMSs in human populations presented here has important implications for current models describing the impact of microsatellite polymorphisms on gene expression.

14.
K. Misawa, F. Tajima. Genetics 1997, 147(4): 1959-1964
Knowing the amount of DNA polymorphism is essential to understand the mechanism of maintaining DNA polymorphism in a natural population. The amount of DNA polymorphism can be measured by the average number of nucleotide differences per site (π), the proportion of segregating (polymorphic) sites (s) and the minimum number of mutations per site (s*). Since the latter two quantities depend on the sample size, θ is often used as a measure of the amount of DNA polymorphism, where θ = 4Nμ, N is the effective population size and μ is the neutral mutation rate per site per generation. It is known that θ estimated from π, s and s* under the infinite site model can be biased when the mutation rate varies among sites. We have therefore developed new methods for estimating θ under the finite site model. Using computer simulations, it has been shown that the new methods give almost unbiased estimates even when the mutation rate varies among sites substantially. Furthermore, we have also developed new statistics for testing neutrality by modifying Tajima's D statistic. Computer simulations suggest that the new test statistics can be used even when the mutation rate varies among sites.
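
As background for the quantities named in this abstract, the sketch below computes the average number of pairwise nucleotide differences, the number of segregating sites, and the classic Tajima's D from a small alignment. It implements the standard infinite-site form of D, not the modified finite-site statistics the paper develops, and it works with counts per sequence rather than per site; the function names are illustrative.

```python
import math
from itertools import combinations


def pairwise_diversity(seqs):
    """Average number of nucleotide differences over all pairs of sequences."""
    n = len(seqs)
    total = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in combinations(seqs, 2)
    )
    return total / (n * (n - 1) / 2)


def segregating_sites(seqs):
    """Number of alignment columns showing more than one nucleotide."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def tajimas_d(seqs):
    """Classic Tajima's D for equal-length aligned sequences (n >= 4, S > 0)."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if n < 4 or s == 0:
        raise ValueError("need at least four sequences and one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two estimates of theta, scaled by its standard error.
    return (pairwise_diversity(seqs) - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

Dividing the pairwise diversity and the segregating-site count by the alignment length gives the per-site quantities π and s used in the abstract.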

15.
F. Tajima. Genetics 1996, 143(3): 1457-1465
The expectations of the average number of nucleotide differences per site (π), the proportion of segregating sites (s), the minimum number of mutations per site (s*) and some other quantities were derived under the finite site models with and without rate variation among sites, where the finite site models include Jukes and Cantor's model, the equal-input model and Kimura's model. As a model of rate variation, the gamma distribution was used. The results indicate that if the distribution parameter α is small, the effect of rate variation on these quantities is substantial, so that the estimates of θ based on the infinite site model are substantially underestimated, where θ = 4Nv, N is the effective population size and v is the mutation rate per site per generation. New methods for estimating θ are also presented, which are based on the finite site models with and without rate variation. Using these methods, underestimation can be corrected.

16.
K. Zeng, Y. X. Fu, S. Shi, C. I. Wu. Genetics 2006, 174(3): 1431-1439
By comparing the low-, intermediate-, and high-frequency parts of the frequency spectrum, we gain information on the evolutionary forces that influence the pattern of polymorphism in population samples. We emphasize the high-frequency variants on which positive selection and negative (background) selection exhibit different effects. We propose a new estimator of θ (the product of effective population size and neutral mutation rate), θ_L, which is sensitive to the changes in high-frequency variants. The new θ_L allows us to revise Fay and Wu's H-test by normalization. To complement the existing statistics (the H-test and Tajima's D-test), we propose a new test, E, which relies on the difference between θ_L and Watterson's θ_W. We show that this test is most powerful in detecting the recovery phase after the loss of genetic diversity, which includes the postselective sweep phase. The sensitivities of these tests to (or robustness against) background selection and demographic changes are also considered. Overall, D and H in combination can be most effective in detecting positive selection while being insensitive to other perturbations. We thus propose a joint test, referred to as the DH test. Simulations indicate that DH is indeed sensitive primarily to directional selection and no other driving forces.

17.
The evolution of population dynamics in a stochastic environment is analysed under a general form of density-dependence with genetic variation in r and K, the intrinsic rate of increase and carrying capacity in the average environment, and in σ_e^2, the environmental variance of population growth rate. The continuous-time model assumes a large population size and a stationary distribution of environments with no autocorrelation. For a given population density, N, and genotype frequency, p, the expected selection gradient is always towards an increased population growth rate, and the expected fitness of a genotype is its Malthusian fitness in the average environment minus the covariance of its growth rate with that of the population. Long-term evolution maximizes the expected value of the density-dependence function, averaged over the stationary distribution of N. In the θ-logistic model, where density dependence of population growth is a function of N^θ, long-term evolution maximizes E[N^θ] = [1 − σ_e^2/(2r)]K^θ. While σ_e^2 is always selected to decrease, r and K are always selected to increase, implying a genetic trade-off among them. By contrast, given the other parameters, θ has an intermediate optimum between 1.781 and 2 corresponding to the limits of high or low stochasticity.
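
The quantity that long-term evolution is said to maximize in the θ-logistic model, E[N^θ] = [1 − σ_e^2/(2r)]K^θ, can be evaluated directly. The short sketch below only restates that formula; the parameter names and the validity check σ_e^2 < 2r (so that the expectation stays positive) are assumptions added here for illustration.

```python
def expected_n_theta(r, carrying_capacity, theta, env_variance):
    """Evaluate E[N^theta] = (1 - env_variance / (2 r)) * K^theta.

    r: intrinsic rate of increase; carrying_capacity: K;
    env_variance: environmental variance of growth rate (sigma_e^2).
    """
    if r <= 0:
        raise ValueError("r must be positive")
    if not 0 <= env_variance < 2 * r:
        raise ValueError("assumed range: 0 <= env_variance < 2 r")
    return (1.0 - env_variance / (2.0 * r)) * carrying_capacity ** theta
```

Holding the other parameters fixed, the value rises as `env_variance` falls or as `r` and `carrying_capacity` rise, which matches the directions of selection stated in the abstract.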

18.
S. Kumar. Genetics 1996, 143(1): 537-548
Maximum likelihood methods were used to study the differences in substitution rates among the four nucleotides and among different nucleotide sites in mitochondrial protein-coding genes of vertebrates. In the 1st+2nd codon position data, the frequency of nucleotide G is negatively correlated with evolutionary rates of genes, substitution rates vary substantially among sites, and the transition/transversion rate bias (R) is two to five times larger than that expected at random. Generally, largest transition biases and greatest differences in substitution rates among sites are found in the highly conserved genes. The 3rd positions in placental mammal genes exhibit strong nucleotide composition biases and the transitional rates exceed transversional rates by one to two orders of magnitude. Tamura-Nei and Hasegawa-Kishino-Yano models with gamma distributed variable rates among sites (gamma parameter, α) adequately describe the nucleotide substitution process in 1st+2nd position data. In these data, ignoring differences in substitution rates among sites leads to largest biases while estimating substitution rates. Kimura's two-parameter model with variable-rates among sites performs satisfactorily in likelihood estimation of R, α, and overall amount of evolution for 1st+2nd position data. It can also be used to estimate pairwise distances with appropriate values of α for a majority of genes.

19.
Metagenomic sequencing projects from environments dominated by a small number of species produce genome-wide population samples. We present a two-site composite likelihood estimator of the scaled recombination rate, ρ = 2N_e c, that operates on metagenomic assemblies in which each sequenced fragment derives from a different individual. This new estimator properly accounts for sequencing error, as quantified by per-base quality scores, and missing data, as inferred from the placement of reads in a metagenomic assembly. We apply our estimator to data from a sludge metagenome project to demonstrate how this method will elucidate the rates of exchange of genetic material in natural microbial populations. Surprisingly, for a fixed amount of sequencing, this estimator has lower variance than similar methods that operate on more traditional population genetic samples of comparable size. In addition, we can infer variation in recombination rate across the genome because metagenomic projects sample genetic diversity genome-wide, not just at particular loci. The method itself makes no assumption specific to microbial populations, opening the door for application to any mixed population sample where the number of individuals sampled is much greater than the number of fragments sequenced.

20.
J. Reynolds, B. S. Weir, C. C. Cockerham. Genetics 1983, 105(3): 767-779
A distance measure for populations diverging by drift only is based on the coancestry coefficient θ, and three estimators of the distance D = -ln(1 - θ) are constructed for multiallelic, multilocus data. Simulations of a monoecious population mating at random showed that a weighted ratio of single-locus estimators performed better than an unweighted average or a least squares estimator. Jackknifing over loci provided satisfactory variance estimates of distance values. In the drift situation, in which mutation is excluded, the weighted estimator of D appears to be a better measure of distance than others that have appeared in the literature.
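
The distance D = -ln(1 - θ), the contrast between weighted and unweighted combinations of single-locus estimates, and the jackknife over loci can be sketched as follows. The sketch assumes each locus supplies a numerator and a denominator for its own estimate of θ and that the weighted estimator is the ratio of summed numerators to summed denominators; how those per-locus components are computed from allele frequencies is not shown, and all names are hypothetical.

```python
import math


def weighted_theta(numerators, denominators):
    """Ratio of sums over loci (the assumed form of the weighted estimator)."""
    return sum(numerators) / sum(denominators)


def unweighted_theta(numerators, denominators):
    """Plain average of the single-locus ratios."""
    ratios = [a / d for a, d in zip(numerators, denominators)]
    return sum(ratios) / len(ratios)


def drift_distance(theta):
    """Distance D = -ln(1 - theta) for populations diverging by drift only."""
    return -math.log(1.0 - theta)


def jackknife_variance(numerators, denominators):
    """Delete-one-locus jackknife variance of the weighted distance estimate."""
    num_loci = len(numerators)
    leave_one_out = [
        drift_distance(
            weighted_theta(
                numerators[:i] + numerators[i + 1:],
                denominators[:i] + denominators[i + 1:],
            )
        )
        for i in range(num_loci)
    ]
    mean = sum(leave_one_out) / num_loci
    return (num_loci - 1) / num_loci * sum((d - mean) ** 2 for d in leave_one_out)
```

With lists of per-locus components, `drift_distance(weighted_theta(nums, dens))` gives the point estimate and `jackknife_variance(nums, dens)` an estimate of its sampling variance; at least two loci are needed for the jackknife.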
