
1.

Accuracy of coalescent likelihood estimates: do we need more sites, more sequences, or more loci?

Felsenstein J 《Molecular biology and evolution》2006,23(3):691-700

A computer simulation study has been made of the accuracy of estimates of Theta = 4N_eμ from a sample from a single isolated population of finite size. The accuracies turn out to be well predicted by a formula developed by Fu and Li, who used optimistic assumptions. Their formulas are restated in terms of accuracy, defined here as the reciprocal of the squared coefficient of variation. This should be proportional to sample size when the entities sampled provide independent information. Using these formulas for accuracy, the sampling strategy for estimation of Theta can be investigated. Two models for cost have been used, a cost-per-base model and a cost-per-read model. The former would lead us to prefer to have a very large number of loci, each one base long. The latter, which is more realistic, causes us to prefer to have one read per locus and an optimum sample size which declines as costs of sampling organisms increase. For realistic values, the optimum sample size is 8 or fewer individuals. This is quite close to the results obtained by Pluzhnikov and Donnelly for a cost-per-base model, evaluating other estimators of Theta. It can be understood by considering that the resources spent collecting larger samples prevent us from considering more loci. An examination of the efficiency of Watterson's estimator of Theta was also made, and it was found to be reasonably efficient when the number of mutants per generation in the sequence in the whole population is less than 2.5.
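
The two quantities named in this abstract can be illustrated with a minimal Python sketch. It assumes the standard form of Watterson's estimator (segregating sites divided by the harmonic sum a_n) and takes "accuracy" literally as mean squared over variance of a set of replicate estimates. The function names are hypothetical and this is not the simulation code used in the paper.

```python
import statistics


def harmonic_sum(n_sequences: int) -> float:
    """Return a_n = sum_{i=1}^{n-1} 1/i, Watterson's normalizing constant."""
    return sum(1.0 / i for i in range(1, n_sequences))


def watterson_theta(segregating_sites: int, n_sequences: int) -> float:
    """Watterson's estimate of Theta for one locus: S / a_n."""
    if n_sequences < 2:
        raise ValueError("need at least two sequences")
    return segregating_sites / harmonic_sum(n_sequences)


def accuracy(estimates: list[float]) -> float:
    """Reciprocal of the squared coefficient of variation: mean^2 / variance."""
    mean = statistics.fmean(estimates)
    variance = statistics.variance(estimates)  # sample variance (n - 1 denominator)
    return mean * mean / variance


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(watterson_theta(segregating_sites=11, n_sequences=4))
    print(accuracy([9.0, 10.0, 11.0]))
```

If replicate estimates were statistically independent, doubling their number would be expected to double the accuracy of their mean, which is the proportionality the abstract refers to.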

2.

Mes TH 《Molecular ecology》2003,12(6):1555-1566

Mitochondrial ND4 sequences of populations of four species of parasitic nematodes of livestock were subjected to demographic analyses. Deviation from selective neutrality was detectable using the frequency spectrum of segregating sites and highly negative neutrality statistics. However, the mitochondrial data sets do not comply with the infinite-sites model that underlies these tests, and as a consequence, it was not established whether these features are solely a result of population expansion, or whether aspects of the molecular evolution of these mitochondrial regions are also involved. Coalescent analyses based on Fu's Fs neutrality test, which incorporated estimates of rate heterogeneity, the transition-transversion ratio and nucleotide bias, as well as analyses that are fairly robust to deviations from the infinite-sites model supported population expansion. Also analyses that do not depend on the infinite-sites model suggested historical population expansion of these nematodes. The very similar time since expansion, the absence of signatures of positive selection in ND4 and the logical association with human demography imply that selective sweeps of mitochondrial variants are less probable, and that expansion is the most likely scenario for the parasitic nematodes of livestock. The methods used to characterize the expansion have different assumptions and emphasize different aspects of expansions. The resulting restrictions on the interpretation of expansions are outlined.

3.

Robustness of coalescent estimators to between-lineage mutation rate variation

Kuhner MK 《Molecular biology and evolution》2006,23(12):2355-2360

Data from HIV and from human neoplastic cells can show substantial between-lineage mutation rate variation even within a single population. Such variation may affect estimators of population quantities such as Theta = 4N_eμ. Using simulated DNA data, I measured the effect of rate variation on recovery of Theta by the summary-statistic estimator of Watterson (Watterson GA. 1975. On the number of segregating sites in genetical systems without recombination. Theor Popul Biol. 7:256-276) and the coalescent maximum likelihood algorithm LAMARC (Kuhner MK. 2006. LAMARC 2.0: maximum likelihood and Bayesian estimation of population parameters. Bioinformatics. Advance Access doi: 10.1093/bioinformatics/btk051). Watterson's estimator showed a downward bias, as expected, with high values of Theta. LAMARC's mean estimate was accurate for all tested values of Theta and rate variation except for a downward bias when rate variation was maximal (i.e., the slow rate was zero). LAMARC had consistently narrower confidence intervals (CIs) than Watterson's estimator. Both methods tended to reject the truth too often when rate variation was 8x or greater and independent among branches, as well as when variation was 4x or greater and correlated among branches. In the case of Watterson's estimate, this excess rejection was fully attributable to variation among genealogies in the amount of total branch length associated with the fast and slow rates. However, in the case of LAMARC, some excess rejection was still observed even when between-genealogy variation was taken into account. Both estimators are robust to modest rate variation; however, their use should be coupled with a statistical test to rule out extreme rate variation as the resulting CIs may not be reliable.

4.

John Novembre Montgomery Slatkin 《Evolution; international journal of organic evolution》2009,63(11):2914-2925

Estimating dispersal distances from population genetic data provides an important alternative to logistically taxing methods for directly observing dispersal. Although methods for estimating dispersal rates between a modest number of discrete demes are well developed, methods of inference applicable to "isolation-by-distance" models are much less established. Here, we present a method for estimating ρσ², the product of population density (ρ) and the variance of the dispersal displacement distribution (σ²). The method is based on the assumption that low-frequency alleles are identical by descent. Hence, the extent of geographic clustering of such alleles, relative to their frequency in the population, provides information about ρσ². We show that a novel likelihood-based method can infer this composite parameter with a modest bias in a lattice model of isolation-by-distance. For calculating the likelihood, we use an importance sampling approach to average over the unobserved intraallelic genealogies, where the intraallelic genealogies are modeled as a pure birth process. The approach also leads to a likelihood-ratio test of isotropy of dispersal, that is, whether dispersal distances on two axes are different. We test the performance of our methods using simulations of new mutations in a lattice model and illustrate its use with a dataset from Arabidopsis thaliana.

5.

Abdo Z Crandall KA Joyce P 《Molecular ecology》2004,13(4):837-851

A plethora of statistical models have recently been developed to estimate components of population genetic history. Very few of these methods, however, have been adequately evaluated for their performance in accurately estimating population genetic parameters of interest. In this paper, we continue a research program of evaluation of population genetic methods through computer simulation. Specifically, we examine the software MIGRATE-N 1.6.8 and test the accuracy of this software to estimate genetic diversity (Theta), migration rates, and confidence intervals. We simulated nucleotide sequence data under a neutral coalescent model with lengths of 500 bp and 1000 bp, and with three different per site Theta values of (0.00025, 0.0025, 0.025) crossed with four different migration rates (0.0000025, 0.025, 0.25, 2.5) to construct 1000 evolutionary trees per-combination per-sequence-length. We found that while MIGRATE-N 1.6.8 performs reasonably well in estimating genetic diversity (Theta), it does poorly at estimating migration rates and the confidence intervals associated with them. We recommend researchers use this software with caution under conditions similar to those used in this evaluation.

6.

Beyond accept-reject sampling

Perron F 《Biometrika》1999,86(4):803-813


7.

Recombination estimation under complex evolutionary models with the coalescent composite-likelihood method

Carvajal-Rodríguez A Crandall KA Posada D 《Molecular biology and evolution》2006,23(4):817-827

The composite-likelihood estimator (CLE) of the population recombination rate considers only sites with exactly two alleles under a finite-sites mutation model (McVean, G. A. T., P. Awadalla, and P. Fearnhead. 2002. A coalescent-based method for detecting and estimating recombination from gene sequences. Genetics 160:1231-1241). While in such a model the identity of alleles is not considered, the CLE has been shown to be robust to minor misspecification of the underlying mutational model. However, there are many situations where the putative mutation and demographic history can be quite complex. One good example is rapidly evolving pathogens, like HIV-1. First we evaluated the performance of the CLE and the likelihood permutation test (LPT) under more complex, realistic models, including a general time reversible (GTR) substitution model, rate heterogeneity among sites (Gamma), positive selection, population growth, population structure, and noncontemporaneous sampling. Second, we relaxed some of the assumptions of the CLE allowing for a four-allele, GTR + Gamma model in an attempt to use the data more efficiently. Through simulations and the analysis of real data, we concluded that the CLE is robust to severe misspecifications of the substitution model, but underestimates the recombination rate in the presence of exponential growth, population mixture, selection, or noncontemporaneous sampling. In such cases, the use of more complex models slightly increases performance on some occasions, especially in the case of the LPT. Thus, our results provide for a more robust application of the estimation of recombination rates.

8.

Efficient Strategies for Calculating Blockwise Likelihoods Under the Coalescent

Konrad Lohse Martin Chmelik Simon H. Martin Nicholas H. Barton 《Genetics》2016,202(2):775-786

The inference of demographic history from genome data is hindered by a lack of efficient computational approaches. In particular, it has proved difficult to exploit the information contained in the distribution of genealogies across the genome. We have previously shown that the generating function (GF) of genealogies can be used to analytically compute likelihoods of demographic models from configurations of mutations in short sequence blocks (). Although the GF has a simple, recursive form, the size of such likelihood calculations explodes quickly with the number of individuals and applications of this framework have so far been mainly limited to small samples (pairs and triplets) for which the GF can be written by hand. Here we investigate several strategies for exploiting the inherent symmetries of the coalescent. In particular, we show that the GF of genealogies can be decomposed into a set of equivalence classes that allows likelihood calculations from nontrivial samples. Using this strategy, we automated blockwise likelihood calculations for a general set of demographic scenarios in Mathematica. These histories may involve population size changes, continuous migration, discrete divergence, and admixture between multiple populations. To give a concrete example, we calculate the likelihood for a model of isolation with migration (IM), assuming two diploid samples without phase and outgroup information. We demonstrate the new inference scheme with an analysis of two individual butterfly genomes from the sister species Heliconius melpomene rosina and H. cydno.

9.

Modelling competition and hybridization between native cutthroat trout and nonnative rainbow and hybrid trout

《Journal of biological dynamics》2013,7(2):158-175

Native salmonid fish have been displaced worldwide by nonnatives through hybridization, competition, and predation, but the dynamics of these factors are poorly understood. We apply stochastic Lotka–Volterra models to the displacement of cutthroat trout by rainbow/hybrid trout in the Snake River, Idaho, USA. Cutthroat trout are susceptible to hybridization in the river but are reproductively isolated in tributaries via removal of migratory rainbow/hybrid spawners at weirs. Based on information-theoretic analysis, population data provide evidence that hybridization was the primary mechanism for cutthroat trout displacement in the first 17 years of the invasion. However, under some parameter values, the data provide evidence for a model in which interaction occurs among fish from both river and tributary subpopulations. This situation is likely to occur when tributary-spawned cutthroat trout out-migrate to the river as fry. The resulting competition with rainbow/hybrid trout can result in the extinction of cutthroat trout even when reproductive segregation is maintained.

10.

Roberto Benedetti Thomas Suesse Federica Piersimoni 《Biometrical journal. Biometrische Zeitschrift》2020,62(6):1494-1507

Maximum likelihood estimation of the model parameters for a spatial population based on data collected from a survey sample is usually straightforward when sampling and non-response are both non-informative, since the model can then usually be fitted using the available sample data, and no allowance is necessary for the fact that only a part of the population has been observed. Although for many regression models this naive strategy yields consistent estimates, this is not the case for some models, such as spatial auto-regressive models. In this paper, we show that for a broad class of such models, a maximum marginal likelihood approach that uses both sample and population data leads to more efficient estimates since it uses spatial information from sampled as well as non-sampled units. Extensive simulation experiments based on two well-known data sets are used to assess the impact of the spatial sampling design, the auto-correlation parameter and the sample size on the performance of this approach. When compared to some widely used methods that use only sample data, the results from these experiments show that the maximum marginal likelihood approach is much more precise.

11.

Total citations: 10; self-citations: 0; citations by others: 10

Degnan JH Salter LA 《Evolution; international journal of organic evolution》2005,59(1):24-37

Under the coalescent model for population divergence, lineage sorting can cause considerable variability in gene trees generated from any given species tree. In this paper, we derive a method for computing the distribution of gene tree topologies given a bifurcating species tree for trees with an arbitrary number of taxa in the case that there is one gene sampled per species. Applications for gene tree distributions include determining exact probabilities of topological equivalence between gene trees and species trees and inferring species trees from multiple datasets. In addition, we examine the shapes of gene tree distributions and their sensitivity to changes in branch lengths, species tree shape, and tree size. The method for computing gene tree distributions is implemented in the computer program COAL.
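
The simplest instance of a gene tree distribution is the three-taxon case, sketched below in Python. It relies on the classical result that, for a species tree ((A,B),C) with internal branch length T in coalescent units and one gene sampled per species, the matching gene tree has probability 1 - (2/3)e^(-T) and each of the two discordant topologies has probability (1/3)e^(-T). This is only the smallest special case; it is not the general algorithm of the paper or the COAL program, and the function name is hypothetical.

```python
import math


def three_taxon_gene_tree_probs(internal_branch: float) -> dict[str, float]:
    """Gene tree topology probabilities for the species tree ((A,B),C).

    internal_branch is the length of the internal branch in coalescent units.
    """
    if internal_branch < 0:
        raise ValueError("branch length must be non-negative")
    # Probability that the A and B lineages fail to coalesce on the internal branch.
    p_no_coalescence = math.exp(-internal_branch)
    return {
        "((A,B),C)": 1.0 - (2.0 / 3.0) * p_no_coalescence,
        "((A,C),B)": p_no_coalescence / 3.0,
        "((B,C),A)": p_no_coalescence / 3.0,
    }


if __name__ == "__main__":
    for t in (0.0, 0.5, 2.0):
        print(t, three_taxon_gene_tree_probs(t))
```

With a zero-length internal branch the three topologies are equally likely, and the concordant topology dominates as the branch grows, which is the sensitivity to branch length that the abstract mentions.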

12.

A note on semiparametric efficient inference for two-stage outcome-dependent sampling with a continuous outcome

Song Rui; Zhou Haibo; Kosorok Michael R. 《Biometrika》2009,96(1):221-228

Outcome-dependent sampling designs have been shown to be a cost-effective way to enhance study efficiency. We show that the outcome-dependent sampling design with a continuous outcome can be viewed as an extension of the two-stage case-control designs to the continuous-outcome case. We further show that the two-stage outcome-dependent sampling has a natural link with the missing-data and biased-sampling frameworks. Through the use of semiparametric inference and missing-data techniques, we show that a certain semiparametric maximum-likelihood estimator is computationally convenient and achieves the semiparametric efficient information bound. We demonstrate this both theoretically and through simulation.

13.

Patterns of neutral diversity under general models of selective sweeps

Coop G Ralph P 《Genetics》2012,192(1):205-224

Two major sources of stochasticity in the dynamics of neutral alleles result from resampling of finite populations (genetic drift) and the random genetic background of nearby selected alleles on which the neutral alleles are found (linked selection). There is now good evidence that linked selection plays an important role in shaping polymorphism levels in a number of species. One of the best-investigated models of linked selection is the recurrent full-sweep model, in which newly arisen selected alleles fix rapidly. However, the bulk of selected alleles that sweep into the population may not be destined for rapid fixation. Here we develop a general model of recurrent selective sweeps in a coalescent framework, one that generalizes the recurrent full-sweep model to the case where selected alleles do not sweep to fixation. We show that in a large population, only the initial rapid increase of a selected allele affects the genealogy at partially linked sites, which under fairly general assumptions are unaffected by the subsequent fate of the selected allele. We also apply the theory to a simple model to investigate the impact of recurrent partial sweeps on levels of neutral diversity and find that for a given reduction in diversity, the impact of recurrent partial sweeps on the frequency spectrum at neutral sites is determined primarily by the frequencies rapidly achieved by the selected alleles. Consequently, recurrent sweeps of selected alleles to low frequencies can have a profound effect on levels of diversity but can leave the frequency spectrum relatively unperturbed. In fact, the limiting coalescent model under a high rate of sweeps to low frequency is identical to the standard neutral model. The general model of selective sweeps we describe goes some way toward providing a more flexible framework to describe genomic patterns of diversity than is currently available.

14.

Sparse Sampling and Maximum Likelihood Estimation for Boolean Models

G. Ayala J. R. Ferrandiz F. Montes 《Biometrical journal. Biometrische Zeitschrift》1991,33(2):237-245

A condition for practical independence of contact distribution functions in Boolean models is obtained. This result allows the authors to use maximum likelihood methods, via sparse sampling, for estimating unknown parameters of an isotropic Boolean model. The second part of this paper is devoted to a simulation study of the proposed method. AMS classification: 60D05

15.

Can the Site-Frequency Spectrum Distinguish Exponential Population Growth from Multiple-Merger Coalescents?

Bjarki Eldon Matthias Birkner Jochen Blath Fabian Freund 《Genetics》2015,199(3):841-856

The ability of the site-frequency spectrum (SFS) to reflect the particularities of gene genealogies exhibiting multiple mergers of ancestral lines as opposed to those obtained in the presence of population growth is our focus. An excess of singletons is a well-known characteristic of both population growth and multiple mergers. Other aspects of the SFS, in particular, the weight of the right tail, are, however, affected in specific ways by the two model classes. Using an approximate likelihood method and minimum-distance statistics, our estimates of statistical power indicate that exponential and algebraic growth can indeed be distinguished from multiple-merger coalescents, even for moderate sample sizes, if the number of segregating sites is high enough. A normalized version of the SFS (nSFS) is also used as a summary statistic in an approximate Bayesian computation (ABC) approach. The results give further positive evidence as to the general eligibility of the SFS to distinguish between the different histories.
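
For readers unfamiliar with the summary statistic, the following Python sketch shows one way an unfolded SFS and a normalized SFS could be computed from a matrix of 0/1 alleles. It assumes that 1 marks the derived allele, that normalization means dividing by the total number of segregating sites, and that the constant-size neutral expectation for class i is proportional to 1/i. The function names and the toy data are hypothetical, and this is not the authors' code.

```python
def site_frequency_spectrum(haplotypes: list[list[int]]) -> list[int]:
    """Unfolded SFS: entry i-1 counts sites where exactly i sequences carry the derived allele."""
    n = len(haplotypes)
    n_sites = len(haplotypes[0])
    sfs = [0] * (n - 1)
    for site in range(n_sites):
        derived_count = sum(row[site] for row in haplotypes)
        if 0 < derived_count < n:  # skip sites that are not segregating in the sample
            sfs[derived_count - 1] += 1
    return sfs


def normalize(sfs: list[int]) -> list[float]:
    """Divide each class by the total number of segregating sites."""
    total = sum(sfs)
    if total == 0:
        raise ValueError("no segregating sites")
    return [count / total for count in sfs]


def expected_neutral_nsfs(n: int) -> list[float]:
    """Normalized expectation under the standard neutral model: class i proportional to 1/i."""
    weights = [1.0 / i for i in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Toy data: 4 sequences (rows) by 4 sites (columns).
    sample = [
        [1, 0, 1, 0],
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 1, 1, 0],
    ]
    observed = site_frequency_spectrum(sample)
    print(observed, normalize(observed), expected_neutral_nsfs(len(sample)))
```

Comparing the observed normalized spectrum with a model expectation, class by class, is the kind of contrast in singleton excess and right-tail weight that the abstract describes.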

16.

John M. Neuhaus Alastair J. Scott Christopher J. Wild Yannan Jiang Charles E. McCulloch Ross Boylan 《Biometrics》2014,70(1):44-52


17.

Logistic disease incidence models and case-control studies (total citations: 8; self-citations: 0; citations by others: 8)

PRENTICE R. L.; PYKE R. 《Biometrika》1979,66(3):403-411


18.

Exact coalescent for the Wright-Fisher model

Fu YX 《Theoretical population biology》2006,69(4):385-394

The Kingman coalescent, which has become the foundation for a wide range of theoretical as well as empirical studies, was derived as an approximation of the Wright-Fisher (WF) model. The approximation heavily relies on the assumption that population size is large and sample size is much smaller than the population size. Whether the sample size is too large compared to the population size is rarely questioned in practice when applying statistical methods based on the Kingman coalescent. Since the WF model is the most widely used population genetics model for reproduction, it is desirable to develop a coalescent framework for the WF model, which can be used whenever there are concerns about the accuracy of the Kingman coalescent as an approximation. This paper describes the exact coalescent theory for the WF model and develops a simulation algorithm, which is then used, together with an analytical approach, to study the properties of the exact coalescent as well as its differences from the Kingman coalescent. We show that the Kingman coalescent differs from the exact coalescent by: (1) shorter waiting time between successive coalescent events; (2) different probability of observing a topological relationship among sequences in a sample; and (3) slightly smaller tree length in the genealogy of a large sample. On the other hand, there is little difference in the age of the most recent common ancestor (MRCA) of the sample. The exact coalescent makes up the longer waiting time between successive coalescent events by having multiple coalescence at the same time. The most significant difference among various summary statistics of a coalescent examined is the sum of lengths of external branches, which can be more than 10% larger for exact coalescent than that for the Kingman coalescent. As a whole, the Kingman coalescent is a remarkably accurate approximation to the exact coalescent for sample and population sizes falling considerably outside the region that was originally anticipated.
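
To make the sample-size assumption concrete, here is a minimal Python sketch comparing a single one-generation quantity under the two models. It assumes a haploid Wright-Fisher population of `n_genes` gene copies, in which each sampled lineage picks its parent uniformly at random, and a Kingman-style approximation in which pairs coalesce at rate C(n,2)/`n_genes` per generation. The function names are hypothetical. The sketch does not implement the paper's simulation algorithm and does not reproduce its results.

```python
import math


def wf_prob_no_coalescence(n: int, n_genes: int) -> float:
    """Exact WF probability that n sampled lineages all have distinct parents one generation back."""
    if not 1 <= n <= n_genes:
        raise ValueError("need 1 <= n <= n_genes")
    prob = 1.0
    for i in range(1, n):
        prob *= 1.0 - i / n_genes  # the (i+1)-th lineage avoids the i parents already chosen
    return prob


def kingman_prob_no_coalescence(n: int, n_genes: int) -> float:
    """Continuous-time approximation: exponential waiting time with rate C(n,2)/n_genes."""
    return math.exp(-math.comb(n, 2) / n_genes)


if __name__ == "__main__":
    n_genes = 1000  # hypothetical population size, for illustration only
    for n in (2, 10, 50, 200):
        print(n, wf_prob_no_coalescence(n, n_genes), kingman_prob_no_coalescence(n, n_genes))
```

The two columns agree closely when n is small relative to `n_genes` and drift apart as n grows, which is the regime in which the abstract suggests checking the approximation.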

19.

Below-threshold mortality: implications for studies in evolution, ecology and demography

Promislow Tatar Pletcher Carey 《Journal of evolutionary biology》1999,12(2):314-328

Evolutionary biologists, ecologists and experimental gerontologists have increasingly used estimates of age-specific mortality as a critical component in studies of a range of important biological processes. However, the analysis of age-specific mortality rates is plagued by specific statistical challenges caused by sampling error. Here we discuss the nature of this ‘demographic sampling error’, and the way in which it can bias our estimates of (1) rates of ageing, (2) age at onset of senescence, (3) costs of reproduction and (4) demographic tests of evolutionary models of ageing. We conducted simulations which suggest that using standard statistical techniques, we would need sample sizes on the order of tens of thousands in most experiments to effectively remove any bias due to sampling error. We argue that biologists should use much larger sample sizes than have previously been used. However, we also present simple maximum likelihood models that effectively remove biases due to demographic sampling error even at relatively small sample sizes.

20.

Hazards regression analysis for length-biased data

WANG MEI-CHENG 《Biometrika》1996,83(2):343-354
