Found 20 similar articles (search time: 15 ms)
1.
Kuhner MK 《Bioinformatics (Oxford, England)》2006,22(6):768-770
We present a Markov chain Monte Carlo coalescent genealogy sampler, LAMARC 2.0, which estimates population genetic parameters from genetic data. LAMARC can co-estimate subpopulation Theta = 4N(e)mu, immigration rates, subpopulation exponential growth rates and overall recombination rate, or a user-specified subset of these parameters. It can perform either maximum-likelihood or Bayesian analysis, and accommodates nucleotide sequence, SNP, microsatellite or electrophoretic data, with resolved or unresolved haplotypes. It is available as portable source code and executables for all three major platforms. AVAILABILITY: LAMARC 2.0 is freely available at http://evolution.gs.washington.edu/lamarc
2.
We describe a method for co-estimating 4N(e)mu (four times the product of effective population size and neutral mutation rate) and population growth rate from sequence samples using Metropolis-Hastings sampling. Population growth (or decline) is assumed to be exponential. The estimates of growth rate are biased upwards, especially when 4N(e)mu is low; there is also a slight upwards bias in the estimate of 4N(e)mu itself due to correlation between the parameters. This bias cannot be attributed solely to Metropolis-Hastings sampling but appears to be an inherent property of the estimator and is expected to appear in any approach which estimates growth rate from genealogy structure. Sampling additional unlinked loci is much more effective in reducing the bias than increasing the number or length of sequences from the same locus.
3.
There has recently been increased interest in the use of Markov Chain Monte Carlo (MCMC)-based Bayesian methods for estimating genetic maps. The advantage of these methods is that they can deal accurately with missing data and genotyping errors. Here we present an extension of the previous methods that makes the Bayesian method applicable to large data sets. We present an extensive simulation study examining the statistical properties of the method and comparing it with the likelihood method implemented in Mapmaker. We show that the Maximum A Posteriori (MAP) estimator of the genetic distances, corresponding to the maximum likelihood estimator, performs better than estimators based on the posterior expectation. We also show that while the performance is similar between Mapmaker and the MCMC-based method in the absence of genotyping errors, the MCMC-based method has a distinct advantage in the presence of genotyping errors. A similar advantage of the Bayesian method was not observed for missing data. We also re-analyse a recently published set of data from the eggplant and show that the use of the MCMC-based method leads to smaller estimates of genetic distances.
4.
Bayesian coalescent inference of past population dynamics from molecular sequences (cited by: 31; self-citations: 0; citations by others: 31)
We introduce the Bayesian skyline plot, a new method for estimating past population dynamics through time from a sample of molecular sequences without dependence on a prespecified parametric model of demographic history. We describe a Markov chain Monte Carlo sampling procedure that efficiently samples a variant of the generalized skyline plot, given sequence data, and combines these plots to generate a posterior distribution of effective population size through time. We apply the Bayesian skyline plot to simulated data sets and show that it correctly reconstructs demographic history under canonical scenarios. Finally, we compare the Bayesian skyline plot model to previous coalescent approaches by analyzing two real data sets (hepatitis C virus in Egypt and mitochondrial DNA of Beringian bison) that have been previously investigated using alternative coalescent methods. In the bison analysis, we detect a severe but previously unrecognized bottleneck, estimated to have occurred 10,000 radiocarbon years ago, which coincides with both the earliest undisputed record of large numbers of humans in Alaska and the megafaunal extinctions in North America at the beginning of the Holocene.
5.
The pool adjacent violator algorithm (Ayer et al., 1955, The Annals of Mathematical Statistics, 26, 641-647) has long been known to give the maximum likelihood estimator of a series of ordered binomial parameters, based on an independent observation from each distribution (see Barlow et al., 1972, Statistical Inference under Order Restrictions, Wiley, New York). This result has immediate application to estimation of a survival distribution based on current survival status at a set of monitoring times. This paper considers an extended problem of maximum likelihood estimation of a series of 'ordered' multinomial parameters p(i) = (p(1i), p(2i), ..., p(mi)) for 1
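The binomial case mentioned in this abstract can be illustrated with a minimal sketch of the classical pool adjacent violators step. This is not the extended multinomial method of the paper; the function name `pava` and the example counts are assumptions made purely for illustration.

```python
def pava(successes, trials):
    """Nondecreasing MLE of binomial proportions by pooling adjacent violators.

    successes[i] and trials[i] are the counts observed at the i-th
    monitoring time (hypothetical inputs for illustration).
    """
    # Each block stores [pooled successes, pooled trials, groups pooled].
    blocks = []
    for s, n in zip(successes, trials):
        blocks.append([s, n, 1])
        # While the previous block's rate exceeds the newest block's rate,
        # the ordering is violated, so pool the two blocks.
        # Rates are compared by cross-multiplication to avoid division.
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            s2, n2, k2 = blocks.pop()
            blocks[-1][0] += s2
            blocks[-1][1] += n2
            blocks[-1][2] += k2
    # Expand each pooled block back to one fitted value per original group.
    fitted = []
    for s, n, k in blocks:
        fitted.extend([s / n] * k)
    return fitted


# Raw rates 0.25, 0.75, 0.50, 1.00: the middle two violate the ordering
# and are pooled to (3 + 2) / (4 + 4) = 0.625.
print(pava([1, 3, 2, 4], [4, 4, 4, 4]))  # [0.25, 0.625, 0.625, 1.0]
```

Pooling by summed counts (rather than averaging rates) keeps each pooled value equal to the binomial MLE for the merged groups, which is what makes the result the order-restricted maximum likelihood estimate.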
6.
The marginal likelihood is commonly used for comparing different evolutionary models in Bayesian phylogenetics and is the central quantity used in computing Bayes Factors for comparing model fit. A popular method for estimating marginal likelihoods, the harmonic mean (HM) method, can be easily computed from the output of a Markov chain Monte Carlo analysis but often greatly overestimates the marginal likelihood. The thermodynamic integration (TI) method is much more accurate than the HM method but requires more computation. In this paper, we introduce a new method, steppingstone sampling (SS), which uses importance sampling to estimate each ratio in a series (the "stepping stones") bridging the posterior and prior distributions. We compare the performance of the SS approach to the TI and HM methods in simulation and using real data. We conclude that the greatly increased accuracy of the SS and TI methods argues for their use instead of the HM method, despite the extra computation needed.
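The contrast between the HM and SS estimators can be sketched on a toy problem where the marginal likelihood is known exactly. Everything below is an illustrative assumption rather than the paper's setup: the model (y_i ~ N(theta, 1) with prior theta ~ N(0, 1)), the cubic spacing of the powers beta, and all function names. The conjugate model is chosen because the power posterior (likelihood^beta times prior) can be sampled directly, so no MCMC is needed.

```python
import math
import random


def log_lik(theta, ys):
    """Log-likelihood of the data under y_i ~ N(theta, 1)."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (y - theta) ** 2 for y in ys)


def log_mean_exp(xs):
    """Numerically stable log of the mean of exp(x)."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs) / len(xs))


def power_posterior_draws(ys, beta, size, rng):
    """Exact draws from likelihood^beta * prior (normal by conjugacy)."""
    precision = 1.0 + beta * len(ys)
    mean = beta * sum(ys) / precision
    return [rng.gauss(mean, precision ** -0.5) for _ in range(size)]


def stepping_stone(ys, n_stones=20, size=2000, seed=1):
    """Sum of log ratios, each estimated by importance sampling from the
    power posterior at the lower beta of the pair."""
    rng = random.Random(seed)
    # Assumed schedule: powers concentrated near the prior (beta = 0).
    betas = [(k / n_stones) ** 3 for k in range(n_stones + 1)]
    total = 0.0
    for b_lo, b_hi in zip(betas, betas[1:]):
        draws = power_posterior_draws(ys, b_lo, size, rng)
        total += log_mean_exp([(b_hi - b_lo) * log_lik(t, ys) for t in draws])
    return total


def harmonic_mean(ys, size=2000, seed=1):
    """Log of the harmonic mean of likelihoods over posterior draws."""
    rng = random.Random(seed)
    draws = power_posterior_draws(ys, 1.0, size, rng)
    return -log_mean_exp([-log_lik(t, ys) for t in draws])


def exact_log_marginal(ys):
    """Closed form: marginally y ~ N(0, I + 11^T)."""
    n, s, ss = len(ys), sum(ys), sum(y * y for y in ys)
    return (-0.5 * n * math.log(2 * math.pi)
            - 0.5 * math.log(1 + n)
            - 0.5 * (ss - s * s / (1 + n)))


ys = [0.5, 1.2, -0.3, 0.8, 1.5, 0.1, 0.9, 1.1, 0.4, 0.7]  # made-up data
print("exact:", exact_log_marginal(ys))
print("SS   :", stepping_stone(ys))
print("HM   :", harmonic_mean(ys))
```

Because each stepping-stone ratio bridges two nearby distributions, its importance weights vary little, which is the intuition behind the accuracy gain the abstract reports; the HM estimator instead relies on a single set of weights that can be dominated by rare low-likelihood draws.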
7.
Tziafetas G 《Biometrical journal. Biometrische Zeitschrift》1980,22(7):583-592
The author describes a Bayesian probability model for estimating population distributions when either micro or macro data on population migration are available. The model is tested using data for two groups of five regions in the Federal Republic of Germany, and it is found that the macro Bayesian estimators lead to a better projection of population distribution than those using micro data.
8.
We describe a method for co-estimating r = C/mu (where C is the per-site recombination rate and mu is the per-site neutral mutation rate) and Theta = 4N(e)mu (where N(e) is the effective population size) from a population sample of molecular data. The technique is Metropolis-Hastings sampling: we explore a large number of possible reconstructions of the recombinant genealogy, weighting according to their posterior probability with regard to the data and working values of the parameters. Different relative rates of recombination at different locations can be accommodated if they are known from external evidence, but the algorithm cannot itself estimate rate differences. The estimates of Theta are accurate and apparently unbiased for a wide range of parameter values. However, when both Theta and r are relatively low, very long sequences are needed to estimate r accurately, and the estimates tend to be biased upward. We apply this method to data from the human lipoprotein lipase locus.
9.
10.
pIPHULA is a parallel program that estimates the parameters of a realistic model of population growth.
11.
The molecular clock provides a powerful way to estimate species divergence times. If information on some species divergence times is available from the fossil or geological record, it can be used to calibrate a phylogeny and estimate divergence times for all nodes in the tree. The Bayesian method provides a natural framework to incorporate different sources of information concerning divergence times, such as information in the fossil and molecular data. Current models of sequence evolution are intractable in a Bayesian setting, and Markov chain Monte Carlo (MCMC) is used to generate the posterior distribution of divergence times and evolutionary rates. This method is computationally expensive, as it involves the repeated calculation of the likelihood function. Here, we explore the use of Taylor expansion to approximate the likelihood during MCMC iteration. The approximation is much faster than conventional likelihood calculation. However, the approximation is expected to be poor when the proposed parameters are far from the likelihood peak. We explore the use of parameter transforms (square root, logarithm, and arcsine) to improve the approximation to the likelihood curve. We found that the new methods, particularly the arcsine-based transform, provided very good approximations under relaxed clock models and also under the global clock model when the global clock is not seriously violated. The approximation is poorer for analysis under the global clock when the global clock is seriously wrong and should thus not be used. The results suggest that the approximate method may be useful for Bayesian dating analysis using large data sets.
12.
Wiuf C 《Journal of mathematical biology》2006,53(5):821-841
Composite likelihood methods have become very popular for the analysis of large-scale genomic data sets because of the computational intractability of the basic coalescent process and its generalizations: It is virtually impossible to calculate the likelihood of an observed data set spanning a large chromosomal region without using approximate or heuristic methods. Composite likelihood methods are approximate methods and, in the present article, assume the likelihood is written as a product of likelihoods, one for each of a number of smaller regions that together make up the whole region from which data is collected. A very general framework for neutral coalescent models is presented and discussed. The framework comprises many of the most popular coalescent models that are currently used for analysis of genetic data. Assume data is collected from a series of consecutive regions of equal size. Then it is shown that the observed data forms a stationary, ergodic process. General conditions are given under which the maximum composite estimator of the parameters describing the model (e.g. mutation rates, demographic parameters and the recombination rate) is a consistent estimator as the number of regions tends to infinity.
13.
14.
We present a likelihood method for estimating codon usage bias parameters along the lineages of a phylogeny. The method is an extension of the classical codon-based models used for estimating dN/dS ratios along the lineages of a phylogeny. However, we add one extra parameter for each lineage: the selection coefficient for optimal codon usage (S), allowing joint maximum likelihood estimation of S and the dN/dS ratio. We apply the method to previously published data from Drosophila melanogaster, Drosophila simulans, and Drosophila yakuba and show, in accordance with previous results, that the D. melanogaster lineage has experienced a reduction in the selection for optimal codon usage. However, the D. melanogaster lineage has also experienced a change in the biological mutation rates relative to D. simulans, in particular, a relative reduction in the mutation rate from A to G and an increase in the mutation rate from C to T. However, neither a reduction in the strength of selection nor a change in the mutational pattern can alone explain all of the data observed in the D. melanogaster lineage. For example, we also confirm previous results showing that the Notch locus has experienced positive selection for previously classified unpreferred mutations.
15.
Comparison of Bayesian and maximum-likelihood inference of population genetic parameters (cited by: 9; self-citations: 0; citations by others: 9)
Beerli P 《Bioinformatics (Oxford, England)》2006,22(3):341-345
Comparison of the performance and accuracy of different inference methods, such as maximum likelihood (ML) and Bayesian inference, is difficult because the inference methods are implemented in different programs, often written by different authors. Both methods were implemented in the program MIGRATE, which estimates population genetic parameters, such as population sizes and migration rates, using coalescence theory. Both inference methods use the same Markov chain Monte Carlo algorithm and differ from each other in only two aspects: parameter proposal distribution and maximization of the likelihood function. Using simulated datasets, the Bayesian method generally fares better than the ML approach in accuracy and coverage, although for some values the two approaches are equal in performance. MOTIVATION: The Markov chain Monte Carlo-based ML framework can fail on sparse data and can deliver non-conservative support intervals. A Bayesian framework with appropriate prior distribution is able to remedy some of these problems. RESULTS: The program MIGRATE was extended to allow not only for ML estimation of population genetics parameters but also for using a Bayesian framework. Comparisons between the Bayesian approach and the ML approach are facilitated because both modes estimate the same parameters under the same population model and assumptions.
16.
Stitelman OM Wester CW De Gruttola V van der Laan MJ 《The international journal of biostatistics》2011,7(1):19
The Cox proportional hazards model or its discrete time analogue, the logistic failure time model, posit highly restrictive parametric models and attempt to estimate parameters which are specific to the model proposed. These methods are typically implemented when assessing effect modification in survival analyses despite their flaws. The targeted maximum likelihood estimation (TMLE) methodology is more robust than the methods typically implemented and allows practitioners to estimate parameters that directly answer the question of interest. TMLE will be used in this paper to estimate two newly proposed parameters of interest that quantify effect modification in the time to event setting. These methods are then applied to the Tshepo study to assess if either gender or baseline CD4 level modify the effect of two cART therapies of interest, efavirenz (EFV) and nevirapine (NVP), on the progression of HIV. The results show that women tend to have more favorable outcomes using EFV while males tend to have more favorable outcomes with NVP. Furthermore, EFV tends to be favorable compared to NVP for individuals at high CD4 levels.
17.
Maximum-likelihood estimation of migration rates and effective population numbers in two populations using a coalescent approach. (cited by: 29; self-citations: 0; citations by others: 29)
A new method for the estimation of migration rates and effective population sizes is described. It uses a maximum-likelihood framework based on coalescence theory. The parameters are estimated by Metropolis-Hastings importance sampling. In a two-population model this method estimates four parameters: the effective population size and the immigration rate for each population relative to the mutation rate. Summarizing over loci can be done by assuming either that the mutation rate is the same for all loci or that the mutation rates are gamma distributed among loci but the same for all sites of a locus. The estimates are as good as or better than those from an optimized FST-based measure. The program is available on the World Wide Web at http://evolution.genetics.washington.edu/lamarc.html/.
18.
Background
Estimates of divergence dates between species improve our understanding of processes ranging from nucleotide substitution to speciation. Such estimates are frequently based on molecular genetic differences between species; therefore, they rely on accurate estimates of the number of such differences (i.e. substitutions per site, measured as branch length on phylogenies). We used simulations to determine the effects of dataset size, branch length heterogeneity, branch depth, and analytical framework on branch length estimation across a range of branch lengths. We then reanalyzed an empirical dataset for plethodontid salamanders to determine how inaccurate branch length estimation can affect estimates of divergence dates.
19.
Möhle M 《Journal of theoretical biology》2000,204(4):629-638
A special stochastic process, called the coalescent, is of fundamental interest in population genetics. For a large class of population models this process is the appropriate tool to analyse the ancestral structure of a sample of n individuals or genes, if the total number of individuals in the population is sufficiently large. A corresponding convergence theorem was first proved by Kingman in 1982 for the Wright-Fisher model and the Moran model. Generalizations to a large class of exchangeable population models and to models with overlying mutation processes followed shortly thereafter. One speaks of the "robustness of the coalescent", as this process appears in many models as the total population size tends to infinity. This publication can be considered as an introduction to the theory of the coalescent as well as a review of the most important "convergence-to-the-coalescent" theorems. Convergence theorems are not only presented for the classical exchangeable haploid case but also for larger classes of population models, for example for diploid, two-sex or non-exchangeable models. A review-like summary of further examples and applications of convergence to the coalescent is given including the most important biological forces like mutation, recombination and selection. The general coalescent process allows for simultaneous multiple mergers of ancestral lines.
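As a small illustration of the limiting process this abstract refers to, the sketch below simulates waiting times in the standard Kingman coalescent, where any pair of the k remaining lineages merges at rate 1, so the total rate is k(k-1)/2 in coalescent time units. It covers only the classical binary-merger case, not the multiple-merger generalizations; the function names are hypothetical.

```python
import random


def kingman_waiting_times(n, rng):
    """Waiting times while k = n, n-1, ..., 2 lineages remain.

    Each time is exponential with rate k * (k - 1) / 2, the number of
    lineage pairs, measured in coalescent time units.
    """
    return [rng.expovariate(k * (k - 1) / 2) for k in range(n, 1, -1)]


def expected_tmrca(n):
    """Expected time to the most recent common ancestor: 2 * (1 - 1/n)."""
    return 2.0 * (1.0 - 1.0 / n)


rng = random.Random(0)
replicates = 20000
sample_size = 10
mean_tmrca = sum(
    sum(kingman_waiting_times(sample_size, rng)) for _ in range(replicates)
) / replicates
print("simulated mean TMRCA:", mean_tmrca)
print("theoretical value   :", expected_tmrca(sample_size))
```

The theoretical value follows from summing the expected waiting times 2 / (k(k-1)) over k = 2, ..., n, which telescopes to 2(1 - 1/n); the simulated mean should approach it as the number of replicates grows.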
20.
Volz EM 《Genetics》2012,190(1):187-201
Estimates of the coalescent effective population size N(e) can be poorly correlated with the true population size. The relationship between N(e) and the population size is sensitive to the way in which birth and death rates vary over time. The problem of inference is exacerbated when the mechanisms underlying population dynamics are complex and depend on many parameters. In instances where nonparametric estimators of N(e) such as the skyline struggle to reproduce the correct demographic history, model-based estimators that can draw on prior information about population size and growth rates may be more efficient. A coalescent model is developed for a large class of populations such that the demographic history is described by a deterministic nonlinear dynamical system of arbitrary dimension. This class of demographic model differs from those typically used in population genetics. Birth and death rates are not fixed, and no assumptions are made regarding the fraction of the population sampled. Furthermore, the population may be structured in such a way that gene copies reproduce both within and across demes. For this large class of models, it is shown how to derive the rate of coalescence, as well as the likelihood of a gene genealogy with heterochronous sampling and labeled taxa, and how to simulate a coalescent tree conditional on a complex demographic history. This theoretical framework encapsulates many of the models used by ecologists and epidemiologists and should facilitate the integration of population genetics with the study of mathematical population dynamics.