Similar articles
 20 similar articles found (search time: 109 ms)
1.
A commonly used tool in disease association studies is the search for discrepancies between the haplotype distributions in the case and control populations. To find such a discrepancy, the haplotype frequencies in each population are estimated from the genotypes. We present a new method, HAPLOFREQ, to estimate haplotype frequencies over a short genomic region given genotypes or haplotypes with missing data or sequencing errors. Our approach incorporates a maximum likelihood model based on a simple random generative model which assumes that the genotypes are independently sampled from the population. We first show that if the phased haplotypes are given, possibly with missing data, we can estimate the frequency of the haplotypes in the population by finding the global optimum of the likelihood function in polynomial time. If the haplotypes are not phased, finding the maximum value of the likelihood function is NP-hard. In this case, we define an alternative likelihood function which can be thought of as a relaxed likelihood function. We show that the maximum relaxed likelihood can be found in polynomial time and that the optimal solution of the relaxed likelihood asymptotically approaches the haplotype frequencies in the population. In contrast to previous approaches, our algorithms are guaranteed to converge in polynomial time to a global maximum of the different likelihood functions. We compared the performance of our algorithm to the widely used program PHASE, and we found that our estimates are at least 10% more accurate than those of PHASE and that our algorithm is about ten times faster. Our techniques involve new algorithms in convex optimization. These algorithms may be of independent interest. In particular, they may be helpful in other maximum likelihood problems arising from survey sampling.
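The phased case with missing data described in item 1 can be illustrated with a small sketch. This is not HAPLOFREQ: it swaps in a plain expectation-maximization (EM) loop rather than the convex-optimization algorithms the abstract mentions, and it assumes biallelic sites coded as `0`/`1`, missing alleles coded as `?`, and a region short enough to enumerate all haplotypes. The function names are hypothetical.

```python
from itertools import product


def compatible(observed, haplotype):
    """True if the haplotype matches the observation at every non-missing site."""
    return all(o == "?" or o == h for o, h in zip(observed, haplotype))


def em_haplotype_frequencies(observations, n_iter=200):
    """Estimate haplotype frequencies from phased haplotypes with missing data.

    Illustrative EM sketch (an assumption of this example, not the
    algorithm from the abstract). Each observation is a string over
    '0', '1' and '?', all of the same length.
    """
    n_sites = len(observations[0])
    haplotypes = ["".join(bits) for bits in product("01", repeat=n_sites)]
    # Start from the uniform distribution so every haplotype has positive mass.
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}
    for _ in range(n_iter):
        expected = dict.fromkeys(haplotypes, 0.0)
        for obs in observations:
            candidates = [h for h in haplotypes if compatible(obs, h)]
            total = sum(freq[h] for h in candidates)
            # E-step: split this observation among its compatible haplotypes.
            for h in candidates:
                expected[h] += freq[h] / total
        # M-step: renormalise the expected counts into frequencies.
        freq = {h: expected[h] / len(observations) for h in haplotypes}
    return freq


if __name__ == "__main__":
    estimates = em_haplotype_frequencies(["00", "00", "11", "1?"])
    for hap, p in sorted(estimates.items()):
        print(hap, round(p, 4))
```

When no alleles are missing, each observation is compatible with exactly one haplotype, so the loop simply returns the sample frequencies.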

2.
Statistical Properties of a DNA Sample under the Finite-Sites Model   Cited by: 1 (self-citations: 0, by others: 1)
Z. Yang 《Genetics》1996,144(4):1941-1950
Statistical properties of a DNA sample from a random-mating population of constant size are studied under the finite-sites model. It is assumed that there is no migration and that no recombination occurs within the locus. A Markov process model is used for nucleotide substitution, allowing for multiple substitutions at a single site. The evolutionary rates among sites are treated as either constant or variable. The general likelihood calculation using numerical integration involves intensive computation and is feasible for three or four sequences only; it may be used for validating approximate algorithms. Methods are developed to approximate the probability distribution of the number of segregating sites in a random sample of n sequences, with either constant or variable substitution rates across sites. Calculations using parameter estimates obtained for human D-loop mitochondrial DNAs show that among-site rate variation has a major effect on the distribution of the number of segregating sites; the distribution under the finite-sites model with variable rates among sites is quite different from that under the infinite-sites model.
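For orientation, the infinite-sites baseline that item 2 contrasts with can be sketched in a few lines. This is the standard textbook result for a constant-size population without recombination, not the finite-sites approximation developed in the paper. It assumes the usual scaled mutation parameter theta for the locus; the function names are hypothetical.

```python
def harmonic_number(k):
    """Return 1 + 1/2 + ... + 1/k."""
    return sum(1.0 / i for i in range(1, k + 1))


def expected_segregating_sites(n, theta):
    """Expected number of segregating sites in a sample of n sequences
    under the infinite-sites model: theta * sum_{i=1}^{n-1} 1/i."""
    if n < 2:
        raise ValueError("need at least two sequences")
    return theta * harmonic_number(n - 1)


def watterson_theta(num_segregating, n):
    """Moment estimate of theta obtained by inverting the expectation above."""
    if n < 2:
        raise ValueError("need at least two sequences")
    return num_segregating / harmonic_number(n - 1)


if __name__ == "__main__":
    print(expected_segregating_sites(4, 6.0))
    print(watterson_theta(11, 4))
```

Under the finite-sites model, repeated substitutions at the same site mean the observed count can fall below this expectation, which is one reason the two models can give different distributions.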

3.
We describe a forward-time haploid reproduction model with a constant population size that includes life history characteristics common to many marine organisms. We develop coalescent approximations for sample gene genealogies under this model and use these to predict patterns of genetic variation. Depending on the behavior of the underlying parameters of the model, the approximations are coalescent processes with simultaneous multiple mergers or Kingman’s coalescent. Using simulations, we apply our model to data from the Pacific oyster and show that our model predicts the observed data very well. We also show that a property which holds for Kingman’s coalescent and for general coalescent trees, namely that the most frequent allele at a biallelic locus is likely to be the ancestral allele, does not hold for our model. Our work suggests that the power to detect a “sweepstakes effect” in a sample of DNA sequences from marine organisms depends on the sample size.

4.
Maximum likelihood estimation of the model parameters for a spatial population based on data collected from a survey sample is usually straightforward when sampling and non-response are both non-informative, since the model can then usually be fitted using the available sample data, and no allowance is necessary for the fact that only a part of the population has been observed. Although for many regression models this naive strategy yields consistent estimates, this is not the case for some models, such as spatial auto-regressive models. In this paper, we show that for a broad class of such models, a maximum marginal likelihood approach that uses both sample and population data leads to more efficient estimates since it uses spatial information from sampled as well as non-sampled units. Extensive simulation experiments based on two well-known data sets are used to assess the impact of the spatial sampling design, the auto-correlation parameter and the sample size on the performance of this approach. When compared to some widely used methods that use only sample data, the results from these experiments show that the maximum marginal likelihood approach is much more precise.

5.
Estimation of population size with a missing zero-class is an important problem that is encountered in epidemiological assessment studies. Fitting a Poisson model to the observed data by the method of maximum likelihood and estimating the population size based on this fit is an approach that has been widely used for this purpose. In practice, however, the Poisson assumption is seldom satisfied. Zelterman (1988) has proposed a robust estimator for unclustered data that works well in a wide class of distributions applicable for count data. In the work presented here, we extend this estimator to clustered data. The estimator requires fitting a zero-truncated homogeneous Poisson model by maximum likelihood and thereby using a Horvitz-Thompson estimator of population size. This was found to work well when the data follow the hypothesized homogeneous Poisson model. However, when the true distribution deviates from the hypothesized model, the population size was found to be underestimated. In search of a more robust estimator, we focused on three models that use all clusters with exactly one case, those clusters with exactly two cases and those with exactly three cases to estimate the probability of the zero-class and thereby use data collected on all the clusters in the Horvitz-Thompson estimator of population size. Loss in efficiency associated with gain in robustness was examined based on a simulation study. As a trade-off between gain in robustness and loss in efficiency, the model that uses data collected on clusters with at most three cases to estimate the probability of the zero-class was found to be preferred in general. In applications, we recommend obtaining estimates from all three models and making a choice considering the estimates from the three models, robustness and the loss in efficiency.
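The two building blocks named in item 5 can be sketched for the simpler unclustered case; the clustered extension proposed in the paper is not reproduced here. The sketch assumes each observed unit contributes a count of at least 1. It fits a zero-truncated Poisson by solving the mean equation lambda / (1 - exp(-lambda)) = sample mean, plugs the result into a Horvitz-Thompson estimate n / (1 - exp(-lambda)), and shows the Zelterman-style rate 2 * f2 / f1 computed from the numbers of units seen exactly once and twice. The function names are hypothetical.

```python
import math
from collections import Counter


def fit_zero_truncated_poisson(counts, tol=1e-12, max_iter=200):
    """Maximum likelihood estimate of lambda for zero-truncated Poisson counts.

    Solves lambda / (1 - exp(-lambda)) = mean(counts) by bisection; the
    left-hand side increases from 1, so a root exists only if the mean exceeds 1.
    """
    mean = sum(counts) / len(counts)
    if mean <= 1.0:
        raise ValueError("sample mean must exceed 1 for a positive estimate")
    lo, hi = 1e-12, mean  # the root lies below the sample mean
    for _ in range(max_iter):
        mid = (lo + hi) / 2.0
        if mid / -math.expm1(-mid) < mean:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2.0


def horvitz_thompson_size(n_observed, lam):
    """Population size estimate: observed units divided by P(count > 0)."""
    return n_observed / -math.expm1(-lam)


def zelterman_size(counts):
    """Zelterman-style estimate using only units counted once or twice."""
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]
    if f1 == 0 or f2 == 0:
        raise ValueError("need units with exactly one and exactly two counts")
    return horvitz_thompson_size(len(counts), 2.0 * f2 / f1)


if __name__ == "__main__":
    data = [1] * 50 + [2] * 25 + [3] * 8 + [4] * 2  # made-up illustration data
    lam_hat = fit_zero_truncated_poisson(data)
    print(horvitz_thompson_size(len(data), lam_hat))
    print(zelterman_size(data))
```

Comparing the two printed estimates on the same data gives a rough sense of how much the result depends on the homogeneous Poisson assumption.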

6.
In this paper, we consider the problem of estimating the size N of a finite and closed population, using data obtained from capture-recapture experiments. By defining an appropriate model, we investigate the maximum of the likelihood, of the profile likelihood and of an orthogonal adjusted profile likelihood (Cox and Reid, 1987) function. We show that all three may yield infinity as the maximum likelihood estimate of N. This seems to be a characteristic of the likelihood approach in this problem. Further, we present a Bayesian approach with minimum prior information as a way of countering this difficulty. Exact analytical expressions for the posterior modes are also obtained.

7.
Detecting population expansion and decline using microsatellites   Cited by: 15 (self-citations: 0, by others: 15)
Beaumont MA 《Genetics》1999,153(4):2013-2029
This article considers a demographic model where a population varies in size either linearly or exponentially. The genealogical history of microsatellite data sampled from this population can be described using coalescent theory. A method is presented whereby the posterior probability distribution of the genealogical and demographic parameters can be estimated using Markov chain Monte Carlo simulations. The likelihood surface for the demographic parameters is complicated and its general features are described. The method is then applied to published microsatellite data from two populations. Data from the northern hairy-nosed wombat show strong evidence of decline. Data from European humans show weak evidence of expansion.

8.
We present the results of extensive simulations that emulate the development and distribution of linkage disequilibrium (LD) between single-nucleotide polymorphisms (SNPs) and a gene locus that is phenotypically stratified into two classes (disease phenotype and wild-type phenotype). Our approach, based on coalescence theory, allows an explicit modeling of the demographic history of the population without conditioning on the age of the mutation, and serves as an efficient tool to carry out simulations. More specifically, we compare the influence that a constant population size or an exponentially growing population has on the amount of LD. These results indicate that attempts to locate single disease genes are most likely to be successful in small and constant populations. On the other hand, if we consider an exponentially growing population that started to expand from an initially constant population of reasonable size, then our simulations indicate a lower success rate. The power to detect association is enhanced if haplotypes constructed from several SNPs are used as markers. The versatility of the coalescence approach also allows the analysis of other relevant factors that influence the chances that a disease gene will be located. We show that several alleles leading to the same disease have no substantial influence on the amount of LD, as long as the differences between the disease-causing alleles are confined to the same region of the gene locus and as long as each allele occurs at an appreciable frequency. Our simulations indicate that mapping of less-frequent diseases is more likely to be successful. Moreover, we show that successful attempts to map complex diseases depend crucially on the phenotype-genotype correlations of all alleles at the disease locus. An analysis of lipoprotein lipase data indicates that our simulations capture the major features of LD occurring in biological data.

9.
The evolutionary history of a population involves changes in size, movements and selection pressures through time. Reconstruction of population history based on modern genetic data tends to be averaged over time or to be biased by generally reflecting only recent or extreme events, leaving many population historic processes undetected. Temporal genetic data present opportunities to reveal more complex population histories and provide important insights into what processes have influenced modern genetic diversity. Here we provide a synopsis of methods available for the analysis of ancient genetic data. We review 29 ancient DNA studies, summarizing the analytical methods and general conclusions for each study. Using the serial coalescent and a model-testing approach, we then re-analyse data from two species represented by these data sets in a common interpretive framework. Our analyses show that phylochronologic data can reveal more about population history than modern data alone, thus revealing 'cryptic' population processes, and enable us to determine whether simple or complex models best explain the data. Our re-analyses point to the need for novel methods that consider gene flow, multiple populations and population size in reconstruction of population history. We conclude that population genetic samples over large temporal and geographical scales, when analysed using more complex models and the serial coalescent, are critical to understand past population dynamics and provide important tools for reconstructing the evolutionary process.

10.
Various methodological approaches using molecular sequence data have been developed and applied across several fields, including phylogeography, conservation biology, virology and human evolution. The aim of these approaches is to obtain predictive estimates of population history from DNA sequence data that can then be used for hypothesis testing with empirical data. This recent work provides opportunities to evaluate hypotheses of constant population size through time, of population growth or decline, of the rate of growth or decline, and of migration and growth in subdivided populations. At the core of many of these approaches is the extraction of information from the structure of phylogenetic trees to infer the demographic history of a population, and underlying nearly all methods is coalescent theory. With the increasing availability of DNA sequence data, it is important to review the different ways in which information can be extracted from DNA sequence data to estimate demographic parameters.

11.
Sargsyan O 《PloS one》2012,7(5):e37588
Hitchhiking and severe bottleneck effects affect the dynamics of genetic diversity of a population by inducing homogenization at a single locus and at the genome-wide scale, respectively. As a result, identification and differentiation of the signatures of such events from DNA sequence data at a single locus is challenging. This paper develops an analytical framework for identifying and differentiating recent homogenization events at multiple neutral loci in low recombination regions. The dynamics of genetic diversity at a locus after a recent homogenization event is modeled according to the infinite-sites mutation model and the Wright-Fisher model of reproduction with constant population size. In this setting, I derive analytical expressions for the distribution, mean, and variance of the number of polymorphic sites in a random sample of DNA sequences from a locus affected by a recent homogenization event. Based on this framework, three likelihood-ratio based tests are presented for identifying and differentiating recent homogenization events at multiple loci. Lastly, I apply the framework to two data sets. First, I consider human DNA sequences from four non-coding loci on different chromosomes for inferring the evolutionary history of modern human populations. The results suggest, in particular, that recent homogenization events at the loci are identifiable when the effective human population size is 50,000 or greater, in contrast to 10,000, and the estimates of the recent homogenization events agree with the "Out of Africa" hypothesis. Second, I use HIV DNA sequences from HIV-1-infected patients to infer the times of HIV seroconversions. The estimates are contrasted with other estimates derived as the mid-time point between the last HIV-negative and first HIV-positive screening tests. The results show that significant discrepancies can exist between the estimates.

12.
Chao A  Chu W  Hsu CH 《Biometrics》2000,56(2):427-433
We consider a capture-recapture model in which capture probabilities vary with time and with behavioral response. Two inference procedures are developed under the assumption that recapture probabilities bear a constant relationship to initial capture probabilities. These two procedures are the maximum likelihood method (both unconditional and conditional types are discussed) and an approach based on optimal estimating functions. The population size estimators derived from the two procedures are shown to be asymptotically equivalent when population size is large enough. The performance and relative merits of various population size estimators for finite cases are discussed. The bootstrap method is suggested for constructing a variance estimator and confidence interval. An example of the deer mouse analyzed in Otis et al. (1978, Wildlife Monographs 62, 93) is given for illustration.

13.
Choi SC  Hey J 《Genetics》2011,189(2):561-577
A new approach to assigning individuals to populations using genetic data is described. Most existing methods work by maximizing Hardy-Weinberg and linkage equilibrium within populations, neither of which will apply for many demographic histories. By including a demographic model, within a likelihood framework based on coalescent theory, we can jointly study demographic history and population assignment. Genealogies and population assignments are sampled from a posterior distribution using a general isolation-with-migration model for multiple populations. A measure of partition distance between assignments facilitates not only the summary of a posterior sample of assignments, but also the estimation of the posterior density for the demographic history. It is shown that joint estimates of assignment and demographic history are possible, including estimation of population phylogeny for samples from three populations. The new method is compared to results of a widely used assignment method, using simulated and published empirical data sets.

14.
The arrival of agriculture into Europe during the Neolithic transition brought a significant shift in human lifestyle and subsistence. However, the conditions under which the spread of the new culture and technologies occurred are still debated. Similarly, the roles played by women and men during the Neolithic transition are not well understood, probably due to the fact that mitochondrial DNA (mtDNA) and Y chromosome (NRY) data are usually studied independently rather than within the same statistical framework. Here, we applied an integrative approach, using different model-based inferential techniques, to analyse published datasets from contemporary and ancient European populations. By integrating mtDNA and NRY data into the same admixture approach, we show that both males and females underwent the same admixture history and both support the demic diffusion model of Ammerman and Cavalli-Sforza. Similarly, the patterns of genetic diversity found in extant and ancient populations demonstrate that both modern and ancient mtDNA support the demic diffusion model. They also show that population structure and differential growth between farmers and hunter-gatherers are necessary to explain both types of data. However, we also found some differences between male and female markers, suggesting that the female effective population size was larger than that of the males, probably due to different demographic histories. We argue that these differences are most probably related to the various shifts in cultural practices and lifestyles that followed the Neolithic Transition, such as sedentism, the shift from polygyny to monogamy or the increase of patrilocality.

15.
Bartolucci F  Pennoni F 《Biometrics》2007,63(2):568-578
We propose an extension of the latent class model for the analysis of capture-recapture data which allows us to take into account the effect of a capture on the behavior of a subject with respect to future captures. The approach is based on the assumption that the variable indexing the latent class of a subject follows a Markov chain with transition probabilities depending on the previous capture history. Several constraints are allowed on these transition probabilities and on the parameters of the conditional distribution of the capture configuration given the latent process. We also allow for the presence of discrete explanatory variables, which may affect the parameters of the latent process. To estimate the resulting models, we rely on the conditional maximum likelihood approach and for this aim we outline an EM algorithm. We also give some simple rules for point and interval estimation of the population size. The approach is illustrated by applying it to two data sets concerning small mammal populations.

16.
Linkage disequilibrium (LD) is of great interest for gene mapping and the study of population history. We propose a multilocus model for LD, based on the decay of haplotype sharing (DHS). The DHS model is most appropriate when the LD in which one is interested is due to the introduction of a variant on an ancestral haplotype, with recombinations in succeeding generations resulting in preservation of only a small region of the ancestral haplotype around the variant. This is generally the scenario of interest for gene mapping by LD. The DHS parameter is a measure of LD that can be interpreted as the expected genetic distance to which the ancestral haplotype is preserved, or, equivalently, 1/(time in generations to the ancestral haplotype). The method allows for multiple origins of alleles and for mutations, and it takes into account missing observations and ambiguities in haplotype determination, via a hidden Markov model. Whereas most commonly used measures of LD apply to pairs of loci, the DHS measure is designed for application to the densely mapped haplotype data that are increasingly available. The DHS method explicitly models the dependence among multiple tightly linked loci on a chromosome. When the assumptions about population structure are sufficiently tractable, the estimate of LD is obtained by maximum likelihood. For more-complicated models of population history, we find means and covariances based on the model and solve a quasi-score estimating equation. Simulations show that this approach works extremely well both for estimation of LD and for fine mapping. We apply the DHS method to published data sets for cystic fibrosis and progressive myoclonus epilepsy.

17.
Sex-biased admixture has been observed in a wide variety of admixed populations. Genetic variation in sex chromosomes and functions of quantities computed from sex chromosomes and autosomes have often been examined to infer patterns of sex-biased admixture, typically using statistical approaches that do not mechanistically model the complexity of a sex-specific history of admixture. Here, expanding on a model of Verdu and Rosenberg (2011) that did not include sex specificity, we develop a model that mechanistically examines sex-specific admixture histories. Under the model, multiple source populations contribute to an admixed population, potentially with their male and female contributions varying over time. In an admixed population descended from two source groups, we derive the moments of the distribution of the autosomal admixture fraction from a specific source population as a function of sex-specific introgression parameters and time. Considering admixture processes that are constant in time, we demonstrate that, surprisingly, although the mean autosomal admixture fraction from a specific source population does not reveal a sex bias in the admixture history, the variance of autosomal admixture is informative about sex bias. Specifically, the long-term variance decreases as the sex bias from a contributing source population increases. This result can be viewed as analogous to the reduction in effective population size for populations with an unequal number of breeding males and females. Our approach suggests that it may be possible to use the effect of sex-biased admixture on autosomal DNA to assist with methods for inference of the history of complex sex-biased admixture processes.
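The analogy drawn in item 17 refers to a standard population-genetics result: with N_m breeding males and N_f breeding females, the autosomal effective size is 4 * N_m * N_f / (N_m + N_f). The sketch below only illustrates that classical formula; it is not the admixture-variance model derived in the paper, and the function name is hypothetical.

```python
def effective_size_unequal_sexes(n_males, n_females):
    """Autosomal effective population size with unequal numbers of breeders.

    Classical formula: N_e = 4 * N_m * N_f / (N_m + N_f).
    """
    if n_males <= 0 or n_females <= 0:
        raise ValueError("both sexes need at least one breeder")
    return 4.0 * n_males * n_females / (n_males + n_females)


if __name__ == "__main__":
    # Same census size of 100 breeders, increasingly skewed sex ratio.
    for males, females in [(50, 50), (25, 75), (10, 90)]:
        print(males, females, effective_size_unequal_sexes(males, females))
```

For a fixed total number of breeders, the effective size is largest when the sexes are balanced and shrinks as the ratio becomes more skewed.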

18.
DNA extracted from hair or faeces shows increasing promise for censusing populations whose individuals are difficult to locate. To date, the main problem with this approach has been that genotyping errors are common. If these errors are not identified, counting genotypes is likely to overestimate the number of individuals in a population. Here, we describe an algorithm that uses maximum likelihood estimates of genotyping error rates to calculate the evidence that samples came from the same individual. We test this algorithm with a hypothetical model of genotyping error and show that this algorithm works well with substantial rates of genotyping error and reasonable amounts of data. Additional work is necessary to develop statistical models of error in empirical data.

19.
Anderson EC 《Genetics》2005,170(2):955-967
This article presents an efficient importance-sampling method for computing the likelihood of the effective size of a population under the coalescent model of Berthier et al. Previous computational approaches, using Markov chain Monte Carlo, required many minutes to several hours to analyze small data sets. The approach presented here is orders of magnitude faster and can provide an approximation to the likelihood curve, even for large data sets, in a matter of seconds. Additionally, confidence intervals on the estimated likelihood curve provide a useful estimate of the Monte Carlo error. Simulations show the importance sampling to be stable across a wide range of scenarios and show that the N(e) estimator itself performs well. Further simulations show that the 95% confidence intervals around the N(e) estimate are accurate. User-friendly software implementing the algorithm for Mac, Windows, and Unix/Linux is available for download. Applications of this computational framework to other problems are discussed.

20.
We introduce the mid-depth method, a practical approach for testing hypotheses of demographic history using genealogies reconstructed from sequence data. The relative positions of internal nodes within a genealogy contain information about past population dynamics. We explain how this information can be used to (1) test the null hypothesis of constant population size and (2) estimate the growth rate and current population size of an exponentially growing population. Simulation tests indicate that, as expected, estimates of exponential growth rates are sometimes biased. The mid-depth method is computationally rapid and does not require knowledge of the sample's mutation rate. However, it does assume that the reconstructed genealogy is correct and is therefore best suited to the analysis of variation-rich viral data sets. When applied to HIV-1 sequence data, the mid-depth method provides phylogenetic evidence of different exponential growth rates for subtypes A and B. We posit that this difference in growth rate reflects the different transmission routes and epidemiological histories of the two subtypes.
