Similar literature
 20 similar articles found
1.
Anderson EC 《Genetics》2005,170(2):955-967
This article presents an efficient importance-sampling method for computing the likelihood of the effective size of a population under the coalescent model of Berthier et al. Previous computational approaches, using Markov chain Monte Carlo, required many minutes to several hours to analyze small data sets. The approach presented here is orders of magnitude faster and can provide an approximation to the likelihood curve, even for large data sets, in a matter of seconds. Additionally, confidence intervals on the estimated likelihood curve provide a useful estimate of the Monte Carlo error. Simulations show the importance sampling to be stable across a wide range of scenarios and show that the N(e) estimator itself performs well. Further simulations show that the 95% confidence intervals around the N(e) estimate are accurate. User-friendly software implementing the algorithm for Mac, Windows, and Unix/Linux is available for download. Applications of this computational framework to other problems are discussed.

2.
A mixture model approach is presented for the mapping of one or more quantitative trait loci (QTLs) in complex populations. In order to exploit the full power of complete linkage maps the simultaneous likelihood of phenotype and a multilocus (all markers and putative QTLs) genotype is computed. Maximum likelihood estimation in our mixture models is implemented via an Expectation-Maximization algorithm: exact, stochastic or Monte Carlo EM by using a simple and flexible Gibbs sampler. Parameters include allele frequencies of markers and QTLs, discrete or normal effects of biallelic or multiallelic QTLs, and homogeneous or heterogeneous residual variances. As an illustration a dairy cattle data set consisting of twenty half-sib families has been reanalyzed. We discuss the potential which our and other approaches have for realistic multiple-QTL analyses in complex populations.
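
As a rough illustration of the exact EM variant mentioned in this abstract, the sketch below fits a two-component normal mixture with a common residual variance. It is a toy under stated assumptions, not the multilocus model of the paper: the two components stand in for two unobserved QTL genotype classes, and the function names (`normal_pdf`, `em_two_component`), starting values, and simulated data are all hypothetical.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def em_two_component(data, n_iter=200):
    """Exact EM for a two-component normal mixture with a common variance.

    Hypothetical toy: components 0 and 1 stand in for two unobserved
    QTL genotype classes; this is not the model of the abstract.
    """
    mu = [min(data), max(data)]           # crude starting means
    sd = (max(data) - min(data)) / 4.0    # crude starting residual sd
    weight = 0.5                          # mixing proportion of component 0
    n = len(data)
    for _ in range(n_iter):
        # E-step: posterior probability that each phenotype comes from component 0
        resp = []
        for x in data:
            a = weight * normal_pdf(x, mu[0], sd)
            b = (1.0 - weight) * normal_pdf(x, mu[1], sd)
            resp.append(a / (a + b))
        # M-step: weighted means, pooled variance, mixing proportion
        n0 = sum(resp)
        n1 = n - n0
        mu[0] = sum(r * x for r, x in zip(resp, data)) / n0
        mu[1] = sum((1.0 - r) * x for r, x in zip(resp, data)) / n1
        var = sum(r * (x - mu[0]) ** 2 + (1.0 - r) * (x - mu[1]) ** 2
                  for r, x in zip(resp, data)) / n
        sd = math.sqrt(var)
        weight = n0 / n
    return weight, mu, sd


# Simulated phenotypes: 30% from N(0, 1) and 70% from N(4, 1).
rng = random.Random(1)
data = ([rng.gauss(0.0, 1.0) for _ in range(300)]
        + [rng.gauss(4.0, 1.0) for _ in range(700)])
weight, mu, sd = em_two_component(data)
print(weight, mu, sd)
```

In the stochastic and Monte Carlo EM variants the abstract refers to, the E-step expectations would be replaced by averages over sampled genotype configurations (for example from a Gibbs sampler) rather than computed exactly as here.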

3.
Markov chain Monte Carlo (MCMC) has recently gained use as a method of estimating required probability and likelihood functions in pedigree analysis, when exact computation is impractical. However, when a multiallelic locus is involved, irreducibility of the constructed Markov chain, an essential requirement of the MCMC method, may fail. Solutions proposed by several researchers, which do not identify all the noncommunicating sets of genotypic configurations, are inefficient with highly polymorphic loci. This is a particularly serious problem in linkage analysis, because highly polymorphic markers are much more informative and thus are preferred. In the present paper, we describe an algorithm that finds all the noncommunicating classes of genotypic configurations on any pedigree. This leads to a more efficient method of defining an irreducible Markov chain. Examples, including a pedigree from a genetic study of familial Alzheimer disease, are used to illustrate how the algorithm works and how penetrances are modified for specific individuals to ensure irreducibility.

4.
B R Smith  C M Herbinger  H R Merry 《Genetics》2001,158(3):1329-1338
Two Markov chain Monte Carlo algorithms are proposed that allow the partitioning of individuals into full-sib groups using single-locus genetic marker data when no parental information is available. These algorithms present a method of moving through the sibship configuration space and locating the configuration that maximizes an overall score on the basis of pairwise likelihood ratios of being full-sib or unrelated or maximizes the full joint likelihood of the proposed family structure. Using these methods, up to 757 out of 759 Atlantic salmon were correctly classified into 12 full-sib families of unequal size using four microsatellite markers. Large-scale simulations were performed to assess the sensitivity of the procedures to the number of loci and number of alleles per locus, the allelic distribution type, the distribution of families, and the independent knowledge of population allelic frequencies. The number of loci and the number of alleles per locus had the most impact on accuracy. Very good accuracy can be obtained with as few as four loci when they have at least eight alleles. Accuracy decreases when using allelic frequencies estimated in small target samples with skewed family distributions with the pairwise likelihood approach. We present an iterative approach that partly corrects that problem. The full likelihood approach is less sensitive to the precision of allelic frequencies estimates but did not perform as well with the large data set or when little information was available (e.g., four loci with four alleles).

5.
R Guerra  Y Wan  A Jia  C I Amos  J C Cohen 《Human heredity》1999,49(3):146-153
Robust genetic models are used to assess linkage between a quantitative trait and genetic variation at a specific locus using allele-sharing data. Little is known about the relative performance of different possible significance tests under these models. Under the robust variance components model approach there are several alternatives: standard Wald and likelihood ratio tests, a quasilikelihood Wald test, and a Monte Carlo test. This paper reports on the relative performance (significance level and power) of the robust sibling pair test and the different alternatives under the robust variance components model. Simulations show that (1) for a fixed sample size of nuclear families, the variance components model approach is more powerful than the robust sibling pair approach; (2) when the number of nuclear families is at least approximately 100 and heritability at the trait locus is moderate to high (>0.20) all tests based on the variance components model are equally effective; (3) when the number of nuclear families is less than approximately 100 or heritability at the trait locus is low (<0.20), on balance, the Monte Carlo test provides the best power and is the most valid. The different testing procedures are applied to determine which are able to detect the known association between low density lipoprotein cholesterol and the common genotypes at the locus encoding apolipoprotein E. Results from this application show that the robust sibling pair method may be more effective in practice than that indicated by simulations.

6.
Genetic assignment methods use genotype likelihoods to draw inference about where individuals were or were not born, potentially allowing direct, real-time estimates of dispersal. We used simulated data sets to test the power and accuracy of Monte Carlo resampling methods in generating statistical thresholds for identifying F0 immigrants in populations with ongoing gene flow, and hence for providing direct, real-time estimates of migration rates. The identification of accurate critical values required that resampling methods preserved the linkage disequilibrium deriving from recent generations of immigrants and reflected the sampling variance present in the data set being analysed. A novel Monte Carlo resampling method taking into account these aspects was proposed and its efficiency was evaluated. Power and error were relatively insensitive to the frequency assumed for missing alleles. Power to identify F0 immigrants was improved by using large sample size (up to about 50 individuals) and by sampling all populations from which migrants may have originated. A combination of plotting genotype likelihoods and calculating mean genotype likelihood ratios (DLR) appeared to be an effective way to predict whether F0 immigrants could be identified for a particular pair of populations using a given set of markers.
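
The general idea of a Monte Carlo critical value for a genotype likelihood can be sketched as follows. This is deliberately the simplest baseline: alleles are drawn independently from fixed population frequencies, so it does not preserve linkage disequilibrium or sampling variance, which the abstract identifies as necessary for accurate thresholds. The function names, the allele frequencies, and the `missing_freq` default are assumptions made for illustration only.

```python
import math
import random


def genotype_log10_likelihood(genotype, freqs, missing_freq=0.01):
    """log10 probability of a multilocus diploid genotype, assuming
    Hardy-Weinberg proportions and independent loci. Alleles absent from
    the frequency table get the assumed frequency `missing_freq`."""
    total = 0.0
    for (a, b), locus in zip(genotype, freqs):
        pa = locus.get(a, missing_freq)
        pb = locus.get(b, missing_freq)
        p = pa * pb if a == b else 2.0 * pa * pb
        total += math.log10(p)
    return total


def simulate_genotype(freqs, rng):
    """Draw two alleles per locus independently from the allele frequencies."""
    genotype = []
    for locus in freqs:
        alleles = list(locus)
        weights = [locus[a] for a in alleles]
        a, b = rng.choices(alleles, weights=weights, k=2)
        genotype.append((a, b))
    return genotype


def critical_value(freqs, alpha=0.01, n_sim=10000, seed=0):
    """Lower alpha-quantile of simulated resident genotype likelihoods."""
    rng = random.Random(seed)
    sims = sorted(
        genotype_log10_likelihood(simulate_genotype(freqs, rng), freqs)
        for _ in range(n_sim)
    )
    return sims[int(alpha * n_sim)]


# Hypothetical population: five loci with the same three-allele distribution.
freqs = [{"A": 0.6, "B": 0.3, "C": 0.1} for _ in range(5)]
threshold = critical_value(freqs)
resident = [("A", "B")] * 5   # common genotype at every locus
outlier = [("C", "C")] * 5    # rare homozygote at every locus
print(threshold,
      genotype_log10_likelihood(resident, freqs),
      genotype_log10_likelihood(outlier, freqs))
```

An individual whose likelihood in the population where it was sampled falls below `threshold` would be flagged as a candidate immigrant at level `alpha`. A resampling scheme closer to the one the abstract proposes would have to resample in a way that retains the multilocus structure of recent immigrant ancestry.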

7.
Anderson EC  Garza JC 《Genetics》2006,172(4):2567-2582
Likelihood-based parentage inference depends on the distribution of a likelihood-ratio statistic, which, in most cases of interest, cannot be exactly determined, but only approximated by Monte Carlo simulation. We provide importance-sampling algorithms for efficiently approximating very small tail probabilities in the distribution of the likelihood-ratio statistic. These importance-sampling methods allow the estimation of small false-positive rates and hence permit likelihood-based inference of parentage in large studies involving a great number of potential parents and many potential offspring. We investigate the performance of these importance-sampling algorithms in the context of parentage inference using single-nucleotide polymorphism (SNP) data and find that they may accelerate the computation of tail probabilities >1 millionfold. We subsequently use the importance-sampling algorithms to calculate the power available with SNPs for large-scale parentage studies, paying particular attention to the effect of genotyping errors and the occurrence of related individuals among the members of the putative mother-father-offspring trios. These simulations show that 60-100 SNPs may allow accurate pedigree reconstruction, even in situations involving thousands of potential mothers, fathers, and offspring. In addition, we compare the power of exclusion-based parentage inference to that of the likelihood-based method. Likelihood-based inference is much more powerful under many conditions; exclusion-based inference would require 40% more SNP loci to achieve the same accuracy as the likelihood-based approach in one common scenario. Our results demonstrate that SNPs are a powerful tool for parentage inference in large managed and/or natural populations.
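
Why importance sampling helps with very small tail probabilities can be seen in a toy case. The sketch below estimates P(Z > 4.5) for a standard normal Z, which is not the likelihood-ratio statistic of the abstract but shows the same mechanism: sample from a proposal centred on the rare region and reweight by the density ratio. The function names and the choice of a shifted-normal proposal are assumptions of this example.

```python
import math
import random


def tail_prob_naive(threshold, n, rng):
    """Plain Monte Carlo: fraction of N(0, 1) draws above the threshold."""
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > threshold)
    return hits / n


def tail_prob_importance(threshold, n, rng):
    """Importance sampling: draw from N(threshold, 1) and reweight each
    draw by the ratio of the target density to the proposal density."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            # phi(x) / phi(x - threshold) = exp(-threshold * x + threshold^2 / 2)
            total += math.exp(-threshold * x + 0.5 * threshold ** 2)
    return total / n


threshold = 4.5
n = 20000
exact = 0.5 * math.erfc(threshold / math.sqrt(2.0))
naive = tail_prob_naive(threshold, n, random.Random(3))
weighted = tail_prob_importance(threshold, n, random.Random(3))
print(exact, naive, weighted)
```

With 20,000 draws the naive estimator expects fewer than one exceedance, so it usually returns zero, while roughly half of the proposal draws contribute to the weighted estimate.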

8.
Linkage disequilibrium (LD) testing has become a popular and effective method of fine-scale disease-gene localization. It has been proposed that LD testing could also be used for genome screening, particularly as dense maps of diallelic markers become available and automation allows inexpensive genotyping of diallelic markers. We compare diallelic markers and multiallelic markers in terms of sample sizes required for detection of LD, by use of a single marker locus in a case-control study, for rare monophyletic diseases with Mendelian inheritance. We extrapolate from our results to discuss the feasibility of single-marker LD screening in more-complex situations. We have used a deterministic population genetic model to calculate the expected power to detect LD as a function of marker density, age of mutation, number of marker alleles, mode of inheritance of a rare disease, and sample size. Our calculations show that multiallelic markers always have more power to detect LD than do diallelic markers (under otherwise equivalent conditions) and that the ratio of the number of diallelic to the number of multiallelic markers needed for equivalent power increases with mutation age and complexity of mode of inheritance. Power equivalent to that achieved by a multiallelic screen can theoretically be achieved by use of a more dense diallelic screen, but mapping panels of the necessary resolution are not currently available and may be difficult to achieve. Genome screening that uses single-marker LD testing may therefore be feasible only for young (<20 generations), rare, monophyletic Mendelian diseases, such as may be found in rapidly growing genetic isolates.

9.
Distinguishing migration from isolation: a Markov chain Monte Carlo approach   Cited by: 41 (self-citations: 0, citations by others: 41)
Nielsen R  Wakeley J 《Genetics》2001,158(2):885-896
A Markov chain Monte Carlo method for estimating the relative effects of migration and isolation on genetic diversity in a pair of populations from DNA sequence data is developed and tested using simulations. The two populations are assumed to be descended from a panmictic ancestral population at some time in the past and may (or may not) after that be connected by migration. The use of a Markov chain Monte Carlo method allows the joint estimation of multiple demographic parameters in either a Bayesian or a likelihood framework. The parameters estimated include the migration rate for each population, the time since the two populations diverged from a common ancestral population, and the relative size of each of the two current populations and of the common ancestral population. The results show that even a single nonrecombining genetic locus can provide substantial power to test the hypothesis of no ongoing migration and/or to test models of symmetric migration between the two populations. The use of the method is illustrated in an application to mitochondrial DNA sequence data from a fish species: the threespine stickleback (Gasterosteus aculeatus).

10.
A Bayesian method for fine mapping is presented, which deals with multiallelic markers (with two or more alleles), unknown phase, missing data, multiple causal variants, and both continuous and binary phenotypes. We consider small chromosomal segments spanned by a dense set of closely linked markers and putative genes only at marker points. In the phenotypic model, locus-specific indicator variables are used to control inclusion in or exclusion from marker contributions. To account for covariance between consecutive loci and to control fluctuations in association signals along a candidate region we introduce a joint prior for the indicators that depends on genetic or physical map distances. The potential of the method, including posterior estimation of trait-associated loci, their effects, linkage disequilibrium pattern due to close linkage of loci, and the age of a causal variant (time to most recent common ancestor), is illustrated with the well-known cystic fibrosis and Friedreich ataxia data sets by assuming that haplotypes were not available. In addition, simulation analysis with large genetic distances is shown. Estimation of model parameters is based on Markov chain Monte Carlo (MCMC) sampling and is implemented using WinBUGS. The model specification code is freely available for research purposes from http://www.rni.helsinki.fi/~mjs/.

11.
M. K. Kuhner  J. Yamato    J. Felsenstein 《Genetics》1995,140(4):1421-1430
We present a new way to make a maximum likelihood estimate of the parameter 4N(e)μ (effective population size times mutation rate per site, or θ) based on a population sample of molecular sequences. We use a Metropolis-Hastings Markov chain Monte Carlo method to sample genealogies in proportion to the product of their likelihood with respect to the data and their prior probability with respect to a coalescent distribution. A specific value of θ must be chosen to generate the coalescent distribution, but the resulting trees can be used to evaluate the likelihood at other values of θ, generating a likelihood curve. This procedure concentrates sampling on those genealogies that contribute most of the likelihood, allowing estimation of meaningful likelihood curves based on relatively small samples. The method can potentially be extended to cases involving varying population size, recombination, and migration.
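
The step of reusing genealogies sampled at one driving value θ0 to evaluate the likelihood at other values of θ rests on the identity L(θ)/L(θ0) = E[P(G|θ)/P(G|θ0)], where the expectation is over genealogies G sampled in proportion to P(D|G)P(G|θ0). The sketch below checks this on a one-dimensional toy in which a positive number `g` stands in for a genealogy, the "coalescent prior" is an exponential density with rate θ, and P(D|G) is a normal density. All of these stand-ins, and every function name, are assumptions of this example rather than the model of the paper.

```python
import math
import random


def log_data_likelihood(g, data=2.0):
    """Toy stand-in for log P(D | G): normal density of the data around g."""
    return -0.5 * (data - g) ** 2 - 0.5 * math.log(2.0 * math.pi)


def log_prior(g, theta):
    """Toy stand-in for log P(G | theta): exponential density with rate theta."""
    if g <= 0.0:
        return float("-inf")
    return math.log(theta) - theta * g


def sample_genealogies(theta0, n_steps, rng, step=1.0, burn_in=1000):
    """Random-walk Metropolis sampler targeting P(D | G) * P(G | theta0)."""
    g = 1.0
    log_target = log_data_likelihood(g) + log_prior(g, theta0)
    samples = []
    for i in range(n_steps + burn_in):
        proposal = g + rng.uniform(-step, step)
        log_new = log_data_likelihood(proposal) + log_prior(proposal, theta0)
        # Symmetric proposal, so the acceptance ratio is the target ratio.
        if rng.random() < math.exp(min(0.0, log_new - log_target)):
            g, log_target = proposal, log_new
        if i >= burn_in:
            samples.append(g)
    return samples


def relative_likelihood(theta, theta0, samples):
    """Estimate L(theta) / L(theta0) by averaging prior ratios over the samples."""
    ratios = [math.exp(log_prior(g, theta) - log_prior(g, theta0)) for g in samples]
    return sum(ratios) / len(ratios)


samples = sample_genealogies(theta0=1.0, n_steps=40000, rng=random.Random(7))
for theta in (0.5, 1.0, 2.0):
    print(theta, relative_likelihood(theta, 1.0, samples))
```

For this toy the likelihood has a closed form, L(θ) = θ·exp(θ²/2 − 2θ)·Φ(2 − θ), which gives ratios of about 1.036 at θ = 0.5 and 0.721 at θ = 2 relative to θ0 = 1. The estimate degrades as θ moves far from θ0, because the sampled values then cover the relevant region poorly.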

12.
The problem of inferring haplotypes from genotypes of single nucleotide polymorphisms (SNPs) is essential for the understanding of genetic variation within and among populations, with important applications to the genetic analysis of disease propensities and other complex traits. The problem can be formulated as a mixture model, where the mixture components correspond to the pool of haplotypes in the population. The size of this pool is unknown; indeed, knowing the size of the pool would correspond to knowing something significant about the genome and its history. Thus methods for fitting the genotype mixture must crucially address the problem of estimating a mixture with an unknown number of mixture components. In this paper we present a Bayesian approach to this problem based on a nonparametric prior known as the Dirichlet process. The model also incorporates a likelihood that captures statistical errors in the haplotype/genotype relationship trading off these errors against the size of the pool of haplotypes. We describe an algorithm based on Markov chain Monte Carlo for posterior inference in our model. The overall result is a flexible Bayesian method, referred to as DP-Haplotyper, that is reminiscent of parsimony methods in its preference for small haplotype pools. We further generalize the model to treat pedigree relationships (e.g., trios) between the population's genotypes. We apply DP-Haplotyper to the analysis of both simulated and real genotype data, and compare to extant methods.

13.
A General Monte Carlo Method for Mapping Multiple Quantitative Trait Loci   Cited by: 2 (self-citations: 0, citations by others: 2)
R. C. Jansen 《Genetics》1996,142(1):305-311
In this paper we address the mapping of multiple quantitative trait loci (QTLs) in line crosses for which the genetic data are highly incomplete. Such complicated situations occur, for instance, when dominant markers are used or when unequally informative markers are used in experiments with outbred populations. We describe a general and flexible Monte Carlo expectation-maximization (Monte Carlo EM) algorithm for fitting multiple-QTL models to such data. Implementation of this algorithm is straightforward in standard statistical software, but computation may take much time. The method may be generalized to cope with more complex models for animal and human pedigrees. A practical example is presented, where a three-QTL model is adopted in an outbreeding situation with dominant markers. The example is concerned with the linkage between randomly amplified polymorphic DNA (RAPD) markers and QTLs for partial resistance to Fusarium oxysporum in lily.

14.
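A method for constructing linkage maps in organisms with achiasmatic gametogenesis   Cited by: 1 (self-citations: 0, citations by others: 1)
Based on the achiasmatic genetic model, we propose a maximum likelihood method for calculating the recombination rate, and we have developed computer software for constructing linkage maps of achiasmatic organisms (F2 populations). Sex-linked molecular markers can be detected with a chi-square test. For the achiasmatic case, Monte Carlo simulation was used to compare the recombination-rate estimates and the mapping efficiency of the chiasmatic and achiasmatic genetic models. The simulations show that the achiasmatic model gives unbiased estimates, whereas the chiasmatic model gives estimates that are only half the true value. Under all equivalent conditions, mapping efficiency based on the achiasmatic model was higher than that based on the chiasmatic model (without correction). For organisms with achiasmatic gametogenesis, calculating the recombination rate under the achiasmatic model yields satisfactory mapping efficiency.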

15.
The multispecies coalescent provides an elegant theoretical framework for estimating species trees and species demographics from genetic markers. However, practical applications of the multispecies coalescent model are limited by the need to integrate or sample over all gene trees possible for each genetic marker. Here we describe a polynomial-time algorithm that computes the likelihood of a species tree directly from the markers under a finite-sites model of mutation effectively integrating over all possible gene trees. The method applies to independent (unlinked) biallelic markers such as well-spaced single nucleotide polymorphisms, and we have implemented it in SNAPP, a Markov chain Monte Carlo sampler for inferring species trees, divergence dates, and population sizes. We report results from simulation experiments and from an analysis of 1997 amplified fragment length polymorphism loci in 69 individuals sampled from six species of Ourisia (New Zealand native foxglove).

16.
We introduce a Monte Carlo approach to combined segregation and linkage analysis of a quantitative trait observed in an extended pedigree. In conjunction with the Monte Carlo method of likelihood-ratio evaluation proposed by Thompson and Guo, the method provides for estimation and hypothesis testing. The greatest attraction of this approach is its ability to handle complex genetic models and large pedigrees. Two examples illustrate the practicality of the method. One is of simulated data on a large pedigree; the other is a reanalysis of published data previously analyzed by other methods.

17.
The genetic analysis of characters that change as a function of some independent and continuous variable has received increasing attention in the biological and statistical literature. Previous work in this area has focused on the analysis of normally distributed characters that are directly observed. We propose a framework for the development and specification of models for a quantitative genetic analysis of function-valued characters that are not directly observed, such as genetic variation in age-specific mortality rates or complex threshold characters. We employ a hybrid Markov chain Monte Carlo algorithm involving a Monte Carlo EM algorithm coupled with a Markov chain approximation to the likelihood, which is quite robust and provides accurate estimates of the parameters in our models. The methods are investigated using simulated data and are applied to a large data set measuring mortality rates in the fruit fly, Drosophila melanogaster.

18.
We describe a novel approach to deducing order parameters and correlation times in proteins using a Bayesian statistical method, and show how likelihood contours, P(τ, S), and confidence levels can be obtained. These results are then compared with those obtained from a simple graphical method, as well as those from Monte Carlo simulations. The Bayes approach has the advantage that it is simple and accurate. Unlike Monte Carlo methods, it gives useful contour plots of probability (also not provided by the simple graphical method), and provides likelihood/confidence information. In addition, the Bayesian approach gives results in very good agreement with those obtained from Monte Carlo simulations, and as such use of Bayesian statistical methods appears to have a promising future for studies of order and dynamics in macromolecules.

19.
Using genetic marker data, we have developed a general methodology for estimating genetic relationships between a set of individuals. The purpose of this paper is to illustrate the practical utility of these methods as applied to the problem of paternity testing. Bayesian methods are used to compute the posterior probability distribution of the genetic relationship parameters. Use of an interval-estimation approach rather than a hypothesis-testing one avoids the problem of the specification of an appropriate null hypothesis in calculating the probability of paternity. Monte Carlo methods are used to evaluate the utility of two sets of genetic markers in obtaining suitably precise estimates of genetic relationship as well as the effect of the prior distribution chosen. Results indicate that with currently available markers a "true" father may be reliably distinguished from any other genetic relationship to the child and that with a reasonable number of markers one can often discriminate between an unrelated individual and one with a second-degree relationship to the child.

20.
We describe an extension to matched case-control studies of the parametric modelling framework developed by Diggle (1990) and Diggle and Rowlingson (1994) to investigate raised risk around putative sources of environmental pollution. We use a conditional likelihood approach for the family of risk functions considered in Diggle and Rowlingson (1994). We show that the likelihood surface that results from these models may be highly irregular, and provide a Bayesian analysis in which we investigate the posterior distribution using Markov chain Monte Carlo. An analysis of one-one matched data that were collected to investigate the relationship between respiratory disease and distance to roads in East London is presented.
