首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Lee SH  Van der Werf JH  Tier B 《Genetics》2005,171(4):2063-2072
A linkage analysis for finding inheritance states and haplotype configurations is an essential process for linkage and association mapping. The linkage analysis is routinely based upon observed pedigree information and marker genotypes for individuals in the pedigree. It is not feasible for exact methods to use all such information for a large complex pedigree especially when there are many missing genotypic data. Proposed Markov chain Monte Carlo approaches such as a single-site Gibbs sampler or the meiosis Gibbs sampler are able to handle a complex pedigree with sparse genotypic data; however, they often have reducibility problems, causing biased estimates. We present a combined method, applying the random walk approach to the reducible sites in the meiosis sampler. Therefore, one can efficiently obtain reliable estimates such as identity-by-descent coefficients between individuals based on inheritance states or haplotype configurations, and a wider range of data can be used for mapping of quantitative trait loci within a reasonable time.  相似文献   

2.
Tong L  Thompson E 《Human heredity》2008,65(3):142-153
To detect the positions of disease loci, lod scores are calculated at multiple chromosomal positions given trait and marker data on members of pedigrees. Exact lod score calculations are often impossible when the size of the pedigree and the number of markers are both large. In this case, a Markov Chain Monte Carlo (MCMC) approach provides an approximation. However, to provide accurate results, mixing performance is always a key issue in these MCMC methods. In this paper, we propose two methods to improve MCMC sampling and hence obtain more accurate lod score estimates in shorter computation time. The first improvement generalizes the block-Gibbs meiosis (M) sampler to multiple meiosis (MM) sampler in which multiple meioses are updated jointly, across all loci. The second one divides the computations on a large pedigree into several parts by conditioning on the haplotypes of some 'key' individuals. We perform exact calculations for the descendant parts where more data are often available, and combine this information with sampling of the hidden variables in the ancestral parts. Our approaches are expected to be most useful for data on a large pedigree with a lot of missing data.  相似文献   

3.
Error detection for genetic data, using likelihood methods.   总被引:6,自引:3,他引:3       下载免费PDF全文
As genetic maps become denser, the effect of laboratory typing errors becomes more serious. We review a general method for detecting errors in pedigree genotyping data that is a variant of the likelihood-ratio test statistic. It pinpoints individuals and loci with relatively unlikely genotypes. Power and significance studies using Monte Carlo methods are shown by using simulated data with pedigree structures similar to the CEPH pedigrees and a larger experimental pedigree used in the study of idiopathic dilated cardiomyopathy (DCM). The studies show the index detects errors for small values of theta with high power and an acceptable false positive rate. The method was also used to check for errors in DCM laboratory pedigree data and to estimate the error rate in CEPH-chromosome 6 data. The errors flagged by our method in the DCM pedigree were confirmed by the laboratory. The results are consistent with estimated false-positive and false-negative rates obtained using simulation.  相似文献   

4.
A new method for segregation and linkage analysis, with pedigree data, is described. Reversible jump Markov chain Monte Carlo methods are used to implement a sampling scheme in which the Markov chain can jump between parameter subspaces corresponding to models with different numbers of quantitative-trait loci (QTL's). Joint estimation of QTL number, position, and effects is possible, avoiding the problems that can arise from misspecification of the number of QTL's in a linkage analysis. The method is illustrated by use of a data set simulated for the 9th Genetic Analysis Workshop; this data set had several oligogenic traits, generated by use of a 1,497-member pedigree. The mixing characteristics of the method appear to be good, and the method correctly recovers the simulated model from the test data set. The approach appears to have great potential both for robust linkage analysis and for the answering of more general questions regarding the genetic control of complex traits.  相似文献   

5.
A Gibbs sampling approach to linkage analysis.   总被引:9,自引:0,他引:9  
We present a Monte Carlo approach to estimation of the recombination fraction theta and the profile likelihood for a dichotomous trait and a single marker gene with 2 alleles. The method is an application of a technique known as 'Gibbs sampling', in which random samples of each of the unknowns (here genotypes, theta and nuisance parameters, including the allele frequencies and the penetrances) are drawn from their posterior distributions, given the data and the current values of all the other unknowns. Upon convergence, the resulting samples derive from the marginal distribution of all the unknowns, given only the data, so that the uncertainty in the specification of the nuisance parameters is reflected in the variance of the posterior distribution of theta. Prior knowledge about the distribution of theta and the nuisance parameters can be incorporated using a Bayesian approach, but adoption of a flat prior for theta and point priors for the nuisance parameters would correspond to the standard likelihood approach. The method is easy to program, runs quickly on a microcomputer, and could be generalized to multiple alleles, multipoint linkage, continuous phenotypes and more complex models of disease etiology. The basic approach is illustrated by application to data on cholesterol levels and an a low-density lipoprotein receptor gene in a single large pedigree.  相似文献   

6.
A heuristic algorithm for finding gene transmission patterns on large and complex pedigrees with partially observed genotype data is proposed. The method can be used to generate an initial point for a Markov chain Monte Carlo simulation or to check that the given pedigree and the genotype data are consistent. In small pedigrees, the algorithm is exact by exhaustively enumerating all possibilities, but, in large pedigrees, with a considerable amount of unknown data, only a subset of promising configurations can actually be checked. For that purpose, the configurations are ordered by combining the approximative conditional probability distribution of the unknown genotypes with the information on the relationships between individuals. We also introduce a way to divide the task into subparts, which has been shown to be useful in large pedigrees. The algorithm has been implemented in a program called APE (Allelic Path Explorer) and tested in three different settings with good results.  相似文献   

7.
A number of procedures have been developed that allow the genetic parameters of natural populations to be estimated using relationship information inferred from marker data rather than known pedigrees. Three published approaches are available; the regression, pair‐wise likelihood and Markov Chain Monte Carlo (MCMC) sib‐ship reconstruction methods. These were applied to body weight and molecular data collected from the Soay sheep population of St. Kilda, which has a previously determined pedigree. The regression and pair‐wise likelihood approaches do not specify an exact pedigree and yielded unreliable heritability estimates, that were sensitive to alteration of the fixed effects. The MCMC method, which specifies a pedigree prior to heritability estimation, yielded results closer to those determined using the known pedigree. In populations of low average relationship, such as the Soay sheep population, determination of a reliable pedigree is more useful than indirect approaches that do not specify a pedigree.  相似文献   

8.
In critical care tight control of blood glucose levels has been shown to lead to better clinical outcomes. The need to develop new protocols for tight glucose control, as well as the opportunity to optimize a variety of other drug therapies, has led to resurgence in model-based medical decision support in this area. One still valid hindrance to developing new model-based protocols using so-called virtual patients, retrospective clinical data, and Monte Carlo methods is the large amount of computational time and resources needed.This paper develops fast analytical-based methods for insulin-glucose system model that are generalizable to other similar systems. Exploiting the structure and partial solutions in a subset of the model is the key in finding accurate fast solutions to the full model. This approach successfully reduced computing time by factors of 5600-144 000 depending on the numerical error management method, for large (50-164 patients) virtual trials and Monte Carlo analysis. It thus allows new model-based or model-derived protocols to be rapidly developed via extensive simulation. The new method is rigorously compared to existing standard numerical solutions and is found to be highly accurate to within 0.2%.  相似文献   

9.
10.
The accurate estimation of the probability of identity by descent (IBD) at loci or genome positions of interest is paramount to the genetic study of quantitative and disease resistance traits. We present a Monte Carlo Markov Chain method to compute IBD probabilities between individuals conditional on DNA markers and on pedigree information. The IBDs can be obtained in a completely general pedigree at any genome position of interest, and all marker and pedigree information available is used. The method can be split into two steps at each iteration. First, phases are sampled using current genotypic configurations of relatives and second, crossover events are simulated conditional on phases. Internal track is kept of all founder origins and crossovers such that the IBD probabilities averaged over replicates are rapidly obtained. We illustrate the method with some examples. First, we show that all pedigree information should be used to obtain line origin probabilities in F2 crosses. Second, the distribution of genetic relationships between half and full sibs is analysed in both simulated data and in real data from an F2 cross in pigs.  相似文献   

11.
Various algorithms have been developed for variant calling using next-generation sequencing data, and various methods have been applied to reduce the associated false positive and false negative rates. Few variant calling programs, however, utilize the pedigree information when the family-based sequencing data are available. Here, we present a program, FamSeq, which reduces both false positive and false negative rates by incorporating the pedigree information from the Mendelian genetic model into variant calling. To accommodate variations in data complexity, FamSeq consists of four distinct implementations of the Mendelian genetic model: the Bayesian network algorithm, a graphics processing unit version of the Bayesian network algorithm, the Elston-Stewart algorithm and the Markov chain Monte Carlo algorithm. To make the software efficient and applicable to large families, we parallelized the Bayesian network algorithm that copes with pedigrees with inbreeding loops without losing calculation precision on an NVIDIA graphics processing unit. In order to compare the difference in the four methods, we applied FamSeq to pedigree sequencing data with family sizes that varied from 7 to 12. When there is no inbreeding loop in the pedigree, the Elston-Stewart algorithm gives analytical results in a short time. If there are inbreeding loops in the pedigree, we recommend the Bayesian network method, which provides exact answers. To improve the computing speed of the Bayesian network method, we parallelized the computation on a graphics processing unit. This allowed the Bayesian network method to process the whole genome sequencing data of a family of 12 individuals within two days, which was a 10-fold time reduction compared to the time required for this computation on a central processing unit.
This is a PLOS Computational Biology Software Article
  相似文献   

12.
Estimates of population size are critical for conservation and management, but accurate estimates are difficult to obtain for many species. Noninvasive genetic methods are increasingly used to estimate population size, particularly in elusive species such as large carnivores, which are difficult to count by most other methods. In most such studies, genotypes are treated simply as unique individual identifiers. Here, we develop a new estimator of population size based on pedigree reconstruction. The estimator accounts for individuals that were directly sampled, individuals that were not sampled but whose genotype could be inferred by pedigree reconstruction, and individuals that were not detected by either of these methods. Monte Carlo simulations show that the population estimate is unbiased and precise if sampling is of sufficient intensity and duration. Simulations also identified sampling conditions that can cause the method to overestimate or underestimate true population size; we present and discuss methods to correct these potential biases. The method detected 2–21% more individuals than were directly sampled across a broad range of simulated sampling schemes. Genotypes are more than unique identifiers, and the information about relationships in a set of genotypes can improve estimates of population size.  相似文献   

13.
With the widespread availability of SNP genotype data, there is great interest in analyzing pedigree haplotype data. Intermarker linkage disequilibrium for microsatellite markers is usually low due to their physical distance; however, for dense maps of SNP markers, there can be strong linkage disequilibrium between marker loci. Linkage analysis (parametric and nonparametric) and family-based association studies are currently being carried out using dense maps of SNP marker loci. Monte Carlo methods are often used for both linkage and association studies; however, to date there are no programs available which can generate haplotype and/or genotype data consisting of a large number of loci for pedigree structures. SimPed is a program that quickly generates haplotype and/or genotype data for pedigrees of virtually any size and complexity. Marker data either in linkage disequilibrium or equilibrium can be generated for greater than 20,000 diallelic or multiallelic marker loci. Haplotypes and/or genotypes are generated for pedigree structures using specified genetic map distances and haplotype and/or allele frequencies. The simulated data generated by SimPed is useful for a variety of purposes, including evaluating methods that estimate haplotype frequencies for pedigree data, evaluating type I error due to intermarker linkage disequilibrium and estimating empirical p values for linkage and family-based association studies.  相似文献   

14.
Pan W  Louis TA 《Biometrics》2000,56(1):160-166
We apply a linear mixed-effects model to multivariate failure time data. Computation of the regression parameters involves the Buckley-James method in an iterated Monte Carlo expectation-maximization algorithm, wherein the Monte Carlo E-step is implemented using the Metropolis-Hastings algorithm. From simulation studies, this approach compares favorably with the marginal independence approach, especially when there is a strong within-cluster correlation.  相似文献   

15.
Anderson EC 《Genetics》2005,170(2):955-967
This article presents an efficient importance-sampling method for computing the likelihood of the effective size of a population under the coalescent model of Berthier et al. Previous computational approaches, using Markov chain Monte Carlo, required many minutes to several hours to analyze small data sets. The approach presented here is orders of magnitude faster and can provide an approximation to the likelihood curve, even for large data sets, in a matter of seconds. Additionally, confidence intervals on the estimated likelihood curve provide a useful estimate of the Monte Carlo error. Simulations show the importance sampling to be stable across a wide range of scenarios and show that the N(e) estimator itself performs well. Further simulations show that the 95% confidence intervals around the N(e) estimate are accurate. User-friendly software implementing the algorithm for Mac, Windows, and Unix/Linux is available for download. Applications of this computational framework to other problems are discussed.  相似文献   

16.
Detection and Integration of Genotyping Errors in Statistical Genetics   总被引:15,自引:0,他引:15       下载免费PDF全文
Detection of genotyping errors and integration of such errors in statistical analysis are relatively neglected topics, given their importance in gene mapping. A few inopportunely placed errors, if ignored, can tremendously affect evidence for linkage. The present study takes a fresh look at the calculation of pedigree likelihoods in the presence of genotyping error. To accommodate genotyping error, we present extensions to the Lander-Green-Kruglyak deterministic algorithm for small pedigrees and to the Markov-chain Monte Carlo stochastic algorithm for large pedigrees. These extensions can accommodate a variety of error models and refrain from simplifying assumptions, such as allowing, at most, one error per pedigree. In principle, almost any statistical genetic analysis can be performed taking errors into account, without actually correcting or deleting suspect genotypes. Three examples illustrate the possibilities. These examples make use of the full pedigree data, multiple linked markers, and a prior error model. The first example is the estimation of genotyping error rates from pedigree data. The second-and currently most useful-example is the computation of posterior mistyping probabilities. These probabilities cover both Mendelian-consistent and Mendelian-inconsistent errors. The third example is the selection of the true pedigree structure connecting a group of people from among several competing pedigree structures. Paternity testing and twin zygosity testing are typical applications.  相似文献   

17.
Thompson E  Basu S 《Human heredity》2003,56(1-3):119-125
Our objective is the development of robust methods for assessment of evidence for linkage of loci affecting a complex trait to a marker linkage group, using data on extended pedigrees. Using Markov chain Monte Carlo (MCMC) methods, it is possible to sample realizations from the distribution of gene identity by descent (IBD) patterns on a pedigree, conditional on observed data YM at multiple marker loci. Measures of gene IBDW which capture joint genome sharing in extended pedigrees often have unknown and highly skewed distributions, particularly when conditioned on marker data. MCMC provides a direct estimate of the distribution of such measures. Let W be the IBD measure from data YM, and W* the IBD measure from pseudo-data Y*M simulated with the same data availability and genetic marker model as the true data YM, but in the absence of linkage. Then measures of the difference in distributions of W and W* provide evidence for linkage. This approach extracts more information from the data YM than either comparison to the pedigree prior distribution of W or use of statistics that are expectations of W given the data YM. A small example is presented.  相似文献   

18.
傅煜  雷渊才  曾伟生 《生态学报》2015,35(23):7738-7747
采用系统抽样体系江西省固定样地杉木连续观测数据和生物量数据,通过Monte Carlo法反复模拟由单木生物量模型推算区域尺度地上生物量的过程,估计了江西省杉木地上总生物量。基于不同水平建模样本量n及不同决定系数R~2的设计,分别研究了单木生物量模型参数变异性及模型残差变异性对区域尺度生物量估计不确定性的影响。研究结果表明:2009年江西省杉木地上生物量估计值为(19.84±1.27)t/hm~2,不确定性占生物量估计值约6.41%。生物量估计值和不确定性值达到平稳状态所需的运算时间随建模样本量及决定系数R~2的增大而减小;相对于模型参数变异性,残差变异性对不确定性的影响更小。  相似文献   

19.
Introduction

The Monte Carlo technique is widely used and recommended for including uncertainties LCA. Typically, 1000 or 10,000 runs are done, but a clear argument for that number is not available, and with the growing size of LCA databases, an excessively high number of runs may be a time-consuming thing. We therefore investigate if a large number of runs are useful, or if it might be unnecessary or even harmful.

Probability theory

We review the standard theory or probability distributions for describing stochastic variables, including the combination of different stochastic variables into a calculation. We also review the standard theory of inferential statistics for estimating a probability distribution, given a sample of values. For estimating the distribution of a function of probability distributions, two major techniques are available, analytical, applying probability theory and numerical, using Monte Carlo simulation. Because the analytical technique is often unavailable, the obvious way-out is Monte Carlo. However, we demonstrate and illustrate that it leads to overly precise conclusions on the values of estimated parameters, and to incorrect hypothesis tests.

Numerical illustration

We demonstrate the effect for two simple cases: one system in a stand-alone analysis and a comparative analysis of two alternative systems. Both cases illustrate that statistical hypotheses that should not be rejected in fact are rejected in a highly convincing way, thus pointing out a fundamental flaw.

Discussion and conclusions

Apart form the obvious recommendation to use larger samples for estimating input distributions, we suggest to restrict the number of Monte Carlo runs to a number not greater than the sample sizes used for the input parameters. As a final note, when the input parameters are not estimated using samples, but through a procedure, such as the popular pedigree approach, the Monte Carlo approach should not be used at all.

  相似文献   

20.
Markov chain Monte Carlo (MCMC) methods have been proposed to overcome computational problems in linkage and segregation analyses. This approach involves sampling genotypes at the marker and trait loci. Scalar-Gibbs is easy to implement, and it is widely used in genetics. However, the Markov chain that corresponds to scalar-Gibbs may not be irreducible when the marker locus has more than two alleles, and even when the chain is irreducible, mixing has been observed to be slow. These problems do not arise if the genotypes are sampled jointly from the entire pedigree. This paper proposes a method to jointly sample genotypes. The method combines the Elston-Stewart algorithm and iterative peeling, and is called the ESIP sampler. For a hypothetical pedigree, genotype probabilities are estimated from samples obtained using ESIP and also scalar-Gibbs. Approximate probabilities were also obtained by iterative peeling. Comparisons of these with exact genotypic probabilities obtained by the Elston-Stewart algorithm showed that ESIP and iterative peeling yielded genotypic probabilities that were very close to the exact values. Nevertheless, estimated probabilities from scalar-Gibbs with a chain of length 235 000, including a burn-in of 200 000 steps, were less accurate than probabilities estimated using ESIP with a chain of length 10 000, with a burn-in of 5 000 steps. The effective chain size (ECS) was estimated from the last 25 000 elements of the chain of length 125 000. For one of the ESIP samplers, the ECS ranged from 21 579 to 22 741, while for the scalar-Gibbs sampler, the ECS ranged from 64 to 671. Genotype probabilities were also estimated for a large real pedigree consisting of 3 223 individuals. For this pedigree, it is not feasible to obtain exact genotype probabilities by the Elston-Stewart algorithm. ESIP and iterative peeling yielded very similar results. However, results from scalar-Gibbs were less accurate.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号