Similar articles
 20 similar articles found (search time: 15 ms)
1.
Gao G  Hoeschele I 《Genetics》2005,171(1):365-376
Identity-by-descent (IBD) matrix calculation is an important step in quantitative trait loci (QTL) analysis using variance component models. To calculate IBD matrices efficiently for large pedigrees with large numbers of loci, an approximation method based on the reconstruction of haplotype configurations for the pedigrees is proposed. The method uses a subset of haplotype configurations with high likelihoods identified by a haplotyping method. The new method is compared with a Markov chain Monte Carlo (MCMC) method (Loki) in terms of QTL mapping performance on simulated pedigrees. Both methods yield almost identical results for the estimation of QTL positions and variance parameters, while the new method is much more computationally efficient than the MCMC approach for large pedigrees and large numbers of loci. The proposed method is also compared with an exact method (Merlin) in small simulated pedigrees, where both methods produce nearly identical estimates of position-specific kinship coefficients. The new method can be used for fine mapping with joint linkage disequilibrium and linkage analysis, which improves the power and accuracy of QTL mapping.

2.
Computations for genome scans need to adapt to the increasing use of dense diallelic markers as well as of full-chromosome multipoint linkage analysis with either diallelic or multiallelic markers. Whereas suitable exact-computation tools are available for use with small pedigrees, equivalent exact computation for larger pedigrees remains infeasible. Markov chain-Monte Carlo (MCMC)-based methods currently provide the only computationally practical option. To date, no systematic comparison of the performance of MCMC-based programs is available, nor have these programs been systematically evaluated for use with dense diallelic markers. Using simulated data, we evaluate the performance of two MCMC-based linkage-analysis programs (lm_markers from the MORGAN package and SimWalk2) under a variety of analysis conditions. Pedigrees consisted of 14, 52, or 98 individuals in 3, 5, or 6 generations, respectively, with increasing amounts of missing data in larger pedigrees. One hundred replicates of markers and trait data were simulated on a 100-cM chromosome, with up to 10 multiallelic and up to 200 diallelic markers used simultaneously for computation of multipoint LOD scores. Exact computation was available for comparison in most situations, and comparison with a perfectly informative marker or interprogram comparison was available in the remaining situations. Our results confirm the accuracy of both programs in multipoint analysis with multiallelic markers on pedigrees of varied sizes and missing-data patterns, but there are some computational differences. In contrast, for large numbers of dense diallelic markers, only the lm_markers program was able to provide accurate results within a computationally practical time. Thus, programs in the MORGAN package are the first available to provide a computationally practical option for accurate linkage analyses in genome scans with both large numbers of diallelic markers and large pedigrees.

3.
George AW 《Genetics》2005,171(2):791-801
Mapping markers from linkage data continues to be a task performed in many genetic epidemiological studies. Data collected in a study may be used to refine published map estimates and a study may use markers that do not appear in any published map. Furthermore, inaccuracies in meiotic maps can seriously bias linkage findings. To make best use of the available marker information, multilocus linkage analyses are performed. However, two computational issues greatly limit the number of markers currently mapped jointly: the number of candidate marker orders increases exponentially with marker number, and computing exact multilocus likelihoods on general pedigrees is computationally demanding. In this article, a new Markov chain Monte Carlo (MCMC) approach that solves both these computational problems is presented. The MCMC approach allows many markers to be mapped jointly, using data observed on general pedigrees with unobserved individuals. The performance of the new mapping procedure is demonstrated through the analysis of simulated and real data. The MCMC procedure performs extremely well, even when there are millions of candidate orders, and gives results superior to those of CRI-MAP.

4.
Genetic assignment methods use genotype likelihoods to draw inference about where individuals were or were not born, potentially allowing direct, real-time estimates of dispersal. We used simulated data sets to test the power and accuracy of Monte Carlo resampling methods in generating statistical thresholds for identifying F0 immigrants in populations with ongoing gene flow, and hence for providing direct, real-time estimates of migration rates. The identification of accurate critical values required that resampling methods preserved the linkage disequilibrium deriving from recent generations of immigrants and reflected the sampling variance present in the data set being analysed. A novel Monte Carlo resampling method taking into account these aspects was proposed and its efficiency was evaluated. Power and error were relatively insensitive to the frequency assumed for missing alleles. Power to identify F0 immigrants was improved by using large sample size (up to about 50 individuals) and by sampling all populations from which migrants may have originated. A combination of plotting genotype likelihoods and calculating mean genotype likelihood ratios (DLR) appeared to be an effective way to predict whether F0 immigrants could be identified for a particular pair of populations using a given set of markers.

5.
We introduce a Monte Carlo approach to combined segregation and linkage analysis of a quantitative trait observed in an extended pedigree. In conjunction with the Monte Carlo method of likelihood-ratio evaluation proposed by Thompson and Guo, the method provides for estimation and hypothesis testing. The greatest attraction of this approach is its ability to handle complex genetic models and large pedigrees. Two examples illustrate the practicality of the method. One is of simulated data on a large pedigree; the other is a reanalysis of published data previously analyzed by other methods.

6.
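Tang Zaixiang  Wang Xuefeng  Wu Wenwen  Xu Chenwu 《Hereditas (Beijing)》2006,28(9):1117-1122
The Bayesian school is an important school of statistics distinct from classical mathematical statistics, and the Bayesian statistical methods it has developed are widely applied in many fields of modern science. This paper examines the application of Bayesian statistics to genetic linkage analysis, including Bayesian estimation of the genetic recombination fraction, Bayes factor tests for genetic linkage, and construction of genetic linkage maps based on Markov chain Monte Carlo theory. Simulation studies and analyses of real examples, carried out with purpose-written SAS/IML programs, confirm the validity and practicality of Bayesian methods in genetic linkage analysis.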
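
The Bayesian estimation and Bayes factor test named in this abstract can be illustrated with a minimal sketch. It is not the authors' SAS/IML program. It assumes, purely for illustration, that the data are k recombinants among n informative meioses with a binomial likelihood, that the prior on the recombination fraction is uniform on (0, 0.5), and that "no linkage" means a recombination fraction of exactly 0.5. The function names are hypothetical.

```python
def likelihood(theta, k, n):
    """Binomial kernel for k recombinants in n meioses (constant factor omitted)."""
    return theta ** k * (1.0 - theta) ** (n - k)


def posterior_summary(k, n, grid_size=10_000):
    """Return (posterior mean of theta, Bayes factor for linkage vs. no linkage).

    Assumptions (not taken from the abstract): uniform prior on (0, 0.5),
    midpoint-rule integration on an even grid, null hypothesis theta = 0.5.
    """
    width = 0.5 / grid_size
    thetas = [(i + 0.5) * width for i in range(grid_size)]
    weights = [likelihood(t, k, n) for t in thetas]
    total = sum(weights)

    # Posterior mean: likelihood-weighted average over the grid.
    post_mean = sum(t * w for t, w in zip(thetas, weights)) / total

    # Marginal likelihood under linkage; the uniform prior density on (0, 0.5) is 2.
    marginal_linked = total * width * 2.0

    # The binomial coefficient cancels between numerator and denominator.
    bayes_factor = marginal_linked / likelihood(0.5, k, n)
    return post_mean, bayes_factor


if __name__ == "__main__":
    mean, bf = posterior_summary(k=2, n=20)
    print(f"posterior mean of theta: {mean:.3f}, Bayes factor: {bf:.1f}")
```

A Bayes factor well above 1 favours linkage under these assumptions, and a value below 1 favours free recombination. A different prior would change both numbers.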

7.
One of the most challenging areas in human genetics is the dissection of quantitative traits. In this context, the efficient use of available data is important, including, when possible, use of large pedigrees and many markers for gene mapping. In addition, methods that jointly perform linkage analysis and estimation of the trait model are appealing because they combine the advantages of a model-based analysis with the advantages of methods that do not require prespecification of model parameters for linkage analysis. Here we review a Markov chain Monte Carlo approach for such joint linkage and segregation analysis, which allows analysis of oligogenic traits in the context of multipoint linkage analysis of large pedigrees. We provide an outline for practitioners of the salient features of the method, interpretation of the results, effect of violation of assumptions, and an example analysis of a two-locus trait to illustrate the method.

8.
B Haubold  M Travisano  P B Rainey  R R Hudson 《Genetics》1998,150(4):1341-1348
The distribution of the number of pairwise differences calculated from comparisons between n haploid genomes has frequently been used as a starting point for testing the hypothesis of linkage equilibrium. For this purpose the variance of the pairwise differences, VD, is used as a test statistic to evaluate the null hypothesis that all loci are in linkage equilibrium. The problem is to determine the critical value of the distribution of VD. This critical value can be estimated either by Monte Carlo simulation or by assuming that VD is distributed normally and calculating a one-tailed 95% critical value for VD, L = E(VD) + 1.645 sqrt(Var(VD)), where E(VD) is the expectation of VD, and Var(VD) is the variance of VD. If VD (observed) > L, the null hypothesis of linkage equilibrium is rejected. Using Monte Carlo simulation we show that the formula currently available for Var(VD) is incorrect, especially for genetically highly diverse data. This has implications for hypothesis testing in bacterial populations, which are often genetically highly diverse. For this reason we derive a new, exact formula for Var(VD). The distribution of VD is examined and shown to approach normality as the sample size increases. This makes the new formula a useful tool in the investigation of large data sets, where testing for linkage using Monte Carlo simulation can be very time consuming. Application of the new formula, in conjunction with Monte Carlo simulation, to populations of Bradyrhizobium japonicum, Rhizobium leguminosarum, and Bacillus subtilis reveals linkage disequilibrium where linkage equilibrium has previously been reported.
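
The test described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' software: the null distribution is simulated by shuffling alleles independently within each locus (one common way to break associations while keeping allele frequencies), VD is computed as a population variance, and all function names are hypothetical. The exact formula for Var(VD) is not given in the abstract, so `normal_critical_value` takes E(VD) and Var(VD) as inputs rather than deriving them.

```python
import random
from itertools import combinations
from statistics import pvariance


def pairwise_differences(genomes):
    """Number of mismatching loci for every unordered pair of haploid genomes."""
    return [sum(a != b for a, b in zip(g1, g2))
            for g1, g2 in combinations(genomes, 2)]


def v_d(genomes):
    """VD: variance of the pairwise differences (population variance assumed)."""
    return pvariance(pairwise_differences(genomes))


def shuffled_within_loci(genomes, rng):
    """Permute alleles within each locus, keeping per-locus allele counts fixed."""
    columns = [list(col) for col in zip(*genomes)]
    for col in columns:
        rng.shuffle(col)
    return [tuple(row) for row in zip(*columns)]


def monte_carlo_critical_value(genomes, n_rep=1000, level=0.95, seed=0):
    """Empirical one-tailed critical value of VD under simulated linkage equilibrium."""
    rng = random.Random(seed)
    sims = sorted(v_d(shuffled_within_loci(genomes, rng)) for _ in range(n_rep))
    return sims[min(n_rep - 1, int(level * n_rep))]


def normal_critical_value(e_vd, var_vd):
    """L = E(VD) + 1.645 * sqrt(Var(VD)), the normal-approximation threshold."""
    return e_vd + 1.645 * var_vd ** 0.5


if __name__ == "__main__":
    # Toy data (made up): two clonal groups, so all four loci are fully associated.
    genomes = [(0, 0, 0, 0)] * 5 + [(1, 1, 1, 1)] * 5
    observed = v_d(genomes)
    threshold = monte_carlo_critical_value(genomes, n_rep=200)
    print("reject linkage equilibrium:", observed > threshold)
```

In both variants the decision rule is the one stated in the abstract: reject linkage equilibrium when the observed VD exceeds the critical value.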

9.
An empirical comparison between three different methods for estimation of pair-wise identity-by-descent (IBD) sharing at marker loci was conducted in order to quantify the resulting differences in power and localization precision in variance components-based linkage analysis. On the examined simulated, error-free data set, it was found that an increase in accuracy of allele sharing calculation resulted in an increase in power to detect linkage. Linkage analysis based on approximate multi-marker IBD matrices computed by a Markov chain Monte Carlo approach was much more powerful than linkage analysis based on exact single-marker IBD probabilities. A "multiple two-point" approximation to true "multipoint" IBD computation was found to be roughly intermediate in power. Both multi-marker approaches were similar to each other in accuracy of localization of the quantitative trait locus and far superior to the single-marker approach. The overall conclusions of this study with respect to power are expected to also hold for different data structures and situations, even though the degree of superiority of one approach over another depends on the specific circumstances. It should be kept in mind, however, that an increase in computational accuracy is expected to go hand in hand with a decrease in robustness to various sources of errors.

10.
For pedigrees with multiple loops, exact likelihoods cannot be computed in an acceptable time frame, and thus approximate methods are used. Some of these methods are based on breaking loops and approximating complex pedigree likelihoods using the exact likelihood of the corresponding zero-loop pedigree. Because loops are ignored, this method results in a loss of genetic information and a decrease in the power to detect linkage. To minimize this loss, an optimal set of loop breakers has to be selected. In this paper, we present a graph theory based algorithm for automatic selection of an optimal set of loop breakers. We propose using a total relationship between measured pedigree members as a proxy for power. To minimize the loss of genetic information, we suggest selecting those breakers whose duplication in a pedigree would be accompanied by a minimal loss of total relationship between measured pedigree members. We show that our algorithm compares favorably with other existing loop-breaker selection algorithms in terms of conservation of genetic information, statistical power and CPU time of subsequent linkage analysis. We implemented our method in a software package LOOP_EDGE, which is available at http://mga.bionet.nsc.ru/nlru/.

11.
Lou XY  Ma JZ  Yang MC  Zhu J  Liu PY  Deng HW  Elston RC  Li MD 《Genetics》2006,172(1):647-661
It is well known that pedigree/family data record information on the coexistence in founder haplotypes of alleles at nearby loci and the cotransmission from parent to offspring that reveal different, but complementary, profiles of the genetic architecture. Either conventional linkage analysis that assumes linkage equilibrium or family-based association tests (FBATs) capture only partial information, leading to inefficiency. For example, FBATs will fail to detect even very tight linkage in the case where no allelic association exists, while a violation of the assumption of linkage equilibrium will result in biased estimation and reduced efficiency in linkage mapping. In this article, by using a data augmentation technique and the EM algorithm, we propose a likelihood-based approach that embeds both linkage and association analyses into a unified framework for general pedigree data. Relative to either linkage or association analysis, the proposed approach is expected to have greater estimation accuracy and power. Monte Carlo simulations support our theoretical expectations and demonstrate that our new methodology: (1) is more powerful than either FBATs or classic linkage analysis; (2) can unbiasedly estimate genetic parameters regardless of whether association exists, thus remedying the bias and reduced precision of traditional linkage analysis in the presence of association; and (3) is capable of identifying tight linkage alone. The new approach also holds the theoretical advantage that it can extract statistical information to the maximum extent and thereby improve mapping accuracy and power because it integrates multilocus population-based association study and pedigree-based linkage analysis into a coherent framework. Furthermore, our method is numerically stable and computationally efficient, as compared to existing parametric methods that use the simplex algorithm or Newton-type methods to maximize high-order multidimensional likelihood functions, and also offers the computation of Fisher's information matrix. Finally, we apply our methodology to a genetic study on bone mineral density (BMD) for the vitamin D receptor (VDR) gene and find that VDR is significantly linked to BMD at the one-third region of the wrist.

12.
It is usually difficult to localize genes that cause diseases with late ages at onset. These diseases frequently exhibit complex modes of inheritance, and only recent generations are available to be genotyped and phenotyped. In this situation, multipoint analysis using traditional exact linkage analysis methods, with many markers and full pedigree information, is a computationally intractable problem. Fortunately, Monte Carlo Markov chain sampling provides a tool to address this issue. By treating age at onset as a right-censored quantitative trait, we expand the methods used by Heath (1997) and illustrate them using an Alzheimer disease (AD) data set. This approach estimates the number, sizes, allele frequencies, and positions of quantitative trait loci (QTLs). In this simultaneous multipoint linkage and segregation analysis method, the QTLs are assumed to be diallelic and to interact additively. In the AD data set, we were able to localize correctly, quickly, and accurately two known genes, despite the existence of substantial genetic heterogeneity, thus demonstrating the great promise of these methods for the dissection of late-onset oligogenic diseases.

13.
Recently, several statistical methods for estimating fine-scale recombination rates using population samples have been developed. However, currently available methods that can be applied to large-scale data are limited to approximated likelihoods. Here, we developed a full-likelihood Markov chain Monte Carlo method for estimating recombination rate under a Bayesian framework. Genealogies underlying a sampling of chromosomes are effectively modelled by using marginal individual single nucleotide polymorphism genealogies related through an ancestral recombination graph. The method is compared with two existing composite-likelihood methods using simulated data. Simulation studies show that our method performs well for different simulation scenarios. The method is applied to two human population genetic variation datasets that have been studied by sperm typing. Our results are consistent with the estimates from sperm crossover analysis.

14.
A new method for segregation and linkage analysis, with pedigree data, is described. Reversible jump Markov chain Monte Carlo methods are used to implement a sampling scheme in which the Markov chain can jump between parameter subspaces corresponding to models with different numbers of quantitative-trait loci (QTL's). Joint estimation of QTL number, position, and effects is possible, avoiding the problems that can arise from misspecification of the number of QTL's in a linkage analysis. The method is illustrated by use of a data set simulated for the 9th Genetic Analysis Workshop; this data set had several oligogenic traits, generated by use of a 1,497-member pedigree. The mixing characteristics of the method appear to be good, and the method correctly recovers the simulated model from the test data set. The approach appears to have great potential both for robust linkage analysis and for the answering of more general questions regarding the genetic control of complex traits.

15.
Lee SH  Van der Werf JH  Tier B 《Genetics》2005,171(4):2063-2072
A linkage analysis for finding inheritance states and haplotype configurations is an essential process for linkage and association mapping. The linkage analysis is routinely based upon observed pedigree information and marker genotypes for individuals in the pedigree. It is not feasible for exact methods to use all such information for a large complex pedigree especially when there are many missing genotypic data. Proposed Markov chain Monte Carlo approaches such as a single-site Gibbs sampler or the meiosis Gibbs sampler are able to handle a complex pedigree with sparse genotypic data; however, they often have reducibility problems, causing biased estimates. We present a combined method, applying the random walk approach to the reducible sites in the meiosis sampler. Therefore, one can efficiently obtain reliable estimates such as identity-by-descent coefficients between individuals based on inheritance states or haplotype configurations, and a wider range of data can be used for mapping of quantitative trait loci within a reasonable time.

16.
Detection and Integration of Genotyping Errors in Statistical Genetics   Cited 15 times in total (self-citations: 0; citations by others: 15)
Detection of genotyping errors and integration of such errors in statistical analysis are relatively neglected topics, given their importance in gene mapping. A few inopportunely placed errors, if ignored, can tremendously affect evidence for linkage. The present study takes a fresh look at the calculation of pedigree likelihoods in the presence of genotyping error. To accommodate genotyping error, we present extensions to the Lander-Green-Kruglyak deterministic algorithm for small pedigrees and to the Markov-chain Monte Carlo stochastic algorithm for large pedigrees. These extensions can accommodate a variety of error models and refrain from simplifying assumptions, such as allowing, at most, one error per pedigree. In principle, almost any statistical genetic analysis can be performed taking errors into account, without actually correcting or deleting suspect genotypes. Three examples illustrate the possibilities. These examples make use of the full pedigree data, multiple linked markers, and a prior error model. The first example is the estimation of genotyping error rates from pedigree data. The second, and currently most useful, example is the computation of posterior mistyping probabilities. These probabilities cover both Mendelian-consistent and Mendelian-inconsistent errors. The third example is the selection of the true pedigree structure connecting a group of people from among several competing pedigree structures. Paternity testing and twin zygosity testing are typical applications.

17.
In epidemics of infectious diseases such as influenza, an individual may have one of four possible final states: prior immune, escaped from infection, infected with symptoms, and infected asymptomatically. The exact state is often not observed. In addition, the unobserved transmission times of asymptomatic infections further complicate analysis. Under the assumption of missing at random, data‐augmentation techniques can be used to integrate out such uncertainties. We adapt an importance‐sampling‐based Monte Carlo Expectation‐Maximization (MCEM) algorithm to the setting of an infectious disease transmitted in close contact groups. Assuming the independence between close contact groups, we propose a hybrid EM‐MCEM algorithm that applies the MCEM or the traditional EM algorithms to each close contact group depending on the dimension of missing data in that group, and discuss the variance estimation for this practice. In addition, we propose a bootstrap approach to assess the total Monte Carlo error and factor that error into the variance estimation. The proposed methods are evaluated using simulation studies. We use the hybrid EM‐MCEM algorithm to analyze two influenza epidemics in the late 1970s to assess the effects of age and preseason antibody levels on the transmissibility and pathogenicity of the viruses.

18.
Bayesian inference provides an appealing general framework for phylogenetic analysis, able to incorporate a wide variety of modeling assumptions and to provide a coherent treatment of uncertainty. Existing computational approaches to Bayesian inference based on Markov chain Monte Carlo (MCMC) have not, however, kept pace with the scale of the data analysis problems in phylogenetics, and this has hindered the adoption of Bayesian methods. In this paper, we present an alternative to MCMC based on Sequential Monte Carlo (SMC). We develop an extension of classical SMC based on partially ordered sets and show how to apply this framework, which we refer to as PosetSMC, to phylogenetic analysis. We provide a theoretical treatment of PosetSMC and also present experimental evaluation of PosetSMC on both synthetic and real data. The empirical results demonstrate that PosetSMC is a very promising alternative to MCMC, providing up to two orders of magnitude faster convergence. We discuss other factors favorable to the adoption of PosetSMC in phylogenetics, including its ability to estimate marginal likelihoods, its ready implementability on parallel and distributed computing platforms, and the possibility of combining with MCMC in hybrid MCMC-SMC schemes. Software for PosetSMC is available at http://www.stat.ubc.ca/ bouchard/PosetSMC.

19.
Monte Carlo investigations have been widely used in sample surveys to compare the efficiency of various methods when exact mathematical comparisons are not possible. In this paper the same approach is used to compare the efficiency of stratified random sampling with that of simple random sampling for estimating relative risk in case-control studies. The data used relate to a case-control study on peptic ulcer. On the basis of Monte Carlo investigations on 50 samples of size 10–20 (cases and controls), it has been observed that there is a considerable gain in efficiency in using stratified random sampling over simple random sampling. The sensitivity of the results to changes in sample size has also been investigated.

20.
Statistical packages for constructing genetic linkage maps in inbred lines are well developed and applied extensively, while linkage analysis in outcrossing species faces some statistical challenges because of their complicated genetic structures. In this article, we present a multilocus linkage analysis via hidden Markov models for a linkage group of markers in a full-sib family. The advantage of this method is the simultaneous estimation of the recombination fractions between adjacent markers that possibly segregate in different ratios, and the calculation of likelihood for a certain order of the markers. When the number of markers decreases to two or three, the multilocus linkage analysis becomes traditional two-point or three-point linkage analysis, respectively. Monte Carlo simulations are performed to show that the recombination fraction estimates of multilocus linkage analysis are more accurate than those just using two-point linkage analysis and that the likelihood as an objective function for ordering marker loci is the most powerful method compared with other methods. By incorporating this multilocus linkage analysis, we have developed a Windows program, FsLinkageMap, for constructing genetic maps in a full-sib family. A real example is presented for illustrating linkage maps constructed by using mixed segregation markers. Our multilocus linkage analysis provides a powerful method for constructing high-density genetic linkage maps in some outcrossing plant species, especially in forest trees.
