Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
Lee SH  Van der Werf JH  Tier B 《Genetics》2005,171(4):2063-2072
Linkage analysis for finding inheritance states and haplotype configurations is an essential step in linkage and association mapping. The analysis is routinely based on observed pedigree information and marker genotypes for individuals in the pedigree. Exact methods cannot feasibly use all such information for a large, complex pedigree, especially when many genotypes are missing. Proposed Markov chain Monte Carlo approaches, such as a single-site Gibbs sampler or the meiosis Gibbs sampler, can handle a complex pedigree with sparse genotypic data; however, they often have reducibility problems, which cause biased estimates. We present a combined method that applies the random walk approach to the reducible sites in the meiosis sampler. As a result, one can efficiently obtain reliable estimates, such as identity-by-descent coefficients between individuals based on inheritance states or haplotype configurations, and a wider range of data can be used for mapping quantitative trait loci within a reasonable time.

2.
Markov chain Monte Carlo (MCMC) methods have been proposed to overcome computational problems in linkage and segregation analyses. This approach involves sampling genotypes at the marker and trait loci. Scalar-Gibbs is easy to implement, and it is widely used in genetics. However, the Markov chain that corresponds to scalar-Gibbs may not be irreducible when the marker locus has more than two alleles, and even when the chain is irreducible, mixing has been observed to be slow. These problems do not arise if the genotypes are sampled jointly from the entire pedigree. This paper proposes a method to jointly sample genotypes. The method combines the Elston-Stewart algorithm and iterative peeling, and is called the ESIP sampler. For a hypothetical pedigree, genotype probabilities are estimated from samples obtained using ESIP and also scalar-Gibbs. Approximate probabilities were also obtained by iterative peeling. Comparisons of these with exact genotypic probabilities obtained by the Elston-Stewart algorithm showed that ESIP and iterative peeling yielded genotypic probabilities that were very close to the exact values. By contrast, estimated probabilities from scalar-Gibbs with a chain of length 235 000, including a burn-in of 200 000 steps, were less accurate than probabilities estimated using ESIP with a chain of length 10 000, with a burn-in of 5 000 steps. The effective chain size (ECS) was estimated from the last 25 000 elements of the chain of length 125 000. For one of the ESIP samplers, the ECS ranged from 21 579 to 22 741, while for the scalar-Gibbs sampler, the ECS ranged from 64 to 671. Genotype probabilities were also estimated for a large real pedigree consisting of 3 223 individuals. For this pedigree, it is not feasible to obtain exact genotype probabilities by the Elston-Stewart algorithm. ESIP and iterative peeling yielded very similar results. However, results from scalar-Gibbs were less accurate.
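
The effective chain size mentioned in this abstract measures how many independent draws a correlated chain is worth. The abstract does not say which estimator the authors used, so the following is only a minimal sketch under an assumed, common definition: ECS = N / (1 + 2 Σ ρ_k), with the sum of autocorrelations ρ_k truncated at the first non-positive value. The function names and the AR(1) test chain are hypothetical and stand in for real sampler output.

```python
import random


def autocorr(chain, lag):
    """Sample autocorrelation of a scalar chain at the given lag."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((x - mean) ** 2 for x in chain) / n
    if var == 0.0:
        return 0.0
    cov = sum((chain[i] - mean) * (chain[i + lag] - mean)
              for i in range(n - lag)) / n
    return cov / var


def effective_chain_size(chain, max_lag=200):
    """Assumed estimator: N / (1 + 2 * sum of positive-lag autocorrelations).

    The sum stops at the first non-positive autocorrelation, which is a
    simple truncation rule and not necessarily the one used in the paper.
    """
    n = len(chain)
    rho_sum = 0.0
    for lag in range(1, min(max_lag, n - 1) + 1):
        rho = autocorr(chain, lag)
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


def ar1_chain(n, phi, seed=0):
    """Hypothetical stand-in for sampler output: an AR(1) process.

    phi close to 1 mimics a slowly mixing chain; phi = 0 gives
    independent draws.
    """
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


slow = effective_chain_size(ar1_chain(2000, 0.9))   # strongly correlated
fast = effective_chain_size(ar1_chain(2000, 0.0))   # nearly independent
print(f"slow-mixing ECS: {slow:.0f}, fast-mixing ECS: {fast:.0f}")
```

A slowly mixing chain yields an ECS far below its nominal length, which is the pattern the abstract reports for scalar-Gibbs relative to ESIP.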

3.
Markov chain Monte Carlo (MCMC) methods have been proposed to overcome computational problems in linkage and segregation analyses. This approach involves sampling genotypes at the marker and trait loci. Among MCMC methods, scalar-Gibbs is the easiest to implement, and it is used in genetics. However, the Markov chain that corresponds to scalar-Gibbs may not be irreducible when the marker locus has more than two alleles, and even when the chain is irreducible, mixing has been observed to be slow. Joint sampling of genotypes has been proposed as a strategy to overcome these problems. An algorithm that combines the Elston-Stewart algorithm and iterative peeling (the ESIP sampler) to sample genotypes jointly from the entire pedigree is used in this study. Here, it is shown that the ESIP sampler yields an irreducible Markov chain, regardless of the number of alleles at a locus. Further, results obtained by the ESIP sampler are compared with other methods in the literature. Of the methods that are guaranteed to be irreducible, ESIP was the most efficient.

4.
Thomas A 《Human heredity》2007,64(1):16-26
We review recent developments of MCMC integration methods for computations on graphical models for two applications in statistical genetics: modelling allelic association and pedigree-based linkage analysis. We discuss and illustrate estimation of graphical models from haploid and diploid genotypes, and the importance of MCMC updating schemes beyond what is strictly necessary for irreducibility. We then outline an approach combining these methods to compute linkage statistics when alleles at the marker loci are in linkage disequilibrium. Other extensions suitable for analysis of SNP genotype data in pedigrees are also discussed, and programs that implement these methods, which are available from the author's web site, are described. We conclude with a discussion of how this still experimental approach might be further developed.

5.
QTL analysis in arbitrary pedigrees with incomplete marker information   Total citations: 3, self-citations: 0, citations by others: 3
Vogl C  Xu S 《Heredity》2002,89(5):339-345
Mapping quantitative trait loci (QTL) in arbitrary outbred pedigrees is complicated by the combinatorial possibilities of allele flow relationships and of the founder allelic configurations. Exact methods are only available for rather short and simple pedigrees. Stochastic simulation using Markov chain Monte Carlo (MCMC) integration offers more flexibility. MCMC methods are less natural in a frequentist than in a Bayesian context, which we therefore adopt. Among the MCMC algorithms for updating marker locus genotypes, we implement the descent-graph algorithm. It can be used to update marker locus allele flow relationships and can handle arbitrarily complex pedigrees and missing marker information. Compared with updating marker genotypic information, updating QTL parameters, such as position, effects, and the allele flow relationships, is relatively easy with MCMC. We treat the effect of each diploid combination of founder alleles as a random variable and only estimate the variance of these effects, i.e., we model diploid genotypic effects instead of the usual partition into additive and dominance effects. This is a variant of the random model approach. The number of QTL alleles is generally unknown. In the Bayesian context, the number of QTL present on a linkage group can be treated as variable. Computer simulations suggest that the algorithm can indeed handle complex pedigrees and detect two QTL on a linkage group, but that the size of a single extended family is limited to about 50 to 100 individuals.

6.
Markov chain Monte Carlo (MCMC) methods have been widely used to overcome computational problems in linkage and segregation analyses. Many variants of this approach exist and are practiced; among the most popular is the Gibbs sampler. The Gibbs sampler is simple to implement but has (in its simplest form) mixing and reducibility problems; furthermore, to initiate a Gibbs sampling chain we need a starting genotypic or allelic configuration that is consistent with the marker data in the pedigree and that has suitable weight in the joint distribution. We outline a procedure for finding such a configuration in pedigrees that have too many loci to allow for exact peeling. We also explain how this technique could be used to implement a blocking Gibbs sampler.
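
The reducibility problem described here can be seen in a toy setting that has nothing pedigree-specific about it. The sketch below assumes a hypothetical two-variable target with only two allowed, equally likely configurations, (0, 0) and (1, 1). Updating one variable at a time can never leave the starting configuration, whereas updating both variables as a block reaches both. It is an illustration of the general phenomenon, not the procedure outlined in the abstract.

```python
import random

# Hypothetical toy target: only these two joint configurations are allowed.
SUPPORT = [(0, 0), (1, 1)]


def prob(x, y):
    """Joint probability of the toy target (uniform on SUPPORT)."""
    return 0.5 if (x, y) in SUPPORT else 0.0


def single_site_gibbs(start, n_steps, seed=0):
    """Update x given y, then y given x; return the set of visited states."""
    rng = random.Random(seed)
    x, y = start
    visited = {(x, y)}
    for _ in range(n_steps):
        # Conditional of x given the current y (unnormalised weights).
        x = rng.choices([0, 1], weights=[prob(0, y), prob(1, y)])[0]
        # Conditional of y given the freshly updated x.
        y = rng.choices([0, 1], weights=[prob(x, 0), prob(x, 1)])[0]
        visited.add((x, y))
    return visited


def blocked_sampler(n_steps, seed=0):
    """Update (x, y) jointly, as a blocking sampler would for this toy case."""
    rng = random.Random(seed)
    return {rng.choice(SUPPORT) for _ in range(n_steps)}


print(single_site_gibbs((0, 0), 1000))  # stuck at the starting configuration
print(blocked_sampler(1000))            # reaches both configurations
```

The toy also shows why the starting configuration matters: a single-site chain started at (0, 0) would report probability 1 for a state whose true probability is 0.5.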

7.
Methods for detecting quantitative trait loci (QTL) without markers have generally used iterative peeling algorithms for determining genotype probabilities. These algorithms have considerable shortcomings in complex pedigrees. A Markov chain Monte Carlo (MCMC) method that samples the pedigree of the whole population jointly is described. Simultaneous sampling of the pedigree was achieved by sampling descent graphs using the Metropolis-Hastings algorithm. A descent graph describes the inheritance state of each allele and provides pedigrees guaranteed to be consistent with Mendelian sampling. Sampling descent graphs overcomes most, if not all, of the limitations incurred by iterative peeling algorithms. The algorithm was able to find the QTL in most of the simulated populations. However, when the QTL was not modeled or not found, its effect was ascribed to the polygenic component. No QTL were detected when they were not simulated.

8.
No exact method for determining genotypic and identity-by-descent probabilities is available for large complex pedigrees. Approximate methods for such pedigrees cannot be guaranteed to be unbiased. A new method is proposed that uses the Metropolis-Hastings algorithm to sample a Markov chain of descent graphs which fit the pedigree and known genotypes. Unknown genotypes are determined from each descent graph. Genotypic probabilities are estimated as their means. The algorithm is shown to be unbiased for small complex pedigrees and feasible and consistent for moderately large complex pedigrees.
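
The general pattern in this abstract, running a Metropolis-Hastings chain and estimating probabilities as averages over the sampled states, can be sketched on a small discrete state space. The example below does not implement descent graphs or their transition rules. The states are hypothetical integer labels, the target is given by assumed unnormalised positive weights, and the proposal is a symmetric step to a neighbour on a ring.

```python
import random


def metropolis_hastings(weights, n_steps, burn_in=0, seed=0):
    """Estimate state probabilities from a Metropolis-Hastings chain.

    weights: unnormalised, strictly positive target weights (an assumption
    of this sketch). Returns the visit frequency of each state after burn-in.
    """
    rng = random.Random(seed)
    k = len(weights)
    state = 0
    counts = [0] * k
    for step in range(n_steps):
        # Symmetric proposal: step to the left or right neighbour on a ring.
        proposal = (state + rng.choice([-1, 1])) % k
        # With a symmetric proposal the acceptance ratio is the weight ratio.
        accept_prob = min(1.0, weights[proposal] / weights[state])
        if rng.random() < accept_prob:
            state = proposal
        if step >= burn_in:
            counts[state] += 1
    kept = n_steps - burn_in
    return [c / kept for c in counts]


estimates = metropolis_hastings([1.0, 2.0, 3.0, 4.0], n_steps=100_000,
                                burn_in=1_000)
print([round(p, 3) for p in estimates])  # should be near 0.1, 0.2, 0.3, 0.4
```

Because only weight ratios enter the acceptance step, the normalising constant of the target is never needed, which is what makes this family of samplers attractive when exact probabilities are out of reach.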

9.
An increased availability of genotypes at marker loci has prompted the development of models that include the effect of individual genes. Selection based on these models is known as marker-assisted selection (MAS). MAS is known to be efficient, especially for traits that have low heritability and non-additive gene action. BLUP methodology under non-additive gene action is not feasible for large inbred or crossbred pedigrees. It is easy to incorporate non-additive gene action in a finite locus model. Under such a model, the unobservable genotypic values can be predicted using the conditional mean of the genotypic values given the data. To compute this conditional mean, conditional genotype probabilities must be computed. In this study these probabilities were computed using iterative peeling and three Markov chain Monte Carlo (MCMC) methods: scalar Gibbs, blocking Gibbs, and a sampler that combines the Elston-Stewart algorithm with iterative peeling (ESIP). The performance of these four methods was assessed using simulated data. For pedigrees with loops, iterative peeling fails to provide accurate genotype probability estimates for some pedigree members. Also, computing time is exponentially related to the number of loci in the model. For MCMC methods, a linear relationship can be maintained by sampling genotypes one locus at a time. Of the three MCMC methods considered, ESIP performed the best, while scalar Gibbs performed the worst.

10.
A heuristic algorithm for finding gene transmission patterns on large and complex pedigrees with partially observed genotype data is proposed. The method can be used to generate an initial point for a Markov chain Monte Carlo simulation or to check that the given pedigree and the genotype data are consistent. In small pedigrees, the algorithm is exact because it exhaustively enumerates all possibilities, but in large pedigrees with a considerable amount of unknown data, only a subset of promising configurations can actually be checked. For that purpose, the configurations are ordered by combining the approximate conditional probability distribution of the unknown genotypes with the information on the relationships between individuals. We also introduce a way to divide the task into subparts, which has been shown to be useful in large pedigrees. The algorithm has been implemented in a program called APE (Allelic Path Explorer) and tested in three different settings with good results.

11.
The accurate estimation of the probability of identity by descent (IBD) at loci or genome positions of interest is paramount to the genetic study of quantitative and disease resistance traits. We present a Markov chain Monte Carlo method to compute IBD probabilities between individuals conditional on DNA markers and on pedigree information. The IBD probabilities can be obtained in a completely general pedigree at any genome position of interest, and all marker and pedigree information available is used. The method can be split into two steps at each iteration. First, phases are sampled using current genotypic configurations of relatives, and second, crossover events are simulated conditional on phases. All founder origins and crossovers are tracked internally, so that the IBD probabilities averaged over replicates are rapidly obtained. We illustrate the method with some examples. First, we show that all pedigree information should be used to obtain line origin probabilities in F2 crosses. Second, the distribution of genetic relationships between half and full sibs is analysed in both simulated data and in real data from an F2 cross in pigs.

12.
Ion channels are characterized by inherently stochastic behavior, which can be represented by continuous-time Markov models (CTMM). Although methods for collecting data from single ion channels are available, translating a time series of open and closed channels to a CTMM remains a challenge. Bayesian statistics combined with Markov chain Monte Carlo (MCMC) sampling provide a means for estimating the rate constants of a CTMM directly from single channel data. In this article, different approaches for the MCMC sampling of Markov models are combined. This method, new to our knowledge, detects overparameterizations and gives more accurate results than existing MCMC methods. Its performance is similar to that of QuB-MIL, which indicates that it also compares well with maximum likelihood estimators. Data collected from an inositol trisphosphate receptor are used to demonstrate how the best model for a given data set can be found in practice.

13.
Tong L  Thompson E 《Human heredity》2008,65(3):142-153
To detect the positions of disease loci, lod scores are calculated at multiple chromosomal positions given trait and marker data on members of pedigrees. Exact lod score calculations are often impossible when the size of the pedigree and the number of markers are both large. In this case, a Markov chain Monte Carlo (MCMC) approach provides an approximation. However, to provide accurate results, mixing performance is always a key issue in these MCMC methods. In this paper, we propose two methods to improve MCMC sampling and hence obtain more accurate lod score estimates in shorter computation time. The first improvement generalizes the block-Gibbs meiosis (M) sampler to a multiple-meiosis (MM) sampler, in which multiple meioses are updated jointly across all loci. The second divides the computations on a large pedigree into several parts by conditioning on the haplotypes of some 'key' individuals. We perform exact calculations for the descendant parts, where more data are often available, and combine this information with sampling of the hidden variables in the ancestral parts. Our approaches are expected to be most useful for data on a large pedigree with a lot of missing data.

14.
A number of procedures have been developed that allow the genetic parameters of natural populations to be estimated using relationship information inferred from marker data rather than known pedigrees. Three published approaches are available: the regression, pair-wise likelihood and Markov chain Monte Carlo (MCMC) sib-ship reconstruction methods. These were applied to body weight and molecular data collected from the Soay sheep population of St. Kilda, which has a previously determined pedigree. The regression and pair-wise likelihood approaches do not specify an exact pedigree and yielded unreliable heritability estimates that were sensitive to alteration of the fixed effects. The MCMC method, which specifies a pedigree prior to heritability estimation, yielded results closer to those determined using the known pedigree. In populations of low average relationship, such as the Soay sheep population, determination of a reliable pedigree is more useful than indirect approaches that do not specify a pedigree.

15.
Accurate and rapid methods for the detection of quantitative trait loci (QTLs) and evaluation of consequent allelic effects are required to implement marker-assisted selection in outbred populations. In this study, we present a simple deterministic method for estimating identity-by-descent (IBD) coefficients in full- and half-sib families that can be used for the detection of QTLs via a variance-component approach. In a simulated dataset, IBD coefficients among sibs estimated by the simple deterministic and Markov chain Monte Carlo (MCMC) methods with three or four alleles at each marker locus exhibited a correlation of greater than 0.99. This high correlation was also found in QTL analyses of data from an outbred pig population. Variance component analysis used both the simple deterministic and MCMC methods to estimate IBD coefficients. Both procedures detected a QTL at the same position and gave similar test statistics and heritabilities. The MCMC method, however, required much more computation time than the simple method. The conversion of estimated QTL genotypic effects into allelic effects for use in marker-assisted selection is also demonstrated.

16.
Quantitative trait loci (QTL) for growth and fatness traits have previously been identified on chromosomes 4 and 7 in several experimental pig populations. The segregation of these QTL in commercial pigs was studied in a sample of 2713 animals from five different populations. Variance component analysis (VCA) using a marker-based identity by descent (IBD) matrix was applied. The IBD coefficient was estimated with simple deterministic (SMD) and Markov chain Monte Carlo (MCMC) methods. Data for two growth traits, average daily gain on test and whole life daily gain, and for back fat thickness were analysed. With both methods, seven out of 26 combinations of population, chromosome and trait were significant. Additionally, QTL genotypic and allelic effects were estimated when the QTL effect was significant. The range of QTL genotypic effects in a population varied from 4.8% to 10.9% of the phenotypic mean for growth traits and from 7.9% to 19.5% for back fat. Heritabilities of the QTL genotypic values ranged from 8.6% to 18.2% for growth traits, and from 14.5% to 19.2% for back fat. Very similar results were obtained with both SMD and MCMC. However, the MCMC method required a large number of iterations, and hence more computation time, especially when the QTL test position was close to the marker.

17.
Gao G  Hoeschele I 《Genetics》2005,171(1):365-376
Identity-by-descent (IBD) matrix calculation is an important step in quantitative trait loci (QTL) analysis using variance component models. To calculate IBD matrices efficiently for large pedigrees with large numbers of loci, an approximation method based on the reconstruction of haplotype configurations for the pedigrees is proposed. The method uses a subset of haplotype configurations with high likelihoods identified by a haplotyping method. The new method is compared with a Markov chain Monte Carlo (MCMC) method (Loki) in terms of QTL mapping performance on simulated pedigrees. Both methods yield almost identical results for the estimation of QTL positions and variance parameters, while the new method is much more computationally efficient than the MCMC approach for large pedigrees and large numbers of loci. The proposed method is also compared with an exact method (Merlin) in small simulated pedigrees, where both methods produce nearly identical estimates of position-specific kinship coefficients. The new method can be used for fine mapping with joint linkage disequilibrium and linkage analysis, which improves the power and accuracy of QTL mapping.

18.
Thompson E  Basu S 《Human heredity》2003,56(1-3):119-125
Our objective is the development of robust methods for assessment of evidence for linkage of loci affecting a complex trait to a marker linkage group, using data on extended pedigrees. Using Markov chain Monte Carlo (MCMC) methods, it is possible to sample realizations from the distribution of gene identity by descent (IBD) patterns on a pedigree, conditional on observed data Y_M at multiple marker loci. Measures of gene IBD, W, which capture joint genome sharing in extended pedigrees often have unknown and highly skewed distributions, particularly when conditioned on marker data. MCMC provides a direct estimate of the distribution of such measures. Let W be the IBD measure from data Y_M, and W* the IBD measure from pseudo-data Y*_M simulated with the same data availability and genetic marker model as the true data Y_M, but in the absence of linkage. Then measures of the difference in distributions of W and W* provide evidence for linkage. This approach extracts more information from the data Y_M than either comparison to the pedigree prior distribution of W or use of statistics that are expectations of W given the data Y_M. A small example is presented.

19.
A Bayesian method implemented with Markov chain Monte Carlo (MCMC) has been developed to detect quantitative trait loci (QTL) effects and Q × E interaction effects. However, the MCMC algorithm is time consuming because of repeated sampling of QTL parameters. We developed an expectation-maximization (EM) algorithm as an alternative method for detecting QTL and Q × E interaction. Simulation studies and real data analysis showed that the EM algorithm produced results comparable to those of the Bayesian method, but was many orders of magnitude faster than the MCMC algorithm. We used the EM algorithm to analyze a well-known barley dataset produced by the North American Barley Genome Mapping Project. The dataset contained eight quantitative traits collected from 150 doubled-haploid (DH) lines evaluated in multiple environments. Each line was genotyped for 495 polymorphic markers. The results showed that all eight traits exhibited QTL main effects and Q × E interaction effects. On average, the main effects and Q × E interaction effects contributed 34.56% and 16.23% of the total phenotypic variance, respectively. Furthermore, we found that whether or not a locus shows Q × E interaction does not depend on the presence of a main effect.

20.
It has been shown that electropherograms of DNA sequences can be modeled with hidden Markov models (HMMs). Basecalling, the procedure that determines the sequence of bases from the given electropherogram, can then be performed using the Viterbi algorithm. A training step is required prior to basecalling in order to estimate the HMM parameters. In this paper, we propose a Bayesian approach which employs the Markov chain Monte Carlo (MCMC) method to perform basecalling. Such an approach not only allows one to naturally encode prior biological knowledge into the basecalling algorithm, it also exploits both the training data and the basecalling data in estimating the HMM parameters, leading to more accurate estimates. Using the recently sequenced genome of the organism Legionella pneumophila, we show that the MCMC basecaller outperforms the state-of-the-art basecalling algorithm in terms of total errors while requiring much less training than other proposed statistical basecallers.
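
For readers unfamiliar with the Viterbi step mentioned in this abstract, the following is a minimal sketch of the standard algorithm for a discrete-emission HMM, written in log space. It is the textbook baseline, not the MCMC basecaller the paper proposes, and the two-state model with symbols "a" and "b" is a hypothetical toy rather than an electropherogram model.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most probable hidden state path for an observation list.

    All parameters are log probabilities stored in nested dictionaries:
    log_start[s], log_trans[prev][s], log_emit[s][symbol].
    """
    # best[t][s]: log probability of the best path ending in state s at time t.
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []  # back[t][s]: predecessor of s on that best path
    for symbol in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda r: best[-1][r] + log_trans[r][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][symbol]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: state A mostly emits "a", state B mostly emits "b".
states = ["A", "B"]
log_start = {s: math.log(0.5) for s in states}
log_trans = {r: {s: math.log(0.5) for s in states} for r in states}
log_emit = {
    "A": {"a": math.log(0.9), "b": math.log(0.1)},
    "B": {"a": math.log(0.1), "b": math.log(0.9)},
}
print(viterbi(list("aabba"), states, log_start, log_trans, log_emit))
```

Working in log space avoids numerical underflow on long observation sequences, which matters for sequence-length data such as base traces.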
