Similar Articles
20 similar articles found (search time: 15 ms)
1.

Background

Pedigree studies of complex heritable diseases often feature nominal or ordinal phenotypic measurements and missing genetic marker or phenotype data.

Methodology

We have developed a Bayesian method for Linkage analysis of Ordinal and Categorical traits (LOCate) that can analyze complex genealogical structure for family groups and incorporate missing data. LOCate uses a Gibbs sampling approach to assess linkage, incorporating a simulated tempering algorithm for fast mixing. While our treatment is Bayesian, we develop a LOD (log of odds) score estimator for assessing linkage from Gibbs sampling that is highly accurate for simulated data. LOCate is applicable to linkage analysis for ordinal or nominal traits, a versatility which we demonstrate by analyzing simulated data with a nominal trait, on which LOCate outperforms LOT, an existing method designed for ordinal traits. We additionally demonstrate our method's versatility by analyzing a candidate locus (D2S1788) for panic disorder in humans, in a dataset with a large amount of missing data, which LOT was unable to handle.

Conclusion

LOCate's accuracy and applicability to both ordinal and nominal traits will prove useful to researchers interested in mapping loci for categorical traits.
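LOCate's implementation is not shown in the abstract. As a hedged illustration of the simulated-tempering idea mentioned in the Methodology, the sketch below treats the temperature index as an auxiliary MCMC variable over a toy bimodal target; the within-temperature update is a plain random-walk Metropolis step and the temperature weights are taken as equal for simplicity. The target, the temperature ladder, and all names are illustrative assumptions, not the authors' sampler.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    """Toy bimodal target: mixture of two well-separated unit-variance normals."""
    return np.logaddexp(-0.5 * (x + 4.0) ** 2, -0.5 * (x - 4.0) ** 2)

betas = np.array([1.0, 0.5, 0.25, 0.1])   # inverse temperatures; beta = 1 is the target

x, k = 0.0, 0                              # current state and current temperature index
samples = []
for it in range(20000):
    # (1) within-temperature move: random-walk Metropolis on pi(x)^beta_k
    prop = x + rng.normal(scale=1.0)
    if np.log(rng.uniform()) < betas[k] * (log_target(prop) - log_target(x)):
        x = prop
    # (2) tempering move: propose a neighbouring temperature, accept via Metropolis-Hastings
    #     (equal pseudo-prior weights assumed; a tuned sampler would calibrate these)
    k_new = k + rng.choice([-1, 1])
    if 0 <= k_new < len(betas):
        if np.log(rng.uniform()) < (betas[k_new] - betas[k]) * log_target(x):
            k = k_new
    if k == 0:                             # keep only draws taken at the target temperature
        samples.append(x)

samples = np.array(samples)
print("fraction of draws in each mode:", (samples < 0).mean(), (samples > 0).mean())
```

The point of the toy is that the hot temperatures let the chain cross between the two modes, so the beta = 1 draws cover both, which is the "fast mixing" role the abstract assigns to simulated tempering.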

2.
Lee SH, Van der Werf JH, Tier B. Genetics. 2005;171(4):2063-2072
A linkage analysis for finding inheritance states and haplotype configurations is an essential process for linkage and association mapping. The linkage analysis is routinely based upon observed pedigree information and marker genotypes for individuals in the pedigree. It is not feasible for exact methods to use all such information for a large complex pedigree especially when there are many missing genotypic data. Proposed Markov chain Monte Carlo approaches such as a single-site Gibbs sampler or the meiosis Gibbs sampler are able to handle a complex pedigree with sparse genotypic data; however, they often have reducibility problems, causing biased estimates. We present a combined method, applying the random walk approach to the reducible sites in the meiosis sampler. Therefore, one can efficiently obtain reliable estimates such as identity-by-descent coefficients between individuals based on inheritance states or haplotype configurations, and a wider range of data can be used for mapping of quantitative trait loci within a reasonable time.  相似文献   
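The sampler's internals are not given in the abstract, so the sketch below only illustrates the final estimation step it describes: averaging identity-by-descent (IBD) indicators over MCMC draws of inheritance states. The founder-allele labels and the toy sib-pair draws are hypothetical inputs assumed for illustration.

```python
import numpy as np

def ibd_coefficient(samples_a, samples_b):
    """Estimate kinship-style IBD sharing between two individuals from MCMC draws.

    samples_a, samples_b: arrays of shape (n_draws, 2) holding the founder-allele
    labels carried by each individual at a locus in each sampled inheritance state.
    Returns the posterior mean probability that a randomly picked allele from each
    individual is identical by descent (the kinship coefficient)."""
    a, b = np.asarray(samples_a), np.asarray(samples_b)
    per_draw = (a[:, :, None] == b[:, None, :]).mean(axis=(1, 2))  # sharing in each draw
    return per_draw.mean()

# toy example: three MCMC draws of founder-allele labels (1..4) for a sib pair
sib1 = [[1, 3], [1, 3], [2, 3]]
sib2 = [[1, 4], [2, 3], [2, 4]]
print(ibd_coefficient(sib1, sib2))   # 0.25, the expected kinship of full sibs
```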

3.
Pérez-Enciso M. Genetics. 2003;163(4):1497-1510
We present a Bayesian method that combines linkage and linkage disequilibrium (LDL) information for quantitative trait locus (QTL) mapping. This method jointly uses all marker information (haplotypes) and all available pedigree information; i.e., it is not restricted to any specific experimental design, and it is not required that phases are known. Infinitesimal genetic effects or environmental noise ("fixed") effects can equally be fitted. A diallelic QTL is assumed, and both additive and dominant effects can be estimated. We have implemented a combined Gibbs/Metropolis-Hastings sampling scheme to obtain the marginal posterior distributions of the parameters of interest. We have also implemented a Bayesian variant of the usual disequilibrium measures, such as D' and r², between QTL and markers. We illustrate the method with simulated data in "simple" (two-generation full-sib families) and "complex" (four-generation) pedigrees. We compared the estimates with and without using linkage disequilibrium information. In general, using LDL resulted in estimates of QTL position that were much better than linkage-only estimates when there was complete disequilibrium between the mutant QTL allele and the marker. This advantage, however, decreased when the association was only partial. In all cases, additive and dominant effects were estimated accurately either with or without disequilibrium information.
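The paper implements Bayesian variants of D' and r² computed over posterior draws; that machinery is not reproduced here. As a plain reference point, the sketch below computes the two standard point estimates from haplotype and allele frequencies; all inputs are illustrative.

```python
def ld_measures(p_ab, p_a, p_b):
    """D' and r^2 between two diallelic loci, from haplotype/allele frequencies.

    p_ab     : frequency of the haplotype carrying allele A at locus 1 and allele B at locus 2
    p_a, p_b : marginal frequencies of alleles A and B"""
    d = p_ab - p_a * p_b                                   # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2

print(ld_measures(p_ab=0.50, p_a=0.5, p_b=0.5))  # complete disequilibrium: D' = r^2 = 1
print(ld_measures(p_ab=0.30, p_a=0.5, p_b=0.5))  # partial association: D' = 0.2, r^2 = 0.04
```

The contrast between the two calls mirrors the abstract's observation that the advantage of LDL mapping shrinks as the marker-QTL association moves from complete to partial.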

4.
The Gibbs sampling method has been widely used for sequence analysis after it was successfully applied to the problem of identifying regulatory motif sequences upstream of genes. Since then, numerous variants of the original idea have emerged; however, in all cases the application has been to finding short motifs in collections of short sequences (typically less than 100 nucleotides long). In this paper, we introduce a Gibbs sampling approach for identifying genes in multiple large genomic sequences up to hundreds of kilobases long. This approach leverages the evolutionary relationships between the sequences to improve the gene predictions, without explicitly aligning the sequences. We have applied our method to the analysis of genomic sequence from 14 genomic regions, totaling roughly 1.8 Mb of sequence in each organism. We show that our approach compares favorably with existing ab initio approaches to gene finding, including pairwise-comparison-based gene prediction methods which make explicit use of alignments. Furthermore, excellent performance can be obtained with as few as four organisms, and the method overcomes a number of difficulties of previous comparison-based gene finding approaches: it is robust with respect to genomic rearrangements, can work with draft sequence, and is fast (linear in the number and length of the sequences). It can also be seamlessly integrated with Gibbs sampling motif detection methods.

5.
The causal relationship between genes and diseases has been investigated alongside the development of DNA sequencing. Polymorphisms incorporated in the HapMap Project have enabled fine mapping with linkage disequilibrium (LD), and prior clustering of the haplotypes on the basis of a similarity measure has often been performed in an attempt to capture coalescent events, because it can reduce the amount of computation. However, an inappropriate choice of similarity measure can lead to wrong conclusions, and we therefore propose a new haplotype-based clustering algorithm for fine-scale mapping using a Bayesian partition model. To handle phase-unknown genotypes, we propose a new algorithm based on a Metropolized Gibbs sampler, implemented in C++. Our simulation studies found that the proposed method improves the accuracy of the estimator for the disease susceptibility locus. We illustrate the practical implication of the new analysis method with an application to fine-scale mapping of CYP2D6 in drug metabolism.

6.

Background  

Haplotype-based linkage disequilibrium (LD) mapping has become a powerful and cost-effective method for performing genetic association studies, particularly in the search for genetic markers in linkage disequilibrium with complex disease loci. Various methods (e.g., Monte Carlo (Gibbs sampling), EM (expectation maximization), and Clark's method) have been used to estimate haplotype frequencies from routine genotyping data.
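The abstract only names the EM approach among others; as a hedged illustration of what it refers to, the sketch below runs EM haplotype-frequency estimation for the simplest case of two biallelic SNPs typed on unphased genotypes, where the only phase-ambiguous case is the double heterozygote. The data, function names, and iteration count are illustrative, not any particular software package.

```python
import numpy as np

HAPS = [(0, 0), (0, 1), (1, 0), (1, 1)]       # two biallelic SNPs -> four possible haplotypes

def compatible_pairs(genotype):
    """All ordered haplotype pairs whose allele counts add up to the unphased genotype."""
    g1, g2 = genotype
    return [(h1, h2) for h1 in HAPS for h2 in HAPS
            if h1[0] + h2[0] == g1 and h1[1] + h2[1] == g2]

def em_haplotype_freqs(genotypes, n_iter=100):
    freq = {h: 0.25 for h in HAPS}            # start from uniform haplotype frequencies
    for _ in range(n_iter):
        counts = {h: 0.0 for h in HAPS}
        for g in genotypes:
            pairs = compatible_pairs(g)
            weights = np.array([freq[h1] * freq[h2] for h1, h2 in pairs])
            weights /= weights.sum()          # E-step: posterior over phase resolutions
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w
                counts[h2] += w
        total = sum(counts.values())          # M-step: renormalise the expected counts
        freq = {h: c / total for h, c in counts.items()}
    return freq

# toy genotypes, coded as copies of allele "1" at each SNP (0, 1 or 2)
data = [(0, 0), (1, 1), (1, 1), (2, 2), (2, 1), (1, 0)]
print(em_haplotype_freqs(data))
```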

7.
The problem of ascertainment in segregation analysis arises when families are selected for study through ascertainment of affected individuals. In this case, ascertainment must be corrected for in data analysis. However, methods for ascertainment correction are not available for many common sampling schemes, e.g., sequential sampling of extended pedigrees (except in the case of "single" selection). Concerns about whether ascertainment correction is even required for large pedigrees, about whether and how multiple probands in the same pedigree can be taken into account properly, and about how to apply sequential sampling strategies have occupied many investigators in recent years. We address these concerns by reconsidering a central issue, namely, how to handle pedigree structure (including size). We introduce a new distinction, between sampling in such a way that observed pedigree structure does not depend on which pedigree members are probands (proband-independent [PI] sampling) and sampling in such a way that observed pedigree structure does depend on who are the probands (proband-dependent [PD] sampling). This distinction corresponds roughly (but not exactly) to the distinction between fixed-structure and sequential sampling. We show that conditioning on observed pedigree structure in ascertained data sets obtained under PD sampling is not in general correct (with the exception of "single" selection), while PI sampling of pedigree structures larger than simple sibships is generally not possible. Yet, in practice one has little choice but to condition on observed pedigree structure. We conclude that the problem of genetic modeling in ascertained data sets is, in most situations, literally intractable. We recommend that future efforts focus on the development of robust approximate approaches to the problem.

8.
We present a method to perform fine mapping by placing haplotypes into clusters on the basis of risk. Each cluster has a haplotype "center." Cluster allocation is defined according to haplotype centers, with each haplotype assigned to the cluster with the "closest" center. The closeness of two haplotypes is determined by a similarity metric that measures the length of the shared segment around the location of a putative functional mutation for the particular cluster. Our method allows for missing marker information but still estimates the risks of complete haplotypes without resorting to a one-marker-at-a-time analysis. The dimensionality issues that can occur in haplotype analyses are removed by sampling over the haplotype space, allowing for estimation of haplotype risks without explicitly assigning a parameter to each haplotype to be estimated. In this way, we are able to handle haplotypes of arbitrary size. Furthermore, our clustering approach has the potential to allow us to detect the presence of multiple functional mutations.
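The paper's similarity metric is described only as the length of the shared segment around the putative functional mutation; the sketch below is one straightforward reading of that idea, with hypothetical argument names and toy haplotypes, not the authors' exact definition (which, for example, also handles missing markers).

```python
def shared_segment_length(hap1, hap2, focal_index, positions=None):
    """Length of the contiguous interval around a focal marker over which two
    haplotypes carry identical alleles.

    hap1, hap2  : sequences of marker alleles (same length, same marker order)
    focal_index : index of the marker nearest the putative functional mutation
    positions   : optional marker positions; if omitted, the number of
                  consecutively matching markers is returned instead."""
    if hap1[focal_index] != hap2[focal_index]:
        return 0
    left = focal_index
    while left > 0 and hap1[left - 1] == hap2[left - 1]:
        left -= 1
    right = focal_index
    while right < len(hap1) - 1 and hap1[right + 1] == hap2[right + 1]:
        right += 1
    if positions is None:
        return right - left + 1
    return positions[right] - positions[left]

h1 = [1, 2, 2, 1, 1, 2, 1]
h2 = [2, 2, 2, 1, 1, 1, 1]
print(shared_segment_length(h1, h2, focal_index=3))  # 4 consecutive matching markers
```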

9.
Heterogeneity, both inter- and intrafamilial, represents a serious problem in linkage studies of common complex diseases. In this study we simulated different scenarios with families who phenotypically have identical diseases but who genotypically have two different forms of the disease (both forms genetic). We examined the proportion of families displaying intrafamilial heterogeneity as a function of mode of inheritance, gene frequency, penetrance, and sampling strategy. Furthermore, we compared two different ways of analyzing linkage in these data sets: a two-locus (2L) analysis versus a single-locus (SL) analysis combined with an admixture test. Data were simulated with tight linkage between one disease locus and a marker locus; the other disease locus was not linked to a marker. Our findings are as follows: (1) In contrast to what has been proposed elsewhere to minimize heterogeneity, sampling only "high-density" pedigrees will increase the proportion of families with intrafamilial heterogeneity, especially when the two forms are relatively close in frequency. (2) When one form is dominant and one is recessive, this sampling strategy will greatly decrease the proportion of families with the recessive form and may therefore make it more difficult to detect linkage to the recessive form. (3) An SL analysis combined with an admixture test achieves about the same lod scores and estimate of the recombination fraction as does a 2L analysis. Also, a 2L analysis of a sample of families with intrafamilial heterogeneity does not perform significantly better than an SL analysis. (4) Bilineal pedigrees have little effect on the mean maximum lod score and mean maximum recombination fraction, so there is little danger that including these families will lead to a false exclusion of linkage.
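For readers unfamiliar with the admixture test used here, the sketch below shows the standard heterogeneity LOD (Smith's admixture test): given per-family lod scores at a candidate recombination fraction, the test maximizes over the proportion of linked families. The per-family lod scores in the example are made up; the function is a generic textbook form, not the authors' software.

```python
import numpy as np

def hlod(family_lods, alphas=np.linspace(0.0, 1.0, 101)):
    """Smith's admixture (heterogeneity) test.

    family_lods : per-family lod scores evaluated at a candidate recombination fraction
    Returns (maximum HLOD, alpha at the maximum), where alpha is the estimated
    proportion of families linked to the marker."""
    lr = 10.0 ** np.asarray(family_lods, dtype=float)       # per-family likelihood ratios
    scores = [np.sum(np.log10(a * lr + (1.0 - a))) for a in alphas]
    best = int(np.argmax(scores))
    return scores[best], alphas[best]

# toy example: half the families support linkage, half do not
print(hlod([1.2, 0.9, 1.5, -0.8, -1.1, -0.6]))
```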

10.
This study examined the method of simultaneous estimation of recombination frequency and parameters for a qualitative trait locus and compared the results with those of standard methods of linkage analysis. With both approaches we were able to detect linkage of an incompletely penetrant qualitative trait to highly polymorphic markers with recombination frequencies in the range of .00-.05. Our results suggest that detecting linkage at larger recombination frequencies may require larger data sets or large high-density families. When applied to all families without regard to informativeness of the family structure for linkage, analyses of simulated data could detect no advantage of simultaneous estimation over more traditional and much less time-consuming methods, either in detecting linkage, estimating frequency, refining estimates of parameters for the qualitative trait locus, or avoiding false evidence for linkage. However, the method of sampling affected the results.

11.
This paper describes an analysis of systolic blood pressure (SBP) in the Genetic Analysis Workshop 13 (GAW13) simulated data. The main aim was to assess evidence for both general and specific genetic effects on the baseline blood pressure and on the rate of change (slope) of blood pressure with time. Generalized linear mixed models were fitted using Gibbs sampling in WinBUGS, and the additive polygenic random effects estimated using these models were then used as continuous phenotypes in a variance components linkage analysis. The first-stage analysis provided evidence for general genetic effects on both the baseline and slope of blood pressure, and the linkage analysis found evidence of several genes, again for both baseline and slope.

12.
Markov chain Monte Carlo (MCMC) methods have been widely used to overcome computational problems in linkage and segregation analyses. Many variants of this approach exist and are practiced; among the most popular is the Gibbs sampler. The Gibbs sampler is simple to implement but has (in its simplest form) mixing and reducibility problems; furthermore, in order to initiate a Gibbs sampling chain we need a starting genotypic or allelic configuration which is consistent with the marker data in the pedigree and which has suitable weight in the joint distribution. We outline a procedure for finding such a configuration in pedigrees which have too many loci to allow for exact peeling. We also explain how this technique could be used to implement a blocking Gibbs sampler.

13.
It is usually difficult to localize genes that cause diseases with late ages at onset. These diseases frequently exhibit complex modes of inheritance, and only recent generations are available to be genotyped and phenotyped. In this situation, multipoint analysis using traditional exact linkage analysis methods, with many markers and full pedigree information, is a computationally intractable problem. Fortunately, Markov chain Monte Carlo sampling provides a tool to address this issue. By treating age at onset as a right-censored quantitative trait, we expand the methods used by Heath (1997) and illustrate them using an Alzheimer disease (AD) data set. This approach estimates the number, sizes, allele frequencies, and positions of quantitative trait loci (QTLs). In this simultaneous multipoint linkage and segregation analysis method, the QTLs are assumed to be diallelic and to interact additively. In the AD data set, we were able to localize correctly, quickly, and accurately two known genes, despite the existence of substantial genetic heterogeneity, thus demonstrating the great promise of these methods for the dissection of late-onset oligogenic diseases.
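The authors' full QTL sampler is not reproduced here. One standard Gibbs ingredient for the right-censoring treatment the abstract mentions is data augmentation: unobserved onset ages of unaffected individuals are imputed from the trait model truncated above their current age. The sketch below shows only that step, assuming a simple normal trait model with hypothetical mean and standard deviation.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(1)

def impute_censored(ages, censored, mu, sigma):
    """Gibbs-style data augmentation: replace each right-censored age at onset with a
    draw from a normal trait model truncated to values above the censoring age."""
    ages = np.array(ages, dtype=float)
    for i in np.flatnonzero(censored):
        a = (ages[i] - mu) / sigma            # lower truncation point in standard units
        ages[i] = truncnorm.rvs(a, np.inf, loc=mu, scale=sigma, random_state=rng)
    return ages

ages     = [68.0, 72.0, 80.0, 75.0]           # observed onset ages or current ages
censored = [False, True, False, True]         # True = unaffected so far (right-censored)
print(impute_censored(ages, censored, mu=74.0, sigma=6.0))
```

In a full sampler this imputation would alternate with updates of the QTL genotypes and model parameters, each conditional on the completed data.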

14.
In this paper, we study Bayesian analysis of nonlinear hierarchical mixture models with a finite but unknown number of components. Our approach is based on Markov chain Monte Carlo (MCMC) methods. One of the applications of our method is directed to the clustering problem in gene expression analysis. From a mathematical and statistical point of view, we discuss the following topics: theoretical and practical convergence problems of the MCMC method; determination of the number of components in the mixture; and computational problems associated with likelihood calculations. In the existing literature, these problems have mainly been addressed in the linear case. One of the main contributions of this paper is developing a method for the nonlinear case. Our approach is based on a combination of methods including Gibbs sampling, random permutation sampling, birth-death MCMC, and Kullback-Leibler distance.
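Of the ingredients listed, the Kullback-Leibler distance is the easiest to make concrete. The sketch below gives the plain discrete-distribution form and its symmetrised variant, which is a common way such a distance is used when comparing mixture components; how the paper applies it is not specified in the abstract, so this is illustration only.

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions (natural log).
    Returns inf if q has zero mass where p does not."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def symmetric_kl(p, q):
    """Symmetrised KL distance, KL(p||q) + KL(q||p)."""
    return kl(p, q) + kl(q, p)

p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]
print(kl(p, q), symmetric_kl(p, q))
```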

15.
The problem of detecting linkage, by using the LOD-score method, of polymorphic marker loci to a disorder that is determined by recessive alleles at two independent autosomal diallelic loci has been considered. The expected LOD score and the distribution of the LOD score have been worked out for various scenarios. It is found that the expected numbers of families to be sampled for detection of linkage are within feasible limits if the recombination fractions between the marker loci and the disorder loci are less than or equal to .1. The strategy of studying affected offspring only is shown to be more efficient than the strategy of studying both affected and normal offspring. The efficiency of the "affecteds-only" strategy (1) increases with increase in sibship size, (2) decreases with increase in population prevalence of the disorder, and (3) increases with increase in recombination distances between the marker and the disorder loci. From various considerations, it is found that sampling families of sibship size three with at least one affected, and adopting the affecteds-only strategy for analysis, may be an optimal strategy.
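As background to the expected LOD scores discussed here, the sketch below computes the elementary LOD score for phase-known, fully informative meioses, the building block over which such expectations are taken. The counts in the example are made up; the two-locus recessive model of the paper is not reproduced.

```python
import numpy as np

def lod(recombinants, nonrecombinants, theta):
    """LOD score for phase-known meioses: log10 likelihood ratio of recombination
    fraction theta (0 < theta <= 0.5) against free recombination (theta = 0.5)."""
    return (recombinants * np.log10(theta / 0.5)
            + nonrecombinants * np.log10((1.0 - theta) / 0.5))

thetas = np.arange(0.01, 0.51, 0.01)
scores = [lod(1, 19, t) for t in thetas]               # 1 recombinant among 20 meioses
best = int(np.argmax(scores))
print(round(scores[best], 2), round(thetas[best], 2))  # about 4.3, peaking near theta = 0.05
```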

16.
We propose a new Markov chain Monte Carlo (MCMC) sampling mechanism for Bayesian phylogenetic inference. This method, which we call conjugate Gibbs, relies on analytical conjugacy properties and is based on an alternation between data augmentation and Gibbs sampling. The data augmentation step consists in sampling a detailed substitution history for each site, and across the whole tree, given the current value of the model parameters. Provided convenient priors are used, the parameters of the model can then be directly updated by a Gibbs sampling procedure, conditional on the current substitution history. Alternating between these two sampling steps yields an MCMC device whose equilibrium distribution is the posterior probability density of interest. We show, on real examples, that this conjugate Gibbs method leads to a significant improvement of the mixing behavior of the MCMC. In all cases, the decorrelation times of the resulting chains are smaller than those obtained by standard Metropolis-Hastings procedures by at least one order of magnitude. The method is particularly well suited to heterogeneous models, i.e., those assuming site-specific random variables. In particular, the conjugate Gibbs formalism allows one to propose efficient implementations of complex models, for instance assuming site-specific substitution processes, that would not be accessible to standard MCMC methods.
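The data augmentation step (sampling substitution histories on the tree) is beyond a short sketch; the fragment below only illustrates the flavor of the conjugate Gibbs parameter update that follows it, using a deliberately simplified multinomial view of state counts under a Dirichlet prior. The actual sufficient statistics of the paper's continuous-time substitution models differ, and the counts here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def gibbs_update_frequencies(state_counts, prior=1.0):
    """Conjugate Gibbs update of equilibrium frequencies under a simplified model:
    with a symmetric Dirichlet(prior) prior and count-type sufficient statistics
    taken from the augmented substitution history, the full conditional is
    Dirichlet(prior + counts), which can be sampled directly."""
    counts = np.asarray(state_counts, dtype=float)
    return rng.dirichlet(prior + counts)

# hypothetical counts for states A, C, G, T extracted from a sampled substitution history
counts = [120.0, 80.0, 75.0, 125.0]
print(gibbs_update_frequencies(counts))
```

The appeal of this style of update, as the abstract notes, is that the parameter draw is exact given the augmented history, which is what produces the short decorrelation times.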

17.
Biclustering microarray data by Gibbs sampling (cited by 1: 0 self-citations, 1 by others)
MOTIVATION: Gibbs sampling has become a method of choice for the discovery of noisy patterns, known as motifs, in DNA and protein sequences. Because handling noise in microarray data presents similar challenges, we have adapted this strategy to the biclustering of discretized microarray data. RESULTS: In contrast with standard clustering that reveals genes that behave similarly over all the conditions, biclustering groups genes over only a subset of conditions for which those genes have a sharp probability distribution. We have opted for a simple probabilistic model of the biclusters because it has the key advantage of providing a transparent probabilistic interpretation of the biclusters in the form of an easily interpretable fingerprint. Furthermore, Gibbs sampling does not suffer from the problem of local minima that often characterizes Expectation-Maximization. We demonstrate the effectiveness of our approach on two synthetic data sets as well as a data set from leukemia patients.

18.
The study of change in intermediate phenotypes over time is important in genetics. In this paper we explore a new approach to phenotype definition in the genetic analysis of longitudinal phenotypes. We utilized data from the longitudinal Framingham Heart Study Family Cohort to investigate the familial aggregation of, and evidence for linkage to, change in systolic blood pressure (SBP) over time. We used Gibbs sampling to derive sigma-squared-A random effects (SSARs) for the longitudinal phenotype, and then used these as a new phenotype in subsequent genome-wide linkage analyses. Additive genetic effects (σ²A,time) were estimated to account for approximately 9.2% of the variance in the rate of change of SBP with age, while additive genetic effects (σ²A) were estimated to account for approximately 43.9% of the variance in SBP at the mean age. The linkage results suggested that one or more major loci regulating change in SBP over time may localize to chromosomes 2, 3, 4, 6, 10, 11, 17, and 19. The results also suggested that one or more major loci regulating the level of SBP may localize to chromosomes 3, 8, and 14. Our results support a genetic component to both SBP and change in SBP with age, and are consistent with a complex, multifactorial susceptibility to the development of hypertension. The use of SSARs derived from quantitative traits as input to a conventional linkage analysis appears to be valuable in the linkage analysis of genetically complex traits. We have now demonstrated in this paper the use of SSARs in the context of longitudinal family data.

19.
A robust statistical method to detect linkage or association between a genetic marker and a set of distinct phenotypic traits is to combine univariate trait-specific test statistics into a more powerful overall test. This procedure does not need complex modeling assumptions, can easily handle the problem of partially missing trait values, and is applicable to the case with a mixture of qualitative and quantitative traits. In this note, we propose a simple test procedure along this line and show its advantages over standard combination tests for linkage or association in the literature, using a data set from Genetic Analysis Workshop 12 (GAW12) and an extensive simulation study.
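The note's own combination statistic is not given in the abstract. As an illustration of the general strategy it describes, the sketch below applies Fisher's classical combination of trait-specific p-values, simply dropping traits with missing results; the input p-values are made up and the choice of Fisher's method is an assumption, not the authors' procedure.

```python
import numpy as np
from scipy.stats import chi2

def fisher_combined(p_values):
    """Fisher's method: combine independent univariate p-values into one overall test.
    Missing trait results (None/NaN) are dropped before combining."""
    ps = np.array([p for p in p_values if p is not None and not np.isnan(p)], dtype=float)
    stat = -2.0 * np.sum(np.log(ps))          # ~ chi-square with 2k df under the global null
    return stat, chi2.sf(stat, df=2 * len(ps))

# one quantitative and two qualitative traits, one of them missing for this marker
print(fisher_combined([0.04, None, 0.20]))
```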

20.
Kim S, Wang Z, Dalkilic M. Proteins. 2007;66(3):671-681
The motif prediction problem is to predict short, conserved subsequences that are part of a family of sequences, and it is a very important biological problem. Gibbs is one of the first successful motif algorithms; it runs very fast compared with other algorithms, and its search behavior is based on well-studied Gibbs random sampling. However, motif prediction is a very difficult problem and Gibbs may not predict true motifs in some cases. Thus, the authors explored the possibility of improving the prediction accuracy of Gibbs while retaining its fast runtime performance. In this paper, the authors considered Gibbs only for proteins, not for DNA binding sites. The authors have developed iGibbs, an integrated motif search framework for proteins that employs two previous techniques of their own: one for guiding motif search by clustering sequences and another by pattern refinement. These two techniques are combined into a new double clustering approach to guiding motif search. The unique feature of their framework is that users do not have to specify the number of motifs to be predicted when motifs occur in different subsets of the input sequences, since it automatically clusters the input sequences and predicts motifs from the resulting clusters. Tests on the PROSITE database show that their framework improved the prediction accuracy of Gibbs significantly. Compared with more exhaustive search methods like MEME, iGibbs predicted motifs more accurately and ran one order of magnitude faster.
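iGibbs itself is not shown; the sketch below is a minimal site sampler of the classic kind it builds on: hold one sequence out, build a position weight matrix (PWM) from the motif positions currently assigned in the other sequences, then resample the held-out sequence's motif start in proportion to the PWM likelihood (background model and the clustering layers of iGibbs omitted). The toy protein sequences, planted motif, and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"            # the 20 amino acids
A2I = {a: i for i, a in enumerate(ALPHABET)}

def gibbs_motif_sampler(seqs, width, n_iter=500, pseudo=0.5):
    """Classic site sampler: one motif occurrence per sequence, resampled one
    sequence at a time from a PWM built from the remaining sequences."""
    starts = [rng.integers(0, len(s) - width + 1) for s in seqs]
    for _ in range(n_iter):
        for i, seq in enumerate(seqs):
            # build the PWM (with pseudocounts) from all sequences except the held-out one
            pwm = np.full((width, len(ALPHABET)), pseudo)
            for j, other in enumerate(seqs):
                if j != i:
                    for k in range(width):
                        pwm[k, A2I[other[starts[j] + k]]] += 1.0
            pwm /= pwm.sum(axis=1, keepdims=True)
            # score every possible start in the held-out sequence and sample one
            n_pos = len(seq) - width + 1
            logw = np.array([sum(np.log(pwm[k, A2I[seq[s + k]]]) for k in range(width))
                             for s in range(n_pos)])
            w = np.exp(logw - logw.max())
            starts[i] = rng.choice(n_pos, p=w / w.sum())
    return [int(s) for s in starts]

# toy sequences with the motif "WKDLF" planted at different positions
seqs = ["AAAWKDLFAAAA", "CCWKDLFCCCCC", "GGGGGWKDLFGG", "TTWKDLFTTTTT"]
print(gibbs_motif_sampler(seqs, width=5))    # should recover starts 3, 2, 5, 2
```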

