20 similar documents found (search time: 0 ms)
1.
We apply the method of "blocking Gibbs" sampling to a problem of great importance and complexity: linkage analysis. Blocking Gibbs sampling combines exact local computations with Gibbs sampling in a way that complements the strengths of both. The method can handle problems of very high complexity, such as linkage analysis in large pedigrees with many loops, a task that no other known method can handle. New developments of the method are outlined, and it is applied to a highly complex linkage problem in a human pedigree.
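Why blocking helps can be seen in a deliberately tiny toy problem (not the pedigree setting of the paper): two standard normal variables with strong correlation `rho`. A single-site Gibbs sampler updates one coordinate at a time and mixes slowly; a blocked sampler treats both variables as one block and draws them exactly from their joint distribution, which stands in here for the "exact local computations" of the abstract. All function names and the value of `rho` are illustrative assumptions.

```python
import math
import random

def single_site_gibbs(rho, n, rng):
    """Update x | y, then y | x, one coordinate at a time."""
    sd = math.sqrt(1.0 - rho * rho)
    x = y = 0.0
    draws = []
    for _ in range(n):
        x = rng.gauss(rho * y, sd)
        y = rng.gauss(rho * x, sd)
        draws.append((x, y))
    return draws

def blocked_gibbs(rho, n, rng):
    """Treat (x, y) as one block and draw it exactly from the joint distribution."""
    sd = math.sqrt(1.0 - rho * rho)
    draws = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)       # exact marginal of x
        y = rng.gauss(rho * x, sd)    # exact conditional of y given x
        draws.append((x, y))
    return draws

def lag1_autocorr(values):
    """Sample lag-1 autocorrelation of a sequence of floats."""
    n = len(values)
    m = sum(values) / n
    num = sum((values[i] - m) * (values[i + 1] - m) for i in range(n - 1))
    den = sum((v - m) ** 2 for v in values)
    return num / den
```

With `rho = 0.99`, the single-site chain for `x` behaves like an autoregression with coefficient `rho**2`, so its lag-1 autocorrelation stays close to 0.98, while the blocked draws are essentially independent.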
2.
Empirical Bayes Gibbs sampling (total citations: 3; self-citations: 0; citations by others: 3)
Casella G. Biostatistics (Oxford, England), 2001, 2(4): 485-500
The wide applicability of Gibbs sampling has increased the use of more complex and multi-level hierarchical models. Using these models entails dealing with hyperparameters in the deeper levels of a hierarchy. There are three typical methods for dealing with these hyperparameters: specify them, estimate them, or use a 'flat' prior. Each of these strategies has its own associated problems. In this paper, using an empirical Bayes approach, we show how the hyperparameters can be estimated in a way that is both computationally feasible and statistically valid.
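One common way to estimate a hyperparameter from sampler output is Monte Carlo EM: draw the latent parameters from their full conditionals, then update the hyperparameter to maximize the expected complete-data likelihood. The sketch below applies this to an assumed normal-normal model, `y_i ~ N(theta_i, sigma2)` with `theta_i ~ N(mu, tau2)` and `sigma2` known; it illustrates the general idea and is not claimed to be the exact algorithm of the paper. In this toy model the `theta_i` are conditionally independent, so the "Gibbs" step reduces to direct draws; in a deeper hierarchy those draws would come from a full Gibbs sweep. The function name `mcem_tau2` is hypothetical.

```python
import random
import statistics

def mcem_tau2(y, sigma2, n_iter=30, n_draws=10, seed=0):
    """Monte Carlo EM for tau2 in y_i ~ N(theta_i, sigma2), theta_i ~ N(mu, tau2)."""
    rng = random.Random(seed)
    mu = statistics.fmean(y)  # marginal MLE of mu, held fixed
    tau2 = 1.0                # arbitrary starting value
    for _ in range(n_iter):
        # Full conditional of each theta_i given y_i and the current hyperparameters.
        shrink = tau2 / (sigma2 + tau2)
        post_sd = (sigma2 * tau2 / (sigma2 + tau2)) ** 0.5
        total = 0.0
        for _ in range(n_draws):
            for yi in y:
                theta = rng.gauss(mu + shrink * (yi - mu), post_sd)
                total += (theta - mu) ** 2
        # M-step: tau2 maximizing the Monte Carlo expected complete-data likelihood.
        tau2 = total / (n_draws * len(y))
    return mu, tau2
```

For this model the EM fixed point is the marginal maximum-likelihood estimate `max(0, S - sigma2)`, where `S` is the population variance of `y`, which gives a convenient check on the Monte Carlo output.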
3.
Nicolas Lartillot. Journal of Computational Biology, 2006, 13(10): 1701-1722
We propose a new Markov chain Monte Carlo (MCMC) sampling mechanism for Bayesian phylogenetic inference. This method, which we call conjugate Gibbs, relies on analytical conjugacy properties and is based on an alternation between data augmentation and Gibbs sampling. The data augmentation step consists of sampling a detailed substitution history for each site and across the whole tree, given the current values of the model parameters. Provided convenient priors are used, the parameters of the model can then be updated directly by a Gibbs sampling procedure, conditional on the current substitution history. Alternating between these two sampling steps yields an MCMC device whose equilibrium distribution is the posterior probability density of interest. We show on real examples that this conjugate Gibbs method significantly improves the mixing behavior of the MCMC. In all cases, the decorrelation times of the resulting chains are at least one order of magnitude smaller than those obtained by standard Metropolis-Hastings procedures. The method is particularly well suited to heterogeneous models, i.e., models assuming site-specific random variables. In particular, the conjugate Gibbs formalism allows efficient implementations of complex models, for instance those assuming site-specific substitution processes, that would not be accessible to standard MCMC methods.
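The alternation between data augmentation and a conjugate update can be shown on a toy problem far simpler than a phylogeny (all of it an assumption for illustration): each site evolves along a single branch of length `t` under a two-state symmetric model in which every substitution flips the state at total rate `lam`. Only whether the state changed is observed, which means the number of substitutions was odd. Given `lam`, the sampler imputes a substitution count per site with the right parity; given the counts, a Gamma prior on `lam` is conjugate, so `lam` is drawn directly. Event timings are not needed because the rate likelihood depends only on the counts and the exposure. Function names are hypothetical.

```python
import math
import random

def poisson(rng, mean):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def conjugate_gibbs(changed, t, a=1.0, b=1.0, n_iter=600, burn=100, seed=0):
    """Alternate substitution-count augmentation with a conjugate Gamma update of lam."""
    rng = random.Random(seed)
    lam = 1.0
    kept = []
    for it in range(n_iter):
        # Augmentation: a Poisson(lam * t) count whose parity matches the observation
        # (odd = state changed), drawn by rejection.
        total = 0
        for c in changed:
            n = poisson(rng, lam * t)
            while n % 2 != c:
                n = poisson(rng, lam * t)
            total += n
        # Conjugate step: Gamma(a, rate b) prior -> Gamma(a + total, rate b + sites * t).
        lam = rng.gammavariate(a + total, 1.0 / (b + len(changed) * t))
        if it >= burn:
            kept.append(lam)
    return kept
```

As a sanity check, the probability of an observed change is `(1 - exp(-2 * lam * t)) / 2`, so the maximum-likelihood estimate `-log(1 - 2 * p_hat) / (2 * t)` should lie close to the posterior mean when there are many sites.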
4.
Capture-recapture estimation via Gibbs sampling (total citations: 6; self-citations: 0; citations by others: 6)
5.
Protein backbones have characteristic secondary structures, including α-helices and β-sheets. Which structure is adopted locally is strongly biased by the local amino acid sequence of the protein. Accurate (probabilistic) mappings from sequence to structure are valuable for both secondary-structure prediction and protein design. For the case of α-helix caps, we test whether the information content of the sequence–structure mapping can be self-consistently improved by using a relaxed definition of the structure. We derive helix-cap sequence motifs using database helix assignments for proteins of known structure. These motifs are refined using Gibbs sampling in competition with a null motif. Then Gibbs sampling is repeated, allowing for frameshifts of ±1 amino acid residue, in order to find sequence motifs of higher total information content. All helix-cap motifs were found to have good generalization capability, as judged by training on a small set of non-redundant proteins and testing on a larger set. For overall prediction purposes, frameshift motifs using all training examples yielded the best results. Frameshift motifs using a fraction of all training examples performed best in terms of true positives among top predictions. However, motifs without frameshifts also performed well, despite a roughly one-third lower total information content.
6.
7.
Biclustering microarray data by Gibbs sampling (total citations: 1; self-citations: 0; citations by others: 1)
MOTIVATION: Gibbs sampling has become a method of choice for the discovery of noisy patterns, known as motifs, in DNA and protein sequences. Because handling noise in microarray data presents similar challenges, we have adapted this strategy to the biclustering of discretized microarray data. RESULTS: In contrast with standard clustering that reveals genes that behave similarly over all the conditions, biclustering groups genes over only a subset of conditions for which those genes have a sharp probability distribution. We have opted for a simple probabilistic model of the biclusters because it has the key advantage of providing a transparent probabilistic interpretation of the biclusters in the form of an easily interpretable fingerprint. Furthermore, Gibbs sampling does not suffer from the problem of local minima that often characterizes Expectation-Maximization. We demonstrate the effectiveness of our approach on two synthetic data sets as well as a data set from leukemia patients.
8.
Statistical packages for constructing genetic linkage maps in inbred lines are well developed and extensively applied, while linkage analysis in outcrossing species faces statistical challenges because of their complicated genetic structures. In this article, we present a multilocus linkage analysis via hidden Markov models for a linkage group of markers in a full-sib family. The advantage of this method is the simultaneous estimation of the recombination fractions between adjacent markers, which may segregate in different ratios, and the calculation of the likelihood for a given order of the markers. When the number of markers decreases to two or three, the multilocus linkage analysis reduces to traditional two-point or three-point linkage analysis, respectively. Monte Carlo simulations show that the recombination fraction estimates from multilocus linkage analysis are more accurate than those from two-point linkage analysis alone, and that the likelihood, used as an objective function for ordering marker loci, is the most powerful of the methods compared. Incorporating this multilocus linkage analysis, we have developed Windows software, FsLinkageMap, for constructing genetic maps in a full-sib family. A real example illustrates linkage maps constructed from mixed segregation markers. Our multilocus linkage analysis provides a powerful method for constructing high-density genetic linkage maps in some outcrossing plant species, especially forest trees.
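The hidden Markov structure behind multilocus likelihoods can be sketched in a much simpler setting than a full-sib family (a loud simplification: one informative meiosis per individual, as in a backcross). The hidden state at each ordered marker is the grandparental origin (0 or 1); between adjacent markers the state switches with probability equal to the recombination fraction; an optional `error` rate allows for miscalled genotypes. The forward algorithm sums over all hidden paths, giving the likelihood of one marker order. The mixed segregation types handled by FsLinkageMap are not modeled, and the function names are hypothetical.

```python
import math

def forward_likelihood(obs, rec_fracs, error=0.0):
    """P(observed calls) for one individual; obs[k] is 0, 1 or None (missing).

    Hidden state = grandparental origin (0/1) at each ordered marker; adjacent
    markers switch state with probability rec_fracs[k] (a recombination).
    """
    def emit(state, call):
        if call is None:
            return 1.0
        return 1.0 - error if call == state else error

    alpha = [0.5 * emit(s, obs[0]) for s in (0, 1)]
    for r, call in zip(rec_fracs, obs[1:]):
        alpha = [(alpha[s] * (1.0 - r) + alpha[1 - s] * r) * emit(s, call) for s in (0, 1)]
    return sum(alpha)

def log_likelihood(individuals, rec_fracs, error=0.0):
    """Log-likelihood of one marker order (with its rec_fracs), summed over individuals."""
    return sum(math.log(forward_likelihood(obs, rec_fracs, error)) for obs in individuals)
```

With only two markers and no errors, the forward sum collapses to `0.5 * r` or `0.5 * (1 - r)`, the two-point likelihood, mirroring the reduction described in the abstract. Candidate orders would be compared by evaluating `log_likelihood` with each order's own recombination fractions.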
9.
The Gibbs sampling method has been widely used for sequence analysis since it was successfully applied to the problem of identifying regulatory motif sequences upstream of genes. Numerous variants of the original idea have since emerged; however, in all cases the application has been to finding short motifs in collections of short sequences (typically less than 100 nucleotides long). In this paper, we introduce a Gibbs sampling approach for identifying genes in multiple large genomic sequences up to hundreds of kilobases long. This approach leverages the evolutionary relationships between the sequences to improve the gene predictions, without explicitly aligning the sequences. We have applied our method to the analysis of genomic sequence from 14 genomic regions, totaling roughly 1.8 Mb of sequence in each organism. We show that our approach compares favorably with existing ab initio approaches to gene finding, including pairwise-comparison-based gene prediction methods that make explicit use of alignments. Furthermore, excellent performance can be obtained with as few as four organisms, and the method overcomes a number of difficulties of previous comparison-based gene finding approaches: it is robust to genomic rearrangements, can work with draft sequence, and is fast (linear in the number and length of the sequences). It can also be seamlessly integrated with Gibbs sampling motif detection methods.
10.
On Gibbs sampling for state space models (total citations: 26; self-citations: 0; citations by others: 26)
We show how to use the Gibbs sampler to carry out Bayesian inference on a linear state space model with errors that are a mixture of normals and coefficients that can switch over time. Our approach simultaneously generates the whole of the state vector given the mixture and coefficient indicator variables and simultaneously generates all the indicator variables conditional on the state vectors. The states are generated efficiently using the Kalman filter. We illustrate our approach by several examples and empirically compare its performance to another Gibbs sampler where the states are generated one at a time. The empirical results suggest that our approach is both practical to implement and dominates the Gibbs sampler that generates the states one at a time.
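Drawing the whole state vector at once is usually done by forward filtering, backward sampling: run the Kalman filter forward, then sample the last state and work backwards, drawing each state given the one after it. The sketch below does this for an assumed local-level model, `y_t = x_t + e_t`, `x_t = x_{t-1} + w_t`, with fixed Gaussian variances. It shows only the state-vector step; in a sampler like the one described, this step would alternate with drawing the mixture and switching indicators given the states, which would set `obs_var` and `state_var` for each time point. The function name and the diffuse prior `c0` are assumptions.

```python
import math
import random

def ffbs_local_level(y, obs_var, state_var, m0=0.0, c0=1e6, rng=None):
    """Draw x_1..x_n jointly for y_t = x_t + e_t, x_t = x_{t-1} + w_t (Gaussian errors)."""
    rng = rng or random.Random()
    means, variances = [], []
    m, c = m0, c0
    for obs in y:                      # forward pass: Kalman filter
        r = c + state_var              # variance of x_t given y_1..y_{t-1}
        m = m + r / (r + obs_var) * (obs - m)
        c = r * obs_var / (r + obs_var)
        means.append(m)
        variances.append(c)
    n = len(y)
    x = [0.0] * n
    x[-1] = rng.gauss(means[-1], math.sqrt(variances[-1]))
    for t in range(n - 2, -1, -1):     # backward pass: x_t | x_{t+1}, y_1..y_t
        r = variances[t] + state_var
        mean = means[t] + variances[t] / r * (x[t + 1] - means[t])
        var = variances[t] * state_var / r
        x[t] = rng.gauss(mean, math.sqrt(var))
    return x
```

A quick check: when the observation variance is tiny, every sampled state should sit almost exactly on its observation.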
11.
Multipoint linkage analysis is commonly used to evaluate linkage of a disease to multiple markers in a small region. Multipoint analysis is particularly powerful when the identity-by-descent (IBD) relations of family members at the trait locus are ambiguous. The increased power arises because, unlike single-marker analyses, multipoint analysis uses haplotype information from several markers to infer the IBD relations. We wish to temper this advantage with a cautionary note: multipoint analysis is sensitive to power loss due to misspecification of intermarker distances. Such misspecification is especially problematic when dealing with closely spaced markers. We present computer simulations comparing the power of single-point and multipoint analyses, both when IBD relations are ambiguous and when the intermarker distances are misspecified. We conclude that when evaluating markers in a small region to confirm or refute previous findings, a situation in which p values of modest statistical significance are important, single-marker analyses may provide more reliable measures of the strength of support for linkage than multipoint statistics.
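Multipoint methods convert intermarker distances into recombination fractions, so a wrong distance directly becomes a wrong recombination fraction in the likelihood. The abstract does not say which map function was used; the sketch below assumes Haldane's map function (no interference), with distances in Morgans. Function names are hypothetical.

```python
import math

def haldane_r(d):
    """Recombination fraction for a map distance d in Morgans (no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))

def haldane_d(r):
    """Inverse map: distance in Morgans for a recombination fraction 0 <= r < 0.5."""
    return -0.5 * math.log(1.0 - 2.0 * r)
```

For example, markers truly 1 cM apart (`haldane_r(0.01)`, about 0.0099) but entered as 5 cM apart would be modeled with a recombination fraction of about 0.048, roughly five times too large, which is the kind of misspecification the simulations above examine.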
12.
In this study, we propose using principal component analysis (PCA) and a regression model to incorporate linkage disequilibrium (LD) in genomic association data analysis. To accommodate LD in genomic data and reduce multiple testing, we suggest performing PCA and extracting principal component scores to capture the variation in the genomic data, after which regression analysis is used to assess the association of the disease with the principal component score. An empirical analysis shows that a genotype-based correlation matrix and a haplotype-based LD matrix produce similar PCA results. The principal component score appears to be more powerful in detecting genetic association because it is a quantitative measure and may capture the effect of multiple loci.
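The two steps described above (extract a principal component score from correlated SNPs, then regress disease status on it) can be sketched as follows. Assumptions: genotypes are coded 0/1/2 allele counts, only the first component is used, it is found by power iteration on the covariance matrix, and an ordinary least-squares slope stands in for the regression model, which the abstract does not specify (a logistic model would be more usual for case-control data). Function names are hypothetical.

```python
import math

def first_principal_component(genotypes, n_iter=500):
    """Leading PCA loadings (unit vector) and per-individual scores, via power iteration."""
    n, p = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    X = [[row[j] - means[j] for j in range(p)] for row in genotypes]
    cov = [[sum(X[i][a] * X[i][b] for i in range(n)) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0] * p  # start vector; assumed not orthogonal to the leading eigenvector
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(X[i][j] * v[j] for j in range(p)) for i in range(n)]
    return v, scores

def regression_slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx
```

SNPs in perfect LD (identical genotype columns) receive identical loadings, so one score summarizes them and a single test replaces several, which is the reduction in multiple testing the abstract refers to.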
13.
14.
15.
Improved prediction of MHC class I and class II epitopes using a novel Gibbs sampling approach (total citations: 7; self-citations: 0; citations by others: 7)
Nielsen M, Lundegaard C, Worning P, Hvid CS, Lamberth K, Buus S, Brunak S, Lund O. Bioinformatics (Oxford, England), 2004, 20(9): 1388-1397
MOTIVATION: Prediction of which peptides will bind a specific major histocompatibility complex (MHC) constitutes an important step in identifying potential T-cell epitopes suitable as vaccine candidates. MHC class II binding peptides have a broad length distribution, which complicates such predictions. Thus, identifying the correct alignment is a crucial part of identifying the core of an MHC class II binding motif. In this context, we describe a novel Gibbs motif sampler method ideally suited for recognizing such weak sequence motifs. The method is based on the Gibbs sampling method, and it incorporates novel features optimized for the task of recognizing the binding motif of MHC classes I and II. The method locates the binding motif in a set of sequences and characterizes the motif in terms of a weight matrix. The weight matrix can subsequently be applied to effectively identify potential MHC binding peptides and to guide the process of rational vaccine design. RESULTS: We apply the motif sampler method to the complex problem of MHC class II binding. The input to the method consists of peptide sequences extracted from the public databases SYFPEITHI and MHCPEP that are known to bind to the MHC class II complex HLA-DR4(B1*0401). Prior identification of information-rich (anchor) positions in the binding motif is shown to improve the predictive performance of the Gibbs sampler. Similarly, a consensus solution obtained from an ensemble average over suboptimal solutions is shown to outperform the use of a single optimal solution. In a large-scale benchmark calculation, the performance is quantified using relative operating characteristic (ROC) curves, and we make a detailed comparison of the performance with that of both the TEPITOPE method and a weight matrix derived using the conventional alignment algorithm of ClustalW.
The calculation demonstrates that the predictive performance of the Gibbs sampler is higher than that of ClustalW and in most cases also higher than that of the TEPITOPE method.
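The core loop of a Gibbs motif sampler, in its classic site-sampler form, can be sketched as follows: hold out one sequence, build a weight matrix from the current motif sites in all other sequences, score every possible start in the held-out sequence by the motif-versus-background likelihood ratio, and sample a new start in proportion to those scores. This sketch uses DNA, a uniform background and pseudocounts of 1 (all assumptions); it omits the features the abstract credits for the improved MHC predictions, such as anchor-position priors and ensemble consensus solutions. Function names are hypothetical.

```python
import random

ALPHABET = "ACGT"

def build_profile(seqs, starts, width, skip, pseudo=1.0):
    """Column-wise base frequencies of the current sites, leaving out sequence `skip`."""
    profile = []
    for col in range(width):
        counts = {base: pseudo for base in ALPHABET}
        for i, (seq, start) in enumerate(zip(seqs, starts)):
            if i != skip:
                counts[seq[start + col]] += 1
        total = sum(counts.values())
        profile.append({base: c / total for base, c in counts.items()})
    return profile

def site_weights(seq, profile, background):
    """Likelihood ratio (motif vs. background) for every possible start in `seq`."""
    width = len(profile)
    weights = []
    for start in range(len(seq) - width + 1):
        ratio = 1.0
        for col, base in enumerate(seq[start:start + width]):
            ratio *= profile[col][base] / background[base]
        weights.append(ratio)
    return weights

def gibbs_motif(seqs, width, n_iter=200, seed=0):
    """Site sampler: resample one sequence's motif start at a time."""
    rng = random.Random(seed)
    background = {base: 0.25 for base in ALPHABET}  # uniform background (assumption)
    starts = [rng.randrange(len(seq) - width + 1) for seq in seqs]
    for _ in range(n_iter):
        for i, seq in enumerate(seqs):
            profile = build_profile(seqs, starts, width, skip=i)
            weights = site_weights(seq, profile, background)
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts
```

Sampling positions in proportion to their scores, rather than always taking the best one, is what lets the sampler escape poor initial alignments; the final weight matrix is then `build_profile` applied to all sites.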
16.
Diallel mating is a frequently used design for estimating the additive and dominance genetic (polygenic) effects involved in quantitative traits observed in the half- and full-sib progenies generated in plant breeding programmes. Gibbs sampling has been used for making statistical inferences for a mixed-inheritance model (MIM) that includes both major genes and polygenes. However, using this approach it has not been possible to incorporate the genetic properties of major genes with the additive and dominance polygenic effects in a diallel mating population. A parent block Gibbs sampling method was developed in this study to make statistical inferences about the major gene and polygenic effects on quantitative traits for progenies derived from a half-diallel mating design. Using simulated data sets with different major and polygenic effects, the proposed method accurately estimated the major and polygenic effects of quantitative traits, and possible genotypes of parents and progenies. The impact of specifying different prior distributions was examined and was found to have little effect on inference on the posterior distribution. This approach was applied to an experimental data set of Loblolly pine (Pinus taeda L.) derived from a 6-parent half-diallel mating. The result indicated that there might be a recessive major gene affecting height growth in this diallel population.
17.
A coalescent approach to study linkage disequilibrium between single-nucleotide polymorphisms
We present the results of extensive simulations that emulate the development and distribution of linkage disequilibrium (LD) between single-nucleotide polymorphisms (SNPs) and a gene locus that is phenotypically stratified into two classes (disease phenotype and wild-type phenotype). Our approach, based on coalescence theory, allows an explicit modeling of the demographic history of the population without conditioning on the age of the mutation, and serves as an efficient tool to carry out simulations. More specifically, we compare the influence that a constant population size or an exponentially growing population has on the amount of LD. These results indicate that attempts to locate single disease genes are most likely to be successful in small and constant populations. On the other hand, if we consider an exponentially growing population that started to expand from an initially constant population of reasonable size, then our simulations indicate a lower success rate. The power to detect association is enhanced if haplotypes constructed from several SNPs are used as markers. The versatility of the coalescence approach also allows the analysis of other relevant factors that influence the chances that a disease gene will be located. We show that several alleles leading to the same disease have no substantial influence on the amount of LD, as long as the differences between the disease-causing alleles are confined to the same region of the gene locus and as long as each allele occurs at an appreciable frequency. Our simulations indicate that mapping of less-frequent diseases is more likely to be successful. Moreover, we show that successful attempts to map complex diseases depend crucially on the phenotype-genotype correlations of all alleles at the disease locus. An analysis of lipoprotein lipase data indicates that our simulations capture the major features of LD occurring in biological data.
18.
Kannan Tharakaraman, Leonardo Mariño-Ramírez, Sergey L Sheetlin, David Landsman, John L Spouge. BMC Bioinformatics, 2006, 7(1): 408
Background
Many DNA regulatory elements occur as multiple instances within a target promoter. Gibbs sampling programs for finding DNA regulatory elements de novo can be prohibitively slow in locating all instances of such an element in a sequence set.
19.
20.
A formula is given for the advantage of n-point sampling, which approaches infinity with n. However, 2-point and 3-point analyses extract nearly all the information in such samples and at the same time communicate this information as lod scores and χ² values, which can be combined with other data by simple addition without reevaluating the likelihood. When null interference is assumed, map distances and multiple recombination frequencies are inflated, and there is a substantial loss of efficiency and of support for the correct order.
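The additivity of lod scores across independent data sets, which the abstract relies on, can be checked with a two-point lod score for phase-known meioses (an assumed setting: each meiosis is scored as recombinant or not). The lod is the log10 ratio of the likelihood at recombination fraction `theta` to the likelihood under free recombination (`theta = 0.5`); for independent families the likelihoods multiply, so the lods add. Function names are hypothetical.

```python
import math

def two_point_lod(recombinants, meioses, theta):
    """Phase-known two-point lod score: log10 L(theta) - log10 L(0.5)."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if recombinants and theta == 0.0:
        return -math.inf  # any recombinant rules out complete linkage
    nonrecombinants = meioses - recombinants
    log_lik = 0.0
    if recombinants:
        log_lik += recombinants * math.log10(theta)
    if nonrecombinants:
        log_lik += nonrecombinants * math.log10(1.0 - theta)
    return log_lik - meioses * math.log10(0.5)

def combined_lod(families, theta):
    """Lods from independent families, evaluated at the same theta, simply add."""
    return sum(two_point_lod(r, n, theta) for r, n in families)
```

For example, ten non-recombinant meioses give a lod of `10 * log10(2)`, about 3.01 at `theta = 0`, and the combined lod for families `(1, 10)` and `(2, 10)` equals the lod of the pooled counts `(3, 20)` at every `theta`, so no likelihood has to be re-evaluated.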