
1.

A unified Markov chain Monte Carlo framework for mapping multiple quantitative trait loci (Total citations: 17; self-citations: 0; citations by others: 17)

Yi N 《Genetics》2004,167(2):967-975

In this article, a unified Markov chain Monte Carlo (MCMC) framework is proposed to identify multiple quantitative trait loci (QTL) for complex traits in experimental designs, based on a composite space representation of the problem that has fixed dimension. The proposed unified approach includes the existing Bayesian QTL mapping methods using reversible jump MCMC algorithm as special cases. We also show that a variety of Bayesian variable selection methods using Gibbs sampling can be applied to the composite model space for mapping multiple QTL. The unified framework not only results in some new algorithms, but also gives useful insight into some of the important factors governing the performance of Gibbs sampling and reversible jump for mapping multiple QTL. Finally, we develop strategies to improve the performance of MCMC algorithms.

2.

A transdimensional Bayesian model for pattern recognition in DNA sequences

Li SM Wakefield J Self S 《Biostatistics (Oxford, England)》2008,9(4):668-685


3.

Detection of dispersed short tandem repeats using reversible jump Markov chain Monte Carlo

Tong Liang Xiaodan Fan Qiwei Li Shuo-yen R. Li 《Nucleic acids research》2012,40(19):e147

Tandem repeats occur frequently in biological sequences. They are important for studying genome evolution and human disease. A number of methods have been designed to detect a single tandem repeat in a sliding window. In this article, we focus on the case that an unknown number of tandem repeat segments of the same pattern are dispersively distributed in a sequence. We construct a probabilistic generative model for the tandem repeats, where the sequence pattern is represented by a motif matrix. A Bayesian approach is adopted to compute this model. Markov chain Monte Carlo (MCMC) algorithms are used to explore the posterior distribution as an effort to infer both the motif matrix of tandem repeats and the location of repeat segments. Reversible jump Markov chain Monte Carlo (RJMCMC) algorithms are used to address the transdimensional model selection problem raised by the variable number of repeat segments. Experiments on both synthetic data and real data show that this new approach is powerful in detecting dispersed short tandem repeats. As far as we know, it is the first work to adopt RJMCMC algorithms in the detection of tandem repeats.

4.

SlidingBayes: exploring recombination using a sliding window approach based on Bayesian phylogenetic inference

Paraskevis D Deforche K Lemey P Magiorkinis G Hatzakis A Vandamme AM 《Bioinformatics (Oxford, England)》2005,21(7):1274-1275

We developed a software tool (SlidingBayes) for recombination analysis based on Bayesian phylogenetic inference. SlidingBayes provides a powerful approach for detecting potential recombination, especially between highly divergent sequences and complex HIV-1 recombinants for which simpler methods like neighbor joining (NJ) may be less powerful. SlidingBayes guides Markov Chain Monte Carlo (MCMC) sampling performed by MrBayes in a sliding window across the alignment (Bayesian scanning). The tool can be used for nucleotide and amino acid sequences and combines all the modeling possibilities of MrBayes with the ability to plot the posterior probability support for clustering of various combinations of taxa.
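
The Bayesian scanning idea in the abstract above, running one analysis per window as the window moves across an alignment, can be sketched as follows. This is only an illustration and not SlidingBayes itself: the function name `sliding_windows` and the window and step sizes are hypothetical, and in the tool described each column range would be handed to a separate MrBayes run.

```python
def sliding_windows(alignment_length, window, step):
    """Yield (start, end) column ranges, end exclusive, that fit inside the alignment."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    start = 0
    while start + window <= alignment_length:
        yield (start, start + window)
        start += step


# Hypothetical example: a 10-column alignment scanned with window 4 and step 3.
example_windows = list(sliding_windows(10, 4, 3))
```

A trailing partial window is dropped here; whether to keep it is a design choice the abstract does not specify.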

5.

Sequential ordinal modeling with applications to survival data (Total citations: 2; self-citations: 0; citations by others: 2)

Albert JH Chib S 《Biometrics》2001,57(3):829-836

This paper considers the class of sequential ordinal models in relation to other models for ordinal response data. Markov chain Monte Carlo (MCMC) algorithms, based on the approach of Albert and Chib (1993, Journal of the American Statistical Association 88, 669-679), are developed for the fitting of these models. The ideas and methods are illustrated in detail with a real data example on the length of hospital stay for patients undergoing heart surgery. A notable aspect of this analysis is the comparison, based on marginal likelihoods and training sample priors, of several nonnested models, such as the sequential model, the cumulative ordinal model, and Weibull and log-logistic models.

6.

A reversible jump Markov chain Monte Carlo algorithm for bacterial promoter motifs discovery.

Pierre Nicolas Anne-Sophie Tocquet Vincent Miele Florence Muri 《Journal of computational biology》2006,13(3):651-667

Effective probabilistic modeling approaches have been developed to find motifs of biological function in DNA sequences. However, the problem of automated model choice remains largely open and becomes more essential as the number of sequences to be analyzed is constantly increasing. Here we propose a reversible jump Markov chain Monte Carlo algorithm for estimating both parameters and model dimension of a Bayesian hidden semi-Markov model dedicated to bacterial promoter motif discovery. Bacterial promoters are complex motifs composed of two boxes separated by a spacer of variable but constrained length and occurring close to the protein translation start site. The algorithm allows simultaneous estimations of the width of the boxes, of the support size of the spacer length distribution, and of the order of the Markovian model used for the "background" nucleotide composition. The application of this method on three sequence sets points out the good behavior of the algorithm and the biological relevance of the estimated promoter motifs.

7.

NestedMICA as an ab initio protein motif discovery tool

Mutlu Doğruel Thomas A Down Tim JP Hubbard 《BMC bioinformatics》2008,9(1):19


8.

A Monte Carlo EM Algorithm for De Novo Motif Discovery in Biomolecular Sequences (Total citations: 1; self-citations: 0; citations by others: 1)

Bi Chengpeng 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2009,6(3):370-386

Motif discovery methods play pivotal roles in deciphering the genetic regulatory codes (i.e., motifs) in genomes as well as in locating conserved domains in protein sequences. The Expectation Maximization (EM) algorithm is one of the most popular methods used in de novo motif discovery. Based on the position weight matrix (PWM) updating technique, this paper presents a Monte Carlo version of the EM motif-finding algorithm that carries out stochastic sampling in local alignment space to overcome the conventional EM's main drawback of being trapped in a local optimum. The newly implemented algorithm is named as Monte Carlo EM Motif Discovery Algorithm (MCEMDA). MCEMDA starts from an initial model, and then it iteratively performs Monte Carlo simulation and parameter update until convergence. A log-likelihood profiling technique together with the top-k strategy is introduced to cope with the phase shifts and multiple modal issues in motif discovery problem. A novel grouping motif alignment (GMA) algorithm is designed to select motifs by clustering a population of candidate local alignments and successfully applied to subtle motif discovery. MCEMDA compares favorably to other popular PWM-based and word enumerative motif algorithms tested using simulated (l, d)-motif cases, documented prokaryotic, and eukaryotic DNA motif sequences. Finally, MCEMDA is applied to detect large blocks of conserved domains using protein benchmarks and exhibits its excellent capacity while compared with other multiple sequence alignment methods.
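
The stochastic sampling step that the abstract above contrasts with conventional EM can be illustrated with a single Monte Carlo draw of a motif start position under a PWM. This is a simplified sketch, not MCEMDA: it assumes exactly one motif occurrence per sequence, a fixed motif width, and a zero-order background model, and every name in it (`start_position_posterior`, `sample_start`, `toy_pwm`) is hypothetical.

```python
import random


def start_position_posterior(seq, pwm, background):
    """Posterior over motif start positions, assuming one occurrence per sequence."""
    width = len(pwm)
    weights = []
    for start in range(len(seq) - width + 1):
        ratio = 1.0
        for offset in range(width):
            base = seq[start + offset]
            # Likelihood ratio of the motif column versus the background for this base.
            ratio *= pwm[offset][base] / background[base]
        weights.append(ratio)
    total = sum(weights)
    return [w / total for w in weights]


def sample_start(seq, pwm, background, rng):
    """Draw one start position at random instead of taking the most likely one."""
    posterior = start_position_posterior(seq, pwm, background)
    return rng.choices(range(len(posterior)), weights=posterior, k=1)[0]


def make_column(favoured):
    """Toy PWM column that puts probability 0.85 on one base (illustrative values)."""
    return {b: (0.85 if b == favoured else 0.05) for b in "ACGT"}


toy_pwm = [make_column(b) for b in "ACG"]      # hypothetical width-3 motif "ACG"
uniform_bg = {b: 0.25 for b in "ACGT"}         # hypothetical uniform background
posterior = start_position_posterior("TTACGTT", toy_pwm, uniform_bg)
sampled = sample_start("TTACGTT", toy_pwm, uniform_bg, random.Random(0))
```

In a full Monte Carlo EM loop, the sampled positions from all sequences would be used to re-estimate the PWM, and the two steps would alternate until convergence.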

9.

A Bayesian approach to DNA sequence segmentation (Total citations: 3; self-citations: 0; citations by others: 3)

Boys RJ Henderson DA 《Biometrics》2004,60(3):573-581

Many deoxyribonucleic acid (DNA) sequences display compositional heterogeneity in the form of segments of similar structure. This article describes a Bayesian method that identifies such segments by using a Markov chain governed by a hidden Markov model. Markov chain Monte Carlo (MCMC) techniques are employed to compute all posterior quantities of interest and, in particular, allow inferences to be made regarding the number of segment types and the order of Markov dependence in the DNA sequence. The method is applied to the segmentation of the bacteriophage lambda genome, a common benchmark sequence used for the comparison of statistical segmentation algorithms.

10.

Protein multiple alignment incorporating primary and secondary structure information.

Nak-Kyeong Kim Jun Xie 《Journal of computational biology》2006,13(10):1735-1748

Identifying common local segments, also called motifs, in multiple protein sequences plays an important role for establishing homology between proteins. Homology is easy to establish when sequences are similar (sharing an identity > 25%). However, for distant proteins, it is much more difficult to align motifs that are not similar in sequences but still share common structures or functions. This paper is a first attempt to align multiple protein sequences using both primary and secondary structure information. A new sequence model is proposed so that the model assigns high probabilities not only to motifs that contain conserved amino acids but also to motifs that present common secondary structures. The proposed method is tested in a structural alignment database BAliBASE. We show that information brought by the predicted secondary structures greatly improves motif identification. A website of this program is available at www.stat.purdue.edu/~junxie/2ndmodel/sov.html.

11.

A Bayesian Hidden Markov Model for Motif Discovery Through Joint Modeling of Genomic Sequence and ChIP‐Chip Data

Jonathan A. L. Gelfond Mayetri Gupta Joseph G. Ibrahim 《Biometrics》2009,65(4):1087-1095


12.

Bayesian basecalling for DNA sequence analysis using hidden Markov models

Liang KC Wang X Anastassiou D 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2007,4(3):430-440

It has been shown that electropherograms of DNA sequences can be modeled with hidden Markov models. Basecalling, the procedure that determines the sequence of bases from the given electropherogram, can then be performed using the Viterbi algorithm. A training step is required prior to basecalling in order to estimate the HMM parameters. In this paper, we propose a Bayesian approach which employs the Markov chain Monte Carlo (MCMC) method to perform basecalling. Such an approach not only allows one to naturally encode the prior biological knowledge into the basecalling algorithm, it also exploits both the training data and the basecalling data in estimating the HMM parameters, leading to more accurate estimates. Using the recently sequenced genome of the organism Legionella pneumophila, we show that the MCMC basecaller outperforms the state-of-the-art basecalling algorithm in terms of total errors while requiring much less training than other proposed statistical basecallers.
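
The Viterbi decoding that the abstract above names as the conventional basecalling step can be sketched for a generic discrete HMM. This is a toy illustration only: the two hidden states, the three discretized signal levels, and all probabilities below are hypothetical and are not taken from the paper, whose model works on real electropherogram data. The sketch also assumes every probability it touches is strictly positive.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path, computed in log space."""
    # Assumption: all probabilities used are > 0, since log(0) is undefined.
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
               for s in states}]
    backpointers = []
    for obs in observations[1:]:
        prev = scores[-1]
        current, pointers = {}, {}
        for s in states:
            # Best predecessor state for arriving in state s at this step.
            best_prev = max(states, key=lambda r: prev[r] + math.log(trans_p[r][s]))
            current[s] = (prev[best_prev] + math.log(trans_p[best_prev][s])
                          + math.log(emit_p[s][obs]))
            pointers[s] = best_prev
        scores.append(current)
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: hidden bases "A" and "C", observed signal levels.
toy_states = ("A", "C")
toy_start = {"A": 0.6, "C": 0.4}
toy_trans = {"A": {"A": 0.7, "C": 0.3}, "C": {"A": 0.4, "C": 0.6}}
toy_emit = {"A": {"low": 0.5, "mid": 0.4, "high": 0.1},
            "C": {"low": 0.1, "mid": 0.3, "high": 0.6}}
decoded = viterbi(["low", "mid", "high"], toy_states, toy_start, toy_trans, toy_emit)
```

Viterbi returns a single best path given fixed parameters, whereas the MCMC approach proposed in the abstract treats the HMM parameters as uncertain and estimates them jointly with the base calls.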

13.

Cover1

Kuo-ching Liang Xiaodong Wang Anastassiou D. 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2007,4(1):c1-c1


14.

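A multivariate-trait sib-pair linkage analysis method and its application to gene-mapping data for essential hypertension

Xu Zongli, Fang Yujing, Fang Jiqian 《Journal of Biomathematics》2003,18(2):176-181

A new multivariate-trait sib-pair linkage analysis method is proposed. It handles the correlation among sib-pair data using the generalized least squares principle and the correlation among multiple traits using multivariate response regression; the model parameters are estimated by MCMC. The model is applied to real gene-mapping data for essential hypertension. The results show that, compared with splitting the multivariate trait into single traits and analyzing each separately, the proposed method improves the precision of estimation and the power of the test.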

15.

Irreducibility and efficiency of ESIP to sample marker genotypes in large pedigrees with loops

Soledad A Fernández Rohan L Fernando Bernt Guldbrandtsen Christian Stricker Matthias Schelling Alicia L Carriquiry 《Genetics Selection Evolution》2002,34(5):537-555

Markov chain Monte Carlo (MCMC) methods have been proposed to overcome computational problems in linkage and segregation analyses. This approach involves sampling genotypes at the marker and trait loci. Among MCMC methods, scalar-Gibbs is the easiest to implement, and it is used in genetics. However, the Markov chain that corresponds to scalar-Gibbs may not be irreducible when the marker locus has more than two alleles, and even when the chain is irreducible, mixing has been observed to be slow. Joint sampling of genotypes has been proposed as a strategy to overcome these problems. An algorithm that combines the Elston-Stewart algorithm and iterative peeling (ESIP sampler) to sample genotypes jointly from the entire pedigree is used in this study. Here, it is shown that the ESIP sampler yields an irreducible Markov chain, regardless of the number of alleles at a locus. Further, results obtained by ESIP sampler are compared with other methods in the literature. Of the methods that are guaranteed to be irreducible, ESIP was the most efficient.

16.

Detection of multiple QTL with epistatic effects under a mixed inheritance model in an outbred population

Akira Narita Yoshiyuki Sasaki 《Genetics Selection Evolution》2004,36(4):415-433

A quantitative trait depends on multiple quantitative trait loci (QTL) and on the interaction between two or more QTL, named epistasis. Several methods to detect multiple QTL in various types of design have been proposed, but most of these are based on the assumption that each QTL works independently and epistasis has not been explored sufficiently. The objective of the study was to propose an integrated method to detect multiple QTL with epistases using Bayesian inference via a Markov chain Monte Carlo (MCMC) algorithm. Since the mixed inheritance model is assumed and the deterministic algorithm to calculate the probabilities of QTL genotypes is incorporated in the method, this can be applied to an outbred population such as livestock. Additionally, we treated a pair of QTL as one variable in the Reversible jump Markov chain Monte Carlo (RJMCMC) algorithm so that two QTL were able to be simultaneously added into or deleted from a model. As a result, both of the QTL can be detected, not only in cases where either of the two QTL has main effects and they have epistatic effects between each other, but also in cases where neither of the two QTL has main effects but they have epistatic effects. The method will help ascertain the complicated structure of quantitative traits.

17.

A comparison of strategies for Markov chain Monte Carlo computation in quantitative genetics

Rasmus Waagepetersen Noelia Ibáñez-Escriche Daniel Sorensen 《Genetics Selection Evolution》2008,40(2):161-176

In quantitative genetics, Markov chain Monte Carlo (MCMC) methods are indispensable for statistical inference in non-standard models like generalized linear models with genetic random effects or models with genetically structured variance heterogeneity. A particular challenge for MCMC applications in quantitative genetics is to obtain efficient updates of the high-dimensional vectors of genetic random effects and the associated covariance parameters. We discuss various strategies to approach this problem including reparameterization, Langevin-Hastings updates, and updates based on normal approximations. The methods are compared in applications to Bayesian inference for three data sets using a model with genetically structured variance heterogeneity.

18.

Inferring complex DNA substitution processes on phylogenies using uniformization and data augmentation

Mateiu L Rannala B 《Systematic biology》2006,55(2):259-269

A new method is developed for calculating sequence substitution probabilities using Markov chain Monte Carlo (MCMC) methods. The basic strategy is to use uniformization to transform the original continuous time Markov process into a Poisson substitution process and a discrete Markov chain of state transitions. An efficient MCMC algorithm for evaluating substitution probabilities by this approach using a continuous gamma distribution to model site-specific rates is outlined. The method is applied to the problem of inferring branch lengths and site-specific rates from nucleotide sequences under a general time-reversible (GTR) model and a computer program BYPASSR is developed. Simulations are used to examine the performance of the new program relative to an existing program BASEML that uses a discrete approximation for the gamma distributed prior on site-specific rates. It is found that BASEML and BYPASSR are in close agreement when inferring branch lengths, regardless of the number of rate categories used, but that BASEML tends to underestimate high site-specific substitution rates, and to overestimate intermediate rates, when fewer than 50 rate categories are used. Rate estimates obtained using BASEML agree more closely with those of BYPASSR as the number of rate categories increases. Analyses of the posterior distributions of site-specific rates from BYPASSR suggest that a large number of taxa are needed to obtain precise estimates of site-specific rates, especially when rates are very high or very low. The method is applied to analyze 45 sequences of the alpha 2B adrenergic receptor gene (A2AB) from a sample of eutherian taxa. In general, the pattern expected for regions under negative selection is observed with third codon positions having the highest inferred rates, followed by first codon positions and with second codon positions having the lowest inferred rates. Several sites show exceptionally high substitution rates at second codon positions that may represent the effects of positive selection.
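
The uniformization step described in the abstract above rewrites a continuous-time Markov process with rate matrix Q as a Poisson process of rate mu (any mu at least as large as the largest exit rate) driving a discrete chain with transition matrix R = I + Q/mu, so that P(t) = sum over k of Poisson(k; mu t) R^k. The sketch below evaluates that series deterministically; it is not BYPASSR, which embeds the idea in an MCMC sampler. The function names and the Jukes-Cantor example matrix, chosen here because it is a special case of GTR with a known closed form, are assumptions made for illustration.

```python
import math


def matmul(A, B):
    """Multiply two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def uniformized_transition_matrix(Q, t, tol=1e-12, max_terms=10000):
    """Approximate P(t) = exp(Q t) by a truncated uniformization series."""
    n = len(Q)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    mu = max(-Q[i][i] for i in range(n))       # uniformization rate
    if mu == 0.0 or t == 0.0:
        return identity
    # Discrete-time chain: R = I + Q / mu (self-transitions are "virtual" events).
    R = [[identity[i][j] + Q[i][j] / mu for j in range(n)] for i in range(n)]
    P = [[0.0] * n for _ in range(n)]
    R_power = identity
    weight = math.exp(-mu * t)                 # Poisson pmf at k = 0
    covered = 0.0                              # Poisson mass accumulated so far
    for k in range(max_terms):
        for i in range(n):
            for j in range(n):
                P[i][j] += weight * R_power[i][j]
        covered += weight
        if covered >= 1.0 - tol:
            break
        weight *= mu * t / (k + 1)             # Poisson pmf at k + 1
        R_power = matmul(R_power, R)
    return P


# Hypothetical example: Jukes-Cantor rate matrix scaled to one substitution per unit time.
jc69 = [[-1.0 if i == j else 1.0 / 3.0 for j in range(4)] for i in range(4)]
P_half = uniformized_transition_matrix(jc69, 0.5)
```

For very large values of mu times t, the starting weight `exp(-mu * t)` underflows, so a production implementation would need a more careful summation; that refinement is left out of this sketch.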

19.

Estimation of amino acid residue substitution rates at local spatial regions and application in protein function inference: a Bayesian Monte Carlo approach (Total citations: 2; self-citations: 0; citations by others: 2)

Tseng YY Liang J 《Molecular biology and evolution》2006,23(2):421-436

The amino acid sequences of proteins provide rich information for inferring distant phylogenetic relationships and for predicting protein functions. Estimating the rate matrix of residue substitutions from amino acid sequences is also important because the rate matrix can be used to develop scoring matrices for sequence alignment. Here we use a continuous time Markov process to model the substitution rates of residues and develop a Bayesian Markov chain Monte Carlo method for rate estimation. We validate our method using simulated artificial protein sequences. Because different local regions such as binding surfaces and the protein interior core experience different selection pressures due to functional or stability constraints, we use our method to estimate the substitution rates of local regions. Our results show that the substitution rates are very different for residues in the buried core and residues on the solvent-exposed surfaces. In addition, residues on the binding surfaces also have very different substitution rates from residues in the rest of the protein. Based on these findings, we further develop a method for protein function prediction by surface matching using scoring matrices derived from estimated substitution rates for residues located on the binding surfaces. We show with examples that our method is effective in identifying functionally related proteins that have overall low sequence identity, a task known to be very challenging.

20.

Using the quantitative genetic threshold model for inferences between and within species

Felsenstein J 《Philosophical transactions of the Royal Society of London. Series B, Biological sciences》2005,360(1459):1427-1434

Sewall Wright's threshold model has been used in modelling discrete traits that may have a continuous trait underlying them, but it has proven difficult to make efficient statistical inferences with it. The availability of Markov chain Monte Carlo (MCMC) methods makes possible likelihood and Bayesian inference using this model. This paper discusses prospects for the use of the threshold model in morphological systematics to model the evolution of discrete all-or-none traits. There the threshold model has the advantage over 0/1 Markov process models in that it not only accommodates polymorphism within species, but can also allow for correlated evolution of traits with far fewer parameters that need to be inferred. The MCMC importance sampling methods needed to evaluate likelihood ratios for the threshold model are introduced and described in some detail.
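
The core of the threshold model in the abstract above, a 0/1 trait that is present whenever an unobserved continuous liability exceeds a threshold, can be written in a few lines. This sketch assumes a normally distributed liability and covers only the forward probability, not the MCMC inference the paper discusses; the function names and default values are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def prob_trait_present(mean_liability, threshold=0.0, sd=1.0):
    """P(trait = 1) = P(liability > threshold) for liability ~ Normal(mean_liability, sd^2)."""
    return normal_cdf((mean_liability - threshold) / sd)


# Hypothetical example: a species whose mean liability sits one SD above the threshold.
p_present = prob_trait_present(1.0)
```

Because the probability lies strictly between 0 and 1 for any finite mean liability, the model accommodates polymorphism within a species, which is the advantage over 0/1 Markov process models that the abstract points out.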