1.

A dynamic Bayesian network approach to protein secondary structure prediction

Xin-Qiu Yao Huaiqiu Zhu Zhen-Su She 《BMC bioinformatics》2008,9(1):49

Background

Protein secondary structure prediction methods based on probabilistic models such as hidden Markov models (HMMs) appeal to many because they provide meaningful information relevant to the sequence-structure relationship. However, at present, the prediction accuracy of pure HMM-type methods is much lower than that of machine learning-based methods such as neural networks (NN) or support vector machines (SVM).

2.

A linear memory algorithm for Baum-Welch training

István Miklós Irmtraud M Meyer 《BMC bioinformatics》2005,6(1):231

Background:

Baum-Welch training is an expectation-maximisation algorithm for training the emission and transition probabilities of hidden Markov models in a fully automated way. It can be employed as long as a training set of annotated sequences is known, and provides a rigorous way to derive parameter values which are guaranteed to be at least locally optimal. For complex hidden Markov models such as pair hidden Markov models and very long training sequences, even the most efficient algorithms for Baum-Welch training are currently too memory-consuming. This has so far effectively prevented the automatic parameter training of hidden Markov models that are currently used for biological sequence analyses.

3.

Evolutionary models for insertions and deletions in a probabilistic modeling framework

Elena Rivas 《BMC bioinformatics》2005,6(1):63

Background

Probabilistic models for sequence comparison (such as hidden Markov models and pair hidden Markov models for proteins and mRNAs, or their context-free grammar counterparts for structural RNAs) often assume a fixed degree of divergence. Ideally we would like these models to be conditional on evolutionary divergence time.

4.

A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure

Sean R Eddy 《BMC bioinformatics》2002,3(1):18-16

Background

Covariance models (CMs) are probabilistic models of RNA secondary structure, analogous to profile hidden Markov models of linear sequence. The dynamic programming algorithm for aligning a CM to an RNA sequence of length N is O(N³) in memory. This is only practical for small RNAs.

5.

Deep transformation models for functional outcome prediction after acute ischemic stroke

Lisa Herzog Lucas Kook Andrea Götschi Katrin Petermann Martin Hänsel Janne Hamann Oliver Dürr Susanne Wegener Beate Sick 《Biometrical journal. Biometrische Zeitschrift》2023,65(6):2100379

In many medical applications, interpretable models with high prediction performance are sought. Often, those models are required to handle semistructured data like tabular and image data. We show how to apply deep transformation models (DTMs) for distributional regression that fulfill these requirements. DTMs allow the data analyst to specify (deep) neural networks for different input modalities making them applicable to various research questions. Like statistical models, DTMs can provide interpretable effect estimates while achieving the state-of-the-art prediction performance of deep neural networks. In addition, the construction of ensembles of DTMs that retain model structure and interpretability allows quantifying epistemic and aleatoric uncertainty. In this study, we compare several DTMs, including baseline-adjusted models, trained on a semistructured data set of 407 stroke patients with the aim to predict ordinal functional outcome three months after stroke. We follow statistical principles of model-building to achieve an adequate trade-off between interpretability and flexibility while assessing the relative importance of the involved data modalities. We evaluate the models for an ordinal and dichotomized version of the outcome as used in clinical practice. We show that both tabular clinical and brain imaging data are useful for functional outcome prediction, whereas models based on tabular data only outperform those based on imaging data only. There is no substantial evidence for improved prediction when combining both data modalities. Overall, we highlight that DTMs provide a powerful, interpretable approach to analyzing semistructured data and that they have the potential to support clinical decision-making.

6.

Hidden Markov model speed heuristic and iterative HMM search procedure (Cited by: 1; self-citations: 0; citations by others: 1)

L Steven Johnson Sean R Eddy Elon Portugaly 《BMC bioinformatics》2010,11(1):431

Background

Profile hidden Markov models (profile-HMMs) are sensitive tools for remote protein homology detection, but the main scoring algorithms, Viterbi or Forward, require considerable time to search large sequence databases.

7.

PREDTAP: a system for prediction of peptide binding to the human transporter associated with antigen processing

Guang Lan Zhang Nikolai Petrovsky Chee Keong Kwoh J Thomas August Vladimir Brusic 《Immunome research》2006,2(1):1-12

8.

Parsing Social Network Survey Data from Hidden Populations Using Stochastic Context-Free Grammars

Art F. Y. Poon Kimberly C. Brouwer Steffanie A. Strathdee Michelle Firestone-Cruz Remedios M. Lozada Sergei L. Kosakovsky Pond Douglas D. Heckathorn Simon D. W. Frost 《PloS one》2009,4(9)

Background

Human populations are structured by social networks, in which individuals tend to form relationships based on shared attributes. Certain attributes that are ambiguous, stigmatized or illegal can create a 'hidden' population, so-called because its members are difficult to identify. Many hidden populations are also at an elevated risk of exposure to infectious diseases. Consequently, public health agencies are presently adopting modern survey techniques that traverse social networks in hidden populations by soliciting individuals to recruit their peers, e.g., respondent-driven sampling (RDS). The concomitant accumulation of network-based epidemiological data, however, is rapidly outpacing the development of computational methods for analysis. Moreover, current analytical models rely on unrealistic assumptions, e.g., that the traversal of social networks can be modeled by a Markov chain rather than a branching process.

Methodology/Principal Findings

Here, we develop a new methodology based on stochastic context-free grammars (SCFGs), which are well-suited to modeling tree-like structure of the RDS recruitment process. We apply this methodology to an RDS case study of injection drug users (IDUs) in Tijuana, México, a hidden population at high risk of blood-borne and sexually-transmitted infections (i.e., HIV, hepatitis C virus, syphilis). Survey data were encoded as text strings that were parsed using our custom implementation of the inside-outside algorithm in a publicly-available software package (HyPhy), which uses either expectation maximization or direct optimization methods and permits constraints on model parameters for hypothesis testing. We identified significant latent variability in the recruitment process that violates assumptions of Markov chain-based methods for RDS analysis: firstly, IDUs tended to emulate the recruitment behavior of their own recruiter; and secondly, the recruitment of like peers (homophily) was dependent on the number of recruits.

Conclusions

SCFGs provide a rich probabilistic language that can articulate complex latent structure in survey data derived from the traversal of social networks. Such structure that has no representation in Markov chain-based models can interfere with the estimation of the composition of hidden populations if left unaccounted for, raising critical implications for the prevention and control of infectious disease epidemics.

9.

Parameter estimation for robust HMM analysis of ChIP-chip data

Peter Humburg David Bulger Glenn Stone 《BMC bioinformatics》2008,9(1):343

10.

In silico segmentations of lentivirus envelope sequences

Aurélia Boissin-Quillon Didier Piau Caroline Leroux 《BMC bioinformatics》2007,8(1):99

Background

The gene encoding the envelope of lentiviruses exhibits a considerable plasticity, particularly the region which encodes the surface (SU) glycoprotein. Interestingly, mutations do not appear uniformly along the sequence of SU, but they are clustered in restricted areas, called variable (V) regions, which are interspersed with relatively more stable regions, called constant (C) regions. We look for specific signatures of C/V regions, using hidden Markov models constructed with SU sequences of the equine, human, small ruminant and simian lentiviruses.

11.

Using 3D Hidden Markov Models that explicitly represent spatial coordinates to model and compare protein structures

Vadim Alexandrov Mark Gerstein 《BMC bioinformatics》2004,5(1):2

Background

Hidden Markov Models (HMMs) have proven very useful in computational biology for such applications as sequence pattern matching, gene-finding, and structure prediction. Thus far, however, they have been confined to representing 1D sequence (or the aspects of structure that could be represented by character strings).

12.

Predicting conserved protein motifs with Sub-HMMs

Kevin Horan Christian R Shelton Thomas Girke 《BMC bioinformatics》2010,11(1):205

Background

Profile HMMs (hidden Markov models) provide effective methods for modeling the conserved regions of protein families. A limitation of the resulting domain models is the difficulty of pinpointing their much shorter functional sub-features, such as catalytically relevant sequence motifs in enzymes or ligand binding signatures of receptor proteins.

13.

Efficient pairwise RNA structure prediction using probabilistic alignment constraints in Dynalign

Arif Ozgun Harmanci Gaurav Sharma David H Mathews 《BMC bioinformatics》2007,8(1):130

Background

Joint alignment and secondary structure prediction of two RNA sequences can significantly improve the accuracy of the structural predictions. Methods addressing this problem, however, are forced to employ constraints that reduce computation by restricting the alignments and/or structures (i.e. folds) that are permissible. In this paper, a new methodology is presented for the purpose of establishing alignment constraints based on nucleotide alignment and insertion posterior probabilities. Using a hidden Markov model, posterior probabilities of alignment and insertion are computed for all possible pairings of nucleotide positions from the two sequences. These alignment and insertion posterior probabilities are additively combined to obtain probabilities of co-incidence for nucleotide position pairs. A suitable alignment constraint is obtained by thresholding the co-incidence probabilities. The constraint is integrated with Dynalign, a free energy minimization algorithm for joint alignment and secondary structure prediction. The resulting method is benchmarked against the previous version of Dynalign and against other programs for pairwise RNA structure prediction.
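The combine-and-threshold step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation and not Dynalign's API: the function name `coincidence_constraint`, the three input matrices (one alignment posterior and one insertion posterior per sequence), and the threshold value are all assumptions made for illustration, and the posteriors are taken as given rather than computed from a hidden Markov model.

```python
def coincidence_constraint(p_align, p_ins1, p_ins2, threshold):
    """Return a boolean matrix: True where position pair (i, j) is permitted.

    p_align, p_ins1, p_ins2 are hypothetical posterior probability matrices
    (lists of rows) indexed by nucleotide positions i in sequence 1 and j in
    sequence 2. Their exact definition is an assumption of this sketch.
    """
    rows, cols = len(p_align), len(p_align[0])
    allowed = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Additively combine alignment and insertion posteriors
            # into a co-incidence probability for the pair (i, j).
            p_co = p_align[i][j] + p_ins1[i][j] + p_ins2[i][j]
            # Threshold the co-incidence probability to get the constraint.
            row.append(p_co >= threshold)
        allowed.append(row)
    return allowed


# Toy usage with made-up posteriors for two sequences of length 2.
p_align = [[0.5, 0.0], [0.0, 0.25]]
p_ins1 = [[0.25, 0.0], [0.125, 0.0]]
p_ins2 = [[0.125, 0.0], [0.0, 0.0]]
mask = coincidence_constraint(p_align, p_ins1, p_ins2, threshold=0.25)
```

A downstream dynamic programming routine would then skip every pair whose mask entry is False, which is how such a constraint reduces computation.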

14.

Machine learning techniques in disease forecasting: a case study on rice blast prediction

Rakesh Kaundal Amar S Kapoor Gajendra PS Raghava 《BMC bioinformatics》2006,7(1):485-16

Background

Diverse modeling approaches, viz. neural networks and multiple regression, have been followed to date for disease prediction in plant populations. However, due to their inability to predict the value of unknown data points and their longer training times, there is a need to exploit new prediction software for a better understanding of plant-pathogen-environment relationships. Further, there is no online tool available which can help plant researchers or farmers in the timely application of control measures. This paper introduces a new prediction approach based on support vector machines for developing weather-based prediction models of plant diseases.

15.

Efficient algorithms for training the parameters of hidden Markov models using stochastic expectation maximization (EM) training and Viterbi training

Tin Y Lam Irmtraud M Meyer 《Algorithms for molecular biology : AMB》2010,5(1):38

Background

Hidden Markov models are widely employed by numerous bioinformatics programs used today. Applications range widely from comparative gene prediction to time-series analyses of micro-array data. The parameters of the underlying models need to be adjusted for specific data sets, for example the genome of a particular species, in order to maximize the prediction accuracy. Computationally efficient algorithms for parameter training are thus key to maximizing the usability of a wide range of bioinformatics applications.

16.

A markov classification model for metabolic pathways

Timothy Hancock Hiroshi Mamitsuka 《Algorithms for molecular biology : AMB》2010,5(1):10

Background

This paper considers the problem of identifying pathways through metabolic networks that relate to a specific biological response. Our proposed model, HME3M, first identifies frequently traversed network paths using a Markov mixture model. Then by employing a hierarchical mixture of experts, separate classifiers are built using information specific to each path and combined into an ensemble prediction for the response.

17.

Error statistics of hidden Markov model and hidden Boltzmann model results

Lee A Newberg 《BMC bioinformatics》2009,10(1):212

Background

Hidden Markov models and hidden Boltzmann models are employed in computational biology and a variety of other scientific fields for a variety of analyses of sequential data. Whether the associated algorithms are used to compute an actual probability or, more generally, an odds ratio or some other score, a frequent requirement is that the error statistics of a given score be known. What is the chance that random data would achieve that score or better? What is the chance that a real signal would achieve a given score threshold?

18.

Novel computational analysis of protein binding array data identifies direct targets of Nkx2.2 in the pancreas

Jonathon T Hill Keith R Anderson Teresa L Mastracci Klaus H Kaestner Lori Sussel 《BMC bioinformatics》2011,12(1):62

19.

Prediction of the disulfide bonding state of cysteines in proteins with hidden neural networks (Cited by: 2; self-citations: 0; citations by others: 2)

Martelli PL Fariselli P Malaguti L Casadio R 《Protein engineering》2002,15(12):951-953

A hybrid system (hidden neural network) based on a hidden Markov model (HMM) and neural networks (NN) was trained to predict the bonding states of cysteines in proteins starting from the residue chains. Training was performed using 4136 cysteine-containing segments extracted from 969 non-homologous proteins of well-resolved 3D structure and without chain-breaks. After a 20-fold cross-validation procedure, the efficiency of the prediction scores as high as 80% using neural networks based on evolutionary information. When the whole protein is taken into account by means of an HMM, a hybrid system is generated, whose emission probabilities are computed using the NN output (hidden neural networks). In this case, the predictor accuracy increases up to 88%. Further, when tested on a protein basis, the hybrid system can correctly predict 84% of the chains in the data set, with a gain of at least 27% over the NN predictor.

20.

Distill: a suite of web servers for the prediction of one-, two- and three-dimensional structural features of proteins

Davide Baú Alberto JM Martin Catherine Mooney Alessandro Vullo Ian Walsh Gianluca Pollastri 《BMC bioinformatics》2006,7(1):402

Background

We describe Distill, a suite of servers for the prediction of protein structural features: secondary structure; relative solvent accessibility; contact density; backbone structural motifs; residue contact maps at 6, 8 and 12 Angstrom; coarse protein topology. The servers are based on large-scale ensembles of recursive neural networks and trained on large, up-to-date, non-redundant subsets of the Protein Data Bank. Together with structural feature predictions, Distill includes a server for prediction of Cα traces for short proteins (up to 200 amino acids).