1.
MicroRNAs (miRNAs) are a class of small single-stranded RNAs of about 22 nt that serve as important negative gene regulators. In animals,
miRNAs mainly repress protein translation by binding to the 3′ UTRs of mRNAs with imperfect complementary pairing.
Although bioinformatics investigations have produced a number of target prediction tools, all of them share a common shortcoming: a
high false positive rate. Therefore, it is important to further filter the predicted targets. In this paper, based on the miRNA:target
duplex, we construct a second-order Hidden Markov Model, implement the Baum-Welch training algorithm and apply this model to further
process predicted targets. The classifier is trained on 244 positive and 49 negative miRNA:target interaction pairs
and achieves a sensitivity of 72.54%, a specificity of 55.10% and an accuracy of 69.62% in 10-fold cross-validation experiments.
To further verify the applicability of the algorithm, previously collected datasets, comprising 195 positive and 38
negative pairs, were used to test it, with consistent results. We believe that our method will provide some guidance for experimental
biologists, especially in choosing miRNA targets for validation.
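As a quick consistency check on the figures quoted in this abstract, the sketch below recomputes sensitivity, specificity and accuracy from a confusion matrix. The abstract reports only the percentages and the class sizes (244 positive, 49 negative); the counts `true_pos = 177` and `true_neg = 27` are assumptions, chosen as the integers that reproduce the reported percentages, and are not taken from the paper.

```python
# Class sizes as stated in the abstract.
positives, negatives = 244, 49

# Hypothetical counts of correctly classified pairs: NOT reported in the
# abstract, inferred as the integers that match its percentages.
true_pos, true_neg = 177, 27

sensitivity = true_pos / positives                           # TP / (TP + FN)
specificity = true_neg / negatives                           # TN / (TN + FP)
accuracy = (true_pos + true_neg) / (positives + negatives)   # all correct / all pairs

print(f"sensitivity = {sensitivity:.2%}")  # 72.54%
print(f"specificity = {specificity:.2%}")  # 55.10%
print(f"accuracy    = {accuracy:.2%}")     # 69.62%
```

Under these assumed counts the three reported values are mutually consistent. Because positives outnumber negatives roughly five to one, the overall accuracy sits much closer to the sensitivity than to the specificity.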
2.
3.
4.
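The generalized hidden Markov model (GHMM) is an important model for gene finding, but its computational cost is far greater than that of the traditional hidden Markov model, so it cannot be applied directly to gene finding. Based on the structural characteristics of prokaryotic genes, an efficient simplified algorithm is proposed whose computational cost is a linear function of the sequence length. On this basis, GeneMiner, a gene-finding program for prokaryotic genes, was built; tests on real data show that the algorithm is effective.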
5.
Multiscale simulation is employed to examine changes in atomistic-level protein structure due to long wavelength membrane undulations and plane stress fields. An ensemble of atomistic-level simulations of a model of a transmembrane influenza A virus M2 proton channel in a dimyristoylphosphatidylcholine (DMPC) bilayer is coupled to a corresponding mesoscopic model of a DMPC bilayer in an explicit mesoscopic solvent. Structural variations in the key proton gating His37 residues of the M2 channel are examined. Small, but distinct variations in the structure of the His37 residues are observed in both the open and closed states of the channel as a result of the coupling to mesoscopic-level membrane motions.
6.
We developed a computer program, GeneHackerTL, which predicts the most probable translation initiation site for a given nucleotide sequence. The program requires that information be extracted from the nucleotide sequence data surrounding the translation initiation sites according to the framework of the Hidden Markov Model. Since the translation initiation sites of 72 highly abundant proteins have already been assigned on the genome of Synechocystis sp. strain PCC6803 by amino-terminal analysis, we extracted necessary information for GeneHackerTL from the nucleotide sequence data. The prediction rate of the GeneHackerTL for these proteins was estimated to be 86.1%. We then used GeneHackerTL for prediction of the translation initiation sites of 24 other proteins, of which the initiation sites were not assigned experimentally, because of the lack of a potential initiation codon at the amino-terminal position. For 20 out of the 24 proteins, the initiation sites were predicted in the upstream of their amino-terminal positions. According to this assignment, the processed regions represent a typical feature of signal peptides. We could also predict multiple translation initiation sites for a particular gene for which at least two initiation sites were experimentally detected. This program would be effective for the prediction of translation initiation sites of other proteins, not only in this species but also in other prokaryotes as well.
7.
With the development of genome sequencing for many organisms, more and more raw sequences need to be annotated. Gene prediction by computational methods for finding the location of protein coding regions is one of the essential issues in bioinformatics. Two classes of methods are generally adopted: similarity based searches and ab initio prediction. Here, we review the development of gene prediction methods, summarize the measures for evaluating predictor quality, highlight open problems in this area, and discuss future research directions.
8.
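Existing methods for protein subcellular localization are designed for water-soluble proteins and are not suitable for transmembrane proteins, while dedicated transmembrane topology predictors are not designed for subcellular localization. This paper improves the model structure of the transmembrane topology predictor TMPHMMLoc and designs a new second-order hidden Markov model. The model parameters are estimated with the Baum-Welch algorithm generalized to second-order models, and the models built for the individual subcellular locations are integrated into a single predictor. Test results on the datasets show that the method performs significantly better than the support vector machine method and the fuzzy k-nearest-neighbor method designed for soluble proteins, and also better than the hidden Markov model method proposed in TMPHMMLoc, making it an effective method for predicting the subcellular localization of transmembrane proteins.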
9.
Song YS 《Bulletin of mathematical biology》2006,68(2):361-384
In hidden Markov models, the probability of observing a set of strings can be computed using recursion relations. We construct a sufficient condition for simplifying the recursion relations for a certain class of hidden Markov models. If the condition is satisfied, then one can construct a reduced recursion where the dependence on Markov states completely disappears. We discuss a specific example, namely statistical multiple alignment based on the TKF-model, in which the sufficient condition is satisfied.
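The recursion relations mentioned in this abstract can be illustrated with the standard forward algorithm for a first-order HMM. The sketch below is the textbook recursion, not the reduced recursion constructed in the paper; the function name `forward_probability` and the two-state toy parameters are hypothetical and chosen only for illustration.

```python
import itertools


def forward_probability(obs, init, trans, emit):
    """Return P(obs) under a first-order HMM using the forward recursion.

    obs   : non-empty sequence of observed symbol indices
    init  : init[i] = P(first state = i)
    trans : trans[i][j] = P(next state = j | current state = i)
    emit  : emit[i][k] = P(symbol k | state i)
    """
    n_states = len(init)
    # alpha[j] = P(observations so far, current state = j)
    alpha = [init[j] * emit[j][obs[0]] for j in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            emit[j][symbol] * sum(alpha[i] * trans[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
    return sum(alpha)


# Hypothetical toy model: two hidden states, two output symbols.
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]

# Sanity check: the probabilities of all strings of a fixed length sum to 1.
total = sum(
    forward_probability(list(s), init, trans, emit)
    for s in itertools.product([0, 1], repeat=3)
)
print(abs(total - 1.0) < 1e-12)  # True
```

Each step sums over the previous hidden state, so the cost grows with the square of the number of states. That dependence on the Markov states is what a reduced recursion of the kind described in the abstract would remove.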
10.
Hidden Markov models (HMMs) are effective tools to detect series of statistically homogeneous structures, but they are not well suited to analyse complex structures. For example, the duration of stay in a state of an HMM must follow a geometric law. Numerous other methodological difficulties are encountered when using HMMs to segregate genes from transposons or retroviruses, or to determine the isochore classes of genes. The aim of this paper is to analyse these methodological difficulties, and to suggest new tools for the exploration of genome data. We show that HMMs can be used to analyse complex gene structures with bell-shaped length distribution by using convolution of geometric distributions. Thus, we have introduced macro-states to model the distributions of the lengths of the regions. Our study shows that a simple HMM could be used to model the isochore organisation of the mouse genome. This potential use of Markovian models to help in data exploration has been underestimated until now.
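The point about geometric dwell times and their convolution can be made concrete with a small numerical sketch. A single HMM state with self-transition probability `stay` has a geometric length distribution whose most likely length is 1. Chaining several identical sub-states gives a length distribution equal to the convolution of the individual geometric laws, and its peak moves away from 1. The values `stay = 0.9` and `n_sub = 5` and all function names are assumptions for illustration, not parameters from the paper.

```python
def geometric_pmf(stay, max_len):
    """P(dwell = d) for d = 0..max_len in one state with self-transition `stay`."""
    return [0.0] + [stay ** (d - 1) * (1 - stay) for d in range(1, max_len + 1)]


def convolve(p, q, max_len):
    """Distribution of the sum of two independent lengths, truncated at max_len."""
    out = [0.0] * (max_len + 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            if i + j <= max_len:
                out[i + j] += p_i * q_j
    return out


# Hypothetical settings: 5 chained sub-states, each with self-transition 0.9.
stay, n_sub, max_len = 0.9, 5, 200

single = geometric_pmf(stay, max_len)
macro = single
for _ in range(n_sub - 1):
    macro = convolve(macro, single, max_len)

mode_single = max(range(max_len + 1), key=single.__getitem__)
mode_macro = max(range(max_len + 1), key=macro.__getitem__)
print("most likely length, single state:", mode_single)
print("most likely length, macro-state :", mode_macro)
```

The single-state distribution decreases monotonically from length 1, whereas the chained distribution rises to a peak and then decays, which is the bell-like shape the abstract refers to. Tuning the number of sub-states and the self-transition probability shifts and widens that peak.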
11.
Background
Computational identification of apicoplast-targeted proteins is important in drug target determination for diseases such as malaria. While there are established methods for identifying proteins with a bipartite signal in multiple species of Apicomplexa, not all apicoplast-targeted proteins possess this bipartite signature. The publication of recent experimental findings of apicoplast membrane proteins, called transmembrane proteins, that do not possess a bipartite signal has made it feasible to devise a machine learning approach for identifying this new class of apicoplast-targeted proteins computationally.
Methodology/principal findings
In this work, we develop a method for predicting apicoplast-targeted transmembrane proteins for multiple species of Apicomplexa, whereby several classifiers trained on different feature sets and based on different algorithms are evaluated and combined in an ensemble classification model to obtain the best expected performance. The feature sets considered are the hydrophobicity and composition characteristics of amino acids over transmembrane domains, the existence of short sequence motifs over cytosolically disposed regions, and Gene Ontology (GO) terms associated with given proteins. Our model, ApicoAMP, is an ensemble classification model that combines decisions of classifiers following the majority vote principle. ApicoAMP is trained on a set of proteins from 11 apicomplexan species and achieves 91% overall expected accuracy.
Conclusions/significance
ApicoAMP is the first computational model capable of identifying apicoplast-targeted transmembrane proteins in Apicomplexa. The ApicoAMP prediction software is available at http://code.google.com/p/apicoamp/ and http://bcb.eecs.wsu.edu.
12.
Jafar Razmara Safaai B Deris Rosli Bin Md Illias Sepideh Parvizpour 《Bioinformation》2013,9(7):345-348
A hidden Markov model (HMM) has been utilized to predict and generate artificial secretory signal peptide sequences. The
strength of signal peptides of proteins from different subcellular locations via Lactococcus lactis bacteria correlated with their
HMM bit scores in the model. The results show that an HMM bit score of +12 is the threshold for discriminating
secretory signal sequences from the others. The model is used to generate artificial signal peptides with different bit scores for
secretory proteins. The signal peptide with the maximum bit score strongly directs protein secretion.
13.
Kikuchi N Kwon YD Gotoh M Narimatsu H 《Biochemical and biophysical research communications》2003,310(2):574-579
In order to investigate the relationship between glycosyltransferase families and the motif for them, we classified 47 glycosyltransferase families in the CAZy database into four superfamilies, GTS-A, -B, -C, and -D, using a profile Hidden Markov Model method. On the basis of the classification and the similarity between GTS-A and nucleotidylyltransferase family catalyzing the synthesis of nucleotide-sugar, we proposed that ancient oligosaccharide might have been synthesized by the origin of GTS-B whereas the origin of GTS-A might be the gene encoding for synthesis of nucleotide-sugar as the donor and have evolved to glycosyltransferases to catalyze the synthesis of divergent carbohydrates. We also suggested that the divergent evolution of each superfamily in the corresponding subcellular component has increased the complexities of eukaryotic carbohydrate structure.
14.
Bai Y 《Biochemical and biophysical research communications》2003,305(4):785-788
It has long been suggested that existence of partially folded intermediates may be essential for proteins to fold in a biologically meaningful time scale. Although partially folded intermediates have been commonly observed in larger proteins, they are generally not detectable in the kinetic folding of smaller proteins (approximately 100 amino acids or less). Recent native-state hydrogen exchange studies suggest that partially folded intermediates may exist behind the rate-limiting transition state in small proteins and evade detection by conventional kinetic methods.
15.
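The modified inhomogeneous model is a Markov model suited to protein-coding regions, proposed on the basis of the homogeneous and inhomogeneous models. It can be used to analyze species evolution and gene mutation: the Markov order in the model is associated with the evolutionary level of the sequence, and the transition matrix is associated with gene mutation. By comparing the content of order -1 Markov chains across seven classes of species, this paper verifies the characteristics of species evolution that are reflected in codon usage. By computing the transition matrices between codon positions, it analyzes the likelihood of gene mutation occurring at the different codon positions.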
16.
Joachim Füllekrug 《Protoplasma》1999,207(1-2):8-15
Summary Localization of resident proteins provides identity to subcellular compartments. Most proteins depend on a combination of both retention and retrieval to maintain their steady-state distribution. Rer1 is a putative receptor protein mediating retrieval of membrane proteins of the endoplasmic reticulum. This retrieval relies on an unusual hydrophobic target sequence, the transmembrane domain. Apart from Rer1, coatomer is also required to retrieve escaped membrane proteins from the early Golgi region back to the endoplasmic reticulum. Current evidence suggests that the Rer1-mediated retrieval of membrane proteins is a general sorting pathway in eukaryotic cells contributing to the maintenance of compartmental identity in the early secretory pathway.
17.
The highly conserved fourth transmembrane segment (S4) is the primary voltage sensor of the voltage-dependent channel and would move outward upon membrane depolarization. S4 comprises repetitive amino acid triads, each containing one basic (presumably charged and voltage-sensing) followed by two hydrophobic residues. We showed that the triad organization is functionally extended into the S3-4 linker right external to S4 in Shaker K+ channels. The arginine (and lysine) substitutes for the third and the sixth residues (Ala-359 and Met-356, respectively) external to the outermost basic residue (Arg-362) in S4 dramatically and additively stabilize S4 in the resting conformation. Also, Leu-361 and Leu-358 play a very similar role in stabilization of S4 in the resting position, presumably by their hydrophobic side chains. Moreover, the double mutation A359R/E283A leads to a partially extruded position of S4 and consequently prominent closed-state inactivation, suggesting that Glu-283 in S2 may coordinate with the arginines in the extruded S4 upon depolarization. We conclude that the triad organization extends into the S3-4 linker for about six amino acids in terms of their microenvironment. These approximately six residues should retain the same helical structure as S4, and their microenvironment serves as part of the "gating canal" accommodating the extruding S4. Upon depolarization, S4 most likely moves initially as a sliding helix and follows the path that is set by the approximately six residues in the S3-4 linker in the resting state, whereas further S4 translocation could be more like, for example, a paddle, without orderly coordination from the contiguous surroundings.
18.
Khan RN Martinac B Madsen BW Milne RK Yeo GF Edeson RO 《Mathematical biosciences》2005,193(2):139-158
Patch clamp data from the large conductance mechanosensitive channel (MscL) in E. coli was studied with the aim of developing a strategy for statistical analysis based on hidden Markov models (HMMs) and determining the number of conductance levels of the channel, together with mean current, mean dwell time and equilibrium probability of occupancy for each level. The models incorporated state-dependent white noise and moving average adjustment for filtering, with maximum likelihood parameter estimates obtained using an EM (expectation-maximisation) based iteration. Adjustment for filtering was included as it could be expected that the electronic filter used in recording would have a major effect on obviously brief intermediate conductance level sojourns. Preliminary data analysis revealed that the brevity of intermediate level sojourns caused difficulties in assignment of data points to levels as a result of over-estimation of noise variances. When reasonable constraints were placed on these variances using the better determined noise variances for the closed and fully open levels, idealisation anomalies were eliminated. Nevertheless, simulations suggested that mean sojourn times for the intermediate levels were still considerably over-estimated, and that recording bandwidth was a major limitation; improved results were obtained with higher bandwidth data (10 kHz sampled at 25 kHz). The simplest model consistent with these data had four open conductance levels, intermediate levels being approximately 20%, 51% and 74% of fully open. The mean lifetime at the fully open level was about 1 ms; estimates for the three intermediate levels were 54-92 μs, probably still over-estimates.
19.
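Recent gene-finding software is markedly better than earlier software, but its sensitivity and specificity at the exon level are still not fully satisfactory. This is because existing software is not yet effective enough at recognizing biological signal sites such as splice sites and translation start sites. If recognition of each of these signal sites can be improved, overall gene-finding performance will improve as well. A hidden semi-Markov model can characterize the structure of the 3' splice site (acceptor) well. An acceptor recognition algorithm developed on this basis was tested on the Burset/Guigo dataset and achieved a better recognition rate than existing algorithms. The success of the model also gives some insight into the branch site and the pyrimidine-rich region upstream of the splice site, deepening the understanding of acceptor structure and the splicing process.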
20.
ABSTRACT Actigraphy is widely used in sleep studies but lacks a universal unsupervised algorithm for sleep/wake identification. An unsupervised algorithm is useful in large-scale population studies and in cases where polysomnography (PSG) is unavailable, as it does not require sleep outcome labels to train the model but utilizes information solely contained in actigraphy to learn sleep and wake characteristics and separate the two states. In this study, we proposed a machine learning unsupervised algorithm based on the Hidden Markov Model (HMM) for sleep/wake identification. The proposed algorithm is also an individualized approach that takes into account individual variabilities and analyzes each individual actigraphy profile separately to infer sleep and wake states. We used Actiwatch and PSG data from 43 individuals in the Multi-Ethnic Study of Atherosclerosis study to evaluate the method performance. Epoch-by-epoch comparisons and sleep variable comparisons were made between our algorithm, the unsupervised algorithm embedded in the Actiwatch software (AS), and the pre-trained supervised UCSD algorithm. Using PSG as the reference, the accuracy was 85.7% for HMM, 84.7% for AS, and 85.0% for UCSD. The sensitivity was 99.3%, 99.7%, and 98.9% for HMM, AS, and UCSD, respectively, and the specificity was 36.4%, 30.0%, and 31.7%, respectively. The Kappa statistic was 0.446 for HMM, 0.399 for AS, and 0.311 for UCSD, suggesting fair to moderate agreement between PSG and actigraphy. The Bland–Altman plots further show that the total sleep time, sleep latency, and sleep efficiency estimates by HMM were closer to PSG with narrower 95% limits of agreement than AS and UCSD. All three methods tend to overestimate sleep and underestimate wake compared to PSG. 
Our HMM approach is also able to differentiate relatively active and sedentary individuals by quantifying variabilities in activity counts: individuals with higher estimated activity variabilities tend to show more frequent sedentary behaviors. Our unsupervised data-driven HMM algorithm achieved better performance than the commonly used Actiwatch software algorithm and the pre-trained UCSD algorithm. HMM can help expand the application of actigraphy in cases where PSG is hard to acquire and supervised methods cannot be trained. In addition, the estimated HMM parameters can characterize individual activity patterns and sedentary tendencies that can be further utilized in downstream analysis.