Similar documents
1.

Background  

Soon after the first algorithms for RNA folding became available, it was recognised that the prediction of only one energetically optimal structure is insufficient to achieve reliable results. An in-depth analysis of the folding space as a whole appeared necessary to deduce the structural properties of a given RNA molecule reliably. Folding space analysis comprises various methods such as suboptimal folding, computation of base pair probabilities, sampling procedures and abstract shape analysis. Common to many approaches is the idea of partitioning the folding space into classes of structures, for which certain properties can be derived.
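A minimal sketch of the class-partitioning idea, in Python. It assumes a simplified abstract shape in the spirit of the most abstract shape level (only the nesting of helices is kept, so unpaired bases, bulges and interior loops are ignored) and a short, hypothetical list of structures with free energies in kcal/mol. The helper names (`shape`, `shape_probabilities`) are illustrative; real shape analysis tools obtain class probabilities from the full partition function rather than from an explicit list.

```python
import math
from collections import defaultdict

RT = 0.0019872 * 310.15  # gas constant times temperature, kcal/mol at 37 °C

def pair_table(db):
    """Map each position of a dot-bracket string to its partner (-1 if unpaired)."""
    stack, pt = [], [-1] * len(db)
    for i, c in enumerate(db):
        if c == "(":
            stack.append(i)
        elif c == ")":
            j = stack.pop()
            pt[i], pt[j] = j, i
    return pt

def outer_pairs(pt, i, j):
    """Outermost base pairs (p, q) with i <= p < q <= j."""
    pairs, k = [], i
    while k <= j:
        if pt[k] > k:
            pairs.append((k, pt[k]))
            k = pt[k] + 1
        else:
            k += 1
    return pairs

def shape(db):
    """Simplified abstract shape: one '[]' per helix, nesting preserved."""
    pt = pair_table(db)

    def region(i, j):
        out = []
        for p, q in outer_pairs(pt, i, j):
            inner = outer_pairs(pt, p + 1, q - 1)
            # A single enclosed pair continues the same helix (stack, bulge or interior loop).
            while len(inner) == 1:
                p, q = inner[0]
                inner = outer_pairs(pt, p + 1, q - 1)
            out.append("[" + region(p + 1, q - 1) + "]")
        return "".join(out)

    return region(0, len(db) - 1)

def shape_probabilities(structures):
    """Boltzmann-weighted probability of each shape over (dot-bracket, dG) pairs."""
    weights = defaultdict(float)
    for db, dg in structures:
        weights[shape(db)] += math.exp(-dg / RT)
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}

ensemble = [  # hypothetical structures and free energies (kcal/mol)
    ("((((....))))........", -4.1),
    ("(((......)))........", -2.0),
    ("((((....)))).((...))", -3.5),
]
print(shape_probabilities(ensemble))
```

Summing Boltzmann weights per class, rather than looking only at the lowest-energy structure, is what lets a class made of many moderately good structures outweigh a single isolated optimum.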

2.
MOTIVATION: Experimentalists have amassed extensive evidence over the past four decades that proteins appear to fold during production by the ribosome. Protein structure prediction methods, however, do not incorporate this property of folding. A thorough study to find the fingerprint of such sequential folding is the first step towards using it in folding algorithms, and thus towards assisting structure prediction. RESULTS: We explore computationally the existence of evidence for cotranslational folding, based on large sets of experimentally determined structures in the PDB. Our perspective is that cotranslational folding is the norm, but that the effect is masked in most classes. We show that it is most evident in alpha/beta proteins, confirming recent findings. We also find mild evidence that older proteins may fold cotranslationally. A tool is provided for determining, within a protein, where cotranslation is most evident.

3.
From structure prediction to genomic screens for novel non-coding RNAs   (total citations: 1; self-citations: 0; citations by others: 1)
Non-coding RNAs (ncRNAs) are receiving more and more attention not only as an abundant class of genes, but also as regulatory structural elements (some located in mRNAs). A key feature of RNA function is its structure. Computational methods were developed early for folding and prediction of RNA structure with the aim of assisting in functional analysis. With the discovery of more and more ncRNAs, it has become clear that a large fraction of these are highly structured. Interestingly, a large part of the structure consists of regular Watson-Crick and GU wobble base pairs. This and the increased number of available genomes have made it possible to employ structure-based methods for genomic screens. The field has moved from folding prediction of single sequences to computational screens for ncRNAs in genomic sequence using the RNA structure as the main characteristic feature. Whereas early methods focused on energy-directed folding of single sequences, comparative analysis based on structure-preserving base-pair changes has proved effective in improving accuracy, and today this constitutes a key component in genomic screens. Here, we cover the basic principles of RNA folding and touch upon some of the concepts in current methods that have been applied in genomic screens for de novo RNA structures in searches for novel ncRNA genes and regulatory RNA structure on mRNAs. We discuss the strengths and weaknesses of the different strategies and how they can complement each other.
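Comparative analysis relies on base pairs that are conserved even when the bases themselves change. The sketch below assumes an uppercase RNA alignment given as equal-length strings plus a consensus dot-bracket structure, and reports for each consensus pair the fraction of sequences that can form a Watson-Crick or GU pair and the number of distinct pair types observed; more than one type indicates structure-preserving changes. The function names and this simple scoring are hypothetical, not the scheme of any particular screening tool.

```python
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}  # Watson-Crick and GU wobble pairs

def consensus_pairs(db):
    """Base pairs (i, j) of a dot-bracket string, sorted by i."""
    stack, pairs = [], []
    for i, c in enumerate(db):
        if c == "(":
            stack.append(i)
        elif c == ")":
            pairs.append((stack.pop(), i))
    return sorted(pairs)

def pair_support(alignment, db):
    """For each consensus pair: (fraction of sequences with a canonical pair,
    number of distinct canonical pair types; > 1 means the pair co-varies)."""
    report = {}
    for i, j in consensus_pairs(db):
        observed = [seq[i] + seq[j] for seq in alignment]   # gaps give non-canonical pairs
        canonical = [p for p in observed if p in CANONICAL]
        report[(i, j)] = (len(canonical) / len(alignment), len(set(canonical)))
    return report

alignment = ["GGGAAACCC", "GAGAAACUC", "GCGAAACGC"]  # made-up example sequences
print(pair_support(alignment, "(((...)))"))
```

In this toy alignment the middle pair appears as G-C, A-U and C-G, so it is fully supported while changing its identity, which is exactly the signal comparative methods exploit.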

4.
The massively parallel genetic algorithm (GA) for RNA structure prediction uses the concepts of mutation, recombination, and survival of the fittest to evolve a population of thousands of possible RNA structures toward a solution structure. As described below, the properties of the algorithm are ideally suited to use in the prediction of possible folding pathways and functional intermediates of RNA molecules given their sequences. Utilizing Stem Trace, an interactive visualization tool for RNA structure comparison, analysis of not only the solution ensembles developed by the algorithm, but also the stages of development of each of these solutions, can give strong insight into these folding pathways. The GA allows the incorporation of information from biological experiments, making it possible to test the influence of particular interactions between structural elements on the dynamics of the folding pathway. These methods are used to reveal the folding pathways of the potato spindle tuber viroid (PSTVd) and the host killing mechanism of Escherichia coli plasmid R1, both of which are successfully explored through the combination of the GA and Stem Trace. We also present novel intermediate folds of each molecule, which appear to be phylogenetically supported, as determined by use of the methods described below.
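To make the mutation, recombination and selection loop concrete, here is a toy genetic algorithm over sets of non-crossing helices. It is a heavily simplified stand-in, not the massively parallel GA described above: the fitness is simply the number of base pairs (an assumed proxy instead of a thermodynamic energy model), candidate helices must be at least three pairs long, and all parameter values and function names are hypothetical.

```python
import random

PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}

def candidate_stems(seq, min_len=3, min_loop=3):
    """Helices (i, j, length) of canonical pairs starting from outer pair (i, j)."""
    stems = []
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            L = 0
            # Extend inward while pairs are canonical and >= min_loop bases stay unpaired.
            while i + L < j - L - min_loop and (seq[i + L], seq[j - L]) in PAIRS:
                L += 1
            if L >= min_len:
                stems.append((i, j, L))
    return stems

def base_pairs(stems):
    return [(i + k, j - k) for i, j, L in stems for k in range(L)]

def compatible(stems):
    """True if no base is used twice and no two pairs cross (no pseudoknots)."""
    pairs = base_pairs(stems)
    bases = [b for p in pairs for b in p]
    if len(bases) != len(set(bases)):
        return False
    return not any(a < c < b < d for a, b in pairs for c, d in pairs)

def fitness(ind):
    return sum(L for _, _, L in ind)  # crude proxy: number of base pairs

def evolve(seq, pop_size=40, generations=60, seed=1):
    rng = random.Random(seed)
    cands = candidate_stems(seq)
    if not cands:
        return []

    def try_add(ind, stem):
        return ind + [stem] if stem not in ind and compatible(ind + [stem]) else ind

    def mutate(ind):
        ind = ind[:]
        if ind and rng.random() < 0.3:          # drop a random helix
            ind.pop(rng.randrange(len(ind)))
        return try_add(ind, rng.choice(cands))  # then try to add a random helix

    def recombine(a, b):
        pool = a + b
        rng.shuffle(pool)
        child = []
        for stem in pool:                       # greedily merge compatible parental helices
            child = try_add(child, stem)
        return child

    population = [mutate([]) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]  # survival of the fittest
        children = [mutate(recombine(rng.choice(survivors), rng.choice(survivors)))
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=fitness)

def dot_bracket(n, stems):
    s = ["."] * n
    for i, j in base_pairs(stems):
        s[i], s[j] = "(", ")"
    return "".join(s)

seq = "GGGGAAAACCCC"
best = evolve(seq)
print(dot_bracket(len(seq), best), fitness(best))
```

Because every generation keeps the better half unchanged, the best structure found never gets worse; recording the population at each generation is what would allow intermediate folds to be inspected along the way.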

5.
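Graph-theoretic methods for the protein structure prediction problem   (total citations: 4; self-citations: 0; citations by others: 4)
Graph-theoretic methods play an important role in protein structure prediction. This paper briefly introduces the application of connected subgraphs, maximum cliques, perfect matchings and graph spectral methods to protein structure prediction. It reviews recent work, in China and abroad, that applies these methods to protein 3D structure prediction and folding, and analyses and compares the performance and characteristics of these methods.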
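Of the graph methods listed, the maximum clique problem is the easiest to illustrate. In structure prediction it is typically used when vertices stand for candidate elements (for example candidate side-chain or residue assignments) and edges mark pairwise compatibility, so that a maximum clique is a largest mutually consistent set. The sketch uses the Bron-Kerbosch algorithm with pivoting on a graph stored as adjacency sets; the example graph is made up.

```python
def max_clique(adj):
    """Largest clique of an undirected graph given as {vertex: set(neighbours)}."""
    best = set()

    def expand(r, p, x):
        nonlocal best
        if not p and not x:              # r is a maximal clique
            if len(r) > len(best):
                best = set(r)
            return
        # Pivot: skip vertices adjacent to the pivot, they are covered by its branch.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return best

graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}  # hypothetical compatibility graph
print(max_clique(graph))
```

Bron-Kerbosch enumerates all maximal cliques, so keeping the largest one gives an exact maximum clique; the problem is NP-hard in general, which is why practical uses keep the graphs small or prune them first.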

6.
7.
Abstract

The process of designing novel RNA sequences by inverse RNA folding, available in tools such as RNAinverse and InfoRNA, can be thought of as a reconstruction of RNAs from secondary structure. In this reconstruction problem, no physical measures are considered as additional constraints that are independent of structure, aside from the goal of reaching the same secondary structure as the input using energy minimization methods. An extension of the reconstruction problem can be formulated, since for many natural RNAs it is desirable to analyze the sequence and structure of RNA molecules using various quantifiable physical measures. In prior works that used secondary structure predictions, it has been shown that natural RNAs differ significantly from random RNAs in some of these measures. Thus, we relax the problem of reconstructing RNAs from secondary structure into reconstructing RNAs from shapes, and in turn incorporate physical quantities as constraints. This allows for the design of novel RNA sequences by inverse folding while considering various physical quantities of interest such as thermodynamic stability, mutational robustness, and linguistic complexity. At the expense of altering the number of nucleotides in stems and loops, for example, physical measures can be taken into account. We use evolutionary computation for the new reconstruction problem and illustrate the procedure on various natural RNAs.
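Linguistic complexity, one of the physical measures mentioned, can be computed directly from a sequence. The sketch uses one common definition: the number of distinct substrings of every length divided by the maximum number possible for a sequence of that length over a four-letter alphabet. Other formulations (for example a product of per-length ratios) exist, and the exact variant used by the authors is not stated here; the function name is illustrative.

```python
def linguistic_complexity(seq, alphabet_size=4):
    """Distinct substrings of all lengths / maximum possible number (value in (0, 1])."""
    n = len(seq)
    observed = possible = 0
    for k in range(1, n + 1):
        observed += len({seq[i:i + k] for i in range(n - k + 1)})
        # At most alphabet_size**k distinct k-mers, and at most n - k + 1 windows.
        possible += min(alphabet_size ** k, n - k + 1)
    return observed / possible

print(linguistic_complexity("GGGGGGGG"), linguistic_complexity("GAUCCGAUGC"))
```

Low values flag repetitive, low-information sequences, which is why such a measure can serve as an extra constraint when many sequences fold into the same target shape.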

8.
9.
10.
The prediction of the secondary structure of proteins from their amino acid sequences remains a key component of many approaches to the protein folding problem. The most abundant form of regular secondary structure in proteins is the alpha-helix, in which specific residue preferences exist at the N-terminal locations. Propensities derived from these observed amino acid frequencies in the Protein Data Bank (PDB) database correlate well with experimental free energies measured for residues at different N-terminal positions in alanine-based peptides. We report a novel method to exploit this data to improve protein secondary structure prediction through identification of the correct N-terminal sequences in alpha-helices, based on existing popular methods for secondary structure prediction. With this algorithm, the number of correctly predicted alpha-helix start positions was improved from 30% to 38%, while the overall prediction accuracy (Q3) remained the same, using cross-validated testing. Although the algorithm was developed and tested on multiple sequence alignment-based secondary structure predictions, it was also able to improve the predictions of start locations by methods that use single sequences to make their predictions. Furthermore, the residue frequencies at N-terminal positions of the improved predictions better reflect those seen at the N-terminal positions of alpha-helices in proteins. This has implications for areas such as comparative modeling, where a more accurate prediction of the N-terminal regions of alpha-helices should benefit attempts to model adjacent loop regions. The algorithm is available as a Web tool, located at http://rocky.bms.umist.ac.uk/elephant.
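The two figures of merit quoted above can be computed from per-residue state strings (H = helix, E = strand, C = coil). In the sketch, Q3 is the percentage of correctly predicted residues and helix-start recall is the percentage of observed helix N-termini matched by a predicted start. The `tolerance` parameter (how far a predicted start may lie from the observed one) is an assumption, since the exact matching criterion is not given.

```python
def q3(predicted, observed):
    """Percentage of residues whose H/E/C state is predicted correctly."""
    assert len(predicted) == len(observed)
    hits = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * hits / len(observed)

def helix_starts(states):
    """Indices where a helix (H) segment begins."""
    return {i for i, s in enumerate(states)
            if s == "H" and (i == 0 or states[i - 1] != "H")}

def start_recall(predicted, observed, tolerance=0):
    """Percentage of observed helix starts matched by a predicted start within +/- tolerance."""
    obs, pred = helix_starts(observed), helix_starts(predicted)
    if not obs:
        return 0.0
    hits = sum(any(abs(o - p) <= tolerance for p in pred) for o in obs)
    return 100.0 * hits / len(obs)

observed = "CCHHHHCC"   # made-up example
predicted = "CHHHHHCC"
print(q3(predicted, observed), start_recall(predicted, observed))
```

The example shows how the two measures can diverge: shifting the helix N-terminus by one residue costs only one residue in Q3 but loses the start entirely under an exact-match criterion.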

11.
RNA molecules are important cellular components involved in many fundamental biological processes. Understanding the mechanisms behind their functions requires knowledge of their tertiary structures. Though computational RNA folding approaches exist, they often require manual manipulation and expert intuition; predicting global long-range tertiary contacts remains challenging. Here we develop a computational approach and associated program module (RNAJAG) to predict helical arrangements/topologies in RNA junctions. Our method has two components: junction topology prediction and graph modeling. First, junction topologies are determined by a data mining approach from a given secondary structure of the target RNAs; second, the predicted topology is used to construct a tree graph consistent with geometric preferences analyzed from solved RNAs. Evaluated by cross-validation on a large set of 200 junctions, the predicted graphs, which model the helical arrangements of RNA junctions, give fairly good representations of the helical configurations in native RNAs, and can be further used to develop all-atom models, as we show for two examples. Because junctions are among the most complex structural elements in RNA, this work advances structure prediction methods for large RNAs. The RNAJAG module is available to academic users upon request.

12.
Molecular dynamics (MD) simulations can now predict ms-timescale folding processes of small proteins; however, this presently requires hundreds of thousands of CPU hours and is primarily applicable to short peptides with few long-range interactions. Larger and slower-folding proteins, such as many with extended β-sheet structure, would require orders of magnitude more time and computing resources. Furthermore, when the objective is only to determine which folding events are necessary and limiting, atomistic MD simulations may be unnecessary. Here, we introduce the program tFolder as an efficient method for modelling the folding process of large β-sheet proteins using sequence data alone. To do so, we extend existing ensemble β-sheet prediction techniques, which permitted only a fixed anti-parallel β-barrel shape, with a method that predicts arbitrary β-strand/β-strand orientations and strand-order permutations. By accounting for all partial and final structural states, we can then model the transition from random coil to native state as a Markov process, using a master equation to simulate population dynamics of folding over time. Thus, all putative folding pathways can be energetically scored, including which transitions present the greatest barriers. Since correct folding pathway prediction likely depends on the accuracy of contact prediction, we show that the contact prediction accuracy of tFolder is comparable with that of state-of-the-art methods designed specifically for the contact prediction problem alone. We validate our method for dynamics prediction by applying it to the folding pathway of the well-studied Protein G. With comparatively little computation time, tFolder is able to reveal critical features of the folding pathways which were previously observed only through time-consuming MD simulations and experimental studies.
Such a result greatly expands the number of proteins whose folding pathways can be studied, while the algorithmic integration of ensemble prediction with Markovian dynamics can be applied to many other problems.
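The master-equation step can be sketched for a toy three-state model (coil, intermediate, native). The rate constants below are hypothetical placeholders; in tFolder they would follow from the energies of the enumerated partial states, which are not reproduced here. Integration uses simple forward Euler, which keeps the total population at one because every flux leaving a state enters another.

```python
def simulate(rates, p0, dt=1e-3, steps=10_000):
    """Integrate dP_i/dt = sum_j (k_ji * P_j - k_ij * P_i) with forward Euler.
    rates[i][j] is the transition rate from state i to state j."""
    n = len(p0)
    p = list(p0)
    for _ in range(steps):
        flux = [0.0] * n
        for i in range(n):
            for j in range(n):
                if i != j and rates[i][j]:
                    f = rates[i][j] * p[i] * dt   # population moving i -> j in this step
                    flux[i] -= f
                    flux[j] += f
        p = [pi + fi for pi, fi in zip(p, flux)]
    return p

# Hypothetical rates for states 0 = coil, 1 = intermediate, 2 = native.
rates = [[0.0, 5.0, 0.0],
         [1.0, 0.0, 10.0],
         [0.0, 0.01, 0.0]]
print(simulate(rates, [1.0, 0.0, 0.0]))
```

Recording `p` at every step instead of only at the end gives the population curves over time; the transition with the smallest effective rate along the dominant route is the one acting as the main barrier.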

13.
14.
In this review we consider several experimental and theoretical approaches for investigating RNA folding and for determining the nucleotides that play an important role in the folding of molecules such as tRNA and several classes of ribozymes. It has been shown that nucleotides in the D- and T-loop regions are the last to be incorporated into the tRNA structure, or are not part of the folding nucleus of tRNA at all. Using the specially developed SHAPE method, it has been demonstrated that the long-accepted hierarchical folding model does not hold for tRNA folding. In the second part of this review, the algorithms and programs used to predict RNA secondary structures and to model RNA folding are considered.

15.
Zhu Y, Li T, Li D, Zhang Y, Xiong W, Sun J, Tang Z, Chen G. Amino Acids, 2012, 42(5): 1749-1755
Numerous methods for predicting γ-turns in proteins have been developed. However, their results are generally modest, with a Matthews correlation coefficient (MCC) ≤0.18. Here, we attempt to develop a method that improves the accuracy of γ-turn prediction. First, we employ the geometric mean metric as the optimization criterion for evaluating the performance of a support vector machine on the highly imbalanced γ-turn dataset. This metric tries to maximize both the sensitivity and the specificity while keeping them balanced. Second, we designed a predictor that generates a protein shape string by structure alignment against the protein structure database, and the predicted shape string is introduced as a new variable for γ-turn prediction. Based on this idea, we have developed a new method for γ-turn prediction. After training and testing on a benchmark dataset of 320 non-homologous protein chains using fivefold cross-validation, the present method achieves excellent performance: the overall prediction accuracy Q_total reaches 92.2% and the MCC is 0.38, outperforming existing γ-turn prediction methods. Our results indicate that the protein shape string is useful for predicting protein tight turns and that it is reasonable to use dihedral angle information as a variable for machine learning to predict protein folding. The dataset used in this work and the software to generate predicted shape strings from the structure database are freely available from an anonymous FTP site.
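The metrics mentioned for the imbalanced γ-turn problem can be written out from a confusion matrix. This sketch (function names are illustrative) shows why Q_total alone is misleading: a predictor that labels every residue as non-turn on a set with 5% turns scores Q_total = 0.95, while both MCC (taken as 0 when its denominator vanishes) and the geometric mean of sensitivity and specificity are zero.

```python
import math

def confusion(pred, obs):
    """Counts (tp, tn, fp, fn) from two equal-length lists of booleans."""
    tp = sum(p and o for p, o in zip(pred, obs))
    tn = sum(not p and not o for p, o in zip(pred, obs))
    fp = sum(p and not o for p, o in zip(pred, obs))
    fn = sum(not p and o for p, o in zip(pred, obs))
    return tp, tn, fp, fn

def q_total(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def g_mean(tp, tn, fp, fn):
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)

obs = [True] * 5 + [False] * 95   # 5% gamma-turn residues (made-up)
pred = [False] * 100              # trivial "never a turn" predictor
counts = confusion(pred, obs)
print(q_total(*counts), mcc(*counts), g_mean(*counts))
```

Maximizing the geometric mean forces a classifier to recover some positives, because a sensitivity of zero drives the whole score to zero regardless of specificity.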

16.
Li ZC, Zhou XB, Lin YR, Zou XY. Amino Acids, 2008, 35(3): 581-590
Structural class characterizes the overall folding type of a protein or its domain. Most existing methods for determining the structural class of a protein rely on a single group of features that carries only one kind of discriminative information. Other types of discriminative information associated with the primary sequence are therefore missed entirely, which undoubtedly reduces the success rate of prediction. We present a novel method for predicting protein structural class by coupling an improved genetic algorithm (GA) with a support vector machine (SVM). The improved GA was used to select an optimized feature subset and to optimize the SVM parameters. Jackknife tests on the working datasets indicated that the prediction accuracies for the different classes ranged from 97.8% to 100%, with an overall accuracy of 99.5%. The results indicate that the approach has high potential to become a useful tool in bioinformatics.

17.
Protein structure prediction   (total citations: 4; self-citations: 0; citations by others: 4)
Garnier J. Biochimie, 1990, 72(8): 513-524
Current methods for predicting protein structure are reviewed. The most widely used secondary structure prediction algorithms, those of Chou and Fasman and of Garnier et al, are compared with more recent ones, including sequence similarity, neural network, pattern recognition and joint prediction methods. With cross-validation, the best of these methods correctly predict 63-65% of the residues in the database for 3 conformations (helix, beta strand and coil), with a standard deviation of 6-8% per protein. However, when a homologous protein is already in the database, the accuracy of prediction by the similarity peptide method of Levin and Garnier reaches about 90%. Some conclusions can be drawn about the mechanism of protein folding. As all the prediction methods use only the local sequence (at most +/- 8 residues), one can infer that, on average, 65% of the conformation of a residue is dictated by the local sequence, the rest being brought by the folding. The best predicted proteins or peptide segments are those whose conformation is least affected by folding. At present, prediction of tertiary structure is of practical use only when the structure of a homologous protein is already known. Amino acid alignment to define residues of equivalent spatial position is critical for modelling the protein. We showed for serine proteases that secondary structure prediction can help to define a better alignment. For non-homologous segments of the polypeptide chain, such as loops, libraries of known loops and/or energy minimization with various force fields are used, but these do not yet give satisfactory solutions. An example of modelling by homology, aided by secondary structure prediction, is presented for 2 regulatory proteins, Fnr and FixK.

18.
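Because predicting protein folding rates is important for studying protein function, many researchers have begun to investigate the factors that affect protein folding rates, and various predictive parameters and methods have been proposed. Exploiting the different effects on folding rate of different feature parameters of protein-coding sequences, of different secondary structures and of different folding classes, we selected new features of the protein sequence, namely its Lempel-Ziv (LZ) complexity and isoelectric point. We then combined these features with the properties of the 20 amino acids αc, Cα, K0, Pβ, Ra, ΔASA, PI, ΔGhD, Nm, LZ, Mu and El to build multiple linear regression models. Using these models, the correlation coefficients between the observed ln(kf) and the predicted values reached 0.89, 0.93, 0.98 and 0.86 for 13 all-α proteins, 18 all-β proteins, 13 mixed-class proteins and 39 unclassified proteins, respectively. Jackknife validation showed that the combined features correlate well with the corresponding folding rates across the different structural classes. The results indicate that sequence features such as LZ complexity and isoelectric point may influence the folding rate and structure of proteins during folding.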

19.
Ribonucleic acid (RNA) secondary structure prediction continues to be a significant challenge, in particular when attempting to model sequences with less rigidly defined structures, such as messenger and non-coding RNAs. Crucial to interpreting RNA structures as they pertain to individual phenotypes is the ability to detect RNAs with large structural disparities caused by a single nucleotide variant (SNV), or riboSNitches. A recently published human genome-wide parallel analysis of RNA structure (PARS) study identified a large number of riboSNitches as well as non-riboSNitches, providing an unprecedented set of RNA sequences against which to benchmark structure prediction algorithms. Here we evaluate the riboSNitch prediction performance of 11 different RNA folding algorithms on these data. We find that recent algorithms designed specifically to predict the effects of SNVs on RNA structure, in particular remuRNA, RNAsnp and SNPfold, perform best on the most rigorously validated subsets of the benchmark data. In addition, our benchmark indicates that general structure prediction algorithms (e.g. RNAfold and RNAstructure) perform better overall when base pairing probabilities are considered rather than minimum free energy calculations. Although aggregate algorithmic performance on the full set of riboSNitches is relatively low, significant improvement is possible if the highest-confidence predictions are evaluated independently.
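Comparing wild-type and variant folds is the core of riboSNitch prediction. Two simple comparisons are sketched: a base-pair distance between MFE structures, and one minus the Pearson correlation between per-nucleotide pairing-probability profiles, reflecting the finding that probability-based comparisons tend to do better. The profiles are assumed to come from a partition-function calculation (for example RNAfold with base-pair probabilities enabled); these helpers are illustrative and are not the scoring function of any benchmarked tool.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def structural_change(wt_profile, mut_profile):
    """0 when the pairing-probability profiles are perfectly correlated; larger means more change."""
    return 1.0 - pearson(wt_profile, mut_profile)

def bp_distance(db1, db2):
    """Number of base pairs present in exactly one of two dot-bracket structures."""
    def pairs(db):
        stack, out = [], set()
        for i, c in enumerate(db):
            if c == "(":
                stack.append(i)
            elif c == ")":
                out.add((stack.pop(), i))
        return out
    return len(pairs(db1) ^ pairs(db2))

wt = [0.9, 0.8, 0.1, 0.1, 0.8, 0.9]    # hypothetical pairing probabilities
mut = [0.2, 0.1, 0.7, 0.8, 0.1, 0.2]
print(structural_change(wt, mut), bp_distance("((..))", "......"))
```

A single MFE comparison is all-or-nothing, whereas the probability profile also captures variants that shift the ensemble without changing the optimal structure.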

20.
Lück R, Gräf S, Steger G. Nucleic Acids Research, 1999, 27(21): 4208-4217
A tool for prediction of conserved secondary structure of a set of homologous single-stranded RNAs is presented. For each RNA of the set the structure distribution is calculated and stored in a base pair probability matrix. Gaps, resulting from a multiple sequence alignment of the RNA set, are introduced into the individual probability matrices. These 'aligned' probability matrices are summed up to give a consensus probability matrix emphasizing the conserved structural elements of the RNA set. Because the multiple sequence alignment is independent of any structural constraints, such an alignment may result in introduction of gaps into the homologous probability matrices that disrupt a common consensus structure. By use of its graphical user interface the presented tool allows the removal of such misalignments, which are easily recognized, from the individual probability matrices by optimizing the sequence alignment with respect to a structural alignment. From the consensus probability matrix a consensus structure is extracted, which is viewable in three different graphical representations. The functionality of the tool is demonstrated using a small set of U7 RNAs, which are involved in 3'-end processing of histone mRNA precursors. Supplementary Material lists further results obtained. Advantages and drawbacks of the tool are discussed in comparison to several other algorithms.
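The gap-insertion and summation steps can be sketched as follows. Each sequence's base-pair probabilities are assumed to be stored sparsely as {(i, j): p} in ungapped, 0-based coordinates; gaps are introduced by mapping every position to its alignment column, and the mapped matrices are summed. The data layout and function names are assumptions for illustration, not the tool's actual implementation.

```python
from collections import defaultdict

def column_map(aligned_row, gap="-"):
    """cols[k] = alignment column of ungapped position k."""
    return [col for col, c in enumerate(aligned_row) if c != gap]

def consensus_matrix(aligned_rows, bpp_list):
    """Sum per-sequence base-pair probabilities after moving them to alignment columns.
    bpp_list[s] is {(i, j): p} for sequence s in its own ungapped numbering."""
    consensus = defaultdict(float)
    for row, bpp in zip(aligned_rows, bpp_list):
        cols = column_map(row)
        for (i, j), p in bpp.items():
            consensus[(cols[i], cols[j])] += p
    return dict(consensus)

rows = ["GG-AAACC", "GGAAA-CC"]                 # made-up alignment
bpps = [{(0, 6): 0.9, (1, 5): 0.8},             # hypothetical probabilities
        {(0, 6): 0.7, (1, 5): 0.6}]
print(consensus_matrix(rows, bpps))
```

Dividing the sums by the number of sequences would put the consensus back on a 0 to 1 scale. A pair displaced by a misplaced gap lands in a different cell and dilutes the consensus, which is the kind of misalignment the graphical interface is meant to let users correct.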
