Similar articles
Found 20 similar articles (search time: 15 ms)
1.
Successful prediction of the beta-hairpin motif will be helpful for understanding fold recognition. Some algorithms have been proposed for the prediction of beta-hairpin motifs. However, the parameters used by these methods were primarily based on amino acid sequences. Here, we propose a novel model for predicting beta-hairpin structure based on chemical shifts. First, we analyzed the statistical distribution of the chemical shifts of six nuclei in non-beta-hairpin and beta-hairpin motifs. Second, we used these chemical shifts as features, combined with three algorithms, to predict beta-hairpin structure. Finally, we achieved the best prediction, namely a sensitivity of 92% and a specificity of 94% with a Matthews correlation coefficient of 0.85, using the quadratic discriminant analysis algorithm; this is clearly superior to the same method applied to the 20 amino acid compositions in three-fold cross-validation. Our findings show that the chemical shift is an effective parameter for beta-hairpin prediction and suggest that quadratic discriminant analysis is a powerful algorithm for this task.
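For readers unfamiliar with the three figures quoted above, the following is a minimal sketch of how sensitivity, specificity, and the Matthews correlation coefficient are computed from a binary confusion matrix. The counts in the usage lines are hypothetical and chosen only to give 92% sensitivity and 94% specificity; they are not the study's data, and the resulting coefficient depends on the class balance, so it need not equal the reported 0.85.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC) for a binary confusion matrix."""
    sensitivity = tp / (tp + fn)          # fraction of true hairpins recovered
    specificity = tn / (tn + fp)          # fraction of non-hairpins rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the coefficient is taken as 0 when any margin is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc

# Hypothetical counts (100 hairpins, 100 non-hairpins), for illustration only.
sens, spec, mcc = binary_metrics(tp=92, fp=6, tn=94, fn=8)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} MCC={mcc:.2f}")
```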

2.
Computational tools for prediction of the secondary structure of two or more interacting nucleic acid molecules are useful for understanding mechanisms for ribozyme function, determining the affinity of an oligonucleotide primer to its target, and designing good antisense oligonucleotides, novel ribozymes, DNA code words, or nanostructures. Here, we introduce new algorithms for prediction of the minimum free energy pseudoknot-free secondary structure of two or more nucleic acid molecules, and for prediction of alternative low-energy (sub-optimal) secondary structures for two nucleic acid molecules. We provide a comprehensive analysis of our predictions against secondary structures of interacting RNA molecules drawn from the literature. Analysis of our tools on 17 sequences of up to 200 nucleotides that do not form pseudoknots shows that they have 79% accuracy, on average, for the minimum free energy predictions. When the best of 100 sub-optimal foldings is taken, the average accuracy increases to 91%. The accuracy decreases as the sequences increase in length and as the number of pseudoknots and tertiary interactions increases. Our algorithms extend the free energy minimization algorithm of Zuker and Stiegler for secondary structure prediction, and the sub-optimal folding algorithm by Wuchty et al. Implementations of our algorithms are freely available in the package MultiRNAFold.

3.
Thermodynamic stability and mutational robustness of secondary structure are critical to the function and evolutionary longevity of RNA molecules. We hypothesize that natural and artificial selection for functional molecules favors the formation of structures that are stable to both thermal and mutational perturbation. There is little direct evidence, however, that functional RNA molecules have been selected for their stability. Here we use thermodynamic secondary structure prediction algorithms to compare the thermal and mutational robustness of over 1000 naturally and artificially evolved molecules. Although we find evidence for the evolution of both types of stability in both sets of molecules, the naturally evolved functional RNA molecules were significantly more stable than those selected in vitro, and artificially evolved catalysts (ribozymes) were more stable than artificially evolved binding species (aptamers). The thermostability of RNA molecules bred in the laboratory is probably not constrained by a lack of suitable variation in the sequence pool but, rather, by intrinsic biases in the selection process.

4.
Thermodynamic folding algorithms and structure probing experiments are commonly used to determine the secondary structure of RNAs. Here we propose a formal framework to reconcile information from both prediction algorithms and probing experiments. The thermodynamic energy parameters are adjusted using 'pseudo-energies' to minimize the discrepancy between prediction and experiment. Our framework differs from related approaches that used pseudo-energies in several key aspects. (i) The energy model is only changed when necessary, and no adjustments are made if prediction and experiment are consistent. (ii) Pseudo-energies remain biophysically interpretable and hold positional information where experiment and model disagree. (iii) The whole thermodynamic ensemble of structures is considered, thus allowing mixtures of suboptimal structures to be reconstructed from seemingly contradictory data. (iv) The noise of the energy model and the experimental data is explicitly modeled, leading to an intuitive weighting factor through which the problem can be seen as folding with 'soft' constraints of different strength. We present an efficient algorithm to iteratively calculate pseudo-energies within this framework and demonstrate how this approach can be used in combination with SHAPE chemical probing data to improve secondary structure prediction. We further demonstrate that the pseudo-energies correlate with biophysical effects that are known to affect RNA folding, such as chemical nucleotide modifications and protein binding.

5.
Ribonucleic acid (RNA) secondary structure prediction continues to be a significant challenge, in particular when attempting to model sequences with less rigidly defined structures, such as messenger and non-coding RNAs. Crucial to interpreting RNA structures as they pertain to individual phenotypes is the ability to detect RNAs with large structural disparities caused by a single nucleotide variant (SNV) or riboSNitches. A recently published human genome-wide parallel analysis of RNA structure (PARS) study identified a large number of riboSNitches as well as non-riboSNitches, providing an unprecedented set of RNA sequences against which to benchmark structure prediction algorithms. Here we evaluate 11 different RNA folding algorithms’ riboSNitch prediction performance on these data. We find that recent algorithms designed specifically to predict the effects of SNVs on RNA structure, in particular remuRNA, RNAsnp and SNPfold, perform best on the most rigorously validated subsets of the benchmark data. In addition, our benchmark indicates that general structure prediction algorithms (e.g. RNAfold and RNAstructure) have overall better performance if base pairing probabilities are considered rather than minimum free energy calculations. Although overall aggregate algorithmic performance on the full set of riboSNitches is relatively low, significant improvement is possible if the highest confidence predictions are evaluated independently.

6.
The high levels of sequence diversity and rapid rates of evolution of HIV-1 represent the main challenges for developing effective therapies. However, there are constraints imposed by the three-dimensional protein structure that affect the sequence space accessible to the evolution of HIV-1. Here, we present a strategy for predicting the set of possible amino acid replacements in HIV. Our approach is based on the identification of likely amino acid changes in the context of these structural constraints using environment-specific substitution matrices as well as considering the physical constraints imposed by local structure. Assessment of the power of various published algorithms in predicting the evolution of HIV-1 Gag P17 shows that it is possible to use these methods to make accurate predictions of the sequence diversity. Our own method, SubFit, uses knowledge of local structural constraints; it achieves prediction success similar to that of the best-performing methods. We also show that erroneous predictions are largely due to infrequently occurring amino acids that will probably have severe fitness costs for the protein. Future improvements, for example incorporating covariation and immunological constraints, will permit more reliable prediction of viral evolution.

7.
Despite more than 50 years of effort, the causes and mechanisms of small rodent population fluctuations remain unknown. The two major questions are as follows: (1) what is the cause of population decline and (2) what is the cause of cyclicity and its geographical variation? At present, no hypothesis can provide answers to both these questions. Recently, progress has been made by Boonstra (1994), who proposed the senescence hypothesis to explain the cause of cyclic decline in population numbers. Here, we tested the main prediction that voles in decline are older than in other phases of the cycle, by analysing changes in age structure in a fluctuating population of the bank vole (Clethrionomys glareolus). The results generally support this prediction; however, the differences in absolute age seem to be too small to explain the occurrence of senescent animals exclusively in declines. We propose a new model to explain changes in age structure and the mechanisms behind the decline and geographic variation in cyclicity. It is based on the idea that voles are oldest in declines, developed independently of Boonstra. However, it differs in three respects: (1) it is more general and thereby applicable to the whole cycle; (2) density-dependent changes in age structure are based on the bimodality in a female's age at first reproduction; and (3) it stresses developmental rather than physiological changes in the quality of decline of animals as being relevant to the rate of senescence. We propose that seasonality of the environment is a principal candidate to explain geographical variation in cyclicity. We present substantial theoretical and empirical evidence to indicate that in more seasonal environments with shortened vegetation periods, population dynamics is inevitably less stable due to increased variation in two critical parameters – age at first reproduction and the length of the breeding season – which determine population growth rates. 
Any external perturbation may then easily destabilize population numbers. The general applicability of the seasonality-senescence hypothesis to other mammalian species decreases with declining r and increasing life span. The hypothesis is falsifiable, and testable predictions are provided.

8.
A content-balancing accuracy index, called Q9, has been proposed to evaluate algorithms for protein secondary structure prediction. Here, content-balancing means that the evaluation is independent of the contents of helix, strand and coil in the protein being predicted. It is shown that Q9 is much superior to the widely used index Q3. Therefore, algorithms are more objectively evaluated by Q9 than by Q3. Based on 396 non-homologous proteins, five algorithms for secondary structure prediction were evaluated and compared using the new index Q9. Of the five algorithms, PHD turned out to be the only one with an average Q9 better than 60%. Based on the new index, it is shown that the performance of the consensus method based on a jury decision from several algorithms is even worse than that of the best individual method. We believe that Q9, rather than Q3, should be used to evaluate algorithms for protein secondary structure prediction in future studies in order to improve prediction quality.
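The content dependence criticized above can be seen in a small sketch of the widely used three-state index (Q3), taken here as the plain fraction of residues whose helix (H), strand (E), or coil (C) label is predicted correctly. The definition of the proposed content-balancing index is not given in the abstract and is not reproduced here; the per-state breakdown and the toy sequences are illustrative assumptions only.

```python
def q3(predicted, observed):
    """Fraction of residues whose three-state label (H, E, C) is correct."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)

def per_state_accuracy(predicted, observed, states="HEC"):
    """Accuracy computed separately for each state present in `observed`."""
    result = {}
    for state in states:
        positions = [i for i, o in enumerate(observed) if o == state]
        if positions:
            hits = sum(predicted[i] == state for i in positions)
            result[state] = hits / len(positions)
    return result

# Toy example: a coil-rich protein and a predictor that always answers "coil".
observed = "CCCCCCCCHH"
predicted = "CCCCCCCCCC"
print(q3(predicted, observed))                  # high overall score
print(per_state_accuracy(predicted, observed))  # helix is never recovered
```

A trivial all-coil predictor scores well overall on this toy protein while missing every helical residue, which is the kind of composition effect a content-balanced index is meant to remove.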

9.
Analysis of network dynamics has become a focal point for understanding and predicting changes in complex systems. Here we introduce Turbine, a generic framework enabling fast simulation of any algorithmically definable dynamics on very large networks. Using a perturbation transmission model inspired by communicating vessels, we define a novel centrality measure: perturbation centrality. Hubs and inter-modular nodes proved to be highly efficient in perturbation propagation. High perturbation centrality nodes of the Met-tRNA synthetase protein structure network were identified as amino acids involved in intra-protein communication by earlier studies. Changes in perturbation centralities of yeast interactome nodes upon various stresses recapitulated well the functional changes of stressed yeast cells. The novelty and usefulness of perturbation centrality were validated in several other model, biological and social networks. The Turbine software and the perturbation centrality measure may provide a large variety of novel options to assess signaling, drug action, and environmental and social interventions.

10.
The functional annotation of proteins is one of the most important tasks in the post-genomic era. Although many computational approaches have been developed in recent years to predict protein function, most of these traditional algorithms do not take interrelationships among functional terms into account; for example, different GO terms often co-annotate the same proteins. In this study, we propose a new functional similarity measure in the form of a Jaccard coefficient to quantify these interrelationships, and we develop a framework for incorporating GO term similarity into the protein function prediction process. Cross-validation experiments on S. cerevisiae and Homo sapiens data sets demonstrate that our method improves the performance of protein function prediction. In addition, we find that small terms, which are associated with few proteins, benefit more than large ones when functional interrelationships are considered. We also compare our similarity measure with two other widely used measures, and the results indicate that our proposed measure is more effective when incorporated into function prediction algorithms. Experimental results also show that our algorithms outperform, in prediction accuracy, two previous competing algorithms that also take functional interrelationships into account. Finally, we show that our method is robust to the current incompleteness of annotations in the database. These results give new insights into the importance of functional interrelationships in protein function prediction.
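A Jaccard coefficient between two GO terms can be sketched as the overlap of the protein sets they annotate. This is only one plausible reading of the measure described above; the abstract does not give the exact formula, and the term identifiers and protein names below are hypothetical.

```python
from itertools import combinations

def jaccard(proteins_a, proteins_b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two protein sets."""
    a, b = set(proteins_a), set(proteins_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical term -> annotated-protein mapping, for illustration only.
annotations = {
    "GO:term_A": {"P1", "P2", "P3"},
    "GO:term_B": {"P2", "P3", "P4"},
    "GO:term_C": {"P5"},
}

# Pairwise term similarity based on co-annotated proteins.
similarity = {
    (t1, t2): jaccard(annotations[t1], annotations[t2])
    for t1, t2 in combinations(sorted(annotations), 2)
}
for pair, score in similarity.items():
    print(pair, round(score, 2))
```

Terms that share most of their proteins score close to 1, and terms with no protein in common score 0.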

11.
Accurate prediction of RNA pseudoknotted secondary structures from the base sequence is a challenging computational problem. Since prediction algorithms rely on thermodynamic energy models to identify low-energy structures, prediction accuracy relies in large part on the quality of free energy change parameters. In this work, we use our earlier constraint generation and Boltzmann likelihood parameter estimation methods to obtain new energy parameters for two energy models for secondary structures with pseudoknots, namely, the Dirks–Pierce (DP) and the Cao–Chen (CC) models. To train our parameters, and also to test their accuracy, we create a large data set of both pseudoknotted and pseudoknot-free secondary structures. In addition to structural data our training data set also includes thermodynamic data, for which experimentally determined free energy changes are available for sequences and their reference structures. When incorporated into the HotKnots prediction algorithm, our new parameters result in significantly improved secondary structure prediction on our test data set. Specifically, the prediction accuracy when using our new parameters improves from 68% to 79% for the DP model, and from 70% to 77% for the CC model.

12.
Current approaches to RNA structure prediction range from physics-based methods, which rely on thousands of experimentally measured thermodynamic parameters, to machine-learning (ML) techniques. While the methods for parameter estimation are successfully shifting toward ML-based approaches, the model parameterizations have so far remained fairly constant. We study the potential contribution of increasing the amount of information utilized by RNA folding prediction models to the improvement of their prediction quality. This is achieved by proposing novel models, which refine previous ones by examining more types of structural elements, and larger sequential contexts for these elements. Our proposed fine-grained models are made practical thanks to the availability of large training sets, advances in machine learning, and recent accelerations to RNA folding algorithms. We show that the application of more detailed models indeed improves prediction quality, while the corresponding running time of the folding algorithm remains fast. An additional important outcome of this experiment is a new RNA folding prediction model (coupled with a freely available implementation), which results in a significantly higher prediction quality than that of previous models. This final model has about 70,000 free parameters, several orders of magnitude more than previous models. Being trained and tested over the same comprehensive data sets, our model achieves a score of 84% according to the F1-measure over correctly predicted base pairs (i.e., 16% error rate), compared to the previously best reported score of 70% (i.e., 30% error rate). That is, the new model yields an error reduction of about 50%. Trained models and source code are available at www.cs.bgu.ac.il/~negevcb/contextfold.
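The scores quoted above can be made concrete with a minimal sketch, assuming the F-measure is the usual harmonic mean of precision and recall over predicted base pairs and that "error reduction" means the relative drop in (1 - score). The base-pair sets are toy values, not data from the study.

```python
def base_pair_f1(predicted_pairs, reference_pairs):
    """F1 over base pairs, each pair given as an (i, j) index tuple."""
    predicted, reference = set(predicted_pairs), set(reference_pairs)
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)

def relative_error_reduction(old_score, new_score):
    """Relative decrease in error rate, where error = 1 - score."""
    old_error, new_error = 1.0 - old_score, 1.0 - new_score
    return (old_error - new_error) / old_error

# Toy structures: two of three predicted pairs match the reference.
print(base_pair_f1({(1, 10), (2, 9), (3, 8)}, {(1, 10), (2, 9), (4, 7)}))

# Scores from the abstract: 70% before, 84% after.
print(f"{relative_error_reduction(0.70, 0.84):.1%}")
```

With the reported figures, the error rate falls from 30% to 16%, a relative reduction of 14/30, which is about 47% and is consistent with the abstract's rounded "about 50%".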

13.
The Critical Assessment of PRedicted Interactions (CAPRI) experiment was designed in 2000 to test protein docking algorithms in blind predictions of the structure of protein-protein complexes. In four years, 17 complexes offered by crystallographers as targets prior to publication have been subjected to structure prediction by docking their two components. Models of these complexes were submitted by predictor groups and assessed by comparing their geometry to the X-ray structure and by evaluating the quality of the prediction of the regions of interaction and of the pairwise residue contacts. Prediction was successful on 12 of the 17 targets, most of the failures being due to large conformation changes that the algorithms could not cope with. Progress in the prediction quality observed in four years indicates that the experiment is a powerful incentive to develop new procedures that allow for flexibility during docking and incorporate nonstructural information. We therefore call upon structural biologists who study protein-protein complexes to provide targets for further rounds of CAPRI predictions.

14.
Automated function prediction (AFP) methods increasingly use knowledge discovery algorithms to map sequence, structure, literature, and/or pathway information about proteins whose functions are unknown into functional ontologies, typically (a portion of) the Gene Ontology (GO). While there are a growing number of methods within this paradigm, the general problem of assessing the accuracy of such prediction algorithms has not been seriously addressed. We present first an application for function prediction from protein sequences using the POSet Ontology Categorizer (POSOC) to produce new annotations by analyzing collections of GO nodes derived from annotations of protein BLAST neighborhoods. We then also present hierarchical precision and hierarchical recall as new evaluation metrics for assessing the accuracy of any predictions in hierarchical ontologies, and discuss results on a test set of protein sequences. We show that our method provides substantially improved hierarchical precision (measure of predictions made that are correct) when applied to the nearest BLAST neighbors of target proteins, as compared with simply imputing that neighborhood's annotations to the target. Moreover, when our method is applied to a broader BLAST neighborhood, hierarchical precision is enhanced even further. In all cases, such increased hierarchical precision performance is purchased at a modest expense of hierarchical recall (measure of all annotations that get predicted at all).
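One common way to compute precision and recall in a hierarchy is to extend each annotation set with all of its ancestors and then compare the extended sets. The sketch below uses that ancestor-closure formulation as an assumption; the abstract does not state the authors' exact definitions, which may differ. The toy ontology and node names are hypothetical.

```python
def ancestor_closure(nodes, parents):
    """Return the given nodes together with all of their ancestors."""
    seen = set()
    stack = list(nodes)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen

def hierarchical_precision_recall(predicted, true, parents):
    """Precision and recall over ancestor-extended annotation sets."""
    pred_closed = ancestor_closure(predicted, parents)
    true_closed = ancestor_closure(true, parents)
    overlap = len(pred_closed & true_closed)
    precision = overlap / len(pred_closed) if pred_closed else 0.0
    recall = overlap / len(true_closed) if true_closed else 0.0
    return precision, recall

# Hypothetical ontology (child -> parents): root has children b and c;
# b has children d and e.
parents = {"b": ["root"], "c": ["root"], "d": ["b"], "e": ["b"]}

# Predicting a sibling of the true node earns partial credit via shared ancestors.
print(hierarchical_precision_recall({"d"}, {"e"}, parents))
```

Unlike exact-match scoring, a near miss (a sibling or parent of the correct node) is rewarded in proportion to the ancestors it shares with the true annotation. Whether the root node should count toward the overlap is a design choice; it is included here.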

15.
Swanson R, Vannucci M, Tsai JW. Proteins, 2009, 74(3): 701-711
Protein structure prediction has a number of important ad hoc similarity measures for evaluating predictions, but would benefit from a measure that is able to provide a common framework for a broad range of comparisons. Here we show that a mutual information-like measure can provide a comprehensive framework for evaluating protein structure prediction of all types. We discuss the concept of information, its application to secondary structure, and the obstacle to applying it to 3D structure. On the basis of the insights from the secondary structure case, we present an approach to work around the 3D difficulties, and develop a method to measure the mutual information provided by a 3D structure prediction. We integrate the evaluation of all types of protein structure prediction into a single framework, and compare the amount of information provided by various prediction methods, including secondary structure prediction. Within this broadened framework, the idea that structure is better preserved than sequence during evolution is evaluated quantitatively for the globin family. A nearly perfect sequence match in the globin family corresponds to about 300 bits of information, whereas a nearly perfect structural match for the same two proteins corresponds to about 2500 bits of information, where bits of information describes the probability of obtaining a match of similar closeness by chance. Mutual information provides both a theoretical basis for evaluating structure similarity and an explanatory surround for existing similarity measures.

16.
17.
Structural and functional annotation of the large and growing database of genomic sequences is a major problem in modern biology. Protein structure prediction by detecting remote homology to known structures is a well-established and successful annotation technique. However, the broad spectrum of evolutionary change that accompanies the divergence of close homologues to become remote homologues cannot easily be captured with a single algorithm. Recent advances to tackle this problem have involved the use of multiple predictive algorithms available on the Internet. Here we demonstrate how such ensembles of predictors can be designed in-house under controlled conditions and permit significant improvements in recognition by using a concept taken from protein loop energetics and applying it to the general problem of 3D clustering. We have developed a stringent test that simulates the situation where a protein sequence of interest is submitted to multiple different algorithms and not one of these algorithms can make a confident (95%) correct assignment. A method of meta-server prediction (Phyre) that exploits the benefits of a controlled environment for the component methods was implemented. At 95% precision or higher, Phyre identified 64.0% of all correct homologous query-template relationships, and 84.0% of the individual test query proteins could be accurately annotated. In comparison to the improvement that the single best fold recognition algorithm (according to training) has over PSI-Blast, this represents a 29.6% increase in the number of correct homologous query-template relationships, and a 46.2% increase in the number of accurately annotated queries. It has been well recognised in fold prediction, other bioinformatics applications, and in many other areas, that ensemble predictions generally are superior in accuracy to any of the component individual methods. 
However, there is a paucity of information as to why the ensemble methods are superior, and indeed this has never been systematically addressed in fold recognition. Here we show that the source of ensemble power stems from noise reduction in filtering out false positive matches. The results indicate greater coverage of sequence space and improved model quality, which can consequently lead to a reduction in the experimental workload of structural genomics initiatives.

18.
A pair of neural network-based algorithms is presented for predicting the tertiary structural class and the secondary structure of proteins. Each algorithm realizes improvements in accuracy based on information provided by the other. Structural class prediction of proteins nonhomologous to any in the training set is improved significantly, from 62.3% to 73.9%, and secondary structure prediction accuracy improves slightly, from 62.26% to 62.64%. A number of aspects of neural network optimization and testing are examined. They include network overtraining and an output filter based on a rolling average. Secondary structure prediction results vary greatly depending on the particular proteins chosen for the training and test sets; consequently, an appropriate measure of accuracy reflects the more unbiased approach of “jackknife” cross-validation (testing each protein in the database individually).
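An output filter based on a rolling average can be sketched as smoothing each per-residue output track before picking the highest-scoring state. The window size, the edge handling (the window is truncated at the sequence ends), and the toy scores are assumptions made for illustration; the abstract does not specify how the authors' filter was configured.

```python
def rolling_average(values, window=3):
    """Centered moving average; the window is truncated at both ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

def filtered_states(scores, window=3):
    """Smooth each state's score track, then pick the best state per residue."""
    smoothed = {s: rolling_average(track, window) for s, track in scores.items()}
    length = len(next(iter(scores.values())))
    return "".join(
        max(smoothed, key=lambda s: smoothed[s][i]) for i in range(length)
    )

# Hypothetical raw network outputs for five residues (helix H, coil C).
scores = {
    "H": [0.9, 0.8, 0.4, 0.8, 0.9],
    "C": [0.1, 0.2, 0.6, 0.2, 0.1],
}
print(filtered_states(scores))
```

In this toy input the unfiltered maximum at the middle residue is coil, which would break a helix with a single-residue gap; smoothing removes that isolated flip.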

19.
Multiprotein systems mediate most regulatory processes in living organisms. Although the structures of the individual proteins are often defined, less is known of the structures of multiprotein systems. Computational methods for predicting interfaces, using evolutionary conservation and/or physicochemical data, have been developed. Here we consider the use of solvent accessibility, residue propensity, and hydrophobicity, in conjunction with secondary structure data, as prediction parameters. We analyze the influence of residue type and secondary structure on solvent accessibility and define a measure of "relative exposedness." Clustering abnormally high scoring residues provides a basis for predicting interaction sites. The analysis is extended to investigate abnormally exposed secondary structure elements, particularly beta-sheet strands. We show that surface-exposed beta-strands lacking protective features are more likely to be found at protein-protein interfaces, allowing us to create an algorithm with approximately 68% and approximately 75% accuracy in differentiating between interacting and edge strands in isolated beta-strands and beta-sheet strands, respectively. These methods of identifying abnormally exposed surface regions are combined in an algorithm, which, on a data set of 77 unbound and disjoint (single chain extracted from complex) structures, predicts 79% of the protein-protein interfaces correctly. If enzyme-inhibitor complexes, where the inhibitor mimics a nonprotein substrate, are excluded, the accuracy increases to 85%.

20.
RNA duplex stability depends strongly on ionic conditions, and inside cells RNAs are exposed to both monovalent and multivalent ions. Despite recent advances, we do not have general methods to quantitatively account for the effects of monovalent and multivalent ions on RNA stability, and the thermodynamic parameters for secondary structure prediction have only been derived at 1 M [Na+]. Here, by mechanically unfolding and folding a 20 bp RNA hairpin using optical tweezers, we study the RNA thermodynamics and kinetics at different monovalent and mixed monovalent/Mg2+ salt conditions. We measure the unfolding and folding rupture forces and apply Kramers theory to extract accurate information about the hairpin free energy landscape under tension at a wide range of ionic conditions. We obtain non-specific corrections for the free energy of formation of the RNA hairpin and measure how the distance of the transition state to the folded state changes with force and ionic strength. We experimentally validate the Tightly Bound Ion model and obtain values for the persistence length of ssRNA. Finally, we test the approximate rule by which the non-specific binding affinity of divalent cations at a given concentration is equivalent to that of monovalent cations taken at 100-fold concentration for small molecular constructs.
