Similar articles
 Found 20 similar articles (search time: 46 ms)
1.
Free energy minimization has been the most popular method for RNA secondary structure prediction for decades. It is based on a set of empirical free energy change parameters derived from experiments using a nearest-neighbor model. In this study, a program, MaxExpect, that predicts RNA secondary structure by maximizing the expected base-pair accuracy, is reported. This approach was first pioneered in the program CONTRAfold, using pair probabilities predicted with a statistical learning method. Here, a partition function calculation that utilizes the free energy change nearest-neighbor parameters is used to predict base-pair probabilities as well as probabilities of nucleotides being single-stranded. MaxExpect predicts both the optimal structure (having highest expected pair accuracy) and suboptimal structures to serve as alternative hypotheses for the structure. Tested on a large database of different types of RNA, the maximum expected accuracy structures are, on average, of higher accuracy than minimum free energy structures. Accuracy is measured by sensitivity, the percentage of known base pairs correctly predicted, and positive predictive value (PPV), the percentage of predicted pairs that are in the known structure. By favoring double-strandedness or single-strandedness, a higher sensitivity or PPV of prediction can be favored, respectively. Using MaxExpect, the average PPV of optimal structure is improved from 66% to 68% at the same sensitivity level (73%) compared with free energy minimization.  相似文献   
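The two accuracy measures defined in this abstract can be illustrated with a minimal sketch. It assumes structures are given as collections of `(i, j)` index pairs and counts exact matches only; the function name and the toy data are hypothetical, and the scoring rules used in the study itself may differ.

```python
def pair_accuracy(known_pairs, predicted_pairs):
    """Return (sensitivity, ppv) for two collections of (i, j) base pairs.

    sensitivity = fraction of known pairs that were predicted
    ppv         = fraction of predicted pairs that are known
    """
    # Normalise each pair so that (i, j) and (j, i) compare equal.
    known = {tuple(sorted(p)) for p in known_pairs}
    predicted = {tuple(sorted(p)) for p in predicted_pairs}
    correct = len(known & predicted)
    sensitivity = correct / len(known) if known else 0.0
    ppv = correct / len(predicted) if predicted else 0.0
    return sensitivity, ppv


# Hypothetical toy structures, not data from the study.
known = [(1, 10), (2, 9), (3, 8), (4, 7)]
predicted = [(1, 10), (2, 9), (3, 8), (5, 6), (11, 20)]
sens, ppv = pair_accuracy(known, predicted)
```

Predicting more pairs tends to raise sensitivity at the cost of PPV, which is the trade-off the abstract describes when favoring double-strandedness or single-strandedness.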

2.
A partition function calculation for RNA secondary structure is presented that uses a current set of nearest neighbor parameters for conformational free energy at 37 °C, including coaxial stacking. For a diverse database of RNA sequences, base pairs in the predicted minimum free energy structure that are predicted by the partition function to have high base pairing probability have a significantly higher positive predictive value for known base pairs. For example, the average positive predictive value, 65.8%, is increased to 91.0% when only base pairs with probability of 0.99 or above are considered. The quality of base pair predictions can also be increased by the addition of experimentally determined constraints, including enzymatic cleavage, flavin mononucleotide cleavage, and chemical modification. Predicted secondary structures can be color annotated to demonstrate pairs with high probability that are therefore well determined as compared to base pairs with lower probability of pairing.
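The filtering step described here, keeping only minimum free energy pairs whose partition-function probability reaches a threshold, might be sketched as follows. The function name, the dictionary layout, and the probabilities are assumptions for illustration; in practice the probabilities would come from a partition function program.

```python
def high_confidence_pairs(mfe_pairs, pair_probability, threshold=0.99):
    """Keep the MFE base pairs whose pairing probability is >= threshold.

    mfe_pairs        -- iterable of (i, j) pairs from the lowest free energy structure
    pair_probability -- dict mapping (i, j) to a probability in [0, 1]
    """
    # Pairs missing from the probability table are treated as probability 0.
    return [p for p in mfe_pairs if pair_probability.get(p, 0.0) >= threshold]


# Hypothetical values, not taken from the paper.
mfe = [(1, 20), (2, 19), (3, 18), (7, 14)]
probs = {(1, 20): 0.995, (2, 19): 0.99, (3, 18): 0.80}
confident = high_confidence_pairs(mfe, probs)
```

Raising the threshold trades coverage for confidence: fewer pairs are retained, but those that remain are more likely to be correct.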

3.
A crucial step in the determination of the three-dimensional native structures of RNA is the prediction of their secondary structures, which are stable independent of the tertiary fold. Accurate prediction of the secondary structure requires context-dependent estimates of the interaction parameters. We have exploited the growing database of natively folded RNA structures in the Protein Data Bank (PDB) to obtain stacking interaction parameters using a knowledge-based approach. Remarkably, the calculated values of the resulting statistical potentials (SPs) are in excellent agreement with the parameters determined using measurements in small oligonucleotides. We validate the SPs by predicting 74% of the base-pairs in a dataset of structures using the ViennaRNA package. Interestingly, this number is similar to that obtained using the measured thermodynamic parameters. We also tested the efficacy of the SP in predicting secondary structure by using gapless threading, which we advocate as an alternative method for rapidly predicting RNA structures. For RNA molecules with less than 700 nucleotides, about 70% of the native base-pairs are correctly predicted. As a further validation of the SPs we calculated Z-scores, which measure the relative stability of the native state with respect to a manifold of higher free energy states. The computed Z-scores agree with estimates made using calorimetric measurements for a few RNA molecules. Structural analysis was used to rationalize the success and failures of SP and experimentally determined parameters. First, from the near perfect linear relationship between the number of native base-pairs and sequence length, we show that nearly 46% of nucleotides are not in stacks. Second, by analyzing the suboptimal structures that are generated in gapless threading we show that the SPs and experimentally determined parameters are most successful in predicting stacks that end in hairpins. 
These results show that further improvement in secondary structure prediction requires reliable estimates of interaction parameters for loops, bulges, and stacks that do not end in hairpins.

4.
Prediction of RNA secondary structure by free energy minimization has been the standard for over two decades. Here we describe a novel method that forsakes this paradigm for predictions based on Boltzmann-weighted structure ensemble. We introduce the notion of a centroid structure as a representative for a set of structures and describe a procedure for its identification. In comparison with the minimum free energy (MFE) structure using diverse types of structural RNAs, the centroid of the ensemble makes 30.0% fewer prediction errors as measured by the positive predictive value (PPV) with marginally improved sensitivity. The Boltzmann ensemble can be separated into a small number (3.2 on average) of clusters. Among the centroids of these clusters, the "best cluster centroid" as determined by comparison to the known structure simultaneously improves PPV by 46.5% and sensitivity by 21.7%. For 58% of the studied sequences for which the MFE structure is outside the cluster containing the best centroid, the improvements by the best centroid are 62.5% for PPV and 31.4% for sensitivity. These results suggest that the energy well containing the MFE structure under the current incomplete energy model is often different from the one for the unavailable complete model that presumably contains the unique native structure. Centroids are available on the Sfold server at http://sfold.wadsworth.org.  相似文献   
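A minimal sketch of identifying a centroid from a sample of structures is shown below. It assumes the distance between two structures is the number of base pairs present in one but not the other; under that assumption, the structure with the smallest total distance to the sample consists of the pairs that occur in more than half of the sampled structures. The function name and sample are hypothetical, and this is not the Sfold implementation.

```python
from collections import Counter


def centroid_structure(sampled_structures):
    """Return the set of base pairs found in more than half of the samples.

    sampled_structures -- list of structures, each an iterable of (i, j) pairs
    """
    counts = Counter()
    for structure in sampled_structures:
        # set() guards against a pair being listed twice in one structure.
        counts.update(set(structure))
    half = len(sampled_structures) / 2
    return {pair for pair, count in counts.items() if count > half}


# Hypothetical sample of three structures.
sample = [
    [(1, 10), (2, 9)],
    [(1, 10), (3, 8)],
    [(1, 10), (2, 9), (4, 7)],
]
centroid = centroid_structure(sample)
```

Because a nucleotide cannot take part in two different pairs that each occur in more than half of the samples, the result is always a valid structure. Applying the same function to each cluster of the sample would give per-cluster centroids.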

5.
A complete set of nearest neighbor parameters to predict the enthalpy change of RNA secondary structure formation was derived. These parameters can be used with available free energy nearest neighbor parameters to extend the secondary structure prediction of RNA sequences to temperatures other than 37°C. The parameters were tested by predicting the secondary structures of sequences with known secondary structure that are from organisms with known optimal growth temperatures. Compared with the previous set of enthalpy nearest neighbor parameters, the sensitivity of base pair prediction improved from 65.2 to 68.9% at optimal growth temperatures ranging from 10 to 60°C. Base pair probabilities were predicted with a partition function and the positive predictive value of structure prediction is 90.4% when considering the base pairs in the lowest free energy structure with pairing probability of 0.99 or above. Moreover, a strong correlation is found between the predicted melting temperatures of RNA sequences and the optimal growth temperatures of the host organism. This indicates that organisms that live at higher temperatures have evolved RNA sequences with higher melting temperatures.  相似文献   
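How an enthalpy parameter extends a 37 °C free energy parameter to another temperature can be sketched as follows. The sketch assumes that ΔH° and ΔS° are independent of temperature, so that ΔS° = (ΔH° − ΔG°37)/310.15 K and ΔG°(T) = ΔH° − TΔS°. The function name and numeric values are hypothetical, not parameters from the paper.

```python
T37_KELVIN = 310.15  # 37 degrees C expressed in kelvin


def free_energy_at(dg37, dh, temp_celsius):
    """Extrapolate a free energy change (kcal/mol) from 37 C to temp_celsius.

    dg37 -- free energy change at 37 C, kcal/mol
    dh   -- enthalpy change, kcal/mol (assumed temperature independent)
    """
    # Entropy change in kcal/(mol K), derived from the 37 C values.
    ds = (dh - dg37) / T37_KELVIN
    temp_kelvin = temp_celsius + 273.15
    return dh - temp_kelvin * ds


# Hypothetical stacking parameter: dG37 = -2.0 kcal/mol, dH = -10.0 kcal/mol.
dg_at_37 = free_energy_at(-2.0, -10.0, 37.0)
dg_at_60 = free_energy_at(-2.0, -10.0, 60.0)
```

With a negative enthalpy and entropy change, the motif becomes less stable (ΔG° less negative) as the temperature rises, which is why folding at an organism's growth temperature can give a different structure than folding at 37 °C.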

6.

Background

A detailed understanding of an RNA's correct secondary and tertiary structure is crucial to understanding its function and mechanism in the cell. Free energy minimization with energy parameters based on the nearest-neighbor model and comparative analysis are the primary methods for predicting an RNA's secondary structure from its sequence. Version 3.1 of Mfold has been available since 1999. This version contains an expanded sequence dependence of energy parameters and the ability to incorporate coaxial stacking into free energy calculations. We test Mfold 3.1 by performing the largest and most phylogenetically diverse comparison of rRNA and tRNA structures predicted by comparative analysis and Mfold, and we use the results of our tests on 16S and 23S rRNA sequences to assess the improvement between Mfold 2.3 and Mfold 3.1.

Results

The average prediction accuracy for a 16S or 23S rRNA sequence with Mfold 3.1 is 41%, while the prediction accuracies for the majority of 16S and 23S rRNA structures tested are between 20% and 60%, with some having less than 20% prediction accuracy. The average prediction accuracy was 71% for 5S rRNA and 69% for tRNA. The majority of the 5S rRNA and tRNA sequences have prediction accuracies greater than 60%. The prediction accuracy of 16S rRNA base-pairs decreases exponentially as the number of nucleotides intervening between the 5' and 3' halves of the base-pair increases.

Conclusion

Our analysis indicates that the current set of nearest-neighbor energy parameters, in conjunction with the Mfold folding algorithm, is unable to consistently and reliably predict an RNA's correct secondary structure. For 16S or 23S rRNA structure prediction, Mfold 3.1 offers little improvement over Mfold 2.3. However, the nearest-neighbor energy parameters do work well for shorter RNA sequences such as tRNA or 5S rRNA, or for larger rRNAs when the contact distance between the base-pairs is less than 100 nucleotides.

7.
RNA junctions are important structural elements that form when three or more helices come together in space in the tertiary structures of RNA molecules. Determining their structural configuration is important for predicting RNA 3D structure. We introduce a computational method to predict, at the secondary structure level, the coaxial helical stacking arrangement in junctions, as well as classify the junction topology. Our method uses a data mining approach known as random forests, which relies on a set of decision trees trained using length, sequence and other variables specified for any given junction. The resulting protocol predicts coaxial stacking within three- and four-way junctions with an accuracy of 81% and 77%, respectively; the accuracy increases to 83% and 87%, respectively, when knowledge from the junction family type is included. Coaxial stacking predictions for the five- to ten-way junctions are less accurate (60%) due to sparse data available for training. Additionally, our application predicts the junction family with an accuracy of 85% for three-way junctions and 74% for four-way junctions. Comparisons with other methods, as well as applications to unsolved RNAs, are also presented. The web server Junction-Explorer to predict junction topologies is freely available at: http://bioinformatics.njit.edu/junction.

8.
Thermodynamic parameters of coaxial stacking at complementary helix-helix interfaces GX*pYG/CZVC (X,Y=A,C,T,G; * = nick) created by contiguous oligonucleotide hybridization were determined. The data obtained were compared to the thermodynamic parameters of coaxial stacking at the interfaces CX*pYC/GZVG. Multiple linear regression analysis has revealed that the free-energy increments of interaction for the contacts GX*pYG/CZVC and CX*pYC/GZVG can be described by a set of uniform ΔG°(X*pY/ZV) values. The difference in the observed free-energy of the coaxial stacking between the two sets is defined by the contribution from the factors reflecting structural differences between compared DNA duplexes.

9.
Contiguous stacking hybridization of oligodeoxyribonucleotides with a stem of preformed minihairpin structure of a DNA template was studied using the UV-melting technique. It was shown that the free energy of the coaxial stacking interaction (ΔG°ST at 37°C, 1 M NaCl, pH 7.4) at the complementary interface XA*pTY/ZATV (an asterisk stands for a nick) strongly depends on the type of nearest neighbor bases X and Y flanking the nicked dinucleotide step. The maximum efficiency of the coaxial stacking was observed for the PuA*pTPy/PuATPy interface, whereas the minimum efficiency was obtained for the PyA*pTPu/PyATPu interface. A 5′-phosphate residue in the nick enhances the coaxial stacking. Depending on the duplex structure, the observed efficiency of A*T/AT coaxial stacking varied from −0.97 kcal/mol for the unphosphorylated TA*TA/TATA interface to a three-fold higher value (−2.78 kcal/mol) for the GA*pTT/AATC interface.

10.
Parallel thermodynamic analysis of the coaxial stacking effect of two bases localized in one strand of DNA duplexes has been performed. Oligonucleotides were immobilized in an array of three-dimensional polyacrylamide gel pads of microchips (MAGIChips). The stacking effect was studied for all combinations of two bases and assessed by measuring the increase in melting temperature and in the free energy of duplexes formed by 5mers stacked to microchip-immobilized 10mers. For any given interface, the effect was studied for perfectly paired bases, as well as terminal mismatches, single base overlaps, single and double gaps, and modified terminal bases. Thermodynamic parameters of contiguous stacking determined by using microchips closely correlated with data obtained in solution. The extension of immobilized oligonucleotides with 5,6-dihydroxyuridine, a urea derivative of deoxyribose, or by phosphate, decreased the stacking effect moderately, while extension with FITC or Texas Red virtually eliminated stacking. The extension of the immobilized oligonucleotides with either acridine or 5-nitroindole increased stacking to mispaired bases and in some GC-rich interfaces. The measurements of stacking parameters were performed in different melting buffers. Although melting temperatures of AT- and GC-rich oligonucleotides in 5 M tetramethylammonium chloride were equalized, the energy of stacking interaction was significantly diminished.

11.
This work investigates whether mRNA has a lower estimated folding free energy than random sequences. The free energy estimates are calculated by the mfold program for prediction of RNA secondary structures. For a set of 46 mRNAs it is shown that the predicted free energy is not significantly different from random sequences with the same dinucleotide distribution. For random sequences with the same mononucleotide distribution it has previously been shown that the native mRNA sequences have a lower predicted free energy, which indicates a more stable structure than random sequences. However, dinucleotide content is important when assessing the significance of predicted free energy as the physical stability of RNA secondary structure is known to depend on dinucleotide base stacking energies. Even known RNA secondary structures, like tRNAs, can be shown to have predicted free energies indistinguishable from randomized sequences. This suggests that the predicted free energy is not always a good determinant for RNA folding.  相似文献   

12.
Classical atom-scale molecular dynamics simulations, constrained free energy calculations, and quantum mechanical (QM) calculations are employed to study the diffusive translocation of ciprofloxacin (CPFX) across lipid membranes. CPFX is considered here as a representative of the fluoroquinolone antibiotics class. Neutral and zwitterionic CPFX coexist at physiological pH, with the latter being predominant. Simulations reveal that only the neutral form permeates the bilayer, and it does so through a novel mechanism that involves dissolution of concerted stacks of zwitterionic ciprofloxacins. Subsequent QM analysis of the observed molecular stacking shows the important role of partial charge neutralization in the stacks, highlighting how the zwitterionic form of the drug is neutralized for translocation. The findings suggest a translocation mechanism in which zwitterionic CPFX molecules approach the membrane in stacks, but they diffuse through the membrane as neutral CPFX monomers due to intermolecular transfer of protons favored by partial solvation loss. The mechanism is expected to be of importance in the permeation and translocation of a variety of ampholytic drugs with stacking tendencies.

13.
The accurate prediction of the secondary and tertiary structure of an RNA with different folding algorithms is dependent on several factors, including the energy functions. However, an RNA higher-order structure cannot be predicted accurately from its sequence based on a limited set of energy parameters. The inter- and intramolecular forces between this RNA and other small molecules and macromolecules, in addition to other factors in the cell such as pH, ionic strength, and temperature, influence the complex dynamics associated with transition of a single stranded RNA to its secondary and tertiary structure. Since all of the factors that affect the formation of an RNA's 3D structure cannot be determined experimentally, statistically derived potential energy has been used in the prediction of protein structure. In the current work, we evaluate the statistical free energy of various secondary structure motifs, including base-pair stacks, hairpin loops, and internal loops, using their statistical frequency obtained from the comparative analysis of more than 50,000 RNA sequences stored in the RNA Comparative Analysis Database (rCAD) at the Comparative RNA Web (CRW) Site. Statistical energy was computed from the structural statistics for several datasets. While the statistical energy for a base-pair stack correlates with experimentally derived free energy values, suggesting a Boltzmann-like distribution, variation is observed between different molecules and their location on the phylogenetic tree of life. Our statistical energy values calculated for several structural elements were utilized in the Mfold RNA-folding algorithm. The combined statistical energy values for base-pair stacks, hairpins and internal loop flanks result in a significant improvement in the accuracy of secondary structure prediction; the hairpin flanks contribute the most.
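One common way to turn motif frequencies into a statistical energy is the inverse Boltzmann relation, E = −RT ln(observed / expected). The sketch below assumes that form. The abstract does not give the exact formula or reference state used, so the function name, the reference counts, and the temperature are all assumptions for illustration.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol K)


def statistical_energy(observed_count, expected_count, temp_kelvin=310.15):
    """Inverse-Boltzmann energy (kcal/mol) of a motif from its frequency.

    observed_count -- times the motif occurs in the structure database
    expected_count -- times it would occur under the chosen reference state
    """
    if observed_count <= 0 or expected_count <= 0:
        raise ValueError("counts must be positive")
    # Motifs seen more often than expected receive a negative (favourable) energy.
    return -R_KCAL * temp_kelvin * math.log(observed_count / expected_count)


# Hypothetical counts for a base-pair stack seen twice as often as expected.
energy = statistical_energy(200, 100)
```

The choice of reference state largely determines the resulting values, which is one reason statistical energies can vary between datasets.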

14.
Monocytes and neutrophils play key roles in the cytokine storm triggered by SARS-CoV-2 infection, which changes their conformation and function. These changes are detectable at the cellular and molecular level and may be different from what is observed in other respiratory infections. Here, we applied machine learning (ML) to develop and validate an algorithm to diagnose COVID-19 using blood parameters. In this retrospective single-center study, 49 hemogram parameters from 12,321 patients with clinical suspicion of COVID-19 and tested by RT-PCR (4239 positive and 8082 negative) were analysed. The dataset was randomly divided into training and validation sets. Blood cell parameters and patient age were used to construct the predictive model with the support vector machine (SVM) tool. The model constructed from the training set (5936 patients) achieved an accuracy for diagnosis of SARS-CoV-2 infection of 0.952 (95% CI: 0.875–0.892). Test sensitivity and specificity were 0.868 and 0.899, respectively, with a positive (PPV) and negative (NPV) predictive value of 0.896 and 0.872, respectively (prevalence 0.50). The validation set model (4964 patients) achieved an accuracy of 0.894 (95% CI: 0.883–0.903). Test sensitivity and specificity were 0.8922 and 0.8951, respectively, with a positive (PPV) and negative (NPV) predictive value of 0.817 and 0.94, respectively (prevalence 0.34). The area under the receiver operating characteristic curve was 0.952 for the algorithm performance. This algorithm may allow a COVID-19 diagnosis to be ruled out with 94% probability. This represents a great advance for early diagnostic orientation and guiding clinical decisions.
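The relationship between sensitivity, specificity, prevalence, and the predictive values quoted in this abstract follows from Bayes' rule and can be checked with a short sketch. The function name is hypothetical; the inputs in the example are the training-set figures reported above.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (ppv, npv) from test sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Training-set figures from the abstract: sensitivity 0.868,
# specificity 0.899, prevalence 0.50.
ppv, npv = predictive_values(0.868, 0.899, 0.50)
```

Rounded to three decimals, these inputs give a PPV of 0.896 and an NPV of 0.872, matching the reported training-set values. Because both predictive values depend on prevalence, the same test yields different PPV and NPV in a population with a different infection rate.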

15.
Three-way DNA junctions can adopt several different conformers, which differ in the coaxial stacking of the arms. These structural variants are often dominated by one conformer, which is determined by the DNA sequence. In this study we have compared several three-way DNA junctions in order to assess how the arrangement of bases around the branch point affects the conformer distribution. The results show that rearranging the different arms, while retaining their base sequences, can affect the conformer distribution. In some instances this generates a structure that appears to contain parallel coaxially stacked helices rather than the usual anti-parallel arrangement. Although the conformer equilibrium can be affected by the order of purines and pyrimidines around the branch point, this is not sufficient to predict the conformer distribution. We find that the folding of three-way junctions can be separated into two groups of dinucleotide steps. These two groups show distinctive stacking properties in B-DNA, suggesting there is a correlation between B-DNA stacking and coaxial stacking in DNA junctions.  相似文献   

16.
Zhu J, Zhu Q, Shi Y, Liu H. Proteins, 2003, 52(4): 598-608
One strategy for ab initio protein structure prediction is to generate a large number of possible structures (decoys) and select the most fitting ones based on a scoring or free energy function. The conformational space of a protein is huge, and chances are rare that any heuristically generated structure will directly fall in the neighborhood of the native structure. It is desirable that, instead of being thrown away, the unfitting decoy structures can provide insights into native structures so prediction can be made progressively. First, we demonstrate that a recently parameterized physics-based effective free energy function based on the GROMOS96 force field and a generalized Born/surface area solvent model is, as several other physics-based and knowledge-based models, capable of distinguishing native structures from decoy structures for a number of widely used decoy databases. Second, we observe a substantial increase in correlations of the effective free energies with the degree of similarity between the decoys and the native structure, if the similarity is measured by the content of native inter-residue contacts in a decoy structure rather than its root-mean-square deviation from the native structure. Finally, we investigate the possibility of predicting native contacts based on the frequency of occurrence of contacts in decoy structures. For most proteins contained in the decoy databases, a meaningful amount of native contacts can be predicted based on plain frequencies of occurrence at a relatively high level of accuracy. Relative to using plain frequencies, overwhelming improvements in sensitivity of the predictions are observed for the 4_state_reduced decoy sets by applying energy-dependent weighting of decoy structures in determining the frequency. There, approximately 80% native contacts can be predicted at an accuracy of approximately 80% using energy-weighted frequencies. The sensitivity of the plain frequency approach is much lower (20% to 40%). 
Such improvements are, however, not observed for the other decoy databases. The rationalization and implications of the results are discussed.

17.
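RNA pseudoknot prediction is a difficult problem in RNA research. This paper proposes an RNA pseudoknot prediction method based on stacking covariation information and minimum free energy. The method is tested on aligned RNA sequences with known structures (ClustalW alignments and structural alignments). It focuses on the stacking covariation information that arises from interactions between adjacent base pairs and combines it with the minimum free energy method to give each base pair a composite score; an RNA secondary structure containing pseudoknots is then obtained by stepwise iteration. The results show that the method predicts pseudoknots correctly, that its average sensitivity and specificity are better than those of the reference algorithms, and that its prediction performance is more stable on structural alignments than on ClustalW alignments. The paper also discusses how different weighting factors for the covariation information affect prediction performance and finds that performance is best when the ratio of the weighting factors is l1:l2 = 5:1.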

18.
We develop a statistical mechanical model for RNA/RNA complexes with both intramolecular and intermolecular interactions. As an application of the model, we compute the free energy landscapes, which give the full distribution for all the possible conformations, for U4/U6 and U2/U6 in major spliceosome and U4atac/U6atac and U12/U6atac in minor spliceosome. Different snRNA experiments found contrasting structures, our free energy landscape theory shows why these structures emerge and how they compete with each other. For yeast U2/U6, the model predicts that the two distinct experimental structures, the four-helix junction structure and the helix Ib-containing structure, can actually coexist and specifically compete with each other. In addition, the energy landscapes suggest possible mechanisms for the conformational switches in splicing. For instance, our calculation shows that coaxial stacking is essential for stabilizing the four-helix junction in yeast U2/U6. Therefore, inhibition of the coaxial stacking possibly by protein-binding may activate the conformational switch from the four-helix junction to the helix Ib-containing structure. Moreover, the change of the energy landscape shape gives information about the conformational changes. We find multiple (native-like and misfolded) intermediates formed through base-pairing rearrangements in snRNA complexes. For example, the unfolding of the U2/U6 undergoes a transition to a misfolded state which is functional, while in the unfolding of U12/U6atac, the functional helix Ib is found to be the last one to unfold and is thus the most stable structural component. Furthermore, the energy landscape gives the stabilities of all the possible (functional) intermediates and such information is directly related to splicing efficiency.  相似文献   

19.
Owing to their structural diversity, RNAs perform many diverse biological functions in the cell. RNA secondary structure is thus important for predicting RNA function. Here, we propose a new combinatorial optimization algorithm, named RGRNA, to improve the accuracy of predicting RNA secondary structure. Following the establishment of a stempool, the stems are sorted by length, and chosen from largest to smallest. If the stem selected is the true stem, the secondary structure of this stem when combined with another stem selected at random will have low free energy, and the free energy will tend to gradually diminish. The free energy is considered as a parameter and the structure is converted into binary numbers to determine stem compatibility, for step-by-step prediction of the secondary structure for all combinations of stems. The RNA secondary structure can be predicted by the RGRNA method. Our experimental results show that the proposed algorithm outperforms RNAfold in terms of sensitivity, specificity, and Matthews correlation coefficient value.  相似文献   

20.
An improved dynamic programming algorithm is reported for RNA secondary structure prediction by free energy minimization. Thermodynamic parameters for the stabilities of secondary structure motifs are revised to include expanded sequence dependence as revealed by recent experiments. Additional algorithmic improvements include reduced search time and storage for multibranch loop free energies and improved imposition of folding constraints. An extended database of 151,503 nt in 955 structures determined by comparative sequence analysis was assembled to allow optimization of parameters not based on experiments and to test the accuracy of the algorithm. On average, the predicted lowest free energy structure contains 73% of known base-pairs when domains of fewer than 700 nt are folded; this compares with 64% accuracy for previous versions of the algorithm and parameters. For a given sequence, a set of 750 generated structures contains one structure that, on average, has 86% of known base-pairs. Experimental constraints, derived from enzymatic and flavin mononucleotide cleavage, improve the accuracy of structure predictions.
