Similar literature
 20 similar articles found (search time: 15 ms)
1.
An improved dynamic programming algorithm is reported for RNA secondary structure prediction by free energy minimization. Thermodynamic parameters for the stabilities of secondary structure motifs are revised to include expanded sequence dependence as revealed by recent experiments. Additional algorithmic improvements include reduced search time and storage for multibranch loop free energies and improved imposition of folding constraints. An extended database of 151,503 nt in 955 structures determined by comparative sequence analysis was assembled to allow optimization of parameters not based on experiments and to test the accuracy of the algorithm. On average, the predicted lowest free energy structure contains 73% of known base-pairs when domains of fewer than 700 nt are folded; this compares with 64% accuracy for previous versions of the algorithm and parameters. For a given sequence, a set of 750 generated structures contains one structure that, on average, has 86% of known base-pairs. Experimental constraints, derived from enzymatic and flavin mononucleotide cleavage, improve the accuracy of structure predictions.
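
As a rough illustration of the dynamic programming idea behind the abstract above, the sketch below maximizes the number of nested base pairs (a Nussinov-style recursion). This is a deliberately simplified stand-in and not the thermodynamic model the abstract describes: it uses no free energy parameters, and the function name `max_base_pairs` and the `min_loop` parameter are assumptions made for this example.

```python
def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in seq (Nussinov-style DP).

    Simplified stand-in for free energy minimization: every allowed pair
    scores 1 and no thermodynamic parameters are used.
    """
    allowed = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] holds the optimum for the subsequence seq[i..j] (inclusive).
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: position j stays unpaired
            # case 2: j pairs with some k, leaving at least min_loop bases between
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in allowed:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


print(max_base_pairs("GGGAAAUCC"))
```

The triple loop gives cubic time and quadratic storage in the sequence length, which is the general cost class of the classical single-sequence folding recursions.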

2.
Prediction of common folding structures of homologous RNAs.   Cited by: 2 (self-citations: 2, other citations: 0)
K Han, H J Kim. Nucleic Acids Research, 1993, 21(5): 1251-1257
We have developed an algorithm and a computer program for simultaneously folding homologous RNA sequences. Given an alignment of M homologous sequences of length N, the program performs phylogenetic comparative analysis and predicts a common secondary structure conserved in the sequences. When the structure is not uniquely determined, it infers multiple structures which appear most plausible. This method is superior to energy minimization methods in the sense that it is not sensitive to point mutation of a sequence. It is also superior to usual phylogenetic comparative methods in that it does not require manual scrutiny for covariation or secondary structures. The most plausible 1-5 structures are produced in O(MN² + N³) time and O(N²) space, which are the same requirements as those of widely used dynamic programs based on energy minimization for folding a single sequence. This is the first algorithm probably practical both in terms of time and space for finding secondary structures of homologous RNA sequences. The algorithm has been implemented in C on a Sun SparcStation, and has been verified by testing on tRNAs, 5S rRNAs, 16S rRNAs, TAR RNAs of human immunodeficiency virus type 1 (HIV-1), and RRE RNAs of HIV-1. We have also applied the program to cis-acting packaging sequences of HIV-1, for which no generally accepted structures yet exist, and propose potentially stable structures. Simulation of the program with random sequences with the same base composition and the same degree of similarity as the above sequences shows that structures common to homologous sequences are very unlikely to occur by chance in random sequences.

3.
The secondary structures of the Tetrahymena thermophila rRNA IVS sequence involved in the self-splicing reactions are investigated theoretically with a refined, previously proposed computer method that selects a set of the lowest free energy RNA secondary structures under constraints from model hypotheses and experimental evidence. The secondary structures obtained are characterized by the close proximity of the self-reaction sites and account for double-mutation experiments and differential digestion data.

5.
A theoretical and computational approach to ab initio structure prediction for polypeptides in water is described and applied to selected amino acid sequences for testing and preliminary validation. The method builds systematically on the extensive efforts applied to parameterization of molecular dynamics (MD) force fields, employs an empirically well-validated continuum dielectric model for solvation, and an eminently parallelizable approach to conformational search. The effective free energy of polypeptide chains is estimated from AMBER united atom potential functions, with internal degrees of freedom for both backbone and amino acid side chains explicitly treated. The hydration free energy of each structure is determined using the Generalized Born/Solvent Accessibility (GBSA) method, modified and reparameterized to include atom types consistent with the AMBER force field. The conformational search procedure employs a multiple copy, Monte Carlo simulated annealing (MCSA) protocol in full torsion angle space, applied iteratively on sets of structures of progressively lower free energy until a prediction of a structure with lowest effective free energy is obtained. Calibration tests for the effective energy function and search algorithm are performed on the alanine dipeptide, selected protein crystal structures, and united atom decoys on barnase, crambin, and six examples from the Rosetta set. Specific demonstration cases of the method are provided for the 8-mer sequence of Ala residues, a 12-residue peptide with longer side chains QLLKKLLQQLKQ, a de novo designed 16 residue peptide of sequence (AAQAA)3Y, a 15-residue sequence with a beta sheet motif, GEWTWDATKTFTVTE, and a 36 residue small protein, Villin headpiece. The Ala 8-mer readily formed an alpha-helix. An alpha-helix structure was predicted for the 16-mer, consistent with observed results from IR and CD spectroscopy and with the pattern in ψ/φ angles of known protein structures. The predicted structure for the 12-mer, composed of a mix of helix and less regular elements of secondary structure, lies 2.65 Å RMS from the observed crystal structure. Structure prediction for the 8-mer beta-motif resulted in a form 4.50 Å RMS from the crystal geometry. For Villin, the predicted native form is very close to the crystal structure, with RMS values of 3.5 Å (including sidechains) and 1.01 Å (main chain only). The methodology permits a detailed analysis of the molecular forces which dominate various segments of the predicted folding trajectory. Analysis of the results in terms of internal torsional, electrostatic, and van der Waals energies and the electrostatic and non-electrostatic contributions to hydration, including the hydrophobic effect, is presented.

6.
Recent advances in modeling protein structures at the atomic level have made it possible to tackle "de novo" computational protein design. Most procedures are based on combinatorial optimization using a scoring function that estimates the folding free energy of a protein sequence on a given main-chain structure. However, the computation of the conformational entropy in the folded state is generally an intractable problem, and its contribution to the free energy is not properly evaluated. In this article, we propose a new automated protein design methodology that incorporates such conformational entropy based on statistical mechanics principles. We define the free energy of a protein sequence by the corresponding partition function over rotamer states. The free energy is written in variational form in a pairwise approximation and minimized using the Belief Propagation algorithm. In this way, a free energy is associated with each amino acid sequence: we use this insight to rescore the results obtained with a standard minimization method, with the energy as the cost function. Then, we set up a design method that directly uses the free energy as a cost function in combination with a stochastic search in the sequence space. We validate the methods on the design of three superficial sites of a small SH3 domain, and then apply them to the complete redesign of 27 proteins. Our results indicate that accounting for entropic contribution in the score function affects the outcome in a highly nontrivial way, and might improve current computational design techniques based on protein stability.
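
The abstract above defines a free energy through a partition function over rotamer states. As a minimal sketch of that relationship only, the code below evaluates F = -kT ln Z with Z = Σ exp(-E_i / kT) by direct enumeration over a short list of state energies. It does not reproduce the variational pairwise approximation or the Belief Propagation step the abstract describes; the function name `free_energy`, the default `kT` (roughly 0.593 kcal/mol near room temperature), and the energies in the usage line are assumptions for illustration.

```python
import math


def free_energy(energies, kT=0.593):
    """Return F = -kT * ln(sum_i exp(-E_i / kT)) for a list of state energies.

    Energies and kT must share the same unit. The minimum energy is factored
    out before exponentiating (log-sum-exp) to avoid overflow.
    """
    if not energies:
        raise ValueError("at least one state is required")
    e_min = min(energies)
    z_shifted = sum(math.exp(-(e - e_min) / kT) for e in energies)
    return e_min - kT * math.log(z_shifted)


# Hypothetical rotamer energies for one site, in kcal/mol.
print(free_energy([0.0, 0.4, 1.2]))
```

Because every extra accessible state adds to Z, the free energy is never higher than the lowest single-state energy, which is the entropic contribution that an energy-only score omits.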

7.
With the rapid increase in the size of the genome sequence database, computational analysis of RNA will become increasingly important in revealing structure-function relationships and potential drug targets. RNA secondary structure prediction for a single sequence is 73% accurate on average for a large database of known secondary structures. This level of accuracy provides a good starting point for determining a secondary structure either by comparative sequence analysis or by the interpretation of experimental studies. Dynalign is a new computer algorithm that improves the accuracy of structure prediction by combining free energy minimization and comparative sequence analysis to find a low free energy structure common to two sequences without requiring any sequence identity. It uses a dynamic programming construct suggested by Sankoff. Dynalign, however, restricts the maximum distance, M, allowed between aligned nucleotides in the two sequences. This makes the calculation tractable because the complexity is simplified to O(M³N³), where N is the length of the shorter sequence. The accuracy of Dynalign was tested with sets of 13 tRNAs, seven 5S rRNAs, and two R2 3' UTR sequences. On average, Dynalign predicted 86.1% of known base-pairs in the tRNAs, as compared to 59.7% for free energy minimization alone. For the 5S rRNAs, the average accuracy improves from 47.8% to 86.4%. The secondary structure of the R2 3' UTR from Drosophila takahashii is poorly predicted by standard free energy minimization. With Dynalign, however, the structure predicted in tandem with the sequence from Drosophila melanogaster nearly matches the structure determined by comparative sequence analysis.

8.

Background  

Scanning large genomes with a sliding window in search of locally stable RNA structures is a well motivated problem in bioinformatics. Given a predefined window size L and an RNA sequence S of size N (L < N), the consecutive windows folding problem is to compute the minimal free energy (MFE) for the folding of each of the L-sized substrings of S. The consecutive windows folding problem can be naively solved in O(NL³) by applying any of the classical cubic-time RNA folding algorithms to each of the N-L windows of size L. Recently an O(NL²) solution for this problem has been described.
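
The naive strategy described above, scoring every window independently, can be sketched as follows. The scoring callable is a placeholder: a real scan would pass a cubic-time folding routine, whereas the `gc_count` function used here is a trivial stand-in chosen only to keep the example self-contained. The names `scan_windows` and `gc_count` are assumptions. Counting both end positions, a sequence of length N yields N - L + 1 windows of length L.

```python
from typing import Callable, List


def scan_windows(seq: str, window: int,
                 score: Callable[[str], float]) -> List[float]:
    """Apply score() independently to every substring of length window."""
    if window <= 0 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    return [score(seq[i:i + window])
            for i in range(len(seq) - window + 1)]


def gc_count(sub: str) -> float:
    """Trivial stand-in score; a real scan would compute a folding energy."""
    return float(sub.count("G") + sub.count("C"))


print(scan_windows("GGCAAU", 3, gc_count))
```

With a cubic-time scorer the total cost is the number of windows times L³, which is the naive bound quoted in the abstract; the faster solution it mentions would have to reuse work between overlapping windows, which this sketch does not attempt.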

9.
We have calculated the stability of decoy structures of several proteins (from the CASP3 models and the Park and Levitt decoy set) relative to the native structures. The calculations were performed with the force field-consistent ES/IS method, in which an implicit solvent (IS) model is used to calculate the average solvation free energy for snapshots from explicit simulations (ESs). The conformational free energy is obtained by adding the internal energy of the solute from the ESs and an entropic term estimated from the covariance positional fluctuation matrix. The set of atomic Born radii and the cavity-surface free energy coefficient used in the implicit model has been optimized to be consistent with the all-atom force field used in the ESs (cedar/gromos with simple point charge (SPC) water model). The decoys are found to have a consistently higher free energy than that of the native structure; the gap between the native structure and the best decoy varies between 10 and 15 kcal/mole, on the order of the free energy difference that typically separates the native state of a protein from the unfolded state. The correlation between the free energy and the extent to which the decoy structures differ from the native (as root mean square deviation) is very weak; hence, the free energy is not an accurate measure for ranking the structurally most native-like structures from among a set of models. Analysis of the energy components shows that stability is attained as a result of three major driving forces: (1) minimum size of the protein-water surface interface; (2) minimum total electrostatic energy, which includes solvent polarization; and (3) minimum protein packing energy. The detailed fit required to optimize the last term may underlie difficulties encountered in recovering the native fold from an approximate decoy or model structure.

10.
Functionally homologous RNA sequences can substantially diverge in their primary sequences, but it can be reasonably assumed that they are related in their higher-degree structures. The problem of finding such structures while satisfying the free-energy-minimization criterion as far as possible is considered here in two aspects. Firstly, a quantitative measure of the folding consensus among secondary structures is defined, translating each structure into a linear representation and using the correlation theorem to compare them. Secondly, an algorithm is presented for the parallel search for secondary structures according to the free-energy-minimization criterion, but with a filtering action on the basis of the folding consensus measure. The method is tested on groups of RNA sequences different in origin and in function, for which proposals of homologous secondary structures based on experimental data exist. A comparison of the results with a blank consisting of a search on the basis of free energy minimization alone is always performed. In these tests the method shows its ability to obtain, from different sequences, secondary structures characterized by a high folding consensus measure even when lower free energy but non-homologous structures are possible. Two applications are also shown. The first demonstrates the transfer of experimental data available for one sequence to a functionally related and therefore homologous one. The second application is the possibility of using a topological probe in the search for precise structural motifs.

11.
In this study we apply a genetic algorithm to a set of RNA sequences to find common RNA secondary structures. Our method is a three-step procedure. At the first stage of the procedure for each sequence, a genetic algorithm is used to optimize the structures in a population to a certain degree of stability. In this step, the free energy of a structure is the fitness criterion for the algorithm. Next, for each structure, we define a measure of structural conservation with respect to those in other sequences. We use this measure in a genetic algorithm to improve the structural similarity among sequences for the structures in the population of a sequence. Finally, we select those structures satisfying certain conditions of structural stability and similarity as predicted common structures for a set of RNA sequences. We have obtained satisfactory results from a set of tRNA, 5S rRNA, rev response elements (RRE) of HIV-1 and RRE of HIV-2/SIV, respectively.

12.
One of the key issues in the theoretical prediction of RNA folding is the prediction of loop structure from the sequence. RNA loop free energies are dependent on the loop sequence content. However, most current models account only for the loop length-dependence. The previously developed "Vfold" model (a coarse-grained RNA folding model) provides an effective method to generate the complete ensemble of coarse-grained RNA loop and junction conformations. However, due to the lack of sequence-dependent scoring parameters, the method is unable to identify the native and near-native structures from the sequence. In this study, using a previously developed iterative method for extracting the knowledge-based potential parameters from the known structures, we derive a set of dinucleotide-based statistical potentials for RNA loops and junctions. A unique advantage of the approach is its ability to go beyond the (known) native structures by accounting for the full free energy landscape, including all the nonnative folds. The benchmark tests indicate that for given loop/junction sequences, the statistical potentials enable successful predictions for the coarse-grained 3D structures from the complete conformational ensemble generated by the Vfold model. The predicted coarse-grained structures can provide useful initial folds for further detailed structural refinement.

13.
A new method to detect remote relationships between protein sequences and known three-dimensional structures based on direct energy calculations and without reliance on statistics has been developed. The likelihood of a residue to occupy a given position on the structural template was represented by an estimate of the stabilization free energy made after explicit prediction of the substituted side chain conformation. The profile matrix derived from these energy values and modified by increasing the residue self-exchange values successfully predicted compatibility of heatshock protein and globin sequences with the three-dimensional structures of actin and phycocyanin, respectively, from a full protein sequence databank search. The high sensitivity of the method makes it a unique tool for predicting the three-dimensional fold for the rapidly growing number of protein sequences. © 1994 Wiley-Liss, Inc.

14.
The computational identification of the optimal three-dimensional fold of even a small peptide chain from its sequence, without reference to other known structures, is a complex problem. There have been several attempts at solving this by sampling the potential energy surface of the molecule in a systematic manner. Here we present a new method to carry out the sampling, and to identify low energy conformers of the molecule. The method uses mutually orthogonal Latin squares to select (of the order of) n² points from the multidimensional conformation space of size mⁿ, where n is the number of dimensions (i.e., the number of conformational variables), and m specifies the fineness of the search grid. The sampling is accomplished by first calculating the value of the potential energy function at each one of the selected points. This is followed by analysis of these values of the potential energy to obtain the optimal value for each of the n variables separately. We show that the set of the n optimal values obtained in this manner specifies a low energy conformation of the molecule. Repeated application of the method identifies other low energy structures. The computational complexity of this algorithm scales as the fourth power of the size of the molecule. We applied this method to several small peptides, such as the neuropeptide enkephalin, and could identify a set of low energy conformations for each. Many of the structures identified by this method have also been previously identified and characterized by experiment and theory. We also compared the best structures obtained for the tripeptide (Ala)₃ by the present method, with those obtained by an exhaustive grid search, and showed that the algorithm is successful in identifying all the low energy conformers of this molecule.
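
For readers unfamiliar with mutually orthogonal Latin squares, the sketch below builds a complete set for a prime order using the standard construction L_k[i][j] = (k·i + j) mod m, and checks the two defining properties. It illustrates the combinatorial object only; how the abstract's method maps squares to conformational variables and analyzes the sampled energies is not reproduced here, and the function names and the use of the symbol order `m` are assumptions.

```python
import math


def mols(m):
    """Return m - 1 mutually orthogonal Latin squares of prime order m."""
    if m < 2 or any(m % d == 0 for d in range(2, math.isqrt(m) + 1)):
        raise ValueError("this simple construction assumes a prime order")
    return [[[(k * i + j) % m for j in range(m)] for i in range(m)]
            for k in range(1, m)]


def is_latin(square):
    """Every symbol appears exactly once in each row and each column."""
    m = len(square)
    symbols = set(range(m))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(m)} == symbols
                  for j in range(m))
    return rows_ok and cols_ok


def are_orthogonal(a, b):
    """Superimposing a and b yields every ordered symbol pair exactly once."""
    m = len(a)
    pairs = {(a[i][j], b[i][j]) for i in range(m) for j in range(m)}
    return len(pairs) == m * m


squares = mols(5)
print(len(squares), all(is_latin(s) for s in squares))
```

Orthogonality is what makes the design economical: a set of superimposed squares covers each combination of values for any two variables evenly while visiting only on the order of m² grid points rather than the full grid.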

15.
Efficient siRNA selection using hybridization thermodynamics   Cited by: 1 (self-citations: 1, other citations: 0)
Small interfering RNA (siRNA) are widely used to infer gene function. Here, insights into the equilibrium of siRNA-target hybridization are used for selection of efficient siRNA. The accessibilities of siRNA and target mRNA for hybridization, as measured by folding free energy change, are shown to be significantly correlated with efficacy. For this study, a partition function calculation that considers all possible secondary structures is used to predict target site accessibility; this is a significant improvement over calculations that consider only the predicted lowest free energy structure or a set of low free energy structures. The predicted thermodynamic features, in addition to siRNA sequence features, are used as input for a support vector machine that selects functional siRNA. The method works well for predicting efficient siRNA (efficacy >70%) in a large siRNA data set from Novartis. The positive predictive value (percentage of sites predicted to be efficient for silencing that are) is as high as 87.6%. The sensitivity and specificity are 22.7 and 96.5%, respectively. When tested on data from different sources, the positive predictive value increased 8.1% by adding equilibrium terms to 25 local sequence features. Prediction of hybridization affinity using partition functions is now available in the RNAstructure software package.
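
The three statistics quoted in the abstract above follow directly from confusion-matrix counts. The sketch below shows the standard definitions; the function name and the counts in the usage line are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (positive predictive value, sensitivity, specificity).

    tp/fp: predicted efficient, and truly efficient / not efficient.
    tn/fn: predicted inefficient, and truly inefficient / efficient.
    """
    def ratio(num, den):
        if den == 0:
            raise ValueError("metric undefined: denominator is zero")
        return num / den

    ppv = ratio(tp, tp + fp)          # of predicted positives, fraction correct
    sensitivity = ratio(tp, tp + fn)  # of true positives, fraction found
    specificity = ratio(tn, tn + fp)  # of true negatives, fraction rejected
    return ppv, sensitivity, specificity


# Hypothetical counts chosen only to illustrate the arithmetic.
print(classification_metrics(tp=20, fp=5, tn=95, fn=80))
```

A selector tuned this way accepts few candidates (low sensitivity) but is usually right about the ones it accepts (high positive predictive value), which matches the pattern of figures reported in the abstract.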

16.
We describe here an energy based computer software suite for narrowing down the search space of tertiary structures of small globular proteins. The protocol comprises eight different computational modules that form an automated pipeline. It combines physics based potentials with biophysical filters to arrive at 10 plausible candidate structures starting from sequence and secondary structure information. The methodology has been validated here on 50 small globular proteins consisting of 2-3 helices and strands with known tertiary structures. For each of these proteins, a structure within 3-6 Å RMSD (root mean square deviation) of the native has been obtained in the 10 lowest energy structures. The protocol has been web enabled and is accessible at http://www.scfbio-iitd.res.in/bhageerath.

17.
MOTIVATION: We describe algorithms implemented in a new software package, RNAbor, to investigate structures in a neighborhood of an input secondary structure S of an RNA sequence s. The input structure could be the minimum free energy structure, the secondary structure obtained by analysis of the X-ray structure or by comparative sequence analysis, or an arbitrary intermediate structure. RESULTS: A secondary structure T of s is called a δ-neighbor of S if T and S differ by exactly δ base pairs. RNAbor computes the number (N_δ), the Boltzmann partition function (Z_δ) and the minimum free energy (MFE_δ) and corresponding structure over the collection of all δ-neighbors of S. This computation is done simultaneously for all δ ≤ m, in run time O(mn³) and memory O(mn²), where n is the sequence length. We apply RNAbor for the detection of possible RNA conformational switches, and compare RNAbor with the switch detection method paRNAss. We also provide examples of how RNAbor can at times improve the accuracy of secondary structure prediction. AVAILABILITY: http://bioinformatics.bc.edu/clotelab/RNAbor/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
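
The neighbor definition above rests on counting the base pairs by which two structures differ. A minimal sketch of that count, assuming structures are given in dot-bracket notation without pseudoknots, is shown below. It is not RNAbor's algorithm, which computes quantities over all neighbors at once; the function names are assumptions for this example.

```python
def parse_pairs(dot_bracket):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % pos)
            pairs.add((stack.pop(), pos))
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return pairs


def base_pair_distance(s, t):
    """Number of base pairs present in exactly one of the two structures."""
    if len(s) != len(t):
        raise ValueError("structures must have the same length")
    return len(parse_pairs(s) ^ parse_pairs(t))


print(base_pair_distance("((...))", "(.....)"))
```

Under this definition a structure T is a δ-neighbor of S exactly when `base_pair_distance(S, T)` equals δ.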

18.
Das B, Meirovitch H. Proteins, 2003, 51(3): 470-483
A new procedure for optimizing parameters of implicit solvation models introduced by us has been applied successfully first to cyclic peptides and more recently to three surface loops of ribonuclease A (Das and Meirovitch, Proteins 2001;43:303-314) using the simplified model $E_{tot} = E_{FF}(\epsilon = nr) + \sum_i \sigma_i A_i$, where $\sigma_i$ are atomic solvation parameters (ASPs) to be optimized, $A_i$ is the solvent accessible surface area of atom $i$, and $E_{FF}(\epsilon = nr)$ is the AMBER force-field energy of the loop-loop and loop-template interactions with a distance-dependent dielectric constant, $\epsilon = nr$, where $n$ is a parameter. The loop is free to move while the protein template is held fixed in its X-ray structure; an extensive conformational search for energy minimized loop structures is carried out with our local torsional deformation method. The optimal ASPs and $n$ are those for which the structure with the lowest minimized energy [$E_{tot}(n, \sigma_i)$] becomes the experimental X-ray structure, or less strictly, the energy gap between these structures is within 2-3 kcal/mol. To check if a set of ASPs can be defined, which is transferable to a large number of loops, we optimize individual sets of ASPs (based on $n = 2$) for 12 surface loops from which an "averaged" best-fit set is defined. This set is then applied to the 12 loops and an independent "test" group of 8 loops leading in most cases to very small RMSD values; thus, this set can be useful for structure prediction of loops in homology modeling. For three loops we also calculate the free energy gaps to find that they are only slightly smaller than their energy counterparts, indicating that only larger $n$ will enable reducing too large gaps. Because of its simplicity, this model allowed carrying out an extensive application of our methodology, providing thereby a large number of benchmark results for comparison with future calculations based on $n > 2$ as well as on more sophisticated solvation models with as yet unknown performance for loops.

19.
Accurate prediction of RNA pseudoknotted secondary structures from the base sequence is a challenging computational problem. Since prediction algorithms rely on thermodynamic energy models to identify low-energy structures, prediction accuracy relies in large part on the quality of free energy change parameters. In this work, we use our earlier constraint generation and Boltzmann likelihood parameter estimation methods to obtain new energy parameters for two energy models for secondary structures with pseudoknots, namely, the Dirks–Pierce (DP) and the Cao–Chen (CC) models. To train our parameters, and also to test their accuracy, we create a large data set of both pseudoknotted and pseudoknot-free secondary structures. In addition to structural data our training data set also includes thermodynamic data, for which experimentally determined free energy changes are available for sequences and their reference structures. When incorporated into the HotKnots prediction algorithm, our new parameters result in significantly improved secondary structure prediction on our test data set. Specifically, the prediction accuracy when using our new parameters improves from 68% to 79% for the DP model, and from 70% to 77% for the CC model.

20.
Although proteins are a fundamental unit in biology, the mechanism by which proteins fold into their native state is not well understood. In this work, we explore the assembly of secondary structure units via geometric constraint-based simulations and the effect of refinement of assembled structures using reservoir replica exchange molecular dynamics. Our approach uses two crucial features of these methods: i), geometric simulations speed up the search for nativelike topologies as there are no energy barriers to overcome; and ii), molecular dynamics identifies the low free energy structures and further refines these structures toward the actual native conformation. We use eight α-, β-, and α/β-proteins to test our method. The geometric simulations of our test set result in an average RMSD from native of 3.7 Å and this further reduces to 2.7 Å after refinement. We also explore the question of robustness of assembly for inaccurate (shifted and shortened) secondary structure. We find that the RMSD from native is highly dependent on the accuracy of secondary structure input, and even slightly shifting the location of secondary structure along the amino acid sequence can lead to a rapid decrease in RMSD to native due to incorrect packing.
