Similar literature
20 similar records found (search time: 468 ms)
1.
A statistical reference for RNA secondary structures with minimum free energies is computed by folding large ensembles of random RNA sequences. Four nucleotide alphabets are used: two binary alphabets, AU and GC, the biophysical AUGC and the synthetic GCXK alphabet. RNA secondary structures are made of structural elements, such as stacks, loops, joints, and free ends. Statistical properties of these elements are computed for small RNA molecules of chain lengths up to 100. The results of RNA structure statistics depend strongly on the particular alphabet chosen. The statistical reference is compared with the data derived from natural RNA molecules with similar base frequencies. Secondary structures are represented as trees. Tree editing provides a quantitative measure for the distance d_t between two structures. We compute a structure density surface as the conditional probability of two structures having distance t given that their sequences have distance h. This surface indicates that the vast majority of possible minimum free energy secondary structures occur within a fairly small neighborhood of any typical (random) sequence. Correlation lengths for secondary structures in their tree representations are computed from probability densities. They are appropriate measures for the complexity of the sequence-structure relation. The correlation length also provides a quantitative estimate for the mean sensitivity of structures to point mutations. © 1993 John Wiley & Sons, Inc.

2.
J S McCaskill 《Biopolymers》1990,29(6-7):1105-1119
A novel application of dynamic programming to the folding problem for RNA enables one to calculate the full equilibrium partition function for secondary structure and the probabilities of various substructures. In particular, both the partition function and the probabilities of all base pairs are computed by a recursive scheme of polynomial order N^3 in the sequence length N. The temperature dependence of the partition function gives information about melting behavior for the secondary structure. The pair binding probabilities, the computation of which depends on the partition function, are visually summarized in a "box matrix" display and this provides a useful tool for examining the full ensemble of probable alternative equilibrium structures. The calculation of this ensemble representation allows a proper application and assessment of the predictive power of the secondary structure method, and yields important information on alternatives and intermediates in addition to local information about base pair opening and slippage. The results are illustrated for representative tRNA, 5S RNA, and self-replicating and self-splicing RNA molecules, and allow a direct comparison with enzymatic structure probes. The effect of changes in the thermodynamic parameters on the equilibrium ensemble provides a further sensitivity check to the predictions.
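
The cubic-time recursion mentioned in this abstract can be illustrated with a deliberately simplified sketch. It is not McCaskill's full algorithm: it assumes a toy energy model in which every base pair contributes the same energy (`pair_energy`, a hypothetical parameter), it ignores loop and stacking energies, and it returns only the partition function, not the base pair probabilities. The names `can_pair`, `partition_function`, `kT`, and `min_loop` are illustrative choices.

```python
import math


def can_pair(a, b):
    """Watson-Crick and GU wobble pairs (an assumed pairing rule)."""
    return {a, b} in ({"A", "U"}, {"G", "C"}, {"G", "U"})


def partition_function(seq, pair_energy=-1.0, kT=0.6, min_loop=3):
    """Toy partition function over all nested secondary structures.

    Every base pair gets the same Boltzmann weight exp(-pair_energy / kT);
    real models use loop-dependent nearest neighbor energies instead.
    """
    n = len(seq)
    boltz = math.exp(-pair_energy / kT)
    # Z[i][j] covers the half-open segment seq[i:j]; empty segments have Z = 1.
    Z = [[1.0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            total = Z[i][j - 1]  # case 1: the last base (j - 1) is unpaired
            # case 2: the last base pairs with k, enclosing at least min_loop bases
            for k in range(i, j - 1 - min_loop):
                if can_pair(seq[k], seq[j - 1]):
                    total += Z[i][k] * boltz * Z[k + 1][j - 1]
            Z[i][j] = total
    return Z[0][n]
```

The three nested loops over `length`, `i`, and `k` give the N^3 scaling. Setting `pair_energy=0.0` makes every structure count with weight 1, so the function then returns the number of admissible structures, which is a convenient sanity check.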

3.
Weitao Sun  Jing He 《Proteins》2009,77(1):159-173
Secondary structure topology in this article refers to the order and the direction of the secondary structures, such as helices and strands, with respect to the protein sequence. Even when the locations of the secondary structure Cα atoms are known, there are still (N! 2^N)(M! 2^M) different possible topologies for a protein with N helices and M strands. This work explored the question of whether the native topology is likely to be identified among a large set of all possible geometrically constrained topologies through an evaluation of the residue contact energy formed by the secondary structures, instead of the entire chain. We developed a contact pair specific and distance specific multiwell function based on the statistical characterization of the side chain distances of 413 proteins in the Protein Data Bank. The multiwell function has specific parameters to each of the 210 pairs of residue contacts. We illustrated a general mathematical method to extend a single well function to a multiwell function to represent the statistical data. We have performed a mutation analysis using 50 proteins to generate all the possible geometrically constrained topologies of the secondary structures. The result shows that the native topology is within the top 25% of the list ranked by the effective contact energies of the secondary structures for all the 50 proteins, and is within the top 5% for 34 proteins. As an application, the method was used to derive the structure of the skeletons from a low resolution density map that can be obtained through electron cryomicroscopy. Proteins 2009. © 2009 Wiley‐Liss, Inc.
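
To give a feel for how quickly the topology count in this abstract grows, the expression (N! 2^N)(M! 2^M) can be evaluated directly: each class of elements can be ordered in N! (or M!) ways, and each element can run in one of two directions. The function name `topology_count` is an illustrative choice, not something taken from the paper.

```python
from math import factorial


def topology_count(n_helices, m_strands):
    """Number of order/direction assignments: (N! * 2^N) * (M! * 2^M)."""
    helix_part = factorial(n_helices) * 2 ** n_helices
    strand_part = factorial(m_strands) * 2 ** m_strands
    return helix_part * strand_part


if __name__ == "__main__":
    for n, m in [(2, 3), (4, 4), (6, 5)]:
        print(n, m, topology_count(n, m))
```

Even a small protein with two helices and three strands already has 384 candidate topologies, which is why ranking by contact energy is needed to narrow the list.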

4.
Abstract

Supercoiling causes global twist of DNA structure, and the supercoiled state has a wide influence on conformational transitions. A statistical mechanical approach was developed to predict the transition probability to non-B DNA structures under torsional stress. A conditional partition function was defined as the sum over all possible states of the DNA sequence with basepair 1 and basepair n being in B-form helix, and a recurrence formula was developed which expresses the partition function for basepair n in terms of those for fewer base pairs. This new definition permits a quick enumeration of every configuration of secondary structures. Energetic parameters of all conformations concerned, involving B-form, interior loop, cruciform and Z-form, were included in the equation. The probability of transition to each non-B conformation could be derived from these conditional partition functions. To treat the effects of superhelicity, supercoiling energy was considered, and a twist of each conformation was determined to minimize the supercoiling energy. As the twist itself affects the transition probability, the whole scheme of equations was solved by a renormalization technique. The present method permits a simultaneous treatment of several types of conformations under a common torsional stress.

A set of energetic parameters of DNA secondary structures has been chosen for calculation. Some DNA sequences were submitted to the calculation, and all the sequences that we submitted gave stable convergence. For some of them, the critical supercoil density for the transition to non-B DNA structures was investigated. Even though the reliability of the set of parameters was limited, the prediction of secondary structure transition showed good agreement with reported observation. Hence, the present algorithm can estimate the probability of local conformational change of DNA under a given supercoil density, and can also be employed to predict specific sequences in which conformational change is sensitive to superhelicity.

5.
A complete set of nearest neighbor parameters to predict the enthalpy change of RNA secondary structure formation was derived. These parameters can be used with available free energy nearest neighbor parameters to extend the secondary structure prediction of RNA sequences to temperatures other than 37°C. The parameters were tested by predicting the secondary structures of sequences with known secondary structure that are from organisms with known optimal growth temperatures. Compared with the previous set of enthalpy nearest neighbor parameters, the sensitivity of base pair prediction improved from 65.2 to 68.9% at optimal growth temperatures ranging from 10 to 60°C. Base pair probabilities were predicted with a partition function and the positive predictive value of structure prediction is 90.4% when considering the base pairs in the lowest free energy structure with pairing probability of 0.99 or above. Moreover, a strong correlation is found between the predicted melting temperatures of RNA sequences and the optimal growth temperatures of the host organism. This indicates that organisms that live at higher temperatures have evolved RNA sequences with higher melting temperatures.
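
One common way to combine an enthalpy parameter with a 37°C free energy parameter is the relation ΔG(T) = ΔH − TΔS with ΔS = (ΔH − ΔG37)/310.15 K, which assumes that ΔH and ΔS do not change with temperature. The sketch below applies that relation to a single hypothetical parameter; the abstract does not spell out the exact procedure, so the function name `free_energy_at` and the numerical values are assumptions for illustration only.

```python
T37_KELVIN = 310.15  # 37 degrees C expressed in kelvin


def free_energy_at(dg37, dh, temp_c):
    """Extrapolate a free energy change (kcal/mol) from 37 C to temp_c.

    Assumes temperature-independent enthalpy (dh) and entropy.
    """
    temp_k = temp_c + 273.15
    ds = (dh - dg37) / T37_KELVIN  # entropy change in kcal/(mol K)
    return dh - temp_k * ds


if __name__ == "__main__":
    # Hypothetical stacking parameter: dG37 = -3.3, dH = -10.0 kcal/mol.
    for t in (10.0, 37.0, 60.0):
        print(t, round(free_energy_at(-3.3, -10.0, t), 2))
```

For a stabilizing element with negative ΔH, the extrapolated free energy becomes less favorable as the temperature rises, which is the behavior that makes predicted melting temperatures meaningful.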

6.
A partition function calculation for RNA secondary structure is presented that uses a current set of nearest neighbor parameters for conformational free energy at 37 degrees C, including coaxial stacking. For a diverse database of RNA sequences, base pairs in the predicted minimum free energy structure that are predicted by the partition function to have high base pairing probability have a significantly higher positive predictive value for known base pairs. For example, the average positive predictive value, 65.8%, is increased to 91.0% when only base pairs with probability of 0.99 or above are considered. The quality of base pair predictions can also be increased by the addition of experimentally determined constraints, including enzymatic cleavage, flavin mono-nucleotide cleavage, and chemical modification. Predicted secondary structures can be color annotated to demonstrate pairs with high probability that are therefore well determined as compared to base pairs with lower probability of pairing.
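
The two accuracy measures used in this abstract and the previous one can be written down in a few lines. Sensitivity is the fraction of known pairs that are predicted, and positive predictive value (PPV) is the fraction of predicted pairs that are known. The sketch assumes base pairs are given as (i, j) tuples and that pairing probabilities come from some partition function calculation; the function names and the example data are hypothetical.

```python
def score_pairs(predicted, known):
    """Return (sensitivity, ppv) for two collections of (i, j) base pairs."""
    predicted, known = set(predicted), set(known)
    true_positives = len(predicted & known)
    sensitivity = true_positives / len(known) if known else 0.0
    ppv = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, ppv


def filter_by_probability(pair_probabilities, threshold=0.99):
    """Keep only the pairs whose pairing probability reaches the threshold."""
    return [pair for pair, p in pair_probabilities.items() if p >= threshold]


if __name__ == "__main__":
    known = [(1, 10), (2, 9), (3, 8)]
    probabilities = {(1, 10): 0.995, (2, 9): 0.99, (4, 7): 0.50}
    print(score_pairs(probabilities, known))
    print(score_pairs(filter_by_probability(probabilities), known))
```

Filtering by probability can only remove predictions, so it trades sensitivity for PPV; in the made-up example the PPV rises from 2/3 to 1 while the sensitivity stays at 2/3.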

7.
Algorithms predicting RNA secondary structures based on different folding criteria – minimum free energies (mfe), kinetic folding (kin), maximum matching (mm) – and different parameter sets are studied systematically. Two base pairing alphabets were used: the binary GC and the natural four-letter AUGC alphabet. Computed structures and free energies depend strongly on both the algorithm and the parameter set. Statistical properties, such as mean number of base pairs, mean numbers of stacks, mean loop sizes, etc., are much less sensitive to the choice of parameter set and even of algorithm. Some features of RNA secondary structures, such as structure correlation functions, shape space covering and neutral networks, seem to depend only on the base pairing logic (GC or AUGC alphabet). Received: 16 May 1996 / Accepted: 10 July 1996
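
Of the three folding criteria named here, maximum matching is the simplest to sketch: it maximizes the number of nested base pairs and ignores energies entirely. The version below is a standard Nussinov-style dynamic program, offered as an illustration rather than the implementation used in the study; the minimum hairpin size `min_loop=3` and the inclusion of GU pairs in the AUGC pairing rule are assumptions.

```python
GC_PAIRS = {("G", "C"), ("C", "G")}
AUGC_PAIRS = GC_PAIRS | {("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def max_matching(seq, pairs, min_loop=3):
    """Maximum number of nested base pairs under the given pairing rule."""
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j]: maximum number of pairs within seq[i..j] (inclusive indices)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # base j left unpaired
            # base j paired with k, enclosing at least min_loop unpaired bases
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


if __name__ == "__main__":
    print(max_matching("GGGAAACCC", GC_PAIRS))
    print(max_matching("GGGAAAUCC", AUGC_PAIRS))
```

Passing the pairing rule as a set of ordered letter pairs makes it easy to switch between the binary GC alphabet and the four-letter AUGC alphabet compared in the abstract.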

8.
Accurate model evaluation is a crucial step in protein structure prediction. For this purpose, statistical potentials, which evaluate a model structure based on the observed atomic distance frequencies in comparison with those in reference states, have been widely used. The reference state is a virtual state where all of the atomic interactions are turned off, and it provides a standard to measure the observed frequencies. In this study, we examined seven all‐atom distance‐dependent potentials with different reference states. As a result, we observed that the variations of atom pair composition and those of distance distributions in the reference states produced systematic changes in the hydrophobic and attractive characteristics of the potentials. The performance evaluations with the CASP7 structures indicated that the preference of hydrophobic interactions improved the correlation between the energy and the GDT‐TS score, but decreased the Z‐score of the native structure. The attractiveness of the potential improved both the correlation and Z‐score for template‐based modeling targets, but the benefit was smaller in free modeling targets. These results indicated that the performances of the potentials were more strongly influenced by their characteristics than by the accuracy of the definitions of the reference states.

9.
There are many effective ways to represent a minimum free energy RNA secondary structure that make it easy to locate its helices and loops. It is a greater challenge to visualize the thermal average probabilities of all folds in a partition function sum; dot plot representations are often puzzling. Therefore, we introduce the RNAbows visualization tool for RNA base pair probabilities. RNAbows represent base pair probabilities with line thickness and shading, yielding intuitive diagrams. RNAbows aid in disentangling incompatible structures, allow comparisons between clusters of folds, highlight differences between wild-type and mutant folds, and are also rather beautiful.

10.
Abstract

A general strategy for performing energy minimization of proteins using the SYBYL molecular modelling program has been developed. The influence of several variables, including energy minimization procedure, solvation, dielectric function and dielectric constant, has been investigated in order to develop a general method that is capable of producing high quality protein structures. Avian pancreatic polypeptide (APP) and bovine pancreatic phospholipase A2 (BP PLA2) were selected for the calculations, because high quality X-ray structures exist and because all classes of secondary structure are represented in the structures. The energy minimized structures were evaluated relative to the corresponding X-ray structures. The overall similarity was checked by calculating RMS distances for all atom positions. Backbone conformation was checked by Ramachandran plots, and secondary structure elements were evaluated by the length of hydrogen bonds. The dimensions of the active site in BP PLA2 are very dependent on electrostatic interactions, due to the presence of the positively charged calcium ion. Thus, the distances between calcium and the calcium-coordinating groups were used as a quality index for this protein. Energy minimized structures of the trimeric PLA2 from Indian cobra (N.n.n. PLA2) were used for assessing the impact of protein-protein interactions. Based on the above mentioned criteria, it could be concluded that, with a dielectric constant ε = 4 or 20, a distance dependent dielectric function, and stepwise energy minimization, it is possible to reproduce X-ray structures very accurately without including explicit solvent molecules.

11.
MOTIVATION: A k-point mutant of a given RNA sequence s = s(1), ..., s(n) is an RNA sequence s' = s'(1), ..., s'(n) obtained by mutating exactly k positions in s; i.e. the Hamming distance between s and s' equals k. To understand the effect of pointwise mutation in RNA, we consider the distribution of energies of all secondary structures of k-point mutants of a given RNA sequence. RESULTS: Here we describe a novel algorithm to compute the mean and standard deviation of energies of all secondary structures of k-point mutants of a given RNA sequence. We then focus on the tail of the energy distribution and compute, using the algorithm AMSAG, the k-superoptimal structure; i.e. the secondary structure of a ≤ k-point mutant having least free energy over all secondary structures of all k'-point mutants of a given RNA sequence, for k' ≤ k. Evidence is presented that the k-superoptimal secondary structure is often closer, as measured by base pair distance and two additional distance measures, to the secondary structure derived by comparative sequence analysis than that derived by the Zuker minimum free energy structure of the original (wild type or unmutated) RNA.
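
The definition of a k-point mutant can be made concrete with a brute-force enumeration. This is only an illustration of the definition, not the algorithm described in the abstract: the number of k-point mutants of a length-n sequence is C(n, k)·3^k, so explicit enumeration becomes impractical very quickly. The names `hamming` and `k_point_mutants` are illustrative.

```python
from itertools import combinations, product
from math import comb

ALPHABET = "ACGU"


def hamming(s, t):
    """Number of positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(s, t))


def k_point_mutants(seq, k):
    """Yield every sequence at Hamming distance exactly k from seq."""
    for positions in combinations(range(len(seq)), k):
        # at each chosen position, any of the three other nucleotides is allowed
        choices = [[c for c in ALPHABET if c != seq[p]] for p in positions]
        for replacement in product(*choices):
            mutant = list(seq)
            for p, c in zip(positions, replacement):
                mutant[p] = c
            yield "".join(mutant)


if __name__ == "__main__":
    mutants = list(k_point_mutants("GGAC", 2))
    print(len(mutants), comb(4, 2) * 3 ** 2)
```

For a sequence of length 100 and k = 5 the count already exceeds 10^10, which motivates computing moments of the energy distribution without listing the mutants.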

12.
Abstract

The flexibility of alternating poly(dA-dT) has been investigated by the technique of transient electric dichroism. Rotational relaxation times, which are very sensitive to changes in the end-to-end length of flexible polymers, are determined from the field free dichroism decay curves of four well defined fragments of poly(dA-dT) ranging in size from 136 to 270 base pairs. Persistence lengths, calculated from the results of Hagerman and Zimm (Biopolymers (1981) 29, 1481–1502), are in the range 200–250 Å. This makes alternating dA-dT sequences about twice as flexible as naturally occurring, "random" sequence DNA. Considering a bend around a nucleosome, for example, this difference in persistence length translates to an energy difference between poly(dA-dT) and random sequence DNA of 0.17 kT/base pair or 1 kcal per 10 base pair stretch. This energy difference is sufficiently large to suggest that dA-dT sequences could serve as markers in DNA packaging, for example, at sites where DNA must tightly bend to accommodate structures.
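
The quoted energy difference can be reproduced with the worm-like chain expression for bending a rod of persistence length P along an arc of radius R, which gives E/kT = P·h/(2R²) per base pair of rise h. The abstract does not state the inputs, so the values below are assumptions chosen for illustration: P = 500 Å for random sequence DNA (twice the upper end of the reported range), P = 250 Å for poly(dA-dT), R = 50 Å for the nucleosomal bend, h = 3.4 Å per base pair, and kT ≈ 0.593 kcal/mol near room temperature.

```python
def bending_energy_per_bp(persistence_length, radius, rise=3.4):
    """Worm-like chain bending energy per base pair in units of kT.

    All lengths are in angstroms.
    """
    return persistence_length * rise / (2.0 * radius ** 2)


P_RANDOM = 500.0  # assumed persistence length of random sequence DNA
P_DA_DT = 250.0   # upper end of the range reported for poly(dA-dT)
RADIUS = 50.0     # assumed radius of curvature around a nucleosome
KT_KCAL = 0.593   # kT in kcal/mol at about 298 K

delta_kt = (bending_energy_per_bp(P_RANDOM, RADIUS)
            - bending_energy_per_bp(P_DA_DT, RADIUS))
per_10_bp_kcal = 10 * delta_kt * KT_KCAL

if __name__ == "__main__":
    print(round(delta_kt, 3), round(per_10_bp_kcal, 2))
```

Under these assumptions the difference comes out at 0.17 kT per base pair, or about 1 kcal/mol per 10 base pairs, matching the figures in the abstract; other plausible choices of R shift the result noticeably because the energy scales as 1/R².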

13.
Much structural information is encoded in the internal distances; a distance matrix-based approach can be used to predict protein structure and dynamics, and for structural refinement. Our approach is based on the square distance matrix D = [r_ij^2] containing all square distances between residues in proteins. This distance matrix contains more information than the contact matrix C, which has elements of either 0 or 1 depending on whether the distance r_ij is greater or less than a cutoff value r_cutoff. We have performed spectral decomposition of the distance matrices, $\mathbf{D} = \sum_{k} \lambda_{k} \mathbf{v}_{k} \mathbf{v}_{k}^{T}$, in terms of eigenvalues and the corresponding eigenvectors and found that it contains at most five nonzero terms. A dominant eigenvector is proportional to r^2, the square distance of points from the center of mass, with the next three being the principal components of the system of points. By predicting r^2 from the sequence we can approximate a distance matrix of a protein with an expected RMSD value of about 7.3 Å, and by combining it with the prediction of the first principal component we can improve this approximation to 4.0 Å. We can also explain the role of hydrophobic interactions for the protein structure, because r is highly correlated with the hydrophobic profile of the sequence. Moreover, r is highly correlated with several sequence profiles which are useful in protein structure prediction, such as contact number, the residue-wise contact order (RWCO) or mean square fluctuations (i.e. crystallographic temperature factors). We have also shown that the next three components are related to spatial directionality of the secondary structure elements, and they may be also predicted from the sequence, improving overall structure prediction. We have also shown that the large number of available HIV-1 protease structures provides a remarkable sampling of conformations, which can be viewed as direct structural information about the dynamics. After structure matching, we apply principal component analysis (PCA) to obtain the important apparent motions for both bound and unbound structures. There are significant similarities between the first few key motions and the first few low-frequency normal modes calculated from a static representative structure with an elastic network model (ENM) that is based on the contact matrix C (related to D), strongly suggesting that the variations among the observed structures and the corresponding conformational changes are facilitated by the low-frequency, global motions intrinsic to the structure. Similarities are also found when the approach is applied to an NMR ensemble, as well as to atomic molecular dynamics (MD) trajectories. Thus, a sufficiently large number of experimental structures can directly provide important information about protein dynamics, but ENM can also provide a similar sampling of conformations. Finally, we use distance constraints from databases of known protein structures for structure refinement. We use the distributions of distances of various types in known protein structures to obtain the most probable ranges or the mean-force potentials for the distances. We then impose these constraints on structures to be refined or include the mean-force potentials directly in the energy minimization so that more plausible structural models can be built. This approach has been successfully used by us in 2006 in the CASPR structure refinement.
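
The statement that the spectral decomposition has at most five nonzero terms follows from a short identity, sketched here in the abstract's notation. The coordinate matrix X and the all-ones vector 1 are symbols introduced for this note and are not part of the original text.

```latex
% Positions x_i in R^3 measured from the center of mass, r_i^2 = |x_i|^2.
\begin{align*}
D_{ij} &= \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^{2}
        = r_i^{2} + r_j^{2} - 2\,\mathbf{x}_i \cdot \mathbf{x}_j , \\
\mathbf{D} &= \mathbf{r^{2}}\,\mathbf{1}^{T} + \mathbf{1}\,(\mathbf{r^{2}})^{T}
        - 2\,\mathbf{X}\mathbf{X}^{T},
        \qquad \mathbf{X} \in \mathbb{R}^{n \times 3}, \\
\operatorname{rank}(\mathbf{D}) &\le 1 + 1 + 3 = 5 .
\end{align*}
```

The first two terms carry the r^2 profile and the last term carries the three coordinate directions, which is consistent with the abstract's description of one dominant eigenvector related to r^2 and three further components related to the principal axes of the point set.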

14.
Abstract

The ability of a dinucleotide-step based elastic-rod model of DNA to predict nucleosome binding free energies is investigated using four available sets of elastic parameters. We compare the predicted free energies to experimental values derived from nucleosome reconstitution experiments for 84 DNA sequences. Elastic parameters (conformation and stiffnesses) obtained from MD simulations are shown to be the most reliable predictors, as compared to those obtained from analysis of base-pair step melting temperatures, or from analysis of x-ray structures. We have also studied the effect of varying the folded conformation of nucleosomal DNA by means of our Fourier filtering knock-out and knock-in procedure. This study confirmed the above ranking of elastic parameters, and helped to reveal problems inherent in models using only a local elastic energy function. Long-range interactions were added to the elastic-rod model in an effort to improve its predictive ability. For this purpose a Debye-Hückel energy term with a single, homogeneous point charge per base pair was introduced. This term contains only three parameters: its weight relative to the elastic energy, the Debye screening length, and a minimum sequence distance for including pairwise interactions between charges. After optimization of these parameters, our Debye-Hückel term is attractive, and yields the same level of correlation with experiment (R = 0.75) as was achieved merely by varying the nucleosomal shape in the elastic-rod model. We suggest this result indicates a linker DNA-histone attraction or, possibly, entropic effects, that lead to a stabilization of a nucleosome away from the ends of DNA segments longer than 147 bp. Such effects are not accounted for by a localized elastic energy model.
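
The three-parameter long-range term described here can be sketched as a screened Coulomb sum over base-pair positions. The abstract does not give the exact functional form, so the usual Debye-Hückel kernel exp(−r/λ)/r is assumed, with all physical prefactors folded into the weight; the function and argument names are hypothetical.

```python
import math


def debye_huckel_energy(coords, weight, screening_length, min_separation):
    """Screened Coulomb energy between one point charge per base pair.

    coords           -- list of (x, y, z) positions, one per base pair
    weight           -- overall prefactor relative to the elastic energy
    screening_length -- Debye length, in the same units as the coordinates
    min_separation   -- smallest sequence distance |i - j| included (>= 1)
    """
    if min_separation < 1:
        raise ValueError("min_separation must be at least 1")
    total = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            r = math.dist(coords[i], coords[j])
            total += math.exp(-r / screening_length) / r
    return weight * total
```

A negative fitted weight makes the term attractive, which is the outcome the abstract reports after optimization; excluding pairs closer than `min_separation` along the sequence keeps the term from duplicating what the local elastic energy already describes.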

15.
Predicting secondary structures of RNA molecules is one of the fundamental problems of and thus a challenging task in computational structural biology. Over the past decades, mainly two different approaches have been considered to compute predictions of RNA secondary structures from a single sequence: the first one relies on physics-based and the other on probabilistic RNA models. Particularly, the free energy minimization (MFE) approach is usually considered the most popular and successful method. Moreover, based on the paradigm-shifting work by McCaskill which proposes the computation of partition functions (PFs) and base pair probabilities based on thermodynamics, several extended partition function algorithms, statistical sampling methods and clustering techniques have been invented over the last years. However, the accuracy of the corresponding algorithms is limited by the quality of underlying physics-based models, which include a vast number of thermodynamic parameters and are still incomplete. The competing probabilistic approach is based on stochastic context-free grammars (SCFGs) or corresponding generalizations, like conditional log-linear models (CLLMs). These methods abstract from free energies and instead try to learn about the structural behavior of the molecules by learning (a manageable number of) probabilistic parameters from trusted RNA structure databases. In this work, we introduce and evaluate a sophisticated SCFG design that mirrors state-of-the-art physics-based RNA structure prediction procedures by distinguishing between all features of RNA that imply different energy rules. This SCFG actually serves as the foundation for a statistical sampling algorithm for RNA secondary structures of a single sequence that represents a probabilistic counterpart to the sampling extension of the PF approach. Furthermore, some new ways to derive meaningful structure predictions from generated sample sets are presented. 
They are used to compare the predictive accuracy of our model to that of other probabilistic and energy-based prediction methods. Particularly, comparisons to lightweight SCFGs and corresponding CLLMs for RNA structure prediction indicate that more complex SCFG designs might yield higher accuracy but eventually require more comprehensive and pure training sets. Investigations on both the accuracies of predicted foldings and the overall quality of generated sample sets (especially on an abstraction level, called abstract shapes of generated structures, that is relevant for biologists) yield the conclusion that the Boltzmann distribution of the PF sampling approach is more centered than the ensemble distribution induced by the sophisticated SCFG model, which implies a greater structural diversity within generated samples. In general, neither of the two distinct ensemble distributions is more adequate than the other and the corresponding results obtained by statistical sampling can be expected to bear fundamental differences, such that the method to be preferred for a particular input sequence strongly depends on the considered RNA type.

16.
Abstract

In this paper we report the results of extensive Monte Carlo simulations of a pure fluid of Buckingham modified exponential-six molecules. Data are presented for the configurational energy and pressure covering a wide range of temperatures and densities. These data are interpreted using the generalized van der Waals partition function with a novel separation into free volume and mean potential terms. We find, surprisingly, that the Buckingham fluid is described by a simple van der Waals-like equation of state provided that the b parameter is temperature dependent and chosen in a theoretically correct manner.
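
For readers unfamiliar with the pair potential simulated here, the exponential-six form is commonly written as u(r) = ε/(1 − 6/α) · [(6/α)·exp(α(1 − r/r_m)) − (r_m/r)^6], with well depth ε at separation r_m and steepness α. The sketch below evaluates that expression in reduced units. The default α = 13.5 is an assumed, typical value, and the hard-core cutoff that the modified form imposes at very short distances (where the bare expression turns over) is deliberately left out.

```python
import math


def exp6_potential(r, epsilon=1.0, r_m=1.0, alpha=13.5):
    """Exponential-six pair potential without the short-range hard core.

    epsilon -- well depth, r_m -- position of the minimum,
    alpha   -- steepness of the exponential repulsion (must exceed 6).
    """
    prefactor = epsilon / (1.0 - 6.0 / alpha)
    repulsion = (6.0 / alpha) * math.exp(alpha * (1.0 - r / r_m))
    attraction = (r_m / r) ** 6
    return prefactor * (repulsion - attraction)


if __name__ == "__main__":
    for r in (0.9, 1.0, 1.2, 1.5, 2.0):
        print(r, round(exp6_potential(r), 4))
```

By construction the potential equals −ε at r = r_m and decays as −r^−6 at large separation, so the attractive tail resembles a Lennard-Jones fluid while the repulsive wall is softer.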

17.
We introduce a new variant of the root mean square distance (RMSD) for comparing protein structures whose range of values is independent of protein size. This new dimensionless measure (relative RMSD, or RRMSD) is zero between identical structures and one between structures that are as globally dissimilar as an average pair of random polypeptides of respective sizes. The RRMSD probability distribution between random polypeptides converges to a universal curve as the chain length increases. The correlation coefficients between aligned random structures are computed as a function of polypeptide size showing two characteristic lengths of 4.7 and 37 residues. These lengths mark the separation between phases of different structural order between native protein fragments. The implications for threading are discussed.

18.
Abstract

A new simple quantitative representation of the three-dimensional structure of globular proteins is proposed which is useful for comparison of distantly related problems, computer sorting of large sets of conformations, and search of structurally similar domains in a protein database. The folding course of the polypeptide backbone is approximated by a set of successive vectors corresponding to the elements of regular secondary structure (e.g. α-helices, strands of β-sheets) and non-regular segments. The parameters specifying the spatial organization of segments in this vector model are internal coordinates, namely, lengths of the vectors, planar and dihedral angles. The proposed quantitative representation makes it possible to circumvent the problem of insertions/deletions and to avoid the stage of best superposition during protein comparison. An application was made to the comparison of three-dimensional structures of scorpion toxins Centruroides sculpturatus Ewing v-3, Buthus eupeus M9 and I5A, which have different chain lengths and low sequence similarity.

19.
A distance constraint approach is applied to two-dimensional models of proteins in order to visualize the nature of protein folding and to examine the relative roles of different ranges of interaction. Three different native structures (I, II, and III) are considered; they have two different kinds of residues, viz., hydrophobic and hydrophilic, and different sequences of these residues. We examine how the distance constraint approach functions in the prediction of protein folding when we know the sequence of the residues, the (fixed) bond lengths, the mean distances between residues i and i + 2, and i and i + 3, and the mean distances for hydrophobic–hydrophobic, hydrophobic–hydrophilic, and hydrophilic–hydrophilic contacts between residues i and i + j, where j ≥ 4. This approach involves optimization of an object function with respect to 98 variables and is not free of the multiple-minimum problem. The optimization is always terminated if the chain is entangled and/or the segments (residues) are packed too compactly to move. In order to escape from such situations and to take the excluded-volume effect into account, a Monte Carlo method is used after the optimization is trapped in local minima. Success in the prediction of folding is found to depend on the starting conformations and on the native conformations. Fair success is obtained in predicting the helix-like structure in protein I and the overall structure of protein III, but not the β-like structures of proteins I and II. Insofar as the prediction of the structure of protein III is reasonable, it appears that some sequences of residues produce greater constraints on their conformations than others, if one considers only the hydrophobic and hydrophilic nature of the residues. 
These results imply that, in the folding of real proteins in three dimensions, the competition for hydrophobic (and hydrophilic) residues for inside (outside) positions in the molecule probably constitutes a necessary but not a sufficient condition to form and stabilize the native structure. The failure to predict the structure of protein II, and part of that of protein I, suggests that there are two types of long-range interactions. One (which we considered here) is nonspecific (i.e., is defined only in terms of contacts between residues of the same or different polarity) and acts at any stage of protein folding; the other (which we did not consider here) is a specific interaction between residues in pairs and contributes only when the residues in the specific pair take on the native conformation. Presumably, incorporation of such specific long-range interactions, together with the nonspecific ones, is necessary for successful protein folding, using the distance constraint approach.

20.