Similar literature
Found 20 similar documents (search time: 31 ms)
1.
A computer program is presented which determines the secondary structure of linear RNA molecules by simulating a hypothetical process of folding. This process implies the concept of 'nucleation centres', regions in RNA which locally trigger the folding. During the simulation, the RNA is allowed to fold into pseudoknotted structures, unlike all other programs predicting RNA secondary structure. The simulation uses published, experimentally determined free energy values for nearest neighbour base pair stackings and loop regions, except for new extrapolated values for loops larger than seven nucleotides. The free energy value for a loop arising from pseudoknot formation is set to a single, estimated value of 4.2 kcal/mole. Especially in the case of long RNA sequences, our program appears superior to other secondary structure predicting programs described so far, as tests on tRNAs, the LSU intron of Tetrahymena thermophila and a number of plant viral RNAs show. In addition, pseudoknotted structures are often predicted successfully. The program is written in mainframe APL and is adapted to run on IBM compatible PCs, Atari ST and Macintosh personal computers. On an 8 MHz 8088 standard PC without coprocessor, using STSC APL, it folds a sequence of 700 nucleotides in one and a half hours.

2.
Sadeghi M  Parto S  Arab S  Ranjbar B 《FEBS letters》2005,579(16):3397-3400
We have used a statistical approach for protein secondary structure prediction based on information theory, simultaneously taking into consideration pairwise residue types and conformational states. Since predicting residue secondary structure with a sliding one-residue window creates ambiguity in state prediction, we used a dynamic programming algorithm to find the path with the maximum score. A scoring system for residue pairs in particular conformations is derived for neighbors up to ten residues apart in sequence. The three-state overall per-residue accuracy, Q3, of this method in a jackknife test with a dataset created from PDBSELECT is more than 70%.
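
The dynamic programming step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simplified first-order model in which only adjacent residues interact (the abstract uses pairs up to ten residues apart), and the names `best_state_path`, `local_scores` and `pair_scores` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

STATES = ("H", "E", "C")  # helix, strand, coil


def best_state_path(
    local_scores: Sequence[Dict[str, float]],
    pair_scores: Dict[Tuple[str, str], float],
) -> Tuple[float, List[str]]:
    """Return the maximum total score and the state path that achieves it.

    local_scores[i][s] is a hypothetical score for residue i being in state s;
    pair_scores[(p, s)] is a hypothetical score for adjacent states p -> s
    (missing pairs count as 0.0).
    """
    if not local_scores:
        return 0.0, []

    # best[s] = best score of any path ending in state s at the current residue
    best = {s: local_scores[0][s] for s in STATES}
    back: List[Dict[str, str]] = []

    for scores in local_scores[1:]:
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for s in STATES:
            # Choose the predecessor state that maximises the running score.
            prev = max(STATES, key=lambda p: best[p] + pair_scores.get((p, s), 0.0))
            new_best[s] = best[prev] + pair_scores.get((prev, s), 0.0) + scores[s]
            pointers[s] = prev
        best = new_best
        back.append(pointers)

    # Trace back from the best final state.
    last = max(STATES, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[last], path
```

With a penalty on helix/strand transitions, an isolated strand call inside a helix is overruled by its neighbours, which is the kind of ambiguity a per-residue sliding window cannot resolve on its own.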

3.
The atomic pairs in contact for atoms from pairs of amino-acid residues on pairs of helices in a protein database consisting of 48 proteins of known tertiary structure from the Brookhaven Protein Data Bank are searched and counted to construct a primary scoring system. Each score in the primary scoring system is weighted further with the possibility of occurrence of each residue pair in the protein database to give a final scoring matrix. Scores for predicting change in folding of α-helices in a mutant protein are calculated by assuming that every pair of helices in the protein can closely interact with each other. It is shown that the changes in folding of α-helices in several mutant proteins are reflected in both the change of the contact scores and the calculated helix geometry.

4.
The structure of the M1 protein of the influenza virus A/Puerto Rico/8/34 (PR8, subtype H1N1) in solution at acidic pH and in the composition of the virion has been studied by the tritium planigraphy method. A model of the spatial structure was constructed using a special algorithm simulating the experiment and a set of algorithms for predicting the secondary structure and disordered regions in proteins. The tertiary structure was refined using the Rosetta program. For a comparison of the structures in solution and inside the virion, the data of X-ray diffraction analysis for the NM domain were also used. The main difference in the structures of the protein in solution and the crystalline state is observed in the region of contact of N and M domains, which in the crystalline state is packed more densely. The regions of the maximum label incorporation almost completely coincide with unstructured regions in the protein that were predicted by the bioinformatics analysis. These regions are concentrated in the C domain and in loop regions between M, N, and C domains. The data were confirmed by analytical centrifugation and dynamic light scattering. Anomalous hydrodynamic dimensions and a low structuration of the M1 protein in solution were found. The polyfunctionality of the protein in the cell is probably related to its flexible tertiary structure, which, owing to unstructured regions, provides contact with various partner molecules.  相似文献   

5.
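RNA pseudoknot prediction is a difficult problem in RNA research. This paper proposes an RNA pseudoknot prediction method based on stacking covariation information and minimum free energy. The method is tested on aligned RNA sequences of known structure (ClustalW alignments and structural alignments). It emphasizes the stacking covariation information that arises from interactions between adjacent base pairs, combines it with the minimum free energy method to score base pairings, and obtains an RNA secondary structure containing pseudoknots by stepwise iteration. The results show that the method predicts pseudoknots correctly, that its average sensitivity and specificity are better than those of the reference algorithms, and that prediction performance is more stable with structural alignments than with ClustalW alignments. The paper also discusses the effect of different covariation weight factors on prediction performance and finds that performance is best when the weight factor ratio is l1:l2 = 5:1.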

6.
The spatial structure of the influenza virus A/Puerto Rico/8/34 (PR8, subtype H1N1) M1 protein in solution and within the virion was studied by the tritium planigraphy technique. A special algorithm that simulates the experiment was used to model the spatial structure, together with a set of algorithms predicting secondary structure and disordered regions in proteins. Tertiary structures were refined using the program Rosetta. To compare the structures in solution and in the virion, X-ray diffraction data for the NM domain were also used. The main difference between the protein structure in solution and in the crystal is observed in the contact region of the N and M domains, which are more densely packed in the crystalline state. The locations of maximum label incorporation almost coincide with the unstructured regions of the protein predicted by bioinformatics analysis. These regions are concentrated in the C domain and in the loop regions between the M, N, and C domains. Analytical centrifugation and dynamic laser light scattering confirm the tritium planigraphy data. An anomalous hydrodynamic size and a low degree of structuring of the M1 protein in solution were found. The multifunctionality of the protein in the cell appears to be associated with its plastic tertiary structure, which, owing to its unstructured regions, provides contact with various partner molecules.

7.
Using evolutionary information contained in multiple sequence alignments as input to neural networks, secondary structure can be predicted at significantly increased accuracy. Here, we extend our previous three-level system of neural networks by using additional input information derived from multiple alignments. Using a position-specific conservation weight as part of the input increases performance. Using the number of insertions and deletions reduces the tendency for overprediction and increases overall accuracy. Addition of the global amino acid content yields a further improvement, mainly in predicting structural class. The final network system has a sustained overall accuracy of 71.6% in a multiple cross-validation test on 126 unique protein chains. A test on a new set of 124 recently solved protein structures that have no significant sequence similarity to the learning set confirms the high level of accuracy. The average cross-validated accuracy for all 250 sequence-unique chains is above 72%. Using various data sets, the method is compared to alternative prediction methods, some of which also use multiple alignments: the performance advantage of the network system is at least 6 percentage points in three-state accuracy. In addition, the network estimates secondary structure content from multiple sequence alignments about as well as circular dichroism spectroscopy on a single protein and classifies 75% of the 250 proteins correctly into one of four protein structural classes. Of particular practical importance is the definition of a position-specific reliability index. For 40% of all residues the method has a sustained three-state accuracy of 88%, as high as the overall average for homology modelling. A further strength of the method is greatly increased accuracy in predicting the placement of secondary structure segments. © 1994 Wiley-Liss, Inc.  相似文献   
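
Several abstracts in this list report three-state accuracy (Q3). As a reminder of what that figure measures, here is a minimal sketch; the function name `q3` and the use of the one-letter codes H, E and C are assumptions for illustration, not taken from any of the papers.

```python
def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state (H, E or C) matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)
```

For example, `q3("HHHEEC", "HHHECC")` counts five matching positions out of six. Q3 is a per-residue measure, so it says nothing about whether whole segments are placed correctly.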

8.
Evaluation and improvements in the automatic alignment of protein sequences   Cited by: 6 (self-citations: 0, citations by others: 6)
The accuracy of protein sequence alignment obtained by applying a commonly used global sequence comparison algorithm is assessed. Alignments based on the superposition of the three-dimensional structures are used as a standard for testing the automatic, sequence-based methods. Alignments obtained from the global comparison of five pairs of homologous protein sequences studied gave 54% agreement overall for residues in secondary structures. The inclusion of information about the secondary structure of one of the proteins in order to limit the number of gaps inserted in regions of secondary structure, improved this figure to 68%. A similarity score of greater than six standard deviation units suggests that an alignment which is greater than 75% correct within secondary structural regions can be obtained automatically for the pair of sequences.  相似文献   

9.
10.
11.
Simultaneous modeling of multiple loops in proteins.   Cited by: 1 (self-citations: 1, citations by others: 0)
The most reliable methods for predicting protein structure are by way of homologous extension, using structural information from a closely related protein, or by "threading" through a set of predefined protein folds ("inverse folding"). Both sets of methods provide a model for the core of the protein, that is, the structurally conserved secondary structures. Due to the large variability both in sequence and size of the loops that connect these secondary structures, they generally cannot be modeled using these techniques. Loop-closure algorithms are aimed at predicting loop structures, given their end-to-end distance. Various such algorithms have been described, and all have been tested by predicting the structure of a single loop in a known protein. In this paper we propose a method, which is based on the bond-scaling-relaxation loop-closure algorithm, for simultaneously predicting the structures of multiple loops, and demonstrate that, for two spatially close loops, simultaneous closure invariably leads to more accurate predictions than sequential closure. The accuracy of the predictions obtained for pairs of loops in the size range of 5-7 residues each is comparable to that obtained by other methods when predicting the structures of single loops: the RMS deviations from the native conformations of various test cases modeled are approximately 0.6-1.7 Å for backbone atoms and 1.1-3.3 Å for all atoms.

12.
Davis AR  Znosko BM 《Biochemistry》2008,47(38):10178-10187
Due to their prevalence and roles in biological systems, single mismatches adjacent to G-U pairs are important RNA structural elements. Since there are only limited experimental values for the stability of single mismatches adjacent to G-U pairs, current algorithms using free energy minimization to predict RNA secondary structure from sequence assign predicted thermodynamic values to these types of single mismatches. Here, thermodynamic data are reported for frequently occurring single mismatches adjacent to at least one G-U pair. This experimental data can be used in place of predicted thermodynamic values in algorithms that predict secondary structure from sequence using free energy minimization. When predicting the thermodynamic contributions of previously unmeasured single mismatches, most algorithms apply the same thermodynamic penalty for an A-U pair adjacent to a single mismatch and a G-U pair adjacent to a single mismatch. A recent study, however, suggests that the penalty for a G-U pair adjacent to a tandem mismatch should be 1.2 +/- 0.1 kcal/mol, and the penalty for an A-U pair adjacent to a tandem mismatch should be 0.5 +/- 0.2 kcal/mol [Christiansen, M. E. and Znosko, B. M. (2008) Biochemistry 47, 4329-4336]. Therefore, the data reported here are combined with the existing thermodynamic dataset of single mismatches, and nearest neighbor parameters are derived for an A-U pair adjacent to a single mismatch (1.1 +/- 0.1 kcal/mol) and a G-U pair adjacent to a single mismatch (1.4 +/- 0.1 kcal/mol).  相似文献   

13.
MOTIVATION: Disulfide bonds are primary covalent crosslinks between two cysteine residues in proteins that play critical roles in stabilizing protein structures and are commonly found in extracytoplasmic or secreted proteins. In protein folding prediction, the localization of disulfide bonds can greatly reduce the search in conformational space. Therefore, there is a great need to develop computational methods capable of accurately predicting disulfide connectivity patterns in proteins that could have potentially important applications. RESULTS: We have developed a novel method to predict disulfide connectivity patterns from protein primary sequence, using a support vector regression (SVR) approach based on multiple sequence feature vectors and secondary structure predicted by the PSIPRED program. The results indicate that our method could achieve a prediction accuracy of 74.4% and 77.9%, measured at the protein and cysteine-pair levels respectively, when averaged over proteins with two to five disulfide bridges using 4-fold cross-validation on a well-defined non-homologous dataset. We assessed the effects of different sequence encoding schemes on the prediction performance of disulfide connectivity. It has been shown that the sequence encoding scheme based on multiple sequence feature vectors coupled with predicted secondary structure can significantly improve the prediction accuracy, thus enabling our method to outperform most other currently available predictors. Our work provides a complementary approach to the current algorithms that should be useful in computationally assigning disulfide connectivity patterns and helps in the annotation of protein sequences generated by large-scale whole-genome projects. AVAILABILITY: The prediction web server and Supplementary Material are accessible at http://foo.maths.uq.edu.au/~huber/disulfide

14.
A method is described to construct sets of decoy models that can be used to generate a background score distribution for protein structure comparison. The models are derived directly from the two proteins being compared and retain all the essential properties of the structures, including length, density, shape and secondary structure composition but have different folds. As each comparison involves a pair of proteins of the same length, no explicit normalisation is required to adjust for the length of the proteins being compared. This allows substructure (or domain) matches to score almost equally to the comparison of isolated domains. A normalised probability measure was derived that allows joint family/family comparison. The method was applied to some of the CASP6 models for targets with new folds.  相似文献   

15.
A complete set of nearest neighbor parameters to predict the enthalpy change of RNA secondary structure formation was derived. These parameters can be used with available free energy nearest neighbor parameters to extend the secondary structure prediction of RNA sequences to temperatures other than 37°C. The parameters were tested by predicting the secondary structures of sequences with known secondary structure that are from organisms with known optimal growth temperatures. Compared with the previous set of enthalpy nearest neighbor parameters, the sensitivity of base pair prediction improved from 65.2 to 68.9% at optimal growth temperatures ranging from 10 to 60°C. Base pair probabilities were predicted with a partition function and the positive predictive value of structure prediction is 90.4% when considering the base pairs in the lowest free energy structure with pairing probability of 0.99 or above. Moreover, a strong correlation is found between the predicted melting temperatures of RNA sequences and the optimal growth temperatures of the host organism. This indicates that organisms that live at higher temperatures have evolved RNA sequences with higher melting temperatures.  相似文献   
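
The way enthalpy parameters extend 37°C free energies to other temperatures can be sketched as follows. This is a minimal illustration under the common assumption that the enthalpy and entropy changes are independent of temperature, so that ΔG°(T) = ΔH° − TΔS° with ΔS° = (ΔH° − ΔG°37)/310.15 K; the function name and the numbers in the usage note are hypothetical and are not parameters from the paper.

```python
T37_KELVIN = 310.15  # 37 degrees Celsius expressed in kelvin


def free_energy_at(delta_g37: float, delta_h: float, temp_celsius: float) -> float:
    """Estimate the folding free energy (kcal/mol) at temp_celsius.

    delta_g37: free energy change at 37 C (kcal/mol)
    delta_h:   enthalpy change (kcal/mol), assumed temperature independent
    """
    temp_kelvin = temp_celsius + 273.15
    # Entropy change implied by the 37 C free energy and the enthalpy, in kcal/(mol K).
    delta_s = (delta_h - delta_g37) / T37_KELVIN
    return delta_h - temp_kelvin * delta_s
```

With hypothetical values ΔG°37 = -2.0 kcal/mol and ΔH° = -10.0 kcal/mol, the function returns -2.0 kcal/mol at 37°C by construction and a less negative value at 60°C, reflecting reduced stability at higher temperature.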

16.
MOTIVATION: A large body of evidence suggests that protein structural information is frequently encoded in local sequences; sequence-structure relationships derived from local structure/sequence analyses could significantly enhance the capacities of protein structure prediction methods. In this paper, the prediction capacity of a database (LSBSP2) that organizes local sequence-structure relationships encoded in local structures with two consecutive secondary structure elements is tested with two computational procedures for protein structure prediction. The goal is twofold: to test the folding hypothesis that local structures are determined by local sequences, and to enhance our capacity in predicting protein structures from their amino acid sequences. RESULTS: The LSBSP2 database contains a large set of sequence profiles derived from exhaustive pair-wise structural alignments for local structures with two consecutive secondary structure elements. One computational procedure makes use of the PSI-BLAST alignment program to predict local structures for test sequence fragments by matching them onto the sequence profiles in the LSBSP2 database. The results show that 54% of the test sequence fragments were predicted with local structures that match closely with their native local structures. The other computational procedure is a filter system that is capable of removing as many false positives as possible from a set of PSI-BLAST hits. An assessment with a large set of non-redundant protein structures shows that the PSI-BLAST + filter system improves the prediction specificity by up to two-fold over the prediction specificity of the PSI-BLAST program for distantly related protein pairs. Tests with the two computational procedures above demonstrate that local sequence-structure relationships can indeed enhance our capacity in protein structure prediction. The results also indicate that local sequences encoded with strong local structure propensities play an important role in determining the native state folding topology.

17.
Exponential growth in the number of available protein sequences is unmatched by the slower growth in the number of structures. As a result, the development of efficient and fast protein secondary structure prediction methods is essential for the broad comprehension of protein structures. Computational methods that can efficiently determine secondary structure can in turn facilitate protein tertiary structure prediction, since most methods rely initially on secondary structure predictions. Recently, we have developed a fast learning optimized prediction methodology (FLOPRED) for predicting protein secondary structure (Saraswathi et al. in JMM 18:4275, 2012). Data are generated by using knowledge-based potentials combined with structure information from the CATH database. A neural network-based extreme learning machine (ELM) and advanced particle swarm optimization (PSO) are used with this data to obtain better and faster convergence to more accurate secondary structure predicted results. A five-fold cross-validated testing accuracy of 83.8 % and a segment overlap (SOV) score of 78.3 % are obtained in this study. Secondary structure predictions and their accuracy are usually presented for three secondary structure elements: α-helix, β-strand and coil but rarely have the results been analyzed with respect to their constituent amino acids. In this paper, we use the results obtained with FLOPRED to provide detailed behaviors for different amino acid types in the secondary structure prediction. We investigate the influence of the composition, physico-chemical properties and position specific occurrence preferences of amino acids within secondary structure elements. In addition, we identify the correlation between these properties and prediction accuracy. The present detailed results suggest several important ways that secondary structure predictions can be improved in the future that might lead to improved protein design and engineering.  相似文献   

18.
19.
A pair of neural network-based algorithms is presented for predicting the tertiary structural class and the secondary structure of proteins. Each algorithm realizes improvements in accuracy based on information provided by the other. Structural class prediction of proteins nonhomologous to any in the training set is improved significantly, from 62.3% to 73.9%, and secondary structure prediction accuracy improves slightly, from 62.26% to 62.64%. A number of aspects of neural network optimization and testing are examined. They include network overtraining and an output filter based on a rolling average. Secondary structure prediction results vary greatly depending on the particular proteins chosen for the training and test sets; consequently, an appropriate measure of accuracy reflects the more unbiased approach of “jackknife” cross-validation (testing each protein in the database individually).  相似文献   
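
The "output filter based on a rolling average" mentioned in this abstract can be pictured with a short sketch. The abstract does not give the window size or how sequence ends are handled, so the centred window of three and the truncated windows at the ends are assumptions, as is the function name `rolling_average`.

```python
from typing import List, Sequence


def rolling_average(values: Sequence[float], window: int = 3) -> List[float]:
    """Smooth per-residue network outputs with a centred rolling average.

    Windows are truncated at the sequence ends (an assumption of this sketch).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed: List[float] = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed
```

Applied to the raw output for one state along a sequence, the filter damps isolated single-residue spikes, which tends to suppress implausibly short predicted segments.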

20.
A novel algorithm is proposed for predicting transmembrane protein secondary structure from two-dimensional vector trajectories consisting of a hydropathy index and formal charge of a test amino acid sequence using stochastic dynamical system models. Two prediction problems are discussed. One is the prediction of transmembrane region counts; another is that of transmembrane regions, i.e. predicting whether or not each amino acid belongs to a transmembrane region. The prediction accuracies, using a collection of well-characterized transmembrane protein sequences and benchmarking sequences, suggest that the proposed algorithm performs reasonably well. An experiment was performed with a glutamate transporter homologue from Pyrococcus horikoshii. The predicted transmembrane regions of the five human glutamate transporter sequences and observations based on the computed likelihood are reported.  相似文献   
