首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 9 毫秒
1.
We have modified and improved the GOR algorithm for the protein secondary structure prediction by using the evolutionary information provided by multiple sequence alignments, adding triplet statistics, and optimizing various parameters. We have expanded the database used to include the 513 non-redundant domains collected recently by Cuff and Barton (Proteins 1999;34:508-519; Proteins 2000;40:502-511). We have introduced a variable size window that allowed us to include sequences as short as 20-30 residues. A significant improvement over the previous versions of GOR algorithm was obtained by combining the PSI-BLAST multiple sequence alignments with the GOR method. The new algorithm will form the basis for the future GOR V release on an online prediction server. The average accuracy of the prediction of secondary structure with multiple sequence alignment and full jack-knife procedure was 73.5%. The accuracy of the prediction increases to 74.2% by limiting the prediction to 375 (of 513) sequences having at least 50 PSI-BLAST alignments. The average accuracy of the prediction of the new improved program without using multiple sequence alignments was 67.5%. This is approximately a 3% improvement over the preceding GOR IV algorithm (Garnier J, Gibrat JF, Robson B. Methods Enzymol 1996;266:540-553; Kloczkowski A, Ting K-L, Jernigan RL, Garnier J. Polymer 2002;43:441-449). We have discussed alternatives to the segment overlap (Sov) coefficient proposed by Zemla et al. (Proteins 1999;34:220-223).  相似文献   

2.
Using evolutionary information contained in multiple sequence alignments as input to neural networks, secondary structure can be predicted at significantly increased accuracy. Here, we extend our previous three-level system of neural networks by using additional input information derived from multiple alignments. Using a position-specific conservation weight as part of the input increases performance. Using the number of insertions and deletions reduces the tendency for overprediction and increases overall accuracy. Addition of the global amino acid content yields a further improvement, mainly in predicting structural class. The final network system has a sustained overall accuracy of 71.6% in a multiple cross-validation test on 126 unique protein chains. A test on a new set of 124 recently solved protein structures that have no significant sequence similarity to the learning set confirms the high level of accuracy. The average cross-validated accuracy for all 250 sequence-unique chains is above 72%. Using various data sets, the method is compared to alternative prediction methods, some of which also use multiple alignments: the performance advantage of the network system is at least 6 percentage points in three-state accuracy. In addition, the network estimates secondary structure content from multiple sequence alignments about as well as circular dichroism spectroscopy on a single protein and classifies 75% of the 250 proteins correctly into one of four protein structural classes. Of particular practical importance is the definition of a position-specific reliability index. For 40% of all residues the method has a sustained three-state accuracy of 88%, as high as the overall average for homology modelling. A further strength of the method is greatly increased accuracy in predicting the placement of secondary structure segments. © 1994 Wiley-Liss, Inc.  相似文献   

3.

Background  

We present a novel method of protein fold decoy discrimination using machine learning, more specifically using neural networks. Here, decoy discrimination is represented as a machine learning problem, where neural networks are used to learn the native-like features of protein structures using a set of positive and negative training examples. A set of native protein structures provides the positive training examples, while negative training examples are simulated decoy structures obtained by reversing the sequences of native structures. Various features are extracted from the training dataset of positive and negative examples and used as inputs to the neural networks.  相似文献   

4.
Nair R  Rost B 《Proteins》2003,53(4):917-930
The native sub-cellular compartment of a protein is one aspect of its function. Thus, predicting localization is an important step toward predicting function. Short zip code-like sequence fragments regulate some of the shuttling between compartments. Cataloguing and predicting such motifs is the most accurate means of determining localization in silico. However, only few motifs are currently known, and not all the trafficking appears regulated in this way. The amino acid composition of a protein correlates with its localization. All general prediction methods employed this observation. Here, we explored the evolutionary information contained in multiple alignments and aspects of protein structure to predict localization in absence of homology and targeting motifs. Our final system combined statistical rules and a variety of neural networks to achieve an overall four-state accuracy above 65%, a significant improvement over systems using only composition. The system was at its best for extra-cellular and nuclear proteins; it was significantly less accurate than TargetP for mitochondrial proteins. Interestingly, all methods that were developed on SWISS-PROT sequences failed grossly when fed with sequences from proteins of known structures taken from PDB. We therefore developed two separate systems: one for proteins of known structure and one for proteins of unknown structure. Finally, we applied the PDB-based system along with homology-based inferences and automatic text analysis to annotate all eukaryotic proteins in the PDB (http://cubic.bioc.columbia.edu/db/LOC3D). We imagine that this pilot method-certainly in combination with similar tools-may be valuable target selection in structural genomics.  相似文献   

5.
The pattern of residue substitution in divergently evolving families of globular proteins is highly variable. At each position in a fold there are constraints on the identities of amino acids from both the three-dimensional structure and the function of the protein. To characterize and quantify the structural constraints, we have made a comparative analysis of families of homologous globular proteins. Residues are classified according to amino acid type, secondary structure, accessibility of the sidechain, and existence of hydrogen bonds from sidechain to other sidechains or peptide carbonyl or amide functions. There are distinct patterns of substitution especially where residues are both solvent inaccessible and hydrogen bonded through their sidechains. The patterns of residue substitution can be used to construct templates or to identify 'key' residues if one or more structures are known. Conversely, analysis of conversation and substitution across a large family of aligned sequences in terms of substitution profiles can allow prediction of tertiary environment or indicate a functional role. Similar analyses can be used to test the validity of putative structures if several homologous sequences are available.  相似文献   

6.
Wu S  Zhang Y 《Nucleic acids research》2007,35(10):3375-3382
We developed LOMETS, a local threading meta-server, for quick and automated predictions of protein tertiary structures and spatial constraints. Nine state-of-the-art threading programs are installed and run in a local computer cluster, which ensure the quick generation of initial threading alignments compared with traditional remote-server-based meta-servers. Consensus models are generated from the top predictions of the component-threading servers, which are at least 7% more accurate than the best individual servers based on TM-score at a t-test significance level of 0.1%. Moreover, side-chain and C-alpha (C(alpha)) contacts of 42 and 61% accuracy respectively, as well as long- and short-range distant maps, are automatically constructed from the threading alignments. These data can be easily used as constraints to guide the ab initio procedures such as TASSER for further protein tertiary structure modeling. The LOMETS server is freely available to the academic community at http://zhang.bioinformatics.ku.edu/LOMETS.  相似文献   

7.

Background  

The prediction of the secondary structure of proteins is one of the most studied problems in bioinformatics. Despite their success in many problems of biological sequence analysis, Hidden Markov Models (HMMs) have not been used much for this problem, as the complexity of the task makes manual design of HMMs difficult. Therefore, we have developed a method for evolving the structure of HMMs automatically, using Genetic Algorithms (GAs).  相似文献   

8.
Three-dimensional protein structures can be described with a library of 3D fragments that define a structural alphabet. We have previously proposed such an alphabet, composed of 16 patterns of five consecutive amino acids, called Protein Blocks (PBs). These PBs have been used to describe protein backbones and to predict local structures from protein sequences. The Q16 prediction rate reaches 40.7% with an optimization procedure. This article examines two aspects of PBs. First, we determine the effect of the enlargement of databanks on their definition. The results show that the geometrical features of the different PBs are preserved (local RMSD value equal to 0.41 A on average) and sequence-structure specificities reinforced when databanks are enlarged. Second, we improve the methods for optimizing PB predictions from sequences, revisiting the optimization procedure and exploring different local prediction strategies. Use of a statistical optimization procedure for the sequence-local structure relation improves prediction accuracy by 8% (Q16 = 48.7%). Better recognition of repetitive structures occurs without losing the prediction efficiency of the other local folds. Adding secondary structure prediction improved the accuracy of Q16 by only 1%. An entropy index (Neq), strongly related to the RMSD value of the difference between predicted PBs and true local structures, is proposed to estimate prediction quality. The Neq is linearly correlated with the Q16 prediction rate distributions, computed for a large set of proteins. An "expected" prediction rate QE16 is deduced with a mean error of 5%.  相似文献   

9.
We present a model of amino acid sequence evolution based on a hidden Markov model that extends to transmembrane proteins previous methods that incorporate protein structural information into phylogenetics. Our model aims to give a better understanding of processes of molecular evolution and to extract structural information from multiple alignments of transmembrane sequences and use such information to improve phylogenetic analyses. This should be of value in phylogenetic studies of transmembrane proteins: for example, mitochondrial proteins have acquired a special importance in phylogenetics and are mostly transmembrane proteins. The improvement in fit to example data sets of our new model relative to less complex models of amino acid sequence evolution is statistically tested. To further illustrate the potential utility of our method, phylogeny estimation is performed on primate CCR5 receptor sequences, sequences of l and m subunits of the light reaction center in purple bacteria, guinea pig sequences with respect to lagomorph and rodent sequences of calcitonin receptor and K-substance receptor, and cetacean sequences of cytochrome b.  相似文献   

10.
The knowledge collated from the known protein structures has revealed that the proteins are usually folded into the four structural classes: all-α, all-β, α/β and α + β. A number of methods have been proposed to predict the protein's structural class from its primary structure; however, it has been observed that these methods fail or perform poorly in the cases of distantly related sequences. In this paper, we propose a new method for protein structural class prediction using low homology (twilight-zone) protein sequences dataset. Since protein structural class prediction is a typical classification problem, we have developed a Support Vector Machine (SVM)-based method for protein structural class prediction that uses features derived from the predicted secondary structure and predicted burial information of amino acid residues. The examination of different individual as well as feature combinations revealed that the combination of secondary structural content, secondary structural and solvent accessibility state frequencies of amino acids gave rise to the best leave-one-out cross-validation accuracy of ~81% which is comparable to the best accuracy reported in the literature so far.  相似文献   

11.
Dong Q  Wang X  Lin L  Wang Y 《Proteins》2008,72(1):163-172
In recent years, protein structure prediction using local structure information has made great progress. Many fragment libraries or structure alphabets have been developed. In this study, the entropies and correlations of local structures are first calculated. The results show that neighboring local structures are strongly correlated. Then, a dual-layer model has been designed for protein local structure prediction. The position-specific score matrix, generated by PSI-BLAST, is inputted to the first-layer classifier, whose output is further enhanced by a second-layer classifier. The neural network is selected as the classifier. Two structure alphabets are explored, which are represented in Cartesian coordinate space and in torsion angles space respectively. Testing on the nonredundant dataset shows that the dual-layer model is an efficient method for protein local structure prediction. The Q-scores are 0.456 and 0.585 for the two structure alphabets, which is a significant improvement in comparison with related works.  相似文献   

12.

Background  

Residue depth allows determining how deeply a given residue is buried, in contrast to the solvent accessibility that differentiates between buried and solvent-exposed residues. When compared with the solvent accessibility, the depth allows studying deep-level structures and functional sites, and formation of the protein folding nucleus. Accurate prediction of residue depth would provide valuable information for fold recognition, prediction of functional sites, and protein design.  相似文献   

13.
One key element in understanding the molecular machinery of the cell is to understand the structure and function of each protein encoded in the genome. A very successful means of inferring the structure or function of a previously unannotated protein is via sequence similarity with one or more proteins whose structure or function is already known. Toward this end, we propose a means of representing proteins using pairwise sequence similarity scores. This representation, combined with a discriminative classification algorithm known as the support vector machine (SVM), provides a powerful means of detecting subtle structural and evolutionary relationships among proteins. The algorithm, called SVM-pairwise, when tested on its ability to recognize previously unseen families from the SCOP database, yields significantly better performance than SVM-Fisher, profile HMMs, and PSI-BLAST.  相似文献   

14.

Background

Protein sequence profile-profile alignment is an important approach to recognizing remote homologs and generating accurate pairwise alignments. It plays an important role in protein sequence database search, protein structure prediction, protein function prediction, and phylogenetic analysis.

Results

In this work, we integrate predicted solvent accessibility, torsion angles and evolutionary residue coupling information with the pairwise Hidden Markov Model (HMM) based profile alignment method to improve profile-profile alignments. The evaluation results demonstrate that adding predicted relative solvent accessibility and torsion angle information improves the accuracy of profile-profile alignments. The evolutionary residue coupling information is helpful in some cases, but its contribution to the improvement is not consistent.

Conclusion

Incorporating the new structural information such as predicted solvent accessibility and torsion angles into the profile-profile alignment is a useful way to improve pairwise profile-profile alignment methods.  相似文献   

15.
MOTIVATION: Many important biological processes such as cell signaling, transport of membrane-impermeable molecules, cell-cell communication, cell recognition and cell adhesion are mediated by membrane proteins. Unfortunately, as these proteins are not water soluble, it is extremely hard to experimentally determine their structure. Therefore, improved methods for predicting the structure of these proteins are vital in biological research. In order to improve transmembrane topology prediction, we evaluate the combined use of both integrated signal peptide prediction and evolutionary information in a single algorithm. RESULTS: A new method (MEMSAT3) for predicting transmembrane protein topology from sequence profiles is described and benchmarked with full cross-validation on a standard data set of 184 transmembrane proteins. The method is found to predict both the correct topology and the locations of transmembrane segments for 80% of the test set. This compares with accuracies of 62-72% for other popular methods on the same benchmark. By using a second neural network specifically to discriminate transmembrane from globular proteins, a very low overall false positive rate (0.5%) can also be achieved in detecting transmembrane proteins. AVAILABILITY: An implementation of the described method is available both as a web server (http://www.psipred.net) and as downloadable source code from http://bioinf.cs.ucl.ac.uk/memsat. Both the server and source code files are free to non-commercial users. Benchmark and training data are also available from http://bioinf.cs.ucl.ac.uk/memsat.  相似文献   

16.
One major problem with the existing algorithm for the prediction of protein structural classes is low accuracies for proteins from α/β and α+β classes. In this study, three novel features were rationally designed to model the differences between proteins from these two classes. In combination with other rational designed features, an 11-dimensional vector prediction method was proposed. By means of this method, the overall prediction accuracy based on 25PDB dataset was 1.5% higher than the previous best-performing method, MODAS. Furthermore, the prediction accuracy for proteins from α+β class based on 25PDB dataset was 5% higher than the previous best-performing method, SCPRED. The prediction accuracies obtained with the D675 and FC699 datasets were also improved.  相似文献   

17.
Hering JA  Innocent PR  Haris PI 《Proteomics》2003,3(8):1464-1475
Fourier transform infrared (FTIR) spectroscopy is a very flexible technique for characterization of protein secondary structure. Measurements can be carried out rapidly in a number of different environments based on only small quantities of proteins. For this technique to become more widely used for protein secondary structure characterization, however, further developments in methods to accurately quantify protein secondary structure are necessary. Here we propose a structural classification of proteins (SCOP) class specialized neural networks architecture combining an adaptive neuro-fuzzy inference system (ANFIS) with SCOP class specialized backpropagation neural networks for improved protein secondary structure prediction. Our study shows that proteins can be accurately classified into two main classes "all alpha proteins" and "all beta proteins" merely based on the amide I band maximum position of their FTIR spectra. ANFIS is employed to perform the classification task to demonstrate the potential of this architecture with moderately complex problems. Based on studies using a reference set of 17 proteins and an evaluation set of 4 proteins, improved predictions were achieved compared to a conventional neural network approach, where structure specialized neural networks are trained based on protein spectra of both "all alpha" and "all beta" proteins. The standard errors of prediction (SEPs) in % structure were improved by 4.05% for helix structure, by 5.91% for sheet structure, by 2.68% for turn structure, and by 2.15% for bend structure. For other structure, an increase of SEP by 2.43% was observed. Those results were confirmed by a "leave-one-out" run with the combined set of 21 FTIR spectra of proteins.  相似文献   

18.
New methods, essentially based on hidden Markov models (HMM) and neural networks (NN), can predict the topography of both beta-barrel and all-alpha membrane proteins with high accuracy and a low rate of false positives and false negatives. These methods have been integrated in a suite of programs to filter proteomes of Gram-negative bacteria, searching for new membrane proteins.  相似文献   

19.
Knowledge of three dimensional structure is essential to understand the function of a protein. Although the overall fold is made from the whole details of its sequence, a small group of residues, often called as structural motifs, play a crucial role in determining the protein fold and its stability. Identification of such structural motifs requires sufficient number of sequence and structural homologs to define conservation and evolutionary information. Unfortunately, there are many structures in the protein structure databases have no homologous structures or sequences. In this work, we report an SVM method, SMpred, to identify structural motifs from single protein structure without using sequence and structural homologs. SMpred method was trained and tested using 132 proteins domains containing 581 motifs. SMpred method achieved 78.79% accuracy with 79.06% sensitivity and 78.53% specificity. The performance of SMpred was evaluated with MegaMotifBase using 188 proteins containing 1161 motifs. Out of 1161 motifs, SMpred correctly identified 1503 structural motifs reported in MegaMotifBase. Further, we showed that SMpred is useful approach for the length deviant superfamilies and single member superfamilies. This result suggests the usefulness of our approach for facilitating the identification of structural motifs in protein structure in the absence of sequence and structural homologs. The dataset and executable for the SMpred algorithm is available at http://www3.ntu.edu.sg/home/EPNSugan/index_files/SMpred.htm.  相似文献   

20.
MOTIVATION: In our previous approach, we proposed a hybrid method for protein secondary structure prediction called HYPROSP, which combined our proposed knowledge-based prediction algorithm PROSP and PSIPRED. The knowledge base constructed for PROSP contains small peptides together with their secondary structural information. The hybrid strategy of HYPROSP uses a global quantitative measure, match rate, to determine whether PROSP or PSIPRED is to be used for the prediction of a target protein. HYPROSP made slight improvement of Q(3) over PSIPRED because PROSP predicted well for proteins with match rate >80%. As the portion of proteins with match rate >80% is quite small and as the performance of PSIPRED also improves, the advantage of HYPROSP is diluted. To overcome this limitation and further improve the hybrid prediction method, we present in this paper a new hybrid strategy HYPROSP II that is based on a new quantitative measure called local match rate. RESULTS: Local match rate indicates the amount of structural information that each amino acid can extract from the knowledge base. With the local match rate, we are able to define a confidence level of the PROSP prediction results for each amino acid. Our new hybrid approach, HYPROSP II, is proposed as follows: for each amino acid in a target protein, we combine the prediction results of PROSP and PSIPRED using a hybrid function defined on their respective confidence levels. Two datasets in nrDSSP and EVA are used to perform a 10-fold cross validation. The average Q(3) of HYPROSP II is 81.8% and 80.7% on nrDSSP and EVA datasets, respectively, which is 2.0% and 1.1% better than that of PSIPRED. For local structures with match rate >80%, the average Q(3) improvement is 4.4% on the nrDSSP dataset. The use of local match rate improves the accuracy better than global match rate. There has been a long history of attempts to improve secondary structure prediction. We believe that HYPROSP II has greatly utilized the power of peptide knowledge base and raised the prediction accuracy to a new high. The method we developed in this paper could have a profound effect on the general use of knowledge base techniques for various predictionalgorithms. AVAILABILITY: The Linux executable file of HYPROSP II, as well as both nrDSSP and EVA datasets can be downloaded from http://bioinformatics.iis.sinica.edu.tw/HYPROSPII/.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号