Similar literature
Found 20 similar documents; search took 46 ms
1.
2.
Alpha-helical transmembrane proteins (αTMPs) represent roughly 30% of all open reading frames (ORFs) in a typical genome and are involved in many critical biological processes. Because of their special physicochemical properties, they are hard to crystallize and high-resolution structures are difficult to obtain experimentally; sequence-based topology prediction is therefore highly desirable for the study of transmembrane proteins (TMPs), in both structure prediction and function prediction. Various model-based topology prediction methods have been developed, but the accuracy of the individual predictors remains poor because of limitations in the methods or in the features they use. Consensus topology prediction, which combines the strengths of the individual predictors, is therefore a practical route to high-accuracy applications. Here, based on the observation that inter-helical interactions are commonly found within transmembrane helices (TMHs) and strongly indicate their existence, we present a novel consensus topology prediction method for αTMPs, CNTOP, which incorporates four leading individual topology predictors and further improves prediction accuracy by using predicted inter-helical interactions. The method achieved 87% prediction accuracy on a benchmark dataset and 78% accuracy on a non-redundant dataset composed of polytopic αTMPs. Our method achieves higher topology accuracy than any other individual or consensus predictor; at the same time, the lengths and locations of the TMHs are predicted more accurately, with dramatically fewer false positives (FPs) and false negatives (FNs). CNTOP is available at: http://ccst.jlu.edu.cn/JCSB/cntop/CNTOP.html.

3.
Global protein function prediction from protein-protein interaction networks   Cited by: 20 (self-citations: 0, citations by others: 20)
Determining protein function is one of the most challenging problems of the post-genomic era. The availability of entire genome sequences and of high-throughput capabilities to determine gene coexpression patterns has shifted the research focus from the study of single proteins or small complexes to that of the entire proteome. In this context, the search for reliable methods for assigning protein function is of primary importance. There are various approaches available for deducing the function of proteins of unknown function using information derived from sequence similarity or clustering patterns of co-regulated genes, phylogenetic profiles, protein-protein interactions (refs. 5-8 and Samanta, M.P. and Liang, S., unpublished data), and protein complexes. Here we propose the assignment of proteins to functional classes on the basis of their network of physical interactions as determined by minimizing the number of protein interactions among different functional categories. Function assignment is proteome-wide and is determined by the global connectivity pattern of the protein network. The approach results in multiple functional assignments, a consequence of the existence of multiple equivalent solutions. We apply the method to analyze the yeast Saccharomyces cerevisiae protein-protein interaction network. The robustness of the approach is tested in a system containing a high percentage of unclassified proteins and also in cases of deletion and insertion of specific protein interactions.
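The objective described in the abstract above (minimizing the number of interactions that link proteins in different functional categories) can be illustrated with a small sketch. The protein names, the categories, and the brute-force search are assumptions made for illustration only; the abstract does not say which optimizer the authors used, and exhaustive enumeration is only feasible for toy networks.

```python
from itertools import product

def cross_category_edges(edges, labels):
    """Count interactions whose two proteins carry different functional labels."""
    return sum(1 for a, b in edges if labels[a] != labels[b])

def best_assignments(edges, known, unknown, categories):
    """Try every labeling of the unannotated proteins (brute force, toy sizes only)
    and return the minimum cost together with all labelings that reach it."""
    best_cost, best = None, []
    for combo in product(categories, repeat=len(unknown)):
        labels = dict(known)
        labels.update(zip(unknown, combo))
        cost = cross_category_edges(edges, labels)
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, [dict(zip(unknown, combo))]
        elif cost == best_cost:
            best.append(dict(zip(unknown, combo)))
    return best_cost, best

# Hypothetical toy network: X is the only unannotated protein.
edges = [("A", "B"), ("B", "X"), ("X", "C"), ("C", "D")]
known = {"A": "transport", "B": "transport", "C": "metabolism", "D": "metabolism"}
cost, solutions = best_assignments(edges, known, ["X"], ["transport", "metabolism"])
print(cost, solutions)
```

In this toy case both labels for X give the same cost, which mirrors the abstract's remark that multiple equivalent solutions lead to multiple functional assignments.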

4.
Background

Similarity-based computational methods are a useful tool for predicting protein functions from protein–protein interaction (PPI) datasets. Although various similarity-based prediction algorithms have been proposed, their prediction results have often been unsatisfactory. Such algorithms predict the functions of an unannotated protein from the functions of proteins that are similar to it. The prediction quality therefore depends largely on how the set of proteins from which the functions are predicted (i.e., the prediction domain) is selected, and on how the similarity between proteins is measured. Another issue is that existing algorithms treat function prediction as a one-off procedure, ignoring the fact that interactions among proteins are mutual and dynamic in terms of similarity when predicting functions. Resolving these issues to increase prediction quality remains a challenge in computational biology.

Results

In this paper, we propose an innovative approach that predicts the functions of unannotated proteins iteratively from a PPI dataset. The iterative approach takes into account the mutual and dynamic features of protein interactions when predicting functions, and addresses protein similarity measurement and prediction domain selection by introducing into the prediction algorithm a new semantic protein similarity and a method for selecting a multi-layer prediction domain. The new protein similarity is based on the multi-layered information carried by protein functions. Evaluations on real protein interaction datasets showed that the proposed iterative function prediction method outperformed other similar or non-iterative methods and provided better prediction results.

Conclusions

The new protein similarity, derived from multi-layered information on protein functions, reflects the intrinsic relationships among proteins more reasonably, and incorporating the mutual and dynamic features of protein interactions into the prediction algorithm can significantly improve prediction quality.

5.
6.
7.
The function of a protein is primarily dictated by its structure. It is therefore logical to look for functional clues in the protein's overall three-dimensional fold, or global structure. In this paper, we have developed a novel Support Vector Machine (SVM) based model for the functional classification and prediction of proteins, using features extracted from their global structure by means of fragment libraries. Fragment libraries have previously been used for ab initio modelling of proteins and for protein structure comparison. The query protein structure is broken down into a collection of short contiguous backbone fragments, and this collection is discretized using a library of fragments. The input feature vector is a frequency vector that counts the occurrences of each library fragment in the collection, determined by all-to-all fragment comparisons. SVM models were trained and optimised to obtain the best 10-fold cross-validation (CV) accuracy for classification. As an example, the method was applied to the prediction and classification of cell adhesion molecules (CAMs). Thirty-four fragment libraries, with sizes ranging from 4 to 400 and fragment lengths ranging from 4 to 12, were used to find the best prediction model. The best 10-fold CV accuracy, 95.25%, was obtained for a library of 400 fragments of length 10. An accuracy of 87.5% was obtained on an unseen test dataset consisting of 20 CAMs and 20 non-CAMs. This shows that protein structure can be accurately and uniquely described using 400 representative fragments of length 10.

8.
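Starting from protein sequences, protein function is predicted using Encoding Based on Grouped Weight (EBGW) combined with the nearest-neighbor algorithm. Predictions were made for 1826 sequences of yeast (Saccharomyces cerevisiae) proteins, and the overall prediction accuracy was comparable to that of other protein function prediction methods based on sequence information. The experimental results show that the new method based on the EBGW encoding scheme can be applied effectively to protein function prediction.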

9.
《Genomics》2020,112(2):1941-1946
In this paper, a step-by-step classification algorithm based on a double-layer SVM model is constructed to predict the secondary structure of proteins. The most important feature of this algorithm is that it improves the prediction accuracy for the α+β and α/β classes by transforming the prediction of these two classes, which has had low accuracy in the past, into the prediction of the all-α and all-β classes, which has high accuracy. A widely used dataset, the 25PDB dataset with sequence similarity lower than 40%, is used to evaluate the method. The results show that the method performs well: while maintaining the accuracy for the other three structural classes, it significantly improves the accuracy for α+β class proteins.

10.
Li Z  Zhou X  Dai Z  Zou X 《Amino acids》2012,43(2):793-804
The coupling between G protein-coupled receptors (GPCRs) and guanine nucleotide-binding proteins (G proteins) regulates various signal transductions from the extracellular space into the cell. However, the coupling mechanism between GPCRs and G proteins is still unknown, and experimental determination of their coupling specificity and function is both expensive and time-consuming. It is therefore important to develop a theoretical method that predicts the coupling specificity between GPCRs and G proteins, as well as their function, from their primary sequences. In this study, a novel four-layer predictor (GPCRsG_CWTIT) based on support vector machine (SVM), continuous wavelet transform (CWT) and information theory (IT) is developed to classify G proteins and predict the coupling specificity between GPCRs and G proteins. SVM is used to construct the models. CWT and IT are used to characterize the primary structure of the proteins. The performance of GPCRsG_CWTIT is evaluated with cross-validation tests on various working datasets. The overall accuracy for G proteins at the class and family levels is 98.23% and 85.42%, respectively. The accuracy of the coupling specificity prediction varies from 74.60% to 94.30%. These results indicate that the proposed predictor is an effective and feasible tool for predicting the coupling specificity between GPCRs and G proteins, as well as their functions, using only the full protein sequence. The establishment of such an accurate prediction method will facilitate drug discovery by improving the ability to identify and predict protein-protein interactions. GPCRsG_CWTIT and the dataset are freely available from the authors on request.

11.
Development of novel statistical potentials for protein fold recognition   Cited by: 5 (self-citations: 0, citations by others: 5)
The need to perform large-scale studies of protein fold recognition, structure prediction and protein-protein interactions has led to novel developments of residue-level minimal models of proteins. A minimum requirement for useful protein force-fields is that they be successful in the recognition of native conformations. The balance between the level of detail in describing the specific interactions within proteins and the accuracy obtained using minimal protein models is the focus of many current protein studies. Recent results suggest that the introduction of explicit orientation dependence in a coarse-grained, residue-level model improves the ability of inter-residue potentials to recognize the native state. New statistical and optimization computational algorithms can be used to obtain accurate residue-dependent potentials for use in protein fold recognition and, more importantly, structure prediction.

12.
The increasing number and diversity of protein sequence families requires new methods to define and predict details regarding function. Here, we present a method for analysis and prediction of functional sub-types from multiple protein sequence alignments. Given an alignment and set of proteins grouped into sub-types according to some definition of function, such as enzymatic specificity, the method identifies positions that are indicative of functional differences by comparison of sub-type specific sequence profiles, and analysis of positional entropy in the alignment. Alignment positions with significantly high positional relative entropy correlate with those known to be involved in defining sub-types for nucleotidyl cyclases, protein kinases, lactate/malate dehydrogenases and trypsin-like serine proteases. We highlight new positions for these proteins that suggest additional experiments to elucidate the basis of specificity. The method is also able to predict sub-type for unclassified sequences. We assess several variations on a prediction method, and compare them to simple sequence comparisons. For assessment, we remove close homologues to the sequence for which a prediction is to be made (by a sequence identity above a threshold). This simulates situations where a protein is known to belong to a protein family, but is not a close relative of another protein of known sub-type. Considering the four families above, and a sequence identity threshold of 30 %, our best method gives an accuracy of 96 % compared to 80 % obtained for sequence similarity and 74 % for BLAST. We describe the derivation of a set of sub-type groupings derived from an automated parsing of alignments from PFAM and the SWISSPROT database, and use this to perform a large-scale assessment. The best method gives an average accuracy of 94 % compared to 68 % for sequence similarity and 79 % for BLAST. 
We discuss implications for experimental design, genome annotation and the prediction of protein function and protein intra-residue distances.
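The idea of scoring alignment positions by the relative entropy between sub-type specific profiles, as described in the abstract above, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the pseudocount smoothing, the choice of comparing one sub-type against the remaining sequences, and the toy alignment are all assumptions.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def column_distribution(column, pseudocount=1.0):
    """Residue frequencies for one alignment column, smoothed with a pseudocount
    (an assumption here, used to avoid zero probabilities)."""
    counts = Counter(c for c in column if c in ALPHABET)
    total = sum(counts.values()) + pseudocount * len(ALPHABET)
    return {aa: (counts[aa] + pseudocount) / total for aa in ALPHABET}

def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(p[aa] * math.log2(p[aa] / q[aa]) for aa in ALPHABET)

def positional_relative_entropy(subtype_seqs, other_seqs):
    """Per-column relative entropy of one sub-type against the other sequences."""
    scores = []
    for sub_col, other_col in zip(zip(*subtype_seqs), zip(*other_seqs)):
        p = column_distribution(sub_col)
        q = column_distribution(other_col)
        scores.append(relative_entropy(p, q))
    return scores

# Hypothetical aligned sequences: the sub-types differ only at column 1 (D vs E).
subtype_a = ["GDKA", "GDRA", "GDKA"]
subtype_b = ["GEKA", "GERA", "GEKA"]
scores = positional_relative_entropy(subtype_a, subtype_b)
print([round(s, 3) for s in scores])
```

Columns where both groups have the same residue distribution score zero, while the column that separates the two groups receives the highest score, which is the kind of position the abstract describes as indicative of functional differences.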

13.
14.
Prediction of the β-Hairpins in Proteins Using Support Vector Machine   Cited by: 1 (self-citations: 0, citations by others: 1)
Hu XZ  Li QZ 《The protein journal》2008,27(2):115-122
A support vector machine (SVM) algorithm for predicting β-hairpin motifs is proposed, in which sequence information is expressed as a composite vector built from the increment of diversity and a scoring function. The prediction is performed on a dataset of 3,088 non-homologous proteins containing 6,027 β-hairpins. The overall prediction accuracy and Matthews correlation coefficient are 79.9% and 0.59, respectively, for the independent testing dataset. In addition, a higher accuracy of 83.3% and a Matthews correlation coefficient of 0.67 are obtained for the independent testing dataset on a dataset previously used by Kumar et al. (Nucleic Acids Res 33:154–159). The performance of the method is also evaluated by predicting the β-hairpins in the CASP6 proteins, where better results are obtained. Moreover, the method is used to predict four kinds of supersecondary structures, with an overall prediction accuracy of 64.5% for the independent testing dataset.
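The abstract above reports both overall accuracy and the Matthews correlation coefficient (MCC). As a reminder of how the two measures are computed from a binary confusion matrix, here is a short sketch; the counts are hypothetical and are not taken from the paper.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when the denominator is zero (a common convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical counts for a balanced hairpin / non-hairpin test set.
tp, tn, fp, fn = 80, 80, 20, 20
acc = accuracy(tp, tn, fp, fn)
mcc = matthews_corrcoef(tp, tn, fp, fn)
print(acc, mcc)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix and stays informative when the two classes are of unequal size, which is one reason it is often reported alongside accuracy.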

15.
We demonstrate the applicability of our previously developed Bayesian probabilistic approach for predicting residue solvent accessibility to the problem of predicting secondary structure. Using only single-sequence data, this method achieves a three-state accuracy of 67% over a database of 473 non-homologous proteins. This approach is more amenable to inspection and less likely to overlearn specifics of a dataset than "black box" methods such as neural networks. It is also conceptually simpler and less computationally costly. We also introduce a novel method for representing and incorporating multiple-sequence alignment information within the prediction algorithm, achieving 72% accuracy over a dataset of 304 non-homologous proteins. This is accomplished by creating a statistical model of the evolutionarily derived correlations between patterns of amino acid substitution and local protein structure. This model consists of parameter vectors, termed "substitution schemata," which probabilistically encode the structure-based heterogeneity in the distributions of amino acid substitutions found in alignments of homologous proteins. The model is optimized for structure prediction by maximizing the mutual information between the set of schemata and the database of secondary structures. Unlike "expert heuristic" methods, this approach has been demonstrated to work well over large datasets. Unlike the opaque neural network algorithms, this approach is physicochemically intelligible. Moreover, the model optimization procedure, the formalism for predicting one-dimensional structural features and our previously developed method for tertiary structure recognition all share a common Bayesian probabilistic basis. This consistency starkly contrasts with the hybrid and ad hoc nature of methods that have dominated this field in recent years.

16.

Background

Accurate annotation of protein functions remains a major challenge for understanding life in the post-genomic era. Many computational methods based on protein-protein interaction (PPI) networks have been proposed to predict the function of proteins. However, the precision of these predictions still needs to be improved because PPI networks are incomplete and noisy. Integrating network topology and biological information could improve the accuracy of protein function prediction and may also lead to the discovery of multiple interaction types between proteins. Current algorithms generate a single network, which is obtained as a weighted sum of all types of protein interactions.

Method

Different types of interactions do not influence the prediction of protein functions equally. To address this, we construct multilayer protein networks (MPN) by integrating PPI networks, protein domains, and information on protein complexes. In the MPN, a pair of proteins can be linked by more than one type of connection. Different types of connections play different roles and carry different importance in protein function prediction. Based on the MPN, we propose a new protein function prediction method, named function prediction based on multilayer protein networks (FP-MPN). Given an unannotated protein, the FP-MPN method visits each layer of the MPN in turn and generates a set of candidate neighbors with known functions. A set of predicted functions for the testing protein is then formed, and all of these functions are scored and sorted. Each layer contributes with a different importance to the prediction of protein functions. A number of top-ranking functions are selected to annotate the unknown protein.

Conclusions

On Saccharomyces cerevisiae protein data, the method proposed in this paper was a better predictor than previously used function prediction methods. The proposed FP-MPN method takes the different roles of connections in protein function prediction into account and reduces artificial noise by introducing biological information.
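The layer-by-layer procedure outlined in the Method paragraph of this entry (visit each layer, collect annotated neighbors, score and rank the candidate functions) can be sketched roughly as below. The scoring rule, the per-layer weights, and all names are assumptions introduced for illustration; the abstract does not specify how FP-MPN actually scores functions.

```python
from collections import defaultdict

def score_functions(protein, layers, annotations, layer_weights):
    """Visit each layer, collect annotated neighbors of `protein`, and add the
    layer's weight to every function those neighbors carry (hypothetical rule)."""
    scores = defaultdict(float)
    for name, adjacency in layers.items():
        for neighbor in adjacency.get(protein, ()):
            for function in annotations.get(neighbor, ()):
                scores[function] += layer_weights[name]
    # Highest score first; ties broken alphabetically for a stable ranking.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical three-layer network around an unannotated protein "P".
layers = {
    "ppi": {"P": ["A", "B"]},
    "domain": {"P": ["B"]},
    "complex": {"P": ["C"]},
}
annotations = {"A": {"transport"}, "B": {"kinase"}, "C": {"transport"}}
layer_weights = {"ppi": 1.0, "domain": 0.5, "complex": 2.0}  # assumed values

ranked = score_functions("P", layers, annotations, layer_weights)
top_functions = [function for function, _ in ranked[:1]]
print(ranked, top_functions)
```

Keeping the layers separate until scoring, rather than merging them into one weighted network first, is the design point the entry emphasizes; the weights here only stand in for whatever importance the method assigns to each layer.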

17.
18.
Yuan Z  Burrage K  Mattick JS 《Proteins》2002,48(3):566-570
A Support Vector Machine learning system has been trained to predict protein solvent accessibility from the primary structure. Different kernel functions and sliding window sizes have been explored to find how they affect the prediction performance. Using a cut-off threshold of 15% that splits the dataset evenly (an equal number of exposed and buried residues), this method was able to achieve a prediction accuracy of 70.1% for single sequence input and 73.9% for multiple alignment sequence input, respectively. The prediction of three and more states of solvent accessibility was also studied and compared with other methods. The prediction accuracies are better than, or comparable to, those obtained by other methods such as neural networks, Bayesian classification, multiple linear regression, and information theory. In addition, our results further suggest that this system may be combined with other prediction methods to achieve more reliable results, and that the Support Vector Machine method is a very useful tool for biological sequence analysis.
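The sliding-window input and the 15% two-state cut-off mentioned in the abstract above can be sketched as a feature-preparation step. The one-hot encoding, the window size of 5, the zero padding at the sequence ends, and the treatment of values exactly at the cut-off are assumptions for illustration; the sequence and accessibility values are made up, and the SVM training itself is omitted.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def encode_window(sequence, center, window=5):
    """One-hot encode the residues in a window centered on `center`.
    Positions beyond either end of the sequence are encoded as all zeros."""
    half = window // 2
    features = []
    for pos in range(center - half, center + half + 1):
        one_hot = [0] * len(ALPHABET)
        if 0 <= pos < len(sequence) and sequence[pos] in ALPHABET:
            one_hot[ALPHABET.index(sequence[pos])] = 1
        features.extend(one_hot)
    return features

def binarize_accessibility(relative_accessibility, cutoff=15.0):
    """Label residues: 1 = exposed (relative accessibility above cutoff %), 0 = buried."""
    return [1 if value > cutoff else 0 for value in relative_accessibility]

sequence = "MKTAYIAK"                                    # hypothetical sequence
rsa = [62.0, 40.5, 12.0, 3.1, 18.7, 9.9, 15.0, 55.3]     # hypothetical values (%)

X = [encode_window(sequence, i) for i in range(len(sequence))]
y = binarize_accessibility(rsa)
print(len(X), len(X[0]), y)
```

Each residue yields one feature vector of length window × 20 and one label, so `X` and `y` could be passed to any SVM implementation; changing `window` corresponds to the window-size exploration the abstract describes.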

19.
A graph-theory algorithm for rapid protein side-chain prediction   Cited by: 19 (self-citations: 0, citations by others: 19)
Fast and accurate side-chain conformation prediction is important for homology modeling, ab initio protein structure prediction, and protein design applications. Many methods have been presented, although only a few computer programs are publicly available. The SCWRL program is one such method and is widely used because of its speed, accuracy, and ease of use. A new algorithm for SCWRL is presented that uses results from graph theory to solve the combinatorial problem encountered in the side-chain prediction problem. In this method, side chains are represented as vertices in an undirected graph. Any two residues that have rotamers with nonzero interaction energies are considered to have an edge in the graph. The resulting graph can be partitioned into connected subgraphs with no edges between them. These subgraphs can in turn be broken into biconnected components, which are graphs that cannot be disconnected by removal of a single vertex. The combinatorial problem is reduced to finding the minimum energy of these small biconnected components and combining the results to identify the global minimum energy conformation. This algorithm is able to complete predictions on a set of 180 proteins with 34342 side chains in <7 min of computer time. The total chi(1) and chi(1 + 2) dihedral angle accuracies are 82.6% and 73.7% using a simple energy function based on the backbone-dependent rotamer library and a linear repulsive steric energy. The new algorithm will allow for use of SCWRL in more demanding applications such as sequence design and ab initio structure prediction, as well as the addition of a more complex energy function and conformational flexibility, leading to increased accuracy.
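The first decomposition step described in the abstract above, splitting the residue interaction graph into connected subgraphs that can be solved independently, can be sketched with a breadth-first search. This is an illustrative sketch, not SCWRL code: the residue numbers and interacting pairs are hypothetical, and the further split into biconnected components and the energy minimization are not shown.

```python
from collections import defaultdict, deque

def connected_components(vertices, edges):
    """Partition an undirected graph into connected components using BFS."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in vertices:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            v = queue.popleft()
            component.append(v)
            for w in sorted(adjacency[v]):
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        components.append(sorted(component))
    return components

# Hypothetical residues; an edge means some rotamer pair has nonzero interaction energy.
residues = [10, 11, 12, 40, 41, 77]
interacting_pairs = [(10, 11), (11, 12), (40, 41)]
components = connected_components(residues, interacting_pairs)
print(components)
```

Because no edges run between components, the minimum-energy rotamers of each component can be found separately and then combined, and an isolated residue such as 77 can simply take its own lowest-energy rotamer.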

20.
Hu LL  Huang T  Cai YD  Chou KC 《PloS one》2011,6(7):e22989
Determining the body fluids into which proteins can be secreted is important for protein function annotation and disease biomarker discovery. In this study, we developed a network-based method to predict the kinds of body fluids into which human proteins can be secreted. For a newly constructed benchmark dataset consisting of 529 human secreted proteins, the accuracy of the most likely body fluid location predicted by our method, evaluated with the jackknife test, was 79.02%, significantly higher than the success rate of a random guess (29.36%). The likelihood that the four top-ranked predicted body fluids contain all the true body fluids into which a protein can be secreted is 62.94%. Our method was further demonstrated on two independent datasets: one contains 57 proteins that can be secreted into blood, while the other contains 61 proteins that can be secreted into plasma/serum and are possible biomarkers associated with various cancers. Of the 57 proteins in the first dataset, 55 were correctly predicted as blood-secreted proteins. Of the 61 proteins in the second dataset, 58 were predicted to be most likely found in plasma/serum. These encouraging results indicate that the network-based prediction method is quite promising. It is anticipated that the method will benefit the relevant areas of both basic research and drug development.
