首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 265 毫秒
1.

Background

The past few years have seen a rapid development in novel high-throughput technologies that have created large-scale data on protein-protein interactions (PPI) across human and most model species. This data is commonly represented as networks, with nodes representing proteins and edges representing the PPIs. A fundamental challenge to bioinformatics is how to interpret this wealth of data to elucidate the interaction of patterns and the biological characteristics of the proteins. One significant purpose of this interpretation is to predict unknown protein functions. Although many approaches have been proposed in recent years, the challenge still remains how to reasonably and precisely measure the functional similarities between proteins to improve the prediction effectiveness.

Results

We used a Semantic and Layered Protein Function Prediction (SLPFP) framework to more effectively predict unknown protein functions at different functional levels. The framework relies on a new protein similarity measurement and a clustering-based protein function prediction algorithm. The new protein similarity measurement incorporates the topological structure of the PPI network, as well as the protein’s semantic information in terms of known protein functions at different functional layers. Experiments on real PPI datasets were conducted to evaluate the effectiveness of the proposed framework in predicting unknown protein functions.

Conclusion

The proposed framework has a higher prediction accuracy compared with other similar approaches. The prediction results are stable even for a large number of proteins. Furthermore, the framework is able to predict unknown functions at different functional layers within the Munich Information Center for Protein Sequence (MIPS) hierarchical functional scheme. The experimental results demonstrated that the new protein similarity measurement reflects more reasonably and precisely relationships between proteins.  相似文献   

2.
BackgroundSimilarity based computational methods are a useful tool for predicting protein functions from protein–protein interaction (PPI) datasets. Although various similarity-based prediction algorithms have been proposed, unsatisfactory prediction results have occurred on many occasions. The purpose of this type of algorithm is to predict functions of an unannotated protein from the functions of those proteins that are similar to the unannotated protein. Therefore, the prediction quality largely depends on how to select a set of proper proteins (i.e., a prediction domain) from which the functions of an unannotated protein are predicted, and how to measure the similarity between proteins. Another issue with existing algorithms is they only believe the function prediction is a one-off procedure, ignoring the fact that interactions amongst proteins are mutual and dynamic in terms of similarity when predicting functions. How to resolve these major issues to increase prediction quality remains a challenge in computational biology.ResultsIn this paper, we propose an innovative approach to predict protein functions of unannotated proteins iteratively from a PPI dataset. The iterative approach takes into account the mutual and dynamic features of protein interactions when predicting functions, and addresses the issues of protein similarity measurement and prediction domain selection by introducing into the prediction algorithm a new semantic protein similarity and a method of selecting the multi-layer prediction domain. The new protein similarity is based on the multi-layered information carried by protein functions. The evaluations conducted on real protein interaction datasets demonstrated that the proposed iterative function prediction method outperformed other similar or non-iterative methods, and provided better prediction results.ConclusionsThe new protein similarity derived from multi-layered information of protein functions more reasonably reflects the intrinsic relationships among proteins, and significant improvement to the prediction quality can occur through incorporation of mutual and dynamic features of protein interactions into the prediction algorithm.  相似文献   

3.
Dramatic improvements in high throughput sequencing technologies have led to a staggering growth in the number of predicted genes. However, a large fraction of these newly discovered genes do not have a functional assignment. Fortunately, a variety of novel high-throughput genome-wide functional screening technologies provide important clues that shed light on gene function. The integration of heterogeneous data to predict protein function has been shown to improve the accuracy of automated gene annotation systems. In this paper, we propose and evaluate a probabilistic approach for protein function prediction that integrates protein-protein interaction (PPI) data, gene expression data, protein motif information, mutant phenotype data, and protein localization data. First, functional linkage graphs are constructed from PPI data and gene expression data, in which an edge between nodes (proteins) represents evidence for functional similarity. The assumption here is that graph neighbors are more likely to share protein function, compared to proteins that are not neighbors. The functional linkage graph model is then used in concert with protein domain, mutant phenotype and protein localization data to produce a functional prediction. Our method is applied to the functional prediction of Saccharomyces cerevisiae genes, using Gene Ontology (GO) terms as the basis of our annotation. In a cross validation study we show that the integrated model increases recall by 18%, compared to using PPI data alone at the 50% precision. We also show that the integrated predictor is significantly better than each individual predictor. However, the observed improvement vs. PPI depends on both the new source of data and the functional category to be predicted. Surprisingly, in some contexts integration hurts overall prediction accuracy. Lastly, we provide a comprehensive assignment of putative GO terms to 463 proteins that currently have no assigned function.  相似文献   

4.
In recent years, significant effort has been given to predicting protein functions from protein interaction data generated from high throughput techniques. However, predicting protein functions correctly and reliably still remains a challenge. Recently, many computational methods have been proposed for predicting protein functions. Among these methods, clustering based methods are the most promising. The existing methods, however, mainly focus on protein relationship modeling and the prediction algorithms that statically predict functions from the clusters that are related to the unannotated proteins. In fact, the clustering itself is a dynamic process and the function prediction should take this dynamic feature of clustering into consideration. Unfortunately, this dynamic feature of clustering is ignored in the existing prediction methods. In this paper, we propose an innovative progressive clustering based prediction method to trace the functions of relevant annotated proteins across all clusters that are generated through the progressive clustering of proteins. A set of prediction criteria is proposed to predict functions of unannotated proteins from all relevant clusters and traced functions. The method was evaluated on real protein interaction datasets and the results demonstrated the effectiveness of the proposed method compared with representative existing methods.  相似文献   

5.

Background

Effectively predicting protein complexes not only helps to understand the structures and functions of proteins and their complexes, but also is useful for diagnosing disease and developing new drugs. Up to now, many methods have been developed to detect complexes by mining dense subgraphs from static protein-protein interaction (PPI) networks, while ignoring the value of other biological information and the dynamic properties of cellular systems.

Results

In this paper, based on our previous works CPredictor and CPredictor2.0, we present a new method for predicting complexes from PPI networks with both gene expression data and protein functional annotations, which is called CPredictor3.0. This new method follows the viewpoint that proteins in the same complex should roughly have similar functions and are active at the same time and place in cellular systems. We first detect active proteins by using gene express data of different time points and cluster proteins by using gene ontology (GO) functional annotations, respectively. Then, for each time point, we do set intersections with one set corresponding to active proteins generated from expression data and the other set corresponding to a protein cluster generated from functional annotations. Each resulting unique set indicates a cluster of proteins that have similar function(s) and are active at that time point. Following that, we map each cluster of active proteins of similar function onto a static PPI network, and get a series of induced connected subgraphs. We treat these subgraphs as candidate complexes. Finally, by expanding and merging these candidate complexes, the predicted complexes are obtained.We evaluate CPredictor3.0 and compare it with a number of existing methods on several PPI networks and benchmarking complex datasets. The experimental results show that CPredictor3.0 achieves the highest F1-measure, which indicates that CPredictor3.0 outperforms these existing method in overall.

Conclusion

CPredictor3.0 can serve as a promising tool of protein complex prediction.
  相似文献   

6.
The use of high-throughput techniques to generate large volumes of protein-protein interaction (PPI) data has increased the need for methods that systematically and automatically suggest functional relationships among proteins. In a yeast PPI network, previous work has shown that the local connection topology, particularly for two proteins sharing an unusually large number of neighbors, can predict functional association. In this study we improved the prediction scheme by developing a new algorithm and applied it on a human PPI network to make a genome-wide functional inference. We used the new algorithm to measure and reduce the influence of hub proteins on detecting function-associated protein pairs. We used the annotations of the Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) as benchmarks to compare and evaluate the function relevance. The application of our algorithms to human PPI data yielded 4,233 significant functional associations among 1,754 proteins. Further functional comparisons between them allowed us to assign 466 KEGG pathway annotations to 274 proteins and 123 GO annotations to 114 proteins with estimated false discovery rates of <21% for KEGG and <30% for GO. We clustered 1,729 proteins by their functional associations and made functional inferences from detailed analysis on one subcluster highly enriched in the TGF-β signaling pathway (P<10−50). Analysis of another four subclusters also suggested potential new players in six signaling pathways worthy of further experimental investigations. Our study gives clear insight into the common neighbor-based prediction scheme and provides a reliable method for large-scale functional annotation in this post-genomic era.  相似文献   

7.
He J  Gu H  Liu W 《PloS one》2012,7(6):e37155
It is well known that an important step toward understanding the functions of a protein is to determine its subcellular location. Although numerous prediction algorithms have been developed, most of them typically focused on the proteins with only one location. In recent years, researchers have begun to pay attention to the subcellular localization prediction of the proteins with multiple sites. However, almost all the existing approaches have failed to take into account the correlations among the locations caused by the proteins with multiple sites, which may be the important information for improving the prediction accuracy of the proteins with multiple sites. In this paper, a new algorithm which can effectively exploit the correlations among the locations is proposed by using gaussian process model. Besides, the algorithm also can realize optimal linear combination of various feature extraction technologies and could be robust to the imbalanced data set. Experimental results on a human protein data set show that the proposed algorithm is valid and can achieve better performance than the existing approaches.  相似文献   

8.

Background

Accurate annotation of protein functions is still a big challenge for understanding life in the post-genomic era. Many computational methods based on protein-protein interaction (PPI) networks have been proposed to predict the function of proteins. However, the precision of these predictions still needs to be improved, due to the incompletion and noise in PPI networks. Integrating network topology and biological information could improve the accuracy of protein function prediction and may also lead to the discovery of multiple interaction types between proteins. Current algorithms generate a single network, which is archived using a weighted sum of all types of protein interactions.

Method

The influences of different types of interactions on the prediction of protein functions are not the same. To address this, we construct multilayer protein networks (MPN) by integrating PPI networks, the domain of proteins, and information on protein complexes. In the MPN, there is more than one type of connections between pairwise proteins. Different types of connections reflect different roles and importance in protein function prediction. Based on the MPN, we propose a new protein function prediction method, named function prediction based on multilayer protein networks (FP-MPN). Given an un-annotated protein, the FP-MPN method visits each layer of the MPN in turn and generates a set of candidate neighbors with known functions. A set of predicted functions for the testing protein is then formed and all of these functions are scored and sorted. Each layer plays different importance on the prediction of protein functions. A number of top-ranking functions are selected to annotate the unknown protein.

Conclusions

The method proposed in this paper was a better predictor when used on Saccharomyces cerevisiae protein data than other function prediction methods previously used. The proposed FP-MPN method takes different roles of connections in protein function prediction into account to reduce the artificial noise by introducing biological information.
  相似文献   

9.
Xie  Minzhu  Lei  Xiaowen  Zhong  Jianchen  Ouyang  Jianxing  Li  Guijing 《BMC bioinformatics》2022,23(8):1-13
Background

Essential proteins are indispensable to the development and survival of cells. The identification of essential proteins not only is helpful for the understanding of the minimal requirements for cell survival, but also has practical significance in disease diagnosis, drug design and medical treatment. With the rapidly amassing of protein–protein interaction (PPI) data, computationally identifying essential proteins from protein–protein interaction networks (PINs) becomes more and more popular. Up to now, a number of various approaches for essential protein identification based on PINs have been developed.

Results

In this paper, we propose a new and effective approach called iMEPP to identify essential proteins from PINs by fusing multiple types of biological data and applying the influence maximization mechanism to the PINs. Concretely, we first integrate PPI data, gene expression data and Gene Ontology to construct weighted PINs, to alleviate the impact of high false-positives in the raw PPI data. Then, we define the influence scores of nodes in PINs with both orthological data and PIN topological information. Finally, we develop an influence discount algorithm to identify essential proteins based on the influence maximization mechanism.

Conclusions

We applied our method to identifying essential proteins from saccharomyces cerevisiae PIN. Experiments show that our iMEPP method outperforms the existing methods, which validates its effectiveness and advantage.

  相似文献   

10.
Assigning biological functions to uncharacterized proteins is a fundamental problem in the postgenomic era. The increasing availability of large amounts of data on protein-protein interactions (PPIs) has led to the emergence of a considerable number of computational methods for determining protein function in the context of a network. These algorithms, however, treat each functional class in isolation and thereby often suffer from the difficulty of the scarcity of labeled data. In reality, different functional classes are naturally dependent on one another. We propose a new algorithm, Multi-label Correlated Semi-supervised Learning (MCSL), to incorporate the intrinsic correlations among functional classes into protein function prediction by leveraging the relationships provided by the PPI network and the functional class network. The guiding intuition is that the classification function should be sufficiently smooth on subgraphs where the respective topologies of these two networks are a good match. We encode this intuition as regularized learning with intraclass and interclass consistency, which can be understood as an extension of the graph-based learning with local and global consistency (LGC) method. Cross validation on the yeast proteome illustrates that MCSL consistently outperforms several state-of-the-art methods. Most notably, it effectively overcomes the problem associated with scarcity of label data. The supplementary files are freely available at http://sites.google.com/site/csaijiang/MCSL.  相似文献   

11.
Protein docking methods are powerful computational tools to study protein-protein interactions (PPI). While a significant number of docking algorithms have been developed, they are usually based on rigid protein models or with limited considerations of protein flexibility and the desolvation effect is rarely considered in docking energy functions, which may lower the accuracy of the predictions. To address these issues, we introduce a PPI energy function based on the site-identification by ligand competitive saturation (SILCS) framework and utilize the fast Fourier transform (FFT) correlation approach. The free energy content of the SILCS FragMaps represent an alternative to traditional energy grids and they can be efficiently utilized to guide FFT-based protein docking. Application of the approach to eight diverse test cases, including seven from Protein Docking Benchmark 5.0, showed the PPI prediction using SILCS approach (SILCS-PPI) to be competitive with several commonly used protein docking methods indicating that the method has the ability to both qualitatively and quantitatively inform the prediction of PPI. Results show the utility of the SILCS-PPI docking approach for determination of probability distributions of PPI interactions over the surface of both partner proteins, allowing for identification of alternate binding poses. Such binding poses are confirmed by experimental crystal contacts in our test cases. While more computationally demanding than available PPI docking technologies, we anticipate that the SILCS-PPI docking approach will offer an alternative methodology for improved evaluation of PPIs that could be used in a variety of fields from systems biology to excipient design for biologics-based drugs.  相似文献   

12.
Protein–protein interaction (PPI) establishes the central basis for complex cellular networks in a biological cell. Association of proteins with other proteins occurs at varying affinities, yet with a high degree of specificity. PPIs lead to diverse functionality such as catalysis, regulation, signaling, immunity, and inhibition, playing a crucial role in functional genomics. The molecular principle of such interactions is often elusive in nature. Therefore, a comprehensive analysis of known protein complexes from the Protein Data Bank (PDB) is essential for the characterization of structural interface features to determine structure–function relationship. Thus, we analyzed a nonredundant dataset of 278 heterodimer protein complexes, categorized into major functional classes, for distinguishing features. Interestingly, our analysis has identified five key features (interface area, interface polar residue abundance, hydrogen bonds, solvation free energy gain from interface formation, and binding energy) that are discriminatory among the functional classes using Kruskal-Wallis rank sum test. Significant correlations between these PPI interface features amongst functional categories are also documented. Salt bridges correlate with interface area in regulator-inhibitors (r = 0.75). These representative features have implications for the prediction of potential function of novel protein complexes. The results provide molecular insights for better understanding of PPIs and their relation to biological functions.  相似文献   

13.
Kim Y  Min B  Yi GS 《Proteome science》2012,10(Z1):S9

Background

Deciphering protein-protein interaction (PPI) in domain level enriches valuable information about binding mechanism and functional role of interacting proteins. The 3D structures of complex proteins are reliable source of domain-domain interaction (DDI) but the number of proven structures is very limited. Several resources for the computationally predicted DDI have been generated but they are scattered in various places and their prediction show erratic performances. A well-organized PPI and DDI analysis system integrating these data with fair scoring system is necessary.

Method

We integrated three structure-based DDI datasets and twenty computationally predicted DDI datasets and constructed an interaction analysis system, named IDDI, which enables to browse protein and domain interactions with their relationships. To integrate heterogeneous DDI information, a novel scoring scheme is introduced to determine the reliability of DDI by considering the prediction scores of each DDI and the confidence levels of each prediction method in the datasets, and independencies between predicted datasets. In addition, we connected this DDI information to the comprehensive PPI information and developed a unified interface for the interaction analysis exploring interaction networks at both protein and domain level.

Result

IDDI provides 204,705 DDIs among total 7,351 Pfam domains in the current version. The result presents that total number of DDIs is increased eight times more than that of previous studies. Due to the increment of data, 50.4% of PPIs could be correlated with DDIs which is more than twice of previous resources. Newly designed scoring scheme outperformed the previous system in its accuracy too. User interface of IDDI system provides interactive investigation of proteins and domains in interactions with interconnected way. A specific example is presented to show the efficiency of the systems to acquire the comprehensive information of target protein with PPI and DDI relationships. IDDI is freely available at http://pcode.kaist.ac.kr/iddi/.
  相似文献   

14.

Background  

Genome sequencing projects generate massive amounts of sequence data but there are still many proteins whose functions remain unknown. The availability of large scale protein-protein interaction data sets makes it possible to develop new function prediction methods based on protein-protein interaction (PPI) networks. Although several existing methods combine multiple information resources, there is no study that integrates protein domain information and PPI networks to predict protein functions.  相似文献   

15.
The prediction of the network of protein-protein interactions (PPI) of an organism is crucial for the understanding of biological processes and for the development of new drugs. Machine learning methods have been successfully applied to the prediction of PPI in yeast by the integration of multiple direct and indirect biological data sources. However, experimental data are not available for most organisms. We propose here an ensemble machine learning approach for the prediction of PPI that depends solely on features independent from experimental data. We developed new estimators of the coevolution between proteins and combined them in an ensemble learning procedure.We applied this method to a dataset of known co-complexed proteins in Escherichia coli and compared it to previously published methods. We show that our method allows prediction of PPI with an unprecedented precision of 95.5% for the first 200 sorted pairs of proteins compared to 28.5% on the same dataset with the previous best method.A close inspection of the best predicted pairs allowed us to detect new or recently discovered interactions between chemotactic components, the flagellar apparatus and RNA polymerase complexes in E. coli.  相似文献   

16.
《Genomics》2020,112(1):174-183
Protein complexes are one of the most important functional units for deriving biological processes within the cell. Experimental methods have provided valuable data to infer protein complexes. However, these methods have inherent limitations. Considering these limitations, many computational methods have been proposed to predict protein complexes, in the last decade. Almost all of these in-silico methods predict protein complexes from the ever-increasing protein–protein interaction (PPI) data. These computational approaches usually use the PPI data in the format of a huge protein–protein interaction network (PPIN) as input and output various sub-networks of the given PPIN as the predicted protein complexes. Some of these methods have already reached a promising efficiency in protein complex detection. Nonetheless, there are challenges in prediction of other types of protein complexes, specially sparse and small ones. New methods should further incorporate the knowledge of biological properties of proteins to improve the performance. Additionally, there are several challenges that should be considered more effectively in designing the new complex prediction algorithms in the future. This article not only reviews the history of computational protein complex prediction but also provides new insight for improvement of new methodologies. In this article, most important computational methods for protein complex prediction are evaluated and compared. In addition, some of the challenges in the reconstruction of the protein complexes are discussed. Finally, various tools for protein complex prediction and PPIN analysis as well as the current high-throughput databases are reviewed.  相似文献   

17.
Using indirect protein-protein interactions for protein complex prediction   总被引:1,自引:0,他引:1  
Protein complexes are fundamental for understanding principles of cellular organizations. As the sizes of protein-protein interaction (PPI) networks are increasing, accurate and fast protein complex prediction from these PPI networks can serve as a guide for biological experiments to discover novel protein complexes. However, it is not easy to predict protein complexes from PPI networks, especially in situations where the PPI network is noisy and still incomplete. Here, we study the use of indirect interactions between level-2 neighbors (level-2 interactions) for protein complex prediction. We know from previous work that proteins which do not interact but share interaction partners (level-2 neighbors) often share biological functions. We have proposed a method in which all direct and indirect interactions are first weighted using topological weight (FS-Weight), which estimates the strength of functional association. Interactions with low weight are removed from the network, while level-2 interactions with high weight are introduced into the interaction network. Existing clustering algorithms can then be applied to this modified network. We have also proposed a novel algorithm that searches for cliques in the modified network, and merge cliques to form clusters using a "partial clique merging" method. Experiments show that (1) the use of indirect interactions and topological weight to augment protein-protein interactions can be used to improve the precision of clusters predicted by various existing clustering algorithms; and (2) our complex-finding algorithm performs very well on interaction networks modified in this way. Since no other information except the original PPI network is used, our approach would be very useful for protein complex prediction, especially for prediction of novel protein complexes.  相似文献   

18.
Protein–protein interactions (PPIs) play very important roles in many cellular processes, and provide rich information for discovering biological facts and knowledge. Although various experimental approaches have been developed to generate large amounts of PPI data for different organisms, high-throughput experimental data usually suffers from high error rates, and as a consequence, the biological knowledge discovered from this data is distorted or incorrect. Therefore, it is vital to assess the quality of protein interaction data and extract reliable protein interactions from the high-throughput experimental data. In this paper, we propose a new Semantic Reliability (SR) method to assess the reliability of each protein interaction and identify potential false-positive protein interactions in a dataset. For each pair of target interacting proteins, the SR method takes into account the semantic influence between proteins that interact with the target proteins, and the semantic influence between the target proteins themselves when assessing the interaction reliability. Evaluations on real protein interaction datasets demonstrated that our method outperformed other existing methods in terms of extracting more reliable interactions from original protein interaction datasets.  相似文献   

19.
We present a new computational method for predicting ligand binding residues and functional sites in protein sequences. These residues and sites tend to be not only conserved, but also exhibit strong correlation due to the selection pressure during evolution in order to maintain the required structure and/or function. To explore the effect of correlations among multiple positions in the sequences, the method uses graph theoretic clustering and kernel-based canonical correlation analysis (kCCA) to identify binding and functional sites in protein sequences as the residues that exhibit strong correlation between the residues’ evolutionary characterization at the sites and the structure-based functional classification of the proteins in the context of a functional family. The results of testing the method on two well-curated data sets show that the prediction accuracy as measured by Receiver Operating Characteristic (ROC) scores improves significantly when multipositional correlations are accounted for.  相似文献   

20.
Zhao XM  Wang Y  Chen L  Aihara K 《Proteins》2008,72(1):461-473
Domains are structural and functional units of proteins and play an important role in functional genomics. Theoretically, the functions of a protein can be directly inferred if the biological functions of its component domains are determined. Despite the important role that domains play, only a small number of domains have been annotated so far, and few works have been performed to predict the functions of domains. Hence, it is necessary to develop automatic methods for predicting domain functions based on various available data. In this article, two new methods, that is, the threshold-based classification method and the support vector machines method, are proposed for protein domain function prediction by integrating heterogeneous information sources, including protein-domain mapping features, domain-domain interactions, and domain coexisting features. We show that the integration of heterogeneous information sources improves not only prediction accuracy but also annotation reliability when compared with the methods using only individual information sources.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号