首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.

Background  

Genome scale data on protein interactions are generally represented as large networks, or graphs, where hundreds or thousands of proteins are linked to one another. Since proteins tend to function in groups, or complexes, an important goal has been to reliably identify protein complexes from these graphs. This task is commonly executed using clustering procedures, which aim at detecting densely connected regions within the interaction graphs. There exists a wealth of clustering algorithms, some of which have been applied to this problem. One of the most successful clustering procedures in this context has been the Markov Cluster algorithm (MCL), which was recently shown to outperform a number of other procedures, some of which were specifically designed for partitioning protein interactions graphs. A novel promising clustering procedure termed Affinity Propagation (AP) was recently shown to be particularly effective, and much faster than other methods for a variety of problems, but has not yet been applied to partition protein interaction graphs.  相似文献   

2.
Nesvizhskii AI 《Proteomics》2012,12(10):1639-1655
Analysis of protein interaction networks and protein complexes using affinity purification and mass spectrometry (AP/MS) is among most commonly used and successful applications of proteomics technologies. One of the foremost challenges of AP/MS data is a large number of false-positive protein interactions present in unfiltered data sets. Here we review computational and informatics strategies for detecting specific protein interaction partners in AP/MS experiments, with a focus on incomplete (as opposite to genome wide) interactome mapping studies. These strategies range from standard statistical approaches, to empirical scoring schemes optimized for a particular type of data, to advanced computational frameworks. The common denominator among these methods is the use of label-free quantitative information such as spectral counts or integrated peptide intensities that can be extracted from AP/MS data. We also discuss related issues such as combining multiple biological or technical replicates, and dealing with data generated using different tagging strategies. Computational approaches for benchmarking of scoring methods are discussed, and the need for generation of reference AP/MS data sets is highlighted. Finally, we discuss the possibility of more extended modeling of experimental AP/MS data, including integration with external information such as protein interaction predictions based on functional genomics data.  相似文献   

3.
Recent large-scale data sets of protein complex purifications have provided unprecedented insights into the organization of cellular protein complexes. Several computational methods have been developed to detect co-complexed proteins in these data sets. Their common aim is the identification of biologically relevant protein complexes. However, much less is known about the network of direct physical protein contacts within the detected protein complexes. Therefore, our work investigates whether direct physical contacts can be computationally derived by combining raw data of large-scale protein complex purifications. We assess four established scoring schemes and introduce a new scoring approach that is specifically devised to infer direct physical protein contacts from protein complex purifications. The physical contacts identified by the five methods are comprehensively benchmarked against different reference sets that provide evidence for true physical contacts. Our results show that raw purification data can indeed be exploited to determine high-confidence physical protein contacts within protein complexes. In particular, our new method outperforms competing approaches at discovering physical contacts involving proteins that have been screened multiple times in purification experiments. It also excels in the analysis of recent protein purification screens of molecular chaperones and protein kinases. In contrast to previous findings, we observe that physical contacts inferred from purification experiments of protein complexes can be qualitatively comparable to binary protein interactions measured by experimental high-throughput assays such as yeast two-hybrid. This suggests that computationally derived physical contacts might complement binary protein interaction assays and guide large-scale interactome mapping projects by prioritizing putative physical contacts for further experimental screens.  相似文献   

4.
5.
Lee AJ  Lin MC  Hsu CM 《Bio Systems》2011,103(3):392-399
Many methods have been proposed for mining protein complexes from a protein-protein interaction network; however, most of them focus on unweighted networks and cannot find overlapping protein complexes. Since one protein may serve different roles within different functional groups, mining overlapping protein complexes in a weighted protein-protein interaction network has attracted more and more attention recently. In this paper, we propose an effective method, called MDOS (Mining Dense Overlapping Subgraphs), for mining dense overlapping protein complexes (subgraphs) in a weighted protein-protein interaction network. The proposed method can integrate the information about known complexes into a weighted protein-protein interaction network to improve the mining results. The experiment results show that our method mines more known complexes and has higher sensitivity and accuracy than the CODENSE and MCL methods.  相似文献   

6.
Zaki N  Berengueres J  Efimov D 《Proteins》2012,80(10):2459-2468
Detecting protein complexes from protein‐protein interaction (PPI) network is becoming a difficult challenge in computational biology. There is ample evidence that many disease mechanisms involve protein complexes, and being able to predict these complexes is important to the characterization of the relevant disease for diagnostic and treatment purposes. This article introduces a novel method for detecting protein complexes from PPI by using a protein ranking algorithm (ProRank). ProRank quantifies the importance of each protein based on the interaction structure and the evolutionarily relationships between proteins in the network. A novel way of identifying essential proteins which are known for their critical role in mediating cellular processes and constructing protein complexes is proposed and analyzed. We evaluate the performance of ProRank using two PPI networks on two reference sets of protein complexes created from Munich Information Center for Protein Sequence, containing 81 and 162 known complexes, respectively. We compare the performance of ProRank to some of the well known protein complex prediction methods (ClusterONE, CMC, CFinder, MCL, MCode and Core) in terms of precision and recall. We show that ProRank predicts more complexes correctly at a competitive level of precision and recall. The level of the accuracy achieved using ProRank in comparison to other recent methods for detecting protein complexes is a strong argument in favor of the proposed method. Proteins 2012;. © 2012 Wiley Periodicals, Inc.  相似文献   

7.

Background  

Identifying all protein-protein interactions in an organism is a major objective of proteomics. A related goal is to know which protein pairs are present in the same protein complex. High-throughput methods such as yeast two-hybrid (Y2H) and affinity purification coupled with mass spectrometry (APMS) have been used to detect interacting proteins on a genomic scale. However, both Y2H and APMS methods have substantial false-positive rates. Aside from high-throughput interaction screens, other gene- or protein-pair characteristics may also be informative of physical interaction. Therefore it is desirable to integrate multiple datasets and utilize their different predictive value for more accurate prediction of co-complexed relationship.  相似文献   

8.
9.
Tu S  Chen R  Xu L 《Proteome science》2011,9(Z1):S18
BACKGROUND: Identifying biologically relevant protein complexes from a large protein-protein interaction (PPI) network, is essential to understand the organization of biological systems. However, high-throughput experimental techniques that can produce a large amount of PPIs are known to yield non-negligible rates of false-positives and false-negatives, making the protein complexes difficult to be identified. RESULTS: We propose a binary matrix factorization (BMF) algorithm under the Bayesian Ying-Yang (BYY) harmony learning, to detect protein complexes by clustering the proteins which share similar interactions through factorizing the binary adjacent matrix of a PPI network. The proposed BYY-BMF algorithm automatically determines the cluster number while this number is pre-given for most existing BMF algorithms. Also, BYY-BMF's clustering results does not depend on any parameters or thresholds, unlike the Markov Cluster Algorithm (MCL) that relies on a so-called inflation parameter. On synthetic PPI networks, the predictions evaluated by the known annotated complexes indicate that BYY-BMF is more robust than MCL for most cases. On real PPI networks from the MIPS and DIP databases, BYY-BMF obtains a better balanced prediction accuracies than MCL and a spectral analysis method, while MCL has its own advantages, e.g., with good separation values.  相似文献   

10.
Protein–protein interactions (PPIs) play crucial roles in a number of biological processes. Recently, protein interaction networks (PINs) for several model organisms and humans have been generated, but few large-scale researches for mice have ever been made neither experimentally nor computationally. In the work, we undertook an effort to map a mouse PIN, in which protein interactions are hidden in enormous amount of biomedical literatures. Following a co-occurrence-based text-mining approach, a probabilistic model—naïve Bayesian was used to filter false-positive interactions by integrating heterogeneous kinds of evidence from genomic and proteomic datasets. A support vector machine algorithm was further used to choose protein pairs with physical interactions. By comparing with the currently available PPI datasets from several model organisms and humans, it showed that the derived mouse PINs have similar topological properties at the global level, but a high local divergence. The mouse protein interaction dataset is stored in the Mouse protein–protein interaction DataBase (MppDB) that is useful source of information for system-level understanding of gene function and biological processes in mammals. Access to the MppDB database is public available at http://bio.scu.edu.cn/mppi.  相似文献   

11.
Martin O  Schomburg D 《Proteins》2008,70(4):1367-1378
Biological systems and processes rely on a complex network of molecular interactions. While the association of biological macromolecules is a fundamental biochemical phenomenon crucial for the understanding of complex living systems, protein-protein docking methods aim for the computational prediction of protein complexes from individual subunits. Docking algorithms generally produce large numbers of putative protein complexes with only few of these conformations resembling the native complex structure within an acceptable degree of structural similarity. A major challenge in the field of docking is to extract near-native structure(s) out of the large pool of solutions, the so called scoring or ranking problem. A series of structural, chemical, biological and physical properties are used in this work to classify docked protein-protein complexes. These properties include specialized energy functions, evolutionary relationship, class specific residue interface propensities, gap volume, buried surface area, empiric pair potentials on residue and atom level as well as measures for the tightness of fit. Efficient comprehensive scoring functions have been developed using probabilistic Support Vector Machines in combination with this array of properties on the largest currently available protein-protein docking benchmark. The established classifiers are shown to be specific for certain types of protein-protein complexes and are able to detect near-native complex conformations from large sets of decoys with high sensitivity. Using classification probabilities the ranking of near-native structures was drastically improved, leading to a significant enrichment of near-native complex conformations within the top ranks. It could be shown that the developed schemes outperform five other previously published scoring functions.  相似文献   

12.

Background  

Reversible phosphorylation of proteins is involved in a wide range of processes, ranging from signaling cascades to regulation of protein complex assembly. Little is known about the structure and evolution of phosphorylation networks. Recent high-throughput phosphoproteomics studies have resulted in the rapid accumulation of phosphopeptide datasets for many model organisms. Here, we exploit these novel data for the comparative analysis of phosphorylation events between different species of eukaryotes.  相似文献   

13.
Local network alignment is an important component of the analysis of protein-protein interaction networks that may lead to the identification of evolutionary related complexes. We present AlignNemo, a new algorithm that, given the networks of two organisms, uncovers subnetworks of proteins that relate in biological function and topology of interactions. The discovered conserved subnetworks have a general topology and need not to correspond to specific interaction patterns, so that they more closely fit the models of functional complexes proposed in the literature. The algorithm is able to handle sparse interaction data with an expansion process that at each step explores the local topology of the networks beyond the proteins directly interacting with the current solution. To assess the performance of AlignNemo, we ran a series of benchmarks using statistical measures as well as biological knowledge. Based on reference datasets of protein complexes, AlignNemo shows better performance than other methods in terms of both precision and recall. We show our solutions to be biologically sound using the concept of semantic similarity applied to Gene Ontology vocabularies. The binaries of AlignNemo and supplementary details about the algorithms and the experiments are available at: sourceforge.net/p/alignnemo.  相似文献   

14.
Recently, a number of advanced screening technologies have allowed for the comprehensive quantification of aggravating and alleviating genetic interactions among gene pairs. In parallel, TAP-MS studies (tandem affinity purification followed by mass spectroscopy) have been successful at identifying physical protein interactions that can indicate proteins participating in the same molecular complex. Here, we propose a method for the joint learning of protein complexes and their functional relationships by integration of quantitative genetic interactions and TAP-MS data. Using 3 independent benchmark datasets, we demonstrate that this method is >50% more accurate at identifying functionally related protein pairs than previous approaches. Application to genes involved in yeast chromosome organization identifies a functional map of 91 multimeric complexes, a number of which are novel or have been substantially expanded by addition of new subunits. Interestingly, we find that complexes that are enriched for aggravating genetic interactions (i.e., synthetic lethality) are more likely to contain essential genes, linking each of these interactions to an underlying mechanism. These results demonstrate the importance of both large-scale genetic and physical interaction data in mapping pathway architecture and function.  相似文献   

15.

Background  

Elucidating protein-protein interactions (PPIs) is essential to constructing protein interaction networks and facilitating our understanding of the general principles of biological systems. Previous studies have revealed that interacting protein pairs can be predicted by their primary structure. Most of these approaches have achieved satisfactory performance on datasets comprising equal number of interacting and non-interacting protein pairs. However, this ratio is highly unbalanced in nature, and these techniques have not been comprehensively evaluated with respect to the effect of the large number of non-interacting pairs in realistic datasets. Moreover, since highly unbalanced distributions usually lead to large datasets, more efficient predictors are desired when handling such challenging tasks.  相似文献   

16.
The identification of temporal protein complexes would make great contribution to our knowledge of the dynamic organization characteristics in protein interaction networks (PINs). Recent studies have focused on integrating gene expression data into static PIN to construct dynamic PIN which reveals the dynamic evolutionary procedure of protein interactions, but they fail in practice for recognizing the active time points of proteins with low or high expression levels. We construct a Time-Evolving PIN (TEPIN) with a novel method called Deviation Degree, which is designed to identify the active time points of proteins based on the deviation degree of their own expression values. Owing to the differences between protein interactions, moreover, we weight TEPIN with connected affinity and gene co-expression to quantify the degree of these interactions. To validate the efficiencies of our methods, ClusterONE, CAMSE and MCL algorithms are applied on the TEPIN, DPIN (a dynamic PIN constructed with state-of-the-art three-sigma method) and SPIN (the original static PIN) to detect temporal protein complexes. Each algorithm on our TEPIN outperforms that on other networks in terms of match degree, sensitivity, specificity, F-measure and function enrichment etc. In conclusion, our Deviation Degree method successfully eliminates the disadvantages which exist in the previous state-of-the-art dynamic PIN construction methods. Moreover, the biological nature of protein interactions can be well described in our weighted network. Weighted TEPIN is a useful approach for detecting temporal protein complexes and revealing the dynamic protein assembly process for cellular organization.  相似文献   

17.

Background

One of the crucial steps toward understanding the biological functions of a cellular system is to investigate protein–protein interaction (PPI) networks. As an increasing number of reliable PPIs become available, there is a growing need for discovering PPIs to reconstruct PPI networks of interesting organisms. Some interolog-based methods and homologous PPI families have been proposed for predicting PPIs from the known PPIs of source organisms.

Results

Here, we propose a multiple-strategy scoring method to identify reliable PPIs for reconstructing the mouse PPI network from two well-known organisms: human and fly. We firstly identified the PPI candidates of target organisms based on homologous PPIs, sharing significant sequence similarities (joint E-value ≤ 1 × 10−40), from source organisms using generalized interolog mapping. These PPI candidates were evaluated by our multiple-strategy scoring method, combining sequence similarities, normalized ranks, and conservation scores across multiple organisms. According to 106,825 PPI candidates in yeast derived from human and fly, our scoring method can achieve high prediction accuracy and outperform generalized interolog mapping. Experiment results show that our multiple-strategy score can avoid the influence of the protein family size and length to significantly improve PPI prediction accuracy and reflect the biological functions. In addition, the top-ranked and conserved PPIs are often orthologous/essential interactions and share the functional similarity. Based on these reliable predicted PPIs, we reconstructed a comprehensive mouse PPI network, which is a scale-free network and can reflect the biological functions and high connectivity of 292 KEGG modules, including 216 pathways and 76 structural complexes.

Conclusions

Experimental results show that our scoring method can improve the predicting accuracy based on the normalized rank and evolutionary conservation from multiple organisms. Our predicted PPIs share similar biological processes and cellular components, and the reconstructed genome-wide PPI network can reflect network topology and modularity. We believe that our method is useful for inferring reliable PPIs and reconstructing a comprehensive PPI network of an interesting organism.  相似文献   

18.
Proteins do not carry out their functions alone. Instead, they often act by participating in macromolecular complexes and play different functional roles depending on the other members of the complex. It is therefore interesting to identify co-complex relationships. Although protein complexes can be identified in a high-throughput manner by experimental technologies such as affinity purification coupled with mass spectrometry (APMS), these large-scale datasets often suffer from high false positive and false negative rates. Here, we present a computational method that predicts co-complexed protein pair (CCPP) relationships using kernel methods from heterogeneous data sources. We show that a diffusion kernel based on random walks on the full network topology yields good performance in predicting CCPPs from protein interaction networks. In the setting of direct ranking, a diffusion kernel performs much better than the mutual clustering coefficient. In the setting of SVM classifiers, a diffusion kernel performs much better than a linear kernel. We also show that combination of complementary information improves the performance of our CCPP recognizer. A summation of three diffusion kernels based on two-hybrid, APMS, and genetic interaction networks and three sequence kernels achieves better performance than the sequence kernels or diffusion kernels alone. Inclusion of additional features achieves a still better ROC(50) of 0.937. Assuming a negative-to-positive ratio of 600ratio1, the final classifier achieves 89.3% coverage at an estimated false discovery rate of 10%. Finally, we applied our prediction method to two recently described APMS datasets. We find that our predicted positives are highly enriched with CCPPs that are identified by both datasets, suggesting that our method successfully identifies true CCPPs. An SVM classifier trained from heterogeneous data sources provides accurate predictions of CCPPs in yeast. This computational method thereby provides an inexpensive method for identifying protein complexes that extends and complements high-throughput experimental data.  相似文献   

19.
NetworkBLAST: comparative analysis of protein networks   总被引:2,自引:0,他引:2  
The identification of protein complexes is a fundamental challenge in interpreting protein-protein interaction data. Cross-species analysis allows coping with the high levels of noise that are typical to these data. The NetworkBLAST web-server provides a platform for identifying protein complexes in protein-protein interaction networks. It can analyze a single network or two networks from different species. In the latter case, NetworkBLAST outputs a set of putative complexes that are evolutionarily conserved across the two networks. AVAILABILITY: NetworkBLAST is available as web-server at: www.cs.tau.ac.il/~roded/networkblast.htm.  相似文献   

20.
MOTIVATION: Data on protein-protein interactions (PPIs) are increasing exponentially. To date, large-scale protein interaction networks are available for human and most model species. The arising challenge is to organize these networks into models of cellular machinery. As in other biological domains, a comparative approach provides a powerful basis for addressing this challenge. RESULTS: We develop a probabilistic model for protein complexes that are conserved across two species. The model describes the evolution of conserved protein complexes from an ancestral species by protein interaction attachment and detachment and gene duplication events. We apply our model to search for conserved protein complexes within the PPI networks of yeast and fly, which are the largest networks in public databases. We detect 150 conserved complexes that match well-known complexes in yeast and are coherent in their functional annotations both in yeast and in fly. In comparison with two previous approaches, our model yields higher specificity and sensitivity levels in protein complex detection. AVAILABILITY: The program is available upon request.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号