首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Global protein function prediction from protein-protein interaction networks   总被引:20,自引:0,他引:20  
Determining protein function is one of the most challenging problems of the post-genomic era. The availability of entire genome sequences and of high-throughput capabilities to determine gene coexpression patterns has shifted the research focus from the study of single proteins or small complexes to that of the entire proteome. In this context, the search for reliable methods for assigning protein function is of primary importance. There are various approaches available for deducing the function of proteins of unknown function using information derived from sequence similarity or clustering patterns of co-regulated genes, phylogenetic profiles, protein-protein interactions (refs. 5-8 and Samanta, M.P. and Liang, S., unpublished data), and protein complexes. Here we propose the assignment of proteins to functional classes on the basis of their network of physical interactions as determined by minimizing the number of protein interactions among different functional categories. Function assignment is proteome-wide and is determined by the global connectivity pattern of the protein network. The approach results in multiple functional assignments, a consequence of the existence of multiple equivalent solutions. We apply the method to analyze the yeast Saccharomyces cerevisiae protein-protein interaction network. The robustness of the approach is tested in a system containing a high percentage of unclassified proteins and also in cases of deletion and insertion of specific protein interactions.  相似文献   

2.
Many aspects of cell signalling, trafficking, and targeting are governed by interactions between globular protein domains and short peptide segments. These domains often bind multiple peptides that share a common sequence pattern, or “linear motif” (e.g., SH3 binding to PxxP). Many domains are known, though comparatively few linear motifs have been discovered. Their short length (three to eight residues), and the fact that they often reside in disordered regions in proteins makes them difficult to detect through sequence comparison or experiment. Nevertheless, each new motif provides critical molecular details of how interaction networks are constructed, and can explain how one protein is able to bind to very different partners. Here we show that binding motifs can be detected using data from genome-scale interaction studies, and thus avoid the normally slow discovery process. Our approach based on motif over-representation in non-homologous sequences, rediscovers known motifs and predicts dozens of others. Direct binding experiments reveal that two predicted motifs are indeed protein-binding modules: a DxxDxxxD protein phosphatase 1 binding motif with a KD of 22 μM and a VxxxRxYS motif that binds Translin with a KD of 43 μM. We estimate that there are dozens or even hundreds of linear motifs yet to be discovered that will give molecular insight into protein networks and greatly illuminate cellular processes.  相似文献   

3.
4.

Background

Recent computational techniques have facilitated analyzing genome-wide protein-protein interaction data for several model organisms. Various graph-clustering algorithms have been applied to protein interaction networks on the genomic scale for predicting the entire set of potential protein complexes. In particular, the density-based clustering algorithms which are able to generate overlapping clusters, i.e. the clusters sharing a set of nodes, are well-suited to protein complex detection because each protein could be a member of multiple complexes. However, their accuracy is still limited because of complex overlap patterns of their output clusters.

Results

We present a systematic approach of refining the overlapping clusters identified from protein interaction networks. We have designed novel metrics to assess cluster overlaps: overlap coverage and overlapping consistency. We then propose an overlap refinement algorithm. It takes as input the clusters produced by existing density-based graph-clustering methods and generates a set of refined clusters by parameterizing the metrics. To evaluate protein complex prediction accuracy, we used the f-measure by comparing each refined cluster to known protein complexes. The experimental results with the yeast protein-protein interaction data sets from BioGRID and DIP demonstrate that accuracy on protein complex prediction has increased significantly after refining cluster overlaps.

Conclusions

The effectiveness of the proposed cluster overlap refinement approach for protein complex detection has been validated in this study. Analyzing overlaps of the clusters from protein interaction networks is a crucial task for understanding of functional roles of proteins and topological characteristics of the functional systems.
  相似文献   

5.
Protein–protein interactions play a key role in many biological systems. High‐throughput methods can directly detect the set of interacting proteins in yeast, but the results are often incomplete and exhibit high false‐positive and false‐negative rates. Recently, many different research groups independently suggested using supervised learning methods to integrate direct and indirect biological data sources for the protein interaction prediction task. However, the data sources, approaches, and implementations varied. Furthermore, the protein interaction prediction task itself can be subdivided into prediction of (1) physical interaction, (2) co‐complex relationship, and (3) pathway co‐membership. To investigate systematically the utility of different data sources and the way the data is encoded as features for predicting each of these types of protein interactions, we assembled a large set of biological features and varied their encoding for use in each of the three prediction tasks. Six different classifiers were used to assess the accuracy in predicting interactions, Random Forest (RF), RF similarity‐based k‐Nearest‐Neighbor, Naïve Bayes, Decision Tree, Logistic Regression, and Support Vector Machine. For all classifiers, the three prediction tasks had different success rates, and co‐complex prediction appears to be an easier task than the other two. Independently of prediction task, however, the RF classifier consistently ranked as one of the top two classifiers for all combinations of feature sets. Therefore, we used this classifier to study the importance of different biological datasets. First, we used the splitting function of the RF tree structure, the Gini index, to estimate feature importance. Second, we determined classification accuracy when only the top‐ranking features were used as an input in the classifier. We find that the importance of different features depends on the specific prediction task and the way they are encoded. Strikingly, gene expression is consistently the most important feature for all three prediction tasks, while the protein interactions identified using the yeast‐2‐hybrid system were not among the top‐ranking features under any condition. Proteins 2006. © 2006 Wiley‐Liss, Inc.  相似文献   

6.
Function prediction and protein networks   总被引:3,自引:0,他引:3  
In the genomics era, the interactions between proteins are at the center of attention. Genomic-context methods used to predict these interactions have been put on a quantitative basis, revealing that they are at least on an equal footing with genomics experimental data. A survey of experimentally confirmed predictions proves the applicability of these methods, and new concepts to predict protein interactions in eukaryotes have been described. Finally, the interaction networks that can be obtained by combining the predicted pair-wise interactions have enough internal structure to detect higher levels of organization, such as 'functional modules'.  相似文献   

7.
Schächter V 《BioTechniques》2002,(Z1):16-8, 20-4, 26-7
We survey recent techniques for construction and prediction of large-scale protein interaction networks, focusing on computational processing steps. Special emphasis is placed on critical assessment of data completeness and reliability of the various approaches. Once built, protein interaction networks can be used for functional annotation or to generate higher-level biological hypotheses on pathways.  相似文献   

8.
Modular organization of protein interaction networks   总被引:6,自引:0,他引:6  
MOTIVATION: Accumulating evidence suggests that biological systems are composed of interacting, separable, functional modules. Identifying these modules is essential to understand the organization of biological systems. RESULT: In this paper, we present a framework to identify modules within biological networks. In this approach, the concept of degree is extended from the single vertex to the sub-graph, and a formal definition of module in a network is used. A new agglomerative algorithm was developed to identify modules from the network by combining the new module definition with the relative edge order generated by the Girvan-Newman (G-N) algorithm. A JAVA program, MoNet, was developed to implement the algorithm. Applying MoNet to the yeast core protein interaction network from the database of interacting proteins (DIP) identified 86 simple modules with sizes larger than three proteins. The modules obtained are significantly enriched in proteins with related biological process Gene Ontology terms. A comparison between the MoNet modules and modules defined by Radicchi et al. (2004) indicates that MoNet modules show stronger co-clustering of related genes and are more robust to ties in betweenness values. Further, the MoNet output retains the adjacent relationships between modules and allows the construction of an interaction web of modules providing insight regarding the relationships between different functional modules. Thus, MoNet provides an objective approach to understand the organization and interactions of biological processes in cellular systems. AVAILABILITY: MoNet is available upon request from the authors.  相似文献   

9.
Recent advances in proteomics and computational biology have lead to a flood of protein interaction data and resulting interaction networks (e.g. (Gavin et al., 2002)). Here I first analyse the status and quality of parts lists (genes and proteins), then comparatively assess large-scale protein interaction data (von Mering et al., 2002) and finally try to identify biological meaningful units (e.g. pathways, cellular processes) within interaction networks that are derived from the conservation of gene neighborhood (Snel et al., 2002). Possible extensions of gene neighborhood analysis to eukaryotes (von Mering and Bork, 2002) will be discussed.  相似文献   

10.
11.
《Genomics》2020,112(2):1754-1760
Recently, lncRNAs have attracted accumulating attentions because more and more experimental researches have shown lncRNA can play critical roles in many biological processes. Predicting potential interactions between lncRNAs and proteins are key to understand the lncRNAs biological functions. But traditional biological experiments are expensive and time-consuming, network similarity methods provide a powerful solution to computationally predict lncRNA-protein interactions. In this work, a novel path-based lncRNA-protein interaction (PBLPI) prediction model is proposed by integrating protein semantic similarity, lncRNA functional similarity, known human lncRNA-protein interactions, and Gaussian interaction profile kernel similarity. PBLPI model utilizes three interlinked sub-graphs to construct a heterogeneous graph, and then infers potential lncRNA-protein interactions through depth-first search algorithm. Consequently, PBLPI achieves reliable performance in the frameworks of 5-fold cross validation (average AUC is 0.9244 and AUPR is 0.6478). In the case study, we use “Mus musculus” data to further validate the reliability of PBLPI method. It is anticipated that PBLPI would become a useful tool to identify potential lncRNA-protein interactions.  相似文献   

12.
Over the past decade there has been a growing acknowledgement that a large proportion of proteins within most proteomes contain disordered regions. Disordered regions are segments of the protein chain which do not adopt a stable structure. Recognition of disordered regions in a protein is of great importance for protein structure prediction, protein structure determination and function annotation as these regions have a close relationship with protein expression and functionality. As a result, a great many protein disorder prediction methods have been developed so far. Here, we present an overview of current protein disorder prediction methods including an analysis of their advantages and shortcomings. In order to help users to select alternative tools under different circumstances, we also evaluate 23 disorder predictors on the benchmark data of the most recent round of the Critical Assessment of protein Structure Prediction (CASP) and assess their accuracy using several complementary measures.  相似文献   

13.
14.
Genetic interaction analysis,in which two mutations have a combined effect not exhibited by either mutation alone, is a powerful and widespread tool for establishing functional linkages between genes. In the yeast Saccharomyces cerevisiae, ongoing screens have generated >4,800 such genetic interaction data. We demonstrate that by combining these data with information on protein-protein, prote in-DNA or metabolic networks, it is possible to uncover physical mechanisms behind many of the observed genetic effects. Using a probabilistic model, we found that 1,922 genetic interactions are significantly associated with either between- or within-pathway explanations encoded in the physical networks, covering approximately 40% of known genetic interactions. These models predict new functions for 343 proteins and suggest that between-pathway explanations are better than within-pathway explanations at interpreting genetic interactions identified in systematic screens. This study provides a road map for how genetic and physical interactions can be integrated to reveal pathway organization and function.  相似文献   

15.
A new computational approach for real protein folding prediction   总被引:4,自引:0,他引:4  
An effective and fast minimization approach is proposed for the prediction of protein folding, in which the 'relative entropy' is used as a minimization function and the off-lattice model is used. In this approach, we only use the information of distances between the consecutive Calpha atoms along the peptide chain and a generalized form of the contact potential for 20 types of amino acids. Tests of the algorithm are performed on the real proteins. The root mean square deviations of the structures of eight folded target proteins versus the native structures are in a reasonable range. In principle, this method is an improvement on the energy minimization approach.  相似文献   

16.

Background

Protein complexes play an important role in biological processes. Recent developments in experiments have resulted in the publication of many high-quality, large-scale protein-protein interaction (PPI) datasets, which provide abundant data for computational approaches to the prediction of protein complexes. However, the precision of protein complex prediction still needs to be improved due to the incompletion and noise in PPI networks.

Results

There exist complex and diverse relationships among proteins after integrating multiple sources of biological information. Considering that the influences of different types of interactions are not the same weight for protein complex prediction, we construct a multi-relationship protein interaction network (MPIN) by integrating PPI network topology with gene ontology annotation information. Then, we design a novel algorithm named MINE (identifying protein complexes based on Multi-relationship protein Interaction NEtwork) to predict protein complexes with high cohesion and low coupling from MPIN.

Conclusions

The experiments on yeast data show that MINE outperforms the current methods in terms of both accuracy and statistical significance.
  相似文献   

17.
To ensure signalling fidelity, kinases must act only on a defined subset of cellular targets. Appreciating the basis for this substrate specificity is essential for understanding the role of an individual protein kinase in a particular cellular process. The specificity in the cell is determined by a combination of "peptide specificity" of the kinase (the molecular recognition of the sequence surrounding the phosphorylation site), substrate recruitment and phosphatase activity. Peptide specificity plays a crucial role and depends on the complementarity between the kinase and the substrate and therefore on their three-dimensional structures. Methods for experimental identification of kinase substrates and characterization of specificity are expensive and laborious, therefore, computational approaches are being developed to reduce the amount of experimental work required in substrate identification. We discuss the structural basis of substrate specificity of protein kinases and review the experimental and computational methods used to obtain specificity information.  相似文献   

18.
MOTIVATION: Theoretical models of biological networks are valuable tools in evolutionary inference. Theoretical models based on gene duplication and divergence provide biologically plausible evolutionary mechanics. Similarities found between empirical networks and their theoretically generated counterpart are considered evidence of the role modeled mechanics play in biological evolution. However, the method by which these models are parameterized can lead to questions about the validity of the inferences. Selecting parameter values in order to produce a particular topological value obfuscates the possibility that the model may produce a similar topology for a large range of parameter values. Alternately, a model may produce a large range of topologies, allowing (incorrect) parameter values to produce a valid topology from an otherwise flawed model. In order to lend biological credence to the modeled evolutionary mechanics, parameter values should be derived from the empirical data. Furthermore, recent work indicates that the timing and fate of gene duplications are critical to proper derivation of these parameters. RESULTS: We present a methodology for deriving evolutionary rates from empirical data that is used to parameterize duplication and divergence models of protein interaction network evolution. Our method avoids shortcomings of previous methods, which failed to consider the effect of subsequent duplications. From our parameter values, we find that concurrent and existing existing duplication and divergence models are insufficient for modeling protein interaction network evolution. We introduce a model enhancement based on heritable interaction sites on the surface of a protein and find that it more closely reflects the high clustering found in the empirical network.  相似文献   

19.
Computational analysis of human protein interaction networks   总被引:4,自引:0,他引:4  
Large amounts of human protein interaction data have been produced by experiments and prediction methods. However, the experimental coverage of the human interactome is still low in contrast to predicted data. To gain insight into the value of publicly available human protein network data, we compared predicted datasets, high-throughput results from yeast two-hybrid screens, and literature-curated protein-protein interactions. This evaluation is not only important for further methodological improvements, but also for increasing the confidence in functional hypotheses derived from predictions. Therefore, we assessed the quality and the potential bias of the different datasets using functional similarity based on the Gene Ontology, structural iPfam domain-domain interactions, likelihood ratios, and topological network parameters. This analysis revealed major differences between predicted datasets, but some of them also scored at least as high as the experimental ones regarding multiple quality measures. Therefore, since only small pair wise overlap between most datasets is observed, they may be combined to enlarge the available human interactome data. For this purpose, we additionally studied the influence of protein length on data quality and the number of disease proteins covered by each dataset. We could further demonstrate that protein interactions predicted by more than one method achieve an elevated reliability.  相似文献   

20.
With an ever-increasing amount of available data on protein-protein interaction (PPI) networks and research revealing that these networks evolve at a modular level, discovery of conserved patterns in these networks becomes an important problem. Although available data on protein-protein interactions is currently limited, recently developed algorithms have been shown to convey novel biological insights through employment of elegant mathematical models. The main challenge in aligning PPI networks is to define a graph theoretical measure of similarity between graph structures that captures underlying biological phenomena accurately. In this respect, modeling of conservation and divergence of interactions, as well as the interpretation of resulting alignments, are important design parameters. In this paper, we develop a framework for comprehensive alignment of PPI networks, which is inspired by duplication/divergence models that focus on understanding the evolution of protein interactions. We propose a mathematical model that extends the concepts of match, mismatch, and gap in sequence alignment to that of match, mismatch, and duplication in network alignment and evaluates similarity between graph structures through a scoring function that accounts for evolutionary events. By relying on evolutionary models, the proposed framework facilitates interpretation of resulting alignments in terms of not only conservation but also divergence of modularity in PPI networks. Furthermore, as in the case of sequence alignment, our model allows flexibility in adjusting parameters to quantify underlying evolutionary relationships. Based on the proposed model, we formulate PPI network alignment as an optimization problem and present fast algorithms to solve this problem. Detailed experimental results from an implementation of the proposed framework show that our algorithm is able to discover conserved interaction patterns very effectively, in terms of both accuracies and computational cost.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号