Similar literature
20 similar articles found.
1.
Computational analysis of human protein interaction networks   Cited by: 4 (self-citations: 0; citations by others: 4)
Large amounts of human protein interaction data have been produced by experiments and prediction methods. However, the experimental coverage of the human interactome is still low in contrast to predicted data. To gain insight into the value of publicly available human protein network data, we compared predicted datasets, high-throughput results from yeast two-hybrid screens, and literature-curated protein-protein interactions. This evaluation is not only important for further methodological improvements, but also for increasing the confidence in functional hypotheses derived from predictions. Therefore, we assessed the quality and the potential bias of the different datasets using functional similarity based on the Gene Ontology, structural iPfam domain-domain interactions, likelihood ratios, and topological network parameters. This analysis revealed major differences between predicted datasets, but some of them also scored at least as high as the experimental ones regarding multiple quality measures. Therefore, since only small pairwise overlap between most datasets is observed, they may be combined to enlarge the available human interactome data. For this purpose, we additionally studied the influence of protein length on data quality and the number of disease proteins covered by each dataset. We could further demonstrate that protein interactions predicted by more than one method achieve an elevated reliability.
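
The two dataset-level quantities mentioned in this abstract, pairwise overlap between interaction datasets and the number of methods supporting each interaction, can be sketched as follows. This is a minimal illustration only: the dataset names, the toy interactions, and the choice of the Jaccard index as the overlap measure are assumptions, since the abstract does not say how overlap was computed.

```python
from collections import Counter
from itertools import combinations


def normalize(pair):
    """Treat interactions as undirected, so (A, B) equals (B, A)."""
    a, b = pair
    return (a, b) if a <= b else (b, a)


def pairwise_overlap(datasets):
    """Return the Jaccard index for every pair of named datasets."""
    norm = {name: {normalize(p) for p in pairs} for name, pairs in datasets.items()}
    result = {}
    for x, y in combinations(sorted(norm), 2):
        union = norm[x] | norm[y]
        result[(x, y)] = len(norm[x] & norm[y]) / len(union) if union else 0.0
    return result


def support_counts(datasets):
    """Count how many datasets report each interaction."""
    counts = Counter()
    for pairs in datasets.values():
        counts.update({normalize(p) for p in pairs})
    return counts


# Hypothetical toy data; names and interactions are invented for illustration.
datasets = {
    "predicted_a": [("P1", "P2"), ("P2", "P3")],
    "predicted_b": [("P2", "P1"), ("P3", "P4")],
    "y2h": [("P1", "P2"), ("P4", "P5")],
}
overlap = pairwise_overlap(datasets)
support = support_counts(datasets)
# Interactions reported by at least two datasets.
multi_method = sorted(p for p, n in support.items() if n >= 2)
```

Here `multi_method` holds the interactions reported by two or more datasets, which is the subset the abstract describes as more reliable.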

2.
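With the development of genome-scale high-throughput experimental identification techniques and computational prediction methods, large amounts of protein interaction data have become available, but the relatively high proportion of false positives in large-scale protein interaction data degrades the quality of these data. Bioinformatics methods can start from existing data and knowledge and systematically assess the reliability of large-scale protein interactions by computational means. This article reviews the characteristics of and progress in bioinformatics approaches to assessing the reliability of protein interactions, covering process model design, dataset construction, feature selection and integrated attribute extraction, the algorithms used, and example applications.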

3.
MOTIVATION: Recent screening techniques have made large amounts of protein-protein interaction data available, from which biologically important information such as the function of uncharacterized proteins, the existence of novel protein complexes, and novel signal-transduction pathways can be discovered. However, experimental data on protein interactions contain many false positives, making these discoveries difficult. Therefore computational methods of assessing the reliability of each candidate protein-protein interaction are urgently needed. RESULTS: We developed a new 'interaction generality' measure (IG2) to assess the reliability of protein-protein interactions using only the topological properties of their interaction-network structure. Using yeast protein-protein interaction data, we showed that reliable protein-protein interactions had significantly lower IG2 values than less-reliable interactions, suggesting that IG2 values can be used to evaluate and filter interaction data to enable the construction of reliable protein-protein interaction networks.

4.
Recently a number of computational approaches have been developed for the prediction of protein–protein interactions. Complete genome sequencing projects have provided the vast amount of information needed for these analyses. These methods utilize the structural, genomic, and biological context of proteins and genes in complete genomes to predict protein interaction networks and functional linkages between proteins. Given that experimental techniques remain expensive, time-consuming, and labor-intensive, these methods represent an important advance in proteomics. Some of these approaches utilize sequence data alone to predict interactions, while others combine multiple computational and experimental datasets to accurately build protein interaction maps for complete genomes. These methods represent a complementary approach to current high-throughput projects whose aim is to delineate protein interaction maps in complete genomes. We will describe a number of computational protocols for protein interaction prediction based on the structural, genomic, and biological context of proteins in complete genomes, and detail methods for protein interaction network visualization and analysis.

5.
《Genomics》2020,112(2):1754-1760
Recently, lncRNAs have attracted increasing attention because more and more experimental studies have shown that lncRNAs can play critical roles in many biological processes. Predicting potential interactions between lncRNAs and proteins is key to understanding the biological functions of lncRNAs. Because traditional biological experiments are expensive and time-consuming, network similarity methods provide a powerful way to predict lncRNA-protein interactions computationally. In this work, a novel path-based lncRNA-protein interaction (PBLPI) prediction model is proposed by integrating protein semantic similarity, lncRNA functional similarity, known human lncRNA-protein interactions, and Gaussian interaction profile kernel similarity. The PBLPI model uses three interlinked sub-graphs to construct a heterogeneous graph, and then infers potential lncRNA-protein interactions through a depth-first search algorithm. PBLPI achieves reliable performance under 5-fold cross validation (average AUC of 0.9244 and AUPR of 0.6478). In the case study, we use “Mus musculus” data to further validate the reliability of the PBLPI method. It is anticipated that PBLPI would become a useful tool to identify potential lncRNA-protein interactions.

6.
Local network alignment is an important component of the analysis of protein-protein interaction networks that may lead to the identification of evolutionarily related complexes. We present AlignNemo, a new algorithm that, given the networks of two organisms, uncovers subnetworks of proteins that relate in biological function and topology of interactions. The discovered conserved subnetworks have a general topology and need not correspond to specific interaction patterns, so that they more closely fit the models of functional complexes proposed in the literature. The algorithm is able to handle sparse interaction data with an expansion process that at each step explores the local topology of the networks beyond the proteins directly interacting with the current solution. To assess the performance of AlignNemo, we ran a series of benchmarks using statistical measures as well as biological knowledge. Based on reference datasets of protein complexes, AlignNemo shows better performance than other methods in terms of both precision and recall. We show our solutions to be biologically sound using the concept of semantic similarity applied to Gene Ontology vocabularies. The binaries of AlignNemo and supplementary details about the algorithms and the experiments are available at: sourceforge.net/p/alignnemo.

7.
MOTIVATION: Biological processes in cells are properly performed by gene regulations, signal transductions and interactions between proteins. To understand such molecular networks, we propose a statistical method to estimate gene regulatory networks and protein-protein interaction networks simultaneously from DNA microarray data, protein-protein interaction data and other genome-wide data. RESULTS: We unify Bayesian networks and Markov networks for estimating gene regulatory networks and protein-protein interaction networks according to the reliability of each biological information source. Through the simultaneous construction of gene regulatory networks and protein-protein interaction networks of Saccharomyces cerevisiae cell cycle, we predict the role of several genes whose functions are currently unknown. By using our probabilistic model, we can detect false positives of high-throughput data, such as yeast two-hybrid data. In a genome-wide experiment, we find possible gene regulatory relationships and protein-protein interactions between large protein complexes that underlie complex regulatory mechanisms of biological processes.

8.
High-throughput methods for detecting protein interactions, such as mass spectrometry and yeast two-hybrid assays, continue to produce vast amounts of data that may be exploited to infer protein function and regulation. As this article went to press, the pool of all published interaction information on Saccharomyces cerevisiae was 15,143 interactions among 4,825 proteins, and power-law scaling supports an estimate of 20,000 specific protein interactions. To investigate the biases, overlaps, and complementarities among these data, we have carried out an analysis of two high-throughput mass spectrometry (HMS)-based protein interaction data sets from budding yeast, comparing them to each other and to other interaction data sets. Our analysis reveals 198 interactions among 222 proteins common to both data sets, many of which reflect large multiprotein complexes. It also indicates that a "spoke" model that directly pairs bait proteins with associated proteins is roughly threefold more accurate than a "matrix" model that connects all proteins. In addition, we identify a large, previously unsuspected nucleolar complex of 148 proteins, including 39 proteins of unknown function. Our results indicate that existing large-scale protein interaction data sets are nonsaturating and that integrating many different experimental data sets yields a clearer biological view than any single method alone.
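
The difference between the "spoke" and "matrix" models described above can be made concrete with a short sketch. It assumes a purification is represented as one bait plus a list of co-purified proteins; the function names and the toy purification are hypothetical and not taken from the paper.

```python
from itertools import combinations


def spoke_edges(bait, preys):
    """Spoke model: pair the bait with each co-purified protein."""
    return {tuple(sorted((bait, prey))) for prey in preys if prey != bait}


def matrix_edges(bait, preys):
    """Matrix model: pair every protein in the purification with every other."""
    members = sorted(set(preys) | {bait})
    return set(combinations(members, 2))


# Hypothetical purification: one bait pulling down three proteins.
bait, preys = "B", ["X", "Y", "Z"]
spoke = spoke_edges(bait, preys)    # 3 bait-prey pairs
matrix = matrix_edges(bait, preys)  # all 6 pairs among the 4 proteins
```

For a purification with n preys, the spoke model yields n pairs while the matrix model yields n(n+1)/2, so the matrix model adds many prey-prey pairs that the experiment did not test directly.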

9.
Chen Y, Xu D. 《Nucleic acids research》2004,32(21):6414-6424
As we are moving into the post genome-sequencing era, various high-throughput experimental techniques have been developed to characterize biological systems on the genomic scale. Discovering new biological knowledge from the high-throughput biological data is a major challenge to bioinformatics today. To address this challenge, we developed a Bayesian statistical method together with Boltzmann machine and simulated annealing for protein functional annotation in the yeast Saccharomyces cerevisiae through integrating various high-throughput biological data, including yeast two-hybrid data, protein complexes and microarray gene expression profiles. In our approach, we quantified the relationship between functional similarity and high-throughput data, and coded the relationship into ‘functional linkage graph’, where each node represents one protein and the weight of each edge is characterized by the Bayesian probability of function similarity between two proteins. We also integrated the evolution information and protein subcellular localization information into the prediction. Based on our method, 1802 out of 2280 unannotated proteins in yeast were assigned functions systematically.

10.
The interactions between proteins allow the cell's life. A number of experimental, genome-wide, high-throughput studies have been devoted to the determination of protein-protein interactions and the consequent interaction networks. Here, the bioinformatics methods dealing with protein-protein interactions and interaction networks are overviewed: (1) interaction databases developed to collect and annotate this immense amount of data; (2) automated data mining techniques developed to extract information about interactions from the published literature; (3) computational methods to assess the experimental results, developed as a consequence of the finding that the results of high-throughput methods are rather inaccurate; (4) exploitation of the information provided by protein interaction networks in order to predict functional features of the proteins; and (5) prediction of protein-protein interactions.

11.
Wu X, Zhu L, Guo J, Zhang DY, Lin K. 《Nucleic acids research》2006,34(7):2137-2150
A map of protein–protein interactions provides valuable insight into the cellular function and machinery of a proteome. By measuring the similarity between two Gene Ontology (GO) terms with a relative specificity semantic relation, here, we proposed a new method of reconstructing a yeast protein–protein interaction map that is solely based on the GO annotations. The method was validated using high-quality interaction datasets for its effectiveness. Based on a Z-score analysis, a positive dataset and a negative dataset for protein–protein interactions were derived. Moreover, a gold standard positive (GSP) dataset with the highest level of confidence that covered 78% of the high-quality interaction dataset and a gold standard negative (GSN) dataset with the lowest level of confidence were derived. In addition, we assessed four high-throughput experimental interaction datasets using the positives and the negatives as well as GSPs and GSNs. Our predicted network reconstructed from GSPs consists of 40753 interactions among 2259 proteins, and forms 16 connected components. We mapped all of the MIPS complexes except for homodimers onto the predicted network. As a result, ~35% of complexes were found to be interconnected. For seven complexes, we also identified some nonmember proteins that may be functionally related to the complexes concerned. This analysis is expected to provide a new approach for predicting the protein–protein interaction maps from other completely sequenced genomes with high-quality GO-based annotations.

12.
Colland F, Daviet L. 《Biochimie》2004,86(9-10):625-632
Functional proteomics is a promising technique for the rational identification of novel therapeutic targets by elucidation of the function of newly identified proteins in disease-relevant cellular pathways. Of the recently described high-throughput approaches for analyzing protein-protein interactions, the yeast two-hybrid (Y2H) system has turned out to be one of the most suitable for genome-wide analysis. However, this system presents a challenging technical problem: the high prevalence of false positives and false negatives in datasets due to intrinsic limitations of the technology and the use of a high-throughput, genetic assay. We discuss here the different experimental strategies applied to Y2H assays, their general limitations and advantages. We also address the issue of the contribution of protein interaction mapping to functional biology, especially when combined with complementary genomic and proteomic analyses. Finally, we illustrate how the combination of protein interaction maps with relevant functional assays can provide biological support to large-scale protein interaction datasets and contribute to the identification and validation of potential therapeutic targets.

13.
14.
We characterized and evaluated the functional attributes of three yeast high-confidence protein-protein interaction data sets derived from affinity purification/mass spectrometry, protein-fragment complementation assay, and yeast two-hybrid experiments. The interacting proteins retrieved from these data sets formed distinct, partially overlapping sets with different protein-protein interaction characteristics. These differences were primarily a function of the deployed experimental technologies used to recover these interactions. This affected the total coverage of interactions and was especially evident in the recovery of interactions among different functional classes of proteins. We found that the interaction data obtained by the yeast two-hybrid method was the least biased toward any particular functional characterization. In contrast, interacting proteins in the affinity purification/mass spectrometry and protein-fragment complementation assay data sets were over- and under-represented among distinct and different functional categories. We delineated how these differences affected protein complex organization in the network of interactions, in particular for strongly interacting complexes (e.g. RNA and protein synthesis) versus weak and transient interacting complexes (e.g. protein transport). We quantified methodological differences in detecting protein interactions from larger protein complexes, in the correlation of protein abundance among interacting proteins, and in their connectivity of essential proteins. In the latter case, we showed that minimizing inherent methodology biases removed many of the ambiguous conclusions about protein essentiality and protein connectivity. We used these findings to rationalize how biological insights obtained by analyzing data sets originating from different sources sometimes do not agree or may even contradict each other. 
An important corollary of this work was that discrepancies in biological insights did not necessarily imply that one detection methodology was better or worse, but rather that, to a large extent, the insights reflected the methodological biases themselves. Consequently, interpreting the protein interaction data within their experimental or cellular context provided the best avenue for overcoming biases and inferring biological knowledge.

15.
Here we introduce the ‘interaction generality’ measure, a new method for computationally assessing the reliability of protein–protein interactions obtained in biological experiments. This measure is basically the number of proteins involved in a given interaction and also adopts the idea that interactions observed in a complicated interaction network are likely to be true positives. Using a group of yeast protein–protein interactions identified in various biological experiments, we show that interactions with low generalities are more likely to be reproducible in other independent assays. We constructed more reliable networks by eliminating interactions whose generalities were above a particular threshold. The rate of interactions with common cellular roles increased from 63% in the unadjusted estimates to 79% in the refined networks. As a result, the rate of cross-talk between proteins with different cellular roles decreased, enabling very clear predictions of the functions of some unknown proteins. The results suggest that the interaction generality measure will make interaction data more useful in all organisms and may yield insights into the biological roles of the proteins studied.
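
The thresholding step described in this abstract can be illustrated with a rough sketch. The scoring rule used here is an assumption, not the authors' exact definition: the generality of an interaction A-B is taken as 1 plus the number of proteins that bind A or B and have no other partner. The function names and the toy network are hypothetical.

```python
from collections import defaultdict


def build_adjacency(interactions):
    """Build an undirected adjacency map from a list of protein pairs."""
    adj = defaultdict(set)
    for a, b in interactions:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def generality(adj, a, b):
    """Assumed simplification: 1 + partners of a or b that have a single partner."""
    neighbours = (adj[a] | adj[b]) - {a, b}
    leaves = [n for n in neighbours if len(adj[n]) == 1]
    return 1 + len(leaves)


def filter_interactions(interactions, threshold):
    """Keep only interactions whose generality does not exceed the threshold."""
    adj = build_adjacency(interactions)
    return [(a, b) for a, b in interactions if generality(adj, a, b) <= threshold]


# Hypothetical toy network: HUB binds several single-partner proteins,
# while A, B and C form a mutually connected triangle.
interactions = [
    ("HUB", "L1"), ("HUB", "L2"), ("HUB", "L3"), ("HUB", "A"),
    ("A", "B"), ("B", "C"), ("C", "A"),
]
adj = build_adjacency(interactions)
kept = filter_interactions(interactions, threshold=1)
```

With this toy input, the interactions around `HUB` score high and are dropped, while the triangle among `A`, `B`, and `C` is kept. That matches the abstract's idea that interactions embedded in a more intricate local network are more likely to be true positives.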

16.
Experimental high-throughput studies of protein-protein interactions are beginning to provide enough data for comprehensive computational studies. Today, about ten large data sets, each with thousands of interacting pairs, coarsely sample the interactions in fly, human, worm, and yeast. Another about 55,000 pairs of interacting proteins have been identified by more careful, detailed biochemical experiments. Most interactions are experimentally observed in prokaryotes and simple eukaryotes; very few interactions are observed in higher eukaryotes such as mammals. It is commonly assumed that pathways in mammals can be inferred through homology to model organisms, e.g. the experimental observation that two yeast proteins interact is transferred to infer that the two corresponding proteins in human also interact. Two pairs for which the interaction is conserved are often described as interologs. The goal of this investigation was a large-scale comprehensive analysis of such inferences, i.e. of the evolutionary conservation of interologs. Here, we introduced a novel score for measuring the overlap between protein-protein interaction data sets. This measure appeared to reflect the overall quality of the data and was the basis for our two surprising results from our large-scale analysis. Firstly, homology-based inferences of physical protein-protein interactions appeared far less successful than expected. In fact, such inferences were accurate only for extremely high levels of sequence similarity. Secondly, and most surprisingly, the identification of interacting partners through sequence similarity was significantly more reliable for protein pairs within the same organism than for pairs between species. Our analysis underlined that the discrepancies between different datasets are large, even when using the same type of experiment on the same organism. This reality considerably constrains the power of homology-based transfer of interactions. 
In particular, the experimental probing of interactions in distant model organisms has to be undertaken with some caution. More comprehensive images of protein-protein networks will require the combination of many high-throughput methods, including in silico inferences and predictions. http://www.rostlab.org/results/2006/ppi_homology/
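
The homology-based transfer evaluated in this abstract can be sketched as follows, under stated assumptions: homology is given as a precomputed mapping from source-species proteins to target-species proteins, and accuracy is the fraction of transferred pairs also present in an observed target-species dataset. The mapping, protein identifiers, and function names are hypothetical, and the paper's own overlap score is not reproduced here.

```python
def transfer_interologs(source_interactions, homologs):
    """Map each source-species interaction onto target-species protein pairs.

    homologs: dict from a source protein to a list of target proteins
    (an assumed, precomputed homology mapping).
    """
    predicted = set()
    for a, b in source_interactions:
        for ta in homologs.get(a, []):
            for tb in homologs.get(b, []):
                if ta != tb:
                    predicted.add(tuple(sorted((ta, tb))))
    return predicted


def transfer_accuracy(predicted, observed):
    """Fraction of transferred pairs that also appear in the observed dataset."""
    observed = {tuple(sorted(p)) for p in observed}
    if not predicted:
        return 0.0
    return len(predicted & observed) / len(predicted)


# Hypothetical toy data: yeast interactions mapped to human proteins.
yeast = [("Y1", "Y2"), ("Y2", "Y3")]
homologs = {"Y1": ["H1"], "Y2": ["H2"], "Y3": ["H3a", "H3b"]}
predicted = transfer_interologs(yeast, homologs)
accuracy = transfer_accuracy(predicted, [("H2", "H1")])
```

In practice the homology mapping would be restricted by a sequence-similarity cutoff, since the abstract reports that such transfers were accurate only at very high similarity.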

17.
Comprehensive understanding of biological systems requires efficient and systematic assimilation of high-throughput datasets in the context of the existing knowledge base. A major limitation in the field of proteomics is the lack of an appropriate software platform that can synthesize a large number of experimental datasets in the context of the existing knowledge base. Here, we describe a software platform, termed PROTEOME-3D, that utilizes three essential features for systematic analysis of proteomics data: creation of a scalable, queryable, customized database for identified proteins from published literature; graphical tools for displaying proteome landscapes and trends from multiple large-scale experiments; and interactive data analysis that facilitates identification of crucial networks and pathways. Thus, PROTEOME-3D offers a standardized platform to analyze high-throughput experimental datasets for the identification of crucial players in co-regulated pathways and cellular processes.

18.
Predicting protein functions computationally from massive protein-protein interaction (PPI) data generated by high-throughput technology is one of the challenges and fundamental problems in the post-genomic era. Although there have been many approaches developed for computationally predicting protein functions, the mutual correlations among proteins in terms of protein functions have not been thoroughly investigated and incorporated into existing prediction methods, especially in voting based prediction methods. In this paper, we propose an innovative method to predict protein functions from PPI data by aggregating the functional correlations among relevant proteins using the Choquet-Integral in fuzzy theory. This functional aggregation measures the real impact of each relevant protein function on the final prediction results, and reduces the impact of repeated functional information on the prediction. Accordingly, a new protein similarity and a new iterative prediction algorithm are proposed in this paper. The experimental evaluations on real PPI datasets demonstrate the effectiveness of our method.

19.
The investigation of the interplay between genes, proteins, metabolites and diseases plays a central role in molecular and cellular biology. Whole genome sequencing has made it possible to examine the behavior of all the genes in a genome by high-throughput experimental techniques and to pinpoint molecular interactions on a genome-wide scale, which form the backbone of systems biology. In particular, the Bayesian network (BN) is a powerful tool for the ab initio identification of causal and non-causal relationships between biological factors directly from experimental data. However, scalability is a crucial issue when we try to apply BNs to infer such interactions. In this paper, we not only introduce the Bayesian network formalism and its applications in systems biology, but also review recent technical developments for scaling up or speeding up the structural learning of BNs, which is important for the discovery of causal knowledge from large-scale biological datasets. Specifically, we highlight the basic idea and the relative pros and cons of each technique, and discuss possible ways to combine different algorithms towards making BN learning more accurate and much faster.

20.
Many methods developed for estimating the reliability of protein–protein interactions are based on the topology of protein–protein interaction networks. This paper describes a new reliability measure for protein–protein interactions, which does not rely on the topology of protein interaction networks, but expresses biological information on functional roles, sub-cellular localisations and protein classes as a scoring schema. The new measure is useful for filtering many spurious interactions, as well as for estimating the reliability of protein interaction data. In particular, the reliability measure can be used to search databases for protein–protein interactions with the desired reliability. The reliability-based search engine is available at http://yeast.hpid.org. We believe this is the first search engine for interacting proteins that has been made available to the public. The search engine and the reliability measure of protein interactions should provide useful information for determining proteins to focus on.
