20 similar articles found.
1.
Widely predicting specific protein functions based on protein-protein interaction data and gene expression profile
GESTs (gene expression similarity and taxonomy similarity), a gene function prediction approach we proposed previously, is based on gene expression similarity and the concept similarity of functional classes defined in the Gene Ontology (GO). In this paper, we extend this method to protein-protein interaction data by introducing several methods for filtering the neighbors of a protein of unknown function(s) in protein interaction networks. Unlike conventional methods, the proposed approach automatically selects the most appropriate functional classes, as specific as possible, during the learning process, and draws on genes annotated to nearby classes to support predictions for small, specific classes in GO. Using yeast protein-protein interaction information from MIPS and a dataset of gene expression profiles, we assess the performance of our approach in predicting protein functions in the “biological process” ontology with three measures designed specifically for functional classes organized in GO. The results show that our method performs well at broadly predicting gene functions with very specific functional terms. Based on the GO database released in December 2004, we predicted functions for some proteins that were uncharacterized at that time, and some of these predictions have been confirmed by the SGD annotation data released in April 2006.
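As a rough illustration of neighbor filtering followed by function voting, the sketch below keeps only interaction partners whose expression profiles correlate with the query protein and lets them vote for their GO terms, weighted by that correlation. It is a minimal sketch under stated assumptions, not GESTs itself: the Pearson-correlation filter, the `min_corr` threshold, the names `pearson` and `predict_functions`, and the toy data are hypothetical, and the GO concept (taxonomy) similarity that GESTs relies on is not modeled.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant expression profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def predict_functions(protein, ppi, expression, annotations, min_corr=0.5, top_k=3):
    """Rank candidate GO terms for `protein` from its expression-filtered PPI neighbors.

    Hypothetical filter: a neighbor votes only if its profile correlates with the
    query profile at >= min_corr; each vote is weighted by that correlation.
    """
    votes = Counter()
    for nb in ppi.get(protein, ()):
        if nb not in annotations:
            continue  # unannotated neighbors cannot vote
        r = pearson(expression[protein], expression[nb])
        if r >= min_corr:
            for term in annotations[nb]:
                votes[term] += r
    return [term for term, _ in votes.most_common(top_k)]


# Toy example (made-up identifiers and profiles).
ppi = {"P": ["A", "B", "C"]}
expression = {"P": [1, 2, 3, 4], "A": [1, 2, 3, 5], "B": [4, 3, 2, 1], "C": [2, 4, 6, 8]}
annotations = {"A": ["GO:1"], "B": ["GO:2"], "C": ["GO:1", "GO:3"]}
print(predict_functions("P", ppi, expression, annotations))  # ['GO:1', 'GO:3']
```

Here neighbor B is discarded because its profile is perfectly anti-correlated with P, so its term GO:2 receives no vote.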
2.
Widely predicting specific protein functions based on protein-protein interaction data and gene expression profile (total citations: 1, self-citations: 2, citations by others: 1)
GESTs (gene expression similarity and taxonomy similarity), a gene function prediction approach we proposed previously, is based on gene expression similarity and the concept similarity of functional classes defined in the Gene Ontology (GO). In this paper, we extend this method to protein-protein interaction data by introducing several methods for filtering the neighbors of a protein of unknown function(s) in protein interaction networks. Unlike conventional methods, the proposed approach automatically selects the most appropriate functional classes, as specific as possible, during the learning process, and draws on genes annotated to nearby classes to support predictions for small, specific classes in GO. Using yeast protein-protein interaction information from MIPS and a dataset of gene expression profiles, we assess the performance of our approach in predicting protein functions in the “biological process” ontology with three measures designed specifically for functional classes organized in GO. The results show that our method performs well at broadly predicting gene functions with very specific functional terms. Based on the GO database released in December 2004, we predicted functions for some proteins that were uncharacterized at that time, and some of these predictions have been confirmed by the SGD annotation data released in April 2006.
3.
Background
Genome sequencing projects generate massive amounts of sequence data, but the functions of many proteins remain unknown. The availability of large-scale protein-protein interaction data sets makes it possible to develop new function prediction methods based on protein-protein interaction (PPI) networks. Although several existing methods combine multiple information resources, no study has integrated protein domain information and PPI networks to predict protein functions.
4.
ABSTRACT: BACKGROUND: Identification of essential proteins plays a significant role in understanding the minimal requirements for cellular survival and development. Many computational methods have been proposed for predicting essential proteins by using the topological features of protein-protein interaction (PPI) networks. However, most of these methods ignore the intrinsic biological meaning of proteins. Moreover, PPI data contain many false positives and false negatives. To overcome these limitations, many research groups have recently started to focus on identifying essential proteins by integrating PPI networks with other biological information. However, none of their methods has been widely accepted. RESULTS: Considering that essential proteins are more evolutionarily conserved than nonessential proteins and that essential proteins frequently bind each other, we propose an iteration method for predicting essential proteins by integrating orthology with PPI networks, named ION. Unlike other methods, ION identifies essential proteins based not only on the connections between proteins but also on their orthologous properties and the features of their neighbors. ION is applied to predict essential proteins in S. cerevisiae. Experimental results show that ION achieves higher identification accuracy than eight other existing centrality methods in terms of area under the curve (AUC). Moreover, ION identifies a large number of essential proteins that the eight other centrality methods miss because of their low connectivity. Many proteins ranked in the top 100 by ION are both essential and members of complexes with specific biological functions. Furthermore, no matter how many reference organisms are selected, ION outperforms all eight other existing centrality methods, and using as many reference organisms as possible improves the performance of ION. Additionally, ION also shows good prediction performance in E. coli K-12. CONCLUSIONS: The accuracy of predicting essential proteins can be improved by integrating orthology with PPI networks.
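The abstract describes ION only at a high level. One plausible way to combine conservation with neighbor support is a PageRank-style propagation in which normalized orthology scores act as the restart vector, and the sketch below does exactly that. It is an illustrative assumption, not the published ION update rule: the function name `ion_like_ranking`, the even splitting of each protein's score among its partners, and the weight `alpha` are all hypothetical.

```python
def ion_like_ranking(adj, orthology, alpha=0.5, max_iter=100, tol=1e-9):
    """Rank proteins by a score mixing conservation with neighbor support.

    adj       -- dict: protein -> set of interaction partners (undirected, symmetric)
    orthology -- dict: protein -> non-negative conservation score (assumed input)
    alpha     -- hypothetical weight of neighbor support versus the orthology prior
    """
    proteins = sorted(adj)
    total = sum(orthology.get(p, 0.0) for p in proteins) or 1.0
    # Normalized orthology scores serve as the restart (prior) vector.
    prior = {p: orthology.get(p, 0.0) / total for p in proteins}
    score = dict(prior)
    for _ in range(max_iter):
        new = {}
        for p in proteins:
            # Each neighbor q shares its current score equally among its partners.
            support = sum(score[q] / len(adj[q]) for q in adj[p])
            new[p] = (1 - alpha) * prior[p] + alpha * support
        converged = max(abs(new[p] - score[p]) for p in proteins) < tol
        score = new
        if converged:
            break
    # Highest score first; ties keep alphabetical order because sorting is stable.
    return sorted(proteins, key=lambda p: score[p], reverse=True)


# Toy network (made-up): A-B and A-C interactions, plus a separate D-E pair.
adj = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}, "D": {"E"}, "E": {"D"}}
print(ion_like_ranking(adj, {"A": 1.0, "B": 1.0}))  # ['A', 'B', 'C', 'D', 'E']
```

In this toy network, C has no orthology score but is ranked above D and E because it interacts with the conserved protein A, which is the kind of neighbor-based evidence the abstract says ION exploits.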
5.
Wei Peng, Jianxin Wang, Weiping Wang, Qing Liu, Fang-Xiang Wu, Yi Pan. BMC Systems Biology, 2012, 6(1): 1-17
Background
Understanding the information-processing capabilities of signal transduction networks, how those networks are disrupted in disease, and rationally designing therapies to manipulate diseased states require systematic and accurate reconstruction of network topology. Data on networks central to human physiology, such as the inflammatory signalling networks analyzed here, are found in a multiplicity of on-line resources of pathway and interactome databases (Cancer CellMap, GeneGo, KEGG, NCI-Pathway Interactome Database (NCI-PID), PANTHER, Reactome, I2D, and STRING). We sought to determine whether these databases contain overlapping information and whether they can be used to construct high-reliability prior knowledge networks for subsequent modeling of experimental data.
Results
We have assembled an ensemble network from multiple on-line sources representing a significant portion of all machine-readable and reconcilable human knowledge on proteins and protein interactions involved in inflammation. This ensemble network has many features expected of complex signalling networks assembled from high-throughput data: a power law distribution of both node degree and edge annotations, and topological features of a "bow tie" architecture in which diverse pathways converge on a highly conserved set of enzymatic cascades focused around PI3K/AKT, MAPK/ERK, JAK/STAT, NF-κB, and apoptotic signaling. Individual pathways exhibit "fuzzy" modularity that is statistically significant but still involves a majority of "cross-talk" interactions. However, we find that the most widely used pathway databases are highly inconsistent with respect to the actual constituents and interactions in this network. Using a set of growth factor signalling networks as examples (epidermal growth factor, transforming growth factor-beta, tumor necrosis factor, and wingless), we find a multiplicity of network topologies in which receptors couple to downstream components through myriad alternate paths. Many of these paths are inconsistent with well-established mechanistic features of signalling networks, such as the requirement for a transmembrane receptor to sense extracellular ligands.
Conclusions
Wide inconsistencies among interaction databases, pathway annotations, and the numbers and identities of nodes associated with a given pathway pose a major challenge for deriving causal and mechanistic insight from network graphs. We speculate that these inconsistencies are at least partially attributable to the cell- and context-specificity of cellular signal transduction, which is largely unaccounted for in available databases, but the absence of standardized vocabularies is an additional confounding factor. As a result of discrepant annotations, it is very difficult to identify biologically meaningful pathways from interactome networks a priori. However, by incorporating prior knowledge, it is possible to successively build out network complexity with high confidence from a simple linear signal transduction scaffold. Such reduced-complexity networks appear suitable for use in mechanistic models while being richer and better justified than the simple linear pathways usually depicted in diagrams of signal transduction.
6.
Protein-protein interactions (PPIs) are crucial to most biochemical processes in humans. Although many human PPIs have been identified experimentally, their number is still small compared with the number of available human protein sequences. Recently, many computational methods have been proposed to facilitate the identification of novel human PPIs. However, existing methods concentrated only on the information of individual PPIs and ignored the systematic characteristics of protein-protein interaction networks (PINs). In this study, a new method was proposed that combines the global information of PINs with protein sequence information. A random forest (RF) algorithm was used to build the prediction model, which achieved a high accuracy of 91.88%. Furthermore, the RF model performed well on three independent test datasets, suggesting that our method is a useful tool both for identifying PPIs and for investigating PINs.
7.
Determining protein function is one of the most challenging problems of the post-genomic era. The availability of entire genome sequences and of high-throughput capabilities to determine gene coexpression patterns has shifted the research focus from the study of single proteins or small complexes to that of the entire proteome. In this context, the search for reliable methods for assigning protein function is of primary importance. There are various approaches available for deducing the function of proteins of unknown function using information derived from sequence similarity or clustering patterns of co-regulated genes, phylogenetic profiles, protein-protein interactions (refs. 5-8 and Samanta, M.P. and Liang, S., unpublished data), and protein complexes. Here we propose the assignment of proteins to functional classes on the basis of their network of physical interactions as determined by minimizing the number of protein interactions among different functional categories. Function assignment is proteome-wide and is determined by the global connectivity pattern of the protein network. The approach results in multiple functional assignments, a consequence of the existence of multiple equivalent solutions. We apply the method to analyze the yeast Saccharomyces cerevisiae protein-protein interaction network. The robustness of the approach is tested in a system containing a high percentage of unclassified proteins and also in cases of deletion and insertion of specific protein interactions.
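The objective in this approach, the number of interactions joining proteins assigned to different functional classes, is easy to state in code. The sketch below keeps classified proteins fixed and repeatedly moves each unclassified protein to the class shared by most of its partners, a step that never increases the objective. This greedy local search is an assumption for illustration; the original work may use a different optimizer, and the sketch returns only one of possibly several equally good assignments. The names `assign_functions` and `cross_class_edges` and the toy data are hypothetical.

```python
from collections import Counter


def cross_class_edges(edges, labels):
    """Objective: number of interactions whose partners carry different classes."""
    return sum(labels[u] != labels[v] for u, v in edges)


def assign_functions(edges, known, categories, sweeps=20):
    """Greedily label unclassified proteins to reduce cross-class interactions.

    edges      -- list of (u, v) interaction pairs
    known      -- dict: classified protein -> class (kept fixed)
    categories -- list of candidate classes; the first one seeds unknown proteins
    """
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, []).append(v)
        nbrs.setdefault(v, []).append(u)
    labels = dict(known)
    unknown = sorted(p for p in nbrs if p not in known)
    for p in unknown:
        labels[p] = categories[0]  # arbitrary starting assignment
    for _ in range(sweeps):
        changed = False
        for p in unknown:
            counts = Counter(labels[q] for q in nbrs[p])
            # The class held by most partners minimizes p's own cross-class edges.
            best = max(categories, key=lambda c: counts[c])
            if counts[best] > counts[labels[p]]:  # move only on strict improvement
                labels[p] = best
                changed = True
        if not changed:
            break
    return labels


edges = [("K1", "U1"), ("K2", "U1"), ("U1", "U2"), ("K3", "U2")]
known = {"K1": "metabolism", "K2": "metabolism", "K3": "transport"}
labels = assign_functions(edges, known, ["metabolism", "transport"])
print(labels["U1"], cross_class_edges(edges, labels))  # metabolism 1
```

In the example, U2 is tied between a metabolism partner and a transport partner, so either class leaves exactly one cross-class edge: a small instance of the multiple equivalent solutions mentioned in the abstract.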
8.
We introduce clustering with overlapping neighborhood expansion (ClusterONE), a method for detecting potentially overlapping protein complexes from protein-protein interaction data. For several yeast data sets, ClusterONE-derived complexes corresponded better to reference complexes in the Munich Information Center for Protein Sequences (MIPS) catalog and to complexes derived from the Saccharomyces Genome Database (SGD) than the results of seven popular methods. The results also showed a high degree of functional homogeneity.
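ClusterONE grows candidate complexes from seeds by optimizing a cohesiveness score. To our understanding, the full ClusterONE paper defines it as $w_{in}/(w_{in} + w_{bound} + p|V|)$, where $w_{in}$ is the total weight of edges inside the group, $w_{bound}$ the weight of edges leaving it, and $p$ a per-node penalty; treat this formula as an assumption, since the abstract above does not state it. The sketch computes the score and greedily adds neighboring proteins while it improves. The real method also removes nodes and merges highly overlapping groups, which is omitted here, and the names `cohesiveness`, `grow_from_seed` and the default `penalty=2.0` are hypothetical.

```python
def cohesiveness(group, weights, penalty=2.0):
    """w_in / (w_in + w_bound + penalty * |group|) on an undirected weighted graph.

    weights -- dict: frozenset({u, v}) -> edge weight
    """
    w_in = w_bound = 0.0
    for edge, w in weights.items():
        inside = len(edge & group)
        if inside == 2:
            w_in += w  # both endpoints in the group
        elif inside == 1:
            w_bound += w  # edge crosses the group boundary
    return w_in / (w_in + w_bound + penalty * len(group))


def grow_from_seed(seed, weights, penalty=2.0):
    """Greedily add the neighboring node that most improves cohesiveness."""
    group = {seed}
    while True:
        frontier = {n for e in weights if e & group for n in e} - group
        best, best_score = None, cohesiveness(group, weights, penalty)
        for n in sorted(frontier):
            s = cohesiveness(group | {n}, weights, penalty)
            if s > best_score:
                best, best_score = n, s
        if best is None:
            return group
        group.add(best)


# Toy graph: triangle A-B-C attached through C to a hub D with three other partners.
pairs = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E"), ("D", "F"), ("D", "G")]
weights = {frozenset(p): 1.0 for p in pairs}
print(sorted(grow_from_seed("A", weights)))  # ['A', 'B', 'C']
```

Adding D to the triangle would bring in three boundary edges, lowering the score from 0.3 to about 0.27, so growth stops at {A, B, C}.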
9.
Background
Identifying the phenotypes associated with proteins is a challenge in modern genetics, since a multifactorial trait often results from the contributions of many proteins. Besides high-throughput phenotype assays, computational methods offer an alternative way to identify protein phenotypes.
Methodology/Principal Findings
Here, we propose a new method for predicting protein phenotypes in yeast based on the protein-protein interaction network. Instead of only the most likely phenotype, a series of possible phenotypes for the query protein is generated and ranked according to the tethering potential score. As a result, the first-order prediction accuracy of our method reached 65.4% in a jackknife test of 1,267 budding yeast proteins, much higher than the success rate of a random guess (15.4%). The probability that the first three predicted phenotypes included all of a protein's real phenotypes was 70.6%.
Conclusions/Significance
The candidate phenotypes predicted by our method provide useful clues for further validation. In addition, the method can easily be applied to predicting protein-associated phenotypes in other organisms.
10.
Chen-Ching Lin, Jen-Tsung Hsiang, Chia-Yi Wu, Yen-Jen Oyang, Hsueh-Fen Juan, Hsuan-Cheng Huang. BMC Systems Biology, 2010, 4(1): 138
Background
Molecular networks represent the backbone of molecular activity within cells and provide opportunities for understanding the mechanisms of diseases. While protein-protein interaction data constitute static network maps, integration of condition-specific co-expression information provides clues to the dynamic features of these networks. Dilated cardiomyopathy is a leading cause of heart failure. Although previous studies have identified putative biomarkers or therapeutic targets for heart failure, the underlying molecular mechanism of dilated cardiomyopathy remains unclear.
11.
Background
Proteins dynamically interact with each other to perform their biological functions. The dynamic operation of protein-protein interaction (PPI) networks is also reflected in the dynamic formation of protein complexes. Existing protein complex detection algorithms usually overlook the inherent temporal nature of protein interactions within PPI networks. Systematically analyzing temporal protein complexes can not only improve the accuracy of protein complex detection but also strengthen our biological knowledge of the dynamic protein assembly processes underlying cellular organization.
Results
In this study, we propose a novel computational method to predict temporal protein complexes. Specifically, we first construct a series of dynamic PPI networks by jointly analyzing time-course gene expression data and protein interaction data. Then a Time Smooth Overlapping Complex Detection model (TS-OCD) is proposed to detect temporal protein complexes from these dynamic PPI networks. TS-OCD naturally captures the smoothness of networks between consecutive time points and detects overlapping protein complexes at each time point. Finally, a nonnegative matrix factorization based algorithm is introduced to merge very similar temporal complexes across different time points.
Conclusions
Extensive experimental results demonstrate that the proposed method is more effective at detecting temporal protein complexes than state-of-the-art complex detection techniques.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2105-15-335) contains supplementary material, which is available to authorized users.
12.
A taxonomy of organ-specific breast cancer metastases based on a protein-protein interaction network
Sanz-Pamplona R, García-García J, Franco S, Messeguer X, Driouch K, Oliva B, Sierra A. Molecular BioSystems, 2012, 8(8): 2085-2096
We carried out a systems-level study of the mechanisms underlying organ-specific metastases of breast cancer. We followed a network-based approach using microarray expression data from human breast cancer metastases to select organ-specific proteins that exert a range of functions allowing cell survival and growth in the microenvironment of distant organs. MinerProt, an in-house software application, was used to group organ-specific signatures of brain (1191 genes), bone (1623 genes), liver (977 genes) and lung (254 genes) metastases by function and to select the most differentially expressed gene in each function. As a result, we obtained 19 functionally representative proteins in brain, 23 in bone, 15 in liver and 9 in lung, with which we constructed four organ-specific protein-protein interaction networks. The network taxonomy included seven interacting proteins in brain metastasis, mainly associated with signal transduction. Proteins related to immune response functions were bone-specific, while those involved in proteolysis, signal transduction and hepatic glucose metabolism were found in liver metastasis. No experimental protein-protein interaction was found in lung metastasis; thus, computationally determined interactions were included in this network. Moreover, three of the selected genes (CXCL12, DSC2 and TFDP2) were associated with progression to specific organs when tested in an independent dataset. In conclusion, we present a network-based approach that filters information by selecting key protein functions as metastatic markers or therapeutic targets.
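The selection step described above (group the organ-specific genes by function, then keep the most differentially expressed gene in each function) can be sketched in a few lines. MinerProt itself is an in-house tool whose internals are not described here; this sketch assumes that "most differentially expressed" means the largest absolute log2 fold change, and the function name `representative_genes` and the gene and function names in the example are hypothetical.

```python
def representative_genes(gene_functions, fold_change):
    """Pick one representative gene per function.

    gene_functions -- dict: gene -> list of functional categories
    fold_change    -- dict: gene -> log2 fold change (assumed differential-expression measure)
    Returns dict: function -> gene with the largest |log2 fold change| in that function.
    """
    best = {}
    for gene, functions in gene_functions.items():
        for fn in functions:
            current = best.get(fn)
            if current is None or abs(fold_change[gene]) > abs(fold_change[current]):
                best[fn] = gene
    return best


gene_functions = {
    "GENE_A": ["chemotaxis"],
    "GENE_B": ["chemotaxis", "proteolysis"],
    "GENE_C": ["proteolysis"],
}
fold_change = {"GENE_A": 2.5, "GENE_B": -1.0, "GENE_C": -3.0}
print(representative_genes(gene_functions, fold_change))
# {'chemotaxis': 'GENE_A', 'proteolysis': 'GENE_C'}
```

Using the absolute value lets strongly down-regulated genes (such as GENE_C here) represent a function as readily as up-regulated ones.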
13.
Background
With the huge number of uncharacterized protein sequences generated in the post-genomic age, it is highly desirable to develop effective computational methods for quickly and accurately predicting their functions. The information thus obtained would be very useful for both basic research and drug development in a timely manner.
Methodology/Principal Findings
Although many efforts have been made in this regard, most have relied on either sequence similarity or protein-protein interaction (PPI) information. However, the former often fails when a query protein has little or no sequence similarity to any protein of known function, and the latter has a similar problem when relevant PPI information is unavailable. In view of this, a new approach is proposed that hybridizes PPI information with the biochemical/physicochemical features of protein sequences. The overall first-order success rates of the new predictor for the functions of mouse proteins on the training set and test set were 69.1% and 70.2%, respectively, and the success rate covered by the top four of the 24 prediction orders was 65.2%.
Conclusions/Significance
The results indicate that the new approach is quite promising and may open a new avenue for addressing this difficult and complicated problem.
14.
Proteins carry out their functions by interacting with other proteins and small molecules, forming a complex interaction network. In this review, we briefly introduce classical graph-theory-based protein-protein interaction networks. We also describe the experimental methods commonly used to construct these networks and the insights that can be gained from them. We then discuss the recent transition from graph-theory-based networks to structure-based protein-protein interaction networks and the advantages of the latter over the former, using two networks as examples. We further discuss the usefulness of structure-based protein-protein interaction networks for drug discovery, with special emphasis on drug repositioning.
15.
Identifying protein complexes based on density and modularity in protein-protein interaction network
Background
Identifying protein complexes is crucial to understanding the principles of cellular organization and functional mechanisms. As much evidence indicates that subgraphs with high density or high modularity in a PPI network usually correspond to protein complexes, PPI-network-based complex detection methods have focused on a subgraph's density or its modularity. However, dense subgraphs may have low modularity and subgraphs with high modularity may have low density, so protein complexes may correspond to subgraphs with low modularity or low density in the PPI network. Because density-based methods have difficulty mining protein complexes with low density, and modularity-based methods have difficulty mining protein complexes with low modularity, both approaches are limited in identifying protein complexes with varying density and modularity.
Results
To identify protein complexes with varying density and modularity, including those that have low density but high modularity and those that have low modularity but high density, we define a novel subgraph fitness, $f_\rho = (\text{density})^{\rho} \cdot (\text{modularity})^{1-\rho}$, and propose a novel algorithm, named LF-PIN, that identifies protein complexes by expanding seed edges into subgraphs with a locally maximal fitness value. Experimental results of LF-PIN in S. cerevisiae show that, compared with setting the fitness equal to density ($\rho = 1$) or to modularity ($\rho = 0$), LF-PIN identifies known protein complexes more effectively when the fitness is determined by both density and modularity ($0 < \rho < 1$). Compared with seven competing protein complex detection methods (CMC, Core-Attachment, CPM, DPClus, HC-PIN, MCL, and NFC) in S. cerevisiae and E. coli, LF-PIN outperforms the other seven methods in terms of matching known complexes and functional enrichment. Moreover, LF-PIN performs better in identifying protein complexes with low density or low modularity.
Conclusions
By considering both density and modularity, LF-PIN outperforms protein complex detection methods that consider only density or only modularity, especially in identifying known protein complexes with low density or low modularity.
16.
Background
Protein-protein interactions are critical for cellular functions. Recently developed computational approaches for predicting protein-protein interactions utilize co-evolutionary information of the interacting partners, e.g., correlations between distance matrices, where each matrix stores the pairwise distances between a protein and its orthologs from a group of reference genomes.
17.
In this paper, we present a method for core-attachment complex identification based on maximal frequent patterns (CCiMFP) in yeast protein-protein interaction (PPI) networks. First, we detect subgraphs with high degree as candidate protein cores by mining maximal frequent patterns. Then, using topological and functional similarities, we merge highly similar protein cores and filter out insignificant ones. Finally, core-attachment complexes are formed by adding attachment proteins to each significant core. We experimentally evaluate the performance of CCiMFP on yeast PPI networks. Using gold-standard sets of protein complexes, Gene Ontology (GO), and localization annotations, we show that our method improves on previous algorithms in terms of the precision, recall, and biological significance of the predicted complexes. The colocalization scores of our predicted complex sets are higher than those of two known complex sets. Moreover, unlike other methods based on subgraph connectivity, our method can detect GO-enriched complexes with disconnected cores.
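The final CCiMFP step, adding attachment proteins to each significant core, might look like the sketch below. The rule used here (a non-core protein joins if it interacts with more than half of the core's proteins) is a simple majority heuristic chosen for illustration and is an assumption, not necessarily the criterion CCiMFP uses; the maximal-frequent-pattern mining that produces the cores is not shown, and `add_attachments` and the protein names are hypothetical.

```python
def add_attachments(core, adj, min_fraction=0.5):
    """Return the core plus proteins linked to more than `min_fraction` of its members.

    core -- set of core protein IDs
    adj  -- dict: protein -> set of interaction partners (undirected, symmetric)
    """
    # Only proteins adjacent to at least one core member can become attachments.
    candidates = set().union(*(adj[c] for c in core)) - core
    attachments = {p for p in candidates if len(adj[p] & core) > min_fraction * len(core)}
    return core | attachments


adj = {
    "C1": {"C2", "C3", "X", "Y"},
    "C2": {"C1", "C3", "X"},
    "C3": {"C1", "C2"},
    "X": {"C1", "C2"},  # touches 2 of 3 core proteins -> attached
    "Y": {"C1"},        # touches 1 of 3 core proteins -> left out
}
print(sorted(add_attachments({"C1", "C2", "C3"}, adj)))  # ['C1', 'C2', 'C3', 'X']
```

Because attachments are judged only against the core, a protein can attach to several cores, which is how core-attachment methods typically allow overlapping complexes.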
18.
Genomics, 2020, 112(1): 837-847
Background
Glioma is the most lethal nervous system cancer. Recent studies have made great efforts to study the occurrence and development of glioma, but the molecular mechanisms remain unclear. This study was designed to reveal the molecular mechanisms of glioma using protein-protein interaction networks combined with machine learning methods. Key differentially expressed genes (DEGs) were screened and selected using protein-protein interaction (PPI) networks.
Results
As a result, 19 key genes were selected between grade I and grade II, 21 between grade II and grade III, and 20 between grade III and grade IV. Five machine learning methods were then employed to predict glioma grade based on the selected key genes. After comparison, a Complement Naive Bayes classifier was chosen to build the prediction model for grades II-III, with an accuracy of 72.8%, and random forest was chosen to build the prediction models for grades I-II and III-IV, with accuracies of 97.1% and 83.2%, respectively. Finally, the selected genes were analyzed using PPI networks, Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, and the results improve our understanding of the biological functions of the selected DEGs involved in glioma growth. We expect these key genes to offer guidance on the occurrence of gliomas or, at the very least, to be useful to tumor researchers.
Conclusion
Machine learning combined with PPI network, GO and KEGG analyses of selected DEGs improves our understanding of the biological functions involved in glioma growth.
19.
Protein-protein interaction networks are typically built with interactions collated from many experiments. These networks are thus composite and show all interactions that are currently known to occur in a cell. However, these representations are static and ignore the constant changes in protein-protein interactions. Here we present software for the generation and analysis of dynamic, four-dimensional (4-D) protein interaction networks. In this software, time-course-derived abundance data are mapped onto three-dimensional networks to generate network movies. These networks can be navigated, manipulated and queried in real time. Two types of dynamic networks can be generated: a 4-D network that maps expression data onto protein nodes, and one that employs 'real-time rendering', in which protein nodes and their interactions appear and disappear in association with temporal changes in expression data. We illustrate the utility of this software by analyzing singlish-interface date-hub interactions during the yeast cell cycle. In doing so, we show that the proteins MLC1 and YPT52 exhibit strict temporal control over when their interaction partners are expressed. Since these proteins have one and two interaction interfaces, respectively, this suggests that temporal control of gene expression may be used to limit competition at the interaction interfaces of some hub proteins. The software and movies of the 4-D networks are available at http://www.systemsbiology.org.au/downloads_geomi.html.
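The 'real-time rendering' mode described above can be reduced to a per-time-point filter: a protein is drawn when its abundance passes a threshold, and an interaction is drawn only when both partners are drawn. The sketch below builds such frames from a static edge list. The single global `threshold`, the function name `network_frames`, and the protein names and profiles in the example are hypothetical; the actual software also handles 3-D layout and interactive navigation, which are not modeled here.

```python
def network_frames(edges, abundance, threshold):
    """Return one (visible_proteins, visible_edges) frame per time point.

    edges     -- list of (u, v) interactions from the static network
    abundance -- dict: protein -> list of abundance values, one per time point
    threshold -- hypothetical abundance cut-off for drawing a protein
    """
    n_points = len(next(iter(abundance.values())))
    frames = []
    for t in range(n_points):
        present = {p for p, series in abundance.items() if series[t] >= threshold}
        # An interaction is shown only while both partners are present.
        shown = [(u, v) for u, v in edges if u in present and v in present]
        frames.append((present, shown))
    return frames


edges = [("HUB", "P1"), ("HUB", "P2")]
abundance = {"HUB": [1.0, 1.0, 1.0], "P1": [0.9, 0.1, 0.1], "P2": [0.1, 0.1, 0.8]}
for t, (nodes, shown) in enumerate(network_frames(edges, abundance, 0.5)):
    print(t, sorted(nodes), shown)
```

With these made-up profiles, the hub meets P1 only at the first time point and P2 only at the last, the kind of temporal separation of partners the abstract reports for MLC1 and YPT52.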
20.
Even though a rough sketch of the human genome is now available and the number of newly discovered genes that could be biologically and medically relevant is greater than ever, only a small proportion have been assigned a biological function. Therefore, increasing attention is being drawn to functional genomics, i.e. the functional characterization of these newly identified sequences. To elucidate the role of a particular gene product within its cellular context, we have screened high-density protein filter arrays for protein-protein interactions using a 'Far-Western'-based approach. The methodology described here easily allows the identification and isolation of cDNAs of proteins that interact with specific ligands (interacting proteins, antibodies and DNA/RNA sequences) and represents an alternative to tedious conventional protein interaction analyses. Far-Western screening in the context of a whole-genome expression analysis not only facilitates the assignment of biological functions to specific, newly identified protein and DNA sequences, but is also useful in studies that assess the binding capacity of mutant proteins to their interaction partners and in the identification of domains and amino acids involved in known protein-protein interactions. Taken together, we describe an approach that allows the easy and reproducible identification of protein ligands on the basis of a whole-genome expression analysis.