首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 62 毫秒
1.
Cytoprophet is a software tool that allows prediction and visualization of protein and domain interaction networks. It is implemented as a plug-in of Cytoscape, an open source software framework for analysis and visualization of molecular networks. Cytoprophet implements three algorithms that predict new potential physical interactions using the domain composition of proteins and experimental assays. The algorithms for protein and domain interaction inference include maximum likelihood estimation (MLE) using expectation maximization (EM); the set cover approach maximum specificity set cover (MSSC) and the sum-product algorithm (SPA). After accepting an input set of proteins with Uniprot ID/Accession numbers and a selected prediction algorithm, Cytoprophet draws a network of potential interactions with probability scores and GO distances as edge attributes. A network of domain interactions between the domains of the initial protein list can also be generated. Cytoprophet was designed to take advantage of the visual capabilities of Cytoscape and be simple to use. An example of inference in a signaling network of myxobacterium Myxococcus xanthus is presented and available at Cytoprophet's website. AVAILABILITY: http://cytoprophet.cse.nd.edu.  相似文献   

2.
MOTIVATION: Identifying protein-protein interactions is critical for understanding cellular processes. Because protein domains represent binding modules and are responsible for the interactions between proteins, computational approaches have been proposed to predict protein interactions at the domain level. The fact that protein domains are likely evolutionarily conserved allows us to pool information from data across multiple organisms for the inference of domain-domain and protein-protein interaction probabilities. RESULTS: We use a likelihood approach to estimating domain-domain interaction probabilities by integrating large-scale protein interaction data from three organisms, Saccharomyces cerevisiae, Caenorhabditis elegans and Drosophila melanogaster. The estimated domain-domain interaction probabilities are then used to predict protein-protein interactions in S.cerevisiae. Based on a thorough comparison of sensitivity and specificity, Gene Ontology term enrichment and gene expression profiles, we have demonstrated that it may be far more informative to predict protein-protein interactions from diverse organisms than from a single organism. AVAILABILITY: The program for computing the protein-protein interaction probabilities and supplementary material are available at http://bioinformatics.med.yale.edu/interaction.  相似文献   

3.
MOTIVATION: Protein interactions are of biological interest because they orchestrate a number of cellular processes such as metabolic pathways and immunological recognition. Domains are the building blocks of proteins; therefore, proteins are assumed to interact as a result of their interacting domains. Many domain-based models for protein interaction prediction have been developed, and preliminary results have demonstrated their feasibility. Most of the existing domain-based methods, however, consider only single-domain pairs (one domain from one protein) and assume independence between domain-domain interactions. RESULTS: In this paper, we introduce a domain-based random forest of decision trees to infer protein interactions. Our proposed method is capable of exploring all possible domain interactions and making predictions based on all the protein domains. Experimental results on Saccharomyces cerevisiae dataset demonstrate that our approach can predict protein-protein interactions with higher sensitivity (79.78%) and specificity (64.38%) compared with that of the maximum likelihood approach. Furthermore, our model can be used to infer interactions not only for single-domain pairs but also for multiple domain pairs.  相似文献   

4.
Large-scale protein-protein interaction data sets have been generated for several species including yeast and human and have enabled the identification, quantification, and prediction of cellular molecular networks. Affinity purification-mass spectrometry (AP-MS) is the preeminent methodology for large-scale analysis of protein complexes, performed by immunopurifying a specific "bait" protein and its associated "prey" proteins. The analysis and interpretation of AP-MS data sets is, however, not straightforward. In addition, although yeast AP-MS data sets are relatively comprehensive, current human AP-MS data sets only sparsely cover the human interactome. Here we develop a framework for analysis of AP-MS data sets that addresses the issues of noise, missing data, and sparsity of coverage in the context of a current, real world human AP-MS data set. Our goal is to extend and increase the density of the known human interactome by integrating bait-prey and cocomplexed preys (prey-prey associations) into networks. Our framework incorporates a score for each identified protein, as well as elements of signal processing to improve the confidence of identified protein-protein interactions. We identify many protein networks enriched in known biological processes and functions. In addition, we show that integrated bait-prey and prey-prey interactions can be used to refine network topology and extend known protein networks.  相似文献   

5.
Probabilistic inference of molecular networks from noisy data sources   总被引:1,自引:0,他引:1  
Information on molecular networks, such as networks of interacting proteins, comes from diverse sources that contain remarkable differences in distribution and quantity of errors. Here, we introduce a probabilistic model useful for predicting protein interactions from heterogeneous data sources. The model describes stochastic generation of protein-protein interaction networks with real-world properties, as well as generation of two heterogeneous sources of protein-interaction information: research results automatically extracted from the literature and yeast two-hybrid experiments. Based on the domain composition of proteins, we use the model to predict protein interactions for pairs of proteins for which no experimental data are available. We further explore the prediction limits, given experimental data that cover only part of the underlying protein networks. This approach can be extended naturally to include other types of biological data sources.  相似文献   

6.
7.
The FHA domain is a modular phosphopeptide recognition motif.   总被引:1,自引:0,他引:1  
FHA domains are conserved sequences of 65-100 amino acid residues found principally within eukaryotic nuclear proteins, but which also exist in certain prokaryotes. The FHA domain is thought to mediate protein-protein interactions, but its mode of action has yet to be elucidated. Here, we show that the two highly divergent FHA domains of Saccharomyces cerevisiae Rad53p, a protein kinase involved in cell cycle checkpoint control, possess phosphopeptide-binding specificity. We also demonstrate that other FHA domains bind peptides in a phospho-dependent manner. These findings indicate that the FHA domain is a phospho-specific protein-protein interaction motif and have important implications for mechanisms of intracellular signaling in both eukaryotes and prokaryotes.  相似文献   

8.
Reconstruction of protein interaction networks that represent groups of proteins contributing to the same cellular function is a key step towards quantitative studies of signal transduction pathways. Here we present a novel approach to reconstruct a highly correlated protein interaction network and to identify previously unknown components of a signaling pathway through integration of protein-protein interaction data, gene expression data, and Gene Ontology annotations. A novel algorithm is designed to reconstruct a highly correlated protein interaction network which is composed of the candidate proteins for signal transduction mechanisms in yeast Saccharomyces cerevisiae. The high efficiency of the reconstruction process is proved by a Receiver Operating Characteristic curve analysis. Identification and scoring of the possible linear pathways enables reconstruction of specific sub-networks for glucose-induction signaling and high osmolarity MAPK signaling in S. cerevisiae. All of the known components of these pathways are identified together with several new "candidate" proteins, indicating the successful reconstructions of two model pathways involved in S. cerevisiae. The integrated approach is hence shown useful for (i) prediction of new signaling pathways, (ii) identification of unknown members of documented pathways, and (iii) identification of network modules consisting of a group of related components that often incorporate the same functional mechanism.  相似文献   

9.
Inference of protein functions is one of the most important aims of modern biology. To fully exploit the large volumes of genomic data typically produced in modern-day genomic experiments, automated computational methods for protein function prediction are urgently needed. Established methods use sequence or structure similarity to infer functions but those types of data do not suffice to determine the biological context in which proteins act. Current high-throughput biological experiments produce large amounts of data on the interactions between proteins. Such data can be used to infer interaction networks and to predict the biological process that the protein is involved in. Here, we develop a probabilistic approach for protein function prediction using network data, such as protein-protein interaction measurements. We take a Bayesian approach to an existing Markov Random Field method by performing simultaneous estimation of the model parameters and prediction of protein functions. We use an adaptive Markov Chain Monte Carlo algorithm that leads to more accurate parameter estimates and consequently to improved prediction performance compared to the standard Markov Random Fields method. We tested our method using a high quality S.cereviciae validation network with 1622 proteins against 90 Gene Ontology terms of different levels of abstraction. Compared to three other protein function prediction methods, our approach shows very good prediction performance. Our method can be directly applied to protein-protein interaction or coexpression networks, but also can be extended to use multiple data sources. We apply our method to physical protein interaction data from S. cerevisiae and provide novel predictions, using 340 Gene Ontology terms, for 1170 unannotated proteins and we evaluate the predictions using the available literature.  相似文献   

10.
Since many proteins express their functional activity by interacting with other proteins and forming protein complexes, it is very useful to identify sets of proteins that form complexes. For that purpose, many prediction methods for protein complexes from protein-protein interactions have been developed such as MCL, MCODE, RNSC, PCP, RRW, and NWE. These methods have dealt with only complexes with size of more than three because the methods often are based on some density of subgraphs. However, heterodimeric protein complexes that consist of two distinct proteins occupy a large part according to several comprehensive databases of known complexes. In this paper, we propose several feature space mappings from protein-protein interaction data, in which each interaction is weighted based on reliability. Furthermore, we make use of prior knowledge on protein domains to develop feature space mappings, domain composition kernel and its combination kernel with our proposed features. We perform ten-fold cross-validation computational experiments. These results suggest that our proposed kernel considerably outperforms the naive Bayes-based method, which is the best existing method for predicting heterodimeric protein complexes.  相似文献   

11.
Advances in proteomics technologies have enabled novel protein interactions to be detected at high speed, but they come at the expense of relatively low quality. Therefore, a crucial step in utilizing the high throughput protein interaction data is evaluating their confidence and then separating the subsets of reliable interactions from the background noise for further analyses. Using Bayesian network approaches, we combine multiple heterogeneous biological evidences, including model organism protein-protein interaction, interaction domain, functional annotation, gene expression, genome context, and network topology structure, to assign reliability to the human protein-protein interactions identified by high throughput experiments. This method shows high sensitivity and specificity to predict true interactions from the human high throughput protein-protein interaction data sets. This method has been developed into an on-line confidence scoring system specifically for the human high throughput protein-protein interactions. Users may submit their protein-protein interaction data on line, and the detailed information about the supporting evidence for query interactions together with the confidence scores will be returned. The Web interface of PRINCESS (protein interaction confidence evaluation system with multiple data sources) is available at the website of China Human Proteome Organisation.  相似文献   

12.
Protein-protein recognition, frequently mediated by members of large families of interaction domains, is one of the cornerstones of biological function. Here, we present a computational, structure-based method to predict the sequence space of peptides recognized by PDZ domains, one of the largest families of recognition proteins. As a test set, we use a considerable amount of recent phage display data that describe the peptide recognition preferences for 169 naturally occurring and engineered PDZ domains. For both wild-type PDZ domains and single point mutants, we find that 70-80% of the most frequently observed amino acids by phage display are predicted within the top five ranked amino acids. Phage display frequently identified recognition preferences for amino acids different from those present in the original crystal structure. Notably, in about half of these cases, our algorithm correctly captures these preferences, indicating that it can predict mutations that increase binding affinity relative to the starting structure. We also find that we can computationally recapitulate specificity changes upon mutation, a key test for successful forward design of protein-protein interface specificity. Across all evaluated data sets, we find that incorporation backbone sampling improves accuracy substantially, irrespective of using a crystal or NMR structure as the starting conformation. Finally, we report successful prediction of several amino acid specificity changes from blind tests in the DREAM4 peptide recognition domain specificity prediction challenge. Because the foundational methods developed here are structure based, these results suggest that the approach can be more generally applied to specificity prediction and redesign of other protein-protein interfaces that have structural information but lack phage display data.  相似文献   

13.
Wu C  Ma MH  Brown KR  Geisler M  Li L  Tzeng E  Jia CY  Jurisica I  Li SS 《Proteomics》2007,7(11):1775-1785
Systematic identification of direct protein-protein interactions is often hampered by difficulties in expressing and purifying the corresponding full-length proteins. By taking advantage of the modular nature of many regulatory proteins, we attempted to simplify protein-protein interactions to the corresponding domain-ligand recognition and employed peptide arrays to identify such binding events. A group of 12 Src homology (SH) 3 domains from eight human proteins (Swiss-Prot ID: SRC, PLCG1, P85A, NCK1, GRB2, FYN, CRK) were used to screen a peptide target array composed of 1536 potential ligands, which led to the identification of 921 binary interactions between these proteins and 284 targets. To assess the efficiency of the peptide array target screening (PATS) method in identifying authentic protein-protein interactions, we examined a set of interactions mediated by the PLCgamma1 SH3 domain by coimmunoprecipitation and/or affinity pull-downs using full-length proteins and achieved a 75% success rate. Furthermore, we characterized a novel interaction between PLCgamma1 and hematopoietic progenitor kinase 1 (HPK1) identified by PATS and demonstrated that the PLCgamma1 SH3 domain negatively regulated HPK1 kinase activity. Compared to protein interactions listed in the online predicted human interaction protein database (OPHID), the majority of interactions identified by PATS are novel, suggesting that, when extended to the large number of peptide interaction domains encoded by the human genome, PATS should aid in the mapping of the human interactome.  相似文献   

14.
MOTIVATION: Genomic and proteomic approaches have accumulated a huge amount of data which provide clues to protein function. However, interpreting single omic data for predicting uncharacterized protein functions has been a challenging task, because the data contain a lot of false positives. To overcome this problem, methods for integrating data from various omic approaches are needed for more accurate function prediction. RESULT: In this paper, we have developed a method which extracts functionally similar proteins with high confidence by integrating protein-protein interaction data and domain information. We used this method to analyze publicly available data from Saccharomyces cerevisiae. We identified 1042 functional associations, involving 765 proteins of which 98 (12.8%) had no previously ascribed function. Our method extracts functionally similar protein pairs more accurately than conventional methods, and predicting function for previously uncharacterized proteins can be achieved. Our method can of course be applied to protein-protein interaction data for any species.  相似文献   

15.
MOTIVATION: Protein-protein interaction networks are one of the major post-genomic data sources available to molecular biologists. They provide a comprehensive view of the global interaction structure of an organism's proteome, as well as detailed information on specific interactions. Here we suggest a physical model of protein interactions that can be used to extract additional information at an intermediate level: It enables us to identify proteins which share biological interaction motifs, and also to identify potentially missing or spurious interactions. RESULTS: Our new graph model explains observed interactions between proteins by an underlying interaction of complementary binding domains (lock-and-key model). This leads to a novel graph-theoretical algorithm to identify bipartite subgraphs within protein-protein interaction networks where the underlying data are taken from yeast two-hybrid experimental results. By testing on synthetic data, we demonstrate that under certain modelling assumptions, the algorithm will return correct domain information about each protein in the network. Tests on data from various model organisms show that the local and global patterns predicted by the model are indeed found in experimental data. Using functional and protein structure annotations, we show that bipartite subnetworks can be identified that correspond to biologically relevant interaction motifs. Some of these are novel and we discuss an example involving SH3 domains from the Saccharomyces cerevisiae interactome. AVAILABILITY: The algorithm (in Matlab format) is available (see http://www.maths.strath.ac.uk/~aas96106/lock_key.html).  相似文献   

16.
Hydrogen bonding is a key contributor to the specificity of intramolecular and intermolecular interactions in biological systems. Here, we develop an orientation-dependent hydrogen bonding potential based on the geometric characteristics of hydrogen bonds in high-resolution protein crystal structures, and evaluate it using four tests related to the prediction and design of protein structures and protein-protein complexes. The new potential is superior to the widely used Coulomb model of hydrogen bonding in prediction of the sequences of proteins and protein-protein interfaces from their structures, and improves discrimination of correctly docked protein-protein complexes from large sets of alternative structures.  相似文献   

17.
Assigning functions to unknown proteins is one of the most important problems in proteomics. Several approaches have used protein-protein interaction data to predict protein functions. We previously developed a Markov random field (MRF) based method to infer a protein's functions using protein-protein interaction data and the functional annotations of its protein interaction partners. In the original model, only direct interactions were considered and each function was considered separately. In this study, we develop a new model which extends direct interactions to all neighboring proteins, and one function to multiple functions. The goal is to understand a protein's function based on information on all the neighboring proteins in the interaction network. We first developed a novel kernel logistic regression (KLR) method based on diffusion kernels for protein interaction networks. The diffusion kernels provide means to incorporate all neighbors of proteins in the network. Second, we identified a set of functions that are highly correlated with the function of interest, referred to as the correlated functions, using the chi-square test. Third, the correlated functions were incorporated into our new KLR model. Fourth, we extended our model by incorporating multiple biological data sources such as protein domains, protein complexes, and gene expressions by converting them into networks. We showed that the KLR approach of incorporating all protein neighbors significantly improved the accuracy of protein function predictions over the MRF model. The incorporation of multiple data sets also improved prediction accuracy. The prediction accuracy is comparable to another protein function classifier based on the support vector machine (SVM), using a diffusion kernel. The advantages of the KLR model include its simplicity as well as its ability to explore the contribution of neighbors to the functions of proteins of interest.  相似文献   

18.
Recently, several domain-based computational models for predicting protein-protein interactions (PPIs) have been proposed. The conventional methods usually infer domain or domain combination (DC) interactions from already known interacting sets of proteins, and then predict PPIs using the information. However, the majority of these models often have limitations in providing detailed information on which domain pair (single domain interaction) or DC pair (multidomain interaction) will actually interact for the predicted protein interaction. Therefore, a more comprehensive and concrete computational model for the prediction of PPIs is needed. We developed a computational model to predict PPIs using the information of intraprotein domain cohesion and interprotein DC coupling interaction. A method of identifying the primary interacting DC pair was also incorporated into the model in order to infer actual participants in a predicted interaction. Our method made an apparent improvement in the PPI prediction accuracy, and the primary interacting DC pair identification was valid specifically in predicting multidomain protein interactions. In this paper, we demonstrate that 1) the intraprotein domain cohesion is meaningful in improving the accuracy of domain-based PPI prediction, 2) a prediction model incorporating the intradomain cohesion enables us to identify the primary interacting DC pair, and 3) a hybrid approach using the intra/interdomain interaction information can lead to a more accurate prediction.  相似文献   

19.
Polyketides, a diverse group of heteropolymers with antibiotic and antitumor properties, are assembled in bacteria by multiprotein chains of modular polyketide synthase (PKS) proteins. Specific protein-protein interactions determine the order of proteins within a multiprotein chain, and thereby the order in which chemically distinct monomers are added to the growing polyketide product. Here we investigate the evolutionary and molecular origins of protein interaction specificity. We focus on the short, conserved N- and C-terminal docking domains that mediate interactions between modular PKS proteins. Our computational analysis, which combines protein sequence data with experimental protein interaction data, reveals a hierarchical interaction specificity code. PKS docking domains are descended from a single ancestral interacting pair, but have split into three phylogenetic classes that are mutually noninteracting. Specificity within one such compatibility class is determined by a few key residues, which can be used to define compatibility subclasses. We identify these residues using a novel, highly sensitive co-evolution detection algorithm called CRoSS (correlated residues of statistical significance). The residue pairs selected by CRoSS are involved in direct physical interactions in a docked-domain NMR structure. A single PKS system can use docking domain pairs from multiple classes, as well as domain pairs from multiple subclasses of any given class. The termini of individual proteins are frequently shuffled, but docking domain pairs straddling two interacting proteins are linked as an evolutionary module. The hierarchical and modular organization of the specificity code is intimately related to the processes by which bacteria generate new PKS pathways.  相似文献   

20.
Proteins function through interactions with other molecules. Thus, the network of physical interactions among proteins is of great interest to both experimental and computational biologists. Here we present structure-based predictions of 3387 binary and 1234 higher order protein complexes in Saccharomyces cerevisiae involving 924 and 195 proteins, respectively. To generate candidate complexes, comparative models of individual proteins were built and combined together using complexes of known structure as templates. These candidate complexes were then assessed using a statistical potential, derived from binary domain interfaces in PIBASE (http://salilab.org/pibase). The statistical potential discriminated a benchmark set of 100 interface structures from a set of sequence-randomized negative examples with a false positive rate of 3% and a true positive rate of 97%. Moreover, the predicted complexes were also filtered using functional annotation and sub-cellular localization data. The ability of the method to select the correct binding mode among alternates is demonstrated for three camelid VHH domain-porcine alpha-amylase interactions. We also highlight the prediction of co-complexed domain superfamilies that are not present in template complexes. Through integration with MODBASE, the application of the method to proteomes that are less well characterized than that of S.cerevisiae will contribute to expansion of the structural and functional coverage of protein interaction space. The predicted complexes are deposited in MODBASE (http://salilab.org/modbase).  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号