共查询到20条相似文献,搜索用时 0 毫秒
1.
Many biologically important protein-protein interactions (PPIs) have been found to be mediated by short linear motifs (SLiMs). These interactions are mediated by the binding of a protein domain, often with a nonlinear interaction interface, to a SLiM. We propose a method called D-SLIMMER to mine for SLiMs in PPI data on the basis of the interaction density between a nonlinear motif (i.e., a protein domain) in one protein and a SLiM in the other protein. Our results on a benchmark of 113 experimentally verified reference SLiMs showed that D-SLIMMER outperformed existing methods notably for discovering domain-SLiMs interaction motifs. To illustrate the significance of the SLiMs detected, we highlighted two SLiMs discovered from the PPI data by D-SLIMMER that are variants of the known ELM SLiM, as well as a literature-backed SLiM that is yet to be listed in the reference databases. We also presented a novel SLiM predicted by D-SLIMMER that was strongly supported by existing biological literatures. These examples showed that D-SLIMMER is able to find SLiMs that are biologically relevant. 相似文献
2.
Background
Extracting and visualizing of protein-protein interaction (PPI) from text literatures are a meaningful topic in protein science. It assists the identification of interactions among proteins. There is a lack of tools to extract PPI, visualize and classify the results. 相似文献3.
Background
The development of high-throughput technologies such as yeast two-hybrid systems and mass spectrometry technologies has made it possible to generate large protein-protein interaction (PPI) datasets. Mining these datasets for underlying biological knowledge has, however, remained a challenge. 相似文献4.
High-throughput methods for detecting protein interactions, such as mass spectrometry and yeast two-hybrid assays, continue to produce vast amounts of data that may be exploited to infer protein function and regulation. As this article went to press, the pool of all published interaction information on Saccharomyces cerevisiae was 15,143 interactions among 4,825 proteins, and power-law scaling supports an estimate of 20,000 specific protein interactions. To investigate the biases, overlaps, and complementarities among these data, we have carried out an analysis of two high-throughput mass spectrometry (HMS)-based protein interaction data sets from budding yeast, comparing them to each other and to other interaction data sets. Our analysis reveals 198 interactions among 222 proteins common to both data sets, many of which reflect large multiprotein complexes. It also indicates that a "spoke" model that directly pairs bait proteins with associated proteins is roughly threefold more accurate than a "matrix" model that connects all proteins. In addition, we identify a large, previously unsuspected nucleolar complex of 148 proteins, including 39 proteins of unknown function. Our results indicate that existing large-scale protein interaction data sets are nonsaturating and that integrating many different experimental data sets yields a clearer biological view than any single method alone. 相似文献
5.
6.
Minghua Deng Kui Zhang Shipra Mehta Ting Chen Fengzhu Sun 《Journal of computational biology》2003,10(6):947-960
Assigning functions to novel proteins is one of the most important problems in the postgenomic era. Several approaches have been applied to this problem, including the analysis of gene expression patterns, phylogenetic profiles, protein fusions, and protein-protein interactions. In this paper, we develop a novel approach that employs the theory of Markov random fields to infer a protein's functions using protein-protein interaction data and the functional annotations of protein's interaction partners. For each function of interest and protein, we predict the probability that the protein has such function using Bayesian approaches. Unlike other available approaches for protein annotation in which a protein has or does not have a function of interest, we give a probability for having the function. This probability indicates how confident we are about the prediction. We employ our method to predict protein functions based on "biochemical function," "subcellular location," and "cellular role" for yeast proteins defined in the Yeast Proteome Database (YPD, www.incyte.com), using the protein-protein interaction data from the Munich Information Center for Protein Sequences (MIPS, mips.gsf.de). We show that our approach outperforms other available methods for function prediction based on protein interaction data. The supplementary data is available at www-hto.usc.edu/~msms/ProteinFunction. 相似文献
7.
Yu X Ivanic J Memisević V Wallqvist A Reifman J 《Molecular & cellular proteomics : MCP》2011,10(12):M111.012500
We characterized and evaluated the functional attributes of three yeast high-confidence protein-protein interaction data sets derived from affinity purification/mass spectrometry, protein-fragment complementation assay, and yeast two-hybrid experiments. The interacting proteins retrieved from these data sets formed distinct, partially overlapping sets with different protein-protein interaction characteristics. These differences were primarily a function of the deployed experimental technologies used to recover these interactions. This affected the total coverage of interactions and was especially evident in the recovery of interactions among different functional classes of proteins. We found that the interaction data obtained by the yeast two-hybrid method was the least biased toward any particular functional characterization. In contrast, interacting proteins in the affinity purification/mass spectrometry and protein-fragment complementation assay data sets were over- and under-represented among distinct and different functional categories. We delineated how these differences affected protein complex organization in the network of interactions, in particular for strongly interacting complexes (e.g. RNA and protein synthesis) versus weak and transient interacting complexes (e.g. protein transport). We quantified methodological differences in detecting protein interactions from larger protein complexes, in the correlation of protein abundance among interacting proteins, and in their connectivity of essential proteins. In the latter case, we showed that minimizing inherent methodology biases removed many of the ambiguous conclusions about protein essentiality and protein connectivity. We used these findings to rationalize how biological insights obtained by analyzing data sets originating from different sources sometimes do not agree or may even contradict each other. An important corollary of this work was that discrepancies in biological insights did not necessarily imply that one detection methodology was better or worse, but rather that, to a large extent, the insights reflected the methodological biases themselves. Consequently, interpreting the protein interaction data within their experimental or cellular context provided the best avenue for overcoming biases and inferring biological knowledge. 相似文献
8.
Protein-protein interaction (PPI) prediction is a central task in achieving a better understanding of cellular and intracellular processes. Because high-throughput experimental methods are both expensive and time-consuming, and are also known of suffering from the problems of incompleteness and noise, many computational methods have been developed, with varied degrees of success. However, the inference of PPI network from multiple heterogeneous data sources remains a great challenge. In this work, we developed a novel method based on approximate Bayesian computation and modified differential evolution sampling (ABC-DEP) and regularized laplacian (RL) kernel. The method enables inference of PPI networks from topological properties and multiple heterogeneous features including gene expression and Pfam domain profiles, in forms of weighted kernels. The optimal weights are obtained by ABC-DEP, and the kernel fusion built based on optimal weights serves as input to RL to infer missing or new edges in the PPI network. Detailed comparisons with control methods have been made, and the results show that the accuracy of PPI prediction measured by AUC is increased by up to 23 %, as compared to a baseline without using optimal weights. The method can provide insights into the relations between PPIs and various feature kernels and demonstrates strong capability of predicting faraway interactions that cannot be well detected by traditional RL method. 相似文献
9.
《Genomics》2022,114(6):110509
The compatibility of plasmids with new host cells is significant given their role in spreading antimicrobial resistance (AMR) and virulence factor genes. Evaluating this using in vitro screening is laborious and can be informed by computational analyses of plasmid-host compatibility through rates of protein-protein interactions (PPIs) between plasmid and host cell proteins. We identified large excesses of such PPIs in eight important plasmids, including pOXA-48, using most known bacteria (n = 4363). 23 species had high rates of interactions with four blaOXA-48-positive plasmids. We also identified 48 species with high interaction rates with plasmids common in Escherichia coli. We found a strong association between one plasmid and the fimbrial adhesin operon pil, which could enhance host cell adhesion in aqueous environments. An excess rate of PPIs could be a sign of host-plasmid compatibility, which is important for AMR control given that plasmids like pOXA-48 move between species with ease. 相似文献
10.
11.
Inferring protein-protein interactions through high-throughput interaction data from diverse organisms 总被引:5,自引:0,他引:5
MOTIVATION: Identifying protein-protein interactions is critical for understanding cellular processes. Because protein domains represent binding modules and are responsible for the interactions between proteins, computational approaches have been proposed to predict protein interactions at the domain level. The fact that protein domains are likely evolutionarily conserved allows us to pool information from data across multiple organisms for the inference of domain-domain and protein-protein interaction probabilities. RESULTS: We use a likelihood approach to estimating domain-domain interaction probabilities by integrating large-scale protein interaction data from three organisms, Saccharomyces cerevisiae, Caenorhabditis elegans and Drosophila melanogaster. The estimated domain-domain interaction probabilities are then used to predict protein-protein interactions in S.cerevisiae. Based on a thorough comparison of sensitivity and specificity, Gene Ontology term enrichment and gene expression profiles, we have demonstrated that it may be far more informative to predict protein-protein interactions from diverse organisms than from a single organism. AVAILABILITY: The program for computing the protein-protein interaction probabilities and supplementary material are available at http://bioinformatics.med.yale.edu/interaction. 相似文献
12.
Background
High-throughput techniques are becoming widely used to study protein-protein interactions and protein complexes on a proteome-wide scale. Here we have explored the potential of these techniques to accurately determine the constituent proteins of complexes and their architecture within the complex.Results
Two-dimensional representations of the 19S and 20S proteasome, mediator, and SAGA complexes were generated and overlaid with high quality pairwise interaction data, core-module-attachment classifications from affinity purifications of complexes and predicted domain-domain interactions. Pairwise interaction data could accurately determine the members of each complex, but was unexpectedly poor at deciphering the topology of proteins in complexes. Core and module data from affinity purification studies were less useful for accurately defining the member proteins of these complexes. However, these data gave strong information on the spatial proximity of many proteins. Predicted domain-domain interactions provided some insight into the topology of proteins within complexes, but was affected by a lack of available structural data for the co-activator complexes and the presence of shared domains in paralogous proteins.Conclusion
The constituent proteins of complexes are likely to be determined with accuracy by combining data from high-throughput techniques. The topology of some proteins in the complexes will be able to be clearly inferred. We finally suggest strategies that can be employed to use high throughput interaction data to define the membership and understand the architecture of proteins in novel complexes. 相似文献13.
Background
Protein-protein interaction data used in the creation or prediction of molecular networks is usually obtained from large scale or high-throughput experiments. This experimental data is liable to contain a large number of spurious interactions. Hence, there is a need to validate the interactions and filter out the incorrect data before using them in prediction studies. 相似文献14.
15.
16.
Background
Structural models determined by X-ray crystallography play a central role in understanding protein-protein interactions at the molecular level. Interpretation of these models requires the distinction between non-specific crystal packing contacts and biologically relevant interactions. This has been investigated previously and classification approaches have been proposed. However, less attention has been devoted to distinguishing different types of biological interactions. These interactions are classified as obligate and non-obligate according to the effect of the complex formation on the stability of the protomers. So far no automatic classification methods for distinguishing obligate, non-obligate and crystal packing interactions have been made available. 相似文献17.
The advent of the "omics" era in biology research has brought new challenges and requires the development of novel strategies to answer previously intractable questions. Molecular interaction networks provide a framework to visualize cellular processes, but their complexity often makes their interpretation an overwhelming task. The inherently artificial nature of interaction detection methods and the incompleteness of currently available interaction maps call for a careful and well-informed utilization of this valuable data. In this tutorial, we aim to give an overview of the key aspects that any researcher needs to consider when working with molecular interaction data sets and we outline an example for interactome analysis. Using the molecular interaction database IntAct, the software platform Cytoscape, and its plugins BiNGO and clusterMaker, and taking as a starting point a list of proteins identified in a mass spectrometry-based proteomics experiment, we show how to build, visualize, and analyze a protein-protein interaction network. 相似文献
18.
The ever-increasing flow of gene expression profiles and protein-protein interactions has catalyzed many computational approaches for inference of gene functions. Despite all the efforts, there is still room for improvement, for the information enriched in each biological data source has not been exploited to its fullness. A composite method is proposed for classifying unannotated genes based on expression data and protein-protein interaction (PPI) data, which extracts information from both data sources in novel ways. With the noise nature of expression data taken into consideration, importance is attached to the consensus expression patterns of gene classes instead of the actual expression profiles of individual genes, thus characterizing the composite method with enhanced robustness against microarray data variation. With regard to the PPI network, the traditional clear-cut binary attitude towards inter- and intra-functional interactions is abandoned, whereas a more objective perspective into the PPI network structure is formed through incorporating the varied function-function interaction probabilities into the algorithm. The composite method was implemented in two numerical experiments, where its improvement over single-data-source based methods was observed and the superiority of the novel data handling operations was discussed. 相似文献
19.
Coverage and error models of protein-protein interaction data by directed graph analysis 总被引:1,自引:1,他引:0
Using a directed graph model for bait to prey systems and a multinomial error model, we assessed the error statistics in all published large-scale datasets for Saccharomyces cerevisiae and characterized them by three traits: the set of tested interactions, artifacts that lead to false-positive or false-negative observations, and estimates of the stochastic error rates that affect the data. These traits provide a prerequisite for the estimation of the protein interactome and its modules. 相似文献
20.
We present a statistical method SAINT-MS1 for scoring protein-protein interactions based on the label-free MS1 intensity data from affinity purification-mass spectrometry (AP-MS) experiments. The method is an extension of Significance Analysis of INTeractome (SAINT), a model-based method previously developed for spectral count data. We reformulated the statistical model for log-transformed intensity data, including adequate treatment of missing observations, that is, interactions identified in some but not all replicate purifications. We demonstrate the performance of SAINT-MS1 using two recently published data sets: a small LTQ-Orbitrap data set with three replicate purifications of single human bait protein and control purifications and a larger drosophila data set targeting insulin receptor/target of rapamycin signaling pathway generated using an LTQ-FT instrument. Using the drosophila data set, we also compare and discuss the performance of SAINT analysis based on spectral count and MS1 intensity data in terms of the recovery of orthologous and literature-curated interactions. Given rapid advances in high mass accuracy instrumentation and intensity-based label-free quantification software, we expect that SAINT-MS1 will become a useful tool allowing improved detection of protein interactions in label-free AP-MS data, especially in the low abundance range. 相似文献