Similar Literature
20 similar documents found.
1.
Guo J, Wu X, Zhang DY, Lin K. Nucleic Acids Research, 2008, 36(6): 2002–2011
High-throughput studies of protein interactions, both experimental and computational, may have produced the most comprehensive protein–protein interaction datasets for completely sequenced genomes. These data provide an opportunity to discover the underlying protein interaction patterns on a proteome scale. Here, we propose an approach to discovering motif pairs at interaction sites (often 3–8 residues) that are essential for understanding protein functions and helpful for the rational design of protein engineering and folding experiments. A gold standard positive (interacting) dataset and a gold standard negative (non-interacting) dataset were mined to infer the interacting motif pairs that are significantly overrepresented in the positive dataset compared with the negative dataset. Four negative datasets assembled by different strategies were evaluated, and the one with the best performance was used as the gold standard negatives for further analysis. To assess how efficiently our method detects potential interacting motif pairs, we compared it with previously developed approaches and found that our method achieved the highest prediction accuracy. In addition, many uncharacterized motif pairs of interest were found to be functional, with experimental evidence in other species. This investigation demonstrates the important effect of a high-quality negative dataset on the performance of such statistical inference.
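The core statistical step compares how often a motif pair occurs in interacting and non-interacting protein pairs. The minimal sketch below illustrates that step and is not the authors' method. It assumes each protein has already been mapped to a set of motifs (the `motifs` dictionary and the motif names are hypothetical). It scores each motif pair with a pseudocount-smoothed log2 ratio of its frequencies in the positive and negative sets, which may differ from the significance test used in the paper.

```python
import math
from collections import Counter
from itertools import product


def motif_pair_counts(protein_pairs, motifs):
    """Count, for each unordered motif pair, how many protein pairs contain it."""
    counts = Counter()
    for a, b in protein_pairs:
        # Each protein pair contributes at most once per motif pair.
        seen = {tuple(sorted((m1, m2)))
                for m1, m2 in product(motifs.get(a, ()), motifs.get(b, ()))}
        counts.update(seen)
    return counts


def enriched_motif_pairs(positives, negatives, motifs, pseudo=1.0):
    """Rank motif pairs by log2(frequency in positives / frequency in negatives)."""
    pos = motif_pair_counts(positives, motifs)
    neg = motif_pair_counts(negatives, motifs)
    n_pos, n_neg = len(positives), len(negatives)
    scores = {}
    for pair, count in pos.items():
        f_pos = (count + pseudo) / (n_pos + 2 * pseudo)
        f_neg = (neg[pair] + pseudo) / (n_neg + 2 * pseudo)  # Counter gives 0 if absent
        scores[pair] = math.log2(f_pos / f_neg)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: hypothetical proteins and motif labels.
motifs = {"A": {"m1"}, "B": {"m2"}, "C": {"m3"}, "D": {"m2"}}
positives = [("A", "B"), ("A", "D")]
negatives = [("A", "C"), ("B", "C")]
ranking = enriched_motif_pairs(positives, negatives, motifs)
print(ranking)
```

The negative set enters the score directly through `f_neg`. This is one concrete way to see why the quality of the negative dataset matters so much for this kind of inference.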

2.
Wang J, Xie D, Lin H, Yang Z, Zhang Y. Proteome Science, 2012, 10(Suppl 1): S18

Background

Protein complexes are recognized as important in many biological processes, and various computational approaches have been developed to identify complexes from protein-protein interaction (PPI) networks. However, the high false-positive rate of PPI data makes this identification challenging.

Results

In this study, we propose a protein semantic similarity measure to estimate the reliability of interactions in PPI networks. The measure is based on the ontology structure of Gene Ontology (GO) terms and on GO annotations. Interaction pairs with low GO semantic similarity are removed from the network as unreliable. A cluster-expanding algorithm is then used to detect complexes with a core-attachment structure in the filtered network. We apply our method to three different yeast PPI networks and examine its effectiveness on two benchmark complex datasets. Experimental results show that our method performs better than other state-of-the-art approaches on most evaluation metrics.

Conclusions

The method detects protein complexes in large-scale PPI networks after filtering interactions by GO semantic similarity. Removing interactions with low GO similarity significantly improves the performance of complex identification. The expanding strategy is also effective at identifying the attachment proteins of complexes.
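The small sketch below illustrates the filtering step, using a simpler stand-in for the paper's measure. The paper's measure uses the structure of the GO graph. This example instead scores each interaction with the plain Jaccard overlap of the two proteins' GO term sets. The threshold of 0.2 and the protein identifiers are hypothetical.

```python
def jaccard(terms_a, terms_b):
    """Jaccard overlap of two GO term sets (0.0 when both are empty)."""
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0


def filter_interactions(edges, go_annotations, threshold=0.2):
    """Keep interactions whose partners share enough GO annotation.

    Returns (protein_a, protein_b, similarity) triples for retained edges.
    """
    kept = []
    for a, b in edges:
        sim = jaccard(go_annotations.get(a, set()), go_annotations.get(b, set()))
        if sim >= threshold:
            kept.append((a, b, sim))
    return kept


# Illustrative annotations for hypothetical proteins P1-P3.
go_annotations = {
    "P1": {"GO:0005634", "GO:0006351"},
    "P2": {"GO:0005634", "GO:0006351", "GO:0003677"},
    "P3": {"GO:0005739"},
}
edges = [("P1", "P2"), ("P1", "P3")]
print(filter_interactions(edges, go_annotations))
```

The retained edges, with their similarity values, are what a subsequent cluster-expanding step would operate on.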

3.
Identifying protein–protein interactions (PPIs) is critical for understanding the cellular functions of proteins and the machinery of a proteome. PPI data derived from high-throughput technologies are often incomplete and noisy. It is therefore important to develop computational methods and high-quality interaction datasets for predicting PPIs. We propose a sequence-based method that combines a correlation coefficient (CC) transformation with a support vector machine (SVM). The CC transformation accounts for the neighboring effect within a protein sequence and also describes the level of correlation between two protein sequences. To evaluate the method objectively and attenuate bias, we mined two yeast (Saccharomyces cerevisiae) datasets: MIPS Core, a gold standard positive (interacting) dataset, and GO-NEG, a gold standard negative (non-interacting) dataset. The SVM model combined with the CC transformation yielded the best performance, with an accuracy of 87.94% on the gold standard positive and negative datasets. The MATLAB source code and the datasets are available on request from smgsmg@mail.ustc.edu.cn.
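The abstract does not give the CC formula, so the sketch below uses a closely related and widely used stand-in. It computes a lag-based auto-covariance of a single physicochemical property (Kyte-Doolittle hydropathy) along each sequence, which captures the "neighboring effect" the abstract mentions. The two proteins' vectors are concatenated into one feature vector per pair, which would then be passed to an SVM. The SVM itself is omitted to keep the example dependency-free. The property scale, `max_lag`, and the concatenation scheme are assumptions and do not reproduce the authors' CC transformation.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def auto_covariance(seq, max_lag=3):
    """Lag-d auto-covariance of hydropathy along a sequence, d = 1..max_lag."""
    x = [KD[aa] for aa in seq.upper()]
    n = len(x)
    mean = sum(x) / n
    features = []
    for d in range(1, max_lag + 1):
        if d >= n:
            features.append(0.0)  # lag longer than the sequence
            continue
        s = sum((x[i] - mean) * (x[i + d] - mean) for i in range(n - d))
        features.append(s / (n - d))
    return features


def pair_features(seq_a, seq_b, max_lag=3):
    """Fixed-length feature vector for a protein pair (input to a classifier)."""
    return auto_covariance(seq_a, max_lag) + auto_covariance(seq_b, max_lag)


print(pair_features("MKTAYIAKQR", "GSHMLEDPVR"))
```

Because every pair maps to a vector of the same length (2 × `max_lag`), sequences of different lengths can be fed to a standard classifier.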

4.
The purification of low-abundance protein complexes and the detection of in vivo protein–protein interactions in complex biological samples remain challenging. Here, we devised crosslinking and tandem affinity purification coupled to mass spectrometry (XL–TAP–MS), a quantitative proteomics approach for analyzing tandem affinity-purified, crosslinked protein complexes from plant tissues. As an example, we applied XL–TAP–MS to study the MKK2–MPK4 (mitogen-activated protein kinase 4) signaling module in Arabidopsis thaliana. We adopted a tandem affinity tag consisting of an in vivo-biotinylated protein domain flanked by two hexahistidine sequences. This tag allows the affinity-based isolation of formaldehyde-crosslinked protein complexes under fully denaturing conditions. Combining ¹⁵N stable isotope labeling with tandem MS, we captured and identified a total of 107 MKK2–MPK4 module-interacting proteins. Consistent with the role of the MPK signaling module in plant immunity, many of the module-interacting proteins are involved in the biotic and abiotic stress responses of Arabidopsis. Validation of binary protein–protein interactions by in planta split-luciferase assays and in vitro kinase assays revealed several direct phosphorylation targets of MPK4. Together, these results show that XL–TAP–MS purifies low-abundance protein complexes from biological samples and discovers previously unknown protein–protein interactions.

XL–TAP–MS: a novel technique that allows the purification of crosslinked, low-abundance protein complexes from plant tissues under denaturing conditions and the detection of in vivo protein–protein interactions.

5.
Most cellular processes are enabled by cohorts of interacting proteins that form dynamic networks within the plant proteome. The study of these networks can provide insight into protein function and open new avenues for research. This article informs the plant science community of the currently available sources of protein interaction data and discusses how they can be useful to researchers. Using our recently curated IntAct Arabidopsis thaliana protein–protein interaction data set as an example, we discuss the potential and limitations of the plant interactomes generated to date. In addition, we present our efforts to add value to the interaction data by using them to seed a proteome-wide map of predicted protein subcellular locations.

For well over two decades, plant scientists have studied protein interactions within plants using many different and evolving approaches. Their findings are represented by a large and growing corpus of peer-reviewed literature, reflecting the increasing activity in this area of plant proteomic research. More recently, a number of predicted interactomes have been reported in plants. Although these predictions remain largely untested, they could act as a useful guide for future research. These studies have allowed researchers to better understand the function of protein complexes and to refine our understanding of protein function within the cell (Uhrig, 2006; Morsy et al., 2008). Extracting protein interaction data from the literature, and depositing and representing them in a standardized way in publicly available databases, remains challenging. Aggregating the data in databases allows researchers to leverage visualization, data mining, and integrative approaches. These approaches can produce new insights that would be unachievable when the data are dispersed in largely inaccessible formats (Rodriguez et al., 2009).

Currently, three databases act as repositories of plant protein interaction data: IntAct (http://www.ebi.ac.uk/intact/; Aranda et al., 2010), The Arabidopsis Information Resource (TAIR; http://www.Arabidopsis.org/; Poole, 2007), and BioGRID (http://www.thebiogrid.org/; Breitkreutz et al., 2008). These databases curate experimentally established interactions available from the peer-reviewed literature, as opposed to predicted interactions, which will be discussed below. Each repository takes its own approach to the capture, storage, and representation of protein interaction data. TAIR focuses exclusively on Arabidopsis thaliana protein–protein interaction data, and BioGRID currently focuses on two plant species, Arabidopsis and rice (Oryza sativa). IntAct, by contrast, attempts to capture protein interaction data from any plant species. Unlike the other repositories, IntAct follows a deep curation strategy that captures detailed experimental and biophysical information using controlled vocabularies, such as binding regions and the subcellular locations of interactions (Aranda et al., 2010). Most of the plant interaction data held by IntAct concern protein–protein interactions in Arabidopsis. IntAct also holds a small but growing body of protein–DNA, protein–RNA, and protein–small molecule interaction data, as well as interaction data from other plant species.

Using the IntAct Arabidopsis data set as an example, we outline how the knowledge accumulating in these repositories can be used to further our understanding of the plant proteome. To gauge the predictive accuracy of predicted interactomes, we compare their characteristics with the IntAct protein–protein interaction data set, which consists entirely of experimentally measured protein interactions. Finally, we show how the IntAct data set can be used together with a recently developed Divide and Conquer k-Nearest Neighbors method (DC-kNN; K. Lee et al., 2008) to predict the subcellular locations of most Arabidopsis proteins. The resulting data set provides high-confidence subcellular location predictions for many unannotated Arabidopsis proteins and should act as a useful resource for future studies of protein function. Although this article focuses on the IntAct Arabidopsis protein–protein interaction data set, readers are also encouraged to explore the resources offered by our colleagues at TAIR and BioGRID.

Each database employs its own system to report molecular interactions as represented in the referenced source publications. None of them judges interaction reliability or whether two participants in a complex interact directly. Users should therefore carefully filter these data sets for their specific purpose based on the full annotation. In particular, when assessing the reliability and type of an interaction (e.g., direct or indirect), users should consider the experimental methods used. They should also check whether different publications have independently observed the same interaction. Confidence scoring schemes for interaction data are discussed widely in the literature (Yu and Finley, 2009).
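The short sketch below illustrates two ideas from the passage above: filtering on independent observations, and using interaction partners to infer subcellular location. It assumes interaction evidence is available as hypothetical (protein A, protein B, publication ID) records, for example exported from one of the repositories. It does not reproduce the IntAct data model. The location step is a plain majority vote over a protein's interaction partners, a much simpler technique than DC-kNN.

```python
from collections import Counter, defaultdict


def reliable_pairs(evidence, min_publications=2):
    """Keep unordered protein pairs reported in at least `min_publications` papers."""
    publications = defaultdict(set)
    for a, b, pub_id in evidence:
        publications[frozenset((a, b))].add(pub_id)
    return {pair for pair, ids in publications.items() if len(ids) >= min_publications}


def predict_location(protein, pairs, known_locations):
    """Majority vote over the annotated locations of a protein's partners."""
    votes = Counter()
    for pair in pairs:
        if protein in pair and len(pair) == 2:  # skip self-interactions
            (partner,) = pair - {protein}
            if partner in known_locations:
                votes[known_locations[partner]] += 1
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical records: (protein A, protein B, publication ID).
evidence = [
    ("At1", "At2", "pub1"), ("At2", "At1", "pub2"),
    ("At1", "At3", "pub1"),
    ("At1", "At4", "pub3"), ("At1", "At4", "pub4"),
]
pairs = reliable_pairs(evidence)
locations = {"At2": "nucleus", "At3": "plastid", "At4": "nucleus"}
print(predict_location("At1", pairs, locations))  # nucleus
```

Here the At1–At3 interaction is dropped because only one publication supports it. The remaining two partners both vote "nucleus".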

7.
Hydrogen bonds, hydrophobic interactions, and van der Waals (vdW) interactions are the three major non-covalent interactions at protein–protein interfaces. We have developed a method that uses only these properties to describe interactions between proteins. It qualitatively estimates the individual contribution of each interfacial residue to binding and displays the results graphically. We applied this method to alanine mutation data at protein–protein interfaces, testing a dataset of 13 protein–protein complexes with 250 alanine mutations of interfacial residues. Of the 75 hot-spot residues (ΔΔG ≥ 1.5 kcal mol⁻¹), 66 were predicted correctly, a success rate of 88%. To test the tolerance of this method to conformational changes upon binding, we used a set of 26 complexes in which one or both components are available in the unbound form. The key residues output by the program differ by 11% between results computed from the complexed proteins and those computed from the unbound ones. Because this method characterizes the binding partner of a particular protein, it supports in-depth studies of protein–protein recognition. It can also be used to compare different protein–protein interactions and to look for correlated mutations.

Figure: Key interaction grids at the interface between barnase and barstar. The key interaction grids for both proteins are plotted together according to their coordinates, with crosses marking barstar grids and dots marking barnase grids. The four residues shown in ball-and-stick representation are Asp40 in barstar and Arg83, Arg87, and His102 in barnase.
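The reported success rate is the fraction of experimental hot spots that the method flags as key residues. Experimental hot spots are alanine mutations with ΔΔG at or above 1.5 kcal mol⁻¹. The sketch below makes that arithmetic explicit. The input records are synthetic, constructed only to reproduce the counts quoted in the abstract (250 mutations, 75 hot spots, 66 predicted). The `predicted` flags stand in for the program's output.

```python
HOT_SPOT_DDG = 1.5  # kcal/mol, hot-spot threshold quoted in the abstract


def hot_spot_success_rate(mutations):
    """mutations: iterable of (ddg_kcal_per_mol, predicted_as_key_residue).

    Returns the fraction of experimental hot spots that were predicted.
    """
    hot = [predicted for ddg, predicted in mutations if ddg >= HOT_SPOT_DDG]
    if not hot:
        return float("nan")
    return sum(hot) / len(hot)


# Synthetic records matching the abstract's counts: 66 + 9 = 75 hot spots,
# plus 175 non-hot-spot mutations (250 in total).
mutations = [(2.0, True)] * 66 + [(2.0, False)] * 9 + [(0.5, False)] * 175
print(f"{hot_spot_success_rate(mutations):.0%}")  # 88%
```

This rate counts only true hot spots. How the method treats the 175 non-hot-spot residues would need a separate measure, such as precision.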

8.
The elucidation of a protein's interaction/association network is important for defining its biological function. Mass spectrometry–based proteomic approaches have emerged as powerful tools for identifying protein–protein interactions (PPIs) and protein–protein associations (PPAs). However, interactome/association experiments are difficult to interpret, given the complexity and volume of the data they generate. Tools have been developed to identify protein interactions/associations quantitatively, but there is still a pressing need for easy-to-use tools that allow users to contextualize their results. To address this, we developed CANVS, a computational pipeline that cleans, analyzes, and visualizes mass spectrometry–based interactome/association data. CANVS is wrapped in an interactive Shiny dashboard with simple requirements. The dashboard allows users to interface easily with the pipeline, analyze complex experimental data, and create PPI/A networks. The application integrates systems biology databases such as BioGRID and CORUM to contextualize the results. CANVS also features a Gene Ontology (GO) tool that allows users to identify relevant GO terms in their results and create visual networks of proteins associated with those terms. Overall, CANVS is an easy-to-use application for researchers studying interactome/association data, especially those who lack an established bioinformatic pipeline.
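A typical first step in such a pipeline separates specific interactors from background and flags those already reported in an interaction database. The sketch below is not CANVS code, since CANVS is delivered as a Shiny dashboard. It is a minimal illustration under assumed inputs: spectral counts for a bait pull-down and a control, a pseudocount-smoothed fold-change cutoff of 2, and a set of known interactions standing in for a BioGRID export. All protein names are hypothetical.

```python
def score_preys(bait, bait_counts, control_counts, known_pairs,
                min_fold_change=2.0, pseudocount=1.0):
    """Return (prey, fold_change, previously_known) for enriched preys,
    sorted by decreasing fold change over the control pull-down."""
    hits = []
    for prey, count in bait_counts.items():
        fold_change = (count + pseudocount) / (control_counts.get(prey, 0) + pseudocount)
        if fold_change >= min_fold_change:
            known = frozenset((bait, prey)) in known_pairs
            hits.append((prey, fold_change, known))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


bait_counts = {"PREY1": 19, "PREY2": 30, "PREY3": 9}
control_counts = {"PREY2": 29, "PREY3": 0}
known_pairs = {frozenset(("BAIT", "PREY1"))}
for prey, fc, known in score_preys("BAIT", bait_counts, control_counts, known_pairs):
    print(prey, round(fc, 1), "known" if known else "novel")
```

PREY2 is abundant but equally present in the control, so it is discarded as background. The two retained preys are split into a known and a novel candidate, which is the kind of contextualization the abstract describes.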

9.
Systematic interrogation of mutation or protein modification data is important for identifying sites with functional consequences and for deducing global consequences from large data sets. Mechismo (mechismo.russellab.org) considers thousands of 3D structures and biomolecular interactions simultaneously to rapidly predict the mechanistic consequences of mutations and modifications. Because useful functional information often comes only from homologous proteins, we benchmarked the accuracy of predictions as a function of protein/structure sequence similarity. This benchmarking permits the use of relatively weak sequence similarities with an appropriate confidence measure. For protein–protein, protein–nucleic acid and a subset of protein–chemical interactions, we also developed and benchmarked a measure of whether modifications are likely to enhance or diminish the interactions. This measure can assist the detection of modifications with specific effects. Analysis of high-throughput sequencing data shows that the approach can identify interesting differences between cancers. Applied to proteomics data, it suggests potential mechanisms by which post-translational modifications alter biomolecular interactions.

11.
We analyze the protein–RNA interfaces in 81 transient binary complexes taken from the Protein Data Bank. Interfaces with tRNA or duplex RNA are larger than those with single-stranded RNA and comparable in size to protein–DNA interfaces. The protein side bears a strong positive electrostatic potential and resembles protein–DNA interfaces in its amino acid composition. On the RNA side, the phosphate contributes less, and the sugar much more, to the interaction than in protein–DNA complexes. On average, protein–RNA interfaces contain 20 hydrogen bonds (7 involving the phosphates, 5 the sugar 2′OH and 6 the bases) and 32 water molecules. The average H-bond density per unit buried surface area is lower with tRNA or single-stranded RNA than with duplex RNA, and the atomic packing is also less compact in interfaces with tRNA. On the protein side, the main-chain NH and Arg/Lys side chains account for nearly half of all H-bonds to RNA, and the main-chain CO and side-chain acceptor groups account for a quarter. The 2′OH is a major player in protein–RNA recognition, and shape complementarity is an important determinant. Electrostatics and direct base–protein interactions play a lesser part than in protein–DNA recognition.

15.

Background

Protein complexes can be identified from protein interaction networks derived from experimental data sets. However, these analyses are challenging because of unreliable interactions and the complex connectivity of the network. Integrating protein-protein interactions with data from other sources can improve the effectiveness of protein complex detection algorithms.

Methods

We have developed a novel semantic similarity method that uses Gene Ontology (GO) annotations to measure the reliability of protein-protein interactions. A protein interaction network can be converted into a weighted graph by assigning each interaction its reliability value as a weight. The previously proposed clustering algorithm IPCA expands clusters starting from seed vertices. Following the same approach, we present a clustering algorithm, OIIP, that identifies protein complexes in the new weighted protein-protein interaction networks.

Results

The OIIP algorithm is applied to the protein interaction network of Saccharomyces cerevisiae and identifies many well-known complexes. Experimental results show that OIIP achieves a higher F-measure and accuracy than other competing approaches.
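The sketch below illustrates the seed-expansion idea shared by IPCA and OIIP on a weighted graph, whose edge weights would come from a GO-based reliability score. It is a simplified, hypothetical version, not either published algorithm, and those algorithms use additional criteria omitted here. Seeds are taken in order of weighted degree. A neighbour joins the cluster when its mean edge weight to the current members is at least `threshold`. Each protein seeds at most one cluster.

```python
from collections import defaultdict


def build_graph(weighted_edges):
    """Undirected weighted adjacency map from (u, v, weight) triples."""
    adj = defaultdict(dict)
    for u, v, w in weighted_edges:
        adj[u][v] = w
        adj[v][u] = w
    return adj


def expand_from_seeds(adj, threshold=0.5, min_size=3):
    """Greedy seed expansion; clusters may overlap, but each protein seeds at most once."""
    strength = {v: sum(nbrs.values()) for v, nbrs in adj.items()}
    seeds = sorted(adj, key=lambda v: (-strength[v], v))
    covered, clusters = set(), []
    for seed in seeds:
        if seed in covered:
            continue
        cluster = {seed}
        while True:
            candidates = {n for m in cluster for n in adj[m]} - cluster
            if not candidates:
                break
            # Mean edge weight from each candidate to the current members.
            score, best = max(
                (sum(adj[c].get(m, 0.0) for m in cluster) / len(cluster), c)
                for c in candidates
            )
            if score < threshold:
                break
            cluster.add(best)
        covered |= cluster
        if len(cluster) >= min_size:
            clusters.append(sorted(cluster))
    return clusters


# Hypothetical reliability-weighted interactions.
edges = [("A", "B", 0.9), ("A", "C", 0.8), ("B", "C", 0.85),
         ("C", "D", 0.2), ("D", "E", 0.9)]
print(expand_from_seeds(build_graph(edges)))  # [['A', 'B', 'C']]
```

The weak C–D edge (0.2) stops the dense A–B–C cluster from absorbing D. This is the effect that down-weighting unreliable interactions is meant to produce.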

16.
Gold standard datasets of protein complexes are key to inferring and validating protein–protein interactions. Despite much progress in characterizing protein complexes in the yeast Saccharomyces cerevisiae, many researchers still use as a reference the manually curated complexes catalogued by the Munich Information Center for Protein Sequences (MIPS) database. Although this catalogue has served the community extremely well, it no longer reflects the current state of knowledge. Here, we report two catalogues of yeast protein complexes resulting from systematic curation efforts. The first, denoted CYC2008, is a comprehensive catalogue of 408 manually curated heteromeric protein complexes reliably backed by small-scale experiments reported in the current literature. This catalogue represents an up-to-date reference set for biologists interested in discovering protein interactions and protein complexes. The second catalogue, denoted YHTP2008, comprises 400 high-throughput complexes annotated with current literature evidence. Of these, 262 correspond, at least partially, to CYC2008 complexes. Evidence for interacting subunits was collected for 68 complexes that have only partial or no overlap with CYC2008 complexes, whereas no literature evidence was found for 100 complexes. Some of these partially supported and as yet unsupported complexes may be interesting candidates for experimental follow-up. Both catalogues are freely available at: http://wodaklab.org/cyc2008/.
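Reference catalogues such as CYC2008 are typically used to evaluate predictions by matching predicted complexes against them. One widely used matching criterion is the overlap score |A ∩ B|² / (|A|·|B|), with a match declared at 0.2 or above. The abstract does not specify this criterion, so it is an assumption here. The sketch below implements it with hypothetical subunit names.

```python
def overlap_score(complex_a, complex_b):
    """Overlap score |A ∩ B|^2 / (|A| * |B|) between two complexes."""
    a, b = set(complex_a), set(complex_b)
    if not a or not b:
        return 0.0
    shared = len(a & b)
    return shared * shared / (len(a) * len(b))


def matched_complexes(predicted, reference, threshold=0.2):
    """Predicted complexes that match at least one reference complex."""
    return [p for p in predicted
            if any(overlap_score(p, r) >= threshold for r in reference)]


reference = [{"Y1", "Y2", "Y3"}, {"Y7", "Y8"}]
predicted = [{"Y1", "Y2"}, {"Y4", "Y5"}, {"Y7", "Y8", "Y9", "Y10"}]
print(matched_complexes(predicted, reference))
```

Squaring the shared count and dividing by both sizes penalizes predictions that are much larger or smaller than the reference complex. The four-member prediction containing Y7 and Y8 therefore scores 0.5 rather than 1.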

17.
Protein interactions in cells can be described at different levels. At a low interaction level, proteins function together in small, stable complexes; at a higher level, they function in sets of interacting complexes. All interaction levels are crucial for the living organism, and one of the challenges in proteomics is to measure proteins at their different interaction levels. One common method for such measurements is immunoprecipitation followed by mass spectrometry (IP/MS), which has the potential to probe the different protein interaction forms. However, IP/MS data are complex because proteins, in their diverse interaction forms, manifest themselves in different ways in the data. Numerous bioinformatic tools for finding protein complexes in IP/MS data are currently available. Most of them, however, do not report the interaction level of the discovered complexes, and none is geared specifically to unraveling and visualizing these different levels. We present a new bioinformatic tool to explore IP/MS datasets for protein complexes at different interaction levels and show its performance on several real-life datasets. Our tool creates clusters that represent protein complexes. Unlike previous methods, it arranges them in a tree-shaped structure, reporting why specific proteins are predicted to form a complex and where it can be divided into smaller complexes. Every data analysis method requires parameters to be chosen. Our method can suggest values for its parameters and comes with adapted visualization tools that display the effect of the parameters on the result. The tools provide fast graphical feedback and allow the user to interact with the data by changing the parameters and examining the result. They also allow users to explore the different organizational levels of the protein complexes in a given dataset. Our method is available as GNU-R source code, with examples, at www.bdagroup.nl.
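Generic average-linkage hierarchical clustering can illustrate the idea of arranging complexes in a tree. This sketch is not the published tool, which is written in GNU-R. It assumes each protein is summarised by the hypothetical set of IP experiments in which it was detected and uses the Jaccard distance between those sets. Each merge is recorded, so cutting the tree at different heights yields complexes at different interaction levels.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two sets of IP experiments."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def average_linkage(profiles):
    """Agglomerate proteins; returns merges as (cluster_a, cluster_b, distance)."""
    clusters = {i: (name,) for i, name in enumerate(sorted(profiles))}
    next_id = len(clusters)

    def distance(c1, c2):
        pairs = [(p, q) for p in c1 for q in c2]
        return sum(jaccard_distance(profiles[p], profiles[q]) for p, q in pairs) / len(pairs)

    merges = []
    while len(clusters) > 1:
        # Closest pair of current clusters (the first one found wins on ties).
        (i, j), d = min(
            (((i, j), distance(clusters[i], clusters[j]))
             for i, j in combinations(sorted(clusters), 2)),
            key=lambda item: item[1],
        )
        merges.append((clusters[i], clusters[j], d))
        clusters[next_id] = tuple(sorted(clusters.pop(i) + clusters.pop(j)))
        next_id += 1
    return merges


# Hypothetical detection profiles: protein -> IP experiments it appeared in.
profiles = {
    "P1": {"ip1", "ip2"}, "P2": {"ip1", "ip2"},
    "P3": {"ip1", "ip2", "ip3"}, "P4": {"ip4"},
}
merges = average_linkage(profiles)
for a, b, d in merges:
    print(a, "+", b, f"at {d:.2f}")
```

In this toy data, P1 and P2 form a tight core (distance 0) and P3 joins at a looser level (about 0.33). P4 only joins at the root, so a cut between those heights separates a small stable complex from a larger assembly.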

18.
High-resolution experimental structure determination of protein–protein interactions has led to valuable mechanistic insights. However, the massive number of interactions and experimental limitations create a need for computational methods that can accurately model their structures. Here we explore the use of the recently developed deep learning method AlphaFold to predict the structures of protein complexes from sequence. We tested multiple implementations and parameters of AlphaFold for accuracy on a benchmark of 152 diverse heterodimeric protein complexes. Remarkably, in many cases (43%) AlphaFold's top-ranked prediction was a near-native model, meaning medium or high accuracy by Critical Assessment of PRedicted Interactions (CAPRI) criteria. This greatly surpasses the performance of unbound protein–protein docking, which achieved a 9% success rate for near-native top-ranked models. However, AlphaFold modeling of antibody–antigen complexes within our set was unsuccessful. We identified sequence and structural features associated with lack of AlphaFold success, and we also investigated the impact of the multiple sequence alignment input. We also benchmarked a multimer-optimized version of AlphaFold (AlphaFold-Multimer) on a set of recently released antibody–antigen structures. It confirmed a low rate of success for antibody–antigen complexes (11% success), and we found that it likewise does not accurately model T cell receptor–antigen complexes. These results show that adaptive immune recognition poses a challenge for the current AlphaFold algorithm and model. Overall, our study demonstrates that end-to-end deep learning can accurately model many transient protein complexes. It also highlights areas of improvement for future developments to reliably model any protein–protein interaction of interest.
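"Near-native" here means a top-ranked model reaching medium or high CAPRI accuracy. CAPRI classes are formally defined from fnat, ligand RMSD and interface RMSD. This sketch assumes a common shortcut that the abstract does not state: binning the continuous DockQ score using its published cutoffs of 0.23, 0.49 and 0.80. The success rate is then the fraction of targets whose top-ranked model is medium or better. The DockQ values below are made up.

```python
def capri_class(dockq):
    """Map a DockQ score in [0, 1] to a CAPRI-style accuracy class."""
    if dockq >= 0.80:
        return "high"
    if dockq >= 0.49:
        return "medium"
    if dockq >= 0.23:
        return "acceptable"
    return "incorrect"


def near_native_rate(top_ranked_dockq):
    """Fraction of targets whose top-ranked model is medium or high quality."""
    classes = [capri_class(q) for q in top_ranked_dockq]
    return sum(c in ("medium", "high") for c in classes) / len(classes)


scores = [0.85, 0.55, 0.30, 0.10]  # hypothetical top-ranked DockQ per target
print([capri_class(q) for q in scores])  # ['high', 'medium', 'acceptable', 'incorrect']
print(near_native_rate(scores))  # 0.5
```

Under this definition, an "acceptable" model does not count as a success. That is why near-native rates such as the 43% and 9% above are stricter than acceptable-or-better rates.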

19.
We present an analysis of the cavities in protein–DNA and protein–RNA complexes. In the number of cavities and their total volume, the interfaces formed in these complexes are akin to those in transient protein–protein heterocomplexes. In homodimeric proteins, protein–DNA interfaces may contain cavities involving both protein subunits and the DNA. These cavities are more than twice as large as those involving a single protein subunit and DNA. A parameter called the cavity index measures the degree of surface complementarity. It indicates that atomic packing in protein–protein, protein–DNA and protein–RNA interfaces is very similar, but about two times less efficient in the permanent interfaces formed between subunits in homodimers. As within tertiary structures and protein–protein interfaces, protein–DNA interfaces are more likely to be lined by β-sheet residues. On the DNA side, base atoms, in particular those in the minor groove, have a higher tendency to be located in cavities. The larger cavities tend to be less spherical and solvated. A small fraction of water molecules are found to mediate hydrogen-bond interactions with both components. This suggests that the primary role of cavity water is to fill the voids left by local non-complementarity of the surface patches.

20.
A detailed computational analysis of 32 protein–RNA complexes is presented. A number of physical and chemical properties of the intermolecular interfaces are calculated and compared with those observed in protein–double-stranded DNA and protein–single-stranded DNA complexes. The interface properties of the protein–RNA complexes reveal the diverse nature of the binding sites. Van der Waals contacts played a more prevalent role than hydrogen-bond contacts, and preferential binding to guanine and uracil was observed. The positively charged residue arginine and the single aromatic residues phenylalanine and tyrosine all played key roles in the RNA binding sites. In the protein–RNA complexes, base and backbone contacts (both hydrogen bonding and van der Waals) were observed with equal frequency, whereas backbone contacts were more dominant in the protein–DNA complexes. Similar modes of secondary structure interaction have been observed in RNA- and DNA-binding proteins. Nevertheless, the current analysis emphasises the differences that exist between the two types of nucleic acid-binding protein at the atomic contact level.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417