Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
Wu X, Zhu L, Guo J, Zhang DY, Lin K. Nucleic Acids Research, 2006, 34(7): 2137-2150
A map of protein–protein interactions provides valuable insight into the cellular function and machinery of a proteome. By measuring the similarity between two Gene Ontology (GO) terms with a relative specificity semantic relation, we propose a new method of reconstructing a yeast protein–protein interaction map that is based solely on GO annotations. The effectiveness of the method was validated using high-quality interaction datasets. Based on a Z-score analysis, a positive dataset and a negative dataset for protein–protein interactions were derived. Moreover, a gold standard positive (GSP) dataset with the highest level of confidence, which covered 78% of the high-quality interaction dataset, and a gold standard negative (GSN) dataset with the lowest level of confidence were derived. In addition, we assessed four high-throughput experimental interaction datasets using the positives and the negatives as well as the GSPs and GSNs. Our predicted network reconstructed from the GSPs consists of 40753 interactions among 2259 proteins and forms 16 connected components. We mapped all of the MIPS complexes except for homodimers onto the predicted network. As a result, ~35% of the complexes were found to be interconnected. For seven complexes, we also identified some nonmember proteins that may be functionally related to the complexes concerned. This analysis is expected to provide a new approach for predicting protein–protein interaction maps from other completely sequenced genomes with high-quality GO-based annotations.
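To make the idea of scoring protein pairs from GO annotations concrete, here is a minimal sketch. It does not implement the relative specificity measure of the abstract above; as a labeled stand-in it uses the Jaccard overlap of ancestor sets. The toy ontology, the term IDs, and the function names are all hypothetical.

```python
# Toy GO fragment: child term -> set of parent terms (hypothetical IDs).
PARENTS = {
    "GO:root": set(),
    "GO:A": {"GO:root"},
    "GO:B": {"GO:root"},
    "GO:A1": {"GO:A"},
    "GO:A2": {"GO:A"},
}


def ancestors(term):
    """Return the term itself plus every term reachable via parent links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def term_similarity(t1, t2):
    """Jaccard overlap of ancestor sets (a stand-in, not the paper's measure)."""
    a1, a2 = ancestors(t1), ancestors(t2)
    return len(a1 & a2) / len(a1 | a2)


def protein_similarity(terms1, terms2):
    """Score a protein pair by its best-matching pair of annotated terms."""
    return max(term_similarity(a, b) for a in terms1 for b in terms2)
```

Under this stand-in, two sibling terms such as `GO:A1` and `GO:A2` score higher than terms that share only the root, which is the qualitative behaviour any specificity-aware measure should have.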

2.
MOTIVATION: Protein-protein interaction, mediated by protein interaction sites, is intrinsic to many functional processes in the cell. In this paper, we propose a novel method to discover patterns in protein interaction sites. We observed that protein interaction networks contain a kind of significant substructure called interacting protein group pairs, which exhibit an all-versus-all interaction between the two protein sets in such a pair. The full interaction between the pair indicates a common interaction mechanism shared by the proteins in the pair, which can be referred to as an interaction type. Motif pairs at the interaction sites of the protein group pairs can be used to represent such an interaction type, with each motif derived from the sequences of a protein group by standard motif discovery algorithms. The systematic discovery of all pairs of interacting protein groups from large protein interaction networks is a computationally challenging problem. By a careful problem transformation, the problem is solved using efficient algorithms for mining frequent patterns, a problem extensively studied in data mining. RESULTS: We found 5349 pairs of interacting protein groups in a yeast interaction dataset. The expected value of sequence identity within the groups is only 7.48%, indicating non-homology within these protein groups. We derived 5343 motif pairs from these group pairs, represented in the form of blocks. Comparing our motifs with domains in the BLOCKS and PRINTS databases, we found that our blocks could be mapped to an average of 3.08 correlated blocks in these two databases. The mapped blocks occur in 4221 of the 6794 domains (protein groups) in these two databases. Comparing our motif pairs with iPfam, which consists of 3045 interacting domain pairs derived from PDB, we found 47 matches occurring in 105 distinct PDB complexes. Comparing with another putative domain interaction database, InterDom, we found 203 matches.
AVAILABILITY: http://research.i2r.a-star.edu.sg/BindingMotifPairs/resources. SUPPLEMENTARY INFORMATION: http://research.i2r.a-star.edu.sg/BindingMotifPairs and Bioinformatics online.
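The all-versus-all group pairs described in the abstract above can be illustrated with a small brute-force sketch. It is not the frequent-pattern mining algorithm of the paper: it enumerates every seed group, so it is exponential and only suitable for toy networks. The network layout (a symmetric dict of partner sets) and the function names are assumptions made for illustration.

```python
from itertools import combinations


def common_partners(network, group):
    """Proteins that interact with every member of `group` (group must be non-empty)."""
    return set.intersection(*[network[p] for p in group])


def group_pairs(network, min_size=2):
    """Enumerate closed all-versus-all group pairs by brute force.

    `network` maps each protein to the set of its partners and must be
    symmetric. Each result is an unordered pair of protein groups in which
    every member of one group interacts with every member of the other.
    """
    found = set()
    proteins = sorted(network)
    for k in range(min_size, len(proteins) + 1):
        for seed in combinations(proteins, k):
            right = common_partners(network, seed)
            if len(right) < min_size:
                continue
            # Closure step: grow the seed to all proteins that share `right`.
            left = common_partners(network, right)
            found.add(frozenset([frozenset(left), frozenset(right)]))
    return found
```

The closure step is what ties the problem to closed-pattern mining: a group pair is reported only in its maximal form, so different seeds that lead to the same pair collapse into one result.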

3.
Identifying protein–protein interactions (PPIs) is critical for understanding the cellular function of proteins and the machinery of a proteome. PPI data derived from high-throughput technologies are often incomplete and noisy. Therefore, it is important to develop computational methods and high-quality interaction datasets for predicting PPIs. A sequence-based method is proposed that combines correlation coefficient (CC) transformation with a support vector machine (SVM). CC transformation not only adequately considers the neighboring effect of the protein sequence but also describes the level of CC between two protein sequences. A gold standard positives (interacting) dataset, MIPS Core, and a gold standard negatives (non-interacting) dataset, GO-NEG, of the yeast Saccharomyces cerevisiae were mined to objectively evaluate the above method and attenuate bias. The SVM model combined with CC transformation yielded the best performance, with an accuracy of 87.94% on the gold standard positives and gold standard negatives datasets. The MATLAB source code and the datasets are available on request from smgsmg@mail.ustc.edu.cn.

4.
MOTIVATION: Discovery of binding sites is important in the study of protein-protein interactions. In this paper, we introduce stable and significant motif pairs to model protein-binding sites. Stability is a pattern's resistance to some transformation. Significance is the unexpected frequency of occurrence of the pattern in a sequence dataset comprising known interacting protein pairs. Discovery of stable motif pairs is an iterative process that passes through a chain of changing but converging patterns. Determining the starting point for such a chain is an interesting problem. We use a protein complex dataset extracted from the Protein Data Bank to help identify those starting points, so that the computational complexity of the problem is greatly reduced. RESULTS: We found 913 stable motif pairs, of which 765 are significant. We evaluated these motif pairs by comprehensive comparison against random patterns. Experimentally discovered motifs reported in the literature were also used to confirm the effectiveness of our method. SUPPLEMENTARY INFORMATION: http://sdmc.i2r.a-star.edu.sg/BindingMotifPairs.

5.
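This study examines how protein-protein interactions in yeast relate to gene expression profiles and protein subcellular localization. First, four datasets were constructed: a positive set of interacting protein pairs, a negative set, a negative set of randomly paired proteins, and a mixed set. Then, for all protein pairs in the four datasets, the distributions of distance-based gene co-expression were compared, together with the co-localization rates among those pairs whose subcellular localizations are known, giving a quantitative cross-analysis of these high-throughput data. The results show that, compared with non-interacting protein pairs, interacting protein pairs have more similar gene expression profiles and are more likely to share the same subcellular localization. The results also reveal the overall trends in how these protein features correlate.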
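The two quantities compared in the abstract above can be sketched as follows. The abstract does not say which distance was used for co-expression, so the Pearson correlation distance here is an assumption, as are the function names and the data layout (expression profiles as equal-length lists, localization as a dict of compartment sets).

```python
from math import sqrt


def pearson_distance(x, y):
    """1 - Pearson correlation of two expression profiles.

    Assumes equal-length, non-constant profiles (a constant profile has
    zero variance and would cause a division by zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def colocalization_rate(pairs, localization):
    """Fraction of pairs sharing a compartment, among pairs with known localization."""
    known = [(p, q) for p, q in pairs if p in localization and q in localization]
    if not known:
        return 0.0
    shared = sum(1 for p, q in known if localization[p] & localization[q])
    return shared / len(known)
```

Computing both quantities separately for the positive, negative, random, and mixed sets and then comparing the resulting distributions is the kind of cross-analysis the abstract describes.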

6.
Interactions among proteins are fundamental for life, and determining whether two particular proteins physically interact can be essential for fully understanding a protein’s function. We present Caenorhabditis elegans light-induced coclustering (CeLINC), an optical binary protein–protein interaction assay to determine whether two proteins interact in vivo. Based on CRY2/CIB1 light-dependent oligomerization, CeLINC can rapidly and unambiguously identify protein–protein interactions between pairs of fluorescently tagged proteins. A fluorescently tagged bait protein is captured using a nanobody directed against the fluorescent protein (GFP or mCherry) and brought into artificial clusters within the cell. Colocalization of a fluorescently tagged prey protein in the cluster indicates a protein interaction. We tested the system with an array of positive and negative reference protein pairs. Assay performance was extremely robust, with no false positives detected in the negative reference pairs. We then used the system to test for interactions among apical and basolateral polarity regulators. We confirmed interactions seen between PAR-6, PKC-3, and PAR-3, but observed no physical interactions among the basolateral Scribble module proteins LET-413, DLG-1, and LGL-1. We have generated a plasmid toolkit that allows use of custom promoters or CRY2 variants to promote flexibility of the system. The CeLINC assay is a powerful and rapid technique that can be widely applied in C. elegans because its universal plasmids can be used with existing fluorescently tagged strains without the need for additional cloning or genetic modification of the genome.

7.
We demonstrate that protein–protein interaction networks in several eukaryotic organisms contain significantly more self-interacting proteins than would be expected if such homodimers had appeared randomly in the course of evolution. We also show that on average homodimers have twice as many interaction partners as non-self-interacting proteins. More specifically, the likelihood of a protein physically interacting with itself was found to be proportional to the total number of its binding partners. These properties of dimers are in agreement with a phenomenological model in which individual proteins differ from each other by the degree of their ‘stickiness’, or general propensity toward interaction with other proteins, including themselves. A duplication of a self-interacting protein creates a pair of paralogous proteins that interact with each other. We show that such pairs occur more frequently than could be explained by chance alone. Like homodimers, proteins involved in heterodimers with their paralogs have on average twice as many interacting partners as the rest of the network. The likelihood of a pair of paralogous proteins interacting with each other was also shown to decrease with their sequence similarity. This points to the conclusion that most interactions between paralogs are inherited from ancestral homodimeric proteins rather than established de novo after duplication. We finally discuss possible implications of our empirical observations from functional and evolutionary standpoints.

8.

Background  

Information about protein interaction networks is fundamental to understanding protein function and cellular processes. Interaction patterns among proteins can suggest new drug targets and aid in the design of new therapeutic interventions. Efforts have been made to map interactions on a proteome-wide scale using both experimental and computational techniques. Reference datasets that contain known interacting proteins (positive cases) and non-interacting proteins (negative cases) are essential to support computational prediction and validation of protein-protein interactions. Information on known interacting and non-interacting proteins is usually stored within databases. Extraction of these data can be both complex and time consuming. Although the automatic construction of reference datasets for classification would be a useful resource for researchers, no public resource currently exists to perform this task.

9.

Background  

Elucidating protein-protein interactions (PPIs) is essential to constructing protein interaction networks and facilitating our understanding of the general principles of biological systems. Previous studies have revealed that interacting protein pairs can be predicted from their primary structure. Most of these approaches have achieved satisfactory performance on datasets comprising equal numbers of interacting and non-interacting protein pairs. However, this ratio is highly unbalanced in nature, and these techniques have not been comprehensively evaluated with respect to the effect of the large number of non-interacting pairs in realistic datasets. Moreover, since highly unbalanced distributions usually lead to large datasets, more efficient predictors are desired when handling such challenging tasks.
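A short worked calculation shows why the class ratio mentioned above matters. The sketch below derives precision from sensitivity, specificity, and the fraction of truly interacting pairs; the function name and the example figures (90% sensitivity and specificity, a 1:100 ratio) are illustrative assumptions, not values from the abstract.

```python
def precision(sensitivity, specificity, positive_fraction):
    """Expected precision of a classifier at a given class balance."""
    true_pos = sensitivity * positive_fraction
    false_pos = (1.0 - specificity) * (1.0 - positive_fraction)
    return true_pos / (true_pos + false_pos)


# Same classifier, two class balances (hypothetical numbers).
balanced = precision(0.9, 0.9, 0.5)            # 1:1 positives to negatives
unbalanced = precision(0.9, 0.9, 1.0 / 101.0)  # 1:100 positives to negatives
```

With these assumed figures the precision falls from 0.9 on the balanced set to roughly 0.08 at a 1:100 ratio, so a predictor that looks satisfactory on balanced data can return mostly false positives on realistic data.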

10.
Cell-cell interactions are vital for numerous biological processes including development, differentiation, and response to inflammation. Currently, most methods for studying interactions at the scRNA-seq level are based on curated databases of ligands and receptors. While those methods are useful, they are limited to our current biological knowledge. Recent advances in single-cell protocols have allowed physically interacting cells to be captured, and as such we have the potential to study interactions in a complementary way without relying on prior knowledge. We introduce a new method based on Latent Dirichlet Allocation (LDA) for detecting genes that change as a result of interaction. We apply our method to synthetic datasets to demonstrate its ability to detect genes that change in an interacting population compared to a reference population. Next, we apply our approach to two datasets of physically interacting cells to identify the genes that change as a result of interaction; examples include adhesion and co-stimulatory molecules, which confirm physical interaction between cells. For each dataset we produce a ranking of genes that are changing in subpopulations of the interacting cells. In addition to the genes discussed in the original publications, we highlight further candidates for interaction in the top 100 and 300 ranked genes. Lastly, we apply our method to a dataset generated by a standard droplet-based protocol not designed to capture interacting cells, and discuss its suitability for analysing interactions. We present a method that streamlines detection of interactions and does not require prior clustering and generation of synthetic reference profiles to detect changes in expression.

11.
Directed evolution methodologies benefit from read-outs quantitatively linking genotype to phenotype. We therefore devised a method that couples protein–peptide interactions to the dynamic read-out provided by an engineered DNA polymerase. Fusion of a processivity clamp protein to a thermostable nucleic acid polymerase enables polymerase activity and DNA amplification in otherwise prohibitive high-salt buffers. Here, we recapitulate this phenotype by indirectly coupling the Sso7d processivity clamp to Taq DNA polymerase via respective fusion to a high affinity and thermostable interacting protein–peptide pair. Escherichia coli cells co-expressing protein–peptide pairs can directly be used in polymerase chain reactions to determine relative interaction strengths by the measurement of amplicon yields. Conditional polymerase activity is further used to link genotype to phenotype of interacting protein–peptide pairs co-expressed in E. coli using the compartmentalized self-replication directed evolution platform. We validate this approach, termed compartmentalized two-hybrid replication, by selecting for high-affinity peptides that bind two model protein partners: SpyCatcher and the large fragment of NanoLuc luciferase. We further demonstrate directed co-evolution by randomizing both protein and peptide components of the SpyCatcher–SpyTag pair and co-selecting for functionally interacting variants.

12.

Introduction

To decipher the interaction between the molecular subtype classification and the probability of a non-sentinel node metastasis in breast cancer patients with a metastatic sentinel lymph-node, we applied two validated predictors (Tenon Score and MSKCC Nomogram) on two large independent datasets.

Materials and Methods

Our datasets consisted of 656 and 574 early-stage breast cancer patients with a metastatic sentinel lymph-node biopsy treated first by surgery. We applied both predictors to the whole dataset and to each molecular immune-phenotype subgroup. The performance of the two predictors was analyzed in terms of discrimination and calibration. The probability of non-sentinel lymph node metastasis was detailed for each molecular subtype.

Results

Similar results were obtained with both predictors. We showed that the performance in terms of discrimination was as expected in the ER-positive HER2-negative subgroup in both datasets (MSKCC AUC Dataset 1 = 0.73 (0.69–0.78), MSKCC AUC Dataset 2 = 0.71 (0.65–0.76), Tenon Score AUC Dataset 1 = 0.7 (0.65–0.75), Tenon Score AUC Dataset 2 = 0.72 (0.66–0.76)). The probability of non-sentinel node metastatic involvement was slightly underestimated. Contradictory results were obtained in the other subgroups (ER-negative HER2-negative and HER2-positive) in both datasets, probably because of small sample sizes. We showed that merging the two datasets shifted the performance close to that of the ER-positive HER2-negative subgroup.

Discussion

We showed that validated predictors like the Tenon Score or the MSKCC nomogram, built on heterogeneous breast cancer populations, performed equally on the different subgroups analyzed. Our present study reinforces the idea that performing subgroup analysis of such predictors within subgroups of fewer than 200 samples carries a major risk of misleading conclusions.
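The discrimination figures reported in entry 12 are areas under the ROC curve (AUC). As a reminder of what that number measures, the sketch below computes it as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, counting ties as one half. The function name and the toy scores are hypothetical; the pairwise loop is quadratic and is meant for illustration only.

```python
def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    `labels` holds 1 for positive cases and 0 for negative cases, and
    both classes must be present.
    """
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Because the estimate rests on the number of positive/negative pairs, a subgroup with few patients yields few pairs and a wide confidence interval, which is consistent with the small-sample caution in the discussion above.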

13.
MOTIVATION: Much research has been devoted to the characterization of interaction interfaces found in complexes with known structure. In this context, the interactions of non-homologous domains at equivalent binding sites are of particular interest, as they can reveal convergently evolved interface motifs. Such motifs are an important source of information for formulating rules of interaction specificity and for designing ligands based on the common features shared among diverse partners. RESULTS: We develop a novel method to identify non-homologous structural domains that bind at equivalent sites when interacting with a common partner. We systematically apply this method to all pairs of interactions with known structure and derive a comprehensive database of these interactions. Of all non-homologous domains that bind a common interaction partner, 4.2% use the same interface of the common interaction partner (excluding immunoglobulins and proteases). This rises to 16% if immunoglobulins and proteases are included. We demonstrate two applications of our database: first, the systematic screening for viral protein interfaces, which can mimic native interfaces and thus interfere; and second, structural motifs in enzymes and their inhibitors. We highlight several cases of viral protein mimicry. The viral M3 protein interferes with a chemokine dimer interface: the virus has evolved the motif SVSPLP, which mimics the native SSDTTP motif. A second example is the regulatory factor Nef in HIV, which can mimic a kinase when interacting with SH3; among other features, the virus has evolved the kinase's PxxP motif. Further, we elucidate motif resemblances in Baculovirus p35 and HIV capsid proteins. Finally, we examine chymotrypsin with respect to its structural similarity to subtilisin and the similar recognition sites of its inhibitors. SUPPLEMENTARY INFORMATION: A database is online at scoppi.biotec.tu-dresden.de/abac/.

14.
MOTIVATION: Protein-protein interactions have proved to be a valuable starting point for understanding the inner workings of the cell. Computational methodologies have been built which both predict interactions and use interaction datasets in order to predict other protein features. Such methods require gold standard positive (GSP) and negative (GSN) interaction sets. Here we examine and demonstrate the usefulness of homologous interactions in predicting good quality positive and negative interaction datasets. RESULTS: We generate GSP interaction sets as subsets from experimental data using only interaction and sequence information. We can therefore produce sets for several species (many of which at present have no identified GSPs). Comprehensive error rate testing demonstrates the power of the method. We also show how the use of our datasets significantly improves the predictive power of algorithms for interaction prediction and function prediction. Furthermore, we generate GSN interaction sets for yeast and examine the use of homology along with other protein properties such as localization, expression and function. Using a novel method to assess the accuracy of a negative interaction set, we find that the best single selector for negative interactions is a lack of co-function. However, an integrated method using all the characteristics shows significant improvement over any current method for identifying GSN interactions. The nature of homologous interactions is also examined and we demonstrate that interologs are found more commonly within species than across species. CONCLUSION: GSP sets built using our homologous verification method are demonstrably better than standard sets in terms of predictive ability. We can build such GSP sets for several species. When generating GSNs we show a combination of protein features and lack of homologous interactions gives the highest quality interaction sets. 
AVAILABILITY: GSP and GSN datasets for all the studied species can be downloaded from http://www.stats.ox.ac.uk/~deane/HPIV.

15.
Zhou Y, Zhou YS, He F, Song J, Zhang Z. Molecular BioSystems, 2012, 8(5): 1396-1404
Deciphering functional interactions between proteins is one of the great challenges in biology. Sequence-based, homology-free encoding schemes have been increasingly applied to develop promising protein-protein interaction (PPI) predictors by means of statistical or machine learning methods. Here we analyze the relationship between codon pair usage and PPIs in yeast. We show that the codon pair usage of interacting protein pairs differs significantly from random expectation. This motivates the development of a novel approach for predicting PPIs, termed CCPPI, which uses codon pair frequency difference as input to a Support Vector Machine predictor. 10-fold cross-validation tests based on yeast PPI datasets with balanced positive-to-negative ratios indicate that CCPPI performs better than other sequence-based encoding schemes. Moreover, it ranks best when tested on an unbalanced large-scale dataset. Although CCPPI is subject to high false positive rates like many PPI predictors, statistical analyses of the predicted true positives confirm that the success of CCPPI is partly ascribed to its capability to capture proteomic co-expression and functional similarities between interacting protein pairs. Our findings suggest that the codon pairs of interacting protein pairs evolve in a coordinated manner and consequently provide additional information beyond amino acid-based encoding schemes. CCPPI has been made freely available at: http://protein.cau.edu.cn/ccppi.
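The codon pair frequency difference named in the abstract above can be sketched as follows. The abstract does not give the exact encoding, so the choices here are assumptions: codon pairs are taken as adjacent in-frame codons, frequencies are normalized per gene, and the difference is the absolute value per codon pair. The function names and example sequences are hypothetical.

```python
from collections import Counter


def codon_pair_frequencies(cds):
    """Relative frequencies of adjacent in-frame codon pairs in a coding sequence."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    pairs = Counter(a + b for a, b in zip(codons, codons[1:]))
    total = sum(pairs.values())
    if total == 0:
        return {}
    return {pair: count / total for pair, count in pairs.items()}


def frequency_difference(cds1, cds2):
    """Absolute per-pair frequency difference between two coding sequences."""
    f1 = codon_pair_frequencies(cds1)
    f2 = codon_pair_frequencies(cds2)
    return {
        pair: abs(f1.get(pair, 0.0) - f2.get(pair, 0.0))
        for pair in sorted(set(f1) | set(f2))
    }
```

For use as classifier input, the dictionary would need to be laid out as a fixed-length vector over one agreed ordering of all possible codon pairs, so that every protein pair yields features in the same positions.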

16.
Target identification is essential for drug design, drug-drug interaction prediction, dosage adjustment and side effect anticipation. Specifically, knowledge of structural details is essential for understanding the mode of action of a compound on a target protein. Here, we present nAnnoLyze, a method for target identification that relies on the hypothesis that structurally similar binding sites bind similar ligands. nAnnoLyze integrates structural information into a bipartite network of interactions and similarities to predict structurally detailed compound-protein interactions at proteome scale. The method was benchmarked on a dataset of 6,282 known interacting ligand-target pairs, reaching an area under the Receiver Operating Characteristic curve (AUC) of 0.96 when using the drug names as an input feature for the classifier, and an AUC of 0.70 for “anonymous” compounds or compounds not present in the training set. nAnnoLyze achieved higher accuracies than its predecessor, AnnoLyze. We applied the method to predict interactions for all the compounds in the DrugBank database with each human protein structure and provide examples of target identification for known drugs against human diseases. The accuracy of our method and its applicability to any compound indicate that a comparative docking approach such as nAnnoLyze enables large-scale annotation and analysis of compound–protein interactions and thus may benefit drug development.

17.
Many of the specific functions of intrinsically disordered protein segments are mediated by Short Linear Motifs (SLiMs) interacting with other proteins. Well known examples include SLiMs that interact with 14-3-3, PDZ, SH2, SH3, and WW domains, but the true extent and diversity of SLiM-mediated interactions is largely unknown. Here, we attempt to expand our knowledge of human SLiMs by applying in silico SLiM prediction to the human interactome. Combining data from seven different interaction databases, we analysed approximately 6000 protein-centred and 1600 domain-centred human interaction datasets of three or more unrelated proteins that interact with a common partner. Results were placed in context through comparison to randomised datasets of similar size and composition. The search returned thousands of evolutionarily conserved, intrinsically disordered occurrences of hundreds of significantly enriched recurring motifs, including many that have never been previously identified. In addition to True Positive results for at least 25 different known SLiMs, a striking number of "off-target" proteins/domains also returned significantly enriched known motifs. Often, this was due to the non-independence of the datasets, with many proteins sharing interaction partners or contributing interactions to multiple domain datasets. The majority of these motif classes, however, were also found to be significantly enriched in one or more randomised datasets. This highlights the need for care when interpreting motif predictions of this nature but also raises the possibility that SLiM occurrences may be successfully identified independently of interaction data. Although not as compositionally biased as previous studies, patterns matching known SLiMs tended to cluster into a few large groups of similar sequence, while novel predictions tended to be more distinctive and less abundant. Whether this is due to ascertainment bias or a true functional composition bias of SLiMs is not clear and warrants further investigation.

18.
The wealth of interaction information provided in biomedical articles has motivated the implementation of text mining approaches to automatically extract biomedical relations. This paper presents an unsupervised method based on pattern clustering and sentence parsing for biomedical relation extraction. The pattern clustering algorithm is based on a polynomial kernel method, which identifies interaction words from unlabeled data; these interaction words are then used in relation extraction between entity pairs. Dependency parsing and phrase structure parsing are combined for relation extraction. Based on the semi-supervised KNN algorithm, we extend the proposed unsupervised approach to a semi-supervised approach by combining pattern clustering, dependency parsing and phrase structure parsing rules. We evaluated the approaches on two different tasks: (1) protein–protein interaction extraction, and (2) gene–suicide association extraction. The evaluation of task (1) on the benchmark dataset (AImed corpus) showed that our proposed unsupervised approach outperformed three supervised methods, which are rule based, SVM based, and kernel based, respectively. The proposed semi-supervised approach is superior to the existing semi-supervised methods. The evaluation of gene–suicide association extraction on a smaller dataset from the Genetic Association Database and a larger dataset from publicly available PubMed showed that the proposed unsupervised and semi-supervised methods achieved much higher F-scores than a co-occurrence based method.

19.
G·U wobble base pairs are the most common and highly conserved non-Watson–Crick base pairs in RNA. Previous surface maps imply uniformly negative electrostatic potential at the major groove of G·U wobble base pairs embedded in RNA helices, suitable for entrapment of cationic ligands. In this work, we have used a Poisson–Boltzmann approach to gain a more detailed and accurate characterization of the electrostatic profile. We found that the major groove edge of an isolated G·U wobble displays distinctly enhanced negativity compared with standard GC or AU base pairs; however, in the context of different helical motifs, the electrostatic pattern varies. G·U wobbles with distinct widening have similar major groove electrostatic potentials to their canonical counterparts, whereas those with minimal widening exhibit significantly enhanced electronegativity, ranging from 0.8 to 2.5 kT/e, depending upon structural features. We propose that the negativity at the major groove of G·U wobble base pairs is determined by the combined effect of the base atoms and the sugar-phosphate backbone, which is impacted by stacking pattern and groove width as a result of base sequence. These findings are significant in that they provide predictive power with respect to which G·U sites in RNA are most likely to bind cationic ligands.

20.
In this work, we analyse the potential for using structural knowledge to improve the detection of the DNA-binding helix–turn–helix (HTH) motif from sequence. Starting from a set of DNA-binding protein structures that include a functional HTH motif and have no apparent sequence similarity to each other, two different libraries of hidden Markov models (HMMs) were built. One library included sequence models of whole DNA-binding domains, which incorporate the HTH motif; the second library included shorter models of ‘partial’ domains, representing only the fraction of the domain that corresponds to the functionally relevant HTH motif itself. The libraries were scanned against a dataset of protein sequences, some containing HTH motifs, others not. HMM predictions were compared with the results obtained from a previously published structure-based method and subsequently combined with it. The combined method proved more effective than either of the single-feature approaches, showing that the information carried by motif sequences and motif structures is to some extent complementary and can successfully be used together for the detection of DNA-binding HTHs in proteins of unknown function.
