Similar articles
 20 similar articles found
1.
Large-scale protein-protein interaction data sets have been generated for several species including yeast and human and have enabled the identification, quantification, and prediction of cellular molecular networks. Affinity purification-mass spectrometry (AP-MS) is the preeminent methodology for large-scale analysis of protein complexes, performed by immunopurifying a specific "bait" protein and its associated "prey" proteins. The analysis and interpretation of AP-MS data sets is, however, not straightforward. In addition, although yeast AP-MS data sets are relatively comprehensive, current human AP-MS data sets only sparsely cover the human interactome. Here we develop a framework for analysis of AP-MS data sets that addresses the issues of noise, missing data, and sparsity of coverage in the context of a current, real world human AP-MS data set. Our goal is to extend and increase the density of the known human interactome by integrating bait-prey and cocomplexed preys (prey-prey associations) into networks. Our framework incorporates a score for each identified protein, as well as elements of signal processing to improve the confidence of identified protein-protein interactions. We identify many protein networks enriched in known biological processes and functions. In addition, we show that integrated bait-prey and prey-prey interactions can be used to refine network topology and extend known protein networks.  相似文献   

2.
Choi H. Proteomics, 2012, 12(10): 1663-1668
Protein complex identification is an important goal of protein-protein interaction analysis. To date, development of computational methods for detecting protein complexes has been largely motivated by genome-scale interaction data sets from high-throughput assays such as yeast two-hybrid or tandem affinity purification coupled with mass spectrometry (TAP-MS). However, due to the popularity of small to intermediate-scale affinity purification-mass spectrometry (AP-MS) experiments, protein complex detection is increasingly discussed in local network analysis. In such data sets, protein complexes cannot be detected using binary interaction data alone because the data contain interactions with tagged proteins only and, as a result, interactions between all other proteins remain unobserved, limiting the scope of existing algorithms. In this article, we provide a pragmatic review of network graph-based computational algorithms for protein complex analysis in global interactome data, without requiring any computational background. We discuss the practical gap in applying these algorithms to recently surging small to intermediate-scale AP-MS data sets, and review alternative clustering algorithms using quantitative proteomics data and their limitations.  相似文献   

3.
SUMMARY: We have developed several new navigation features for a Java graph applet previously released for visualizing protein-protein interactions. This graph viewer can be used to navigate any molecular interactome dataset. We have successfully implemented this tool for exploring protein networks stored in the Bioverse interaction database. AVAILABILITY: http://bioverse.compbio.washington.edu/viewer CONTACT: ram@compbio.washington.edu.

4.
Heat shock protein 70 (Hsp70) is an evolutionarily well-conserved molecular chaperone involved in several cellular processes such as folding of proteins, modulating protein-protein interactions, and transport of proteins across the membrane. Binding partners of Hsp70 (known as “clients”) are identified on an individual basis, as researchers discover that their particular protein of interest binds to Hsp70. A full complement of Hsp70 interactors under multiple stress conditions remains to be determined. A promising approach to characterizing the Hsp70 “interactome” is the use of protein epitope tagging and then affinity purification followed by mass spectrometry (AP-MS/MS). AP-MS analysis is a widely used method for deciphering protein-protein interaction networks and identifying protein functions. Conventionally, the proteins are overexpressed ectopically, which interferes with protein complex stoichiometry and skews AP-MS/MS data. To address this issue, we used CRISPR/Cas9-mediated gene editing to integrate a tandem-affinity (TAP) epitope tag into the genomic locus of HSC70. This system offers several benefits over existing expression systems, including native expression, no requirement for selection, and homogeneity between cells. This cell line, freely available to chaperone researchers, will aid in small and large-scale protein interaction studies as well as the study of biochemical activities and structure-function relationships of the Hsc70 protein.

5.
ABSTRACT: BACKGROUND: Affinity-Purification Mass-Spectrometry (AP-MS) provides a powerful means of identifying protein complexes and interactions. Several important challenges exist in interpreting the results of AP-MS experiments. First, the reproducibility of AP-MS experimental replicates can be low, due both to technical variability and the dynamic nature of protein interactions in the cell. Second, the identification of true protein-protein interactions in AP-MS experiments is subject to inaccuracy due to high false negative and false positive rates. Several experimental approaches can be used to mitigate these drawbacks, including the use of replicated and control experiments and relative quantification to sensitively distinguish true interacting proteins from false ones. RESULTS: To address the issues of reproducibility and accuracy of protein-protein interactions, we introduce a two-step method, called ROCS, which makes use of Indicator Proteins to select reproducible AP-MS experiments, and of Confidence Scores to select specific protein-protein interactions. The Indicator Proteins account for measures of protein identification as well as protein reproducibility, effectively allowing removal of outlier experiments that contribute noise and affect downstream inferences. The filtered set of experiments is then used in the Protein-Protein Interaction (PPI) scoring step. Prey protein scoring is done by computing a Confidence Score, which accounts for the probability of occurrence of prey proteins in the bait experiments relative to the control experiment, where the significance cutoff parameter is estimated by simultaneously controlling false positives and false negatives against metrics of false discovery rate and biological coherence respectively. In summary, the ROCS method relies on automatic objective criteria for parameter estimation and error-controlled procedures. We illustrate the performance of our method by applying it to five previously published AP-MS experiments, each containing well characterized protein interactions, allowing for systematic benchmarking of ROCS. We show that our method may be used on its own to make accurate identification of specific, biologically relevant protein-protein interactions or in combination with other AP-MS scoring methods to significantly improve inferences. CONCLUSIONS: Our method addresses important issues encountered in AP-MS datasets, making ROCS a very promising tool for this purpose, either on its own or especially in conjunction with other methods. We anticipate that our methodology may be used more generally in proteomics studies and databases, where experimental reproducibility issues arise. The method is implemented in the R language, and is available as an R package called "ROCS", freely available from the CRAN repository http://cran.r-project.org/.

6.
MOTIVATION: Algorithmic and modeling advances in the area of protein-protein interaction (PPI) network analysis could contribute to the understanding of biological processes. Local structure of networks can be measured by the frequency distribution of graphlets, small connected non-isomorphic induced subgraphs. This measure of local structure has been used to show that high-confidence PPI networks have local structure of geometric random graphs. Finding graphlets exhaustively in a large network is computationally intensive. More complete PPI networks, as well as PPI networks of higher organisms, will thus require efficient heuristic approaches. RESULTS: We propose two efficient and scalable heuristics for finding graphlets in high-confidence PPI networks. We show that both PPI and their model geometric random networks, have defined boundaries that are sparser than the 'inner parts' of the networks. In addition, these networks exhibit 'uniformity' of local structure inside the networks. Our first heuristic exploits these two structural properties of PPI and geometric random networks to find good estimates of graphlet frequency distributions in these networks up to 690 times faster than the exhaustive searches. Our second heuristic is a variant of a more standard sampling technique and it produces accurate approximate results up to 377 times faster than the exhaustive searches. We indicate how the combination of these approaches may result in an even better heuristic. AVAILABILITY: Supplementary information is available at http://www.cs.toronto.edu/~natasha/BIOINF-2005-0946/Supplementary.pdf. Software implementing the algorithms is available at http://www.cs.toronto.edu/~natasha/BIOINF-2005-0946/estimate_grap-hlets.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.  相似文献   

7.
Prediction of protein-protein interactions at the structural level on the proteome scale is important because it allows prediction of protein function, helps drug discovery and takes steps toward genome-wide structural systems biology. We provide a protocol (termed PRISM, protein interactions by structural matching) for large-scale prediction of protein-protein interactions and assembly of protein complex structures. The method consists of two components: rigid-body structural comparisons of target proteins to known template protein-protein interfaces and flexible refinement using a docking energy function. The PRISM rationale follows our observation that globally different protein structures can interact via similar architectural motifs. PRISM predicts binding residues by using structural similarity and evolutionary conservation of putative binding residue 'hot spots'. Ultimately, PRISM could help to construct cellular pathways and functional, proteome-scale annotation. PRISM is implemented in Python and runs in a UNIX environment. The program accepts Protein Data Bank-formatted protein structures and is available at http://prism.ccbb.ku.edu.tr/prism_protocol/.  相似文献   

8.
Improvements in experimental techniques increasingly provide structural data relating to protein-protein interactions. Classification of structural details of protein-protein interactions can provide valuable insights for modeling and abstracting design principles. Here, we aim to cluster protein-protein interactions by their interface structures, and to exploit these clusters to obtain and study shared and distinct protein binding sites. We find that there are 22604 unique interface structures in the PDB. These unique interfaces, which provide a rich resource of structural data of protein-protein interactions, can be used for template-based docking. We test the specificity of these non-redundant unique interface structures by finding protein pairs which have multiple binding sites. We suggest that residues with more than 40% relative accessible surface area should be considered as surface residues in template-based docking studies. This comprehensive study of protein interface structures can serve as a resource for the community. The dataset can be accessed at http://prism.ccbb.ku.edu.tr/piface.  相似文献   

9.
Large-scale proteomic screens are increasingly employed for placing genes into specific pathways. Therefore generic methods providing a physiological context for protein-protein interaction studies are of great interest. In recent years many protein-protein interactions have been determined by affinity purification followed by mass spectrometry (AP-MS). Among many different AP-MS approaches, the recently developed Quantitative BAC InteraCtomics (QUBIC) approach is particularly attractive as it uses tagged, full-length baits that are expressed under endogenous control. For QUBIC large cell line collections expressing tagged proteins from BAC transgenes or gene trap loci have been developed and are freely available. Here we describe detailed workflows on how to obtain specific protein binding partners with high confidence under physiological conditions. The methods are based on fast, streamlined and generic purification procedures followed by single run liquid chromatography-mass spectrometric analysis. Quantification is achieved either by the stable isotope labeling of amino acids in cell culture (SILAC) method or by a 'label-free' procedure. In either case data analysis is performed by using the freely available MaxQuant environment. The QUBIC approach enables biologists with access to high resolution mass spectrometry to perform small and large-scale protein interactome mappings.  相似文献   

10.
We present 'significance analysis of interactome' (SAINT), a computational tool that assigns confidence scores to protein-protein interaction data generated using affinity purification-mass spectrometry (AP-MS). The method uses label-free quantitative data and constructs separate distributions for true and false interactions to derive the probability of a bona fide protein-protein interaction. We show that SAINT is applicable to data of different scales and protein connectivity and allows transparent analysis of AP-MS data.
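
To illustrate the idea of deriving an interaction probability from separate distributions for true and false interactions, the following minimal sketch applies Bayes' rule to a two-component mixture. The Poisson form of the two distributions, the rate values, the prior, and all function names are assumptions made for this example; they are not taken from the abstract, and this is not the SAINT implementation.

```python
import math


def poisson_pmf(count, rate):
    """Probability of observing `count` events under a Poisson(rate) model."""
    return math.exp(-rate) * rate ** count / math.factorial(count)


def interaction_probability(count, true_rate, false_rate, prior_true):
    """Posterior probability that a bait-prey pair is a true interaction.

    Assumption for this sketch: the observed quantity (e.g. a spectral count)
    follows Poisson(true_rate) for true interactions and Poisson(false_rate)
    for false ones; `prior_true` is the assumed fraction of true interactions.
    """
    p_true = prior_true * poisson_pmf(count, true_rate)
    p_false = (1.0 - prior_true) * poisson_pmf(count, false_rate)
    return p_true / (p_true + p_false)


if __name__ == "__main__":
    # Hypothetical rates: true interactors average 10 counts, background 1.
    for observed in (1, 4, 10):
        print(observed, interaction_probability(observed, 10.0, 1.0, 0.5))
```

With these assumed rates, a high observed count is attributed almost entirely to the "true" component, while a count near the background rate receives a probability close to zero.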

11.
Understanding complex networks of protein-protein interactions (PPIs) is one of the foremost challenges of the post-genomic era. Due to the recent advances in experimental bio-technology, including yeast-2-hybrid (Y2H), tandem affinity purification (TAP) and other high-throughput methods for protein-protein interaction (PPI) detection, huge amounts of PPI network data are becoming available. Of major concern, however, are the levels of noise and incompleteness. For example, for Y2H screens, it is thought that the false positive rate could be as high as 64%, and the false negative rate may range from 43% to 71%. TAP experiments are believed to have comparable levels of noise. We present a novel technique to assess the confidence levels of interactions in PPI networks obtained from experimental studies. We use it for predicting new interactions and thus for guiding future biological experiments. This technique is the first to utilize geometric graphs, currently the best-fitting network model for PPI networks. Our approach achieves specificity of 85% and sensitivity of 90%. We use it to assign confidence scores to physical protein-protein interactions in the human PPI network downloaded from BioGRID. Using our approach, we predict 251 interactions in the human PPI network, a statistically significant fraction of which correspond to protein pairs sharing common GO terms. Moreover, we validate a statistically significant portion of our predicted interactions in the HPRD database and the newer release of BioGRID. The data and Matlab code implementing the methods are freely available from the web site: http://www.kuchaev.com/Denoising.

12.
13.
With the increasing availability of diverse biological information for proteins, integration of heterogeneous data becomes more useful for many problems in proteomics, such as annotating protein functions, predicting novel protein–protein interactions and so on. In this paper, we present an integrative approach called InteHC (Integrative Hierarchical Clustering) to identify protein complexes from multiple data sources. Although integrating multiple sources could effectively improve the coverage of the current, insufficient protein interactome (the false negative issue), it could also introduce potential false‐positive interactions that could hurt the performance of protein complex prediction. Our proposed InteHC method can effectively address these issues to facilitate accurate protein complex prediction, and it can be summarized in the following three steps. First, for each individual source/feature, InteHC computes a matrix of affinity scores for protein pairs, indicating their propensity to interact or to belong to the same complex. Second, InteHC computes a final score matrix, which is the weighted sum of affinity scores from individual sources. In particular, the weights indicating the reliability of individual sources are learned from a supervised model (i.e., a linear ranking SVM). Finally, a hierarchical clustering algorithm is applied to the final score matrix to generate clusters as predicted protein complexes. In our experiments, we compared the results collected by our hierarchical clustering on each individual feature with those predicted by InteHC on the combined matrix. We observed that integration of heterogeneous data significantly benefits the identification of protein complexes. Moreover, a comprehensive comparison demonstrates that InteHC performs much better than 14 state‐of‐the‐art approaches. All the experimental data and results can be downloaded from http://www.ntu.edu.sg/home/zhengjie/data/InteHC. Proteins 2013; 81:2023–2033.
© 2013 Wiley Periodicals, Inc.
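
The second and third steps described in this abstract (a weighted sum of per-source affinity matrices, followed by hierarchical clustering) can be sketched roughly as follows. This is only an illustration under stated assumptions: the weights are supplied by hand rather than learned with a ranking SVM, average linkage with a fixed score threshold is an assumed choice, and the function names and toy matrices are hypothetical, not part of InteHC.

```python
from itertools import combinations


def combine_scores(matrices, weights):
    """Weighted sum of per-source affinity matrices (square lists of lists)."""
    n = len(matrices[0])
    return [[sum(w * m[i][j] for w, m in zip(weights, matrices))
             for j in range(n)] for i in range(n)]


def average_linkage_clusters(score, threshold):
    """Agglomerative clustering on a similarity matrix.

    Repeatedly merges the two clusters with the highest mean pairwise score
    and stops once the best available merge falls below `threshold`
    (an assumed stopping rule for this sketch).
    """
    clusters = [[i] for i in range(len(score))]

    def mean_score(a, b):
        return sum(score[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        x, y = max(combinations(range(len(clusters)), 2),
                   key=lambda p: mean_score(clusters[p[0]], clusters[p[1]]))
        if mean_score(clusters[x], clusters[y]) < threshold:
            break
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(clusters)


if __name__ == "__main__":
    # Toy affinity scores for four proteins from two hypothetical sources.
    source_a = [[0.0, 0.9, 0.1, 0.0], [0.9, 0.0, 0.2, 0.1],
                [0.1, 0.2, 0.0, 0.8], [0.0, 0.1, 0.8, 0.0]]
    source_b = [[0.0, 0.7, 0.0, 0.1], [0.7, 0.0, 0.1, 0.0],
                [0.0, 0.1, 0.0, 0.6], [0.1, 0.0, 0.6, 0.0]]
    final = combine_scores([source_a, source_b], [0.6, 0.4])
    print(average_linkage_clusters(final, threshold=0.5))
```

In this toy input, proteins 0 and 1 score highly with each other in both sources, as do proteins 2 and 3, so the sketch groups them into two predicted complexes.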

14.
Functional annotation from predicted protein interaction networks   Cited by: 1 (self-citations: 0, citations by others: 1)
MOTIVATION: Progress in large-scale experimental determination of protein-protein interaction networks for several organisms has resulted in innovative methods of functional inference based on network connectivity. However, the amount of effort and resources required for the elucidation of experimental protein interaction networks is prohibitive. Previously we, and others, have developed techniques to predict protein interactions for novel genomes using computational methods and data generated from other genomes. RESULTS: We evaluated the performance of a network-based functional annotation method that makes use of our predicted protein interaction networks. We show that this approach performs equally well on experimentally derived and predicted interaction networks, for both manually and computationally assigned annotations. We applied the method to predicted protein interaction networks for over 50 organisms from all domains of life, providing annotations for many previously unannotated proteins and verifying existing low-confidence annotations. AVAILABILITY: Functional predictions for over 50 organisms are available at http://bioverse.compbio.washington.edu and datasets used for analysis at http://data.compbio.washington.edu/misc/downloads/nannotation_data/. SUPPLEMENTARY INFORMATION: A supplemental appendix gives additional details not in the main text. (http://data.compbio.washington.edu/misc/downloads/nannotation_data/supplement.pdf).  相似文献   

15.
SUMMARY: We present Serial SimCoal, a program that models population genetic data from multiple time points, as with ancient DNA data. An extension of SIMCOAL, it also allows simultaneous modeling of complex demographic histories, and migration between multiple populations. Further, we incorporate a statistical package to calculate relevant summary statistics, which, for the first time, allows users to investigate the statistical power provided by, conduct hypothesis-testing with, and explore sample size limitations of ancient DNA data. AVAILABILITY: Source code and Windows/Mac executables at http://www.stanford.edu/group/hadlylab/ssc.html CONTACT: senka@stanford.edu.

16.
To understand the biology of the interactome, the covisualization of protein interactions and other protein-related data is required. In this study, we have adapted a 3-D network visualization platform, GEOMI, to allow the coanalysis of protein-protein interaction networks with proteomic parameters such as protein localization, abundance, physicochemical parameters, post-translational modifications, and gene ontology classification. Working with Saccharomyces cerevisiae data, we show that rich and interactive visualizations, constructed from multidimensional orthogonal data, provide insights on the complexity of the interactome and its role in biological processes and the architecture of the cell. We present the first organelle-specific interaction networks, that provide subinteractomes of high biological interest. We further present some of the first views of the interactome built from a new combination of yeast two-hybrid data and stable protein complexes, which are likely to approximate the true workings of stable and transient aspects of the interactome. The GEOMI tool and all interactome data are freely available by contacting the authors.  相似文献   

17.
ABSTRACT: BACKGROUND: A global map of protein-protein interactions in cellular systems provides key insights into the working of an organism. A repository of well-validated high-quality protein-protein interactions can be used in both large- and small-scale studies to generate and validate a wide range of functional hypotheses. RESULTS: We develop HINT (http://hint.yulab.org) - a database of high-quality protein-protein interactions for human, Saccharomyces cerevisiae and Schizosaccharomyces pombe. These were collected from several databases and filtered both systematically and manually to remove low-quality/erroneous interactions. The resulting datasets are classified by type (binary physical interactions vs. co-complex associations) and data source (high-throughput systematic setups vs. literature-curated small-scale experiments). We find strong sociological sampling biases in literature-curated datasets of small-scale interactions. An interactome without such sampling biases was used to understand network properties of human disease-genes - hubs are unlikely to cause disease, but if they do, they usually cause multiple disorders. CONCLUSIONS: HINT is of significant interest to researchers in all fields of biology as it addresses the ubiquitous need of having a repository of high-quality protein-protein interactions. These datasets can be utilized to generate specific hypotheses about specific proteins and/or pathways, as well as analyzing global properties of cellular networks. HINT will be regularly updated and all versions will be tracked.  相似文献   

18.
Estimating node degree in bait-prey graphs   Cited by: 2 (self-citations: 0, citations by others: 2)
MOTIVATION: Proteins work together to drive biological processes in cellular machines. Summarizing global and local properties of the set of protein interactions, the interactome, is necessary for describing cellular systems. We consider a relatively simple per-protein feature of the interactome: the number of interaction partners for a protein, which in graph terminology is the degree of the protein. RESULTS: Using data subject to both stochastic and systematic sources of false positive and false negative observations, we develop an explicit probability model and resultant likelihood method to estimate node degree on portions of the interactome assayed by bait-prey technologies. This approach yields substantial improvement in degree estimation over the current practice that naively sums observed edges. Accurate modeling of observed data in relation to true but unknown parameters of interest gives a formal point of reference from which to draw conclusions about the system under study. AVAILABILITY: All analyses discussed in this text can be performed using the ppiStats and ppiData packages available through the Bioconductor project (http://www.bioconductor.org).  相似文献   

19.
Currently available protein-protein interaction (PPI) network or 'interactome' maps, obtained with the yeast two-hybrid (Y2H) assay or by co-affinity purification followed by mass spectrometry (co-AP/MS), only cover a fraction of the complete PPI networks. These partial networks display scale-free topologies--most proteins participate in only a few interactions whereas a few proteins have many interaction partners. Here we analyze whether the scale-free topologies of the partial networks obtained from Y2H assays can be used to accurately infer the topology of complete interactomes. We generated four theoretical interaction networks of different topologies (random, exponential, power law, truncated normal). Partial sampling of these networks resulted in sub-networks with topological characteristics that were virtually indistinguishable from those of currently available Y2H-derived partial interactome maps. We conclude that given the current limited coverage levels, the observed scale-free topology of existing interactome maps cannot be confidently extrapolated to complete interactomes.  相似文献   
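
The kind of partial-sampling experiment described in this abstract can be sketched in a few lines. The sketch below builds only one of the four topologies (a uniform random graph) and samples edges independently at a fixed coverage; the sampling scheme, the parameter values, and the function names are assumptions for illustration, since the abstract does not specify how the partial networks were drawn.

```python
import random
from collections import Counter


def random_graph(n, p, rng):
    """Undirected random graph: each of the n*(n-1)/2 edges exists with prob. p."""
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < p}


def sample_edges(edges, coverage, rng):
    """Keep each edge independently with probability `coverage` (assumed scheme)."""
    return {e for e in sorted(edges) if rng.random() < coverage}


def degree_histogram(edges):
    """Map degree -> number of nodes with that degree (nodes of degree 0 omitted)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())


if __name__ == "__main__":
    rng = random.Random(0)
    full = random_graph(200, 0.05, rng)
    partial = sample_edges(full, 0.2, rng)
    print("full   :", sorted(degree_histogram(full).items()))
    print("partial:", sorted(degree_histogram(partial).items()))
```

Comparing the two histograms for different underlying topologies and coverage levels is one way to explore how much the degree distribution of a sampled sub-network reveals about the complete network.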

20.
RATIONALE: Modern molecular biology is generating data of unprecedented quantity and quality. Particularly exciting for biochemical pathway modeling and proteomics are comprehensive, time-dense profiles of metabolites or proteins that are measurable, for instance, with mass spectrometry, nuclear magnetic resonance or protein kinase phosphorylation. These profiles contain a wealth of information about the structure and dynamics of the pathway or network from which the data were obtained. The retrieval of this information requires a combination of computational methods and mathematical models, which are typically represented as systems of ordinary differential equations. RESULTS: We show that, for the purpose of structure identification, the substitution of differentials with estimated slopes in non-linear network models reduces the coupled system of differential equations to several sets of decoupled algebraic equations, which can be processed efficiently in parallel or sequentially. The estimation of slopes for each time series of the metabolic or proteomic profile is accomplished with a 'universal function' that is computed directly from the data by cross-validated training of an artificial neural network (ANN). CONCLUSIONS: Without preprocessing, the inverse problem of determining structure from metabolic or proteomic profile data is challenging and computationally expensive. The combination of system decoupling and data fitting with universal functions simplifies this inverse problem very significantly. Examples show successful estimations and current limitations of the method. AVAILABILITY: A preliminary Web-based application for ANN smoothing is accessible at http://bioinformatics.musc.edu/webmetabol/. S-systems can be interactively analyzed with the user-friendly freeware PLAS (http://correio.cc.fc.ul.pt/~aenf/plas.html) or with the MATLAB module BSTLab (http://bioinformatics.musc.edu/bstlab/), which is currently being beta-tested.  相似文献   
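
The decoupling idea in this abstract (replace each derivative with a slope estimated from the time series, so that parameters can be obtained from algebraic equations) can be illustrated on a deliberately simple case. The sketch below assumes a single-variable decay model dX/dt = -kX and estimates slopes with central differences instead of the neural-network smoother described by the authors; the model, the slope estimator, and the function names are all assumptions for illustration.

```python
import math


def central_slopes(t, x):
    """Estimate dx/dt at interior time points by central differences."""
    return [(x[i + 1] - x[i - 1]) / (t[i + 1] - t[i - 1])
            for i in range(1, len(t) - 1)]


def fit_decay_rate(t, x):
    """Least-squares estimate of k in dX/dt = -k * X.

    With the slopes S_i substituted for the derivative, the differential
    equation becomes the algebraic system S_i = -k * X_i, whose
    least-squares solution is k = -sum(S_i * X_i) / sum(X_i ** 2).
    """
    slopes = central_slopes(t, x)
    interior = x[1:-1]
    numerator = sum(s * v for s, v in zip(slopes, interior))
    denominator = sum(v * v for v in interior)
    return -numerator / denominator


if __name__ == "__main__":
    # Synthetic, noise-free time series generated with k = 0.5.
    times = [0.1 * i for i in range(51)]
    values = [2.0 * math.exp(-0.5 * ti) for ti in times]
    print(fit_decay_rate(times, values))
```

No differential equation is integrated during fitting, which is the source of the computational saving; for coupled systems the same substitution yields one such algebraic problem per variable.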
