Similar literature
Found 20 similar articles (search time: 46 ms)
1.
MOTIVATION: In recent years, the Protein Data Bank (PDB) has experienced rapid growth. To maximize the utility of the high-resolution protein-protein interaction data stored in the PDB, we have developed PIBASE, a comprehensive relational database of structurally defined interfaces between pairs of protein domains. It is composed of binary interfaces extracted from structures in the PDB and the Probable Quaternary Structure server using domain assignments from the Structural Classification of Proteins (SCOP) and CATH fold classification systems. RESULTS: PIBASE currently contains 158,915 interacting domain pairs between 105,061 domains from 2125 SCOP families. A diverse set of geometric, physicochemical and topological properties is calculated for each complex, its domains, interfaces and binding sites. A subset of the interface properties is used to remove interface redundancy within PDB entries, resulting in 20,912 distinct domain-domain interfaces. The complexes are grouped into 989 topological classes based on their patterns of domain-domain contacts. The binary interfaces and their corresponding binding sites are categorized into 18,755 and 30,975 topological classes, respectively, based on the topology of secondary structure elements. The utility of the database is illustrated by outlining several current applications. AVAILABILITY: The database is accessible via the World Wide Web at http://salilab.org/pibase. SUPPLEMENTARY INFORMATION: http://salilab.org/pibase/suppinfo.html.

2.
MOTIVATION: The NCBI dbSNP database lists over 9 million single nucleotide polymorphisms (SNPs) in the human genome, but currently contains limited annotation information. SNPs that result in amino acid residue changes (nsSNPs) are of critical importance in variation between individuals, including disease and drug sensitivity. RESULTS: We have developed LS-SNP, a genomic-scale software pipeline to annotate nsSNPs. LS-SNP comprehensively maps nsSNPs onto protein sequences, functional pathways and comparative protein structure models, and predicts positions where nsSNPs destabilize proteins, interfere with the formation of domain-domain interfaces, have an effect on protein-ligand binding or severely impact human health. It currently annotates 28,043 validated SNPs that produce amino acid residue substitutions in human proteins from the SwissProt/TrEMBL database. Annotations can be viewed via a web interface either in the context of a genomic region or by selecting sets of SNPs, genes, proteins or pathways. These results are useful for identifying candidate functional SNPs within a gene, haplotype or pathway and in probing molecular mechanisms responsible for functional impacts of nsSNPs. AVAILABILITY: http://www.salilab.org/LS-SNP. CONTACT: rachelk@salilab.org. SUPPLEMENTARY INFORMATION: http://salilab.org/LS-SNP/supp-info.pdf.

3.
The following resources for comparative protein structure modeling and analysis are described (http://salilab.org): MODELLER, a program for comparative modeling by satisfaction of spatial restraints; MODWEB, a web server for automated comparative modeling that relies on PSI-BLAST, IMPALA and MODELLER; MODLOOP, a web server for automated loop modeling that relies on MODELLER; MOULDER, a CPU-intensive protocol of MODWEB for building comparative models based on distant known structures; MODBASE, a comprehensive database of annotated comparative models for all sequences detectably related to a known structure; MODVIEW, a Netscape plugin for Linux that integrates viewing of multiple sequences and structures; and SNPWEB, a web server for structure-based prediction of the functional impact of a single amino acid substitution.

4.
5.
One goal of contemporary proteome research is the elucidation of cellular protein interactions. Based on currently available protein-protein interaction and domain data, we introduce a novel method, maximum specificity set cover (MSSC), for the prediction of protein-protein interactions. In our approach, we map the relationship between interactions of proteins and their corresponding domain architectures to a generalized weighted set cover problem. The application of a greedy algorithm provides sets of domain interactions which explain the presence of protein interactions to the largest degree of specificity. Utilizing domain and protein interaction data of S. cerevisiae, MSSC enables prediction of previously unknown protein interactions, links that are well supported by a high tendency of coexpression and functional homogeneity of the corresponding proteins. Focusing on concrete examples, we show that MSSC reliably predicts protein interactions in well-studied molecular systems, such as the 26S proteasome and RNA polymerase II of S. cerevisiae. We also show that the quality of the predictions is comparable to that of maximum likelihood estimation, while MSSC is faster. This new algorithm and all data sets used are accessible through a Web portal at http://ppi-cse.nd.edu.
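The greedy set-cover idea mentioned in the abstract above can be illustrated with a minimal Python sketch. It shows only the textbook greedy heuristic for weighted set cover, not the published MSSC algorithm: the cost assigned to each domain pair, the function name `greedy_weighted_set_cover`, and the toy data are all assumptions made for illustration.

```python
def greedy_weighted_set_cover(universe, candidates):
    """Textbook greedy heuristic for weighted set cover.

    universe:   set of elements to cover (here: observed protein interactions)
    candidates: dict mapping a candidate name (here: a domain pair) to a
                (cost, covered_elements) tuple
    Returns the chosen candidate names in selection order.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best_name, best_ratio = None, None
        # Iterate in sorted order so ties are broken deterministically.
        for name in sorted(candidates):
            cost, covered = candidates[name]
            gain = len(uncovered & covered)
            if gain == 0:
                continue
            ratio = cost / gain  # cost per newly covered element
            if best_ratio is None or ratio < best_ratio:
                best_name, best_ratio = name, ratio
        if best_name is None:
            break  # the remaining elements cannot be covered by any candidate
        chosen.append(best_name)
        uncovered -= candidates[best_name][1]
    return chosen


# Hypothetical toy data: three observed protein interactions and three
# candidate domain pairs with made-up costs.
interactions = {("P1", "P2"), ("P1", "P3"), ("P2", "P4")}
domain_pairs = {
    "A-B": (1.0, {("P1", "P2"), ("P1", "P3")}),
    "A-C": (1.0, {("P1", "P3")}),
    "B-D": (2.0, {("P2", "P4")}),
}
result = greedy_weighted_set_cover(interactions, domain_pairs)
```

In this toy case the heuristic first picks the domain pair with the lowest cost per explained interaction and then the only pair that explains the remaining one. How specificity is turned into a weight in MSSC itself is not stated in the abstract, so the costs here are placeholders.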

6.
Advances in proteomics technology have enabled new proteins to be discovered at an unprecedented speed, and high-throughput experimental methods have been developed to detect protein interactions and complexes en masse. Such a bottom-up, data-driven approach has resulted in data that may be uninformative or error-prone, requiring further validation and annotation. The InterDom database focuses on providing supporting evidence for the detected protein interactions based on putative protein domain interactions. Using an integrative approach, InterDom derives potential domain interactions by combining data from multiple sources, ranging from domain fusions, protein interactions and complexes, to scientific literature. The InterDom database is available at http://InterDom.lit.org.sg.

7.
8.
Introduction: Calmodulin (CaM) is a highly conserved Ca2+-binding protein that is exceptionally abundant in the brain. In the presynaptic compartment of neurons, CaM transduces changes in Ca2+ concentration into the regulation of synaptic transmission dynamics.

Areas covered: We review selected literature including published CaM interactor screens and outline established and candidate presynaptic CaM targets. We present a workflow of biochemical and structural proteomic methods that were used to identify and characterize the interactions between CaM and Munc13 proteins. Finally, we outline the potential of ion mobility-mass spectrometry (IM-MS) for conformational screening and of protein-protein cross-linking for the structural characterization of CaM complexes.

Expert commentary: Cross-linking/MS and native MS can be applied with considerable throughput to protein mixtures under near-physiological conditions, and thus effectively complement high-resolution structural biology techniques. Experimental distance constraints are most useful when obtained by combining different cross-linking strategies, i.e. by using cross-linkers with different spacer lengths and reactivities, and by incorporating unnatural photo-reactive amino acids. Insights from structural proteomics can be used to generate CaM-insensitive mutants of CaM targets for functional studies in vitro or ideally in vivo.


9.
Rapid progress in structural modeling of proteins and their interactions is powered by advances in knowledge-based methodologies along with better understanding of physical principles of protein structure and function. The pool of structural data for modeling of proteins and protein–protein complexes is constantly increasing due to the rapid growth of protein interaction databases and the Protein Data Bank. The GWYRE (Genome Wide PhYRE) project capitalizes on these developments by advancing and applying powerful new modeling methodologies to structural modeling of protein–protein interactions and genetic variation. The methods integrate knowledge-based tertiary structure prediction using Phyre2 and quaternary structure prediction using template-based docking by a full-structure alignment protocol to generate models for binary complexes. The predictions are incorporated in a comprehensive public resource for structural characterization of the human interactome and the location of human genetic variants. The GWYRE resource facilitates better understanding of principles of protein interaction and structure/function relationships. The resource is available at http://www.gwyre.org.

10.
11.
A significant proportion of proteins comprise multiple domains. Domain–domain docking is a tool that predicts multi-domain protein structures when individual domain structures can be accurately predicted but domain orientations cannot. GalaxyDomDock predicts an ensemble of domain orientations from given domain structures by docking. Such information would also be beneficial in elucidating the functions of proteins that have multiple states with different domain orientations. GalaxyDomDock is an ab initio domain–domain docking method based on GalaxyTongDock, a previously developed protein–protein docking method. Infeasible domain orientations for the given linker are effectively screened out from the docked conformations by a geometric filter, using the Dijkstra algorithm. In addition, domain linker conformations are predicted by adopting the loop sampling method FALC. The proposed GalaxyDomDock outperformed existing ab initio domain–domain docking methods, such as AIDA and Rosetta, in performance tests on the Rosetta benchmark set of two-domain proteins. GalaxyDomDock also performed better than or comparable to AIDA on the AIDA benchmark set of two-domain proteins and two-domain proteins containing discontinuous domains, including the benchmark set in which each domain of the set was modeled by the recent version of AlphaFold. The GalaxyDomDock web server is freely available as a part of GalaxyWEB at http://galaxy.seoklab.org/domdock.

12.
Protein domains are conserved and functionally independent structures that play an important role in interactions among related proteins. Domain-domain interactions have recently been used to predict protein-protein interactions (PPI). In general, the interaction probability of a pair of domains is scored using a trained scoring function. Protein pairs carrying domains whose score satisfies a threshold are regarded as "interacting". In this study, the signature contents of proteins were utilized to predict PPI pairs in Saccharomyces cerevisiae, Caenorhabditis elegans, and Homo sapiens. Similarity between protein signature patterns was scored and PPI predictions were drawn based on the binary similarity scoring function. Results show that the true positive rate of prediction by the proposed approach is approximately 32% higher than that using the maximum likelihood estimation method when compared with a test set, resulting in a 22% increase in the area under the receiver operating characteristic (ROC) curve. When proteins containing one or two signatures were removed, the sensitivity of the predicted PPI pairs increased significantly. The predicted PPI pairs are on average 11 times more likely to interact than randomly selected pairs at a confidence level of 0.95, and on average 4 times better than those predicted by either phylogenetic profiling or gene expression profiling.
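To make the idea of scoring similarity between protein signature patterns concrete, here is a small Python sketch. The abstract above does not say which binary similarity function was used, so the Jaccard coefficient, the threshold value, the function names, and the toy signature sets are all assumptions chosen for illustration.

```python
def jaccard(a, b):
    """Jaccard coefficient between two sets of signature identifiers."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty signature sets are treated as dissimilar
    return len(a & b) / len(union)


def predict_pairs(signatures, threshold):
    """Return protein pairs whose signature similarity reaches the threshold.

    signatures: dict mapping a protein name to its set of signature IDs
    """
    names = sorted(signatures)
    predicted = []
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            if jaccard(signatures[p], signatures[q]) >= threshold:
                predicted.append((p, q))
    return predicted


# Hypothetical toy data: three proteins with made-up signature IDs.
signatures = {
    "protA": {"sig1", "sig2", "sig3"},
    "protB": {"sig2", "sig3", "sig4"},
    "protC": {"sig5"},
}
pairs = predict_pairs(signatures, threshold=0.5)
```

Here protA and protB share two of four distinct signatures, which gives a coefficient of 0.5 and meets the assumed threshold, while protC shares nothing with either.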

13.
Macromolecular protein complexes carry out many of the essential functions of cells, and many genetic diseases arise from disrupting the functions of such complexes. Currently, there is great interest in defining the complete set of human protein complexes, but recently published maps lack comprehensive coverage. Here, through the synthesis of over 9,000 published mass spectrometry experiments, we present hu.MAP, the most comprehensive and accurate human protein complex map to date, containing > 4,600 total complexes, > 7,700 proteins, and > 56,000 unique interactions, including thousands of confident protein interactions not identified by the original publications. hu.MAP accurately recapitulates known complexes withheld from the learning procedure, which was optimized with the aid of a new quantitative metric (k‐cliques) for comparing sets of sets. The vast majority of complexes in our map are significantly enriched with literature annotations, and the map overall shows improved coverage of many disease‐associated proteins, as we describe in detail for ciliopathies. Using hu.MAP, we predicted and experimentally validated candidate ciliopathy disease genes in vivo in a model vertebrate, discovering CCDC138, WDR90, and KIAA1328 to be new cilia basal body/centriolar satellite proteins, and identifying ANKRD55 as a novel member of the intraflagellar transport machinery. By offering significant improvements to the accuracy and coverage of human protein complexes, hu.MAP (http://proteincomplexes.org) serves as a valuable resource for better understanding the core cellular functions of human proteins and helping to determine mechanistic foundations of human disease.

14.
Overview: Elucidation of the networks of physical (functional) interactions present in cells and tissues is fundamental for understanding the molecular organization of biological systems, the mechanistic basis of essential and disease-related processes, and for functional annotation of previously uncharacterized proteins (via guilt-by-association or -correlation). After a decade in the field, we felt it timely to document our own experiences in the systematic analysis of protein interaction networks.

Areas covered: Researchers worldwide have contributed innovative experimental and computational approaches that have driven the rapidly evolving field of ‘functional proteomics’. These include mass spectrometry-based methods to characterize macromolecular complexes on a global-scale and sophisticated data analysis tools – most notably machine learning – that allow for the generation of high-quality protein association maps.

Expert commentary: Here, we recount some key lessons learned, with an emphasis on successful workflows, and challenges, arising from our own and other groups’ ongoing efforts to generate, interpret and report proteome-scale interaction networks in increasingly diverse biological contexts.


15.
CLIP-seq is widely used to study genome-wide interactions between RNA-binding proteins and RNAs. However, there are few tools available to analyze CLIP-seq data, thus creating a bottleneck to the implementation of this methodology. Here, we present PIPE-CLIP, a Galaxy framework-based comprehensive online pipeline for reliable analysis of data generated by three types of CLIP-seq protocol: HITS-CLIP, PAR-CLIP and iCLIP. PIPE-CLIP provides both data processing and statistical analysis to determine candidate cross-linking regions, which are comparable to those regions identified from the original studies or using existing computational tools. PIPE-CLIP is available at http://pipeclip.qbrc.org/.

16.
Amino acid contacts in terms of atomic interactions are essential factors to be considered in the analysis of the structure of a protein and its complexes. Consequently, molecular biologists require specific tools for the identification and visualization of all such contacts. Graphical contacts (GC) and interface forming residue graphical contacts (IFRgc), presented here, calculate atomic contacts among amino acids based on a table of predefined pairs of atom types and their distances, and then display them in a number of different forms. The contact types currently listed by GC and IFRgc include hydrogen bonds (in nine different flavors), hydrophobic interactions, charge-charge interactions, aromatic stacking and disulfide bonds. Such an extensive catalog of the interactions, representing the forces that govern protein folding, stability and binding, is the key feature of these two applications. GC and IFRgc are part of the STING Millennium Suite. AVAILABILITY: http://sms.cbi.cnptia.embrapa.br/SMS, http://trantor.bioc.columbia.edu/SMS, http://mirrors.rcsb.org//SMS, http://www.es.embnet.org/SMS and http://www.ar.embnet.org/SMS (Options: Graphical Contacts and IFR Graphical Contacts).
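The table-driven contact calculation described in the abstract above can be sketched in a few lines of Python. This is only an illustration of the general technique (look up a distance cutoff for each pair of atom types and report pairs within that cutoff); the cutoff values, the atom-type labels, the `find_contacts` function and the coordinates are hypothetical and are not taken from GC, IFRgc or the STING Millennium Suite.

```python
import math

# Hypothetical cutoff table in angstroms, keyed by an unordered pair of
# atom types. The values are placeholders, not those used by GC or IFRgc.
CUTOFFS = {
    frozenset(["donor", "acceptor"]): 3.5,     # hydrogen-bond-like
    frozenset(["hydrophobic"]): 4.5,           # hydrophobic-hydrophobic
    frozenset(["positive", "negative"]): 6.0,  # charge-charge
}


def find_contacts(atoms, cutoffs=CUTOFFS):
    """Return (residue_i, residue_j, distance) for atom pairs within cutoff.

    atoms: list of (residue_id, atom_type, (x, y, z)) tuples
    """
    contacts = []
    for i in range(len(atoms)):
        res_i, type_i, xyz_i = atoms[i]
        for j in range(i + 1, len(atoms)):
            res_j, type_j, xyz_j = atoms[j]
            if res_i == res_j:
                continue  # skip atom pairs within the same residue
            limit = cutoffs.get(frozenset([type_i, type_j]))
            if limit is None:
                continue  # this pair of atom types is not in the table
            distance = math.dist(xyz_i, xyz_j)
            if distance <= limit:
                contacts.append((res_i, res_j, distance))
    return contacts


# Hypothetical toy input: three atoms from three residues.
atoms = [
    ("ASP10", "acceptor", (0.0, 0.0, 0.0)),
    ("SER42", "donor", (3.0, 0.0, 0.0)),
    ("LEU77", "hydrophobic", (10.0, 0.0, 0.0)),
]
contacts = find_contacts(atoms)
```

The all-pairs loop is quadratic in the number of atoms, which is fine for a sketch; a real tool working on whole structures would more likely use a spatial index or grid to limit the pairs it examines.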

17.
MODBASE is a queryable database of annotated comparative protein structure models. The models are derived by MODPIPE, an automated modeling pipeline relying on the programs PSI-BLAST and MODELLER. The database currently contains 3D models for substantial portions of approximately 17 000 proteins from 10 complete genomes, including those of Caenorhabditis elegans, Saccharomyces cerevisiae and Escherichia coli, as well as all the available sequences from Arabidopsis thaliana and Homo sapiens. The database also includes fold assignments and alignments on which the models were based. In addition, special care is taken to assess the quality of the models. MODBASE is accessible through a web interface at http://guitar.rockefeller.edu/modbase/.

18.
19.
Regulatory motif finding by logic regression   Total citations: 1, self-citations: 0, citations by others: 1

20.
Introduction: The mission of the Chromosome-Centric Human Proteome Project (C-HPP) is to map and annotate the entire predicted human protein set (~20,000 proteins) encoded by each chromosome. The initial steps of the project are focused on ‘missing proteins (MPs)’, which lacked documented evidence for existence at the protein level. In addition to the remaining 2,579 MPs, we also target those annotated proteins having unknown functions, uPE1 proteins, alternative splice isoforms and post-translational modifications. We also consider how to investigate various protein functions involved in cis-regulatory phenomena, amplicons, lncRNAs and smORFs.

Areas covered: We will cover the scope, historic background, progress, challenges and future prospects of C-HPP. This review also addresses the question of how we can best improve the methodological approaches, select the optimal biological samples, and recommend stringent protocols for the identification and characterization of MPs. A new strategy for functional analysis of some of those annotated proteins having unknown function will also be discussed.

Expert commentary: If the project progresses well by reshaping the original goals, the current working modules and teamwork in the proposed extended planning period, it is anticipated that a progressively more detailed draft of an accurate chromosome-based proteome map will become available with functional information.

