Similar articles
1.
SUMMARY: With the continuous growth of the RCSB Protein Data Bank (PDB), providing an up-to-date systematic structure comparison of all protein structures poses an ever-growing challenge. Here, we present a comparison tool for calculating both 1D protein sequence and 3D protein structure alignments. This tool supports various applications at the RCSB PDB website. First, a structure alignment web service calculates pairwise alignments. Second, a stand-alone application runs alignments locally and visualizes the results. Third, pre-calculated 3D structure comparisons for the whole PDB are provided and updated on a weekly basis. These three applications allow users to discover novel relationships between proteins available either at the RCSB PDB or provided by the user. AVAILABILITY AND IMPLEMENTATION: A web user interface is available at http://www.rcsb.org/pdb/workbench/workbench.do. The source code is available under the LGPL license from http://www.biojava.org. A source bundle, prepared for local execution, is available from http://source.rcsb.org. CONTACT: andreas@sdsc.edu; pbourne@ucsd.edu.

2.
TargetDB: a target registration database for structural genomics projects   Total citations: 2, self-citations: 0, citations by others: 2
TargetDB is a centralized target registration database that includes protein target data from the NIH structural genomics centers and a number of international sites. TargetDB, which is hosted by the Protein Data Bank (RCSB PDB), provides status information on target sequences and tracks their progress through the various stages of protein production and structure determination. A simple search form permits queries based on contributing site, target ID, protein name, sequence, status and other data. The progress of individual targets or entire structural genomics projects may be tracked over time, and target data from all contributing centers may also be downloaded in XML format. AVAILABILITY: TargetDB is available at http://targetdb.pdb.org/

3.

Background  

Protein sequence insertions/deletions (indels) can be introduced during evolution or through alternative splicing (AS). Alternative splicing is an important biological phenomenon and is considered the major means of expanding structural and functional diversity in eukaryotes. Knowledge of the structural changes due to indels is critical to our understanding of the evolution of protein structure and function. In addition, it can help us probe the evolution of alternative splicing and the diversity of functional isoforms. However, little is known about the effects of indels, in particular those involving core secondary structures, on the folding of protein structures. The long-term goal of our study is to accurately predict the structures of protein AS isoforms. As a first step towards this goal, we performed a systematic analysis of the structural changes caused by short internal indels by mining highly homologous proteins in the Protein Data Bank (PDB).

4.
Analyses of publicly available structural data reveal interesting insights into the impact of the three‐dimensional (3D) structures of protein targets important for discovery of new drugs (e.g., G‐protein‐coupled receptors, voltage‐gated ion channels, ligand‐gated ion channels, transporters, and E3 ubiquitin ligases). The Protein Data Bank (PDB) archive currently holds >155,000 atomic‐level 3D structures of biomolecules experimentally determined using crystallography, nuclear magnetic resonance spectroscopy, and electron microscopy. The PDB was established in 1971 as the first open‐access, digital‐data resource in biology, and is now managed by the Worldwide PDB partnership (wwPDB; wwPDB.org). US PDB operations are the responsibility of the Research Collaboratory for Structural Bioinformatics PDB (RCSB PDB). The RCSB PDB serves millions of RCSB.org users worldwide by delivering PDB data integrated with ~40 external biodata resources, providing rich structural views of fundamental biology, biomedicine, and energy sciences. Recently published work showed that the PDB archival holdings facilitated discovery of ~90% of the 210 new drugs approved by the US Food and Drug Administration 2010–2016. We review user‐driven development of RCSB PDB services, examine growth of the PDB archive in terms of size and complexity, and present examples and opportunities for structure‐guided drug discovery for challenging targets (e.g., integral membrane proteins).

5.
We present the development of a web server, a protein short motif search tool that allows users to simultaneously search for a protein sequence motif and its secondary structure assignments. The web server can run very short motif searches against PDB structural data from the RCSB Protein Data Bank, with users defining the secondary structure type of each amino acid in the sequence motif. The output uses 3D visualisation to highlight the position of the motif in the structure and on the corresponding sequence. Researchers can easily observe the locations and conformations of multiple motifs among the results. Protein short motif search also has an application programming interface (API) for interfacing with other bioinformatics tools. AVAILABILITY: The database is available for free at http://birg3.fbb.utm.my/proteinsms.
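The combined search described in this abstract can be illustrated with a minimal sketch. It is not the server's implementation or its API: the function names, the `.` wildcard, and the use of one-letter secondary structure codes (H, E, C) are assumptions made for this example only.

```python
def matches(window, pattern, wildcard="."):
    """True if every pattern position is a wildcard or equals the window character."""
    return all(p == wildcard or p == c for c, p in zip(window, pattern))


def find_motif(sequence, secondary, seq_motif, ss_motif):
    """Return 0-based start positions where the sequence motif and the
    secondary structure motif both match the same window.

    sequence  -- one-letter amino acid string
    secondary -- per-residue secondary structure string of the same length
    seq_motif -- residue pattern, '.' matches any residue (assumed convention)
    ss_motif  -- secondary structure pattern of the same length as seq_motif
    """
    if len(sequence) != len(secondary):
        raise ValueError("sequence and secondary structure must have equal length")
    if len(seq_motif) != len(ss_motif):
        raise ValueError("both motifs must have equal length")
    n = len(seq_motif)
    hits = []
    for i in range(len(sequence) - n + 1):
        if matches(sequence[i:i + n], seq_motif) and matches(secondary[i:i + n], ss_motif):
            hits.append(i)
    return hits


# Hypothetical toy data: "GK." occurs twice, but only once at a coil-to-helix boundary.
print(find_motif("AGKTGKSA", "CCHHCCEE", "GK.", "CH."))  # [1]
```

Requiring both patterns to match the same window is what makes very short motifs usable: the sequence pattern alone would return both occurrences of `GK.`, while the structural constraint narrows the result to one.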

6.
MOTIVATION: Multi-domain proteins have evolved by insertions or deletions of distinct protein domains. Tracing the history of a certain domain combination can be important for functional annotation of multi-domain proteins, and for understanding the function of individual domains. In order to analyze the evolutionary history of the domains in modular proteins it is desirable to inspect a phylogenetic tree based on sequence divergence with the modular architecture of the sequences superimposed on the tree. RESULT: A Java applet, NIFAS, that integrates graphical domain schematics for each sequence in an evolutionary tree was developed. NIFAS retrieves domain information from the Pfam database and uses CLUSTAL W to calculate a tree for a given Pfam domain. The tree can be displayed with symbolic bootstrap values, and to allow the user to focus on a part of the tree, the layout can be altered by swapping nodes, changing the outgroup, and showing/collapsing subtrees. NIFAS is integrated with the Pfam database and is accessible over the internet (http://www.cgr.ki.se/Pfam). As an example, we use NIFAS to analyze the evolution of domains in Protein Kinases C.

7.
PSST-2.0     
PSST-2.0 (Protein Data Bank [PDB] Sequence Search Tool) is an updated version of the earlier PSST (Protein Sequence Search Tool), and the philosophy behind the search engine has remained unchanged. PSST-2.0 is a Web-based, interactive search engine developed to retrieve required protein or nucleic acid sequence information and some of its related details, primarily from sequences derived from the structures deposited in the PDB (the database of 3-dimensional [3-D] protein and nucleic acid structures). Additionally, the search engine works for a selected subset of 25% or 90% non-homologous protein chains. For some of the selected options, the search engine produces a detailed output for the user-uploaded, 3-D atomic coordinates of the protein structure (PDB file format) from the client machine through the Web browser. The search engine works on a locally maintained PDB, which is updated every week from the parent server at the Research Collaboratory for Structural Bioinformatics, and hence the search results are up to date at any given time. AVAILABILITY: PSST-2.0 is freely accessible via http://pranag.physics.iisc.ernet.in/psst/ or http://144.16.71.10/psst/.

8.
MOTIVATION: Modeling of protein interactions is often possible from known structures of related complexes. It is often time-consuming to find the most appropriate template. Hypothesized biological units (BUs) often differ from the asymmetric units and it is usually preferable to model from the BUs. RESULTS: ProtBuD is a database of BUs for all structures in the Protein Data Bank (PDB). We use both the PDB's BUs and those from the Protein Quaternary Server (PQS). ProtBuD is searchable by PDB entry, the Structural Classification of Proteins (SCOP) designation or pairs of SCOP designations. The database provides the asymmetric and BU contents of related proteins in the PDB as identified in SCOP and Position-Specific Iterated BLAST (PSI-BLAST). The asymmetric unit is different from PDB and/or PQS BUs for 52% of X-ray structures, and the PDB and PQS BUs disagree on 18% of entries. AVAILABILITY: The database is provided as a standalone program and a web server from http://dunbrack.fccc.edu/ProtBuD.php.

9.
The genomes of more than 100 species have been sequenced, and the biological functions of encoded proteins are now actively being researched. Protein function is based on interactions between proteins and other molecules. One approach to inferring protein function from genomic sequence is to predict interactions between an encoded protein and other molecules. As a data source for such predictions, knowledge regarding known protein-small molecule interactions needs to be compiled. We have, therefore, surveyed interactions between proteins and other molecules in the Protein Data Bank (PDB), the protein three-dimensional (3D) structure database. Among 20,685 entries in the PDB (April, 2003), 4,189 types of small molecules were found to interact with proteins. Biologically relevant small molecules most often found in the PDB were metal ions, such as calcium, zinc, and magnesium. Sugars and nucleotides were the next most common. These molecules are known to act as cofactors for enzymes and/or stabilizers of proteins. In each case of interaction between a protein and a small molecule, we found preferred amino acid residues at the interaction sites. These preferences can be the basis for predicting protein function from genomic sequence and protein 3D structures. The data pertaining to these small molecules were collected in a database named Het-PDB Navi., which is freely available at http://daisy.nagahama-i-bio.ac.jp/golab/hetpdbnavi.html and linked to the official PDB home page.

10.
The rapidly increasing amount of information on three-dimensional (3D) structures of biological macromolecules still has an insufficient impact on genome analysis, functional genomics and proteomics as well as on many other fields in biomedicine including disease-related research. There are, however, attempts to make structural data more easily accessible to the bench biologist. As members of the world-wide Protein Data Bank (wwPDB), the RCSB Protein Data Bank (PDB), the Protein Data Bank Japan and the Macromolecular Structure Database are the primary information resources for 3D structures of proteins, nucleic acids, carbohydrates and complexes thereof. In addition, a number of secondary resources have been set up that also provide information on all currently known structures in a relatively comprehensive manner, rather than focusing on specific features only. They include PDBsum, the OCA browser-database for protein structure/function, the Molecular Modeling Database and the Jena Library of Biological Macromolecules (JenaLib). Both the primary and secondary resources often merge the information in the PDB files with data from other resources and offer additional analysis tools, thereby adding value to the original PDB data. Here, we briefly describe these resources from a user's point of view and from a comparative perspective. It is our aim to guide researchers outside the structural biology field in getting the most out of the 3D structure resources.

11.
PAS domains are widespread in archaea, bacteria, and eukaryota, and play important roles in various functions. In this study, we aim to explore the functional evolutionary relationships among proteins in the PAS domain superfamily in view of the sequence‐structure‐dynamics‐function relationship. We collected protein sequences and crystal structure data from the RCSB Protein Data Bank for PAS domain superfamily members belonging to three biological functions (nucleotide binding, photoreceptor activity, and transferase activity). Protein sequences were aligned and then used to select sequence‐conserved residues and build a phylogenetic tree. Three‐dimensional structure alignment was also applied to obtain structure‐conserved residues. The protein dynamics were analyzed using an elastic network model (ENM) and validated by molecular dynamics (MD) simulation. The results showed that proteins with the same function could be grouped by sequence similarity, and proteins in different functional groups displayed statistically significant differences in their vibrational patterns. Interestingly, in all three functional groups, conserved amino acid residues identified by sequence and structure conservation analysis generally have lower fluctuations than other residues. In addition, the fluctuation of conserved residues in each biological function group was strongly correlated with the corresponding biological function. This research suggests a direct connection in which the protein sequences are related to various functions through structural dynamics. This is a new attempt to delineate the functional evolution of proteins using the integrated information of sequence, structure, and dynamics.

12.
MOTIVATION: Assignment of putative protein functional annotation by comparative analysis using pre-defined experimental annotations is performed routinely by molecular biologists. The number and statistical significance of these assignments remain a challenge in this era of high-throughput proteomics. A combined statistical method that enables robust, automated protein annotation by reliably expanding existing annotation sets is described. An existing clustering scheme, based on relevant experimental information (e.g. sequence identity, keywords or gene expression data) is required. The method assigns new proteins to these clusters with a measure of reliability. It can also provide human reviewers with a reliability score for both new and previously classified proteins. RESULTS: A dataset of 27 000 annotated Protein Data Bank (PDB) polypeptide chains (of 36 000 chains currently in the PDB) was generated from 23 000 chains classified a priori. AVAILABILITY: PDB annotations and sample software implementation are freely accessible on the Web at http://pmr.sdsc.edu/go

13.
The Saccharomyces Genome Database (SGD: http://genome-www.stanford.edu/Saccharomyces/) has recently developed new resources to provide more complete information about proteins from the budding yeast Saccharomyces cerevisiae. The PDB Homologs page provides structural information from the Protein Data Bank (PDB) about yeast proteins and/or their homologs. SGD has also created a resource that utilizes the eMOTIF database for motif information about a given protein. A third new resource is the Protein Information page, which contains protein physical and chemical properties, such as molecular weight and hydropathicity scores, predicted from the translated ORF sequence.

14.
The PDBsum web server provides structural analyses of the entries in the Protein Data Bank (PDB). Two recent additions are described here. The first is the detailed analysis of the SARS‐CoV‐2 virus protein structures in the PDB. These include the variants of concern, which are shown both on the sequences and 3D structures of the proteins. The second addition is the inclusion of the available AlphaFold models for human proteins. The pages allow a search of the protein against existing structures in the PDB via the Sequence Annotated by Structure (SAS) server, so one can easily compare the predicted model against experimentally determined structures. The server is freely accessible to all at http://www.ebi.ac.uk/pdbsum.

15.
An analysis of disulphide bond containing proteins in the Protein Data Bank (PDB) revealed that, out of 27,209 protein structures analyzed, 12,832 proteins contain at least one intra-chain disulphide bond and 811 proteins contain at least one inter-chain disulphide bond. The intra-chain disulphide bond containing proteins can be grouped into 256 categories based on the number of disulphide bonds and the disulphide bond connectivity patterns (DBCPs) that were generated according to the position of half-cystine residues along the protein chain. The PDB entries corresponding to these 256 categories represent 509 unique SCOP superfamilies. A simple web-based computational tool is made freely available at the website http://www.ccmb.res.in/bioinfo/dsbcp that allows flexible queries to be made on the database in order to retrieve useful information on the disulphide bond containing proteins in the PDB. The database is useful to identify the different SCOP superfamilies associated with a particular disulphide bond connectivity pattern or vice versa. It is possible to define a query based either on a single field or on a combination of the following fields, i.e., PDB code, protein name, SCOP superfamily name, number of disulphide bonds, disulphide bond connectivity pattern and the number of amino acid residues in a protein chain, and to retrieve information that matches the criterion. Thereby, the database may be useful to select suitable protein structural templates in order to model the more distantly related protein homologs/analogs using comparative modeling methods.
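One way to derive a connectivity pattern from the positions of half-cystine residues can be sketched as follows. The abstract does not give the authors' exact encoding, so the rank-based notation (`1-6, 2-7, ...`), the function name, and the input format are assumptions for illustration.

```python
def connectivity_pattern(bonds):
    """Encode intra-chain disulphide bonds as a rank-based connectivity pattern.

    bonds -- list of (residue_number, residue_number) pairs, one per bond.
    Half-cystines are ranked 1..2n by their position along the chain, and
    each bond is written as the pair of ranks it connects.
    """
    positions = sorted({pos for bond in bonds for pos in bond})
    if len(positions) != 2 * len(bonds):
        # A half-cystine can take part in only one disulphide bond.
        raise ValueError("each residue may appear in exactly one bond")
    rank = {pos: i + 1 for i, pos in enumerate(positions)}
    pairs = sorted(tuple(sorted((rank[a], rank[b]))) for a, b in bonds)
    return ", ".join(f"{a}-{b}" for a, b in pairs)


# Illustrative residue numbers for a chain with four disulphide bonds.
print(connectivity_pattern([(26, 84), (40, 95), (58, 110), (65, 72)]))
# 1-6, 2-7, 3-8, 4-5
```

Because the pattern depends only on the order of the half-cystines and not on their absolute residue numbers, chains of different lengths with the same bonding topology fall into the same category, which is what makes grouping by pattern possible.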

16.
The current release of ProTherm, Thermodynamic Database for Proteins and Mutants, contains more than 10 000 numerical data (300% of the first version) of several thermodynamic parameters, experimental methods and conditions, reversibility of folding, details about the surrounding residues in space for all mutants, structural, functional and literature information. In the current version, we have added information about the source of each protein, identification codes for SWISS-PROT and Protein Information Resource and unique Protein Data Bank (PDB) code for proteins with relevant source. We have also provided additional options to search for data based on PDB code, number of states and reversibility. ProTherm is cross-linked with other sequence, structural, functional and literature databases, and the mutant sites and surrounding residues are automatically mapped on the structure. The ProTherm database is freely available at http://www.rtc.riken.go.jp/jouhou/protherm/protherm.html.

17.
Heat shock proteins (HSPs), which are found in all living organisms from bacteria to humans, are expressed under stress. In this study, two HSP families, HSP60 and HSP70, were characterized and compared across insect species from different orders. According to the conserved motif analysis, no motif was shared by all insects across the two protein families, but each family had its own common motifs. Functional and structural analyses were carried out on seven different insect species from each protein family as representative samples. These analyses were performed with ExPASy database tools. The tertiary structures of the Drosophila melanogaster proteins, taken as the sample for each protein family, were predicted by the Phyre2 and TM-score servers, and their quality was then verified by SuperPose and PROCHECK. The tertiary structures were predicted through the “c4pj1E” model (PDB Accession Code: 4pj1) in the HSP60 family and the “c3d2fC” model (PDB Accession Code: 3d2f) in the HSP70 family. The protein phylogenetic tree was constructed using the Neighbor-joining (NJ) method in Molecular Evolutionary Genetics Analysis (MEGA) 6.06. According to the results, the HSP60 and HSP70 families showed high identity, suggesting that they derive from a common ancestor, although they belonged to separate groups. In protein–protein interaction analysis by STRING 10.0, 10 common enriched pathways of biological process, molecular function and Kyoto Encyclopedia of Genes and Genomes (KEGG) were identified in D. melanogaster in both families. The obtained data provide a background for bioinformatic studies of the function and evolution of insects and other organisms.

Communicated by Ramaswamy H. Sarma


18.
The bias in protein structure and function space resulting from experimental limitations and targeting of particular functional classes of proteins by structural biologists has long been recognized, but never continuously quantified. Using the Enzyme Commission and the Gene Ontology classifications as a reference frame, and integrating structure data from the Protein Data Bank (PDB), target sequences from the structural genomics projects, structure homology derived from the SUPERFAMILY database, and genome annotations from Ensembl and NCBI, we provide a quantified view, both at the domain and whole-protein levels, of the current and projected coverage of protein structure and function space relative to the human genome. Protein structures currently provide at least one domain that covers 37% of the functional classes identified in the genome; whole structure coverage exists for 25% of the genome. If all the structural genomics targets were solved (twice the current number of structures in the PDB), it is estimated that structures of one domain would cover 69% of the functional classes identified and complete structure coverage would be 44%. Homology models from existing experimental structures extend the 37% coverage to 56% of the genome as single domains and 25% to 31% for complete structures. Coverage from homology models is not evenly distributed by protein family, reflecting differing degrees of sequence and structure divergence within families. While these data provide coverage, conversely, they also systematically highlight functional classes of proteins for which structures should be determined. Current key functional families without structure representation are highlighted here; updated information on the "most wanted list" that should be solved is available on a weekly basis from http://function.rcsb.org:8080/pdb/function_distribution/index.html.

19.
Kosloff M, Kolodny R. Proteins, 2008, 71(2): 891-902
It is often assumed that in the Protein Data Bank (PDB), two proteins with similar sequences will also have similar structures. Accordingly, it has proved useful to develop subsets of the PDB from which "redundant" structures have been removed, based on a sequence-based criterion for similarity. Similarly, when predicting protein structure using homology modeling, if a template structure for modeling a target sequence is selected by sequence alone, this implicitly assumes that all sequence-similar templates are equivalent. Here, we show that this assumption is often not correct and that standard approaches to create subsets of the PDB can lead to the loss of structurally and functionally important information. We have carried out sequence-based structural superpositions and geometry-based structural alignments of a large number of protein pairs to determine the extent to which sequence similarity ensures structural similarity. We find many examples where two proteins that are similar in sequence have structures that differ significantly from one another. The source of the structural differences usually has a functional basis. The number of such protein pairs that are identified and the magnitude of the dissimilarity depend on the approach that is used to calculate the differences; in particular, sequence-based structure superpositioning will identify a larger number of structurally dissimilar pairs than geometry-based structural alignments. When two sequences can be aligned in a statistically meaningful way, sequence-based structural superpositioning provides a meaningful measure of structural differences. This approach and geometry-based structure alignments reveal somewhat different information and one or the other might be preferable in a given application. Our results suggest that in some cases, notably homology modeling, the common use of nonredundant datasets, culled from the PDB based on sequence, may mask important structural and functional information. We have established a database of sequence-similar, structurally dissimilar protein pairs that will help address this problem (http://luna.bioc.columbia.edu/rachel/seqsimstrdiff.htm).

20.
PDB-REPRDB is a database of representative protein chains from the Protein Data Bank (PDB). Started at the Real World Computing Partnership (RWCP) in August 1997, it developed into the present PDB-REPRDB system. In April 2001, the system was moved to the Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST) (http://www.cbrc.jp/); it is available at http://www.cbrc.jp/pdbreprdb/. The current database includes 33 368 protein chains from 16 682 PDB entries (1 September, 2002), from which are excluded (a) DNA and RNA data, (b) theoretically modeled data, (c) short chains (length < 40 residues), or (d) data with non-standard amino acid residues at all residues. The number of PDB entries that include membrane protein structures has increased rapidly, as improved X-ray crystallography, NMR, and electron microscopy techniques have allowed many more membrane protein structures to be determined. Since many protein structure studies must address globular and membrane proteins separately, a new elimination factor, which excludes membrane protein chains, is introduced in the PDB-REPRDB system. Moreover, the PDB-REPRDB system for membrane protein chains begins at the same URL. The current membrane database includes 551 protein chains, including membrane domains in the SCOP database of release 1.59 (15 May, 2002).
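The exclusion criteria (a) to (d) listed in this abstract can be written as a simple filter. This is a sketch under stated assumptions, not the PDB-REPRDB code: the `Chain` record and its fields are hypothetical, the "<40 residues" criterion is read as chain length, criterion (d) is read as "every residue is non-standard", and non-standard residues are assumed to be encoded as `X`.

```python
from dataclasses import dataclass

# The 20 standard amino acids in one-letter code.
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")


@dataclass
class Chain:
    """Hypothetical chain record used only for this illustration."""
    chain_id: str
    molecule: str      # "protein", "DNA" or "RNA"
    theoretical: bool  # True for theoretically modeled data
    sequence: str      # one-letter codes; non-standard residues assumed to be "X"


def keep(chain, min_length=40):
    """Apply exclusion criteria (a)-(d); return True if the chain is retained."""
    if chain.molecule != "protein":        # (a) DNA and RNA data
        return False
    if chain.theoretical:                  # (b) theoretically modeled data
        return False
    if len(chain.sequence) < min_length:   # (c) short chains
        return False
    if not any(res in STANDARD for res in chain.sequence):
        return False                       # (d) non-standard at all residues
    return True


chains = [
    Chain("1abcA", "protein", False, "A" * 50),
    Chain("1abcB", "DNA", False, "ACGT" * 20),
    Chain("2xyzA", "protein", True, "A" * 50),
    Chain("3pqrA", "protein", False, "A" * 39),
    Chain("4stuA", "protein", False, "X" * 60),
]
print([c.chain_id for c in chains if keep(c)])  # ['1abcA']
```

Applying the cheap checks (molecule type, model flag, length) before scanning the sequence keeps the filter fast on large chain lists; a membrane-protein exclusion like the one the abstract introduces would be one more boolean test of the same kind.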
