Similar articles
 Found 20 similar articles; search took 31 ms
1.
The Protein Data Bank: unifying the archive   Total citations: 9, self-citations: 3, citations by others: 6
The Protein Data Bank (PDB; http://www.pdb.org/) is the single worldwide archive of structural data of biological macromolecules. This paper describes the progress that has been made in validating all data in the PDB archive and in releasing a uniform archive for the community. We have now produced a collection of mmCIF data files for the PDB archive (ftp://beta.rcsb.org/pub/pdb/uniformity/data/mmCIF/). A utility application that converts the mmCIF data files to the PDB format (called CIFTr) has also been released to provide support for existing software.

2.
The Protein Data Bank   Total citations: 183, self-citations: 20, citations by others: 163
The Protein Data Bank (PDB; http://www.rcsb.org/pdb/) is the single worldwide archive of structural data of biological macromolecules. This paper describes the goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and near-term plans for the future development of the resource.

3.
Analyses of publicly available structural data reveal interesting insights into the impact of the three‐dimensional (3D) structures of protein targets important for discovery of new drugs (e.g., G‐protein‐coupled receptors, voltage‐gated ion channels, ligand‐gated ion channels, transporters, and E3 ubiquitin ligases). The Protein Data Bank (PDB) archive currently holds > 155,000 atomic‐level 3D structures of biomolecules experimentally determined using crystallography, nuclear magnetic resonance spectroscopy, and electron microscopy. The PDB was established in 1971 as the first open‐access, digital‐data resource in biology, and is now managed by the Worldwide PDB partnership (wwPDB; wwPDB.org). US PDB operations are the responsibility of the Research Collaboratory for Structural Bioinformatics PDB (RCSB PDB). The RCSB PDB serves millions of RCSB.org users worldwide by delivering PDB data integrated with ~40 external biodata resources, providing rich structural views of fundamental biology, biomedicine, and energy sciences. Recently published work showed that the PDB archival holdings facilitated discovery of ~90% of the 210 new drugs approved by the US Food and Drug Administration 2010–2016. We review user‐driven development of RCSB PDB services, examine growth of the PDB archive in terms of size and complexity, and present examples and opportunities for structure‐guided drug discovery for challenging targets (e.g., integral membrane proteins).

4.
The Protein Data Bank (PDB; http://www.rcsb.org/pdb/) is the single worldwide archive of structural data of biological macromolecules. This paper describes the data uniformity project that is underway to address the inconsistency in PDB data.

5.
We describe the role of the BioMagResBank (BMRB) within the Worldwide Protein Data Bank (wwPDB) and recent policies affecting the deposition of biomolecular NMR data. All PDB depositions of structures based on NMR data must now be accompanied by experimental restraints. A scheme has been devised that allows depositors to specify a representative structure and to define residues within that structure found experimentally to be largely unstructured. The BMRB now accepts coordinate sets representing three-dimensional structural models based on experimental NMR data of molecules of biological interest that fall outside the guidelines of the Protein Data Bank (i.e., the molecule is a peptide with 23 or fewer residues, a polynucleotide with 3 or fewer residues, a polysaccharide with 3 or fewer sugar residues, or a natural product), provided that the coordinates are accompanied by representation of the covalent structure of the molecule (atom connectivity), assigned NMR chemical shifts, and the structural restraints used in generating the model. The BMRB now contains an archive of NMR data for metabolites and other small molecules found in biological systems.

6.
Whitmore L, Janes RW, Wallace BA. Chirality 2006, 18(6): 426-429
The Protein Circular Dichroism Data Bank (PCDDB) is a new deposition data bank for validated circular dichroism spectra of biomacromolecules. Its aim is to be a resource for the structural biology and bioinformatics communities, providing open access and archiving facilities for circular dichroism and synchrotron radiation circular dichroism spectra. It is named in parallel with the Protein Data Bank (PDB), a long-existing valuable reference data bank for protein crystal and NMR structures. In this article, we discuss the design of the data bank structure and the deposition website located at http://pcddb.cryst.bbk.ac.uk. Our aim is to produce a flexible and comprehensive archive, which enables user-friendly spectral deposition and searching. In the case of a protein whose crystal structure and sequence are known, the PCDDB entry will be linked to the appropriate PDB and sequence data bank files, respectively. It is anticipated that the PCDDB will provide a readily accessible biophysical catalogue of information on folded proteins that may be of value in structural genomics programs, for quality control and archiving in industrial and academic labs, as a resource for programs developing spectroscopic structural analysis methods, and in bioinformatics studies.

7.
8.
The Protein Data Bank (PDB) is a widely used biological database of macromolecular structures with a long history. This history is treated as lessons learned and is used to highlight what are believed to be the best practices important to developers of biological databases today. While the focus is on data quality, data representation and the information technology to support these data, the non-data and technology issues cannot be ignored. The role of the human factor in the form of users, collaborators, scientific society and ad hoc committees is also included.

9.
The Protein Data Bank (PDB) is a freely accessible archive of the 3-D structural data of biological molecules. Structure-based studies offer a unique vantage point for inferring the properties of a protein molecule from structural data. This is too big a task to be done manually. Moreover, there is no single tool, software or server that comprehensively analyses all structure-based properties. The objective of the present work is to develop an offline computational toolkit, PDB@, containing in-built algorithms that help categorize the structural properties of a protein molecule. The user can view and edit the PDB file as needed. Some features of the present work are unique, and others are an improvement over existing tools. Also, the representation of protein properties in both graphical and textual formats helps in predicting all the necessary details of a protein molecule on a single platform.

10.
A symposium celebrating the 40th anniversary of the Protein Data Bank archive (PDB), organized by the Worldwide Protein Data Bank, was held at Cold Spring Harbor Laboratory (CSHL) October 28-30, 2011. PDB40's distinguished speakers highlighted four decades of innovation in structural biology, from the early era of structural determination to future directions for the field.

11.
Knowledge of the 3D structure of glycans is a prerequisite for a complete understanding of the biological processes glycoproteins are involved in. However, due to a lack of standardised nomenclature, carbohydrate compounds are difficult to locate within the Protein Data Bank (PDB). Using an algorithm that detects carbohydrate structures only requiring element types and atom coordinates, we were able to detect 1663 entries containing a total of 5647 carbohydrate chains. The majority of chains are found to be N-glycosidically bound. Noncovalently bound ligands are also frequent, while O-glycans form a minority. About 30% of all carbohydrate-containing PDB entries comprise one or several errors. The automatic assignment of carbohydrate structures in PDB entries will improve the cross-linking of glycobiology resources with genomic and proteomic data collections, which will be an important issue of the upcoming glycomics projects. By aiding in detection of erroneous annotations and structures, the algorithm might also help to increase database quality.

12.
Despite the huge impact of data resources in genomics and structural biology, until now there has been no central archive for biological data for all imaging modalities. The BioImage Archive is a new data resource at the European Bioinformatics Institute (EMBL-EBI) designed to fill this gap. In its initial development BioImage Archive accepts bioimaging data associated with publications, in any format, from any imaging modality from the molecular to the organism scale, excluding medical imaging. The BioImage Archive will ensure reproducibility of published studies that derive results from image data and reduce duplication of effort. Most importantly, the BioImage Archive will help scientists to generate new insights through reuse of existing data to answer new biological questions, and provision of training, testing and benchmarking data for development of tools for image analysis. The archive is available at https://www.ebi.ac.uk/bioimage-archive/.

13.
The Protein Data Bank (PDB) is the worldwide repository of 3D structures of proteins, nucleic acids and complex assemblies. The PDB’s large corpus of data (> 100,000 structures) and related citations provide a well-organized and extensive test set for developing and understanding data citation and access metrics. In this paper, we present a systematic investigation of how authors cite PDB as a data repository. We describe a novel metric based on information cascade constructed by exploring the citation network to measure influence between competing works and apply that to analyze different data citation practices to PDB. Based on this new metric, we found that the original publication of RCSB PDB in the year 2000 continues to attract most citations though many follow-up updates were published. None of these follow-up publications by members of the wwPDB organization can compete with the original publication in terms of citations and influence. Meanwhile, authors increasingly choose to use URLs of PDB in the text instead of citing PDB papers, leading to disruption of the growth of the literature citations. A comparison of data usage statistics and paper citations shows that PDB Web access is highly correlated with URL mentions in the text. The results reveal the trend of how authors cite a biomedical data repository and may provide useful insight into how to measure the impact of a data repository.

14.
The Protein Data Bank (PDB) is the global archive for structural information on macromolecules, and a popular resource for researchers, teachers, and students, amassing more than one million unique users each year. Crystallographic structure models in the PDB (more than 100,000 entries) are optimized against the crystal diffraction data and geometrical restraints. This process of crystallographic refinement typically ignored hydrogen bond (H‐bond) distances as a source of information. However, H‐bond restraints can improve structures at low resolution where diffraction data are limited. To improve low‐resolution structure refinement, we present methods for deriving H‐bond information either globally from well‐refined high‐resolution structures from the PDB‐REDO databank, or specifically from on‐the‐fly constructed sets of homologous high‐resolution structures. Refinement incorporating HOmology DErived Restraints (HODER) improves geometrical quality and the fit to the diffraction data for many low‐resolution structures. To make these improvements readily available to the general public, we applied our new algorithms to all crystallographic structures in the PDB: using massively parallel computing, we constructed a new instance of the PDB‐REDO databank (https://pdb-redo.eu). This resource is useful for researchers to gain insight on individual structures, on specific protein families (as we demonstrate with examples), and on general features of protein structure using data mining approaches on a uniformly treated dataset.

15.
Experimental constraints associated with NMR structures are available from the Protein Data Bank (PDB) in the form of 'Magnetic Resonance' (MR) files. These files contain multiple types of data concatenated without boundary markers and are difficult to use for further research. Reported here are the results of a project initiated to annotate, archive, and disseminate these data to the research community from a searchable resource in a uniform format. The MR files from a set of 1410 NMR structures were analyzed and their original constituent data blocks annotated as to data type using a semi-automated protocol. A new software program called Wattos was then used to parse and archive the data in a relational database. From the total number of MR file blocks annotated as constraints, it proved possible to parse 84% (3337/3975). The constraint lists that were parsed correspond to three data types (2511 distance, 788 dihedral angle, and 38 residual dipolar coupling lists) from the three most popular software packages used in NMR structure determination: XPLOR/CNS (2520 lists), DISCOVER (412 lists), and DYANA/DIANA (405 lists). These constraints were then mapped to a developmental version of the BioMagResBank (BMRB) data model. A total of 31 data types originating from 16 programs have been classified, with the NOE distance constraint being the most commonly observed. The results serve as a model for the development of standards for NMR constraint deposition in computer-readable form. The constraints are updated regularly and are available from the BMRB web site (http://www.bmrb.wisc.edu).
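The counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a reader's consistency check, not part of the cited work or of Wattos; the variable and function names are hypothetical, and every number is copied from the abstract above.

```python
# Reader's consistency check on the figures quoted in the abstract above.
# All names here are illustrative; the numbers come from the abstract text.
parsed_by_type = {
    "distance": 2511,
    "dihedral angle": 788,
    "residual dipolar coupling": 38,
}
parsed_by_program = {
    "XPLOR/CNS": 2520,
    "DISCOVER": 412,
    "DYANA/DIANA": 405,
}
parsed_blocks = 3337            # constraint blocks that could be parsed
total_constraint_blocks = 3975  # blocks annotated as constraints


def percent_parsed(parsed: int, total: int) -> int:
    """Return the parsed share as a whole-number percentage."""
    return round(100 * parsed / total)


type_total = sum(parsed_by_type.values())
program_total = sum(parsed_by_program.values())
percent = percent_parsed(parsed_blocks, total_constraint_blocks)

print(type_total, program_total, percent)
```

Both breakdowns (by data type and by software package) sum to the same parsed total, and that total is consistent with the quoted 84%.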

16.
Oldfield TJ. Proteins 2002, 49(4): 510-528
The protein databank contains a vast wealth of structural and functional information. The analysis of this macromolecular information has been the subject of considerable work in order to advance knowledge beyond the collection of molecular coordinates. This article presents a method that determines local structural information within proteins using mathematical data mining techniques. The mine program described returns many known configurations of residues such as the catalytic triad, metal binding sites and the N-linked glycosylation site; as well as many other multiple residue interactions not previously categorized. Because mathematical constructs are used as targets, this method can identify new information not previously known, and also provide unbiased results of typical structure and their expected deviations. Because the results are defined mathematically, they cannot indicate the biological implications of the results. Therefore two support programs are described that provide insight into the biological context for the mine results. The first allows a weighted RMSD search between a template set of coordinates and a list of PDB files, and the second allows the labeling of a protein with the template results from mining to aid in the classification of this protein.

17.
MOTIVATION: Biological databases, with their rapidly expanding contents, are indispensable tools in the quest to understand more about biological function. However, a serious user of a database that comprises a large collection of data, collected over a long period, will likely be struck by the inconsistency in reporting individual items of data. This paper takes a critical look at the Protein Data Bank (PDB) to explore the seriousness of the problem in one particular data set and to explore the implications to those actively engaged in comparative analysis of these data. RESULTS: Averaged over the complete corpus, the stereochemical quality of atomic models has, in the past few years, moved towards ideal values. At the same time, there are inconsistencies in how data are reported. Water content is not reported consistently and the percent of data collected when reporting the high-resolution shell varies, detracting from the value of resolution as a yardstick for assessing the quality of a structure. A more detailed analysis of these inconsistencies is hampered by the lack of machine-readable experimental data. To the user of macromolecular structure data, this suggests that structural details beyond the standard quality measures of resolution and R value should be considered when using coordinate sets for further derivation or in inferring biological function. To the curators of the PDB, this suggests the need to capture more of the experimental data associated with the experiment in a way that permits straightforward parsing.

18.
With the accumulation of a large number and variety of molecules in the Protein Data Bank (PDB) comes the need on occasion to review and improve their representation. The Worldwide PDB (wwPDB) partners have periodically updated various aspects of structural data representation to improve the integrity and consistency of the archive. The remediation effort described here was focused on improving the representation of peptide‐like inhibitor and antibiotic molecules so that they can be easily identified and analyzed. Peptide‐like inhibitors or antibiotics were identified in over 1000 PDB entries, systematically reviewed and represented either as peptides with polymer sequence or as single components. For the majority of the single‐component molecules, their peptide‐like composition was captured in a new representation, called the subcomponent sequence. A novel concept called “group” was developed for representing complex peptide‐like antibiotics and inhibitors that are composed of multiple polymer and nonpolymer components. In addition, a reference dictionary was developed with detailed information about these peptide‐like molecules to aid in their annotation, identification and analysis. Based on the experience gained in this remediation, guidelines, procedures, and tools were developed to annotate new depositions containing peptide‐like inhibitors and antibiotics accurately and consistently. © 2013 Wiley Periodicals, Inc. Biopolymers 101: 659–668, 2014.

19.
PiQSi: protein quaternary structure investigation   Total citations: 1, self-citations: 0, citations by others: 1

20.
Structural data as collated in the Protein Data Bank (PDB) have been widely applied in the study and prediction of protein-protein interactions. However, since the basic PDB Entries contain only the contents of the asymmetric unit rather than the biological unit, some key interactions may be missed by analysing only the PDB Entry. A total of 69,054 SCOP (Structural Classification of Proteins) domains were examined systematically to identify the number of additional novel interacting domain pairs and interfaces found by considering the biological unit as stored in the PQS (Protein Quaternary Structure) database. The PQS data adds 25,965 interacting domain pairs to those seen in the PDB Entries to give a total of 61,783 redundant interacting domain pairs. Redundancy filtering at the level of the SCOP family shows PQS to increase the number of novel interacting domain-family pairs by 302 (13.3%) from 2277, but only 16/302 (1.4%) of the interacting domain pairs have the two domains in different SCOP families. This suggests the biological units add little to the elucidation of novel biological interaction networks. However, when the orientation of the domain pairs is considered, the PQS data increases the number of novel domain-domain interfaces observed by 1455 (34.5%) to give 5677 non-redundant domain-domain interfaces. In all, 162/1455 novel domain-domain interfaces are between domains from different families, an increase of 8.9% over the PDB Entries. Overall, the PQS biological units provide a rich source of novel domain-domain interfaces that are not seen in the studied PDB Entries, and so PQS domain-domain interaction data should be exploited wherever possible in the analysis and prediction of protein-protein interactions.
