首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
The protein databank (PDB) contains high quality structural data for computational structural biology investigations. We have earlier described a fast tool (the decomp_pdb tool) for identifying and marking missing atoms and residues in PDB files. The tool also automatically decomposes PDB entries into separate files describing ligands and polypeptide chains. Here, we describe a web interface named DECOMP for the tool. Our program correctly identifies multi­monomer ligands, and the server also offers the preprocessed ligand­protein decomposition of the complete PDB for downloading (up to size: 5GB)

Availability

http://decomp.pitgroup.org  相似文献   

2.
Mapping PDB chains to UniProtKB entries   总被引:2,自引:0,他引:2  
MOTIVATION: UniProtKB/SwissProt is the main resource for detailed annotations of protein sequences. This database provides a jumping-off point to many other resources through the links it provides. Among others, these include other primary databases, secondary databases, the Gene Ontology and OMIM. While a large number of links are provided to Protein Data Bank (PDB) files, obtaining a regularly updated mapping between UniProtKB entries and PDB entries at the chain or residue level is not straightforward. In particular, there is no regularly updated resource which allows a UniProtKB/SwissProt entry to be identified for a given residue of a PDB file. RESULTS: We have created a completely automatically maintained database which maps PDB residues to residues in UniProtKB/SwissProt and UniProtKB/trEMBL entries. The protocol uses links from PDB to UniProtKB, from UniProtKB to PDB and a brute-force sequence scan to resolve PDB chains for which no annotated link is available. Finally the sequences from PDB and UniProtKB are aligned to obtain a residue-level mapping. AVAILABILITY: The resource may be queried interactively or downloaded from http://www.bioinf.org.uk/pdbsws/.  相似文献   

3.
Intrinsic disorder in the Protein Data Bank   总被引:2,自引:0,他引:2  
The Protein Data Bank (PDB) is the preeminent source of protein structural information. PDB contains over 32,500 experimentally determined 3-D structures solved using X-ray crystallography or nuclear magnetic resonance spectroscopy. Intrinsically disordered regions fail to form a fixed 3-D structure under physiological conditions. In this study, we compare the amino-acid sequences of proteins whose structures are determined by X-ray crystallography with the corresponding sequences from the Swiss-Prot database. The analyzed dataset includes 16,370 structures, which represent 18,101 PDB chains and 5,434 different proteins from 910 different organisms (2,793 eukaryotic, 2,109 bacterial, 288 viral, and 244 archaeal). In this dataset, on average, each Swiss-Prot protein is represented by 7 PDB chains with 76% of the crystallized regions being represented by more than one structure. Intriguingly, the complete sequences of only approximately 7% of proteins are observed in the corresponding PDB structures, and only approximately 25% of the total dataset have >95% of their lengths observed in the corresponding PDB structures. This suggests that the vast majority of PDB proteins is shorter than their corresponding Swiss-Prot sequences and/or contain numerous residues, which are not observed in maps of electron density. To determine the prevalence of disordered regions in PDB, the residues in the Swiss-Prot sequences were grouped into four general categories, "Observed" (which correspond to structured regions), "Not observed" (regions with missing electron density, potentially disordered), "Uncharacterized," and "Ambiguous," depending on their appearance in the corresponding PDB entries. This non-redundant set of residues can be viewed as a 'fragment' or empirical domain database that contains a set of experimentally determined structured regions or domains and a set of experimentally verified disordered regions or domains. We studied the propensities and properties of residues in these four categories and analyzed their relations to the predictions of disorder using several algorithms. "Non-observed," "Ambiguous," and "Uncharacterized" regions were shown to possess the amino acid compositional biases typical of intrinsically disordered proteins. The application of four different disorder predictors (PONDR(R) VL-XT, VL3-BA, VSL1P, and IUPred) revealed that the vast majority of residues in the "Observed" dataset are ordered, and that the "Not observed" regions are mostly disordered. The "Uncharacterized" regions possess some tendency toward order, whereas the predictions for the short "Ambiguous" regions are really ambiguous. Long "Ambiguous" regions (>70 amino acid residues) are mostly predicted to be ordered, suggesting that they are likely to be "wobbly" domains. Overall, we showed that completely ordered proteins are not highly abundant in PDB and many PDB sequences have disordered regions. In fact, in the analyzed dataset approximately 10% of the PDB proteins contain regions of consecutive missing or ambiguous residues longer than 30 amino-acids and approximately 40% of the proteins possess short regions (> or =10 and < 30 amino-acid long) of missing and ambiguous residues.  相似文献   

4.
PDB-REPRDB is a database of representative protein chains from the Protein Data Bank (PDB). Started at the Real World Computing Partnership (RWCP) in August 1997, it developed to the present system of PDB-REPRDB. In April 2001, the system was moved to the Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST) (http://www.cbrc.jp/); it is available at http://www.cbrc.jp/pdbreprdb/. The current database includes 33 368 protein chains from 16 682 PDB entries (1 September, 2002), from which are excluded (a) DNA and RNA data, (b) theoretically modeled data, (c) short chains (1<40 residues), or (d) data with non-standard amino acid residues at all residues. The number of entries including membrane protein structures in the PDB has increased rapidly with determination of numbers of membrane protein structures because of improved X-ray crystallography, NMR, and electron microscopic experimental techniques. Since many protein structure studies must address globular and membrane proteins separately, this new elimination factor, which excludes membrane protein chains, is introduced in the PDB-REPRDB system. Moreover, the PDB-REPRDB system for membrane protein chains begins at the same URL. The current membrane database includes 551 protein chains, including membrane domains in the SCOP database of release 1.59 (15 May, 2002).  相似文献   

5.
PDB-REPRDB is a database of representative protein chains from the Protein Data Bank (PDB). The previous version of PDB-REPRDB provided 48 representative sets, whose similarity criteria were predetermined, on the WWW. The current version is designed so that the user may obtain a quick selection of representative chains from PDB. The selection of representative chains can be dynamically configured according to the user's requirement. The WWW interface provides a large degree of freedom in setting parameters, such as cut-off scores of sequence and structural similarity. One can obtain a representative list and classification data of protein chains from the system. The current database includes 20 457 protein chains from PDB entries (August 6, 2000). The system for PDB-REPRDB is available at the Parallel Protein Information Analysis system (PAPIA) WWW server (http://www.rwcp.or.jp/papia/).  相似文献   

6.
We have updated the Protein Sequence-Structure Analysis Relational Database (PSSARD) first published in the Int. J. Biol. Macromol. 36 (2005) 259-262 corresponding to 1573 representative protein chains selected from the Protein Data Bank (PDB). In this, the updated and revised PSSARD (Version 2.0), we have included all proteins in the Protein Data Bank available at the time of developing this database including the NMR PDB entries. The current database corresponds to 22,752 XRAY PDB entries and 3977 NMR PDB entries and is separated accordingly in order to facilitate the appropriate database search. The representative protein chains can also be separately accessed within the current database. We have made a provision to combine more than one field to query the database and the results of any search can be used to carry out further nested searches using a combination of queries. We have provided hyperlinks to the individual PDB entries obtained as the result of any search in PSSARD in order to obtain additional details relevant to the protein structure. Certain applications useful to identify domains and structural motifs are discussed.  相似文献   

7.
Knowledge of the 3D structure of glycans is a prerequisite for a complete understanding of the biological processes glycoproteins are involved in. However, due to a lack of standardised nomenclature, carbohydrate compounds are difficult to locate within the Protein Data Bank (PDB). Using an algorithm that detects carbohydrate structures only requiring element types and atom coordinates, we were able to detect 1663 entries containing a total of 5647 carbohydrate chains. The majority of chains are found to be N-glycosidically bound. Noncovalently bound ligands are also frequent, while O-glycans form a minority. About 30% of all carbohydrate containing PDB entries comprise one or several errors. The automatic assignment of carbohydrate structures in PDB entries will improve the cross-linking of glycobiology resources with genomic and proteomic data collections, which will be an important issue of the upcoming glycomics projects. By aiding in detection of erroneous annotations and structures, the algorithm might also help to increase database quality.  相似文献   

8.
Dengler U  Siddiqui AS  Barton GJ 《Proteins》2001,42(3):332-344
The 3Dee database of domain definitions was developed as a comprehensive collection of domain definitions for all three-dimensional structures in the Protein Data Bank (PDB). The database includes definitions for complex, multiple-segment and multiple-chain domains as well as simple sequential domains, organized in a structural hierarchy. Two different snapshots of the 3Dee database were analyzed at September 1996 and November 1999. For the November 1999 release, 7,995 PDB entries contained 13,767 protein chains and gave rise to 18,896 domains. The domain sequences clustered into 1,715 domain sequence families, which were further clustered into a conservative 1,199 domain structure families (families with similar folds). The proportion of different domain structure families per domain sequence family increases from 84% for domains 1-100 residues long to 100% for domains greater than 600 residues. This is in keeping with the idea that longer chains will have more alternative folds available to them. Of the representative domains from the domain sequence families, 49% are in the range of 51-150 residues, whereas 64% of the representative chains over 200 residues have more than 1 domain. Of the representative chains, 8.5% are part of multichain domains. The largest multichain domain in the database has 14 chains and 1,400 residues, whereas the largest single-chain domain has 907 residues. The largest number of domains found in a protein is 13. The analysis shows that over the history of the PDB, new domain folds have been discovered at a slower rate than by random selection of all known folds. Between 1992 and 1997, a constant 1 in 11 new domains deposited in the PDB has shown no sequence similarity to a previously known domain sequence family, and only 1 in 15 new domain structures has had a fold that has not been seen previously. A comparison of the September 1996 release of 3Dee to the Structural Classification of Proteins (SCOP) showed that the domain definitions agreed for 80% of the representative protein chains. However, 3Dee provided explicit domain boundaries for more proteins. 3Dee is accessible on the World Wide Web at http://barton.ebi.ac.uk/servers/3Dee.html.  相似文献   

9.
Ligand binding may involve a wide range of structural changes in the receptor protein, from hinge movement of entire domains to small side-chain rearrangements in the binding pocket residues. The analysis of side chain flexibility gives insights valuable to improve docking algorithms and can provide an index of amino-acid side-chain flexibility potentially useful in molecular biology and protein engineering studies. In this study we analyzed side-chain rearrangements upon ligand binding. We constructed two non-redundant databases (980 and 353 entries) of "paired" protein structures in complexed (holo-protein) and uncomplexed (apo-protein) forms from the PDB macromolecular structural database. The number and identity of binding pocket residues that undergo side-chain conformational changes were determined. We show that, in general, only a small number of residues in the pocket undergo such changes (e.g., approximately 85% of cases show changes in three residues or less). The flexibility scale has the following order: Lys > Arg, Gln, Met > Glu, Ile, Leu > Asn, Thr, Val, Tyr, Ser, His, Asp > Cys, Trp, Phe; thus, Lys side chains in binding pockets flex 25 times more often then do the Phe side chains. Normalizing for the number of flexible dihedral bonds in each amino acid attenuates the scale somewhat, however, the clear trend of large, polar amino acids being more flexible in the pocket than aromatic ones remains. We found no correlation between backbone movement of a residue upon ligand binding and the flexibility of its side chain. These results are relevant to 1. Reduction of search space in docking algorithms by inclusion of side-chain flexibility for a limited number of binding pocket residues; and 2. Utilization of the amino acid flexibility scale in protein engineering studies to alter the flexibility of binding pockets.  相似文献   

10.
11.
Abstract

The Protein Data Bank (PDB) is the preeminent source of protein structural information. PDB contains over 32,500 experimentally determined 3-D structures solved using X-ray crystallography or nuclear magnetic resonance spectroscopy. Intrinsically disordered regions fail to form a fixed 3-D structure under physiological conditions. In this study, we compare the amino-acid sequences of proteins whose structures are determined by X-ray crystallography with the corresponding sequences from the Swiss-Prot database. The analyzed dataset includes 16,370 structures, which represent 18,101 PDB chains and 5,434 different proteins from 910 different organisms (2,793 eukaryotic, 2,109 bacterial, 288 viral, and 244 archaeal). In this dataset, on average, each Swiss-Prot protein is represented by 7 PDB chains with 76% of the crystallized regions being represented by more than one structure. Intriguingly, the complete sequences of only ~7% of proteins are observed in the corresponding PDB structures, and only ~25% of the total dataset have >95% of their lengths observed in the corresponding PDB structures. This suggests that the vast majority of PDB proteins is shorter than their corresponding Swiss-Prot sequences and/or contain numerous residues, which are not observed in maps of electron density. To determine the prevalence of disordered regions in PDB, the residues in the Swiss-Prot sequences were grouped into four general categories, “Observed” (which correspond to structured regions), “Not observed” (regions with missing electron density, potentially disordered), “Uncharacterized,” and “Ambiguous,” depending on their appearance in the corresponding PDB entries. This non-redundant set of residues can be viewed as a ‘fragment’ or empirical domain database that contains a set of experimentally determined structured regions or domains and a set of experimentally verified disordered regions or domains. We studied the propensities and properties of residues in these four categories and analyzed their relations to the predictions of disorder using several algorithms. “Non-observed,” “Ambiguous,” and “Uncharacterized” regions were shown to possess the amino acid compositional biases typical of intrinsically disordered proteins. The application of four different disorder predictors (PONDR® VL-XT, VL3-BA, VSL1P, and IUPred) revealed that the vast majority of residues in the “Observed” dataset are ordered, and that the “Not observed” regions are mostly disordered. The “Uncharacterized” regions possess some tendency toward order, whereas the predictions for the short “Ambiguous” regions are really ambiguous. Long “Ambiguous” regions (>70 amino acid residues) are mostly predicted to be ordered, suggesting that they are likely to be “wobbly” domains.

Overall, we showed that completely ordered proteins are not highly abundant in PDB and many PDB sequences have disordered regions. In fact, in the analyzed dataset ~10% of the PDB proteins contain regions of consecutive missing or ambiguous residues longer than 30 amino-acids and ~40% of the proteins possess short regions (≥10 and <30 amino-acid long) of missing and ambiguous residues.  相似文献   

12.
A pectic polysaccharide named silenan, [alpha]D20 +148.6 degrees (c 0.1; H2O), was isolated earlier from the aerial part of campion, Silene vulgaris (Moench) Garcke. Silenan has been shown to contain homogalacturonan segments as "smooth regions" and rhamnogalacturonan fragments as "hairy regions". The present study reveals a generalization of structural features of silenan. Silenan was subjected to enzymic digestion with pectinase, to Smith degradation, and to lithium-degradation to determine the conforming poly- and oligosaccharide fragments of "hairy regions" of silenan. The NMR-spectral data and mass-spectrometry confirmed that the core of the ramified region of silenan consisted of residues of alpha-rhamnopyranose 2-O-glycosylated with the residues of alpha-1,4-D-galactopyranosyl uronic acid. The part of the alpha-rhamnopyranose residues of the backbone are branched at O-4. On the basis of the data, the hairy regions of silenan proved to contain mainly linear chains of beta-1,3-, beta-1,4-, and beta-1,6-galactopyranan and alpha-1,5-arabinofuranan. The side chains of the ramified region were shown to have branching points represented 2,3-, 3,6-, 4,6-di-O-substituted beta-galactopyranose residues.  相似文献   

13.
We predicted gamma-turns from amino acid sequences using the first-order Markov chain theory and enlarged representative data sets corresponding to protein chains selected from the Protein Data Bank (PDB). The following data sets were used for training and deriving the probability values: (1) an initial data set containing 315 protein chains comprising 904 gamma-turns and (2) a later data set in order to include new entries in the PDB, containing 434 protein chains and comprising 1053 gamma-turns. By excluding 93 protein chains that were common to these two training data sets, we generated two mutually exclusive data sets containing 222 and 341 protein chains for testing our predictions. Applying amino acid probability values derived from training data sets on to testing data sets yielded overall prediction accuracies in the range 54-57%. We recommend the use of probability values derived from the data set comprising 315 protein chains that represents more gamma-turns and also provides better predictions.  相似文献   

14.
Intrinsically disordered regions serve as molecular recognition elements, which play an important role in the control of many cellular processes and signaling pathways. It is useful to be able to predict positions of disordered regions in protein chains. The statistical analysis of disordered residues was done considering 34,464 unique protein chains taken from the PDB database. In this database, 4.95% of residues are disordered (i.e. invisible in X-ray structures). The statistics were obtained separately for the N- and C-termini as well as for the central part of the protein chain. It has been shown that frequencies of occurrence of disordered residues of 20 types at the termini of protein chains differ from the ones in the middle part of the protein chain. Our systematic analysis of disordered regions in PDB revealed 109 disordered patterns of different lengths. Each of them has disordered occurrences in at least five protein chains with identity less than 20%. The vast majority of all occurrences of each disordered pattern are disordered. This allows one to use the library of disordered patterns for predicting the status of a residue of a given protein to be ordered or disordered. We analyzed the occurrence of the selected patterns in three eukaryotic and three bacterial proteomes.  相似文献   

15.
16.
Glucosylation of galactosylhydroxylysyl residues in various collagen polypeptide chains and in small peptides prepared from collagen was studied in vitro using collagen glucosyltransferase purified about 200 to 500-fold from extract prepared from chick embryos. When various denatured polypeptide or peptide chains were compared as substrates for the enzyme, no significant differences were found between citrate-soluble collagens from normal or lathyritic rats and isolated alpha1 and alpha2 chains. In contrast, gelatinized insoluble calf skin collagen, and peptides prepared from collagen and having an average molecular weight of about 500 were clearly less effective substrates as judged from their Km and V values. A marked difference was found between native and heat-denatured citrate-soluble collagen in that no synthesis of glucosylgalactosylhydroxylysine was observed with the native collagen when the reaction was studied at 30 degrees C with different times, enzyme concentrations, and substrate concentrations. When the reaction was studied as a function of temperature, little glucosylation of native collagen was observed below 37 degrees C, but there was a sharp transition in the rate of glucosylation of native collagen at temperatures above 37 degrees C, similar to that observable in the melting curve of collagen. The data suggest that triple-helical conformation of collagen prevents that glucosylation of galactosylhydroxylysyl residues.  相似文献   

17.
Lu Z  Dunaway-Mariano D  Allen KN 《Biochemistry》2005,44(24):8684-8696
The BT4131 gene from the bacterium Bacteroides thetaiotaomicron VPI-5482 has been cloned and overexpressed in Escherichia coli. The protein, a member of the haloalkanoate dehalogenase superfamily (subfamily IIB), was purified to homogeneity, and its X-ray crystal structure was determined to1.9 A resolution using the molecular replacement phasing method. BT4131 was shown by an extensive substrate screen to be a broad-range sugar phosphate phosphatase. On the basis of substrate specificity and gene context, the physiological function of BT4131 in chitin metabolism has been tentatively assigned. Comparison of the BT4131 structure alpha/beta cap domain structure with those of other type IIB enzymes (phosphoglycolate phosphatase, trehalose-6-phosphate phosphatase, and proteins of unknown function known as PDB entries , , and ) identified two conserved loops (BT4131 residues 172-182 and 118-130) in the alphabetabeta(alphabetaalphabeta)alphabetabeta type caps and one conserved loop in the alphabetabetaalphabetabeta type caps, which contribute residues for contact with the substrate leaving group. In BT4131, the two loops contribute one polar and two nonpolar residues to encase the displaced sugar. This finding is consistent with the lax specificity BT4131 has for the ring size and stereochemistry of the sugar phosphate. In contrast, substrate docking showed that the high-specificity phosphoglycolate phosphatase (PDB entry ) uses a single substrate specificity loop to position three polar residues for interaction with the glycolate leaving group. We show how active site "solvent cages" derived from analysis of the structures of the type IIB HAD phosphatases could be used in conjunction with the identity of the residues stationed along the cap domain substrate specificity loops, as a means of substrate identification.  相似文献   

18.
Intramolecular coupling of active sites in the pyruvate dehydrogenase multienzyme complexes of Escherichia coli, ox heart and Bacillus stearothermophilus was measured at various temperatures. As the temperature was raised, the extent of active-site coupling was found to increase, approaching a maximum near the physiological growth temperature of the organism. Under these conditions, a single pyruvate dehydrogenase (lipoamide) dimer appeared able to cause a rapid (20s) reductive acetylation of probably all 24 polypeptide chains in the dihydrolipoamide acetyltransferase core of the enzyme complex from E. coli at 37 degrees C, and of most if not all of the 60 polypeptide chains in the dihydrolipoamide acetyltransferase cores of the enzymes from ox heart and B. stearothermophilus at 37 degrees C and 60 degrees C respectively. Experiments designed to measure the inter-core and intra-core migration of enzyme subunits suggested that, in the bacterial enzymes at least, this was not a major contributor to active-site coupling.  相似文献   

19.
Pectin with [alpha]D(20) +192 degrees (c 0.1; water), named comaruman, was isolated from marsh cinquefoil Comarum palustre L., which is widespread in the European North. The sugar chain of comaruman contains residues of D-galacturonic acid (64%), D-galactose (13%), L-rhamnose (12%), L-arabinose (6%), and trace amounts of xylose and glucose. Partial acid hydrolysis and digestion with pectinase demonstrated that comaruman composed of the backbone comprised regions of linear alpha-1,4-D-galactopyranosyl uronan interconnected by numerous residues of alpha-1,2-L-rhamnopyranose. In addition to the backbone (core of the macromolecule), ramified regions are involved in comaruman and comprise alpha-2,4-L-rhamno-alpha-4-D-galacturonan with side chains consisting mainly of beta-1,4-linked residues of D-galactopyranose. The ramified region contains additionally residues of 5-O-substituted arabinofuranose and 3- and 6-O-substituted galactopyranose. The present 3,4- and 4,6-di-O-substituted residues of galactopyranose appear to be branching points of the side chains. Some galactopyranose residues were found to occupy the terminal positions of the side chains or appeared to be single sugar residues attached to the side chains. Methylation analysis data indicated that comaruman contains residues of terminal, 3- and 3,4-di-O-substituted galactopyranosyl uronic acid, which appeared to be constituents of the side chains, and the latter represented additionally branching points of the backbone.  相似文献   

20.
MOTIVATION: Data on both single nucleotide polymorphisms and disease-related mutations are being collected at ever-increasing rates. To understand the structural effects of missense mutations, we consider both classes under the term single amino acid polymorphisms (SAAPs) and we wish to map these to protein structure where their effects can be analyzed. Our initial aim therefore is to create a completely automatically maintained database of SAAPs mapped to individual residues in the Protein Data Bank (PDB) updated as new mutations or structures become available. RESULTS: We present an integrated pipeline for the automated mapping of SAAP data from HGVbase to individual PDB residues. Achieving this in a completely automated and reliable manner is a complex task. Data extracted from HGVbase are mapped to EMBL entries to confirm whether the mutation occurs in an exon and, if so, where in the sequence it occurs. From there we map to Swiss-Prot entries and thence to the PDB. AVAILABILITY: The resulting database may be accessed over the web at http://www.bioinf.org.uk/saap/ or http://acrmwww.biochem.ucl.ac.uk/saap/ CONTACT: a.martin@biochem.ucl.ac.uk.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号