Enlarged representative set of protein structures.   Total citations: 17 (self-citations: 13; citations by others: 17)
To reduce redundancy in the Protein Data Bank of 3D protein structures, which is caused by the many homologous proteins it contains, we have selected a representative set of structures. The selection algorithm was designed to (1) select as many nonhomologous structures as possible and (2) select structures of good quality. The representative set may reduce time and effort in statistical analyses.
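
The two design goals in this abstract (many nonhomologous structures, good quality) can be illustrated with a simple greedy pass. This is only a sketch under stated assumptions, not the published algorithm: the chain identifiers, the use of resolution as the quality score, and the family-label homology test are all hypothetical stand-ins, and a greedy pass does not guarantee the largest possible nonhomologous set.

```python
def select_representatives(chains, quality, is_homologous):
    """Greedy pick: visit chains from best to worst quality score and keep
    a chain only if it is not homologous to any chain already kept."""
    selected = []
    # Lower score = better quality here (e.g. resolution in angstroms).
    for chain in sorted(chains, key=quality):
        if not any(is_homologous(chain, kept) for kept in selected):
            selected.append(chain)
    return selected


# Hypothetical toy data: resolution in angstroms and a family label per chain.
resolution = {"1abc_A": 1.8, "2xyz_A": 2.5, "3def_B": 1.5, "4ghi_A": 2.0}
family = {"1abc_A": "globin", "2xyz_A": "globin",
          "3def_B": "kinase", "4ghi_A": "kinase"}

representatives = select_representatives(
    chains=list(resolution),
    quality=lambda c: resolution[c],
    # Assumption: two chains count as homologous if they share a family label.
    is_homologous=lambda a, b: family[a] == family[b],
)
print(representatives)
```

In practice the homology test would be a sequence-identity or structure-similarity cutoff rather than a label comparison; the greedy ordering is what favors the better-quality member of each homologous group.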

Huntley MA  Golding GB 《Proteins》2002,48(1):134-140
Simple sequence is abundant in the proteins that have been sequenced to date, but unusual protein features such as simple sequence are not present at the same high frequency in structural databases. A subset of these simple sequences, a group with a highly repetitive nature, has been shown to be abundant in eukaryotes but not in prokaryotes. In this study, an examination of the eukaryotic proteins in the Protein Data Bank (PDB) has revealed a large deficiency of low-complexity, highly repetitive protein repeats. Using simulated databases built from similar samples of eukaryotic proteins taken from the National Center for Biotechnology Information (NCBI) database, it is shown that the PDB contains significantly less highly repetitive simple sequence than artificial databases of similar composition randomly derived from NCBI. When the structural data for those few PDB sequences that did contain highly repetitive simple sequence are examined in detail, it is found that in most cases the tertiary structure is unknown for the regions consisting of simple sequence. This lack of simple sequence, both in the PDB and in the structural information, suggests that this type of simple sequence may produce disordered structures that make structural characterization difficult.

We recently proposed an optimization method to derive energy parameters for simplified models of protein folding. The method is based on the maximization of the thermodynamic average of the overlap between protein native structures and a Boltzmann ensemble of alternative structures. Such a condition enforces protein models whose ground states are most similar to the corresponding native states. We present here an extensive testing of the method for a simple residue-residue contact energy function and for alternative structures generated by threading. The optimized energy function guarantees high stability and a well-correlated energy landscape to most representative structures in the PDB database. Failures in the recognition of the native structure can be attributed to the neglect of interactions between different chains in oligomeric proteins or with cofactors. When these are taken into account, only very few X-ray structures are not recognized. Most of them are short inhibitors or fragments, and one is a structure that presents serious inconsistencies. Finally, we discuss the reasons that make NMR structures more difficult to recognize. Copyright 2001 Wiley-Liss, Inc.

A thermophilic Thermoactinomyces sp. E79 producing a highly thermostable alkaline protease was isolated from soil. The protease, produced extracellularly by Thermoactinomyces sp. E79, was purified by DEAE-Sepharose CL-6B and Butyl-Toyopearl 650M column chromatography. The relative molecular mass was estimated to be 31,000 by SDS–polyacrylamide gel electrophoresis. Enzyme activity was inhibited by phenylmethylsulfonyl fluoride, suggesting the enzyme to be a serine protease. The optimum temperature for the enzyme activity was 85°C, and about 50% of the original activity remained after incubation at 90°C for 10 min in the presence of Ca2+. The optimum pH for the enzyme activity was 11.0 and the enzyme was fairly stable from pH 5.0 to 12.0. The gene for this thermostable alkaline protease was cloned in Escherichia coli and the expressed intracellular enzyme was activated by heat treatment. Sequence analysis showed an open reading frame of 1,152 base pairs, coding for a polypeptide of 384 amino acids. The polypeptide was composed of a signal sequence (25 amino acids), a prosequence (81 amino acids), and a mature protein of 278 amino acids. The deduced amino acid sequence of the mature protease had high similarity with thermitase, a serine protease from Thermoactinomyces vulgaris, and the extent of sequence identity was 76%.

Protein Data Bank Japan (PDBj), a founding member of the worldwide Protein Data Bank (wwPDB), has accepted, processed and distributed experimentally determined biological macromolecular structures for 20 years. During that time, we have continuously made major improvements to our query search interface of PDBj Mine 2, the BMRBj web interface, and EM Navigator for PDB/BMRB/EMDB entries. PDBj also serves PDB‐related secondary database data, original web‐based modeling services such as Homology modeling of complex structure (HOMCOS), visualization services and utility tools, which we have continuously enhanced and expanded throughout the years. In addition, we have recently developed several unique archives, BSM‐Arc for computational structure models, and XRDa for raw X‐ray diffraction images, both of which promote open science in the structural biology community. During the COVID‐19 pandemic, PDBj has also started to provide feature pages for COVID‐19 related entries across all available archives at PDBj from raw experimental data and PDB structural data to computationally predicted models, while also providing COVID‐19 outreach content for high school students and teachers.

The beta hairpin motif is a ubiquitous protein structural motif that can be found in molecules across the tree of life. This motif, which is also popular in synthetically designed proteins and peptides, is known for its stability and adaptability to broad functions. Here, we systematically probe all 49,000 unique beta hairpin substructures contained within the Protein Data Bank (PDB) to uncover key characteristics correlated with stable beta hairpin structure, including amino acid biases and enriched interstrand contacts. We find that position-specific amino acid preferences, while seen throughout the beta hairpin structure, are most evident within the turn region, where they depend on subtle turn dynamics associated with turn length and secondary structure. We also establish a set of broad design principles, such as the inclusion of aspartic acid residues at a specific position and the careful consideration of desired secondary structure when selecting residues for the turn region, that can be applied to the generation of libraries encoding proteins or peptides containing beta hairpin structures.

Structure comparisons of all representative proteins have been done. Employing the relative root mean square deviation (RMSD) from native enables the assessment of the statistical significance of structure alignments of different lengths in terms of a Z-score. Two conclusions emerge: first, proteins with their native fold can be distinguished by their Z-score. Second and somewhat surprising, all small proteins up to 100 residues in length have significant structure alignments to other proteins in a different secondary structure and fold class; i.e. 24.0% of them have 60% coverage by a template protein with an RMSD below 3.5 Å and 6.0% have 70% coverage. If the restriction that we align proteins only having different secondary structure types is removed, then in a representative benchmark set of proteins of 200 residues or smaller, 93% can be aligned to a single template structure (with average sequence identity of 9.8%), with an RMSD less than 4 Å, and 79% average coverage. In this sense, the current Protein Data Bank (PDB) is almost a covering set of small protein structures. The length of the aligned region (relative to the whole protein length) does not differ among the top hit proteins, indicating that protein structure space is highly dense. For larger proteins, non-related proteins can cover a significant portion of the structure. Moreover, these top hit proteins are aligned to different parts of the target protein, so that almost the entire molecule can be covered when combined. The number of proteins required to cover a target protein is very small, e.g. the top ten hit proteins can give 90% coverage below an RMSD of 3.5 Å for proteins up to 320 residues long. These results give a new view of the nature of protein structure space, and its implications for protein structure prediction are discussed.
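
As a rough illustration of the two quantities this abstract relies on, the sketch below computes a plain RMSD between two already-superposed coordinate sets and a generic Z-score against a background distribution. It is not the paper's "relative RMSD" statistic, whose exact definition is not given here; the function names and the toy numbers are assumptions for illustration only.

```python
import math
import statistics


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length lists of
    (x, y, z) points that are assumed to be superposed already."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def z_score(value, background):
    """Standard score of `value` relative to a background sample, using the
    population standard deviation. A negative score for an RMSD means the
    alignment is better (lower RMSD) than the background average."""
    mean = statistics.mean(background)
    spread = statistics.pstdev(background)
    if spread == 0:
        raise ValueError("background has zero variance")
    return (value - mean) / spread


# Toy example: two points, each shifted by 1 angstrom along z.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
example_rmsd = rmsd(native, model)

# Hypothetical background of RMSD values from unrelated alignments.
example_z = z_score(2.0, [4.0, 6.0, 8.0])
```

Normalizing against a background that matches the alignment length is what lets alignments of different lengths be compared on one scale, which is the role the Z-score plays in the abstract.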

SYNOPSIS. Cells of Euglena gracilis strain Z were extracted with trichloroacetic acid. Samples of gross cellular protein were hydrolyzed by a variety of reagents. Amino acids released by these procedures were analyzed and the overall composition of cell protein was quantitatively determined.

Kiwellin is a novel protein of 28 kDa isolated from kiwi (Actinidia chinensis) fruit. It is one of the three most abundant proteins present in the edible part of this fruit. Kiwellin has been purified by ion exchange chromatography. Its N-terminal amino acid sequence revealed high identity with that previously reported for a 28 kDa protein described as one of the most important kiwi allergens. This observation prompted us to fully characterize this protein. The complete primary structure, elucidated by direct sequencing, indicated that kiwellin is a cysteine-rich protein. Serological tests and Western blotting analysis showed that kiwellin is specifically recognized by IgE of patients allergic to kiwi fruit. *The protein sequence data reported in this paper will appear in the Swiss-Prot and TrEMBL knowledgebase under the accession number P84527.

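Using the normal maize cultivar Yedan 22 (掖单22) and the high-oil maize cultivar Gaoyou 115 (高油115) as materials, we studied cultivar differences, under different nitrogen supply conditions, in the content of grain protein and its fractions (albumin and globulin, prolamin and glutelin), in total grain amino acid content, and in the content of individual amino acids. The results showed that nitrogen supply level had the same effect on the change in grain protein content during grain filling in both maize types: the content declined gradually in the early stage and rose slightly at maturity. The dynamics of grain albumin, globulin, prolamin and glutelin content were essentially the same across treatments. In both maize types, grain albumin content decreased gradually over time. Globulin content followed a single-peak curve, with the peak at 30 d after pollination. Prolamin content followed a V-shaped curve in both types, with the minimum at 30 d after pollination. Glutelin content rose in both types. The effect of nitrogen supply level on the changes in the individual protein fractions differed between the two maize types; its effect on globulin content in Gaoyou 115 grain was small. Nitrogen level did not alter the trend in total grain amino acid content in either type, but the content of individual amino acids varied considerably.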

The classical procedure for nuclear magnetic resonance structure calculation allocates empirical distance ranges and uses historical values for weighting factors. However, Bayesian analysis suggests that there are more optimal choices for potential shape (bounds-free log-harmonic shape) and restraint weights. We compare the classical protocol with the Bayesian approach for more than 300 protein structures. We analyze the conformation similarity to the corresponding X-ray crystal structure, the distribution of the conformations around their average, and independent validation criteria. On average, the log-harmonic potential reduces the difference to the X-ray crystal structure. If the log-harmonic potential is used, the constant weighting tightens the distribution around the average conformation, with respect to the distributions obtained with Bayesian weighting. Conversely, the structure quality is improved by the Bayesian weighting over the classical procedure, whereas constant weighting worsens some criteria. The quality improvement obtained with the log-harmonic potential coupled to Bayesian weighting validates this approach on a representative set of protein structures.

With a growing number of structures available in the Brookhaven Protein Data Bank, automatic methods for domain identification are required for the construction of databases. Domains are considered to be clusters of secondary structure elements. Thus, helices and strands are first clustered using intersecondary structural distances between C alpha positions, and dendrograms based on this distance measure are used to identify domains. Individual domains are recognized by a disjoint factor, which enables the automatic identification and classification into disjoint, interacting, and conjoint domains. Application to a database of 83 protein families and 18 unique structures shows that the approach provides an effective delineation of boundaries and identifies those proteins that can be considered as a single domain. A quantitative estimate of the interaction between domains has been proposed. The database of protein domains is a useful tool for understanding protein folding, for recognizing protein folds, and for understanding structure-activity relationships.

PDB-REPRDB is a database of representative protein chains from the Protein Data Bank (PDB). The previous version of PDB-REPRDB provided 48 representative sets, whose similarity criteria were predetermined, on the WWW. The current version is designed so that the user may obtain a quick selection of representative chains from PDB. The selection of representative chains can be dynamically configured according to the user's requirement. The WWW interface provides a large degree of freedom in setting parameters, such as cut-off scores of sequence and structural similarity. One can obtain a representative list and classification data of protein chains from the system. The current database includes 20 457 protein chains from PDB entries (August 6, 2000). The system for PDB-REPRDB is available at the Parallel Protein Information Analysis system (PAPIA) WWW server (http://www.rwcp.or.jp/papia/).

This article describes the development and creation of the Protein Circular Dichroism Data Bank (PCDDB), a deposition and searchable data bank for validated circular dichroism spectra located at http://pcddb.cryst.bbk.ac.uk/.

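Sequencing and analysis of the protein kinase gene of porcine pseudorabies virus   Total citations: 5 (self-citations: 0; citations by others: 5)
The protein kinase (PK) gene of the pseudorabies virus Hubei strain (PRV HB strain) was cloned and sequenced. The homology of this sequence with the PK genes of the PRV NIA-3 strain, the PRV Ka strain, HSV-1 and VZV was analyzed and compared. The results showed that the 1312 bp DNA sequence determined contains an open reading frame of 1002 nucleotides, which can encode a polypeptide of 334 amino acids. Compared with the PK genes of PRV-NIA3, PRV-Ka, HSV-1 and VZV, the nucleotide homologies of the PRV-HB strain PK were 98.7%, 9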

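Experiments showed that an amino acid mixture (containing isoleucine, methionine and phenylalanine, added at 1.0, 0.5 and 2.0 g/L, respectively) significantly improved the ethanol tolerance of SPSC, a self-flocculating yeast that is a fusant of Schizosaccharomyces pombe and Saccharomyces cerevisiae. Cells were cultured with (test group) and without (control group) the amino acid mixture, then harvested and subjected to an ethanol shock test (20%, v/v; 30°C; 9 h). More than half of the cells in the test group remained viable, whereas all cells in the control group died. Analysis of the amino acid composition of the cell membrane proteins of the two groups revealed that the improved ethanol tolerance of the test group was closely related to incorporation of the supplemented amino acids into the cell membrane. Measurements of membrane fluidity using DPH as a fluorescent probe further revealed that, once the amino acids were incorporated into the cell membrane, the membrane effectively resisted the increase in fluidity induced by high-concentration ethanol shock, thereby maintaining membrane stability. The experiments thus reveal for the first time that the amino acid composition of membrane proteins can affect the ethanol tolerance of yeast by altering membrane fluidity.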

Cerato-ulmin, a toxin produced by Ceratocystis ulmi, the causal agent of Dutch elm disease, has been characterized as a small protein (128 residues) with a MW of ca 13000. The protein has a high content of cystine, proline, leucine, serine and aspartic acid/asparagine; it is low in histidine, lysine, arginine, isoleucine, phenylalanine and tyrosine and does not contain cysteine, methionine, or tryptophan. The amino acid sequence of the N-terminal region is: H2N-Ala-Asp-Ser-Tyr-Asp-Pro-Cys-Thr-Gly-Leu-Leu-Gln-Lys-Ser-Pro-Gln-Cys-Cys-Asp-Thr-Asp-Ile-Leu-Gly-Val-Ser-Asp-Leu-Asp-Cys-. Toxic symptoms similar to those of Dutch elm disease can be elicited by cerato-ulmin in white elm shoot cuttings (Ulmus americana L.).

PDB-REPRDB is a database of representative protein chains from the Protein Data Bank (PDB). Started at the Real World Computing Partnership (RWCP) in August 1997, it developed to the present system of PDB-REPRDB. In April 2001, the system was moved to the Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST) (http://www.cbrc.jp/); it is available at http://www.cbrc.jp/pdbreprdb/. The current database includes 33 368 protein chains from 16 682 PDB entries (1 September, 2002), from which are excluded (a) DNA and RNA data, (b) theoretically modeled data, (c) short chains (length l < 40 residues), or (d) data with non-standard amino acid residues at all residues. The number of PDB entries that include membrane protein structures has increased rapidly because of improved X-ray crystallography, NMR, and electron microscopic experimental techniques. Since many protein structure studies must address globular and membrane proteins separately, a new elimination factor, which excludes membrane protein chains, is introduced in the PDB-REPRDB system. Moreover, the PDB-REPRDB system for membrane protein chains begins at the same URL. The current membrane database includes 551 protein chains, including membrane domains in the SCOP database of release 1.59 (15 May, 2002).
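
The four exclusion criteria listed in this abstract can be read as a simple eligibility filter. The sketch below is hypothetical: the `Chain` record and its field names are invented for illustration and are not PDB-REPRDB's data model, and criterion (c) is read here as "chains shorter than 40 residues".

```python
from dataclasses import dataclass

STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


@dataclass
class Chain:
    """Hypothetical chain record; field names are illustrative only."""
    chain_id: str
    molecule_type: str          # "protein", "dna" or "rna"
    is_theoretical_model: bool
    sequence: str               # one-letter codes, "X" for non-standard


def is_eligible(chain, min_length=40):
    """Apply exclusion criteria (a) to (d) from the abstract."""
    if chain.molecule_type != "protein":       # (a) DNA and RNA data
        return False
    if chain.is_theoretical_model:             # (b) theoretical models
        return False
    if len(chain.sequence) < min_length:       # (c) short chains
        return False
    # (d) chains in which every residue is non-standard
    if not any(res in STANDARD_RESIDUES for res in chain.sequence):
        return False
    return True


candidates = [
    Chain("1abc_A", "protein", False, "A" * 50),
    Chain("1abc_B", "protein", False, "A" * 39),
    Chain("2xyz_C", "dna", False, "A" * 50),
    Chain("3def_A", "protein", True, "A" * 50),
    Chain("4ghi_A", "protein", False, "X" * 50),
]
eligible_ids = [c.chain_id for c in candidates if is_eligible(c)]
```

A membrane-protein flag could be added as a fifth check in the same style, mirroring the additional elimination factor the abstract describes.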

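Amino acid composition clustering, protein structural classes, and structural class prediction   Total citations: 11 (self-citations: 0; citations by others: 11)
The amino acid compositions of proteins were clustered with an information clustering method, and hierarchical grouping was found (large clusters split into smaller clusters). The 645 proteins can be divided into 15 small clusters. Each small cluster shows some correlation with the structural class determined by secondary structure content, but no obvious correlation with the five major protein structural classes. Problems in schemes that predict structural class from amino acid composition and secondary structure content are pointed out. A new method for predicting protein structural class from the secondary structure sequence of a protein is proposed, and concise prediction rules are given.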
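
Clustering by amino acid composition starts from a 20-dimensional frequency vector per protein. The sketch below shows that representation together with a plain Euclidean distance between two compositions. The distance is an assumption chosen for simplicity; it is not the information clustering method named in the abstract, and the function names are illustrative.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the fraction of each of the 20 standard amino acids,
    in the fixed order of AMINO_ACIDS. Other characters are ignored."""
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def composition_distance(seq_a, seq_b):
    """Euclidean distance between two composition vectors."""
    return math.dist(composition(seq_a), composition(seq_b))


# Composition ignores residue order, so these two sequences coincide.
same = composition_distance("ACDE", "EDCA")
# A sequence of only alanine versus one of only cysteine.
far = composition_distance("AAAA", "CCCC")
```

Because composition discards residue order, proteins with very different folds can share a composition, which is consistent with the abstract's finding that the clusters correlate only weakly with the major structural classes.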
