Similar literature
 Found 20 similar articles; search took 15 ms
1.
Intensive growth in 3D structure data on DNA-protein complexes as reflected in the Protein Data Bank (PDB) demands new approaches to the annotation and characterization of these data and will lead to a new understanding of critical biological processes involving these data. These data and those from other protein structure classifications will become increasingly important for the modeling of complete proteomes. We propose a fully automated classification of DNA-binding protein domains based on existing 3D-structures from the PDB. The classification, by domain, relies on the Protein Domain Parser (PDP) and the Combinatorial Extension (CE) algorithm for structural alignment. The approach involves the analysis of 3D-interaction patterns in DNA-protein interfaces, assignment of structural domains interacting with DNA, and clustering of domains based on structural similarity and DNA-interacting patterns. Comparison with existing resources describing structural and functional classifications of DNA-binding proteins was used to validate and improve the approach proposed here. In the course of our study we defined a set of criteria and heuristics allowing us to automatically build a biologically meaningful classification and define classes of functionally related protein domains. It was shown that taking into consideration interactions between protein domains and DNA considerably improves the classification accuracy. Our approach provides a high-throughput and up-to-date annotation of DNA-binding protein families which can be found at http://spdc.sdsc.edu.

2.
An overview of the structures of protein-DNA complexes   Total citations: 1, self-citations: 0, citations by others: 1
Luscombe NM  Austin SE  Berman HM  Thornton JM 《Genome biology》2000,1(1):reviews001.1-reviews001.37
On the basis of a structural analysis of 240 protein-DNA complexes contained in the Protein Data Bank (PDB), we have classified the DNA-binding proteins involved into eight different structural/functional groups, which are further classified into 54 structural families. Here we present this classification and review the functions, structures and binding interactions of these protein-DNA complexes.

3.

Background

The major birch pollen allergen, Bet v 1, is a member of the ubiquitous PR-10 family of plant pathogenesis-related proteins. In recent years, a number of diverse plant proteins with low sequence similarity to Bet v 1 were identified. In addition, determination of the Bet v 1 structure revealed the existence of a large superfamily of structurally related proteins. In this study, we aimed to identify and classify all Bet v 1-related structures from the Protein Data Bank and all Bet v 1-related sequences from the Uniprot database.

Results

Structural comparisons of representative members of already known protein families structurally related to Bet v 1 with all entries of the Protein Data Bank yielded 47 structures with non-identical sequences. They were classified into eleven families, five of which were newly identified and not included in the Structural Classification of Proteins database release 1.71. The taxonomic distribution of these families extracted from the Pfam protein family database showed that members of the polyketide cyclase family and the activator of Hsp90 ATPase homologue 1 family were distributed among all three superkingdoms, while members of some bacterial families were confined to a small number of species. Comparison of ligand binding activities of Bet v 1-like superfamily members revealed that their functions were related to binding and metabolism of large, hydrophobic compounds such as lipids, hormones, and antibiotics. Phylogenetic relationships within the Bet v 1 family, defined as the group of proteins with significant sequence similarity to Bet v 1, were determined by aligning 264 Bet v 1-related sequences. A distance-based phylogenetic tree yielded a classification into 11 subfamilies, nine exclusively containing plant sequences and two subfamilies of bacterial proteins. Plant sequences included the pathogenesis-related proteins 10, the major latex proteins/ripening-related proteins subfamily, and polyketide cyclase-like sequences.

Conclusion

The ubiquitous distribution of Bet v 1-related proteins among all superkingdoms suggests that a Bet v 1-like protein was already present in the last universal common ancestor. During evolution, this protein diversified into numerous families with low sequence similarity but with a common fold that succeeded as a versatile scaffold for binding of bulky ligands.

4.
Garma L  Mukherjee S  Mitra P  Zhang Y 《PloS one》2012,7(6):e38913
"Protein quaternary structure universe" refers to the ensemble of all protein-protein complexes across all organisms in nature. The number of quaternary folds thus corresponds to the number of ways proteins physically interact with other proteins. This study focuses on answering two basic questions: whether the number of protein-protein interactions is limited and, if yes, how many different quaternary folds exist in nature. By all-to-all sequence and structure comparisons, we grouped the protein complexes in the Protein Data Bank (PDB) into 3,629 families and 1,761 folds. A statistical model was introduced to obtain the quantitative relation between the numbers of quaternary families and quaternary folds in nature. The total number of possible protein-protein interactions was estimated at around 4,000, which indicates that the current protein repository contains only 42% of quaternary folds in nature and that full coverage needs approximately a quarter century of experimental effort. The results have important implications for protein complex structural modeling and the structural genomics of protein-protein interactions.

5.
Most of the proteins in a cell assemble into complexes to carry out their function. It is therefore crucial to understand the physicochemical properties as well as the evolution of interactions between proteins. The Protein Data Bank represents an important source of information for such studies, because more than half of the structures are homo- or heteromeric protein complexes. Here we propose the first hierarchical classification of whole protein complexes of known 3-D structure, based on representing their fundamental structural features as a graph. This classification provides the first overview of all the complexes in the Protein Data Bank and allows nonredundant sets to be derived at different levels of detail. This reveals that between one-half and two-thirds of known structures are multimeric, depending on the level of redundancy accepted. We also analyse the structures in terms of the topological arrangement of their subunits and find that they form a small number of arrangements compared with all theoretically possible ones. This is because most complexes contain four subunits or fewer, and the large majority are homomeric. In addition, there is a strong tendency for symmetry in complexes, even for heteromeric complexes. Finally, through comparison of Biological Units in the Protein Data Bank with the Protein Quaternary Structure database, we identified many possible errors in quaternary structure assignments. Our classification, available as a database and Web server at http://www.3Dcomplex.org, will be a starting point for future work aimed at understanding the structure and evolution of protein complexes.

6.
7.
Viruses are the most abundant life form and infect practically all organisms. Consequently, these obligate parasites are a major cause of human suffering and economic loss. The Rossmann‐like fold is the most populated fold among α/β‐folds in the Protein Data Bank, and proteins containing a Rossmann‐like fold constitute 22% of all known protein 3D structures. Thus, analysis of viral proteins containing Rossmann‐like domains could provide an understanding of viral biology and evolution and suggest possible targets for antiviral therapy. We provide functional and evolutionary analysis of viral proteins containing a Rossmann‐like fold found in the evolutionary classification of protein domains (ECOD) database developed in our lab. We identified 81 protein families of bacterial, archaeal, and eukaryotic viruses in light of their evolution‐based ECOD classification and Pfam taxonomy. We defined their functional significance using enzymatic EC number assignments as well as domain‐level family annotations.

8.
Mechanisms of interaction of DNA with nonhistone chromosomal protein HMGB1 and linker histone H1 have been studied by means of circular dichroism and absorption spectroscopy. Both proteins are located in the internucleosomal regions of chromatin. It is demonstrated that the properties of DNA-protein complexes depend on the protein content and cannot be considered as a mere summing up of the effects of individual protein components. Interaction of the HMGB1 and H1 proteins with DNA is shown to be cooperative rather than competitive. Lysine-rich histone H1 facilitates the binding of HMGB1 to DNA by screening the negatively charged groups of the sugar-phosphate backbone of DNA and dicarboxylic amino acid residues in the C-terminal domain of HMGB1. The observed joint action of HMGB1 and H1 stimulates DNA condensation with the formation of anisotropic DNA-protein complexes with typical ψ-type CD spectra. Structural organization of the complexes depends not only on DNA-protein interactions but also on interaction between the HMGB1 and H1 protein molecules bound to DNA. Manganese ions significantly modify the mode of interactions between components in the triple DNA-HMGB1-H1 complex. The binding of Mn2+ ions weakens DNA-protein interactions and strengthens protein-protein interactions, which promote DNA condensation and formation of large DNA-protein particles in solution.

9.
The mechanisms of interaction of the non-histone chromosomal protein HMGB1 and linker histone H1 with DNA have been studied using circular dichroism and absorption spectroscopy. Both of the proteins are located in the inter-nucleosomal regions of chromatin. It was demonstrated that properties of the DNA-protein complexes depend on the protein content and cannot be considered as a simple summing up of the effects of individual protein components. Interaction of HMGB1 and H1 proteins is shown to be co-operative rather than competitive. Lysine-rich histone H1 facilitates the binding of HMGB1 to DNA by screening the negatively charged groups of the sugar-phosphate backbone of DNA and dicarboxylic amino-acid residues in the C-terminal domain of the HMGB1 protein. The observed joint action of the HMGB1 and H1 proteins stimulates DNA condensation with formation of anisotropic DNA-protein complexes with typical psi-type CD spectra. Structural organization of the complexes depends not only on the DNA-protein interactions, but also on the interaction between HMGB1 and H1 protein molecules bound to DNA. Manganese ions significantly modify the character of interactions between the components in the triple DNA-HMGB1-H1 complex. Binding of Mn2+ ions weakens the DNA-protein interactions and strengthens the protein-protein interactions, which promote DNA condensation and formation of large DNA-protein particles in solution.

10.
We extend our previous analysis of binding specificity of DNA-protein complexes to complexes containing water-mediated bridges. Inclusion of water bridges between phosphate and base, phosphate and sugar, as well as proteins and DNA, improves the prediction of specificity; six data sets studied in this paper yield correct predictions for all base pairs that have two or more hydrogen bonds. Besides massive computation, our approach relies heavily on experimental data. After deriving protein structures from DNA-protein complexes in which coordinates were established by X-ray diffraction techniques, we analysed all possible DNA sequences to which these proteins might bind, ranking them in terms of Lennard-Jones potential for the optimal docking configuration. Our prediction algorithm rests on the following assumptions: (1) specificity comes mainly from direct hydrogen bonding; (2) electrostatic forces stabilise DNA-protein complexes and contribute only weakly to specificity since they occur at the charged phosphate groups; (3) Van der Waals forces and electrostatic interactions between positively charged groups on the protein and phosphates on DNA can be neglected as they contribute primarily to the free energy of stabilisation as opposed to specificity.

11.
12.
With a growing number of structures available in the Brookhaven Protein Data Bank, automatic methods for domain identification are required for the construction of databases. Domains are considered to be clusters of secondary structure elements. Thus, helices and strands are first clustered using intersecondary structural distances between C alpha positions, and dendrograms based on this distance measure are used to identify domains. Individual domains are recognized by a disjoint factor, which enables the automatic identification and classification into disjoint, interacting, and conjoint domains. Application to a database of 83 protein families and 18 unique structures shows that the approach provides an effective delineation of boundaries and identifies those proteins that can be considered as a single domain. A quantitative estimate of the interaction between domains has been proposed. The database of protein domains is a useful tool for understanding protein folding, for recognizing protein folds, and for understanding structure-activity relationships.
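The clustering step described in this abstract can be illustrated with a rough sketch. The abstract does not give the exact distance measure, linkage rule, or cut-off, so everything below is an assumption made for illustration: each secondary structure element is a list of Cα coordinates, the distance between two elements is taken as their smallest Cα-Cα distance, clusters are merged by single linkage, and merging stops at a hypothetical `cutoff` (in Å). The names `sse_distance`, `cluster_sses`, and `example` are invented here; the disjoint factor from the abstract is not modelled.

```python
import math
from itertools import combinations


def sse_distance(a, b):
    """Smallest C-alpha to C-alpha distance between two elements (assumed measure)."""
    return min(math.dist(p, q) for p in a for q in b)


def cluster_sses(sses, cutoff):
    """Single-linkage agglomerative clustering of secondary structure elements.

    Repeatedly merges the two closest clusters and stops once the closest
    pair is farther apart than `cutoff`. Returns lists of element indices.
    """
    clusters = [[i] for i in range(len(sses))]

    def linkage(c1, c2):
        # Single linkage: distance between the closest members of two clusters.
        return min(sse_distance(sses[i], sses[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        if linkage(clusters[x], clusters[y]) > cutoff:
            break
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]  # x < y, so index x is unaffected by the deletion
    return sorted(clusters)


# Toy coordinates (made up): two elements near the origin, two about 40 A away.
example = [
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    [(0.0, 5.0, 0.0), (1.5, 5.0, 0.0)],
    [(40.0, 0.0, 0.0), (41.5, 0.0, 0.0)],
    [(40.0, 5.0, 0.0), (41.5, 5.0, 0.0)],
]
print(cluster_sses(example, cutoff=10.0))  # [[0, 1], [2, 3]]
```

Recording the merge distances instead of stopping at a cut-off would give the dendrogram the abstract mentions; a fixed cut-off is used here only to keep the sketch short.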

13.
14.
The structures of DNA-protein complexes have illuminated the diversity of DNA-protein binding mechanisms shown by different protein families. This lack of generality could pose a great challenge for predicting DNA-protein interactions. To address this issue, we have developed a knowledge-based method, DNA-binding Domain Hunter (DBD-Hunter), for identifying DNA-binding proteins and associated binding sites. The method combines structural comparison and the evaluation of a statistical potential, which we derive to describe interactions between DNA base pairs and protein residues. We demonstrate that DBD-Hunter is an accurate method for predicting DNA-binding function of proteins, and that DNA-binding protein residues can be reliably inferred from the corresponding templates if identified. In benchmark tests on approximately 4000 proteins, our method achieved an accuracy of 98% and a precision of 84%, which significantly outperforms three previous methods. We further validate the method on DNA-binding protein structures determined in DNA-free (apo) state. We show that the accuracy of our method is only slightly affected on apo-structures compared to the performance on holo-structures cocrystallized with DNA. Finally, we apply the method to approximately 1700 structural genomics targets and predict that 37 targets with previously unknown function are likely to be DNA-binding proteins. DBD-Hunter is freely available at http://cssb.biology.gatech.edu/skolnick/webservice/DBD-Hunter/.
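The abstract above reports accuracy and precision separately, and the two can differ a great deal when DNA-binding proteins are a small fraction of the benchmark. The sketch below shows the standard definitions of both metrics. The function names and the counts are invented for illustration and are not taken from the DBD-Hunter benchmark.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions (binding or not) that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp, fp):
    """Fraction of proteins predicted as DNA-binding that truly bind DNA."""
    return tp / (tp + fp)


# Hypothetical toy counts: 100 proteins, of which 13 bind DNA.
tp, fp, tn, fn = 8, 2, 85, 5
print(accuracy(tp, fp, tn, fn))  # 0.93
print(precision(tp, fp))         # 0.8
```

With few true binders, the large number of correctly rejected non-binders dominates accuracy, so precision is the more informative figure for how trustworthy a positive prediction is.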

15.
We investigate the conservation of amino acid residue sequences in 21 DNA-binding protein families and study the effects that mutations have on DNA-sequence recognition. The observations are best understood by assigning each protein family to one of three classes: (i) non-specific, where binding is independent of DNA sequence; (ii) highly specific, where binding is specific and all members of the family target the same DNA sequence; and (iii) multi-specific, where binding is also specific, but individual family members target different DNA sequences. Overall, protein residues in contact with the DNA are better conserved than the rest of the protein surface, but there is a complex underlying trend of conservation for individual residue positions. Amino acid residues that interact with the DNA backbone are well conserved across all protein families and provide a core of stabilising contacts for homologous protein-DNA complexes. In contrast, amino acid residues that interact with DNA bases have variable levels of conservation depending on the family classification. In non-specific families, base-contacting residues are well conserved and interactions are always found in the minor groove where there is little discrimination between base types. In highly specific families, base-contacting residues are highly conserved and allow member proteins to recognise the same target sequence. In multi-specific families, base-contacting residues undergo frequent mutations and enable different proteins to recognise distinct target sequences. Finally, we report that interactions with bases in the target sequence often follow (though not always) a universal code of amino acid-base recognition and the effects of amino acid mutations can be most easily understood for these interactions.

16.
17.
18.
The open reading frames of human cytomegalovirus (human herpesvirus-5, HHV5) encode some 213 unique proteins with mostly unknown functions. Using the threading program, ProCeryon, we calculated possible matches between the amino acid sequences of these proteins and the Protein Data Bank library of three-dimensional structures. Thirty-six proteins were fully identified in terms of their structure and, often, function; 65 proteins were recognized as members of narrow structural/functional families (e.g. DNA-binding factors, cytokines, enzymes, signaling particles, cell surface receptors etc.); and 87 proteins were assigned to broad structural classes (e.g. all-beta, 3-layer-alphabetaalpha, multidomain, etc.). Genes encoding proteins with similar folds, or containing identical structural traits (extreme sequence length, runs of unstructured (Pro and/or Gly-rich) residues, transmembrane segments, etc.) often formed tandem clusters throughout the genome. In the course of this work, benchmarks on about 20 known folds were used to optimize adjustable parameters of threading calculations, i.e. gap penalty weights used in sequence/structure alignments; new scores obtained as simple combinations of existing scoring functions; and number of threading runs conducive to meaningful results. An introduction of summed, per-residue-normalized scores has been essential for discovery of subdomains (EGF-like, SH2, SH3) in longer protein sequences, such as the eight "open sandwich" cytokine domains, 60-70 amino acids long and having the 3beta1alpha fold with one or two disulfide bridges, present in otherwise unrelated proteins.

19.
20.
The primary structure of a novel adenoviral protein referred to as p32K and found exclusively in members of the proposed new genus Atadenovirus was analyzed. The p32K gene sequence was determined from two bovine and one snake adenovirus types. Altogether five different p32K sequences were examined, two of which were obtained from the Gene Bank. The C-terminal part of the protein is conserved and shares similarity with certain bacterial small acid soluble proteins (SASPs). The sequence similarity seems coupled with functional relatedness, i.e. both protein groups are found in structures where the genome of the “dormant” organism is packaged in tight nucleoprotein complexes. In these complexes the DNA is protected against harmful environmental effects until the new reproductive cycle is started with specific protease cleavage of the packaging proteins. Although there is no experimental clue about the role of the p32K proteins, we hypothesize a phylogenetic relationship between the two protein groups based on the sequence similarity and the supposed functional similarity. The alignments of these protein groups show that the conserved part of the p32Ks is probably the result of the duplication of a shorter sequence similar to the SASPs of the Bacilli.
