Similar Articles
1.
Wernisch L, Hunting M, Wodak SJ. Proteins, 1999, 35(3):338-352
A novel automatic procedure for identifying domains from protein atomic coordinates is presented. The procedure, termed STRUDL (STRUctural Domain Limits), does not use secondary-structure information and handles any number of domains made up of contiguous or non-contiguous chain segments. The core algorithm uses the Kernighan-Lin graph heuristic to partition the protein into residue sets with minimal interactions between them. These interactions are deduced from the weighted Voronoi diagram. The generated partitions are accepted or rejected on the basis of optimized criteria representing basic expected physical properties of structural domains. The graph heuristic approach is shown to be very effective: for a number of test proteins it closely approximates the exact solution provided by a branch and bound algorithm. In addition, the overall performance of STRUDL is assessed on a set of 787 representative proteins from the Protein Data Bank by comparison with domain definitions in the CATH protein classification. The domains assigned by STRUDL agree with the CATH assignments in at least 81% of the tested proteins. This result is comparable to that obtained previously using PUU (Holm and Sander, Proteins 1994;19:256-268), the only other available algorithm designed to identify domains with any number of non-contiguous chain segments. A detailed discussion of the structures for which our assignments differ from those in CATH brings to light some clear inconsistencies between the concept of structural domains based on minimizing inter-domain interactions and that of delimiting structural motifs that represent acceptable folding topologies or architectures. Considering both concepts as complementary and combining them in a layered approach might be the way forward.
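The partitioning step can be illustrated with a minimal Kernighan-Lin bipartition of a residue interaction graph. This is a sketch of the general heuristic, not STRUDL itself: it assumes the pairwise interaction weights are given directly (standing in for contact areas from the weighted Voronoi diagram), always splits into two equal halves, and omits STRUDL's acceptance criteria and the recursion needed for more than two domains. The function names and the toy graph are hypothetical.

```python
def cut_weight(side, edges):
    """Sum of interaction weights between residues assigned to different halves."""
    return sum(wt for (i, j), wt in edges.items() if side[i] != side[j])


def kernighan_lin(n, edges, max_passes=10):
    """Split residues 0..n-1 into two halves with a small total cut weight.

    edges maps a residue pair (i, j) to a non-negative interaction weight.
    """
    w = [[0.0] * n for _ in range(n)]
    for (i, j), wt in edges.items():
        w[i][j] = w[j][i] = wt

    # Start from the sequence-contiguous split: first half vs second half.
    side = [0 if v < n // 2 else 1 for v in range(n)]

    for _ in range(max_passes):
        # D[v] = external minus internal interaction weight of residue v.
        D = [sum(w[v][u] if side[u] != side[v] else -w[v][u]
                 for u in range(n) if u != v) for v in range(n)]
        locked, gains, swaps = set(), [], []
        for _ in range(min(side.count(0), side.count(1))):
            # Pick the unlocked pair whose exchange reduces the cut the most.
            best = None
            for a in range(n):
                if a in locked or side[a] != 0:
                    continue
                for b in range(n):
                    if b in locked or side[b] != 1:
                        continue
                    gain = D[a] + D[b] - 2 * w[a][b]
                    if best is None or gain > best[0]:
                        best = (gain, a, b)
            gain, a, b = best
            locked.update((a, b))
            gains.append(gain)
            swaps.append((a, b))
            # Update D of unlocked residues as if a and b had been exchanged.
            for v in range(n):
                if v in locked:
                    continue
                if side[v] == 0:
                    D[v] += 2 * w[v][a] - 2 * w[v][b]
                else:
                    D[v] += 2 * w[v][b] - 2 * w[v][a]
        # Apply only the prefix of tentative swaps with the largest total gain.
        best_k, best_total, total = 0, 0.0, 0.0
        for k, gain in enumerate(gains, start=1):
            total += gain
            if total > best_total + 1e-12:
                best_k, best_total = k, total
        if best_k == 0:
            break  # no improving prefix: a local optimum has been reached
        for a, b in swaps[:best_k]:
            side[a], side[b] = 1, 0
    return side


# Toy graph: residues {0, 2, 4, 6} and {1, 3, 5, 7} form two tightly packed
# sets joined by one weak contact, so the sequential starting split is poor.
edges = {}
for group in ([0, 2, 4, 6], [1, 3, 5, 7]):
    for x, i in enumerate(group):
        for j in group[x + 1:]:
            edges[(i, j)] = 1.0
edges[(6, 7)] = 0.1
side = kernighan_lin(8, edges)
print(side, cut_weight(side, edges))
```

Because residues are exchanged one pair at a time regardless of sequence position, this kind of heuristic can return non-contiguous residue sets, which is what allows domains built from several chain segments.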

2.
Tai CH, Sam V, Gibrat JF, Garnier J, Munson PJ, Lee B. Proteins, 2011, 79(3):853-866
Domains are basic units of protein structure and are essential for exploring protein fold space and structure evolution. With the structural genomics initiative, the number of protein structures in the Protein Data Bank (PDB) is increasing dramatically, and domain assignments need to be done automatically. Most existing structural domain assignment programs define domains using the compactness of the domains and/or the number and strength of intra-domain versus inter-domain contacts. Here we present a different approach based on the recurrence of locally similar structural pieces (LSSPs) found by one-against-all structure comparisons with a dataset of 6373 protein chains from the PDB. Residues of the query protein are clustered using LSSPs via three different procedures to define domains. This approach gives results comparable to those of several existing programs that use geometrical and other structural information explicitly. Remarkably, most of the proteins that contribute the LSSPs defining a domain do not themselves contain the domain of interest. This study shows that domains can be defined by a collection of relatively small locally similar structural pieces containing, on average, four secondary structure elements. In addition, it indicates that domains are indeed made of recurrent small structural pieces that are used to build protein structures of many different folds, as suggested by recent studies.

3.
An algorithm is presented for the fast and accurate definition of protein structural domains from coordinate data without prior knowledge of the number or type of domains. The algorithm explicitly locates domains that comprise one or two continuous segments of protein chain. Domains that include more than two segments are also located. The algorithm was applied to a nonredundant database of 230 protein structures, and the results were compared with domain definitions obtained from the literature or by inspection of the coordinates with molecular graphics. For 70% of the proteins, the derived domains agree with the reference definitions, 18% show minor differences, and only 12% (28 proteins) show very different definitions. Three screens were applied to identify the derived domains least likely to agree with the subjective definition set. These screens revealed a set of 173 proteins, 97% of which agree well with the subjective definitions. The algorithm is a practical domain identification tool that can be run routinely on the entire structural database. Adjustment of parameters also allows smaller compact units to be identified in proteins.
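As a rough illustration of how one- and two-segment domains can be enumerated explicitly, the sketch below scores every single cut point, and every excised segment whose two flanks form the partner domain, by how many contacts fall inside rather than between the parts. The score (each part's internal contact count divided by the inter-part count, multiplied together) and the minimum domain size are assumptions for illustration, not necessarily the published criteria; contacts are supplied as residue pairs instead of being computed from coordinates, and all names are hypothetical.

```python
from itertools import combinations


def split_score(part_a, contacts):
    """Score a two-way split; higher when contacts fall within, not between, parts."""
    int_a = int_b = ext = 0
    for i, j in contacts:
        in_i, in_j = i in part_a, j in part_a
        if in_i and in_j:
            int_a += 1
        elif in_i or in_j:
            ext += 1
        else:
            int_b += 1
    # The +1 avoids division by zero when the two parts share no contacts.
    return (int_a / (ext + 1)) * (int_b / (ext + 1))


def candidate_splits(n, min_size):
    """Yield one-segment splits [0, c) | [c, n) and two-segment splits in
    which [s, e) forms one domain and the two flanking pieces the other."""
    for c in range(min_size, n - min_size + 1):
        yield ("one-cut", c), set(range(c))
    for s in range(1, n):
        for e in range(s + min_size, n):
            if n - (e - s) >= min_size:
                yield ("two-cut", s, e), set(range(s, e))


def best_split(n, contacts, min_size=3):
    """Return the highest-scoring split description and its score."""
    best_desc, best_score = None, 0.0
    for desc, part in candidate_splits(n, min_size):
        score = split_score(part, contacts)
        if score > best_score:
            best_desc, best_score = desc, score
    return best_desc, best_score


# Toy chain of 12 residues: residues 4-7 pack together, while the N- and
# C-terminal flanks (0-3 and 8-11) pack against each other.
flank = [0, 1, 2, 3, 8, 9, 10, 11]
contacts = list(combinations(flank, 2)) + list(combinations(range(4, 8), 2))
contacts.append((3, 4))
print(best_split(12, contacts))  # -> (('two-cut', 4, 8), 42.0)
```

An exhaustive scan like this is quadratic in chain length for the two-cut case, which is one reason domains made of more than two segments are usually handled by a separate procedure.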

4.
MOTIVATION: Although many methods are available for the identification of structural domains from protein three-dimensional structures, accurate definition of protein domains and the curation of such data for a large number of proteins are often possible only after manual intervention. The availability of domain definitions for protein structural entries is useful for the sequence analysis of aligned domains, structure comparison, fold recognition procedures, and understanding protein folding, domain stability and flexibility. RESULTS: We have improved our method of domain identification, which starts from the concept of clustering secondary structural elements, with the aim of reducing the number of discontinuous segments in identified domains. The results of our modified, automatic approach have been compared with the domain definitions from other databases. On a test data set of 55 proteins, this method achieves high agreement (88%) in the number of domains with the crystallographers' definitions and with resources such as the SCOP, CATH, DALI, 3Dee and PDP databases. This method also obtains a 98% overlap score with the other resources in the definition of domain boundaries for the 55 proteins. We have used the improved method to examine the domain arrangements of 4592 non-redundant protein chains, identifying 5409 domains and updating the structural domain database. AVAILABILITY: The latest version of the domain database and online domain identification methods are available from http://www.ncbs.res.in/~faculty/mini/ddbase/ddbase.html SUPPLEMENTARY INFORMATION: http://www.ncbs.res.in/~faculty/mini/ddbase/supplementary/supplementary.html

5.
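Structural domains in protein structure and function
A structural domain is a compact, globular region within a protein subunit. As a level of protein structure between secondary and tertiary structure, the domain acts as an independent structural, functional, and folding unit. In complex proteins, domains serve as structural and functional modules and as genetic units. Research at the domain level will advance the study of protein structure-function relationships, protein folding mechanisms, and protein design.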

6.
In this article, we present a de novo method for predicting protein domain boundaries, called OPUS-Dom. The core of the method is a novel coarse-grained folding method, VECFOLD, which constructs low-resolution structural models from a target sequence by folding a chain of vectors representing the predicted secondary-structure elements. OPUS-Dom generates a large ensemble of folded structure decoys by VECFOLD and labels the domain boundaries of each decoy by a domain parsing algorithm. Consensus domain boundaries are then derived from the statistical distribution of the putative boundaries and three empirical sequence-based domain profiles. OPUS-Dom generally outperformed several state-of-the-art domain prediction algorithms over various benchmark protein sets. Even though each VECFOLD-generated structure contains large errors, collectively these structures provide a more robust delineation of domain boundaries. The success of OPUS-Dom suggests that the arrangement of protein domains is more a consequence of limited coordination patterns per domain arising from tertiary packing of secondary-structure segments, rather than sequence-specific constraints.
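The consensus step, deriving boundaries from the distribution of putative boundaries over many decoys, can be sketched as a smoothed boundary histogram followed by peak picking. This is a simplified stand-in under stated assumptions: it ignores the three sequence-based domain profiles that OPUS-Dom also combines, and the smoothing window and minimum-support threshold are hypothetical parameters rather than the published values.

```python
def consensus_boundaries(seq_len, decoy_boundaries, window=5, min_support=0.3):
    """Return residue positions where boundaries from many decoys pile up.

    decoy_boundaries: one list of 0-based boundary positions per decoy.
    """
    if not decoy_boundaries:
        return []
    counts = [0] * seq_len
    for boundaries in decoy_boundaries:
        for pos in boundaries:
            counts[pos] += 1
    n_decoys = len(decoy_boundaries)
    half = window // 2
    # Boundary calls within +/- half residues of each position, per decoy.
    support = []
    for i in range(seq_len):
        lo, hi = max(0, i - half), min(seq_len, i + half + 1)
        support.append(sum(counts[lo:hi]) / n_decoys)
    # Keep local maxima of the smoothed support that exceed the threshold;
    # on a flat plateau the leftmost position is reported.
    peaks = []
    for i, s in enumerate(support):
        left = support[i - 1] if i > 0 else -1.0
        right = support[i + 1] if i + 1 < seq_len else -1.0
        if s >= min_support and s > left and s >= right:
            peaks.append(i)
    return peaks


# Ten hypothetical decoys of a 100-residue chain: seven place a boundary
# near residue 50, three place it at scattered positions.
decoys = [[48], [49], [50], [50], [50], [51], [52], [10], [30], [80]]
print(consensus_boundaries(100, decoys))  # -> [50]
```

Individually noisy decoys rarely agree on an exact residue, so smoothing over a window before picking peaks is what lets the ensemble produce a stable consensus.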

7.
The rapid growth of 3D structure data on DNA-protein complexes in the Protein Data Bank (PDB) demands new approaches to annotating and characterizing these data, and should lead to a new understanding of critical biological processes involving DNA-protein interactions. These data and those from other protein structure classifications will become increasingly important for the modeling of complete proteomes. We propose a fully automated classification of DNA-binding protein domains based on existing 3D structures from the PDB. The classification, by domain, relies on the Protein Domain Parser (PDP) and the Combinatorial Extension (CE) algorithm for structural alignment. The approach involves the analysis of 3D interaction patterns in DNA-protein interfaces, the assignment of structural domains interacting with DNA, and the clustering of domains based on structural similarity and DNA-interacting patterns. Comparison with existing resources describing structural and functional classifications of DNA-binding proteins was used to validate and improve the approach proposed here. In the course of our study we defined a set of criteria and heuristics allowing us to automatically build a biologically meaningful classification and define classes of functionally related protein domains. Taking into consideration interactions between protein domains and DNA was shown to improve the classification accuracy considerably. Our approach provides a high-throughput and up-to-date annotation of DNA-binding protein families, which can be found at http://spdc.sdsc.edu.

8.
With a growing number of structures available in the Brookhaven Protein Data Bank, automatic methods for domain identification are required for the construction of databases. Domains are considered to be clusters of secondary structure elements. Thus, helices and strands are first clustered using inter-secondary-structure distances between Cα positions, and dendrograms based on this distance measure are used to identify domains. Individual domains are recognized by a disjoint factor, which enables their automatic identification and classification into disjoint, interacting, and conjoint domains. Application to a database of 83 protein families and 18 unique structures shows that the approach effectively delineates domain boundaries and identifies those proteins that can be considered single-domain. A quantitative estimate of the interaction between domains is also proposed. The database of protein domains is a useful tool for understanding protein folding, recognizing protein folds, and understanding structure-activity relationships.
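A minimal sketch of the clustering step follows. Each secondary structure element is represented by the Cα coordinates of its residues; the distance between two elements is taken as their closest Cα-Cα distance, and elements are merged by single-linkage agglomeration until the closest pair of clusters is farther apart than a cutoff, which amounts to cutting the dendrogram at that height. The distance definition, the linkage rule and the cutoff are assumptions (the paper's exact inter-SSE measure may differ), and the disjoint-factor classification is not reproduced.

```python
import math


def sse_distance(a, b):
    """Closest C-alpha/C-alpha distance between two elements (lists of xyz)."""
    return min(math.dist(p, q) for p in a for q in b)


def cluster_sses(sses, cutoff):
    """Single-linkage agglomeration of SSEs; stops when clusters are > cutoff apart."""
    clusters = [[k] for k in range(len(sses))]
    d = [[sse_distance(a, b) for b in sses] for a in sses]
    while len(clusters) > 1:
        # Single linkage: cluster distance = distance of the closest members.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                dist = min(d[i][j] for i in clusters[x] for j in clusters[y])
                if best is None or dist < best[0]:
                    best = (dist, x, y)
        dist, x, y = best
        if dist > cutoff:
            break  # cut the dendrogram here
        merged = clusters.pop(y)  # y > x, so index x is unaffected
        clusters[x].extend(merged)
    return [sorted(c) for c in clusters]


# Hypothetical elements: three packed near the origin, two about 50 A away.
sses = [
    [(0, 0, 0), (1, 0, 0)],
    [(0, 4, 0), (1, 4, 0)],
    [(0, 0, 4)],
    [(50, 0, 0), (51, 0, 0)],
    [(50, 4, 0)],
]
print(cluster_sses(sses, cutoff=6.0))  # -> [[0, 1, 2], [3, 4]]
```

Each surviving cluster is a candidate domain; checking whether its residues map back onto one or several chain segments is a separate step.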

9.
Guo JT, Xu D, Kim D, Xu Y. Nucleic Acids Research, 2003, 31(3):944-952
Structural domains are considered the basic units of protein folding, evolution, function and design. Despite many years of investigation, automatic decomposition of protein structures into structural domains remains a challenging, unsolved problem, and manual inspection still plays a key role in the domain decomposition of a protein structure. We previously developed a computer program, DomainParser, that uses network flow algorithms. It partitions a protein structure into domains accurately when the number of domains is known. However, its performance drops when this number is unknown (overall accuracy is 74.5% on a set of 1317 protein chains). Using various types of structural information, including the hydrophobic moment profile, we have developed an effective method for assessing the most probable number of domains a structure may have. The core of this method is a neural network trained to discriminate correctly partitioned domains from incorrectly partitioned ones. When compared with the manual decomposition results given in the SCOP database, our new algorithm achieves higher decomposition accuracy (81.9%) on the same data set.
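The network-flow idea can be illustrated with a minimum source/sink cut on a residue contact graph, computed with the Edmonds-Karp maximum-flow algorithm (the max-flow value equals the min-cut weight). This is a sketch of the general technique, not DomainParser's actual graph construction: it assumes symmetric contact weights used as capacities and that one seed residue per domain (the source and the sink) is already known, and it omits the neural-network step for choosing the number of domains. The function name and toy contacts are hypothetical.

```python
from collections import deque


def min_cut_partition(n, contacts, source, sink):
    """Split residues into two sets via a minimum source/sink cut.

    contacts: dict mapping (i, j) to a contact weight, used as the capacity
    in both directions. Returns (cut value, residues on the source side).
    """
    cap = [[0.0] * n for _ in range(n)]
    for (i, j), wt in contacts.items():
        cap[i][j] += wt
        cap[j][i] += wt
    flow_value = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path in the residual graph.
        parent = [-1] * n
        parent[source] = source
        queue = deque([source])
        while queue and parent[sink] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if parent[sink] == -1:
            break  # no augmenting path left: the flow is maximal
        # Bottleneck capacity along the path, then push that much flow.
        bottleneck, v = float("inf"), sink
        while v != source:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = sink
        while v != source:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow_value += bottleneck
    # Residues still reachable from the source form the source-side domain.
    source_side = {v for v in range(n) if parent[v] != -1}
    return flow_value, source_side


# Two tightly packed triangles of residues joined by one weak contact.
contacts = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0,
            (3, 4): 1.0, (3, 5): 1.0, (4, 5): 1.0,
            (2, 3): 0.2}
print(min_cut_partition(6, contacts, source=0, sink=5))  # -> (0.2, {0, 1, 2})
```

The choice of source and sink residues determines which cut is found, which is one reason the number of domains, and hence how many cuts to make, has to be decided by a separate procedure.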

10.
Structural genomics projects envision almost routine protein structure determination, which is currently feasible only for small proteins with molecular weights below 25,000 Da. For larger proteins, structural insight can be obtained by breaking them into short segments of amino acid sequence that can fold into native structures even when isolated from the rest of the protein. Such segments are autonomously folding units (AFU) and have sizes suitable for fast structural analysis. Here, we propose to extend an intuitive procedure often used to identify biologically important domains into an automatic method for detecting putative folded protein fragments. The procedure is based on the recognition that large proteins can be regarded as combinations of independent domains conserved among diverse organisms. We have therefore developed a program that reorganizes the output of BLAST searches and detects regions with a large number of similar sequences. To automate the detection process, it is reduced to the simple geometrical problem of recognizing rectangular elevations in a graph that plots the number of similar sequences at each residue of a query sequence. We used our program to quantitatively corroborate the premise that segments with conserved sequences correspond to domains that fold into native structures. We applied our program to a test data set of 99 amino acid sequences containing 150 segments with structures listed in the Protein Data Bank, and thus known to fold into native structures. Overall, the fragments identified by our program have an almost 50% probability of forming a native structure, and comparable results are observed with sequences containing domain linkers classified in SCOP. Furthermore, we verified that our program identifies AFU in libraries from various organisms, and we found a significant number of AFU candidates for structural analysis, covering an estimated 5 to 20% of the genomic databases.
Altogether, these results argue that methods based on sequence similarity can be useful for dissecting large proteins into small autonomously folding domains, and such methods may provide efficient support to structural genomics projects.
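The reduction to recognizing rectangular elevations can be illustrated with a simple run detector over the per-residue count of similar sequences. The thresholds below (a minimum fraction of the maximum count and a minimum segment length) are hypothetical parameters chosen for illustration, not the published program's criteria, and parsing BLAST output into query ranges is not shown.

```python
def coverage_profile(seq_len, hits):
    """Count how many similar sequences align over each query residue.

    hits: (start, end) query ranges, 0-based and end-exclusive, e.g. taken
    from BLAST alignments.
    """
    profile = [0] * seq_len
    for start, end in hits:
        for i in range(start, end):
            profile[i] += 1
    return profile


def find_elevations(profile, min_fraction=0.5, min_length=40):
    """Return (start, end) runs where the count stays above a threshold."""
    threshold = min_fraction * max(profile) if profile else 0
    runs, start = [], None
    for i, count in enumerate(profile + [0]):  # the sentinel closes a final run
        if count > threshold and start is None:
            start = i
        elif count <= threshold and start is not None:
            if i - start >= min_length:
                runs.append((start, i))
            start = None
    return runs


# Hypothetical 200-residue query: two conserved blocks plus a few
# full-length hits that raise the baseline everywhere.
hits = [(20, 100)] * 10 + [(0, 200)] * 3 + [(130, 190)] * 8
print(find_elevations(coverage_profile(200, hits)))  # -> [(20, 100), (130, 190)]
```

Each returned run is a candidate autonomously folding segment whose boundaries could then be checked against known structures.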

11.
Protein domains are conspicuous structural units in globular proteins, and their identification has been a topic of intense biochemical interest dating back to the earliest crystal structures. Numerous disparate domain identification algorithms have been proposed, all involving some combination of visual intuition and/or structure-based decomposition. Instead, we present a rigorous, thermodynamically based approach that redefines domains as cooperative chain segments. In greater detail, most small proteins fold with high cooperativity, meaning that the equilibrium population is dominated by completely folded and unfolded molecules, with a negligible subpopulation of partially folded intermediates. Here, domains are equated with chain segments that retain full cooperativity when excised from their parent structures. Implementing this approach computationally, we identified the domains in a large representative set of proteins; all are consistent with experimental findings. Our reframed interpretation of a protein domain transforms an indeterminate structural phenomenon into a quantifiable molecular property, grounded in solution thermodynamics.

12.
Protein structural annotation and classification is an important and challenging problem in bioinformatics. Analysis of sequence-structure correspondences is critical for a better understanding of a protein's structure, function, and interactions with other molecules. Clustering of protein domains based on their structural similarities provides valuable information for protein classification schemes. In this article, we attempt to determine whether structure information alone is sufficient to adequately classify protein structures. We present an algorithm that identifies regions of structural similarity within a given set of protein structures and uses those regions for clustering. In our approach, called STRALCP (STRucture ALignment-based Clustering of Proteins), we generate detailed information about global and local similarities between pairs of protein structures, identify fragments (spans) that are structurally conserved among proteins, and use these spans to group the structures accordingly. We also provide a web server at http://as2ts.llnl.gov/AS2TS/STRALCP/ for selecting protein structures, calculating structurally conserved regions and performing automated clustering.

13.
Domains are the main structural and functional units of larger proteins. They tend to be contiguous in primary structure and can fold and function independently. It has been observed that 10–20% of all encoded proteins contain duplicated domains, and the average pairwise sequence identity between the duplicates is usually low. In the present study, we have analyzed the structural similarity between domain repeats of proteins with known structures in the Protein Data Bank using structure-based inter-residue interaction measures such as the number of long-range contacts, surrounding hydrophobicity, and pairwise interaction energy. We used the RADAR program to detect repeats in protein sequences and validated them using Pfam domain assignments. The sequence identity between the repeats ranges from 20 to 40%, and their secondary structural elements are well conserved. The number of long-range contacts, surrounding hydrophobicity, and pairwise interaction energy of the domain repeats clearly reveal conservation of the 3-D structural environment in the repeats. The proportions of mainchain–mainchain hydrogen bonds and hydrophobic interactions are also highly conserved between the repeats. The present study suggests that computing these structure-based parameters gives better clues about the tertiary environment of domain repeats. Folding rates of the individual repeated domains, predicted from the long-range order parameter, correlate well with most of the experimentally observed folding rates of the analyzed independently folded domains.
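The long-range order parameter mentioned above can be computed directly from Cα coordinates. The sketch uses a commonly cited definition (the number of residue pairs with Cα-Cα distance below 8 Å and sequence separation greater than 12, divided by the chain length); these cut-offs are stated here as assumptions and may differ from the values used in the study, and the regression that converts LRO into a predicted folding rate is not reproduced.

```python
import math


def long_range_order(ca_coords, dist_cutoff=8.0, min_separation=12):
    """Number of long-range contacts divided by chain length.

    ca_coords: list of (x, y, z) C-alpha positions in sequence order.
    A pair (i, j) counts when |i - j| > min_separation and the C-alpha
    atoms lie closer than dist_cutoff angstroms.
    """
    n = len(ca_coords)
    contacts = 0
    for i in range(n):
        for j in range(i + min_separation + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) < dist_cutoff:
                contacts += 1
    return contacts / n if n else 0.0


# An idealized extended chain has no long-range contacts, while two
# antiparallel 10-residue strands 5 A apart (a crude hairpin) do.
line = [(3.8 * k, 0.0, 0.0) for k in range(20)]
hairpin = ([(3.8 * k, 0.0, 0.0) for k in range(10)]
           + [(3.8 * (9 - k), 5.0, 0.0) for k in range(10)])
print(long_range_order(line), long_range_order(hairpin))  # -> 0.0 0.5
```

Because LRO rewards contacts between residues far apart in sequence, structures dominated by such contacts score high, which is the property linked to slower folding in LRO-based rate predictions.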

14.
15.
Structural classification of zinc fingers: survey and summary

16.
Fbxo7 and PI31 contain a conserved FP domain that mediates the homo-/hetero-dimerization of the proteins. The PI31 FP domain may also interact with the F-box motif in Fbxo7. The FP domain-mediated protein–protein interactions are important for the functions of Fbxo7 and PI31. The crystal structures of the Fbxo7 and PI31 FP domains were determined previously, showing that a C-terminal helix in the Fbxo7 FP domain was not present in the PI31 FP domain. Here, we determine the crystal structure of the PI31 FP domain using a longer protein construct. The structure is comparable to the Fbxo7 FP domain (including the C-terminal helix), indicating that the two FP domains share the same global fold. However, the FP domains also harbor their own characteristic structural features, mainly in the longest loop (which has a largely fixed conformation due to extensive hydrogen bonding and hydrophobic interactions) and the C-terminal end regions. The crystal structures also reveal fundamental differences in the modes of protein–protein interactions mediated by the two FP domains: the PI31 FP domain utilizes either an α interface or β interface for homodimeric interaction, whereas the Fbxo7 FP domain utilizes an αβ interface. We perform modeling studies to show that the domain-specific structural features may dictate specific modes of inter-domain interactions. We propose that a heterodimeric interaction would be mediated by an αβ interface consisting of the α-helical and β-sheet surfaces of the Fbxo7 and PI31 FP domains, respectively. We also discuss the structural/functional significance of various modes of FP domain-mediated protein–protein interactions.

17.
Structural genomics projects require strategies for rapidly recognizing protein sequences appropriate for routine structure determination. For large proteins, this strategy includes the dissection of proteins into structural domains that form stable native structures. However, protein dissection essentially remains an empirical and often a tedious process. Here, we describe a simple strategy for rapidly identifying structural domains and assessing their structures. This approach combines the computational prediction of sequence regions corresponding to putative domains with an experimental assessment of their structures and stabilities by NMR and biochemical methods. We tested this approach with nine putative domains predicted from a set of 108 Thermus thermophilus HB8 sequences using PASS, a domain prediction program we previously reported. To facilitate the experimental assessment of the domain structures, we developed a generic 6-hour His-tag-based purification protocol, which enables the sample quality evaluation of a putative structural domain in a single day. As a result, we observed that half of the predicted structural domains were indeed natively folded, as judged by their HSQC spectra. Furthermore, two of the natively folded domains were novel, without related sequences classified in the Pfam and SMART databases, which is a significant result with regard to the ability of structural genomics projects to uniformly cover the protein fold space.

18.
Dengler U, Siddiqui AS, Barton GJ. Proteins, 2001, 42(3):332-344
The 3Dee database of domain definitions was developed as a comprehensive collection of domain definitions for all three-dimensional structures in the Protein Data Bank (PDB). The database includes definitions for complex, multiple-segment and multiple-chain domains as well as simple sequential domains, organized in a structural hierarchy. Two different snapshots of the 3Dee database were analyzed at September 1996 and November 1999. For the November 1999 release, 7,995 PDB entries contained 13,767 protein chains and gave rise to 18,896 domains. The domain sequences clustered into 1,715 domain sequence families, which were further clustered into a conservative 1,199 domain structure families (families with similar folds). The proportion of different domain structure families per domain sequence family increases from 84% for domains 1-100 residues long to 100% for domains greater than 600 residues. This is in keeping with the idea that longer chains will have more alternative folds available to them. Of the representative domains from the domain sequence families, 49% are in the range of 51-150 residues, whereas 64% of the representative chains over 200 residues have more than 1 domain. Of the representative chains, 8.5% are part of multichain domains. The largest multichain domain in the database has 14 chains and 1,400 residues, whereas the largest single-chain domain has 907 residues. The largest number of domains found in a protein is 13. The analysis shows that over the history of the PDB, new domain folds have been discovered at a slower rate than by random selection of all known folds. Between 1992 and 1997, a constant 1 in 11 new domains deposited in the PDB has shown no sequence similarity to a previously known domain sequence family, and only 1 in 15 new domain structures has had a fold that has not been seen previously. 
A comparison of the September 1996 release of 3Dee to the Structural Classification of Proteins (SCOP) showed that the domain definitions agreed for 80% of the representative protein chains. However, 3Dee provided explicit domain boundaries for more proteins. 3Dee is accessible on the World Wide Web at http://barton.ebi.ac.uk/servers/3Dee.html.

19.
Li CH, Ma XH, Chen WZ, Wang CX. Proteins, 2003, 52(1):47-50
An efficient soft docking algorithm is described for predicting the mode of binding between an antibody and its antigen based on the three-dimensional structures of the molecules. The basic tools are the "simplified protein" model and the docking algorithm of Wodak and Janin. The side-chain flexibility of Arg, Lys, Asp, Glu, and Met residues on the protein surface is taken into account. A combined filtering technique is used to select candidate binding modes. After energy minimization, we calculate a scoring function that includes electrostatic and desolvation energy terms. This procedure was applied to targets 04, 05, and 06 of CAPRI, which are complexes of three different camelid antibody VHH variable domains with pig alpha-amylase. For target 06, two native-like structures with a root-mean-square deviation < 4.0 Å relative to the X-ray structure were found within the five top-ranking structures. For targets 04 and 05, our procedure produced models in which more than half of the antigen residues forming the epitope were correctly predicted, albeit with a wrong VHH domain orientation. Thus, our soft docking algorithm is a promising tool for predicting antibody-antigen recognition.

20.
Young MM, Skillman AG, Kuntz ID. Proteins, 1999, 34(3):317-332
We have developed an automatic protein fingerprinting method for the evaluation of protein structural similarities based on secondary structure element compositions, spatial arrangements, lengths, and topologies. This method can rapidly identify proteins sharing structural homologies as we demonstrate with five test cases: the globins, the mammalian trypsinlike serine proteases, the immunoglobulins, the cupredoxins, and the actinlike ATPase domain-containing proteins. Principal components analysis of the similarity distance matrix calculated from an all-by-all comparison of 1,031 unique chains in the Protein Data Bank has produced a distribution of structures within a high-dimensional structural space. Fifty percent of the variance observed for this distribution is bounded by six axes, two of which encode structural variability within two large families, the immunoglobulins and the trypsinlike serine proteases. Many aspects of the spatial distribution remain stable upon reduction of the database to 140 proteins with minimal family overlap. The axes correlated with specific structural families are no longer observed. A clear hierarchy of organization is seen in the arrangement of protein structures in the universe. At the highest level, protein structures populate regions corresponding to the all-alpha, all-beta, and alpha/beta superfamilies. Large protein families are arranged along family-specific axes, forming local densely populated regions within the space. The lowest level of organization is intrafamilial; homologous structures are ordered by variations in peripheral secondary structure elements or by conformational shifts in the tertiary structure.
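To illustrate how an all-by-all distance matrix can be turned into coordinates in a structural space with variance-ranked axes, the sketch below uses classical multidimensional scaling (principal coordinates analysis): it double-centres the squared distances and keeps the leading eigenvectors. This is a closely related, swapped-in technique rather than necessarily the exact PCA procedure of the paper, and the fingerprint distance matrix itself is assumed to be precomputed.

```python
import numpy as np


def principal_coordinates(dist, n_axes=2):
    """Embed items given a symmetric distance matrix (classical MDS).

    Returns coordinates (n_items x n_axes) and the fraction of the total
    positive variance captured by each returned axis.
    """
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ d2 @ centering  # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(b)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    positive = np.clip(eigvals, 0.0, None)  # drop negative (non-Euclidean) parts
    coords = eigvecs[:, :n_axes] * np.sqrt(positive[:n_axes])
    explained = positive[:n_axes] / positive.sum()
    return coords, explained


# Three hypothetical structures whose pairwise distances fit on a line.
points = np.array([[0.0], [1.0], [3.0]])
dist = np.abs(points - points.T)
coords, explained = principal_coordinates(dist, n_axes=1)
print(np.round(np.abs(coords - coords.T), 6))  # recovers the input distances
print(explained)
```

The explained-variance fractions play the same role as the statement that six axes bound half of the observed variance: they show how many axes are needed to describe the distribution of structures.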

