Similar articles
 20 similar articles found (search time: 15 ms)
1.
Tai CH  Sam V  Gibrat JF  Garnier J  Munson PJ  Lee B 《Proteins》2011,79(3):853-866
Domains are basic units of protein structure and essential for exploring protein fold space and structure evolution. With the structural genomics initiative, the number of protein structures in the Protein Data Bank (PDB) is increasing dramatically and domain assignments need to be done automatically. Most existing structural domain assignment programs define domains using the compactness of the domains and/or the number and strength of intra-domain versus inter-domain contacts. Here we present a different approach based on the recurrence of locally similar structural pieces (LSSPs) found by one-against-all structure comparisons with a dataset of 6373 protein chains from the PDB. Residues of the query protein are clustered using LSSPs via three different procedures to define domains. This approach gives results that are comparable to several existing programs that use geometrical and other structural information explicitly. Remarkably, most of the proteins that contribute the LSSPs defining a domain do not themselves contain the domain of interest. This study shows that domains can be defined by a collection of relatively small locally similar structural pieces containing, on average, four secondary structure elements. In addition, it indicates that domains are indeed made of recurrent small structural pieces that are used to build protein structures of many different folds as suggested by recent studies.

2.
Intensive growth in 3D structure data on DNA-protein complexes as reflected in the Protein Data Bank (PDB) demands new approaches to the annotation and characterization of these data and will lead to a new understanding of critical biological processes involving these data. These data and those from other protein structure classifications will become increasingly important for the modeling of complete proteomes. We propose a fully automated classification of DNA-binding protein domains based on existing 3D-structures from the PDB. The classification, by domain, relies on the Protein Domain Parser (PDP) and the Combinatorial Extension (CE) algorithm for structural alignment. The approach involves the analysis of 3D-interaction patterns in DNA-protein interfaces, assignment of structural domains interacting with DNA, clustering of domains based on structural similarity and DNA-interacting patterns. Comparison with existing resources on describing structural and functional classifications of DNA-binding proteins was used to validate and improve the approach proposed here. In the course of our study we defined a set of criteria and heuristics allowing us to automatically build a biologically meaningful classification and define classes of functionally related protein domains. It was shown that taking into consideration interactions between protein domains and DNA considerably improves the classification accuracy. Our approach provides a high-throughput and up-to-date annotation of DNA-binding protein families which can be found at http://spdc.sdsc.edu.

3.
The CATH database of protein domain structures (http://www.biochem.ucl.ac.uk/bsm/cath_new) currently contains 34 287 domain structures classified into 1383 superfamilies and 3285 sequence families. Each structural family is expanded with domain sequence relatives recruited from GenBank using a variety of efficient sequence search protocols and reliable thresholds. This extended resource, known as the CATH-protein family database (CATH-PFDB), contains a total of 310 000 domain sequences classified into 26 812 sequence families. New sequence search protocols have been designed, based on these intermediate sequence libraries, to allow more regular updating of the classification. Further developments include the adaptation of a recently developed method for rapid structure comparison, based on secondary structure matching, for domain boundary assignment. The philosophy behind CATHEDRAL is the recognition of recurrent folds already classified in CATH. Benchmarking of CATHEDRAL, using manually validated domain assignments, demonstrated that 43% of domain boundaries could be completely automatically assigned. This is an improvement on a previous consensus approach for which only 10-20% of domains could be reliably processed in a completely automated fashion. Since domain boundary assignment is a significant bottleneck in the classification of new structures, CATHEDRAL will also help to increase the frequency of CATH updates.

4.
Knowledge of the three-dimensional structure of a protein is essential for describing and understanding its function. Today, a large number of known protein sequences faces a small number of identified structures. Thus, the need arises to predict structure from sequence without using time-consuming experimental identification. In this paper the performance of Support Vector Machines (SVMs) is compared to Neural Networks and to standard statistical classification methods such as Discriminant Analysis and Nearest Neighbor Classification. We show that SVMs can beat the competing methods on a dataset of 268 protein sequences to be classified into a set of 42 fold classes. We discuss misclassification with respect to biological function and similarity. In a second step we examine the performance of SVMs if the embedding is varied from frequencies of single amino acids to frequencies of triplets of amino acids. This work shows that SVMs provide a promising alternative to standard statistical classification and prediction methods in functional genomics.
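
The sequence embedding mentioned in the abstract above (frequencies of single amino acids, or of triplets) can be illustrated with a minimal sketch. The function name `kmer_frequencies`, the fixed alphabet ordering, and the handling of non-standard residues are assumptions made for illustration; the abstract does not specify how the authors built their feature vectors.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed alphabet)

def kmer_frequencies(sequence: str, k: int = 1) -> list[float]:
    """Return relative frequencies of all 20**k k-mers in a fixed order.

    k=1 gives single amino acid frequencies, k=3 gives triplet frequencies.
    Windows containing a non-standard residue are skipped (an assumption).
    """
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    windows = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    counts = Counter(w for w in windows if all(c in AMINO_ACIDS for c in w))
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]
```

With k=1 the vector has 20 entries and with k=3 it has 8000, which shows how quickly the feature space grows as the embedding is varied. A vector of this kind could then be passed to any classifier, such as an SVM.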

5.
Fbxo7 and PI31 contain a conserved FP domain that mediates the homo-/hetero-dimerization of the proteins. The PI31 FP domain may also interact with the F-box motif in Fbxo7. The FP domain-mediated protein–protein interactions are important for the functions of Fbxo7 and PI31. The crystal structures of the Fbxo7 and PI31 FP domains were determined previously, showing that a C-terminal helix in the Fbxo7 FP domain was not present in the PI31 FP domain. Here, we determine the crystal structure of the PI31 FP domain using a longer protein construct. The structure is comparable to the Fbxo7 FP domain (including the C-terminal helix), indicating that the two FP domains share the same global fold. However, the FP domains also harbor their own characteristic structural features, mainly in the longest loop (which has a largely fixed conformation due to extensive hydrogen bonding and hydrophobic interactions) and the C-terminal end regions. The crystal structures also reveal fundamental differences in the modes of protein–protein interactions mediated by the two FP domains: the PI31 FP domain utilizes either an α interface or β interface for homodimeric interaction, whereas the Fbxo7 FP domain utilizes an αβ interface. We perform modeling studies to show that the domain-specific structural features may dictate specific modes of inter-domain interactions. We propose that a heterodimeric interaction would be mediated by an αβ interface consisting of the α-helical and β-sheet surfaces of the Fbxo7 and PI31 FP domains, respectively. We also discuss the structural/functional significance of various modes of FP domain-mediated protein–protein interactions.

6.
A consensus approach for the assignment of structural domains in proteins is presented. The approach combines a number of previously published algorithms, and takes advantage of the elevated accuracy obtained when assignments from the individual algorithms are in agreement. The consensus approach is tested on a data set of 55 protein chains, for which domain assignments from four automated methods were known, and for which crystallographers' assignments had been reported in the literature. Accuracy was found to increase in this test from 72% using individual algorithms to 100% when all four methods were in agreement. However, a consensus prediction using all four methods was only possible for 52% of the dataset. The consensus approach [using three publicly available domain assignment algorithms (PUU, DETECTIVE, DOMAK)] was then used to make domain assignments for a data set of 787 protein chains from the Protein Data Bank. Analysis of the assignments showed 55.7% of assignments could be made automatically, and of these, 13.5% were multi-domain proteins. Of the remaining 44.3% that could not be assigned by the consensus procedure, 90.4% had their domain boundaries assigned correctly by at least one of the algorithms. Once identified, these domains were analyzed for trends in their size and secondary structure class. In addition, the discontinuity of each domain along the protein chain was considered.
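
The idea of accepting an assignment only when independent methods agree can be sketched as follows. This is an illustrative sketch, not the published procedure: the representation of an assignment as a list of boundary residue positions, the function name `consensus_assignment`, and the tolerance of 10 residues are all assumptions, since the abstract does not state the agreement criterion.

```python
def consensus_assignment(assignments: dict[str, list[int]], tolerance: int = 10):
    """Return averaged domain boundaries if all methods agree, else None.

    `assignments` maps a method name to the residue positions at which it
    places domain boundaries (an empty list means a single-domain chain).
    Methods "agree" when they report the same number of boundaries and each
    boundary lies within `tolerance` residues of the first method's boundary
    (a hypothetical criterion chosen for illustration).
    """
    boundary_lists = [sorted(b) for b in assignments.values()]
    if not boundary_lists:
        return None
    reference = boundary_lists[0]
    for other in boundary_lists[1:]:
        if len(other) != len(reference):
            return None  # methods disagree on the number of domains
        if any(abs(a - b) > tolerance for a, b in zip(reference, other)):
            return None  # a boundary is placed too far apart
    n = len(boundary_lists)
    return [round(sum(column) / n) for column in zip(*boundary_lists)]
```

For example, `consensus_assignment({"PUU": [100], "DETECTIVE": [104], "DOMAK": [98]})` yields a single averaged boundary, while any disagreement returns `None`, marking the chain as one that would need manual inspection.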

7.
Protein disulfide isomerase (PDI) is an enzyme that promotes protein folding by catalyzing disulfide bridge isomerization. PDI and its relatives form a diverse protein family whose members are characterized by thioredoxin-like (TX) domains in the primary structures. The family was classified into four classes by the number and the relative positions of the TX domains. To investigate the evolution of the domain structures, we aligned the amino acid sequences of the TX domains, and the molecular phylogeny was examined by the NJ and ML methods. We found that all of the current members of the PDI family have evolved from an ancestral enzyme, which has two TX domains in the primary structure. The diverse domain structures of the members have been generated through domain duplications and deletions.

8.
Proteins are biochemical compounds made up of one or more polypeptides in a specific order, typically folded into a functionally active form. Proteins are categorized into four different structural classes according to the topology of α-helices and β-strands. In this study, we modeled these four structural classes as undirected networks, with amino acids as nodes and the interactions between them as edges. The results suggest that the basic protein classes can be easily recognized and distinguished using protein contact maps (PCMs). In studying the globin-like fold, the helix-loop-helix region contacts were seen to follow a unique pattern, and these remained in all the folds. Further, analysis of the averaged diagonal contacts showed that these contacts were higher in α/β proteins than in the other classes. Interestingly, we noticed that anti-parallel beta sheets were dominant in the all-β and α + β classes, leading to similar diagonal patterns. The network properties of all four basic classes were analyzed and found to show the small-world property. The findings suggest that PCMs may help classify protein structural classes and also help evaluate predicted protein structures.
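
The network view described above, with residues as nodes and contacts as edges, can be sketched as a binary contact map. The use of Cα coordinates, the 8 Å distance cutoff, and the function names are assumptions for illustration; the abstract does not state how contacts were defined.

```python
import math

def contact_map(ca_coords, cutoff=8.0, min_separation=1):
    """Build a symmetric 0/1 contact map from C-alpha coordinates.

    Residues i and j (with j - i >= min_separation) are in contact when
    their distance is at most `cutoff` angstroms (an assumed threshold).
    """
    n = len(ca_coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = 1
    return cmap

def degrees(cmap):
    """Node degree of each residue in the undirected contact network."""
    return [sum(row) for row in cmap]
```

The resulting matrix is the adjacency matrix of the undirected residue network, so standard graph measures such as degree, clustering, or path length can be computed from it.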

9.
Based on the 210 non-homologous proteins (domains) classified manually by Michie et al. (J. Mol. Biol. 262, 168-185, 1996), a new structure classification criterion for globular proteins relying on the content of helix/strand has been proposed, using a quadratic discriminant method. Each protein is classified into one of three classes, i.e. the alpha class, the beta class and the alphabeta class (including the alpha/beta and alpha+beta classes). According to the new structure classification criterion, of the 210 proteins in the training set, 207 are correctly classified and thus the accuracy is 207/210=98.57%. Multiple cross-validation tests are performed. The jackknife test shows that of the 210 proteins 207 are correctly classified with an accuracy of 98.57%. To test the method further, of 3577 proteins (domains) extracted from SCOP, 91.39% are correctly reclassified by the new classification criterion. On average, the accuracy of the new criterion is about 8 percentage points higher than that of the criterion proposed by Nakashima et al. (J. Biochem. 99, 153-162, 1986). Our result shows that the classification based solely on structures is basically consistent with that combining both structural and evolutionary information. A more complete automated classification scheme should consider both structures and evolutionary relationships. The methodology presented provides an appropriate mathematical format to reach this goal.

10.
Major histocompatibility complex (MHC) class I molecules play a pivotal role in the immune defense system, presenting antigen peptides to cytotoxic CD8+ T lymphocytes. Most vertebrates possess multiple MHC class I loci, but analysis of their evolutionary relationships between distantly related species is difficult because genetic events such as gene duplication, deletion, recombination, and/or conversion have occurred frequently in these genes. Human MHC class I genes have been conserved only within the primates for up to 46-66 My. Here, we performed comprehensive analysis of the MHC class I genes of the medaka fish, Oryzias latipes, and found that they could be classified into four groups of ancient origin. In phylogenetic analysis using these genes and the classical and nonclassical class I genes of other teleost fishes, three extracellular domains of the class I genes showed quite different evolutionary histories. The α1 domains generated four deeply diverged lineages corresponding to four medaka class I groups with high bootstrap values. These lineages were shared with salmonid and/or other acanthopterygian class I genes, unveiling the orthologous relationships between the classical MHC class I genes of medaka and salmonids, which diverged approximately 260 Ma. This suggested that the lineages must have diverged in the early days of the euteleost evolution and have been maintained for a long time in their genome. In contrast, the α3 domains clustered by species or fish groups, regardless of classical or nonclassical gene types, suggesting that this domain was homogenized in each species during prolonged evolution, possibly retaining the potential for CD8 binding even in the nonclassical genes. On the other hand, the α2 domains formed no apparent clusters with the α1 lineages or with species, suggesting that they were diversified partly by interlocus gene conversion, and that the α1 and α2 domains evolved separately. Such an evolutionary mode is characteristic of the teleost MHC class I genes and might have contributed to the long-term conservation of the α1 domain.

11.
Cell surface receptors of the integrin family are pivotal to cell adhesion and migration. The activation state of heterodimeric αβ integrins is correlated to the association state of the single-pass α and β transmembrane domains. The association of integrin αIIbβ3 transmembrane domains, resulting in an inactive receptor, is characterized by the asymmetric arrangement of a straight (αIIb) and tilted (β3) helix relative to the membrane in congruence to the dissociated structures. This allows for a continuous association interface centered on helix-helix glycine-packing and an unusual αIIb(GFF) structural motif that packs the conserved Phe-Phe residues against the β3 transmembrane helix, enabling αIIb(D723)β3(R995) electrostatic interactions. The transmembrane complex is further stabilized by the inactive ectodomain, thereby coupling its association state to the ectodomain conformation. In combination with recently determined structures of an inactive integrin ectodomain and an activating talin/β complex that overlap with the αβ transmembrane complex, a comprehensive picture of integrin bi-directional transmembrane signaling has emerged.

12.
Structural trees for large protein superfamilies, such as β proteins with the aligned β sheet packing, β proteins with the orthogonal packing of α helices, two-layer and three-layer α/β proteins, have been constructed. The structural motifs having unique overall folds and a unique handedness are taken as root structures of the trees. The larger protein structures of each superfamily are obtained by a stepwise addition of α helices and/or β strands to the corresponding root motif, taking into account a restricted set of rules inferred from known principles of the protein structure. Among these rules, prohibition of crossing connections, attention to handedness and compactness, and a requirement for α helices to be packed in α-helical layers and β strands in β layers are the most important. Proteins and domains whose structures can be obtained by stepwise addition of α helices and/or β strands to the same root motif can be grouped into one structural class or a superfamily. Proteins and domains found within branches of a structural tree can be grouped into subclasses or subfamilies. Levels of structural similarity between different proteins can easily be observed by visual inspection. Within one branch, protein structures having a higher position in the tree include the structures located lower. Proteins and domains of different branches have the structure located in the branching point as the common fold. Proteins 28:241–260, 1997. © 1997 Wiley-Liss Inc.

13.
The knowledge collated from the known protein structures has revealed that proteins are usually folded into four structural classes: all-α, all-β, α/β and α + β. A number of methods have been proposed to predict a protein's structural class from its primary structure; however, it has been observed that these methods fail or perform poorly in the cases of distantly related sequences. In this paper, we propose a new method for protein structural class prediction using a dataset of low-homology (twilight-zone) protein sequences. Since protein structural class prediction is a typical classification problem, we have developed a Support Vector Machine (SVM)-based method for protein structural class prediction that uses features derived from the predicted secondary structure and predicted burial information of amino acid residues. The examination of individual features as well as feature combinations revealed that the combination of secondary structural content and the secondary structural and solvent accessibility state frequencies of amino acids gave rise to the best leave-one-out cross-validation accuracy of ~81%, which is comparable to the best accuracy reported in the literature so far.

14.
15.
We present CATHEDRAL, an iterative protocol for determining the location of previously observed protein folds in novel multidomain protein structures. CATHEDRAL builds on the features of a fast secondary-structure-based method (using graph theory) to locate known folds within a multidomain context and a residue-based, double-dynamic programming algorithm, which is used to align members of the target fold groups against the query protein structure to identify the closest relative and assign domain boundaries. To increase the fidelity of the assignments, a support vector machine is used to provide an optimal scoring scheme. Once a domain is verified, it is excised, and the search protocol is repeated in an iterative fashion until all recognisable domains have been identified. We have performed an initial benchmark of CATHEDRAL against other publicly available structure comparison methods using a consensus dataset of domains derived from the CATH and SCOP domain classifications. CATHEDRAL shows superior performance in fold recognition and alignment accuracy when compared with many equivalent methods. If a novel multidomain structure contains a known fold, CATHEDRAL will locate it in 90% of cases, with <1% false positives. For nearly 80% of assigned domains in a manually validated test set, the boundaries were correctly delineated within a tolerance of ten residues. For the remaining cases, previously classified domains were very remotely related to the query chain so that embellishments to the core of the fold caused significant differences in domain sizes and manual refinement of the boundaries was necessary. To put this performance in context, a well-established sequence method based on hidden Markov models was only able to detect 65% of domains, with 33% of the subsequent boundaries assigned within ten residues. Since, on average, 50% of newly determined protein structures contain more than one domain unit, and typically 90% or more of these domains are already classified in CATH, CATHEDRAL will considerably facilitate the automation of protein structure classification.

16.
Kurgan LA  Zhang T  Zhang H  Shen S  Ruan J 《Amino acids》2008,35(3):551-564
Structural class categorizes proteins based on the amount and arrangement of the constituent secondary structures. The knowledge of structural classes is applied in numerous important predictive tasks that address structural and functional features of proteins. We propose novel structural class assignment methods that use one-dimensional (1D) secondary structure as the input. The methods are designed based on a large set of low-identity sequences for which secondary structure is predicted from their sequence (PSSAsc model) or assigned based on their tertiary structure (SSAsc). The secondary structure is encoded using a comprehensive set of features describing count, content, and size of secondary structure segments, which are fed into a small decision tree that uses ten features to perform the assignment. The proposed models were compared against seven secondary structure-based and ten sequence-based structural class predictors. Using the 1D secondary structure, SSAsc and PSSAsc can assign proteins to the four main structural classes, while the existing secondary structure-based assignment methods can predict only three classes. Empirical evaluation shows that the proposed models are quite promising. Using the structure-based assignment performed in SCOP (structural classification of proteins) as the gold standard, the accuracy of SSAsc and PSSAsc equals 76 and 75%, respectively. We show that the use of the secondary structure predicted from the sequence as an input does not have a detrimental effect on the quality of structural class assignment when compared with using secondary structure derived from tertiary structure. Therefore, PSSAsc can be used to perform the automated assignment of structural classes based on the sequences.

17.
We probe the stability and near-native energy landscape of protein fold space using powerful conformational sampling methods together with simple reduced models and statistical potentials. Fold space is represented by a set of 280 protein domains spanning all topological classes and having a wide range of lengths (33-300 residues), amino acid composition, and number of secondary structural elements. The degrees of freedom are taken as the loop torsion angles. This choice preserves the native secondary structure but allows the tertiary structure to change. The proteins are represented by three-point per residue, three-dimensional models with statistical potentials derived from a knowledge-based study of known protein structures. When this space is sampled by a combination of parallel tempering and equi-energy Monte Carlo, we find that the three-point model captures the known stability of protein native structures with stable energy basins that are near-native (all α: 4.77 Å, all β: 2.93 Å, α/β: 3.09 Å, α+β: 4.89 Å on average and within 6 Å for 71.41%, 92.85%, 94.29% and 64.28% for the all-α, all-β, α/β and α+β classes, respectively). Denatured structures also occur and these have interesting structural properties that shed light on the different landscape characteristics of α and β folds. We find that α/β proteins with alternating α and β segments (such as the β-barrel) are more stable than proteins in other fold classes.

18.

Background  

Partitioning of a protein into structural components, known as domains, is an important initial step in protein classification and for functional and evolutionary studies. While the systematic assignments of domains by human experts exist (CATH and SCOP), the introduction of high throughput technologies for structure determination threatens to overwhelm expert approaches. A variety of algorithmic methods have been developed to expedite this process, allowing almost instant structural decomposition into domains. The performance of algorithmic methods can approach 85% agreement on the number of domains with the consensus reached by experts. However, each algorithm takes a somewhat different conceptual approach, each with unique strengths and weaknesses. Currently there is no simple way to automatically compare assignments from different structure-based domain assignment methods, thereby providing a comprehensive understanding of possible structure partitioning as well as providing some insight into the tendencies of particular algorithms. Most importantly, a consensus assignment drawn from multiple assignment methods can provide a singular and presumably more accurate view.

19.
Classification is central to many studies of protein structure, function, and evolution. This article presents a strategy for classifying protein three-dimensional structures. Methods for and issues related to secondary structure, domain, and class assignment are discussed, in addition to methods for the comparison of protein three-dimensional structures. Strategies for assigning protein domains to particular folds and homologous superfamilies are then described in the context of the currently available classification schemes. Two examples (adenylate cyclase/DNA polymerase and glycogen phosphorylase/β-glucosyltransferase) are presented to illustrate problems associated with protein classification.

20.
In protein structure prediction, a central problem is defining the structure of a loop connecting 2 secondary structures. This problem frequently occurs in homology modeling, fold recognition, and in several strategies in ab initio structure prediction. In our previous work, we developed a classification database of structural motifs, ArchDB. The database contains 12,665 clustered loops in 451 structural classes with information about phi-psi angles in the loops and 1492 structural subclasses with the relative locations of the bracing secondary structures. Here we evaluate the extent to which sequence information in the loop database can be used to predict loop structure. Two sequence profiles were used, a HMM profile and a PSSM derived from PSI-BLAST. A jack-knife test was made removing homologous loops using SCOP superfamily definition and predicting afterwards against recalculated profiles that only take into account the sequence information. Two scenarios were considered: (1) prediction of structural class with application in comparative modeling and (2) prediction of structural subclass with application in fold recognition and ab initio. For the first scenario, structural class prediction was made directly over loops with X-ray secondary structure assignment, and if we consider the top 20 classes out of 451 possible classes, the best accuracy of prediction is 78.5%. In the second scenario, structural subclass prediction was made over loops using PSI-PRED (Jones, J Mol Biol 1999;292:195-202) secondary structure prediction to define loop boundaries, and if we take into account the top 20 subclasses out of 1492, the best accuracy is 46.7%. Accuracy of loop prediction was also evaluated by means of RMSD calculations.
