Similar literature
 20 similar documents found (search time: 31 ms)
1.
Recognizing the fold of a protein structure   (Total citations: 3; self-citations: 0; citations by others: 3)
This paper reports a graph-theoretic program, GRATH, that rapidly and accurately matches a novel structure against a library of domain structures to find the most similar ones. GRATH generates distributions of scores by comparing the novel domain against the different types of folds that have been classified previously in the CATH database of structural domains. GRATH uses a measure of similarity that details the geometric information, number of secondary structures and number of residues within secondary structures, that any two protein structures share. Although GRATH builds on well established approaches for secondary structure comparison, a novel scoring scheme has been introduced to allow ranking of any matches identified by the algorithm. More importantly, we have benchmarked the algorithm using a large dataset of 1702 non-redundant structures from the CATH database which have already been classified into fold groups, with manual validation. This has facilitated introduction of further constraints, optimization of parameters and identification of reliable thresholds for fold identification. Following these benchmarking trials, the correct fold can be identified with the top score with a frequency of 90%. It is identified within the ten most likely assignments with a frequency of 98%. GRATH has been made available through a server (http://www.biochem.ucl.ac.uk/cgi-bin/cath/Grath.pl). GRATH's speed and accuracy mean that it can be used as a reliable front-end filter for the more accurate, but computationally expensive, residue based structure comparison algorithm SSAP, currently used to classify domain structures in the CATH database. With an increasing number of structures being solved by the structural genomics initiatives, the GRATH server also provides an essential resource for determining whether newly determined structures are related to any known structures from which functional properties may be inferred.
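The "top score" and "ten most likely assignments" frequencies quoted above are top-k identification rates. The following is a minimal sketch of how such a rate could be computed from ranked fold lists; it is not GRATH's code, and the function name, domain names and fold identifiers are hypothetical illustrations.

```python
from typing import Dict, List


def top_k_rate(ranked: Dict[str, List[str]], truth: Dict[str, str], k: int) -> float:
    """Fraction of queries whose true fold appears among the first k ranked folds."""
    if not ranked:
        raise ValueError("no queries supplied")
    hits = sum(1 for query, folds in ranked.items() if truth[query] in folds[:k])
    return hits / len(ranked)


# Hypothetical toy data: for each query domain, fold identifiers ordered by
# descending match score (best match first). Not taken from the paper.
ranked_folds = {
    "domA": ["3.40.50", "2.60.40", "1.10.10"],
    "domB": ["2.60.40", "3.40.50", "1.10.10"],
    "domC": ["1.10.10", "3.30.70", "2.60.40"],
}
true_fold = {"domA": "3.40.50", "domB": "3.40.50", "domC": "2.60.40"}

if __name__ == "__main__":
    for k in (1, 3):
        print(f"top-{k} rate: {top_k_rate(ranked_folds, true_fold, k):.2f}")
```

Under this reading, the abstract's figures would correspond to `k=1` (90%) and `k=10` (98%) over the 1702-structure benchmark.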

2.
The CATH database of domain structures has been used to explore the structural variation of homologous domains in 294 well populated domain structure superfamilies, each containing at least three sequence diverse relatives. Our analyses confirm some previously detected trends relating sequence divergence to structural variation but for a much larger dataset and in some superfamilies the new data reveal exceptional structural variation. Use of a new algorithm (2DSEC) to analyse variability in secondary structure compositions across a superfamily sheds new light on how structures evolve. 2DSEC detects inserted secondary structures that embellish the core of conserved secondary structures found throughout the superfamily. Analysis showed that for 56% of highly populated superfamilies (>9 sequence diverse relatives), there are twofold or more increases in the numbers of secondary structures in some relatives. In some families fivefold increases occur, sometimes modifying the fold of the domain. Manual inspection of secondary structure insertions or embellishments in 48 particularly variable superfamilies revealed that although these insertions were usually discontiguous in the sequence they were often co-located in 3D resulting in a larger structural motif that often modified the geometry of the active site or the surface conformation promoting diverse domain partnerships and protein interactions. These observations, supported by automatic analysis of all well populated CATH families, suggest that accretion of small secondary structure insertions may provide a simple mechanism for evolving new functions in diverse relatives. Some layered domain architectures (e.g. mainly-beta and alpha-beta sandwiches) that recur highly in the genomes more frequently exploit these types of embellishments to modify function. In these architectures, aggregation occurs most often at the edges, top or bottom of the beta-sheets. 
Information on structural variability across domain superfamilies has been made available through the CATH Dictionary of Homologous Structures (DHS).

3.
There are more than 200 completed genomes and over 1 million nonredundant sequences in public repositories. Although the structural data are more sparse (approximately 13,000 nonredundant structures solved to date), several powerful sequence-based methodologies now allow these structures to be mapped onto related regions in a significant proportion of genome sequences. We review a number of publicly available strategies for providing structural annotations for genome sequences, and we describe the protocol adopted to provide CATH structural annotations for completed genomes. In particular, we assess the performance of several sequence-based protocols employing Hidden Markov model (HMM) technologies for superfamily recognition, including a new approach (SAMOSA [sequence augmented models of structure alignments]) that exploits multiple structural alignments from the CATH domain structure database when building the models. Using a data set of remote homologs detected by structure comparison and manually validated in CATH, a single-seed HMM library was able to recognize 76% of the data set. Including the SAMOSA models in the HMM library showed little gain in homolog recognition, although a slight improvement in alignment quality was observed for very remote homologs. However, using an expanded 1D-HMM library, CATH-ISL increased the coverage to 86%. The single-seed HMM library has been used to annotate the protein sequences of 120 genomes from all three major kingdoms, allowing up to 70% of the genes or partial genes to be assigned to CATH superfamilies. It has also been used to recruit sequences from Swiss-Prot and TrEMBL into CATH domain superfamilies, expanding the CATH database eightfold.

4.
The secondary structures of DnaK and the mutant DnaK756 heat-shock proteins from Escherichia coli have been investigated by Fourier transform infrared spectroscopy. The analysis of infrared data showed that DnaK and DnaK756 proteins have different secondary structures that are not affected by the presence of ATP or beta, gamma-methyleneadenosine 5'-triphosphate. The infrared data indicate also that the tertiary structures of DnaK and DnaK756 proteins are different and that DnaK protein undergoes conformational changes in its tertiary structure not only during binding of ATP but also during ATP hydrolysis. Using fluorescence spectroscopy of a single tryptophan located in the N-terminal domain of DnaK protein and fluorescence of 1,1'-bis(4-anilino)naphthalene-5,5'-disulfonic acid, which interacts with hydrophobic domains of DnaK protein, we were able to distinguish between two conformational states of DnaK protein. After binding of triphosphonucleotides, the C-terminal domain of DnaK protein changes in tertiary structure in such a way that fewer hydrophobic segments are exposed on the surface of the protein. After ATP hydrolysis, the number of hydrophobic segments on the surface of the protein is further reduced, and moreover the tertiary structure of the N-terminal domain of the protein changes. These data are discussed in terms of structural and functional relationships of both DnaK and DnaK756 proteins.

5.
Exponential growth in the number of available protein sequences is unmatched by the slower growth in the number of structures. As a result, the development of efficient and fast protein secondary structure prediction methods is essential for the broad comprehension of protein structures. Computational methods that can efficiently determine secondary structure can in turn facilitate protein tertiary structure prediction, since most methods rely initially on secondary structure predictions. Recently, we have developed a fast learning optimized prediction methodology (FLOPRED) for predicting protein secondary structure (Saraswathi et al. in JMM 18:4275, 2012). Data are generated by using knowledge-based potentials combined with structure information from the CATH database. A neural network-based extreme learning machine (ELM) and advanced particle swarm optimization (PSO) are used with this data to obtain better and faster convergence to more accurate secondary structure predicted results. A five-fold cross-validated testing accuracy of 83.8% and a segment overlap (SOV) score of 78.3% are obtained in this study. Secondary structure predictions and their accuracy are usually presented for three secondary structure elements: α-helix, β-strand and coil but rarely have the results been analyzed with respect to their constituent amino acids. In this paper, we use the results obtained with FLOPRED to provide detailed behaviors for different amino acid types in the secondary structure prediction. We investigate the influence of the composition, physico-chemical properties and position specific occurrence preferences of amino acids within secondary structure elements. In addition, we identify the correlation between these properties and prediction accuracy. The present detailed results suggest several important ways that secondary structure predictions can be improved in the future that might lead to improved protein design and engineering.
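To illustrate what analyzing accuracy "with respect to constituent amino acids" can look like, here is a minimal sketch that computes an overall per-residue three-state accuracy and then breaks it down by amino acid type. It assumes the reported accuracy is a per-residue (Q3-style) measure, which the abstract does not state explicitly; the function names and toy strings are hypothetical, and the SOV score is not implemented here.

```python
from collections import defaultdict
from typing import Dict


def q3_accuracy(observed: str, predicted: str) -> float:
    """Fraction of residues whose predicted state (H, E or C) matches the observed one."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("state strings must be non-empty and of equal length")
    return sum(o == p for o, p in zip(observed, predicted)) / len(observed)


def accuracy_by_amino_acid(sequence: str, observed: str, predicted: str) -> Dict[str, float]:
    """Per-residue accuracy grouped by one-letter amino acid code."""
    if not (len(sequence) == len(observed) == len(predicted)):
        raise ValueError("sequence and state strings must have equal length")
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for aa, o, p in zip(sequence, observed, predicted):
        total[aa] += 1
        correct[aa] += int(o == p)
    return {aa: correct[aa] / total[aa] for aa in total}


# Hypothetical toy example (not data from the paper).
toy_sequence = "AAGLLGAV"
toy_observed = "HHCEECHH"
toy_predicted = "HHCEHCHC"

if __name__ == "__main__":
    print(q3_accuracy(toy_observed, toy_predicted))
    print(accuracy_by_amino_acid(toy_sequence, toy_observed, toy_predicted))
```

Grouping by residue type exposes differences that a single overall figure hides, which is the kind of breakdown the abstract describes.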

6.
Abaturov LV, Nosova NG. Biofizika, 2007, 52(6):978-996
The information on the high-temperature proteolytic degradation of RNase A has been analyzed. It has been shown that a few peptide bonds primarily cleaved by trypsin, chymotrypsin and thermolysin are localized only in the N-terminal part of structural domain II of the native molecule. The same peptide bonds are cleaved by proteases at the highest rate upon denaturation in the presence of trifluoroethanol or renaturation from concentrated urea solutions, and after disorganization of the native structure by reduction of all four S-S bonds of RNase A. According to the data on hydrogen exchange in the native RNase A molecule, the dynamic stability of the tertiary structure of domain II is lower than that of domain I because of the smaller number of internal bulky nonpolar residues Val, Ile, and Phe. For the same reason, this part of the molecule in different nonnative forms of RNase A is less compact and more flexible and is cleaved at the highest rate in the segment 31-39, which is enriched in the long cationic residues Lys and Arg. A common feature of the conformation of the flexible disordered backbone of all RNase A nonnative structures considered is the predominance of short PPII helices, which provides a high rate of restoration of the native secondary and tertiary structures upon renaturation or self-organization and global fluctuations of the native structure revealed by hydrogen exchange and proteolytic degradation.

7.
The rate constants for the processes that lead to local opening and closing of the structures around hydrogen bonds in native proteins have been determined for most of the secondary structure hydrogen bonds in the four-helix protein acyl coenzyme A binding protein. In an analysis that combines these results with the energies of activation of the opening processes and the stability of the local structures, three groups of residues in the protein structure have been identified. In one group, the structures around the hydrogen bonds have frequent openings, every 600 to 1,500 s, and long lifetimes in the open state, around 1 s. In another group of local structures, the local opening is a very rare event that takes place only every 15 to 60 h. For these the lifetime in the open state is also around 1 s. The majority of local structures have lifetimes between 2,000 and 20,000 s and relatively short lifetimes of the open state in the range between 30 and 400 ms. Mapping of these groups of amides to the tertiary structure shows that the openings of the local structures are not cooperative at native conditions, and they rarely if ever lead to global unfolding. The results suggest a mechanism of hydrogen exchange by progressive local openings.
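The opening intervals and open-state lifetimes quoted above can be turned into an approximate local stability. The sketch below assumes a simple two-state closed/open model, treats the interval between openings as the mean closed-state lifetime, and uses 298.15 K; none of these assumptions is stated in the abstract, so the numbers are only an order-of-magnitude illustration.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def opening_free_energy(closed_time_s: float, open_lifetime_s: float,
                        temperature_k: float = 298.15) -> float:
    """Free energy of local opening (kcal/mol) under an assumed two-state model.

    k_open  = 1 / closed_time_s    (opening rate constant)
    k_close = 1 / open_lifetime_s  (closing rate constant)
    K_open  = k_open / k_close, and dG = -RT ln K_open.
    """
    if closed_time_s <= 0 or open_lifetime_s <= 0:
        raise ValueError("lifetimes must be positive")
    k_open = 1.0 / closed_time_s
    k_close = 1.0 / open_lifetime_s
    return -R_KCAL * temperature_k * math.log(k_open / k_close)


if __name__ == "__main__":
    # Frequently opening group: one opening every 600 s, about 1 s open.
    print(f"{opening_free_energy(600.0, 1.0):.2f} kcal/mol")
    # Rarely opening group: one opening every 15 h, about 1 s open.
    print(f"{opening_free_energy(15 * 3600.0, 1.0):.2f} kcal/mol")
```

Because both groups share an open-state lifetime of about 1 s, the difference in stability under this model comes entirely from the opening rate.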

8.
Dividing protein structures into domains has proven useful for more accurate structural and functional characterization of proteins. Here, we develop a method, called DDOMAIN, that divides structure into DOMAINs using a normalized contact-based domain-domain interaction profile. Results of DDOMAIN are compared to AUTHORS annotations (domain definitions are given by the authors who solved protein structures), as well as to popular SCOP and CATH annotations by human experts and automatic programs. DDOMAIN's automatic annotations are most consistent with the AUTHORS annotations (90% agreement in number of domains and 88% agreement in both number of domains and at least 85% overlap in domain assignment of residues) if its three adjustable parameters are trained by the AUTHORS annotations. By comparison, the agreement is 83% (81% with at least 85% overlap criterion) between SCOP-trained DDOMAIN and SCOP annotations and 77% (73%) between CATH-trained DDOMAIN and CATH annotations. The agreement between DDOMAIN and AUTHORS annotations goes beyond single-domain proteins (97%, 82%, and 56% for single-, two-, and three-domain proteins, respectively). For an "easy" data set of proteins whose CATH and SCOP annotations agree with each other in number of domains, the agreement is 90% (89%) between "easy-set"-trained DDOMAIN and CATH/SCOP annotations. The consistency between SCOP-trained DDOMAIN and SCOP annotations is superior to two other recently developed, SCOP-trained, automatic methods PDP (protein domain parser), and DomainParser 2. We also tested a simple consensus method made of PDP, DomainParser 2, and DDOMAIN and a different version of DDOMAIN based on a more sophisticated statistical energy function. The DDOMAIN server and its executable are available in the services section on http://sparks.informatics.iupui.edu.
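The agreement criterion above (same number of domains, plus at least 85% overlap in residue assignment) can be sketched as follows. The abstract does not define how overlap is computed, so this sketch assumes one plausible definition: the fraction of reference residues that land in the matching predicted domain under the best one-to-one pairing of domains. All names and the toy boundaries are hypothetical.

```python
from itertools import permutations
from typing import List, Set


def best_overlap(predicted: List[Set[int]], reference: List[Set[int]]) -> float:
    """Best fraction of reference residues assigned to the paired predicted domain.

    Tries every one-to-one pairing of domains, which is fine for the small
    domain counts typical of a single chain.
    """
    if len(predicted) != len(reference):
        raise ValueError("overlap is only defined here for equal domain counts")
    total = sum(len(domain) for domain in reference)
    if total == 0:
        raise ValueError("reference annotation contains no residues")
    best_shared = 0
    for pairing in permutations(range(len(predicted))):
        shared = sum(len(reference[i] & predicted[j]) for i, j in enumerate(pairing))
        best_shared = max(best_shared, shared)
    return best_shared / total


def annotations_agree(predicted: List[Set[int]], reference: List[Set[int]],
                      min_overlap: float = 0.85) -> bool:
    """True if domain counts match and the overlap reaches the threshold."""
    return (len(predicted) == len(reference)
            and best_overlap(predicted, reference) >= min_overlap)


# Hypothetical two-domain chain of 200 residues (not data from the paper).
reference_domains = [set(range(1, 101)), set(range(101, 201))]
close_prediction = [set(range(95, 201)), set(range(1, 95))]   # boundary off by 6
far_prediction = [set(range(1, 61)), set(range(61, 201))]     # boundary off by 40

if __name__ == "__main__":
    print(annotations_agree(close_prediction, reference_domains))
    print(annotations_agree(far_prediction, reference_domains))
```

The pairing search makes the result independent of the order in which a method happens to list its domains.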

9.
A computer program is used to analyse automatically and objectively the atomic co-ordinates of a large number of globular proteins in order to identify the regions of α-helix, β-sheet and reverse-turn secondary structure. Several different criteria for the assignment of secondary structure are tested for accuracy, reproducibility and efficiency. The most successful criterion, which is based on patterns of peptide hydrogen bonds, inter-Cα distances and inter-Cα torsion angles, is used to find the secondary structure of all the proteins studied. The accuracy of the derived assignments is assessed by comparing them with the secondary structure reported in the literature for each protein. The reliability of the methods is assessed by comparing the secondary structures derived from the independently determined sets of co-ordinates available for some proteins. We provide the first objective and consistent compilation of α-helix, β-sheet and reverse-turn secondary structure in almost all globular proteins of known tertiary structure. These data will be invaluable for analysing the relative tendencies of different amino acids to occur in different types of secondary structure, for analysing the regularity of the secondary structure itself, and for analysing how the pieces of secondary structure fit together to form the globular tertiary structure of each protein.

10.
The CATH database of protein domain structures (http://www.biochem.ucl.ac.uk/bsm/cath_new) currently contains 34 287 domain structures classified into 1383 superfamilies and 3285 sequence families. Each structural family is expanded with domain sequence relatives recruited from GenBank using a variety of efficient sequence search protocols and reliable thresholds. This extended resource, known as the CATH-protein family database (CATH-PFDB), contains a total of 310 000 domain sequences classified into 26 812 sequence families. New sequence search protocols have been designed, based on these intermediate sequence libraries, to allow more regular updating of the classification. Further developments include the adaptation of a recently developed method for rapid structure comparison, based on secondary structure matching, for domain boundary assignment. The philosophy behind CATHEDRAL is the recognition of recurrent folds already classified in CATH. Benchmarking of CATHEDRAL, using manually validated domain assignments, demonstrated that 43% of domain boundaries could be completely automatically assigned. This is an improvement on a previous consensus approach for which only 10-20% of domains could be reliably processed in a completely automated fashion. Since domain boundary assignment is a significant bottleneck in the classification of new structures, CATHEDRAL will also help to increase the frequency of CATH updates.

11.
Domains are the main structural and functional units of larger proteins. They tend to be contiguous in primary structure and can fold and function independently. It has been observed that 10–20% of all encoded proteins contain duplicated domains and the average pairwise sequence identity between them is usually low. In the present study, we have analyzed the structural similarity between domain repeats of proteins with known structures available in the Protein Data Bank using structure-based inter-residue interaction measures such as the number of long-range contacts, surrounding hydrophobicity, and pairwise interaction energy. We used the RADAR program for detecting the repeats in a protein sequence, which were further validated using Pfam domain assignments. The sequence identity between the repeats in domains ranges from 20 to 40% and their secondary structural elements are well conserved. The number of long-range contacts, surrounding hydrophobicity calculations and pairwise interaction energy of the domain repeats clearly reveal the conservation of 3-D structure environment in the repeats of domains. The proportions of mainchain–mainchain hydrogen bonds and hydrophobic interactions are also highly conserved between the repeats. The present study has suggested that the computation of these structure-based parameters will give better clues about the tertiary environment of the repeats in domains. The folding rates of individual domains in the repeats predicted using the long-range order parameter indicate that the predicted folding rates correlate well with most of the experimentally observed folding rates for the analyzed independently folded domains.
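One of the measures named above, the number of long-range contacts, can be sketched as a count of residue pairs that are close in space but distant in sequence. The abstract gives neither the distance cutoff nor the sequence-separation threshold, so the defaults below (8 Å between Cα atoms, at least 12 residues apart) are assumptions chosen only for illustration, as is the toy hairpin geometry.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]


def long_range_contacts(ca_coords: List[Coord], cutoff: float = 8.0,
                        min_separation: int = 12) -> int:
    """Count residue pairs at least min_separation apart in sequence whose
    C-alpha atoms lie within cutoff angstroms of each other."""
    count = 0
    n = len(ca_coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                count += 1
    return count


def toy_hairpin(residues_per_arm: int = 10, spacing: float = 3.8,
                arm_gap: float = 5.0) -> List[Coord]:
    """Hypothetical idealized hairpin: one arm along +x, the other running back."""
    forward = [(spacing * i, 0.0, 0.0) for i in range(residues_per_arm)]
    backward = [(spacing * (residues_per_arm - 1 - k), arm_gap, 0.0)
                for k in range(residues_per_arm)]
    return forward + backward


if __name__ == "__main__":
    straight = [(3.8 * i, 0.0, 0.0) for i in range(20)]
    print(long_range_contacts(straight))       # extended chain
    print(long_range_contacts(toy_hairpin()))  # folded-back chain
```

Comparing such counts between two repeats of a domain is one way to quantify how well their three-dimensional environments are conserved.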

12.
BACKGROUND: Several methods of structural classification have been developed to introduce some order to the large amount of data present in the Protein Data Bank. Such methods facilitate structural comparisons and provide a greater understanding of structure and function. The most widely used and comprehensive databases are SCOP, CATH and FSSP, which represent three unique methods of classifying protein structures: purely manual, a combination of manual and automated, and purely automated, respectively. In order to develop reliable template libraries and benchmarks for protein-fold recognition, a systematic comparison of these databases has been carried out to determine their overall agreement in classifying protein structures. RESULTS: Approximately two-thirds of the protein chains in each database are common to all three databases. Despite employing different methods, and basing their systems on different rules of protein structure and taxonomy, SCOP, CATH and FSSP agree on the majority of their classifications. Discrepancies and inconsistencies are accounted for by a small number of explanations. Other interesting features have been identified, and various differences between manual and automatic classification methods are presented. CONCLUSIONS: Using these databases requires an understanding of the rules upon which they are based; each method offers certain advantages depending on the biological requirements and knowledge of the user. The degree of discrepancy between the systems also has an impact on reliability of prediction methods that employ these schemes as benchmarks. To generate accurate fold templates for threading, we extract information from a consensus database, encompassing agreements between SCOP, CATH and FSSP.

13.
The elucidation of the domain content of a given protein sequence in the absence of determined structure or significant sequence homology to known domains is an important problem in structural biology. Here we address how successfully the delineation of continuous domains can be accomplished in the absence of sequence homology using simple baseline methods, an existing prediction algorithm (Domain Guess by Size), and a newly developed method (DomSSEA). The study was undertaken with a view to measuring the usefulness of these prediction methods in terms of their application to fully automatic domain assignment. Thus, the sensitivity of each domain assignment method was measured by calculating the number of correctly assigned top scoring predictions. We have implemented a new continuous domain identification method using the alignment of predicted secondary structures of target sequences against observed secondary structures of chains with known domain boundaries as assigned by Class Architecture Topology Homology (CATH). Taking top predictions only, the success rate of the method in correctly assigning domain number to the representative chain set is 73.3%. The top prediction for domain number and location of domain boundaries was correct for 24% of the multidomain set (±20 residues). These results have been put into context in relation to the results obtained from the other prediction methods assessed.
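The correctness criterion for multidomain chains (right domain number and boundaries within ±20 residues) can be expressed compactly for continuous domains, where a chain with n domains has n - 1 boundary positions. This is a sketch of that check only, not of the DomSSEA alignment itself; the function name and the pairing of boundaries in sorted order are assumptions.

```python
from typing import Sequence


def boundaries_match(predicted: Sequence[int], reference: Sequence[int],
                     tolerance: int = 20) -> bool:
    """True if the predicted domain number equals the reference and every
    boundary lies within +/- tolerance residues of its counterpart.

    Boundaries are residue positions where one continuous domain ends and the
    next begins; they are compared in sorted order.
    """
    if len(predicted) != len(reference):
        return False  # different number of domains
    return all(abs(p - r) <= tolerance
               for p, r in zip(sorted(predicted), sorted(reference)))


if __name__ == "__main__":
    # Hypothetical two-domain chain with the true boundary at residue 100.
    print(boundaries_match([118], [100]))  # within tolerance
    print(boundaries_match([125], [100]))  # outside tolerance
```

A single-domain chain has an empty boundary list, so `boundaries_match([], [])` reduces to a check on domain number alone.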

14.
15.
BACKGROUND: Multiple structure alignments have received increasing attention in recent years as an alternative to multiple sequence alignments. Although multiple structure alignment algorithms can potentially be applied to a number of problems, they have primarily been used for protein core identification. A method that is capable of solving a variety of problems using structure comparison is still absent. Here we introduce a program msTALI for aligning multiple protein structures. Our algorithm uses several informative features to guide its alignments: torsion angles, backbone Cα atom positions, secondary structure, residue type, surface accessibility, and properties of nearby atoms. The algorithm allows the user to weight the types of information used to generate the alignment, which expands its utility to a wide variety of problems. RESULTS: msTALI exhibits competitive results on 824 families from the Homstrad and SABmark databases when compared to Matt and Mustang. We also demonstrate success at building a database of protein cores using 341 randomly selected CATH domains and highlight the contribution of msTALI compared to the CATH classifications. Finally, we present an example applying msTALI to the problem of detecting hinges in a protein undergoing rigid-body motion. CONCLUSIONS: msTALI is an effective algorithm for multiple structure alignment. In addition to its performance on standard comparison databases, it utilizes clear, informative features, allowing further customization for domain-specific applications. The C++ source code for msTALI is available for Linux on the web at http://ifestos.cse.sc.edu/mstali.

16.
Theories of protein folding often consider contributions from three fundamental elements: loops, hydrophobic interactions, and secondary structures. The pathway of protein folding, the rate of folding, and the final folded structure should be predictable if the energetic contributions to folding of these fundamental factors were properly understood. αtα is a helix-turn-helix peptide that was developed by de novo design to provide a model system for the study of these important elements of protein folding. Hydrogen exchange experiments were performed on selectively 15N-labeled αtα and used to calculate the stability of hydrogen bonds within the peptide. The resulting pattern of hydrogen bond stability was analyzed using a version of the Lifson-Roig model that was extended to include a statistical parameter for tertiary interactions. This parameter, x, represents the additional statistical weight conferred upon a helical state by a tertiary contact. The hydrogen exchange data are most closely fit by the XHC model with an x parameter of 9.25. Thus the statistical weight of a hydrophobic tertiary contact is approximately 5.8 times the statistical weight for helix formation by alanine. The value for the x parameter derived from this study should provide a basis for the understanding of the relationship between hydrophobic cluster formation and secondary structure formation during the early stages of protein folding.
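The two numbers reported above can be related with a short calculation. This sketch uses the standard relation between a statistical weight and a free energy, ΔG = -RT ln(weight), and assumes a temperature of 298.15 K, which the abstract does not give. The alanine weight printed here is only what the quoted 5.8-fold ratio implies; it is not a value stated in the source.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def weight_to_free_energy(weight: float, temperature_k: float = 298.15) -> float:
    """Free energy (kcal/mol) equivalent to a statistical weight: dG = -RT ln(w)."""
    if weight <= 0:
        raise ValueError("statistical weight must be positive")
    return -R_KCAL * temperature_k * math.log(weight)


X_TERTIARY = 9.25        # x parameter reported in the abstract
RATIO_TO_ALANINE = 5.8   # ratio reported in the abstract

# Alanine helix weight implied by the two reported numbers (a derived value).
implied_alanine_weight = X_TERTIARY / RATIO_TO_ALANINE

if __name__ == "__main__":
    print(f"implied alanine weight: {implied_alanine_weight:.2f}")
    print(f"dG for tertiary contact: {weight_to_free_energy(X_TERTIARY):.2f} kcal/mol")
```

A weight greater than 1 gives a negative free energy, so under these assumptions the tertiary contact stabilizes the helical state by somewhat more than 1 kcal/mol.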

17.
S K Silverman, T R Cech. Biochemistry, 1999, 38(27):8691-8702
Tertiary interactions that allow RNA to fold into intricate three-dimensional structures are being identified, but little is known about the thermodynamics of individual interactions. Here we quantify the tertiary structure contributions of individual hydrogen bonds in a "ribose zipper" motif of the recently crystallized Tetrahymena group I intron P4-P6 domain. The 2'-hydroxyls of P4-P6 nucleotides C109/A184 and A183/G110 participate in forming the "teeth" of the zipper. These four nucleotides were substituted in all combinations with their 2'-deoxy and (separately) 2'-methoxy analogues, and thermodynamic effects on the tertiary folding ΔG°' were assayed by the Mg2+ dependence of electrophoretic mobility in nondenaturing gels. The 2'-deoxy series showed a consistent trend with an average contribution to the tertiary folding ΔG°' of -0.4 to -0.5 kcal/mol per hydrogen bond. Contributions were approximately additive, reflecting no cooperativity among the hydrogen bonds. Each "tooth" of the ribose zipper (comprising two hydrogen bonds) thus contributes about -1.0 kcal/mol to the tertiary folding ΔG°'. Single 2'-methoxy substitutions destabilized folding by approximately 1 kcal/mol, but the trend reversed with multiple 2'-methoxy substitutions; the folding ΔG°' for the quadruple 2'-methoxy derivative was approximately unchanged relative to wild-type. On the basis of these data and on temperature-gradient gel results, we conclude that entropically favorable hydrophobic interactions balance enthalpically unfavorable hydrogen bond deletions and steric clashes for multiple 2'-methoxy substitutions.
Because many of the 2'-deoxy derivatives no longer have the characteristic hydrogen-bond patterns of the ribose zipper motif but simply have individual long-range ribose-base or ribose-ribose hydrogen bonds, we speculate that the energetic value of -0.4 to -0.5 kcal/mol per tertiary hydrogen bond may be more generally applicable to RNA folding.

18.
The pattern of residue substitution in divergently evolving families of globular proteins is highly variable. At each position in a fold there are constraints on the identities of amino acids from both the three-dimensional structure and the function of the protein. To characterize and quantify the structural constraints, we have made a comparative analysis of families of homologous globular proteins. Residues are classified according to amino acid type, secondary structure, accessibility of the sidechain, and existence of hydrogen bonds from sidechain to other sidechains or peptide carbonyl or amide functions. There are distinct patterns of substitution especially where residues are both solvent inaccessible and hydrogen bonded through their sidechains. The patterns of residue substitution can be used to construct templates or to identify 'key' residues if one or more structures are known. Conversely, analysis of conservation and substitution across a large family of aligned sequences in terms of substitution profiles can allow prediction of tertiary environment or indicate a functional role. Similar analyses can be used to test the validity of putative structures if several homologous sequences are available.

19.
This analysis takes an in-depth look into the difficulties encountered by automatic methods for domain decomposition from three-dimensional structure. The analysis involves a multi-faceted set of criteria including the integrity of secondary structure elements, the tendency toward fragmentation of domains, domain boundary consistency and topology. The strength of the analysis comes from the use of a new comprehensive benchmark dataset, which is based on consensus among experts (CATH, SCOP and AUTHORS of the 3D structures) and covers 30 distinct architectures and 211 distinct topologies as defined by CATH. Furthermore, over 66% of the structures are multi-domain proteins; each domain combination occurring once per dataset. The performance of four automatic domain assignment methods, DomainParser, NCBI, PDP and PUU, is carefully analyzed using this broad spectrum of topology combinations and knowledge of rules and assumptions built into each algorithm. We conclude that it is practically impossible for an automatic method to achieve the level of performance of human experts. However, we propose specific improvements to automatic methods as well as broadening the concept of a structural domain. Such work is prerequisite for establishing improved approaches to domain recognition. (The benchmark dataset is available from http://pdomains.sdsc.edu).

20.
The conformational variability of the collagen triple helix was studied with the methods of molecular mechanics. The Rich-Crick model with one hydrogen bond per tripeptide fragment or the model with two hydrogen bonds per tripeptide fragment were used for tripeptides forming the primary structure of the protein. Imino acid and amino acid residues were located in the second position of the tripeptide fragments in the first and second cases, respectively. Conformations on domain boundaries, which had alternating structures with one and two hydrogen bonds per tripeptide, were particularly studied. Essentially all types of collagen backbone composed of amino acid residues most frequently occurring in this protein were considered. A new model was suggested that combined elements of the Rich-Crick model and our new approach. This was shown to be stereochemically valid, energetically advantageous, and consistent with the experimental data. It was conclusively demonstrated that the primary structure of collagen determines its tertiary structure.
