Similar literature
 20 similar articles found (search time: 694 ms)
1.
Structurally conserved regions or structural templates have been identified and examined for features such as amino acid content, solvent accessibility, secondary structures, non-polar interaction, residue packing and extent of structural deviations in 179 aligned members of superfamilies involving 1208 pairs of protein domains. An analysis of these structural features shows that the retention of secondary structural conservation and similar hydrogen bonding pattern within the templates is 2.5 and 1.8 times higher, respectively, than in full-length alignments, suggesting that they form the minimum structural requirement of a superfamily. The identification and availability of structural templates find value in different areas of protein structure prediction and modelling such as in sensitive sequence searches, accurate sequence alignment and three-dimensional modelling on the basis of distant relationships.

2.
Knowledge of three-dimensional structure is essential to understand the function of a protein. Although the overall fold is determined by the full details of its sequence, a small group of residues, often called structural motifs, plays a crucial role in determining the protein fold and its stability. Identification of such structural motifs requires a sufficient number of sequence and structural homologs to define conservation and evolutionary information. Unfortunately, many structures in the protein structure databases have no homologous structures or sequences. In this work, we report an SVM method, SMpred, to identify structural motifs from a single protein structure without using sequence and structural homologs. SMpred was trained and tested using 132 protein domains containing 581 motifs. It achieved 78.79% accuracy with 79.06% sensitivity and 78.53% specificity. The performance of SMpred was evaluated with MegaMotifBase using 188 proteins containing 1161 motifs. Out of 1161 motifs, SMpred correctly identified 1503 structural motifs reported in MegaMotifBase. Further, we showed that SMpred is a useful approach for length-deviant superfamilies and single-member superfamilies. This result suggests the usefulness of our approach for facilitating the identification of structural motifs in protein structures in the absence of sequence and structural homologs. The dataset and executable for the SMpred algorithm are available at http://www3.ntu.edu.sg/home/EPNSugan/index_files/SMpred.htm.
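The accuracy, sensitivity and specificity quoted in this abstract are standard confusion-matrix quantities. The sketch below only illustrates those definitions; the function name and the example counts are hypothetical and are not taken from SMpred or its dataset.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Return accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn: motif residues predicted as motif / non-motif.
    tn/fp: non-motif residues predicted as non-motif / motif.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # fraction of all predictions that are correct
        "sensitivity": tp / (tp + fn),   # fraction of true motifs recovered
        "specificity": tn / (tn + fp),   # fraction of non-motifs correctly rejected
    }


# Illustrative counts only (not SMpred results).
example = classification_metrics(tp=8, fp=1, tn=9, fn=2)
```

Reporting sensitivity and specificity alongside accuracy matters when motif and non-motif residues are unevenly represented, since accuracy alone can hide a bias toward the larger class.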

3.
The presence of sequence homologues and the availability of structural information of proteins enable better understanding of the biological function of a protein family. A majority of entries in the protein structural databank are single-member superfamilies for which it is hard to derive motifs due to the paucity of structural homologues. Important conserved segments for these superfamilies have been identified and compiled into a database, SSToSS (Sequence Structural Templates of Single member Superfamily). Conserved regions, recognized by permitted amino acid exchanges, are mapped on the structure and various structural features (solvent accessibility, secondary structure content, hydrogen bonding and residue packing) are examined. These conserved segments with high structural feature content are projected as sequence-structural templates for the particular superfamily member. Interactive three-dimensional displays of the templates in the three-dimensional structure (in Chime and RASMOL) are provided for better understanding and visualization. In the SSToSS database, we also provide the application of sequence-structural templates in three different areas: multiple-motif based sequence search, multiple sequence alignment and homology modeling. In each case, the inclusion of the sequence-structural templates can give rise to sensitive and accurate results. This enables the inclusion of singletons to provide added value to the recognition of additional members, comparative modeling and the design of experiments.

4.
In order to search for a common structural motif in the phosphate-binding sites of protein-mononucleotide complexes, we investigated the structural variety of phosphate-binding schemes by an all-against-all comparison of 491 binding sites found in the Protein Data Bank. We found four frequently occurring structural motifs composed of protein atoms interacting with phosphate groups, each of which appears in different protein superfamilies with different folds. The most frequently occurring motif, which we call the structural P-loop, is shared by 13 superfamilies and is characterized by a four-residue fragment, GXXX, interacting with a phosphate group through the backbone atoms. Various sequence motifs, including Walker's A motif or the P-loop, turn out to be a structural P-loop found in a few specific superfamilies. The other three motifs are found in pairs of superfamilies: protein kinase and glutathione synthetase ATPase domain-like, actin-like ATPase domain and nucleotidyltransferase, and FMN-linked oxidoreductase and PRTase.

5.
6.

Background  

Annotation of sequences that share little similarity to sequences of known function remains a major obstacle in genome annotation. Some of the best methods of detecting remote relationships between protein sequences are based on matching sequence profiles. We analyse the superfamily specific performance of sequence profile-profile matching. Our benchmark consists of a set of 16 protein superfamilies that are highly diverse at the sequence level. We relate the performance to the number of sequences in the profiles, the profile diversity and the extent of structural conservation in the superfamily.

7.
Evolution of function in protein superfamilies, from a structural perspective   (Total citations: 29; self-citations: 0; citations by others: 29)
The recent growth in protein databases has revealed the functional diversity of many protein superfamilies. We have assessed the functional variation of homologous enzyme superfamilies containing two or more enzymes, as defined by the CATH protein structure classification, by way of the Enzyme Commission (EC) scheme. Combining sequence and structure information to identify relatives, the majority of superfamilies display variation in enzyme function, with 25% of superfamilies in the PDB having members of different enzyme types. We determined the extent of functional similarity at different levels of sequence identity for 486,000 homologous pairs (enzyme/enzyme and enzyme/non-enzyme), with structural and sequence relatives included. For single and multi-domain proteins, variation in EC number is rare above 40% sequence identity, and above 30%, the first three digits may be predicted with an accuracy of at least 90%. For more distantly related proteins sharing less than 30% sequence identity, functional variation is significant, and below this threshold, structural data are essential for understanding the molecular basis of observed functional differences. To explore the mechanisms for generating functional diversity during evolution, we have studied in detail 31 diverse structural enzyme superfamilies for which structural data are available. A large number of variations and peculiarities are observed, at the atomic level through to gross structural rearrangements. Almost all superfamilies exhibit functional diversity generated by local sequence variation and domain shuffling. Commonly, substrate specificity is diverse across a superfamily, whilst the reaction chemistry is maintained. In many superfamilies, the position of catalytic residues may vary despite playing equivalent functional roles in related proteins. The implications of functional diversity within superfamilies for the structural genomics projects are discussed.
More detailed information on these superfamilies is available at http://www.biochem.ucl.ac.uk/bsm/FAM-EC/.

8.
The epoxide hydrolases and haloalkane dehalogenases database (EH/HD) integrates sequence and structure of a highly diverse protein family, including mainly the Asp-hydrolases of EHs and HDs but also proteins such as Ser-hydrolases, non-heme peroxidases, prolyl iminopeptidases and 2-hydroxymuconic semialdehyde hydrolases. These proteins have a highly conserved structure, but display a remarkable diversity in sequence and function. A total of 305 protein entries were assigned to 14 homologous families, forming two superfamilies. Annotated multisequence alignments and phylogenetic trees are provided for each homologous family and superfamily. Experimentally derived structures of 19 proteins are superposed and consistently annotated. Sequence and structure of all 305 proteins were systematically analysed. Thus, deeper insight is gained into the role of highly conserved sequence motifs and structural elements. AVAILABILITY: The EH/HD database is available at http://www.led.uni-stuttgart.de

9.
It is often possible to identify sequence motifs that characterize a protein family in terms of its fold and/or function from aligned protein sequences. Such motifs can be used to search for new family members. Partitioning of sequence alignments into regions of similar amino acid variability is usually done by hand. Here, I present a completely automatic method for this purpose: one that is guaranteed to produce globally optimal solutions at all levels of partition granularity. The method is used to compare the tempo of sequence diversity across reliable three-dimensional (3D) structure-based alignments of 209 protein families (HOMSTRAD) and that for 69 superfamilies (CAMPASS). (The mean alignment lengths for HOMSTRAD and CAMPASS are very similar.) Surprisingly, the optimal segmentation distributions for the closely related proteins and distantly related ones are found to be very similar. Also, optimal segmentation identifies an unusual protein superfamily. Finally, protein 3D structure clues from the tempo of sequence diversity across alignments are examined. The method is general, and could be applied to any area of comparative biological sequence and 3D structure analysis where the constraint of the inherent linear organization of the data imposes an ordering on the set of objects to be clustered.
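Partitioning an ordered series into contiguous segments with a globally optimal result at every granularity is a classic dynamic-programming problem. The sketch below is one possible reading of that idea, not the author's method: it assumes a per-column variability score is already available and, as a further assumption, scores each segment by its within-segment sum of squared deviations. The function name `optimal_segmentation` is hypothetical.

```python
from typing import List, Tuple


def optimal_segmentation(values: List[float], max_segments: int) -> List[Tuple[float, List[int]]]:
    """For k = 1..max_segments, return (cost, start indices) of the best k-way split.

    Cost is the summed within-segment squared deviation (an assumed criterion).
    """
    n = len(values)
    # Prefix sums let the cost of any segment be computed in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for idx, v in enumerate(values):
        s1[idx + 1] = s1[idx] + v
        s2[idx + 1] = s2[idx] + v * v

    def sse(i: int, j: int) -> float:
        """Squared deviation from the mean for values[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[k][j]: minimum cost of splitting values[:j] into exactly k segments.
    best = [[inf] * (n + 1) for _ in range(max_segments + 1)]
    back = [[0] * (n + 1) for _ in range(max_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, max_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                cost = best[k - 1][i] + sse(i, j)
                if cost < best[k][j]:
                    best[k][j] = cost
                    back[k][j] = i

    results = []
    for k in range(1, min(max_segments, n) + 1):
        starts, j = [], n
        for level in range(k, 0, -1):  # walk the back-pointers from the end
            j = back[level][j]
            starts.append(j)
        results.append((best[k][n], starts[::-1]))
    return results


# Toy variability profile: low, high, low.
profile = [0, 0, 0, 5, 5, 5, 0, 0]
segmentations = optimal_segmentation(profile, 3)
```

Because the table is filled for every `k` up to `max_segments`, one run yields the optimal partition at each level of granularity, which is the property the abstract emphasizes.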

10.
The Short-chain Dehydrogenases/Reductases Engineering Database (SDRED) covers one of the largest known protein families (168 150 proteins). Assignment to the superfamilies of Classical and Extended SDRs was achieved by global sequence similarity and by identification of family-specific sequence motifs. Two standard numbering schemes were established for Classical and Extended SDRs that allow for the determination of conserved amino acid residues, such as cofactor specificity determining positions or superfamily specific sequence motifs. The comprehensive sequence dataset of the SDRED facilitates the refinement of family-specific sequence motifs. The glycine-rich motifs for Classical and Extended SDRs were refined to improve the precision of superfamily classification. In each superfamily, the majority of sequences formed a tightly connected sequence network and belonged to a large homologous family. Despite their different sequence motifs and their different sequence length, the two sequence networks of Classical and Extended SDRs are not separate, but connected by edges at a threshold of 40% sequence similarity, indicating that all SDRs belong to a large, connected network. The SDRED is accessible at https://sdred.biocatnet.de/.
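Whether two groups of sequences form one network or two at a given similarity threshold can be checked by keeping only edges at or above the threshold and counting connected components. The following is a minimal sketch of that check using union-find; the function name, the toy edge list and the similarity values are illustrative assumptions and do not come from SDRED.

```python
from typing import List, Tuple


def connected_components(n: int, edges: List[Tuple[int, int, float]], threshold: float) -> List[List[int]]:
    """Group nodes 0..n-1 using only edges whose similarity is >= threshold."""
    parent = list(range(n))

    def find(x: int) -> int:
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, similarity in edges:
        if similarity >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for node in range(n):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())


# Toy network: nodes 0-1 and 2-3 are tightly linked; 1-2 is a weak bridge.
toy_edges = [(0, 1, 0.90), (1, 2, 0.45), (2, 3, 0.80)]
at_40 = connected_components(4, toy_edges, 0.40)
at_50 = connected_components(4, toy_edges, 0.50)
```

In the toy data the two clusters merge at a 40% threshold and separate at 50%, mirroring the kind of observation reported for Classical and Extended SDRs.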

11.
Proteins for which there are good structural, functional and genetic similarities that imply a common evolutionary origin can have sequences whose similarities are low or undetectable by conventional sequence comparison procedures. Do these proteins have sequence conservation beyond the simple conservation of hydrophobic and hydrophilic character at specific sites and, if they do, what is its nature? To answer these questions we have analysed the structures and sequences of two superfamilies: the four-helical cytokines and cytochromes c'-b(562). Members of these superfamilies have sequence similarities that are either very low or not detectable. The cytokine superfamily has within it a long chain family and a short chain family. The sequences of known representative structures of the two families were aligned using structural information. From these alignments we identified the regions that conserve the same main-chain conformation: the common core (CC). For members of the same family, the CC comprises some 50% of the individual structures; for the combination of both families it is 30%. We added homologous sequences to the structural alignment. Analysis of the residues occurring at sites within the CCs showed that 30% have little or no conservation, whereas about 40% conserve the polar/neutral or hydrophobic/neutral character of their residues. The remaining 30% conserve hydrophobic residues with strong or medium limitations on their volume variations. Almost all of these residues are found at sites that form the "buried spine" of each helix (at sites i, i+3, i+7, i+10, etc., or i, i+4, i+7, i+11, etc.) and they pack together at the centre of each structure to give a pattern of residue-residue contacts that is almost absolutely conserved.
These CC conserved hydrophobic residues form only 10-15% of all the residues in the individual structures. A similar analysis of the cytochromes c'-b(562), which bind haem and have a very different function to that of the cytokines, gave very similar results. Again some 30% of the CC residues have hydrophobic residues with strong or medium conservation. Most of these form the buried spine of each helix and play the same role as those in the cytokines. The others, and some spine residues, bind the haem co-factor.
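The two "buried spine" patterns quoted above (i, i+3, i+7, i+10, ... and i, i+4, i+7, i+11, ...) alternate steps of 3 and 4 residues, differing only in which step comes first. A small sketch that enumerates such positions is given below; the function name and its parameters are hypothetical and serve only to make the pattern explicit.

```python
from typing import List


def spine_positions(start: int, count: int, first_step: int = 3) -> List[int]:
    """Return `count` helix positions alternating steps of 3 and 4 from `start`.

    first_step=3 gives i, i+3, i+7, i+10, ...; first_step=4 gives i, i+4, i+7, i+11, ...
    """
    if first_step not in (3, 4):
        raise ValueError("first_step must be 3 or 4")
    positions = [start]
    step = first_step
    while len(positions) < count:
        positions.append(positions[-1] + step)
        step = 7 - step  # alternate between 3 and 4
    return positions


pattern_a = spine_positions(0, 4, first_step=3)
pattern_b = spine_positions(0, 4, first_step=4)
```

Each pair of steps sums to 7 residues, roughly two turns of an α-helix, so the listed sites stay on the same face of the helix.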

12.
Nature's strategies for evolving catalytic functions can be deciphered from the information contained in the rapidly expanding protein sequence databases. However, the functions of many proteins in the protein sequence and structure databases are either uncertain (too divergent to assign function based on homology) or unknown (no homologs), thereby limiting the utility of the databases. The mechanistically diverse enolase superfamily is a paradigm for understanding the structural bases for evolution of enzymatic function. We describe strategies for assigning functions to members of the enolase superfamily that should be applicable to other superfamilies.

13.
To investigate the relationships between functional subclasses and sequence and structural information contained in the active‐site and ligand‐binding residues (LBRs), we performed a detailed analysis of seven diverse enzyme superfamilies: aldolase class I, TIM‐barrel glycosidases, α/β‐hydrolases, P‐loop containing nucleotide triphosphate hydrolases, collagenase, Zn peptidases, and glutamine phosphoribosylpyrophosphate, subunit 1, domain 1. These homologous superfamilies, as defined in CATH, were selected from the enzyme catalytic‐mechanism database. We defined active‐site and LBRs based solely on the literature information and complex structures in the Protein Data Bank. From a structure‐based multiple sequence alignment for each CATH homologous superfamily, we extracted subsequences consisting of the aligned positions that were used as an active‐site or a ligand‐binding site by at least one sequence. Using both the subsequences and full‐length alignments, we performed cluster analysis with three sequence distance measures. We showed that the cluster analysis using the subsequences was able to detect functional subclasses more accurately than the clustering using the full‐length alignments. The subsequences determined by only the literature information and complex structures, thus, had sufficient information to detect the functional subclasses. Detailed examination of the clustering results provided new insights into the mechanism of functional diversification for these superfamilies. Proteins 2010. © 2010 Wiley‐Liss, Inc.
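The subsequence extraction step described in this abstract keeps every alignment column that serves as an active-site or ligand-binding position in at least one sequence. A minimal sketch of that step follows; the data layout (dictionaries keyed by sequence name, zero-based column indices) and all names are assumptions made for illustration.

```python
from typing import Dict, Set


def functional_subsequences(
    alignment: Dict[str, str],
    functional_columns: Dict[str, Set[int]],
) -> Dict[str, str]:
    """Project each aligned sequence onto the union of annotated columns.

    alignment: sequence name -> aligned sequence (equal lengths, '-' for gaps).
    functional_columns: sequence name -> alignment columns annotated as
    active-site or ligand-binding for that sequence.
    """
    # A column is kept if at least one sequence uses it functionally.
    columns = sorted(set().union(*functional_columns.values()))
    return {
        name: "".join(sequence[col] for col in columns)
        for name, sequence in alignment.items()
    }


# Toy alignment with one annotated column per sequence.
toy_alignment = {"seqA": "MK-LD", "seqB": "MRTLE"}
toy_sites = {"seqA": {1}, "seqB": {4}}
subsequences = functional_subsequences(toy_alignment, toy_sites)
```

The resulting short strings can then be passed to any sequence distance measure in place of the full-length alignment rows.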

14.

Background  

Certain protein families are highly conserved across distantly related organisms and belong to large and functionally diverse superfamilies. The patterns of conservation present in these protein sequences presumably are due to selective constraints maintaining important but unknown structural mechanisms with some constraints specific to each family and others shared by a larger subset or by the entire superfamily. To exploit these patterns as a source of functional information, we recently devised a statistically based approach called contrast hierarchical alignment and interaction network (CHAIN) analysis, which infers the strengths of various categories of selective constraints from co-conserved patterns in a multiple alignment. The power of this approach strongly depends on the quality of the multiple alignments, which thus motivated development of theoretical concepts and strategies to improve alignment of conserved motifs within large sets of distantly related sequences.

15.
Many protein classification systems capture homologous relationships by grouping domains into families and superfamilies on the basis of sequence similarity. Superfamilies with similar 3D structures are further grouped into folds. In the absence of discernable sequence similarity, these structural similarities were long thought to have originated independently, by convergent evolution. However, the growth of databases and advances in sequence comparison methods have led to the discovery of many distant evolutionary relationships that transcend the boundaries of superfamilies and folds. To investigate the contributions of convergent versus divergent evolution in the origin of protein folds, we clustered representative domains of known structure by their sequence similarity, treating them as point masses in a virtual 2D space which attract or repel each other depending on their pairwise sequence similarities. As expected, families in the same superfamily form tight clusters. But often, superfamilies of the same fold are linked with each other, suggesting that the entire fold evolved from an ancient prototype. Strikingly, some links connect superfamilies with different folds. They arise from modular peptide fragments of between 20 and 40 residues that co‐occur in the connected folds in disparate structural contexts. These may be descendants of an ancestral pool of peptide modules that evolved as cofactors in the RNA world and from which the first folded proteins arose by amplification and recombination. Our galaxy of folds summarizes, in a single image, most known and many yet undescribed homologous relationships between protein superfamilies, providing new insights into the evolution of protein domains.

16.
Structural trees for large protein superfamilies, such as β proteins with the aligned β sheet packing, β proteins with the orthogonal packing of α helices, two-layer and three-layer α/β proteins, have been constructed. The structural motifs having unique overall folds and a unique handedness are taken as root structures of the trees. The larger protein structures of each superfamily are obtained by a stepwise addition of α helices and/or β strands to the corresponding root motif, taking into account a restricted set of rules inferred from known principles of the protein structure. Among these rules, prohibition of crossing connections, attention to handedness and compactness, and a requirement for α helices to be packed in α-helical layers and β strands in β layers are the most important. Proteins and domains whose structures can be obtained by stepwise addition of α helices and/or β strands to the same root motif can be grouped into one structural class or a superfamily. Proteins and domains found within branches of a structural tree can be grouped into subclasses or subfamilies. Levels of structural similarity between different proteins can easily be observed by visual inspection. Within one branch, protein structures having a higher position in the tree include the structures located lower. Proteins and domains of different branches have the structure located in the branching point as the common fold. Proteins 28:241–260, 1997. © 1997 Wiley-Liss Inc.

17.
As enzymes evolve and diverge from common ancestor sequences, they often keep their overall reaction chemistry but specialize in the binding of different cognate ligands. This study borrows methods for the computational assessment of 2D similarity of small molecules from the field of chemoinformatics, to examine the extent of structure conservation of cognate ligands binding to similar proteins. Proteins from 87 structural superfamilies from Escherichia coli form the core dataset, which is extended using homologues with functional assignments from any organism. We find that correlation of the substrate similarity with protein similarity (measured by either sequence-based or structure-based scores) can only be clearly established for very similar proteins. At low sequence identities, the superfamily to which a protein belongs can give helpful clues to its function, and more importantly, the confidence attached to such clues is superfamily-dependent. Our data indicate that only a few superfamilies show great substrate diversity, and that most exhibit conservation of at least part of the structural scaffold of the substrate.
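The abstract does not say which 2D similarity measure was used. A widely used choice in chemoinformatics is the Tanimoto coefficient over molecular fingerprints, and the sketch below illustrates it under that assumption. Fingerprints are represented here simply as sets of "on" bit indices; the function name and the example bit sets are hypothetical.

```python
from typing import Set


def tanimoto(fingerprint_a: Set[int], fingerprint_b: Set[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    union = fingerprint_a | fingerprint_b
    if not union:
        # Two empty fingerprints: returning 0.0 is a convention chosen for this sketch.
        return 0.0
    return len(fingerprint_a & fingerprint_b) / len(union)


# Illustrative fingerprints for two hypothetical ligands.
ligand_1 = {1, 2, 3}
ligand_2 = {2, 3, 4}
similarity = tanimoto(ligand_1, ligand_2)
```

A value of 1.0 means identical fingerprints and 0.0 means no shared features, so the score can be plotted directly against sequence identity between the proteins that bind each ligand.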

18.
As increasingly large amounts of data from genome and other sequencing projects become available, new approaches are needed to determine the functions of the proteins these genes encode. We show how large-scale computational analysis can help to address this challenge by linking functional information to sequence and structural similarities using protein similarity networks. Network analyses using three functionally diverse enzyme superfamilies illustrate the use of these approaches for facile updating and comparison of available structures for a large superfamily, for creation of functional hypotheses for metagenomic sequences, and to summarize the limits of our functional knowledge about even well-studied superfamilies.

19.
20.
The CATH database of domain structures has been used to explore the structural variation of homologous domains in 294 well populated domain structure superfamilies, each containing at least three sequence diverse relatives. Our analyses confirm some previously detected trends relating sequence divergence to structural variation but for a much larger dataset and in some superfamilies the new data reveal exceptional structural variation. Use of a new algorithm (2DSEC) to analyse variability in secondary structure compositions across a superfamily sheds new light on how structures evolve. 2DSEC detects inserted secondary structures that embellish the core of conserved secondary structures found throughout the superfamily. Analysis showed that for 56% of highly populated superfamilies (>9 sequence diverse relatives), there are twofold or more increases in the numbers of secondary structures in some relatives. In some families fivefold increases occur, sometimes modifying the fold of the domain. Manual inspection of secondary structure insertions or embellishments in 48 particularly variable superfamilies revealed that although these insertions were usually discontiguous in the sequence they were often co-located in 3D resulting in a larger structural motif that often modified the geometry of the active site or the surface conformation promoting diverse domain partnerships and protein interactions. These observations, supported by automatic analysis of all well populated CATH families, suggest that accretion of small secondary structure insertions may provide a simple mechanism for evolving new functions in diverse relatives. Some layered domain architectures (e.g. mainly-beta and alpha-beta sandwiches) that recur highly in the genomes more frequently exploit these types of embellishments to modify function. In these architectures, aggregation occurs most often at the edges, top or bottom of the beta-sheets. 
Information on structural variability across domain superfamilies has been made available through the CATH Dictionary of Homologous Structures (DHS).
