Similar articles
 Found 20 similar articles (search time: 46 ms)
1.
Structure comparison is widely used to quantify protein relationships. Although there are several approaches to calculate structural similarity, specifying significance thresholds for similarity metrics is difficult due to the inherent likeness of common secondary structure elements. In this study, metal co‐factor location is used to assess the biological relevance of structural alignments. The distance between the centroids of bound co‐factors adds a chemical and function‐relevant constraint to the structural superimposition of two proteins. This additional dimension can be used to define cut‐off values for discriminating valid and spurious alignments in large alignment sets. The hypothesis underlying our approach is that metal coordination sites constrain structural evolution, thus revealing functional relationships between distantly related proteins. A comparison of three related nitrogenases shows the sequence and fold constraints imposed on the protein structures up to 18 Å away from the centers of their bound metal clusters. Proteins 2014; 82:648–656. © 2013 Wiley Periodicals, Inc.
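The centroid-distance criterion described in entry 1 can be illustrated with a minimal sketch. It assumes the two structures have already been superimposed by some alignment tool and that co-factor atoms are given as plain (x, y, z) tuples in Å. The function names and the `cutoff` parameter are hypothetical and are not taken from the paper, which does not state a specific cut-off in this abstract.

```python
import math


def centroid(coords):
    """Return the arithmetic mean of a list of (x, y, z) coordinates."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def cofactor_centroid_distance(cofactor_a, cofactor_b):
    """Distance between the co-factor centroids of two superimposed structures."""
    return math.dist(centroid(cofactor_a), centroid(cofactor_b))


def is_plausible_alignment(cofactor_a, cofactor_b, cutoff):
    """Flag an alignment as plausible when the centroids lie within `cutoff`.

    `cutoff` is a hypothetical, user-chosen threshold in the same units as
    the coordinates; it is an assumption of this sketch.
    """
    return cofactor_centroid_distance(cofactor_a, cofactor_b) <= cutoff


# Toy coordinates (not real metal clusters): centroids at (1, 0, 0) and (4, 4, 0).
cluster_a = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
cluster_b = [(4.0, 4.0, 0.0)]
distance = cofactor_centroid_distance(cluster_a, cluster_b)
```

In a large alignment set, a filter of this kind would be applied to every aligned pair, with the cut-off chosen from the observed distribution of centroid distances rather than fixed in advance.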

2.
McGuffin LJ, Jones DT. Proteins 2002, 48(1):44-52
The ultimate goal of structural genomics is to obtain the structure of each protein coded by each gene within a genome to determine gene function. Because of cost and time limitations, it remains impractical to solve the structure for every gene product experimentally. Up to a point, reasonably accurate three‐dimensional structures can be deduced for proteins with homologous sequences by using comparative modeling. Beyond this, fold recognition or threading methods can be used for proteins showing little homology to any known fold, although this is relatively time‐consuming and limited by the library of template folds currently available. Therefore, it is appropriate to develop methods that can increase our knowledge base, expanding our fold libraries by earmarking potentially “novel” folds for experimental structure determination. How can we sift through proteomic data rapidly and yet reliably identify novel folds as targets for structural genomics? We have analyzed a number of simple methods that discriminate between “novel” and “known” folds. We propose that simple alignments of secondary structure elements using predicted secondary structure could potentially be a more selective method than both a simple fold recognition method (GenTHREADER) and standard sequence alignment at finding novel folds when sequences show no detectable homology to proteins with known structures. Proteins 2002;48:44–52. © 2002 Wiley‐Liss, Inc.

3.
Qi Y, Grishin NV. Proteins 2005, 58(2):376-388
Protein structure classification is necessary to comprehend the rapidly growing structural data for better understanding of protein evolution and sequence-structure-function relationships. Thioredoxins are important proteins that ubiquitously regulate cellular redox status and various other crucial functions. We define the thioredoxin-like fold using the structure consensus of thioredoxin homologs and consider all circular permutations of the fold. The search for thioredoxin-like fold proteins in the PDB database identified 723 protein domains. These domains are grouped into eleven evolutionary families based on combined sequence, structural, and functional evidence. Analysis of the protein-ligand structure complexes reveals two major active site locations for the thioredoxin-like proteins. Comparison to existing structure classifications reveals that our thioredoxin-like fold group is broader and more inclusive, unifying proteins from five SCOP folds, five CATH topologies and seven DALI domain dictionary globular folding topologies. Considering these structurally similar domains together sheds new light on the relationships between sequence, structure, function and evolution of thioredoxins.

4.
The question of how best to compare and classify the (three‐dimensional) structures of proteins is one of the most important unsolved problems in computational biology. To help tackle this problem, we have developed a novel shape‐density superposition algorithm called 3D‐Blast which represents and superposes the shapes of protein backbone folds using the spherical polar Fourier correlation technique originally developed by us for protein docking. The utility of this approach is compared with several well‐known protein structure alignment algorithms using receiver‐operator‐characteristic plots of queries against the “gold standard” CATH database. Despite being completely independent of protein sequences and using no information about the internal geometry of proteins, our results from searching the CATH database show that 3D‐Blast is highly competitive compared to current state‐of‐the‐art protein structure alignment algorithms. A novel and potentially very useful feature of our approach is that it allows an average or “consensus” fold to be calculated easily for a given group of protein structures. We find that using consensus shapes to represent entire fold families also gives very good database query performance. We propose that using the notion of consensus fold shapes could provide a powerful new way to index existing protein structure databases, and that it offers an objective way to cluster and classify all of the currently known folds in the protein universe. Proteins 2012. © 2011 Wiley Periodicals, Inc.

5.
Functional annotation is seldom straightforward, with complexities arising due to functional divergence in protein families or functional convergence between non‐homologous protein families, leading to mis‐annotations. An enzyme may contain multiple domains and not all domains may be involved in a given function, adding to the complexity in function annotation. To address this, we use binding site information from bound cognate ligands and catalytic residues, since it can help in resolving fold‐function relationships at a finer level and with higher confidence. A comprehensive database of 2,020 fold‐function‐binding site relationships has been systematically generated. A network‐based approach is employed to capture the complexity in these relationships, from which different types of associations are deciphered that identify versatile protein folds performing diverse functions, the same function associated with multiple folds, and one‐to‐one relationships. Binding site similarity networks integrated with fold, function, and ligand similarity information are generated to understand the depth of these relationships. Apart from the observed continuity in the functional site space, network properties of these revealed versatile families with topologically different or dissimilar binding sites and structural families that perform very similar functions. As a case study, subtle changes in the active site of a set of evolutionarily related superfamilies are studied using these networks. Tracing of such similarities in evolutionarily related proteins provides clues into the transition and evolution of protein functions. Insights from this study will be helpful in accurate and reliable functional annotations of uncharacterized proteins, poly‐pharmacology, and designing enzymes with new functional capabilities. Proteins 2017; 85:1319–1335. © 2017 Wiley Periodicals, Inc.

6.
In the present study, a novel structural motif of proteins referred to as the phi-motif is considered, and two novel structural trees in which the phi-motif is taken as the root structure have been constructed. The simplest phi-motif is formed by three adjacent beta-strands connected by loops and packed in one beta-sheet so that its overall fold resembles the Greek letter phi. Construction of the structural trees and modeling of folding pathways have shown that all structures of the protein superfamilies can be obtained by stepwise addition of alpha-helices and/or beta-strands to the root phi-motif taking into account a restricted set of rules inferred from known principles of protein structure. The structural trees are a good tool for structure comparison, structural classification of proteins, as well as for searching for all possible protein folds and folding pathways.

7.
Many protein classification systems capture homologous relationships by grouping domains into families and superfamilies on the basis of sequence similarity. Superfamilies with similar 3D structures are further grouped into folds. In the absence of discernable sequence similarity, these structural similarities were long thought to have originated independently, by convergent evolution. However, the growth of databases and advances in sequence comparison methods have led to the discovery of many distant evolutionary relationships that transcend the boundaries of superfamilies and folds. To investigate the contributions of convergent versus divergent evolution in the origin of protein folds, we clustered representative domains of known structure by their sequence similarity, treating them as point masses in a virtual 2D space which attract or repel each other depending on their pairwise sequence similarities. As expected, families in the same superfamily form tight clusters. But often, superfamilies of the same fold are linked with each other, suggesting that the entire fold evolved from an ancient prototype. Strikingly, some links connect superfamilies with different folds. They arise from modular peptide fragments of between 20 and 40 residues that co‐occur in the connected folds in disparate structural contexts. These may be descendants of an ancestral pool of peptide modules that evolved as cofactors in the RNA world and from which the first folded proteins arose by amplification and recombination. Our galaxy of folds summarizes, in a single image, most known and many yet undescribed homologous relationships between protein superfamilies, providing new insights into the evolution of protein domains.

8.
Newly determined protein structures are classified as belonging to a new fold if the structures are sufficiently dissimilar from all other protein structures known so far. To analyze structural similarities of proteins, structure alignment tools are used. We demonstrate that the usage of nonsequential structure alignment tools, which neglect the polypeptide chain connectivity, can yield structure alignments with significant similarities between proteins of known three-dimensional structure and newly determined protein structures that possess a new fold. The recently introduced protein structure alignment tool, GANGSTA, is specialized to perform nonsequential alignments with proper assignment of the secondary structure types by focusing on helices and strands only. In the new version, GANGSTA+, the underlying algorithms were completely redesigned, yielding enhanced quality of structure alignments, offering alignment against a larger database of protein structures, and being more efficient. We applied DaliLite, TM-align, and GANGSTA+ on three protein crystal structures considered to be novel folds. Applying GANGSTA+ to these novel folds, we find proteins in the ASTRAL40 database which possess significant structural similarities, although the alignments are nonsequential and in some cases involve secondary structure elements aligned in reverse orientation. A web server is available at http://agknapp.chemie.fu-berlin.de/gplus for pairwise alignment, visualization, and database comparison.

9.
Available high‐resolution crystal structures for the family of β‐trefoil proteins in the structural databank were queried for buried waters. Such waters were classified as either: (a) unique to a particular domain, family, or superfamily or (b) conserved among all β‐trefoil folds. Three buried waters conserved among all β‐trefoil folds were identified. These waters are related by the threefold rotational pseudosymmetry characteristic of this protein architecture (representing three instances of an identical structural environment within each repeating trefoil‐fold motif). The structural properties of this buried water are remarkable and include: residing in a cavity space no larger than a single water molecule, exhibiting a positional uncertainty (i.e., normalized B‐factor) substantially lower than the average Cα atom, providing essentially ideal H‐bonding geometry with three solvent‐inaccessible main chain groups, simultaneously serving as a bridging H‐bond for three different β‐strands at a point of secondary structure divergence, and orienting conserved hydrophobic side chains to form a nascent core‐packing group. Other published work supports an interpretation that these interactions are key to the formation of an efficient folding nucleus and folded thermostability. The fundamental threefold symmetric structural element of the β‐trefoil fold is therefore, surprisingly, a buried water molecule.

10.
We have used GRATH, a graph-based structure comparison algorithm, to map the similarities between the different folds observed in the CATH domain structure database. Statistical analysis of the distributions of the fold similarities has allowed us to assess the significance for any similarity. Therefore we have examined whether it is best to represent folds as discrete entities or whether, in fact, a more accurate model would be a continuum wherein folds overlap via common motifs. To do this we have introduced a new statistical measure of fold similarity, termed gregariousness. For a particular fold, gregariousness measures how many other folds have a significant structural overlap with that fold, typically comprising 40% or more of the larger structure. Gregarious folds often contain commonly occurring super-secondary structural motifs, such as beta-meanders, greek keys, alpha-beta plait motifs or alpha-hairpins, which match similar motifs in other folds. Apart from one example, all the most gregarious folds, matching 20% or more of the other folds in the database, are alpha-beta proteins. They also occur in highly populated architectural regions of fold space, adopting sandwich-like arrangements containing two or more layers of alpha-helices and beta-strands. Domains that exhibit a low gregariousness are those that have very distinctive folds, with few common motifs or motifs that are packed in unusual arrangements. Most of the superhelices exhibit low gregariousness despite containing some commonly occurring super-secondary structural motifs. In these folds, these common motifs are combined in an unusual way and represent a small proportion of the fold (<10%). Our results suggest that fold space may be considered as continuous for some architectural arrangements (e.g. alpha-beta sandwiches), in that super-secondary motifs can be used to link neighbouring fold groups. However, in other regions of fold space much more discrete topologies are observed with little similarity between folds.
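The gregariousness measure described in entry 10 can be sketched as a simple count. This is an illustrative simplification, not the GRATH procedure: it assumes pairwise overlaps are already available as the fraction of the larger structure that is matched, and it replaces the statistical significance assessment with a plain threshold. The function name, the `overlap` dictionary layout and the toy fold labels are hypothetical.

```python
def gregariousness(fold, folds, overlap, min_overlap=0.40):
    """Fraction of the other folds that overlap `fold` by at least `min_overlap`.

    `overlap` maps frozenset({fold_a, fold_b}) to the matched fraction of the
    larger structure. The default 0.40 mirrors the "40% or more" figure in the
    abstract; using it as a hard threshold is an assumption of this sketch.
    """
    others = [f for f in folds if f != fold]
    if not others:
        return 0.0
    hits = sum(
        1 for f in others
        if overlap.get(frozenset((fold, f)), 0.0) >= min_overlap
    )
    return hits / len(others)


# Toy data: four made-up fold labels and their pairwise overlap fractions.
folds = ["A", "B", "C", "D"]
overlap = {
    frozenset(("A", "B")): 0.50,
    frozenset(("A", "C")): 0.45,
    frozenset(("A", "D")): 0.20,
    frozenset(("B", "C")): 0.10,
}
scores = {f: gregariousness(f, folds, overlap) for f in folds}
```

With these toy values, fold "A" overlaps two of its three neighbours above the threshold, whereas fold "D" overlaps none, which corresponds to the gregarious and distinctive extremes discussed in the abstract.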

11.
12.
Typically, protein spatial structures are more conserved in evolution than amino acid sequences. However, the recent explosion of sequence and structure information accompanied by the development of powerful computational methods led to the accumulation of examples of homologous proteins with globally distinct structures. Significant sequence conservation, local structural resemblance, and functional similarity strongly indicate evolutionary relationships between these proteins despite pronounced structural differences at the fold level. Several mechanisms such as insertions/deletions/substitutions, circular permutations, and rearrangements in beta-sheet topologies account for the majority of detected structural irregularities. The existence of evolutionarily related proteins that possess different folds brings new challenges to the homology modeling techniques and the structure classification strategies and offers new opportunities for protein design in experimental studies.

13.
Overexpression of the multiple copies in T‐cell lymphoma‐1 (MCT‐1) oncogene accompanies malignant phenotypic changes in human lymphoma cells. Specific disruption of MCT‐1 results in reduced tumorigenesis, suggesting a potential for MCT‐1‐targeted therapeutic strategy. MCT‐1 is known as a cap‐binding protein and has a putative RNA‐binding motif, the PUA‐domain, at its C‐terminus. We determined the crystal structure of apo MCT‐1 at 1.7 Å resolution using the surface entropy reduction method. Notwithstanding limited sequence identity to its homologs, the C‐terminus of MCT‐1 adopted a typical PUA‐domain fold that includes secondary structural elements essential for RNA recognition. The surface of the N‐terminal domain contained positively charged patches that are predicted to contribute to RNA‐binding. Proteins 2013. © 2012 Wiley Periodicals, Inc.

14.
15.
Gene duplication and fusion events in protein evolution are postulated to be responsible for the common protein folds exhibiting internal rotational symmetry. Such evolutionary processes can also potentially yield regions of repetitive primary structure. Repetitive primary structure offers the potential for alternative definitions of critical regions, such as the folding nucleus (FN). In principle, more than one instance of the FN potentially enables an alternative folding pathway in the face of a subsequent deleterious mutation. We describe the targeted mutation of the carboxyl‐terminal region of the (internally located) FN of the de novo designed purely‐symmetric β‐trefoil protein Symfoil‐4P. This mutation involves wholesale replacement of a repeating trefoil‐fold motif with a “blade” motif from a β‐propeller protein, and is postulated to trap that region of the Symfoil‐4P FN in a nonproductive folding intermediate. The resulting protein (termed “Bladefoil”) is shown to be cooperatively folding, but as a trimeric oligomer. The results illustrate how symmetric protein architectures have potentially diverse folding alternatives available to them, including oligomerization, when preferred pathways are perturbed.

16.
Many protein architectures exhibit evidence of internal rotational symmetry postulated to be the result of gene duplication/fusion events involving a primordial polypeptide motif. A common feature of such structures is a domain‐swapped arrangement at the interface of the N‐ and C‐termini motifs, which is postulated to provide cooperative interactions that promote folding and stability. De novo designed symmetric protein architectures have demonstrated an ability to accommodate circular permutation of the N‐ and C‐termini in the overall architecture; however, the folding requirement of the primordial motif is poorly understood, and tolerance to circular permutation is essentially unknown. The β‐trefoil protein fold is a threefold‐symmetric architecture where the repeating ~42‐mer “trefoil‐fold” motif assembles via a domain‐swapped arrangement. The trefoil‐fold structure in isolation exposes considerable hydrophobic area that is otherwise buried in the intact β‐trefoil trimeric assembly. The trefoil‐fold sequence is not predicted to adopt the trefoil‐fold architecture in ab initio folding studies; rather, the predicted fold is closely related to a compact “blade” motif from the β‐propeller architecture. Expression of a trefoil‐fold sequence and circular permutants shows that only the wild‐type N‐terminal motif definition yields an intact β‐trefoil trimeric assembly, while permutants yield monomers. The results elucidate the folding requirements of the primordial trefoil‐fold motif, and also suggest that this motif may sample a compact conformation that limits hydrophobic residue exposure, contains key trefoil‐fold structural features, but is more structurally homologous to a β‐propeller blade motif.

17.
18.
We determined the NMR structure of a highly aromatic (13%) protein of unknown function, Aq1974 from Aquifex aeolicus (PDB ID: 5SYQ). The unusual sequence of this protein has a tryptophan content five times the normal (six tryptophan residues of 114 or 5.2% while the average tryptophan content is 1.0%) with the tryptophans occurring in a WXW motif. It has no detectable sequence homology with known protein structures. Although its NMR spectrum suggested that the protein was rich in β‐sheet, upon resonance assignment and solution structure determination, the protein was found to be primarily α‐helical with a small two‐stranded β‐sheet with a novel fold that we have termed an Aromatic Claw. As this fold was previously unknown and the sequence unique, we submitted the sequence to CASP10 as a target for blind structural prediction. At the end of the competition, the sequence was classified as a hard template‐based modeling target; the structural relationship between the template and the experimental structure was small and the predictions all failed to predict the structure. CSRosetta was found to predict the secondary structure and its packing; however, there was little correlation between the CSRosetta score and the RMSD between the CSRosetta structure and the NMR‐determined one. This work demonstrates that even in relatively small proteins, we do not yet have the capacity to accurately predict the fold for all primary sequences. The experimental discovery of new folds helps guide the improvement of structural prediction methods.

19.
Alexander V. Efimov. Proteins 2017, 85(10):1925-1930
In this study, the structural motifs that can be represented as combinations of small motifs such as β‐hairpins, S‐, and Z‐like β‐sheets and βαβ‐units, and the П‐like module are described and analyzed. The П‐module consists of connected elements of the β‐strand‐loop‐β‐strand type arranged in space so that its overall fold resembles a clip or the Greek letter П. In proteins, the П‐module itself and the structural motifs containing it exhibit unique overall folds and have specific sequence patterns of the key hydrophobic, hydrophilic and glycine residues. All this together enables us to conclude that these structural motifs can fold independently of the remaining part of the molecule and can act as nuclei and/or “ready‐made” building blocks in protein folding.

20.
John Lhota, Lei Xie. Proteins 2016, 84(4):467-472
Protein structure prediction, when construed as a fold recognition problem, is one of the most important applications of similarity search in bioinformatics. A new protein‐fold recognition method is reported which combines a single‐source K diverse shortest path (SSKDSP) algorithm with the Enrichment of Network Topological Similarity (ENTS) algorithm to search a graphic feature space generated using sequence similarity and structural similarity metrics. A modified, more efficient SSKDSP algorithm is developed to improve the performance of graph searching. The new implementation of the SSKDSP algorithm empirically requires 82% less memory and 61% less time than the current implementation, allowing for the analysis of larger, denser graphs. Furthermore, the statistical significance of fold ranking generated from SSKDSP is assessed using ENTS. The reported ENTS‐SSKDSP algorithm outperforms the original ENTS, which uses random walk with restart for the graph search, as well as the other state‐of‐the‐art protein structure prediction algorithms HHSearch and Sparks‐X, as evaluated by a benchmark of 600 query proteins. The reported methods may easily be extended to other similarity search problems in bioinformatics and chemoinformatics. The SSKDSP software is available at http://compsci.hunter.cuny.edu/~leixie/sskdsp.html . Proteins 2016; 84:467–472. © 2016 Wiley Periodicals, Inc.
