Similar literature
 20 similar articles found (search time: 859 ms)
1.
2.
Several studies based on known three-dimensional (3-D) protein structures show that two homologous proteins with insignificant sequence similarity can adopt a common fold and may perform the same or similar biochemical functions. Hence, it is appropriate to use similarities in 3-D structure rather than amino acid sequence similarities when modelling the evolution of distantly related proteins. Here we present an assessment of using 3-D structures in modelling the evolution of homologous proteins. Using a dataset of 108 protein domain families of known structure with at least 10 members per family, we compare the extent of structural and sequence dissimilarity among pairs of proteins, which are the inputs to the construction of phylogenetic trees. We find that the correlation between the structure-based dissimilarity measures and the sequence-based dissimilarity measures is usually good if the sequence similarity among the homologues is about 30% or more. For protein families with low sequence similarity among the members, the correlation between the sequence-based and the structure-based dissimilarities is poor. In these cases the structure-based dendrogram clusters proteins with the most similar biochemical functional properties better than the sequence-similarity-based dendrogram. In multi-domain protein families and disulphide-rich protein families, the correlation coefficient for the match of sequence-based and structure-based dissimilarity (SDM) measures can be poor even though the sequence identity may be higher than 30%. Hence it is suggested that protein evolution is best modelled using 3-D structures if the sequence similarities (SSM) of the homologues are very low.
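
The comparison described in this abstract, correlating sequence-based with structure-based pairwise dissimilarities within a family, can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the abstract does not define how either dissimilarity is computed, so the two matrices below are made-up toy values and the helper names `upper_triangle` and `pearson` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def upper_triangle(matrix):
    """Return the off-diagonal upper-triangle entries of a square matrix,
    i.e. one value per unordered pair of proteins."""
    n = len(matrix)
    return [matrix[i][j] for i, j in combinations(range(n), 2)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy pairwise dissimilarity matrices for a four-member family
# (illustrative values only, not taken from the study).
seq_dissim = [
    [0.0, 0.2, 0.6, 0.7],
    [0.2, 0.0, 0.5, 0.8],
    [0.6, 0.5, 0.0, 0.3],
    [0.7, 0.8, 0.3, 0.0],
]
struct_dissim = [
    [0.0, 0.1, 0.4, 0.5],
    [0.1, 0.0, 0.3, 0.6],
    [0.4, 0.3, 0.0, 0.2],
    [0.5, 0.6, 0.2, 0.0],
]

r = pearson(upper_triangle(seq_dissim), upper_triangle(struct_dissim))
print(f"correlation between sequence- and structure-based dissimilarity: {r:.3f}")
```

Only the upper triangle is used so that each protein pair contributes once and the zero diagonal does not inflate the correlation.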

3.
The rapid growth in protein structural data and the emergence of structural genomics projects have increased the need for automatic structure analysis and tools for function prediction. Small molecule recognition is critical to the function of many proteins; therefore, determination of ligand binding site similarity is important for understanding ligand interactions and may allow their functional classification. Here, we present a binding sites database (SitesBase) that, given a known protein-ligand binding site, allows rapid retrieval of other binding sites with similar structure, independent of overall sequence or fold similarity. Each match is nevertheless annotated with sequence similarity and fold information to aid interpretation of structural and functional similarity. Similarity in ligand binding sites can indicate common binding modes and recognition of similar molecules, allowing potential inference of function for an uncharacterised protein or providing additional evidence of common function where sequence or fold similarity is already known. Alternatively, the resource can provide valuable information for detailed studies of molecular recognition, including structure-based ligand design, and for understanding ligand cross-reactivity. Here, we show examples of atomic similarity between superfamily or more distant fold relatives as well as between seemingly unrelated proteins. Assignment of unclassified proteins to structural superfamilies is also undertaken and in most cases substantiates assignments made using sequence similarity. Correct assignment is also possible where sequence similarity fails to find significant matches, illustrating the potential use of binding site comparisons for newly determined proteins.

4.
5.
An Y, Friesner RA. Proteins 2002, 48(2):352-366
In this work, we introduce a new method for fold recognition using composite secondary structures assembled from different secondary structure prediction servers for a given target sequence. We develop an automatic, complete, and robust way of finding all possible combinations of predicted secondary structure segments (SSS) for the target sequence and clustering them into a few flexible clusters, each containing patterns with the same number of SSS. The program then chooses plausible homologues in two steps: (i) an SSS-based alignment excludes impossible templates whose SSS patterns are very different from any of those of the target; (ii) a residue-based alignment selects good structural templates based on sequence similarity and secondary structure similarity between the target and only those templates left after the first stage. The secondary structure of each residue in the target is selected from one of the predictions to find the best match with the template. Truncation is applied to a target where different predictions vary. In most cases, a target is also divided into N-terminal and C-terminal fragments, each of which is used as a separate subsequence. Our program was tested on the fold recognition targets from CASP3 with known PDB codes and some available targets from CASP4. The results are compared with a structural homologue list for each target produced by the CE program (Shindyalov and Bourne, Protein Eng 1998;11:739-747). The program successfully locates homologues with high Z-score and low root-mean-square deviation within the top 30-50 predictions in the overwhelming majority of cases.

6.
A significant proportion of bacteria express two or more chaperonin genes. Chaperonins are a group of molecular chaperones, defined by sequence similarity, required for the folding of some cellular proteins. Chaperonin monomers have a mass of c. 60 kDa, and are typically found as large protein complexes containing 14 subunits arranged in two rings. The mechanism of action of the Escherichia coli GroEL protein has been studied in great detail. It acts by binding to unfolded proteins and enabling them to fold in a protected environment where they do not interact with any other proteins. GroEL can assist the folding of many proteins of different sizes, sequences, and structures, and homologues from many different bacteria can functionally replace GroEL in E. coli. What then are the functions of multiple chaperonins? Do they provide a mechanism for cells to increase their general chaperoning ability, or have they become specialized to take on specific novel cellular roles? Here I will review the genetic, biochemical, and phylogenetic evidence that has a bearing on this question, and show that there is good evidence for at least some specificity of function in multiple chaperonin genes.

7.
Using sensitive structure similarity searches, we identify a shared alpha+beta fold, RAGNYA, principally involved in nucleic acid, nucleotide or peptide interactions in a diverse group of proteins. These include the ribosomal proteins L3 and L1, ATP-grasp modules, the GYF domain, DNA-recombination proteins of the NinB family from caudate bacteriophages, the C-terminal DNA-interacting domain of the Y-family DNA polymerases, the uncharacterized enzyme AMMECR1, the siRNA silencing repressor of tombusviruses, tRNA Wybutosine biosynthesis enzyme Tyw3p, DNA/RNA ligases and related nucleotidyltransferases, and the Enhancer of rudimentary proteins. This fold exhibits three distinct circularly permuted versions and is composed of an internal repeat of a unit with two strands and a helix. We show that despite considerable structural diversity in the fold, its representatives show a common mode of nucleic acid or nucleotide interaction via the exposed face of the sheet. Using this information and sensitive profile-based sequence searches: (1) we predict the active site and mode of substrate interaction of the Wybutosine biosynthesis enzyme, Tyw3p, and a potential catalytic role for AMMECR1; (2) we provide insights regarding the mode of nucleic acid interaction of the NinB proteins, and the evolution of the active site of classical ATP-grasp enzymes and DNA/RNA ligases; (3) we present evidence for a bacterial origin of the GYF domain and propose how this version of the fold might have been utilized in peptide interactions in the context of nucleoprotein complexes.

8.
Heterodimeric proteins with homologous subunits of the same fold are involved in various biological processes. The objective of this study is to understand the evolution of structural and functional features of such heterodimers. Using a non-redundant dataset of 70 such heterodimers of known 3D structure and an independent dataset of 173 heterodimers from yeast, we note that the mean sequence identity between interacting homologous subunits is only 23–24%, suggesting that, generally, highly diverged paralogues assemble to form such a heterodimer. We also note that the functional roles of interacting subunits/domains are generally quite different. This suggests that, though the interacting subunits/domains are homologous, their high evolutionary divergence characterizes their high functional divergence, which contributes to a gross function for the heterodimer considered as a whole. The inverse relationship between sequence identity and RMSD of interacting homologues in heterodimers is not followed. We also addressed the question of formation of homodimers of the subunits of heterodimers by generating models of fictitious homodimers on the basis of the 3D structures of the heterodimers. Interaction energies associated with these homodimers suggest that, in the overwhelming majority of cases, such homodimers are unlikely to be stable. The majority of the homologues of heterodimers of known structure form heterodimers (51.8%) and a small proportion (14.6%) form homodimers. Comparison of 3D structures of heterodimers with homologous homodimers suggests that the interfacial nature of residues is not well conserved. In over 90% of the cases we note that the interacting subunits of heterodimers are co-localized in the cell. Proteins 2015; 83:1766–1786. © 2015 Wiley Periodicals, Inc.

9.
10.
Despite significant methodological advances in protein structure determination, high-resolution structures of membrane proteins are still rare, leaving sequence-based predictions as the only option for exploring the structural variability of membrane proteins at large scale. Here, a new structural classification approach for α-helical membrane proteins is introduced based on the similarity of predicted helix interaction patterns. Its application to proteins with known 3D structure showed that it is able to reliably detect structurally similar proteins even in the absence of any sequence similarity, reproducing the SCOP and CATH classifications with a sensitivity of 65% at a specificity of 90%. We applied the new approach to enhance our comprehensive structural classification of α-helical membrane proteins (CAMPS), which is primarily based on sequence and topology similarity, in order to find protein clusters that describe the same fold in the absence of sequence similarity. A total of 151 helix architectures were delineated for proteins with more than four transmembrane segments. Interestingly, we observed that proteins with 8 or more transmembrane helices correspond to fewer different architectures than proteins with up to 7 helices, suggesting that in large membrane proteins the evolutionary tendency to re-use already available folds is more pronounced.
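
For readers less familiar with the two figures quoted in this abstract, sensitivity and specificity are computed from a confusion matrix as sketched below. This is a generic illustration: the counts are hypothetical values chosen only so that the rates come out at 65% and 90%, and they are not data from the study.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of truly similar pairs that the classifier detects."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of truly dissimilar pairs that the classifier rejects."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, picked purely to reproduce the quoted rates.
tp, fn = 65, 35   # structurally similar pairs: detected vs. missed
tn, fp = 90, 10   # dissimilar pairs: correctly rejected vs. false hits

print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
```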

11.
We present a protein fold recognition method, MANIFOLD, which uses the similarity between target and template proteins in predicted secondary structure, sequence and enzyme code to predict the fold of the target protein. We developed a non-linear ranking scheme in order to combine the scores of the three different similarity measures used. For a difficult test set of proteins with very little sequence similarity, the program predicts the fold class correctly in 34% of cases. This is more than a twofold increase in accuracy compared with sequence-based methods such as PSI-BLAST or GenTHREADER, which score 13-14% correct first hits for the same test set. The functional similarity term increases the prediction accuracy by up to 3% compared with using the combination of secondary structure similarity and PSI-BLAST alone. We argue that using functional and secondary structure information can increase fold recognition accuracy beyond what sequence similarity alone achieves.

12.
13.
14.
The availability of complete genome sequences has highlighted the problems of functional annotation of the many gene products that have only limited sequence similarity with proteins of known function. The predicted protein encoded by open reading frame Rv3214 from the Mycobacterium tuberculosis H37Rv genome was originally annotated as EntD through sequence similarity with the Escherichia coli EntD, a 4'-phosphopantetheinyl transferase implicated in siderophore biosynthesis. An alternative annotation, based on slightly higher sequence identity, grouped Rv3214 with proteins of the cofactor-dependent phosphoglycerate mutase (dPGM) family. The crystal structure of this protein has been solved by single-wavelength anomalous dispersion methods and refined at 2.07 Å resolution (R = 0.229; R(free) = 0.245). The protein is dimeric, with a monomer fold corresponding to the classical dPGM alpha/beta structure, albeit with some variations. Closer comparisons of structure and sequence indicate that it most closely corresponds with a broad-spectrum phosphatase subfamily within the dPGM superfamily. This functional annotation has been confirmed by biochemical assays, which show negligible mutase activity but acid phosphatase activity with a pH optimum of 5.4 and suggest that Rv3214 may be important for mycobacterial phosphate metabolism in vivo. Despite its weak sequence similarity with the 4'-phosphopantetheinyl transferases (EntD homologues), there is little evidence to support this function.

15.
MOTIVATION: Large-scale experiments reveal pairs of interacting proteins but leave the residues involved in the interactions unknown. These interface residues are essential for understanding the mechanism of interaction and are often desired drug targets. Reliable identification of residues that reside in protein-protein interfaces typically requires analysis of protein structure. Therefore, for the vast majority of proteins, for which there is no high-resolution structure, there is no effective way of identifying interface residues. RESULTS: Here we present a machine learning-based method that identifies interacting residues from sequence alone. Although the method is developed using transient protein-protein interfaces from complexes of experimentally known 3D structures, it never explicitly uses 3D information. Instead, we combine predicted structural features with evolutionary information. The strongest predictions of the method reached over 90% accuracy in a cross-validation experiment. Our results suggest that despite the significant diversity in the nature of protein-protein interactions, they all share common basic principles and that these principles are identifiable from sequence alone.

16.
A family of hypothetical proteins, identified predominantly from archaeal genomes, has been analyzed in order to understand its functional characteristics. Using extensive sequence similarity searches, it is inferred that this family is remotely related (best sequence identity is 19%) to ClpP proteinases, which belong to the serine proteinase class. This family of hypothetical proteins is referred to as the SDH proteinase family, based on the conserved sequential order of Ser, Asp and His residues and predicted serine proteinase activity. Results of fold recognition of SDH family sequences confirmed the remote relationship between SDH proteinases and Clp proteinases and revealed similar tertiary location of putative catalytic triad residues critical for serine proteinase function. However, the best sequence alignment we could obtain suggests that while the catalytic Ser is conserved across Clp and SDH proteinases, the locations of the other catalytic triad residues, namely His and Asp, are swapped in their amino acid alignment positions and hence in 3-D structure. The evidence of a conserved catalytic triad suggests that SDH could be a new family of serine proteinases with the fold of Clp proteinase, but sharing the catalytic triad order of the carboxypeptidase clan. A signal peptide sequence identified at the N-terminus of some of the homologues suggests that these might be secretory serine proteinases involved in cleavage of extracellular proteins, while the remote homologues, ClpP proteinases, are known to work in the intracellular environment.

17.
In a cell, it has been estimated that each protein on average interacts with roughly 10 others, resulting in tens of thousands of proteins known or suspected to have interaction partners; of these, only a tiny fraction have solved protein structures. To partially address this problem, we have developed M-TASSER, a hierarchical method to predict protein quaternary structure from sequence that involves template identification by multimeric threading, followed by multimer model assembly and refinement. The final models are selected by structure clustering. M-TASSER has been tested on a benchmark set comprising 241 dimers having templates with weak sequence similarity and 246 without multimeric templates in the dimer library. Of the total of 207 targets predicted to interact as dimers, 165 (80%) were correctly assigned as interacting with a true positive rate of 68% and a false positive rate of 17%. The initial best template structures have an average root-mean-square deviation to native of 5.3, 6.7, and 7.4 Å for the monomer, interface, and dimer structures. The final model shows on average a root-mean-square deviation improvement of 1.3, 1.3, and 1.5 Å over the initial template structure for the monomer, interface, and dimer structures, with refinement evident for 87% of the cases. Thus, we have developed a promising approach to predict full-length quaternary structure for proteins that have weak sequence similarity to proteins of solved quaternary structure.
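
The root-mean-square deviation figures quoted in this abstract follow the standard definition, RMSD = sqrt((1/N) * sum of squared distances between corresponding atoms). The sketch below computes it for two coordinate lists that are assumed to be already superimposed and matched atom for atom; the optimal superposition step used in practice is omitted, the coordinates are made-up toy values in Å, and the function name `rmsd` is illustrative rather than part of M-TASSER.

```python
from math import sqrt


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of
    (x, y, z) coordinates, assumed to be pre-aligned and pre-matched."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return sqrt(squared / len(coords_a))


# Toy example: three matched atoms, coordinates in Å (illustrative only).
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
native = [(0.0, 0.0, 0.0), (1.5, 1.0, 0.0), (3.0, 0.0, 2.0)]

print(f"RMSD = {rmsd(model, native):.2f} Å")
```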

18.
Prion diseases are a group of fatal neurodegenerative disorders associated with structural conversion of a normal, mostly alpha-helical cellular prion protein, PrP(C), into a pathogenic beta-sheet-rich conformation, PrP(Sc). The structure of PrP(C) is well studied, whereas the insolubility of PrP(Sc) makes the characterization of its structure problematic. No proteins similar to PrP, except for its paralog with the same fold, PrP-Doppel, are known. However, PrP-Doppel does not undergo a structural transition into a beta-sheet-rich conformation. Structural information from proteins that share a weak but significant sequence similarity with PrP may be used to gain additional insights into the conformation of PrP(Sc). We construct a sequence profile corresponding to the structured domain of PrP and use this profile to search the SWISS-PROT and TrEMBL databases. We identify a significant sequence similarity between PrP and chimpanzee cytomegalovirus glycoprotein UL9. This glycoprotein scores higher than all PrP-Doppel sequences. Fold recognition methods assign a mainly-beta fold to UL9. Owing to the observed sequence similarity with PrP and a putative mainly-beta fold, the UL9 glycoprotein may represent a potential target for experimental structure determination aimed at obtaining a structural template for PrP(Sc) modeling.

19.
20.
A unique family of proteins has been identified in the Deinococcus genus with an N-terminal cobalamin (vitamin B(12)) chelatase domain, denoted CbiX, and an additional unique C-terminal domain of unknown function. Here we report the first crystal structure from this new family of proteins, that of the Deinococcus radiodurans protein DR2241. The structure reveals a multi-domain protein in which domain A (residues 1-132) has the same fold as the small CbiX (CbiX(S)), domains A and B (residues 1-272) follow the chelatase super-family fold, and the two additional unique domains C and D have no structural homologues. Domain D harbours the sequence motifs CxxC and CxxxC, and DR2241 gives the first evidence that these motifs bind a [4Fe-4S] iron-sulphur cluster. In solution there are indications of multimeric forms, and in the crystallographic asymmetric unit a tetramer is found in which domains C and D are involved in stabilising the tetrameric assembly.
