Similar literature
Found 20 similar articles (search time: 15 ms)
1.
Qi Y, Grishin NV. Proteins, 2005, 58(2):376-388
Protein structure classification is necessary to comprehend the rapidly growing structural data for better understanding of protein evolution and sequence-structure-function relationships. Thioredoxins are important proteins that ubiquitously regulate cellular redox status and various other crucial functions. We define the thioredoxin-like fold using the structure consensus of thioredoxin homologs and consider all circular permutations of the fold. The search for thioredoxin-like fold proteins in the PDB database identified 723 protein domains. These domains are grouped into eleven evolutionary families based on combined sequence, structural, and functional evidence. Analysis of the protein-ligand structure complexes reveals two major active site locations for the thioredoxin-like proteins. Comparison to existing structure classifications reveals that our thioredoxin-like fold group is broader and more inclusive, unifying proteins from five SCOP folds, five CATH topologies and seven DALI domain dictionary globular folding topologies. Considering these structurally similar domains together sheds new light on the relationships between sequence, structure, function and evolution of thioredoxins.
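The abstract mentions considering all circular permutations of the fold. As a rough illustration only, the sketch below enumerates the rotations of an ordered list of secondary-structure elements. The element labels are hypothetical placeholders, not the actual thioredoxin-like topology, and the function name `circular_permutations` is an assumption made for this example rather than anything taken from the paper.

```python
from typing import List, Sequence, Tuple


def circular_permutations(elements: Sequence[str]) -> List[Tuple[str, ...]]:
    """Return every rotation of the element order, starting with the original order."""
    items = list(elements)
    return [tuple(items[i:] + items[:i]) for i in range(len(items))]


if __name__ == "__main__":
    # Hypothetical element labels, used only to show the rotations.
    core = ["b1", "a1", "b2", "a2", "b3"]
    for permutation in circular_permutations(core):
        print("-".join(permutation))
```

A fold with n ordered elements has n such rotations, so a search over circular permutations compares a candidate domain against each rotation rather than only the canonical order.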

2.
Only a minority of currently known protein families is characterized structurally. This makes homology-based structure modeling an essential instrument that can be viewed as the first approximation to experimental determination of protein structure. Using sequence similarity searches, we detected a distant similarity between a family of uncharacterized hypothetical proteins, COG4849, and the family of tRNA nucleotidyltransferases. The suggested remote homology between the N-terminal domain of COG4849 and the catalytic domain of tRNA nucleotidyltransferase was further supported by comparison of sequence profiles, methods for fold recognition and structure modeling. The combined multiple alignment of the two families reveals shared conservation of functionally important motifs and suggests the similarity in catalytic mechanisms of the performed reactions. Our results suggest that (i) the N-terminal domain of proteins from COG4849 shares structural similarity with the catalytic domain of tRNA nucleotidyltransferase, and (ii) this domain catalyzes the nucleotidyl transfer reaction involving two metal ions.

3.
We have identified two new lysozyme-like protein families by using a combination of sequence similarity searches, domain architecture analysis, and structural predictions. First, the P5 protein from bacteriophage phi8, which belongs to COG3926 and Pfam family DUF847, is predicted to have a new lysozyme-like domain. This assignment is consistent with the lytic function of P5 proteins observed in several related double-stranded RNA bacteriophages. Domain architecture analysis reveals two lysozyme-associated transmembrane modules (LATM1 and LATM2) in a few COG3926/DUF847 members. LATM2 is also present in two proteins containing a peptidoglycan binding domain (PGB) and an N-terminal region that corresponds to COG5526 with uncharacterized function. Second, structure prediction and sequence analysis suggest that COG5526 represents another new lysozyme-like family. Our analysis offers fold and active-site assignments for COG3926/DUF847 and COG5526. The predicted enzymatic activity is consistent with an experimental study on the zliS gene product from Zymomonas mobilis, suggesting that bacterial COG3926/DUF847 members might be activators of macromolecular secretion.

4.
The ribonuclease H-like (RNHL) superfamily, also called the retroviral integrase superfamily, groups together numerous enzymes involved in nucleic acid metabolism and implicated in many biological processes, including replication, homologous recombination, DNA repair, transposition and RNA interference. The RNHL superfamily proteins show extensive divergence of sequences and structures. We conducted database searches to identify members of the RNHL superfamily (including those previously unknown), yielding >60,000 unique domain sequences. Our analysis led to the identification of new RNHL superfamily members, such as RRXRR (PF14239), DUF460 (PF04312, COG2433), DUF3010 (PF11215), DUF429 (PF04250 and COG2410, COG4328, COG4923), DUF1092 (PF06485), COG5558, OrfB_IS605 (PF01385, COG0675) and Peptidase_A17 (PF05380). Based on the clustering analysis, we grouped all identified RNHL domain sequences into 152 families. Phylogenetic studies revealed relationships between these families, and suggested a possible history of the evolution of the RNHL fold and its active site. Our results revealed a clear division of the RNHL superfamily into exonucleases and endonucleases. Structural analyses of features characteristic of particular groups revealed a correlation of the orientation of the C-terminal helix with the exonuclease/endonuclease function and the architecture of the active site. Our analysis provides a comprehensive picture of sequence-structure-function relationships in the RNHL superfamily that may guide functional studies of the previously uncharacterized protein families.

5.
Double-stranded DNA bacteriophages and herpesviruses assemble their heads in a similar fashion; a pre-formed precursor called a prohead or procapsid undergoes a conformational transition to give rise to a mature head or capsid. A virus-encoded prohead or procapsid protease is often required in this maturation process. Through computational analysis, we infer homology between bacteriophage prohead proteases (MEROPS families U9 and U35) and herpesvirus protease (MEROPS family S21), and unify them into a procapsid protease superfamily. We also extend this superfamily to include an uncharacterized cluster of orthologs (COG3566) and many other phage or bacteria-encoded hypothetical proteins. On the basis of this homology and the herpesvirus protease structure and catalytic mechanism, we predict that bacteriophage prohead proteases adopt the herpesvirus protease fold and exploit a conserved Ser and His residue pair in catalysis. Our study provides further support for the proposed evolutionary link between dsDNA bacteriophages and herpesviruses.

6.
MOTIVATION: It is commonly believed that sequence determines structure, which in turn determines function. However, the presence of many proteins with the same structural fold but different functions suggests that global structure and function do not always correlate well. RESULTS: We propose a method for accurate functional annotation, based on identification of functional signatures from structural alignments (FSSA) using the Structural Classification of Proteins (SCOP) database. The FSSA method is superior at function discrimination and classification compared with several methods that directly inherit functional annotation information from homology inference, such as Smith-Waterman, PSI-BLAST, hidden Markov models and structure comparison methods, for a large number of structural fold families. Our results indicate that the contributions of amino acid residue types and positions to structure and function are largely separable for proteins in multi-functional fold families.

7.
In functional metagenomics, BLAST homology search is a common method to classify metagenomic reads into protein/domain sequence families such as Clusters of Orthologous Groups of proteins (COGs) in order to quantify the abundance of each COG in the community. The resulting functional profile of the community is then used in downstream analysis to correlate the change in abundance to environmental perturbation, clinical variation, and so on. However, the short read length coupled with next-generation sequencing technologies poses a barrier in this approach, essentially because similarity significance cannot be discerned by searching with short reads. Consequently, artificial functional families are produced, and those with a large number of assigned reads decrease the accuracy of the functional profile dramatically. There is no method available to address this problem. We aim to fill this gap in this paper. We reveal that BLAST similarity scores of homologues for short reads from the coding sequences of COG protein members are distributed differently from the scores of those derived elsewhere. We show that, by choosing an appropriate score cut-off, we are able to filter out most artificial families while preserving sufficient information to build the functional profile. We also show that, by combined application of BLAST and RPS-BLAST, some artificial families with large read counts can be further identified after the score cut-off filtration. Evaluated on three experimental metagenomic datasets with different coverages, the proposed method is robust against read coverage and consistently outperforms the other E-value cut-off methods currently used in the literature.
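The score cut-off idea described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Hit` record, the `functional_profile` function, the cut-off value and the choice to keep only the best-scoring hit per read are all hypothetical, since the abstract does not give those details.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    """One similarity-search hit for a read (hypothetical record layout)."""
    read_id: str
    cog_id: str
    bit_score: float


def functional_profile(hits: Iterable[Hit], min_score: float) -> Dict[str, int]:
    """Count reads per COG, keeping only hits at or above the score cut-off.

    Assumption: each read contributes to the COG of its best-scoring surviving hit.
    """
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.bit_score < min_score:
            continue  # below the cut-off: treated as unreliable for a short read
        current = best.get(hit.read_id)
        if current is None or hit.bit_score > current.bit_score:
            best[hit.read_id] = hit
    return dict(Counter(hit.cog_id for hit in best.values()))


if __name__ == "__main__":
    # Made-up hits; the scores and the cut-off of 40.0 are illustrative only.
    example_hits = [
        Hit("r1", "COG0001", 52.0),
        Hit("r1", "COG0002", 31.0),
        Hit("r2", "COG0002", 28.5),
        Hit("r3", "COG0001", 45.1),
        Hit("r4", "COG0003", 60.0),
    ]
    print(functional_profile(example_hits, min_score=40.0))
```

In this toy input, COG0002 receives only low-scoring hits and drops out of the profile entirely, which is the kind of artificial family the cut-off is meant to remove.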

8.
TA0095 is a 96-residue hypothetical protein from Thermoplasma acidophilum that exhibits no sequence similarity to any protein of known structure. Also, TA0095 is a member of the COG4004 orthologous group of unknown function found in Archaea. We determined its three-dimensional structure by NMR methods. The structure displays an alpha/beta two-layer sandwich architecture formed by three alpha-helices and five beta-strands following the order beta1-alpha1-beta2-beta3-beta4-beta5-alpha2-alpha3. Searches for structural homologs indicate that the TA0095 structure belongs to the TBP-like fold, constituting a novel superfamily characterized by an additional C-terminal helix. The TA0095 structure provides a fold common to the COG4004 proteins that will obviously belong to this new superfamily. Most hydrophobic residues conserved in the COG4004 proteins are buried in the structure determined herein, thus underlining their importance for structure stability. Considering that the TA0095 surface shows a large positively charged patch with a high degree of residue conservation within the COG4004 domain, the biological function of TA0095 and the rest of the COG4004 proteins might occur through binding a negatively charged molecule. Like other TBP-like fold proteins, the COG4004 proteins might be DNA-binding proteins. The fact that TA0095 is shown to interact with large DNA fragments is in favor of this hypothesis, although nonspecific DNA binding cannot be ruled out.

9.
Restriction endonucleases and other nucleic acid cleaving enzymes form a large and extremely diverse superfamily that displays little sequence similarity despite retaining a common core fold responsible for cleavage. The lack of significant sequence similarity between protein families makes homology inference a challenging task and hinders new family identification with traditional sequence-based approaches. Using the consensus fold recognition method Meta-BASIC that combines sequence profiles with predicted protein secondary structure, we identify nine new restriction endonuclease-like fold families among previously uncharacterized proteins and predict these proteins to cleave nucleic acid substrates. Application of transitive searches combined with gene neighborhood analysis allows us to confidently link these unknown families to a number of known restriction endonuclease-like structures and thus assign folds to the uncharacterized proteins. Finally, our method identifies a novel restriction endonuclease-like domain in the C-terminus of RecC that is not detected with structure-based searches of the existing PDB database.

10.
Endo-alpha-1,4-polygalactosaminidase is a rare enzyme. Its catalytic domain belongs to the GH114 family of glycoside hydrolases. Phylogenetic analysis of the family proteins allowed us to show an important role of duplications, eliminations, and horizontal transfer in the evolution of their genes. The domain structure, the secondary structure, and the proposed structure of the active center of the endo-alpha-1,4-polygalactosaminidases are discussed. Evolutionary connections of the GH114 family with the GH13, GH18, GH20, GH27, GH29, GH31, GH35, GH36, and GH66 families of glycoside hydrolases, as well as with the COG1306, COG1649, COG2342, GHL3, and GHL4 families of enzymatically uncharacterized proteins, have been revealed by iterative screening of the protein database. The unclassified homologues have been grouped into 13 new families of hypothetical glycoside hydrolases: GHL5 - GHL15, GH36J, and GH36K.

11.
Recent progress in structure determination techniques has led to a significant growth in the number of known membrane protein structures, and the first structural genomics projects focusing on membrane proteins have been initiated, warranting an investigation of appropriate bioinformatics strategies for optimal structural target selection for these molecules. What determines a membrane protein fold? How many membrane structures need to be solved to provide sufficient structural coverage of the membrane protein sequence space? We present the CAMPS database (Computational Analysis of the Membrane Protein Space) containing almost 45,000 proteins with three or more predicted transmembrane helices (TMH) from 120 bacterial species. This large set of membrane proteins was subjected to single‐linkage clustering using only sequence alignments covering at least 40% of the TMH present in a given family. This process yielded 266 sequence clusters with at least 15 members, roughly corresponding to membrane structural folds, sufficiently structurally homogeneous in terms of the variation of TMH number between individual sequences. These clusters were further subdivided into functionally homogeneous subclusters according to the COG (Clusters of Orthologous Groups) system as well as more stringently defined families sharing at least 30% identity. The CAMPS sequence clusters are thus designed to reflect three main levels of interest for structural genomics: fold, function, and modeling distance. We present a library of Hidden Markov Models (HMM) derived from sequence alignments of TMH at these three levels of sequence similarity. Given that 24 out of 266 clusters corresponding to membrane folds already have associated known structures, we estimate that 242 additional new structures, one for each remaining cluster, would provide structural coverage at the fold level of roughly 70% of prokaryotic membrane proteins belonging to the currently most populated families. Proteins 2006. 
© 2006 Wiley‐Liss, Inc.
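The single-linkage clustering step described in this abstract can be sketched with a union-find structure: two proteins join the same cluster whenever any chain of sufficiently good alignments connects them. The sketch below is an assumption-laden simplification. It treats the coverage criterion as a single number per pairwise alignment, and the function name `single_linkage`, the input layout and the example values are hypothetical, not taken from CAMPS.

```python
from typing import Dict, Iterable, List, Tuple


def single_linkage(
    ids: Iterable[str],
    links: Iterable[Tuple[str, str, float]],
    min_coverage: float = 0.4,
) -> List[List[str]]:
    """Cluster ids by single linkage over links whose coverage meets the threshold.

    Every id named in `links` must also appear in `ids`.
    """
    parent: Dict[str, str] = {i: i for i in ids}

    def find(x: str) -> str:
        # Follow parent pointers to the cluster root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, coverage in links:
        if coverage >= min_coverage:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    clusters: Dict[str, List[str]] = {}
    for i in parent:
        clusters.setdefault(find(i), []).append(i)
    # Largest clusters first, ties broken alphabetically.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: (-len(c), c))


if __name__ == "__main__":
    # Made-up protein ids and alignment coverages.
    proteins = ["p1", "p2", "p3", "p4", "p5"]
    alignments = [("p1", "p2", 0.8), ("p2", "p3", 0.45), ("p3", "p4", 0.2), ("p4", "p5", 0.9)]
    print(single_linkage(proteins, alignments))
```

The fold-level estimate in the abstract follows from simple subtraction: 266 clusters minus the 24 that already have a known structure leaves 242 clusters, each needing one new structure.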

12.
Toxin-antitoxin systems (TAS) are abundant, diverse, horizontally mobile gene modules that encode powerful resistance mechanisms in prokaryotes. We use the comparative-genomic approach to predict a new TAS that consists of a two-gene cassette encoding uncharacterized HicA and HicB proteins. Numerous bacterial and archaeal genomes encode from one to eight HicAB modules which appear to be highly prone to horizontal gene transfer. The HicB protein (COG1598/COG4226) has a partially degraded RNAse H fold, whereas HicA (COG1724) contains a double-stranded RNA-binding domain. The stable combination of these two domains suggests a link to RNA metabolism, possibly, via an RNA interference-type mechanism. In most HicB proteins, the RNAse H-like domain is fused to a DNA-binding domain, either of the ribbon-helix-helix or of the helix-turn-helix class; in other TAS, proteins containing these DNA-binding domains function as antitoxins. Thus, the HicAB module is predicted to be a novel TAS whose mechanism involves RNA-binding and, possibly, cleavage.

13.
There are many well-known examples of proteins with low sequence similarity adopting the same structural fold. This aspect of the sequence-structure relationship has been extensively studied both experimentally and theoretically, but with limited success. Most of the studies consider remote homology or "sequence conservation" as the basis for their understanding. Recently, an "interaction energy" based network formalism (Protein Energy Networks (PENs)) was developed to understand the determinants of protein structures. In this paper we have used these PENs to investigate the common non-covalent interactions and their collective features which stabilize the TIM barrel fold. We have also developed a method of aligning PENs in order to understand the spatial conservation of interactions in the fold. We have identified key common interactions responsible for the conservation of the TIM fold, despite high sequence dissimilarity. For instance, the central beta barrel of the TIM fold is stabilized by long-range high-energy electrostatic interactions and low-energy contiguous vdW interactions in certain families. The other interfaces, like the helix-sheet or the helix-helix, seem to be devoid of any high-energy conserved interactions. Conserved interactions in the loop regions around the catalytic site of the TIM fold have also been identified, pointing out their significance in both structural and functional evolution. Based on these investigations, we have developed a novel network-based phylogenetic analysis for remote homologues, which can perform better than sequence-based phylogeny. Such an analysis is more meaningful from both structural and functional evolutionary perspectives. We believe that the information obtained through the "interaction conservation" viewpoint and the subsequently developed method of structure network alignment can shed new light on the fields of fold organization and de novo computational protein design.

14.
High divergence in protein sequences makes the detection of distant protein relationships through homology-based approaches challenging. Grouping protein sequences into families, through similarities in either sequence or 3-D structure, facilitates improved recognition of protein relationships. In addition, strategically designed protein-like sequences have been shown to bridge distant structural domain families by serving as artificial linkers. In this study, we have augmented a search database of known protein domain families with such designed sequences, with the intention of providing functional clues to domain families of unknown structure. When assessed using representative query sequences from each family, we obtain a success rate of 94% in protein domain families of known structure. Further, we demonstrate that the augmented search space enabled fold recognition for 582 families with no structural information available a priori. Additionally, we were able to provide reliable functional relationships for 610 orphan families. We discuss the application of our method in predicting functional roles through select examples for DUF4922, DUF5131, and DUF5085. Our approach also detects new associations between families that were previously not known to be related, as demonstrated through new sub-groups of the RNA polymerase domain among three distinct RNA viruses. Taken together, designed sequences-augmented search databases direct the detection of meaningful relationships between distant protein families. In turn, they enable fold recognition and offer reliable pointers to potential functional sites that may be probed further through direct mutagenesis studies.

15.

Background

As tertiary structure is currently available only for a fraction of known protein families, it is important to assess what parts of sequence space have been structurally characterized. We consider protein domains whose structure can be predicted by sequence similarity to proteins with solved structure and address the following questions. Do these domains represent an unbiased random sample of all sequence families? Do targets solved by structural genomic initiatives (SGI) provide such a sample? What are approximate total numbers of structure-based superfamilies and folds among soluble globular domains?

Results

To make these assessments, we combine two approaches: (i) sequence analysis and homology-based structure prediction for proteins from complete genomes; and (ii) monitoring dynamics of the assigned structure set in time, with the accumulation of experimentally solved structures. In the Clusters of Orthologous Groups (COG) database, we map the growing population of structurally characterized domain families onto the network of sequence-based connections between domains. This mapping reveals a systematic bias suggesting that target families for structure determination tend to be located in highly populated areas of sequence space. In contrast, the subset of domains whose structure is initially inferred by SGI is similar to a random sample from the whole population. To accommodate for the observed bias, we propose a new non-parametric approach to the estimation of the total numbers of structural superfamilies and folds, which does not rely on a specific model of the sampling process. Based on dynamics of robust distribution-based parameters in the growing set of structure predictions, we estimate the total numbers of superfamilies and folds among soluble globular proteins in the COG database.

Conclusion

The set of currently solved protein structures allows for structure prediction in approximately a third of sequence-based domain families. The choice of targets for structure determination is biased towards domains with many sequence-based homologs. The growing SGI output in the future should further contribute to the reduction of this bias. The total numbers of structural superfamilies and folds in the COG database are estimated as ~4000 and ~1700, respectively. These numbers are respectively four and three times higher than the numbers of superfamilies and folds that can currently be assigned to COG proteins.

16.
MOTIVATION: Protein families can be defined based on structure or sequence similarity. We wanted to compare two protein family databases, one based on structural and one on sequence similarity, to investigate to what extent they overlap, the similarity in definition of corresponding families, and to create a list of large protein families with unknown structure as a resource for structural genomics. We also wanted to increase the sensitivity of fold assignment by exploiting protein family HMMs. RESULTS: We compared Pfam, a protein family database based on sequence similarity, to Scop, which is based on structural similarity. We found that 70% of the Scop families exist in Pfam while 57% of the Pfam families exist in Scop. Most families that occur in both databases correspond well to each other, but in some cases they are different. Such cases highlight situations in which structure and sequence approaches differ significantly. The comparison enabled us to compile a list of the largest families that do not occur in Scop; these are suitable targets for structure prediction and determination, and may be useful to guide projects in structural genomics. It can be noted that 13 out of the 20 largest protein families without a known structure are likely transmembrane proteins. We also exploited Pfam to increase the sensitivity of detecting homologs of proteins with known structure, by comparing query sequences to Pfam HMMs that correspond to Scop families. For SWISSPROT+TREMBL, this yielded an increase in fold assignment from 31% to 42% compared to using FASTA only. This method assigned a structure to 22% of the proteins in Saccharomyces cerevisiae, 24% in Escherichia coli, and 16% in Methanococcus jannaschii.
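The two overlap percentages quoted above differ because each is measured against a different denominator. The sketch below illustrates that asymmetry with made-up family identifiers; the function name `mutual_coverage` and the representation of correspondences as (Scop family, Pfam family) pairs are assumptions for this example, not the authors' procedure.

```python
from typing import Iterable, Set, Tuple


def mutual_coverage(
    matches: Iterable[Tuple[str, str]],
    scop_families: Set[str],
    pfam_families: Set[str],
) -> Tuple[float, float]:
    """Return (fraction of Scop families with a Pfam match,
    fraction of Pfam families with a Scop match)."""
    scop_hit: Set[str] = set()
    pfam_hit: Set[str] = set()
    for scop_id, pfam_id in matches:
        # Ignore pairs that mention a family outside either database.
        if scop_id in scop_families and pfam_id in pfam_families:
            scop_hit.add(scop_id)
            pfam_hit.add(pfam_id)
    return len(scop_hit) / len(scop_families), len(pfam_hit) / len(pfam_families)


if __name__ == "__main__":
    # Made-up identifiers: two Scop families map onto the same Pfam family.
    scop = {"s1", "s2", "s3", "s4"}
    pfam = {"p1", "p2", "p3", "p4", "p5"}
    pairs = [("s1", "p1"), ("s2", "p1"), ("s3", "p2")]
    print(mutual_coverage(pairs, scop, pfam))
```

In this toy case, three of four structure-based families have a sequence-based counterpart, but only two of five sequence-based families have a structure-based counterpart, because several structural families can map to one sequence family and the two databases differ in size.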

17.
MOTIVATION: A method for recognizing the three-dimensional fold from the protein amino acid sequence based on a combination of hidden Markov models (HMMs) and secondary structure prediction was recently developed for proteins in the Mainly-Alpha structural class. Here, this methodology is extended to Mainly-Beta and Alpha-Beta class proteins. Compared to other fold recognition methods based on HMMs, this approach is novel in that only secondary structure information is used. Each HMM is trained from known secondary structure sequences of proteins having a similar fold. Secondary structure prediction is performed for the amino acid sequence of a query protein. The predicted fold of a query protein is the fold described by the model fitting the predicted sequence the best. RESULTS: After model cross-validation, the success rate on 44 test proteins covering the three structural classes was found to be 59%. On seven fold predictions performed prior to the publication of experimental structure, the success rate was 71%. In conclusion, this approach manages to capture important information about the fold of a protein embedded in the length and arrangement of the predicted helices, strands and coils along the polypeptide chain. When a more extensive library of HMMs representing the universe of known structural families is available (work in progress), the program will allow rapid screening of genomic databases and sequence annotation when fold similarity is not detectable from the amino acid sequence. AVAILABILITY: FORESST web server at http://absalpha.dcrt.nih.gov:8008/ for the library of HMMs of structural families used in this paper. FORESST web server at http://www.tigr.org/ for a more extensive library of HMMs (work in progress). CONTACT: valedf@tigr.org; munson@helix.nih.gov; garnier@helix.nih.gov
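The decision rule described above (pick the fold whose HMM best fits the predicted secondary-structure string) can be sketched with the standard forward algorithm. Everything concrete here is an assumption for illustration: the two toy models, their hand-set probabilities, the H/E/C alphabet and the names `log_likelihood` and `predict_fold` are hypothetical, and the models are not trained as they are in the paper. The secondary-structure prediction step itself is outside the sketch, which takes an already predicted string as input.

```python
import math
from typing import Dict

# Emission probabilities shared by both toy models (H = helix, E = strand, C = coil).
EMIT = {
    "helix": {"H": 0.90, "E": 0.02, "C": 0.08},
    "strand": {"H": 0.02, "E": 0.90, "C": 0.08},
    "coil": {"H": 0.10, "E": 0.10, "C": 0.80},
}

# Hand-set, hypothetical fold models; a real library would be trained from known structures.
MODELS: Dict[str, dict] = {
    "mainly_alpha": {
        "start": {"helix": 0.5, "strand": 0.1, "coil": 0.4},
        "trans": {
            "helix": {"helix": 0.85, "strand": 0.02, "coil": 0.13},
            "strand": {"helix": 0.10, "strand": 0.60, "coil": 0.30},
            "coil": {"helix": 0.50, "strand": 0.10, "coil": 0.40},
        },
        "emit": EMIT,
    },
    "mainly_beta": {
        "start": {"helix": 0.1, "strand": 0.5, "coil": 0.4},
        "trans": {
            "helix": {"helix": 0.60, "strand": 0.10, "coil": 0.30},
            "strand": {"helix": 0.02, "strand": 0.85, "coil": 0.13},
            "coil": {"helix": 0.10, "strand": 0.50, "coil": 0.40},
        },
        "emit": EMIT,
    },
}


def log_likelihood(model: dict, observed: str) -> float:
    """Scaled forward algorithm: log P(observed | model)."""
    states = list(model["start"])
    log_prob = 0.0
    alpha: Dict[str, float] = {}
    for position, symbol in enumerate(observed):
        if position == 0:
            alpha = {s: model["start"][s] * model["emit"][s].get(symbol, 0.0) for s in states}
        else:
            alpha = {
                t: sum(alpha[s] * model["trans"][s][t] for s in states)
                * model["emit"][t].get(symbol, 0.0)
                for t in states
            }
        scale = sum(alpha.values())
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this string
        log_prob += math.log(scale)
        alpha = {s: value / scale for s, value in alpha.items()}  # rescale to avoid underflow
    return log_prob


def predict_fold(secondary_structure: str, models: Dict[str, dict] = MODELS) -> str:
    """Return the name of the model that assigns the string the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(models[name], secondary_structure))


if __name__ == "__main__":
    print(predict_fold("CHHHHHHHHCCHHHHHHHC"))
    print(predict_fold("CEEEEECCEEEEEC"))
```

Because only the H/E/C string is scored, the models respond to the length and arrangement of helices, strands and coils rather than to the amino acid sequence, which is the property the abstract highlights.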

18.
19.
Escherichia coli DsbD transports electrons across the plasma membrane, a pathway that leads to the reduction of protein disulfide bonds. Three secreted thioredoxin-like factors, DsbC, DsbE, and DsbG, reduce protein disulfide bonds whereby an active site C-X-X-C motif is oxidized to generate a disulfide bond. DsbD catalyzes the reduction of the disulfide of DsbC, DsbE, and DsbG but not of the thioredoxin-like oxidant DsbA. The reduction of DsbC, DsbE, and DsbG occurs by transport of electrons from cytoplasmic thioredoxin to the C-terminal thioredoxin-like domain of DsbD (DsbD(C)). The N-terminal domain of DsbD, DsbD(N), acts as a versatile adaptor in electron transport and is capable of forming disulfides with oxidized DsbC, DsbE, or DsbG as well as with reduced DsbD(C). Isolated DsbD(N) is functional in electron transport in vitro. Crystallized DsbD(N) assumes an immunoglobulin-like fold that encompasses two active site cysteines, C103 and C109, forming a disulfide bond between beta-strands. The disulfide of DsbD(N) is shielded from the environment and capped by a phenylalanine (F70). A model is discussed whereby the immunoglobulin fold of DsbD(N) may provide for the discriminating interaction with thioredoxin-like factors, thereby triggering movement of the phenylalanine cap followed by disulfide rearrangement.

20.
The TT1485 gene from Thermus thermophilus HB8 encodes a hypothetical protein of unknown function with about 20 sequence homologs of bacterial or archaeal origin. Together they form a family of uncharacterized proteins, the cluster of orthologous group COG3253. Using a combination of amino acid sequence analysis, three-dimensional structural studies and biochemical assays, we identified TT1485 as a novel heme-binding protein. The crystal structure reveals that this protein is a pentamer and each monomer exhibits a β-barrel fold. TT1485 is structurally similar to muconolactone isomerase, but this provided no functional clues. Amino acid sequence analysis revealed remote homology to a heme enzyme, chlorite dismutase. Strikingly, amino acid residues that are highly conserved in the homologous hypothetical proteins and chlorite dismutase cluster around a deep cavity on the surface of each monomer. Molecular modeling shows that the cavity can accommodate a heme group with a strictly conserved His as a heme ligand. TT1485 reconstituted with iron protoporphyrin IX chloride gave a low chlorite dismutase activity, indicating that TT1485 catalyzes a reaction other than chlorite degradation. The presence of a possible Fe–His–Asp triad in the heme proximal site suggests that TT1485 functions as a novel heme peroxidase to detoxify hydrogen peroxide within the cell.
