Similar literature
 Found 20 similar articles; search took 375 ms
1.
The HNH Database (HNHDb) is a collection and sequence-based classification of HNH domain proteins. The database contains about 1913 HNH domain-containing proteins, classified into 10 subsets based on sequence pattern. Each of these subsets has unique signature sequences. We have shown a correlation between the subset combination and their domain association and function. Functional divergence of this domain may be due to the combination of these conserved patterns and the large variations in the non-conserved regions. HNHDb is freely available at http://bicmku.in:8081/hnh.

2.
The LAGLIDADG and HNH families of site-specific DNA endonucleases encoded by viruses, bacteriophages as well as archaeal, eukaryotic nuclear and organellar genomes are characterized by the sequence motifs 'LAGLIDADG' and 'HNH', respectively. These endonucleases have been shown to occur in different environments: LAGLIDADG endonucleases are found in inteins, archaeal and group I introns and as free-standing open reading frames (ORFs); HNH endonucleases occur in group I and group II introns and as ORFs. Here, statistical models (hidden Markov models, HMMs) that encompass both the conserved motifs and more variable regions of these families have been created and employed to characterize known and potential new family members. A number of new, putative LAGLIDADG and HNH endonucleases have been identified, including an intein-encoded HNH sequence. Analysis of an HMM-generated multiple alignment of 130 LAGLIDADG family members and the three-dimensional structure of the I-CreI endonuclease has enabled definition of the core elements of the repeated domain (approximately 90 residues) that is present in this family of proteins. A conserved negatively charged residue is proposed to be involved in catalysis. Phylogenetic analysis of the two families indicates a lack of exchange of endonucleases between different mobile elements (environments) and between hosts from different phylogenetic kingdoms. However, there does appear to have been considerable exchange of endonuclease domains amongst elements of the same type. Such events are suggested to be important for the formation of elements of new specificity.

3.
Lee D, Grant A, Marsden RL, Orengo C. Proteins 2005, 59(3):603-615
Using a new protocol, PFscape, we undertake a systematic identification of protein families and domain architectures in 120 complete genomes. PFscape clusters sequences into protein families using a Markov clustering algorithm (Enright et al., Nucleic Acids Res 2002;30:1575-1584) followed by complete linkage clustering according to sequence identity. Within each protein family, domains are recognized using a library of hidden Markov models comprising CATH structural and Pfam functional domains. Domain architectures are then determined using DomainFinder (Pearl et al., Protein Sci 2002;11:233-244) and the protein family and domain architecture data are amalgamated in the Gene3D database (Buchan et al., Genome Res 2002;12:503-514). Using Gene3D, we have investigated protein sequence space, the extent of structural annotation, and the distribution of different domain architectures in completed genomes from all kingdoms of life. As with earlier studies by other researchers, the distribution of domain families shows power-law behavior such that the largest 2,000 domain families can be mapped to approximately 70% of nonsingleton genome sequences; the remaining sequences are assigned to much smaller families. While approximately 50% of domain annotations within a genome are assigned to 219 universal domain families, a much smaller proportion (<10%) of protein sequences are assigned to universal protein families. This supports the mosaic theory of evolution whereby domain duplication followed by domain shuffling gives rise to novel domain architectures that can expand the protein functional repertoire of an organism. Functional data (e.g. COG/KEGG/GO) integrated within Gene3D result in a comprehensive resource that is currently being used in structural genomics initiatives and can be accessed via http://www.biochem.ucl.ac.uk/bsm/cath/Gene3D/.

4.
The restriction endonuclease (REase) R.HphI is a Type IIS enzyme that recognizes the asymmetric target DNA sequence 5'-GGTGA-3' and, in the presence of Mg2+, hydrolyzes phosphodiester bonds in both strands of the DNA at a distance of 8 nucleotides towards the 3' side of the target, producing a 1-nucleotide 3'-staggered cut in an unspecified sequence at this position. REases are typically ORFans that exhibit little similarity to each other and to any proteins in the database. However, bioinformatics analyses revealed that R.HphI is a member of a relatively large sequence family with a conserved C-terminal domain and a variable N-terminal domain. We predict that the C-terminal domains of proteins from this family correspond to the nuclease domain of the HNH superfamily rather than to the most common PD-(D/E)XK superfamily of nucleases. We constructed a three-dimensional model of the R.HphI catalytic domain and validated our predictions by site-directed mutagenesis and studies of DNA-binding and catalytic activities of the mutant proteins. We also analyzed the genomic neighborhood of R.HphI homologs and found that putative nucleases accompanied by a DNA methyltransferase (i.e. predicted REases) do not form a single group on a phylogenetic tree, but are dispersed among free-standing putative nucleases. This suggests that nucleases from the HNH superfamily were independently recruited to become REases in the context of RM systems multiple times during evolution and that members of the HNH superfamily may be much more frequent among the so far unassigned REase sequences than previously thought.
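The cleavage geometry described in this abstract can be sketched in a few lines of Python. This is only an illustration: the function name `find_cuts` and its parameters are hypothetical, only the forward strand is scanned, and the bottom-strand offset of 7 is an assumption inferred from the stated 1-nucleotide 3' overhang rather than a value given in the abstract.

```python
# Sketch: locate Type IIS recognition sites on the top strand and report
# where each strand would be cut downstream of the site.
# Assumption: top strand is cut 8 nt past the site (from the abstract) and
# the bottom strand 7 nt past it (inferred from the 1-nt 3' overhang).

def find_cuts(seq, site="GGTGA", top_offset=8, bottom_offset=7):
    """Return (site_start, top_cut, bottom_cut) for each forward-strand site.

    Positions are 0-based indices on the top strand; a cut position is the
    index of the first base after the cut. Sites whose top-strand cut would
    fall beyond the end of the sequence are skipped.
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        site_end = start + len(site)
        top_cut = site_end + top_offset
        bottom_cut = site_end + bottom_offset
        if top_cut <= len(seq):
            cuts.append((start, top_cut, bottom_cut))
        # Continue searching one base further on, so overlapping sites count.
        start = seq.find(site, start + 1)
    return cuts
```

Calling `find_cuts("AAGGTGAACGTACGTAC")` reports the single site starting at index 2. A fuller version would also scan the reverse complement, since the recognition sequence is asymmetric.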

5.
CRISPR-Cas9 is a widely used biochemical tool with applications in molecular biology and precision medicine. The RNA-guided Cas9 protein uses its HNH endonuclease domain to cleave the DNA strand complementary to its endogenous guide RNA. In this study, novel constructs of HNH from two divergent organisms, G. stearothermophilus (GeoHNH) and S. pyogenes (SpHNH), were engineered from their respective full-length Cas9 proteins. Despite low sequence similarity, the X-ray crystal structures of these constructs reveal that the core of HNH surrounding the active site is conserved. Structure prediction of the full-length GeoCas9 protein using Phyre2 and AlphaFold2 also showed that the crystallographic construct of GeoHNH represents the structure of the domain within the full-length GeoCas9 protein. However, significant differences are observed in the solution dynamics of structurally conserved regions of GeoHNH and SpHNH, the latter of which was shown to use such molecular motions to propagate the DNA cleavage signal. Indeed, molecular simulations show that the intradomain signaling pathways, which drive SpHNH function, are non-specific and poorly formed in GeoHNH. Taken together, these outcomes suggest mechanistic differences between mesophilic and thermophilic Cas9 species.

6.
Domains are considered the basic units of protein folding, evolution, and function. Decomposing each protein into modular domains is thus a basic prerequisite for accurate functional classification of biological molecules. Here, we present ADDA, an automatic algorithm for domain decomposition and clustering of all protein domain families. We use alignments derived from an all-on-all sequence comparison to define domains within protein sequences based on a global maximum likelihood model. In all, 90% of domain boundaries are predicted within 10% of domain size when compared with the manual domain definitions given in the SCOP database. A representative database of 249,264 protein sequences was decomposed into 450,462 domains. These domains were clustered on the basis of sequence similarities into 33,879 domain families containing at least two members with less than 40% sequence identity. Validation against family definitions in the manually curated databases SCOP and PFAM indicates almost perfect unification of various large domain families while contamination by unrelated sequences remains at a low level. The global survey of protein-domain space by ADDA confirms that most large and universal domain families are already described in PFAM and/or SMART. However, a survey of the complete set of mobile modules leads to the identification of 1479 new interesting domain families which shuffle around in multi-domain proteins. The data are publicly available at ftp://ftp.ebi.ac.uk/pub/contrib/heger/adda.
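The boundary-accuracy criterion quoted in this abstract ("within 10% of domain size") can be made concrete with a small Python sketch. It is not ADDA's evaluation code: the function names and the pairing of each predicted domain with one reference domain are assumptions made for illustration.

```python
# Sketch: fraction of predicted domain boundaries that fall within a
# tolerance (default 10%) of the reference domain's size.
# Assumption: each predicted domain is already paired with one reference
# domain, and coordinates are inclusive residue positions.

def boundary_ok(pred_pos, ref_pos, domain_size, tolerance=0.10):
    """True if a predicted boundary lies within tolerance * domain_size."""
    return abs(pred_pos - ref_pos) <= tolerance * domain_size

def fraction_within(pairs, tolerance=0.10):
    """pairs: list of ((pred_start, pred_end), (ref_start, ref_end))."""
    hits = 0
    total = 0
    for (pred_start, pred_end), (ref_start, ref_end) in pairs:
        size = ref_end - ref_start + 1
        # Start and end boundaries are scored independently.
        for pred_pos, ref_pos in ((pred_start, ref_start), (pred_end, ref_end)):
            total += 1
            if boundary_ok(pred_pos, ref_pos, size, tolerance):
                hits += 1
    return hits / total if total else 0.0
```

For example, a predicted start that is 4 residues off on a 96-residue reference domain counts as a hit, while one that is 40 residues off on a 151-residue domain does not.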

7.
PSD-Zip45 (also named Homer 1c/Vesl-1L) is a synaptic scaffolding protein, which interacts with neurotransmitter receptors and other scaffolding proteins to target them into the post-synaptic density (PSD), a specialized protein complex at the synaptic junction. Binding of PSD-Zip45 to the receptors and scaffolding proteins results in colocalization and clustering of its binding partners in the PSD. It has an Ena/VASP homology 1 (EVH1) domain in the N terminus for receptor binding, two leucine zipper motifs in the C terminus for clustering, and a linking region whose function is unclear despite the high level of conservation within the Homer 1 family. The X-ray crystallographic analysis of the largest fragment of residues 1-163, including an EVH1 domain reported here, demonstrates that the EVH1 domain contains an alpha-helix longer than that of the previous models, and that the linking part included in the conserved region of Homer 1 (CRH1) of PSD-Zip45 interacts with the EVH1 domain of the neighbouring CRH1 molecule in the crystal. The results suggest that the EVH1 domain recognizes the PPXXF motif found in the binding partners, and the SPLTP sequence (P-motif) in the linking region of the CRH1. The two types of binding partly overlap in the EVH1 domain, implying a mechanism to regulate multimerization of Homer 1 family proteins.

8.
Type IIS restriction endonuclease Eco31I harbors a single HNH active site and cleaves both DNA strands close to its recognition sequence, 5'-GGTCTC(1/5). A two-domain organization of Eco31I was determined by limited proteolysis. Analysis of proteolytic fragments revealed that the N-terminal domain of Eco31I is responsible for the specific DNA binding, while the C-terminal domain contains the HNH nuclease-like active site. Gel-shift and gel-filtration experiments revealed that a monomer of the N-terminal domain of Eco31I is able to bind a single copy of cognate DNA. However, in contrast to other studied type IIS enzymes, the isolated catalytic domain of Eco31I was inactive. Steady-state and transient kinetic analysis of Eco31I reactions was inconsistent with dimerization of Eco31I on DNA. Thus, we propose that Eco31I interacts with individual copies of its recognition sequence in its monomeric form and presumably remains a monomer as it cleaves both strands of double-stranded DNA. The domain organization and reaction mechanism established for Eco31I should be common for a group of evolutionarily related type IIS restriction endonucleases, Alw26I, BsaI, BsmAI, BsmBI and Esp3I, that recognize DNA sequences bearing the common pentanucleotide 5'-GTCTC.

9.
The structure of many proteins consists of a combination of discrete modules that have been shuffled during evolution. Such modules can frequently be recognized from the analysis of homology. Here we present a systematic analysis of the modular organization of all sequenced proteins. To achieve this we have developed an automatic method to identify protein domains from sequence comparisons. Homologous domains can then be clustered into consistent families. The method was applied to all 21,098 nonfragment protein sequences in SWISS-PROT 21.0, which was automatically reorganized into a comprehensive protein domain database, ProDom. We have constructed multiple sequence alignments for each domain family in ProDom, from which consensus sequences were generated. These nonredundant domain consensuses are useful for fast homology searches. Domain organization in ProDom is exemplified for proteins of the phosphoenolpyruvate:sugar phosphotransferase system (PEP:PTS) and for bacterial 2-component regulators. We provide 2 examples of previously unrecognized domain arrangements discovered with the help of ProDom.

10.
Homing endonuclease structure and function   Total citations: 14, self-citations: 0, citations by others: 14
Homing endonucleases are encoded by open reading frames that are embedded within group I, group II and archaeal introns, as well as inteins (intervening sequences that are spliced and excised post-translationally). These enzymes initiate transfer of those elements (and themselves) by generating strand breaks in cognate alleles that lack the intervening sequence, as well as in additional ectopic sites that broaden the range of intron and intein mobility. Homing endonucleases can be divided into several unique families that are remarkable in several respects: they display extremely high DNA-binding specificities which arise from long DNA target sites (14-40 bp), they are tolerant of a variety of sequence variations in these sites, and they display disparate DNA cleavage mechanisms. A significant number of homing endonucleases also act as maturases (highly specific cofactors for the RNA splicing reactions of their cognate introns). Of the known homing group I endonuclease families, two (HNH and His-Cys box enzymes) appear to be diverged from a common ancestral nuclease. While crystal structures of several representatives of the LAGLIDADG endonuclease family have been determined, only structures of single members of the HNH (I-HmuI), His-Cys box (I-PpoI) and GIY-YIG (I-TevI) families have been elucidated. These studies provide an important source of information for structure-function relationships in those families, and are the centerpiece of this review. Finally, homing endonucleases are significant targets for redesign and selection experiments, in hopes of generating novel DNA binding and cutting reagents for a variety of genomic applications.

11.
The SET and RING-finger-associated (SRA) domain is involved in the establishment and maintenance of DNA methylation in eukaryotes. Proteins containing SRA domains exist in mammals, plants, and even microorganisms. It has been established that the mammalian SRA domain recognizes 5-methylcytosine (5mC) through a base-flipping mechanism. Here, we identified and characterized two SRA domain-containing proteins with the common domain architecture of an N-terminal SRA domain and a C-terminal HNH nuclease domain, Sco5333 from Streptomyces coelicolor and Tbis1 from Thermobispora bispora. Neither sco5333 nor tbis1 can be established in methylated Escherichia coli hosts (dcm+), and this in vivo toxicity requires both the SRA and HNH domains. Purified Sco5333 and Tbis1 displayed weak DNA cleavage activity in the presence of Mg2+, Mn2+ and Co2+, and the cleavage activity was suppressed by Zn2+. Both Sco5333 and Tbis1 bind to 5mC-containing DNA in all sequence contexts and have at least a 100-fold preference in binding affinity for methylated DNA over non-methylated DNA. We suggest that linkage of a methyl-specific SRA domain and a weakly active HNH domain may represent a universal mechanism for combating foreign methylated DNA while minimizing damage to the host's own chromosome.

12.
Many bacteriophage and prophage genomes encode an HNH endonuclease (HNHE) next to their cohesive end site and terminase genes. The HNH catalytic domain contains the conserved catalytic residues His-Asn-His and a zinc-binding site [CxxC]2. An additional zinc ribbon (ZR) domain with one to two zinc-binding sites ([CxxxxC], [CxxxxH], [CxxxC], [HxxxH], [CxxC] or [CxxH]) is frequently found at the N-terminus or C-terminus of the HNHE or a ZR domain protein (ZRP) located adjacent to the HNHE. We expressed and purified 10 such HNHEs and characterized their cleavage sites. These HNHEs are site-specific and strand-specific nicking endonucleases (NEase or nickase) with 3- to 7-bp specificities. A minimal HNH nicking domain of 76 amino acid residues was identified from Bacillus phage γ HNHE and subsequently fused to a zinc finger protein to generate a chimeric NEase with a new specificity (12–13 bp). The identification of a large pool of previously unknown natural NEases and engineered NEases provides more ‘tools’ for DNA manipulation and molecular diagnostics. The small modular HNH nicking domain can be used to generate rare NEases applicable to targeted genome editing. In addition, the engineered ZF nickase is useful for evaluation of off-target sites in vitro before performing cell-based gene modification.
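The zinc-binding motifs listed in this abstract map naturally onto regular expressions, with each "x" read as any single residue. The Python sketch below is a hypothetical illustration of such a scan, not the authors' method; the names `ZINC_MOTIFS` and `scan_motifs` are invented here, and a regex hit is only a candidate site, not evidence of zinc binding.

```python
import re

# Sketch: scan a protein sequence for the zinc-binding motif spacings
# named in the abstract. "x" is interpreted as any single residue.
ZINC_MOTIFS = {
    "CxxxxC": "C.{4}C",
    "CxxxxH": "C.{4}H",
    "CxxxC": "C.{3}C",
    "HxxxH": "H.{3}H",
    "CxxC": "C.{2}C",
    "CxxH": "C.{2}H",
}

def scan_motifs(protein):
    """Return sorted (motif_name, start, matched_text) tuples.

    A lookahead is used so that overlapping occurrences are all reported.
    """
    protein = protein.upper()
    hits = []
    for name, pattern in ZINC_MOTIFS.items():
        for match in re.finditer(f"(?=({pattern}))", protein):
            hits.append((name, match.start(), match.group(1)))
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))
```

On the made-up sequence `MACAACGGHKLMH`, the scan reports one CxxC, one CxxH and one HxxxH candidate. Detecting the paired [CxxC]2 site would need an extra step that looks for two CxxC hits within some chosen distance.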

13.
DraIII is a type IIP restriction endonuclease (REase) that recognizes and creates a double-strand break within the gapped palindromic sequence CAC↑NNN↓GTG of double-stranded DNA (↑ indicates nicking on the bottom strand; ↓ indicates nicking on the top strand). However, wild-type DraIII shows significant star activity. In this study, it was found that the prominent star site is CAT↑GTT↓GTG, consisting of a star 5′ half (CAT) and a canonical 3′ half (GTG). DraIII nicks the 3′ canonical half site at a faster rate than the 5′ star half site, in contrast to the similar rate with the canonical full site. The crystal structure of the DraIII protein was solved. It indicated, as supported by mutagenesis, that DraIII possesses a ββα-metal HNH active site. The structure revealed extensive intra-molecular interactions between the N-terminal domain and the C-terminal domain containing the HNH active site. Disruptions of these interactions through site-directed mutagenesis drastically increased cleavage fidelity. The understanding of fidelity mechanisms will enable generation of high-fidelity REases.
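The distinction between the canonical site (CACNNNGTG) and a star site such as CATGTTGTG can be illustrated with a short Python sketch. The working definition used here, that a star site has exactly one mismatch across the two half sites, is an assumption made for the example and is not a criterion taken from the study; the function names are hypothetical.

```python
# Sketch: classify every 9-nt window as a canonical CACNNNGTG site or a
# candidate star site.
# Assumption: "star" means exactly one mismatch in total across the 5' half
# (CAC) and the 3' half (GTG); the three middle bases are unconstrained.

def half_site_mismatches(window):
    """Count mismatches against CAC (first 3 nt) and GTG (last 3 nt)."""
    left = sum(a != b for a, b in zip(window[:3], "CAC"))
    right = sum(a != b for a, b in zip(window[6:], "GTG"))
    return left, right

def classify_sites(dna):
    """Return (start, window, label) for canonical and star-like windows."""
    dna = dna.upper()
    results = []
    for i in range(len(dna) - 8):
        window = dna[i:i + 9]
        left, right = half_site_mismatches(window)
        if left == 0 and right == 0:
            results.append((i, window, "canonical"))
        elif left + right == 1:
            results.append((i, window, "star"))
    return results
```

With this definition, the star site named in the abstract, CATGTTGTG, is labelled "star" because only its 5′ half deviates from CAC by one base.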

14.
Bacteriophages have evolved a range of anti-CRISPR proteins (Acrs) to escape the adaptive immune system of prokaryotes; therefore, Acrs can be used as switches to regulate gene editing. Herein, we report the crystal structure of a quaternary complex of AcrIIA14-bound SauCas9–sgRNA–dsDNA at 2.22 Å resolution, revealing the molecular basis for AcrIIA14 recognition and inhibition. Our structural and biochemical data analysis suggests that AcrIIA14 binds to a non-conserved region of the SauCas9 HNH domain that is distinctly different from AcrIIC1 and AcrIIC3, with no significant effect on sgRNA or dsDNA binding. Further, our structural data show that the allostery of the HNH domain close to the substrate DNA is sterically prevented by AcrIIA14 binding. In addition, the binding of AcrIIA14 triggers the conformational allostery of the HNH domain and the L1 linker within SauCas9, driving them to make new interactions with the target-guide heteroduplex, enhancing the inhibitory ability of AcrIIA14. Our research both expands the current understanding of anti-CRISPRs and provides additional clues for the rational use of the CRISPR-Cas system in genome editing and gene regulation.

15.
Several autoinflammatory diseases with distinct clinical manifestations have been associated with sequence variations in the gene products PYPAF1/CIAS1 and NOD2/CARD15. Both proteins belong to the PYD/CARD-containing family of apoptosis regulators and activators of pro-inflammatory caspases. To gain insight into the dysfunctional role of sequence alterations, we assembled a structure-based multiple sequence alignment of family members and related proteins. This allowed us to analyze the putative effect of the alterations on the function of nucleotide-binding (NACHT) and leucine-rich repeat (LRR) domains shared by the family members. In support of this analysis, we carefully selected template structures for the NACHT and LRR domains and mapped the genetic variations onto 3D domain models. Additionally, we propose a model of the NACHT and LRR domain complex. Our study revealed that many of the disease-associated sequence variants are located close to highly conserved sequence regions of functional relevance and are spatially adjacent in the predicted 3D structure. The implications for domain functions such as NTP-hydrolysis or oligomerization are discussed.

16.
Summary. An important sequence motif identified by sequence analysis is shared by the ACT domain family, which has been found in a number of diverse proteins. Most of the proteins containing the ACT domain seem to be involved in amino acid and purine synthesis and are in many cases allosteric enzymes with complex regulation enforced by the binding of ligands. Here we explore the current understanding of ACT domain function, including its role as an allosteric module in a selected group of enzymes. We further describe in more detail three of the proteins where some understanding is available on function and structure: i) the archetypical ACT domain protein E. coli 3PGDH, which catalyzes the first step in the biosynthesis of L-Ser, ii) the bifunctional chorismate mutase/prephenate dehydratase (P-protein) from E. coli, which catalyzes the first two steps in the biosynthesis of L-Phe, and iii) the mammalian aromatic amino acid hydroxylases, with special emphasis on phenylalanine hydroxylase (PAH), which catalyzes the first step in the catabolic degradation of L-phenylalanine (L-Phe). The ACT domain is commonly involved in the binding of a small regulatory molecule, such as the amino acids L-Ser and L-Phe in the case of 3PGDH and P-protein, respectively. On the other hand, for PAH, and probably for other enzymes, this domain appears to have been incorporated as a handy, flexible small module with the potential to provide allosteric regulation via transmission of finely tuned conformational changes, not necessarily initiated by regulatory ligand binding at the domain itself. Current address: Protein Biophysics & Delivery, Novo Nordisk A/S, Novo Allé, 2880 Bagsværd, Denmark.

17.
Glycoside hydrolase (GH) family 13 comprises about 30 different specificities. Four of them have been proposed to form the GH13 pullulanase subfamily: pullulanase, isoamylase, maltooligosyl trehalohydrolase and branching enzyme, forming the seven CAZy GH13 subfamilies GH13_8-GH13_14. Recently, a new family of carbohydrate-binding modules (CBMs), the family CBM48, has been established containing the putative starch-binding domains from the pullulanase subfamily, the β-subunit of AMP-activated protein kinase and some other GH13 enzymes with pullulanase and/or α-amylase-pullulanase specificity. Since all of these enzymes are multidomain proteins and the structure for at least one representative of each enzyme specificity has already been determined, the main goal of the present study was to elucidate domain evolution within this GH13 pullulanase subfamily (84 real enzymes), focusing on the CBM48 module. With regard to CBM48 positioning in the amino acid sequence, the N-terminal end of a protein appears to be the predominant position. This is especially true for isoamylases and maltooligosyl trehalohydrolases. Secondary structure-based alignment of CBM modules from CBM48, CBM20 and CBM21 revealed that several residues known as consensus for CBM20 and CBM21 could also be identified in CBM48, but only branching enzymes possess the aromatic residues that correspond with the two tryptophans forming the evolutionarily conserved starch-binding site 1 in CBM20. The evolutionary trees constructed for the individual domains, complete alignment, and the conserved sequence regions of the α-amylase family were found to be comparable to each other (except for the C-domain tree) with two basic parts: (i) branching enzymes and maltooligosyl trehalohydrolases; and (ii) pullulanases and isoamylases. Taxonomy was respected only within clusters with pure specificity, i.e. the evolution of CBM48 reflects the evolution of specificities rather than evolution of species. This feature differs from the one observed for the starch-binding domain of the family CBM20, where the starch-binding domain evolution reflects the evolution of species.

18.
The Caenorhabditis elegans SEM-5 SH3 domains recognize proline-rich peptide segments with modest affinity. We developed a bivalent peptide ligand that contains a naturally occurring proline-rich binding sequence, tethered by a glycine linker to a disulfide-closed loop segment containing six variable residues. The glycine linker allows the loop segment to explore regions of greatest diversity in sequence and structure of the SH3 domain: the RT and n-Src loops. The bivalent ligand was optimized using phage display, leading to a peptide (PP-G(4)-L) with 1000-fold increased affinity for the SEM-5 C-terminal SH3 domain over that of a natural ligand. NMR analysis of the complex confirms that the peptide loop segment is targeted to the RT and n-Src loops and parts of the beta-sheet scaffold of this SH3 domain. This binding region is comparable to that targeted by a natural non-PXXP peptide to the p67(phox) SH3 domain, a region not known to be targeted in the Grb2 SH3 domain family. PP-G(4)-L may aid in the discovery of additional binding partners of Grb2 family SH3 domains.

19.
The overall function of a multi-domain protein is determined by the functional and structural interplay of its constituent domains. Traditional sequence alignment-based methods commonly utilize domain-level information and provide classification only at the level of domains. Such methods cannot take into account the contributions of other domains in the protein or of domain-linker regions when classifying multi-domain proteins. An alignment-free protein sequence comparison tool, CLAP (CLAssification of Proteins), was previously developed in our laboratory specifically to handle multi-domain protein sequences without requiring domain boundaries or the sequential order of domains to be defined. Through this method we aim to achieve a biologically meaningful classification scheme for multi-domain protein sequences. In this article, CLAP-based classification has been explored on 5 datasets of multi-domain proteins, and we present detailed analysis for proteins containing (1) the tyrosine phosphatase domain and (2) the SH3 domain. At the domain level, the CLAP-based classification scheme resulted in a clustering similar to that obtained from an alignment-based method. CLAP-based clusters obtained for full-length datasets were shown to comprise proteins with similar functions and domain architectures. Our study demonstrates that multi-domain proteins can be classified effectively by considering full-length sequences without requiring identification of domains in the sequence.
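To give a feel for what "alignment-free" comparison means, the Python sketch below scores two sequences by the overlap of their k-mer sets (Jaccard similarity). This is a generic stand-in technique chosen for illustration; it is not CLAP's scoring scheme, which the abstract does not describe, and the function names and the default k=3 are assumptions.

```python
# Sketch: alignment-free similarity via shared k-mers (Jaccard index).
# Assumption: this is a generic illustration, NOT the CLAP algorithm.

def kmers(seq, k=3):
    """Return the set of all overlapping k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard_similarity(a, b, k=3):
    """Shared k-mers divided by total distinct k-mers of both sequences."""
    kmers_a, kmers_b = kmers(a, k), kmers(b, k)
    if not kmers_a and not kmers_b:
        return 0.0
    return len(kmers_a & kmers_b) / len(kmers_a | kmers_b)
```

Because the score depends only on which k-mers occur, not where, two proteins with the same domains in a different order still score highly, which is the property that makes such measures attractive for full-length multi-domain sequences.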

20.