Similar articles
1.
We studied the evolutionary relationships between γ-carbonic anhydrase (γ-CA) and a very diverse group of proteins that share the sequence motif characteristic of the left-handed parallel β-helix (LβH) fold. This sequence motif is characterized by the imperfect tandem repetition of short hexapeptide units, which makes it difficult to obtain a reliable alignment based on sequence information alone. To solve this problem, we used a structural alignment of three members of the group with known crystallographic structures as a seed to obtain a reliable sequence alignment. Then, we applied protein maximum-parsimony and maximum-likelihood phylogenetic inference methods to this alignment. We found that γ-CA belongs to a diverse superfamily of proteins that share the LβH domain. This superfamily is composed mainly of acyltransferases. The most remarkable feature of the phylogenetic tree obtained is that its main branches group together functionally related proteins, so that the coarse topology can be rather easily explained in terms of functional diversification. Regarding the main branch of the tree containing γ-CA, we found that, in addition to the group of its closest relatives that had already been studied, γ-CA is closely related to the tetrahydrodipicolinate N-succinyltransferases.

2.
Protein structure contains evolutionary information and is more highly conserved than sequence. We discuss the evolution of structure in gamma-class carbonic anhydrase (gamma-CA) and its structurally related proteins (gammaCASRPs). To obtain a reliable analysis, we defined a subset that contains all specificities and organisms as the nonredundant set, using QR factorization based on the multiple structural alignment of the known crystallographic structures of gammaCASRPs with Q(H) as the structural homology measure. Then, we applied the unweighted pair group method with arithmetic averages (UPGMA) to reconstruct the structural phylogeny. We found that gamma-CA most likely arose through duplication events; the domain of gamma-CA underwent a process of alpha-helical content from amino-terminal end to carboxyl-terminal end of the left-handed beta-helix (LbetaH); the capacity of gamma-CA to bind Zn occurred early in evolution and only later included the ability to catalyze the reversible hydration of CO2 efficiently, owing to the occurrence of two loops involving Glu 62 and Glu 84, respectively, and a long helix at the carboxyl-terminal end of the LbetaH. In addition, the main conserved regions in these structures are in the structurally constrained residues of the LbetaH domain, and the topology of the structural dendrogram can be rather easily understood in terms of functional diversification.
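
The UPGMA step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the input is a table of pairwise distances between structures; the names and distance values below are invented for illustration and are not taken from the study, and the function `upgma` is a hypothetical helper rather than the authors' implementation.

```python
from itertools import combinations

def upgma(names, dist):
    """Cluster items with UPGMA.

    names: list of leaf labels.
    dist:  dict mapping (a, b) label pairs to a distance.
    Returns the dendrogram as nested 2-tuples of labels.
    """
    sizes = {name: 1 for name in names}              # cluster -> number of leaves
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(sizes) > 1:
        # Merge the two clusters separated by the smallest average distance.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        na, nb = sizes.pop(a), sizes.pop(b)
        for c in sizes:
            # Size-weighted mean keeps the result equal to the unweighted
            # average over all leaf pairs of the two clusters.
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[merged] = na + nb
    return next(iter(sizes))

# Hypothetical distances between three structures A, B and C.
toy = {("A", "B"): 2.0, ("A", "C"): 6.0, ("B", "C"): 6.0}
print(upgma(["A", "B", "C"], toy))
```

Because UPGMA assumes a constant rate of change along all branches, the resulting dendrogram is best read as a summary of structural similarity, not as a rooted history.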

3.
The P-loop NTPases are involved in diverse cellular functions. Members of the P-loop NTPase superfamily are characterized by the presence of a highly conserved sequence pattern, GxxxxGKS/T, known as the Walker A motif. This motif adopts an archetypal P-loop conformation which allows accommodation of the triphosphate moiety of a bound nucleotide. Despite the presence of Walker A as a common sequence motif, P-loop NTPases exhibit extreme sequence divergence, which hampers their phylogenetic or evolutionary classification. Here, we show that the P-loop and its flanking region subsequence (termed the "extended-WalkerA motif") contain distinct signatures that can be utilized to classify the NTPase domains of functionally diverse proteins. We find clearly classified groups of diverse NTPases of the Conserved Domain Database, such as G-proteins, Ylqf, RecA like, DExDc, AAA, CPT, NK, ABC transporter and NifH proteins.
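
The Walker A pattern GxxxxGKS/T quoted above maps directly onto a regular expression. The sketch below assumes that "x" stands for any residue and that the final position is either S or T; the toy sequence is made up, and `find_walker_a` is a hypothetical helper, not part of the published method (which additionally uses the flanking regions).

```python
import re

# Walker A consensus: G, four arbitrary residues, then G, K and S or T.
# The lookahead lets overlapping candidate sites be reported as well.
WALKER_A = re.compile(r"(?=(G.{4}GK[ST]))")

def find_walker_a(sequence):
    """Return (0-based start, matched text) for each Walker A candidate."""
    return [(m.start(), m.group(1)) for m in WALKER_A.finditer(sequence.upper())]

# Hypothetical toy sequence, not taken from any database entry.
print(find_walker_a("MSTAGESGAGKSTLLNAL"))
```

A match only flags a candidate site; as the abstract notes, the motif alone does not separate the functional classes.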

4.
Protein sequence alignments are more reliable the shorter the evolutionary distance. Here, we align distantly related proteins using many closely spaced intermediate sequences as stepping stones. Such transitive alignments can be generated between any two proteins in a connected set, whether they are direct or indirect sequence neighbors in the underlying library of pairwise alignments. We have implemented a greedy algorithm, MaxFlow, using a novel consistency score to estimate the relative likelihood of alternative paths of transitive alignment. In contrast to traditional profile models of amino acid preferences, MaxFlow models the probability that two positions are structurally equivalent and retains high information content across large distances in sequence space. Thus, MaxFlow is able to identify sparse and narrow active-site sequence signatures which are embedded in high-entropy sequence segments in the structure based multiple alignment of large diverse enzyme superfamilies. In a challenging benchmark based on the urease superfamily, MaxFlow yields better reliability and double coverage compared to available sequence alignment software. This promises to increase information returns from functional and structural genomics, where reliable sequence alignment is a bottleneck to transferring the functional or structural characterization of model proteins to entire protein superfamilies.
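
The stepping-stone idea can be sketched as composition of pairwise alignments. The code below is not MaxFlow and does not reproduce its consistency score; it assumes each pairwise alignment is stored as a dictionary from residue positions in one sequence to positions in the other, and it simply counts how many intermediate sequences support each induced residue pair. All names and data are hypothetical.

```python
from collections import Counter

def compose(a_to_b, b_to_c):
    """Chain A->B and B->C position maps into an induced A->C map."""
    return {i: b_to_c[j] for i, j in a_to_b.items() if j in b_to_c}

def transitive_support(paths):
    """Count, for each (position in A, position in C) pair, how many
    intermediate sequences induce it. paths is a list of
    (a_to_b, b_to_c) tuples, one per intermediate B."""
    votes = Counter()
    for a_to_b, b_to_c in paths:
        votes.update(compose(a_to_b, b_to_c).items())
    return votes

# Two hypothetical intermediates linking sequence A to sequence C.
via_b1 = ({0: 1, 1: 2, 2: 5}, {1: 0, 2: 1})
via_b2 = ({0: 0, 1: 1}, {0: 0, 1: 3})
print(transitive_support([via_b1, via_b2]))
```

Pairs supported by several independent paths are the ones a consistency-based score would be expected to favour.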

5.
PAS domains are widespread in archaea, bacteria, and eukaryota, and play important roles in various functions. In this study, we aim to explore the functional evolutionary relationships among proteins in the PAS domain superfamily in view of the sequence‐structure‐dynamics‐function relationship. We collected protein sequences and crystal structure data from the RCSB Protein Data Bank for members of the PAS domain superfamily belonging to three biological functions (nucleotide binding, photoreceptor activity, and transferase activity). Protein sequences were aligned and then used to select sequence‐conserved residues and build a phylogenetic tree. Three‐dimensional structure alignment was also applied to obtain structure‐conserved residues. The protein dynamics were analyzed using an elastic network model (ENM) and validated by molecular dynamics (MD) simulation. The results showed that proteins with the same function could be grouped by sequence similarity, and proteins in different functional groups displayed statistically significant differences in their vibrational patterns. Interestingly, in all three functional groups, conserved amino acid residues identified by sequence and structure conservation analysis generally have a lower fluctuation than other residues. In addition, the fluctuation of conserved residues in each biological function group was strongly correlated with the corresponding biological function. This research suggests a direct connection in which protein sequences are related to various functions through structural dynamics. This is a new attempt to delineate the functional evolution of proteins using the integrated information of sequence, structure, and dynamics.

6.
Ebner S, Sharon N, Ben-Tal N. Proteins 2003, 53(1):44-55
Members of the C-type lectin/C-type lectin-like domain (CTL/CTLD) superfamily share a common fold and are involved in a variety of functions, such as generalized defense mechanisms against foreign agents, discrimination between healthy and pathogen-infected cells, and endocytosis and blood coagulation. In this work we used ConSurf, a computer program recently developed in our lab, to perform an evolutionary analysis of this superfamily in order to further identify characteristics of all or part of its members. Given a set of homologous proteins in the form of a multiple sequence alignment (MSA) and an inferred phylogenetic tree, ConSurf calculates the conservation score in every alignment position, taking into account the relationships between the sequences and the physicochemical similarity between the amino acids. The scores are then color-coded onto the three-dimensional structure of one of the homologous proteins. We provide here and at http://ashtoret.tau.ac.il/~sharon a detailed analysis of the conservation pattern obtained for the entire superfamily and for two subgroups of proteins: (a) 21 CTLs and (b) 11 heterodimeric CTLD toxins. We show that, in general, proteins of the superfamily have one face that is constructed mostly of conserved residues and another that is not, and we suggest that the former face is involved in binding to other proteins or domains. In the CTLs examined we detected a region of highly conserved residues, corresponding to the known calcium- and carbohydrate-binding site of the family, which is not conserved throughout the entire superfamily, and in the CTLD toxins we found a patch of highly conserved residues, corresponding to the known dimerization region of these proteins. Our analysis also detected patches of conserved residues with yet unknown function(s).
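
A per-position conservation score can be illustrated with a much simpler stand-in than ConSurf. The sketch below uses normalized Shannon entropy per alignment column; unlike ConSurf as described above, it ignores the phylogenetic tree and the physicochemical similarity between amino acids. The alignment is invented, and `column_conservation` is a hypothetical helper.

```python
import math
from collections import Counter

def column_conservation(alignment):
    """Score each alignment column as 1 - H / log2(20), where H is the
    Shannon entropy of the residues in that column (gaps ignored).
    1.0 means fully conserved; lower values mean more variable."""
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        if not residues:
            scores.append(0.0)  # all-gap column: treated as uninformative
            continue
        total = len(residues)
        entropy = -sum(
            (n / total) * math.log2(n / total)
            for n in Counter(residues).values()
        )
        scores.append(1.0 - entropy / math.log2(20))
    return scores

# Hypothetical three-sequence alignment with one gap.
print(column_conservation(["ACD", "ACE", "A-F"]))
```

Scores of this kind can then be binned into colors and mapped onto a structure, which is the visualization step the abstract describes.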

7.
SUMMARY: We introduce an algorithm that uses the information gained from simultaneous consideration of an entire group of related proteins to create multiple structure alignments (MSTAs). Consistency-based alignment (CBA) first harnesses the information contained within regions that are consistently aligned among a set of pairwise superpositions in order to realign pairs of proteins through both global and local refinement methods. It then constructs a multiple alignment that is maximally consistent with the improved pairwise alignments. We validate CBA's alignments by assessing their accuracy in regions where at least two of the aligned structures contain the same conserved sequence motif. RESULTS: CBA correctly aligns well over 90% of motif residues in superpositions of proteins belonging to the same family or superfamily, and it outperforms a number of previously reported MSTA algorithms.

8.
The tannase protein sequences of 149 bacteria and 36 fungi were retrieved from the NCBI database. Of these, only the 77 bacterial and 31 fungal tannase sequences with distinct amino acid compositions were retained. These sequences were analysed for physical and chemical properties, superfamily membership, multiple sequence alignment, phylogenetic tree construction and motif finding, in order to identify functional motifs and the evolutionary relationships among them. The superfamily search for these tannases revealed the proline iminopeptidase-like, biotin biosynthesis protein BioH, O-acetyltransferase, carboxylesterase/thioesterase 1, carbon–carbon bond hydrolase, haloperoxidase, prolyl oligopeptidase, C-terminal domain and mycobacterial antigens families, and the alpha/beta hydrolase superfamily. Some bacterial and fungal sequences showed similarity with different families individually. The multiple sequence alignment of these tannase protein sequences showed conserved regions at different stretches, with maximum homology at amino acid residues 389–469 and 482–523, which could be used for designing degenerate primers or probes specific for tannase-producing bacterial and fungal species. The phylogenetic tree showed two clusters: one contains only bacteria and the other contains both fungi and bacteria, indicating some relationship between these genera. In the second cluster, however, nearly all fungal species grouped together, which indicates sequence-level similarity among fungal genera. Analysis of the distribution of fourteen motifs revealed that Motif 1, with a signature sequence of 29 amino acids (GCSTGGREALKQAQRWPHDYDGIIANNPA), was uniformly observed in 83.3 % of the studied tannase sequences, suggesting its involvement in structure and enzymatic function.

9.
MOTIVATION: Evolutionary comparison leads to efficient functional characterisation of hypothetical proteins. Here, our goal is to map specific sequence patterns to putative functional classes. The evolutionary signal stands out most clearly in a maximally diverse set of homologues. This diversity, however, leads to a number of technical difficulties. The targeted patterns (as gleaned from structure comparisons) are too sparse for statistically significant signals of sequence similarity and accurate multiple sequence alignment. RESULTS: We address this problem by a fuzzy alignment model, which probabilistically assigns residues to structurally equivalent positions (attributes) of the proteins. We then apply multivariate analysis to the 'attributes x proteins' matrix. The dimensionality of the space is reduced using non-negative matrix factorization. The method is general, fully automatic and works without assumptions about pattern density, minimum support, explicit multiple alignments, phylogenetic trees, etc. We demonstrate the discovery of biologically meaningful patterns in an extremely diverse superfamily related to urease.

10.
Translin and its interacting partner protein, TRAX, are members of the translin superfamily. These proteins are involved in mRNA regulation and in promoting RISC activity by removing siRNA passenger strand cleavage products, and have been proposed to play roles in DNA repair and recombination. Both homomeric translin and the heteromeric translin-TRAX complex bind to ssDNA and RNA; however, the heteromeric complex is a key activator in siRNA-mediated silencing in human and Drosophila. The residues critical for RNase activity of the complex reside in the TRAX sequence. Both translin and TRAX are well conserved in eukaryotes. In the present work, a single translin superfamily protein is detected in Chloroflexi eubacteria, in the known phyla of archaea and in some unicellular eukaryotes. The prokaryotic proteins essentially share unique sequence motifs with eukaryotic TRAX, while proteins possessing both the unique sequences and conserved indels of TRAX or translin can be identified in protists. Intriguingly, the TRAX proteins in all the known genomes of extant Chloroflexi share high sequence similarity and conserved indels with the archaeal protein, suggesting the occurrence of TRAX at least at the time of Chloroflexi divergence, as well as an evolutionary relationship between Chloroflexi and archaea. The mirror phylogeny in the phylogenetic tree, constructed using diverse translin and TRAX sequences, indicates a gene duplication event leading to the evolution of translin in unicellular eukaryotes, prior to the divergence of multicellular eukaryotes. Since Chloroflexi has been debated to be near the last universal common ancestor, the present analysis indicates that TRAX may be useful for understanding the tree of life.

11.
Structural genomics is the idea of covering protein space so that every protein sequence comes within model building distance of a protein of known structure. Unfortunately, reproducing the structural alignment of distantly related proteins is a difficult challenge to existing sequence alignment and motif search software. We have developed a new transitive alignment algorithm (MaxFlow), which generates accurate alignments between proteins deep in the twilight zone of sequence similarity, below 20% sequence identity. In particular, MaxFlow reliably identifies conserved core motifs between proteins which are only indirect PSI-Blast neighbours. Based on MaxFlow alignments, useful 3D models can be generated for all members of a superfamily from as few as a single structural template – despite hundreds of representatives at 40% sequence identity level and patchy detection of homology by PSI-Blast. We propose novel strategies for target prioritization using MaxFlow scores to predict the optimal templates in a superfamily. Our results support an increase in the granularity of covering protein space that has potentially enormous economic implications for planning the transition to the full production phase of structural genomics.

12.
Nine proteins have been assigned to date to the superfamily of mammalian small heat shock proteins (sHsps): Hsp27 (HspB1, Hsp25), myotonic dystrophy protein kinase-binding protein (MKBP) (HspB2), HspB3, alphaA-crystallin (HspB4), alphaB-crystallin (HspB5), Hsp20 (p20, HspB6), cardiovascular heat shock protein (cvHsp [HspB7]), Hsp22 (HspB8), and HspB9. The most pronounced structural feature of sHsps is the alpha-crystallin domain, a conserved stretch of approximately 80 amino acid residues in the C-terminal half of the molecule. Using the alpha-crystallin domain of human Hsp27 as query in a BLAST search, we found sequence similarity with another mammalian protein, the sperm outer dense fiber protein (ODFP). ODFP occurs exclusively in the axoneme of sperm cells. Multiple alignment of human ODFP with the other human sHsps reveals that the primary structure of ODFP fits into the sequence pattern that is typical for this protein superfamily: alpha-crystallin domain (conserved), N-terminal domain (less conserved), central region (variable), and C-terminal tails (variable). In a phylogenetic analysis of 167 proteins of the sHsp superfamily, using Bayesian inference, mammalian ODFPs form a clade and are nested within previously identified sHsps, some of which have been implicated in cytoskeletal functions. Both the multiple alignment and the phylogeny suggest that ODFP is the 10th member of the superfamily of mammalian sHsps, and we propose to name it HspB10 in analogy with the other sHsps. The C-terminal tail of HspB10 has a remarkable low-complexity structure consisting of 10 repeats of the motif C-X-P. A BLAST search using the C-terminal tail as query revealed similarity with sequence elements in a number of Drosophila male sperm proteins, and mammalian type I keratins and cornifin-alpha. Taken together, the following findings suggest a specialized role of HspB10 in cytoskeleton: (1) the exclusive location in sperm cell tails, (2) the phylogenetic relationship with sHsps implicated in cytoskeletal functions, and (3) the partial similarity with cytoskeletal proteins.

13.
Phospholipase D (PLD) participates in the formation of phosphatidic acid, a precursor in glycerolipid biosynthesis and a second messenger. PLDs are part of a superfamily of proteins that hydrolyze phosphodiesters and share a catalytic motif, HxKxxxxD, and hence a mechanism of action. Although HKD‐PLDs have been thoroughly characterized in plants, animals and bacteria, very little is known about these enzymes in algae. To fill this gap in knowledge, we performed a biocomputational analysis by means of HMMER iterative profiling, using most eukaryotic algae genomes available. Phylogenetic analysis revealed that algae exhibit very few eukaryotic‐type PLDs but possess, instead, many bacteria‐like PLDs. Among algae eukaryotic‐type PLDs, we identified C2‐PLDs and PXPH‐like PLDs. In addition, the dinoflagellate Alexandrium tamarense features several proteins phylogenetically related to oomycete PLDs. Our phylogenetic analysis also showed that algae bacteria‐like PLDs (proteins with putative PLD activity) fall into five clades, three of which are novel lineages in eukaryotes, composed almost entirely of algae. Specifically, Clade II is almost exclusive to diatoms, whereas Clade I and IV are mainly represented by proteins from prasinophytes. The other two clades are composed of mitochondrial PLDs (Clade V or Mito‐PLDs), previously found in mammals, and a subfamily of potentially secreted proteins (Clade III or SP‐PLDs), which includes a homolog formerly characterized in rice. In addition, our phylogenetic analysis shows that algae have non‐PLD members within the bacteria‐like HKD superfamily with putative cardiolipin synthase and phosphatidylserine/phosphatidylglycerophosphate synthase activities. Altogether, our results show that eukaryotic algae possess a moderate number of PLDs that belong to very diverse phylogenetic groups.

14.
Summary. We have found ragweed allergen Ra3 to be related to the type 1 copper proteins; it is most closely related to stellacyanin and basic blue protein. The type 1 copper proteins form a diverse group of proteins, most of which are involved in electron transport. However, key amino acids believed to be involved in copper binding are absent from the allergen sequence; thus, the allergen is not likely to be functionally related to the type 1 copper proteins. We have grouped these proteins into one superfamily and we depict the relationships among them by an evolutionary tree. As indicated by this tree, an ancient gene duplication resulted in the divergence of plastocyanin from the line leading to basic blue protein, stellacyanin, and allergen Ra3. This paper is dedicated to the memory of Professor Margaret O. Dayhoff, whose contributions to the study of protein evolution made this investigation possible.

15.
Choi JH, Govaerts C, May BC, Cohen FE. Proteins 2008, 73(1):150-160
The left-handed parallel beta-helix (LbetaH) is a structurally repetitive, highly regular, and symmetrical fold formed by coiling of elongated beta-sheets into helical "rungs." This canonical fold has recently received interest as a possible solution to the fibril structure of amyloid and as a building block of self-assembled nanotubular structures. In light of this interest, we aimed to understand the structural requirements of the LbetaH fold. We first sought to determine the sequence characteristics of the repeats by analyzing known structures to identify positional preferences of specific residue types. We then used molecular dynamics simulations to demonstrate the stabilizing effect of successive rungs and the hydrophobic core of the LbetaH. We show that a two-rung structure is the minimally stable LbetaH structure. In addition, we defined the structure-based sequence preference of the LbetaH and undertook a genome-wide sequence search to determine the prevalence of this unique protein fold. This profile-based LbetaH search algorithm predicted a large fraction of LbetaH proteins from microbial origins. However, the relative number of predicted LbetaH proteins per species was approximately equal across the genomes from prokaryotes to eukaryotes.

16.
The iron-regulated irp2 gene is specific for the highly pathogenic Yersinia species and encodes high-molecular-weight protein 2 (HMWP2). Despite the established correlation between the presence of HMWP2 and virulence, the role of this protein is still unknown. To gain insight into the function of HMWP2, the entire coding sequence and the promoter of irp2 were sequenced. Two putative -35 and -10 promoter sequences were identified upstream of a large open reading frame, and two potential Fur-binding sites were found overlapping the second -35 box. The large open reading frame is composed of 6,126 nucleotides and may encode a protein of 2,035 amino acids (ca. 228 kDa) with a pI of 5.81. A signal sequence was not present at the N terminus of the protein. Despite the existence of 30 cysteine residues, carboxymethylation prevented the formation of most if not all disulfide bonds that otherwise occurred when the cells were sonicated. The protein was composed of three main domains: a central region of ca. 850 residues, bordered on each side by a repeat of 550 residues. A high degree of identity (44.5%) was found between HMWP2 and the protein AngR of Vibrio anguillarum. The central part of HMWP2 (after removal of a loop of 337 residues) also displayed significant homology with proteins belonging to the superfamily of adenylate-forming enzymes and, like them, possessed a putative ATP-binding motif that is also present in AngR. In addition, HMWP2 shared with the group of antibiotic and enterochelin synthetases a potential amino acid-binding site. Six consensus sequences defining the superfamily and four defining the family of synthetases were derived from the multiple alignment of the 30 sequences of proteins or repeated domains. A phylogenetic tree that was constructed showed that HMWP2 and AngR are in a family composed of Lys2, EntF, and the tyrocidine, gramicidin, and beta-lactam synthetases. This finding suggests that HMWP2 may participate in the nonribosomal synthesis of small biologically active peptides.

17.
18.
Structural evolution of the protein kinase-like superfamily
The protein kinase family is large and important, but it is only one family in a larger superfamily of homologous kinases that phosphorylate a variety of substrates and play important roles in all three superkingdoms of life. We used a carefully constructed structural alignment of selected kinases as the basis for a study of the structural evolution of the protein kinase-like superfamily. The comparison of structures revealed a "universal core" domain consisting only of regions required for ATP binding and the phosphotransfer reaction. Remarkably, even within the universal core some kinase structures display notable changes, while still retaining essential activity. Hence, the protein kinase-like superfamily has undergone substantial structural and sequence revision over long evolutionary timescales. We constructed a phylogenetic tree for the superfamily using a novel approach that allowed for the combination of sequence and structure information into a unified quantitative analysis. When considered against the backdrop of species distribution and other metrics, our tree provides a compelling scenario for the development of the various kinase families from a shared common ancestor. We propose that most of the so-called "atypical kinases" are not intermittently derived from protein kinases, but rather diverged early in evolution to form a distinct phyletic group. Within the atypical kinases, the aminoglycoside and choline kinase families appear to share the closest relationship. These two families in turn appear to be the most closely related to the protein kinase family. In addition, our analysis suggests that the actin-fragmin kinase, an atypical protein kinase, is more closely related to the phosphoinositide-3 kinase family than to the protein kinase family. The two most divergent families, alpha-kinases and phosphatidylinositol phosphate kinases (PIPKs), appear to have distinct evolutionary histories. While the PIPKs probably have an evolutionary relationship with the rest of the kinase superfamily, the relationship appears to be very distant (and perhaps indirect). Conversely, the alpha-kinases appear to be an exception to the scenario of early divergence for the atypical kinases: they apparently arose relatively recently in eukaryotes. We present possible scenarios for the derivation of the alpha-kinases from an extant kinase fold.

19.
The insulin superfamily is composed of a diverse group of proteins that share a common structural design whose most notable feature is a set of disulfide bonds. There is now sufficient experimental and bioinformatics evidence that it is represented in at least a number of well-investigated invertebrates, where its members have been found to intervene mainly in complex processes such as mitosis, cell growth, caste differentiation, and fertility. In this article we automated a methodology first proposed elsewhere, which combines sequence similarity with assessing membership of the superfamily by conservation of structurally key residues, to identify putative insulin-like peptides (ILPs) in completely sequenced genomes, and applied it as a pipeline to a group of 46 organisms, both vertebrates and invertebrates. As a result, we were able to identify 1,653 putative members of the insulin superfamily, from 17 putative members in C. savignyi to 58 in X. tropicalis. Moreover, we found that structural distinctions (such as peptide length) between functionally diverse members of the superfamily found in vertebrates, that is, insulins, IGFs, and relaxins, are not equally represented in invertebrate genomes, suggesting that such divergence has occurred only recently in the evolutionary history of vertebrates.

20.
Molecular sequences provide a rich source of data for inferring the phylogenetic relationships among species. However, recent work indicates that even an accurate multiple alignment of a large sequence set may yield an incorrect phylogeny and that the quality of the phylogenetic tree improves when the input consists only of the highly conserved, motif regions of the alignment. This work introduces two methods of producing multiple alignments that include only the conserved regions of the initial alignment. The first method retains conserved motifs, whereas the second retains individual conserved sites in the initial alignment. Using parsimony analysis on a mitochondrial data set containing 19 species among which the phylogenetic relationships are widely accepted, both conserved alignment methods produce better phylogenetic trees than the complete alignment. Unlike any of the 19 inference methods used before to analyze this data, both methods produce trees that are completely consistent with the known phylogeny. The motif-based method employs far fewer alignment sites for comparable error rates. For a larger data set containing mitochondrial sequences from 39 species, the site-based method produces a phylogenetic tree that is largely consistent with known phylogenetic relationships and suggests several novel placements. J. Exp. Zool. ( Mol. Dev. Evol.) 285:128-139, 1999.
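
The site-based variant, which keeps only individually conserved columns, can be sketched as a column filter. The abstract does not state the conservation criterion, so the rule used here (no gaps, and the most common residue reaching a chosen fraction of the column) is an assumption, as are the threshold, the helper names and the toy alignment.

```python
from collections import Counter

def conserved_columns(alignment, min_fraction=0.8):
    """Return indices of gap-free columns whose most common residue makes
    up at least min_fraction of the column (an assumed criterion)."""
    keep = []
    for index, column in enumerate(zip(*alignment)):
        if "-" in column:
            continue
        top_count = Counter(column).most_common(1)[0][1]
        if top_count / len(column) >= min_fraction:
            keep.append(index)
    return keep

def restrict(alignment, columns):
    """Build the reduced alignment containing only the chosen columns."""
    return ["".join(seq[i] for i in columns) for seq in alignment]

# Hypothetical five-sequence alignment.
toy = ["ACGTA", "ACGAA", "ACCTA", "A-GTA", "ACGTA"]
cols = conserved_columns(toy)
print(cols, restrict(toy, cols))
```

The reduced alignment would then be passed to a tree-building method such as parsimony. A motif-based filter would instead keep contiguous runs of such columns.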
