Similar literature
1.
2.
Sixty-five families of glycosyltransferases (EC 2.4.x.y) have been recognized on the basis of high sequence similarity to a founding member with experimentally demonstrated enzymatic activity. Although distant sequence relationships between some of these families have been reported, the natural history of glycosyltransferases is poorly understood. We used iterative searches of sequence databases, motif extraction, structural comparison, and analysis of completely sequenced genomes to track the origins of modern-type glycosyltransferases. We show that >75% of recognized glycosyltransferase families belong to one of only three monophyletic superfamilies of proteins, namely, (1) a recently described GPGTF/GT-B superfamily; (2) a nucleoside-diphosphosugar transferase (GT-A) superfamily, which is characterized by a DxD sequence signature and also includes nucleotidyltransferases; and (3) a GT-C superfamily of integral membrane glycosyltransferases with a modified DxD signature in the first extracellular loop. Several developmental regulators in metazoans, including Fringe and Egghead homologs, belong to the second superfamily. Interestingly, the Tout-velu/Exostosin family of developmental proteins, found in all multicellular eukaryotes, contains separate domains belonging to the first and the second superfamilies, which explains the multiple glycosyltransferase activities in one protein.

3.
4.
The plasma apolipoproteins of vertebrates and a small number of the lipases involved in their metabolism show sequence homologies and are grouped into gene families. The four-exon apolipoprotein gene family includes nine human genes; the divergence rate of their sequences places the first ancestral gene very deep in the phylogenetic tree. However, a more recent duplication of the apolipoprotein C-I gene, dating from 40 million years ago, may be a phylogenetic marker for the radiation of monkeys. Pancreatic lipase (PL) and its isoforms, lipoprotein lipase (LPL) and hepatic triacylglycerol lipase (HL) form, by their homologies, a "superfamily" of genes, which also includes the yolk proteins of dipteran eggs. Sequence homologies of PL, LPL and HL are analysed and compared using multiple alignments of amino acids and nucleotides on spreadsheets. From these comparisons we can characterize four classes of phylogenetic markers: 1) repetitive DNA sequences (Alu, B1, PRE-1) that appeared during mammalian evolution, 2) short insertions or deletions (within the N-terminal domain) and a gene conversion in the guinea-pig lineage, 3) a progressive reduction in intron number during lipase evolution, and 4) several gene duplications that produced the five genes of this superfamily currently known in the human genome.

5.
The epoxide hydrolases and haloalkane dehalogenases database (EH/HD) integrates sequence and structure of a highly diverse protein family, including mainly the Asp-hydrolases of EHs and HDs but also proteins, such as Ser-hydrolases non-heme peroxidases, prolyl iminopeptidases and 2-hydroxymuconic semialdehyde hydrolases. These proteins have a highly conserved structure, but display a remarkable diversity in sequence and function. A total of 305 protein entries were assigned to 14 homologous families, forming two superfamilies. Annotated multisequence alignments and phylogenetic trees are provided for each homologous family and superfamily. Experimentally derived structures of 19 proteins are superposed and consistently annotated. Sequence and structure of all 305 proteins were systematically analysed. Thus, deeper insight is gained into the role of highly conserved sequence motifs and structural elements. AVAILABILITY: The EH/HD database is available at http://www.led.uni-stuttgart.de

6.
The Lipase Engineering Database (LED) (http://www.led.uni-stuttgart.de) integrates information on sequence, structure, and function of lipases, esterases, and related proteins. Sequence data on 806 protein entries are assigned to 38 homologous families, which are grouped into 16 superfamilies with no global sequence similarity between each other. For each family, multisequence alignments are provided with functionally relevant residues annotated. Pre-calculated phylogenetic trees allow navigation inside superfamilies. Experimental structures of 45 proteins are superposed and consistently annotated. The LED has been applied to systematically analyze sequence-structure-function relationships of this vast and diverse enzyme class. It is a useful tool to identify functionally relevant residues apart from the active site residues, and to design mutants with desired substrate specificity.

7.
Reconstructing the evolutionary history of protein sequences will provide a better understanding of divergence mechanisms of protein superfamilies and their functions. Long-term protein evolution often includes dynamic changes such as insertion, deletion, and domain shuffling. Such dynamic changes make reconstructing protein sequence evolution difficult and affect the accuracy of molecular evolutionary methods, such as multiple alignments and phylogenetic methods. Unfortunately, currently available simulation methods are not sufficiently flexible and do not allow biologically realistic dynamic protein sequence evolution. We introduce a new method, indel-Seq-Gen (iSG), that can simulate realistic evolutionary processes of protein sequences with insertions and deletions (indels). Unlike other simulation methods, iSG allows the user to simulate multiple subsequences according to different evolutionary parameters, which is necessary for generating realistic protein families with multiple domains. iSG tracks all evolutionary events including indels and outputs the "true" multiple alignment of the simulated sequences. iSG can also generate a larger sequence space by allowing the use of multiple related root sequences. With all these functions, iSG can be used to test the accuracy of, for example, multiple alignment methods, phylogenetic methods, evolutionary hypotheses, ancestral protein reconstruction methods, and protein family classification methods. We empirically evaluated the performance of iSG against currently available methods by simulating the evolution of the G protein-coupled receptor and lipocalin protein families. We examined their true multiple alignments, reconstruction of the transmembrane regions and beta-strands, and the results of similarity search against a protein database using the simulated sequences. We also presented an example of using iSG for examining how phylogenetic reconstruction is affected by high indel rates.

8.
The available genomic sequences of five closely related hemiascomycetous yeast species (Kluyveromyces lactis, Kluyveromyces waltii, Candida glabrata, Ashbya (Eremothecium) gossypii with Saccharomyces cerevisiae as a reference) were analysed to identify multidrug resistance (MDR) transport proteins belonging to the ATP-binding cassette (ABC) and major facilitator superfamilies (MFS), respectively. The phylogenetic trees clearly demonstrate that a similar set of gene (sub)families already existed in the common ancestor of all five fungal species studied. However, striking differences exist between the two superfamilies with respect to the evolution of the various subfamilies. Within the ABC superfamily all six half-size transporters with six transmembrane-spanning domains (TMs) and most full-size transporters with 12 TMs have one and only one gene per genome. An exception is the PDR family, in which gene duplications and deletions have occurred independently in individual genomes. Among the MFS transporters, the DHA2 family (TC 2.A.1.3) is more variable between species than the DHA1 family (TC 2.A.1.2). Conserved gene order relationships make it possible to trace the evolution of most (sub)families, for which the Kluyveromyces lactis genome can serve as an optimal scaffold. Cross-species sequence alignment of orthologous upstream gene sequences led to the identification of conserved sequence motifs ("phylogenetic footprints"). Almost half of them match known sequence motifs for the MDR regulators described in S. cerevisiae. The biological significance of those and of the novel predicted motifs remains to be confirmed experimentally.

9.
The ITS1, ITS2, and 5.8S gene sequences of nuclear ribosomal DNA from 40 taxa of the family Heteroderidae (including the genera Afenestrata, Cactodera, Heterodera, Globodera, Punctodera, Meloidodera, Cryphodera, and Thecavermiculatus) were sequenced and analyzed. The ITS regions displayed high levels of sequence divergence within Heteroderinae and compared to outgroup taxa. Unlike recent findings in root knot nematodes, ITS sequence polymorphism does not appear to complicate phylogenetic analysis of cyst nematodes. Phylogenetic analyses with maximum-parsimony, minimum-evolution, and maximum-likelihood methods were performed with a range of computer alignments, including elision and culled alignments. All multiple alignments and phylogenetic methods yielded similar basic structure for phylogenetic relationships of Heteroderidae. The cyst-forming nematodes are represented by six main clades corresponding to morphological characters and host specialization, with certain clades assuming different positions depending on alignment procedure and/or method of phylogenetic inference. Hypotheses of monophyly of Punctoderinae and Heteroderinae are, respectively, strongly and moderately supported by the ITS data across most alignments. Close relationships were revealed between the Avenae and the Sacchari groups and between the Humuli group and the species H. salixophila within Heteroderinae. The Goettingiana group occupies a basal position within this subfamily. The validity of the genera Afenestrata and Bidera was tested and is discussed based on molecular data. We conclude that ITS sequence data are appropriate for studies of relationships within the different species groups and less so for recovery of more ancient speciations within Heteroderidae.

10.
PASS2 is a nearly automated version of CAMPASS and contains sequence alignments of proteins grouped at the level of superfamilies. This database has been created to correspond with the SCOP database (1.53 release) and currently consists of 110 multi-member superfamilies and 613 superfamilies corresponding to single members. In multi-member superfamilies, protein chains with no more than 25% sequence identity have been considered for the alignment and hence the database aims to address sequence alignments which represent 26 219 protein domains under the SCOP 1.53 release. Structure-based sequence alignments have been obtained by COMPARER and the initial equivalences are provided automatically from a MALIGN alignment and subsequently augmented using STAMP4.0. The final sequence alignments have been annotated for the structural features using JOY4.0. Several links are provided to other related databases and genome sequence relatives. Availability of reliable sequence alignments of distantly related proteins, despite poor sequence identity and single-member superfamilies, permits better sampling of structures in libraries for fold recognition of new sequences and for the understanding of protein structure–function relationships of individual superfamilies. The database can be queried by keywords and also by sequence search, interfaced by PSI-BLAST methods. Structure-annotated sequence alignments and several structural accessory files can be retrieved for all the superfamilies including the user-input sequence. The database can be accessed from http://www.ncbs.res.in/%7Efaculty/mini/campass/pass.html.
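The 25% sequence-identity cutoff mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say how PASS2 computes identity, so the definition used here (identical residues divided by the number of columns in which both sequences have a residue) is an assumption, as are the function name `percent_identity` and the toy sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    This is one of several definitions in common use; it is an assumption here,
    not necessarily the one used by PASS2.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which both sequences carry a residue.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


# Hypothetical toy alignment: 5 gap-free columns, 4 of them identical.
identity = percent_identity("MKT-LLA", "MRTALL-")
is_distant_pair = identity <= 25.0  # the kind of filter described in the abstract
print(identity, is_distant_pair)
```

Under this definition the toy pair is far above the cutoff, so it would not count as a distantly related pair.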

11.
Members of a superfamily of proteins could result from divergent evolution of homologues with insignificant similarity in the amino acid sequences. A superfamily relationship is detected commonly after the three-dimensional structures of the proteins are determined using X-ray analysis or NMR. The SUPFAM database described here relates two homologous protein families in a multiple sequence alignment database of either known or unknown structure. The present release (1.1), which is the first version of the SUPFAM database, has been derived by analysing Pfam, which is one of the commonly used databases of multiple sequence alignments of homologous proteins. The first step in establishing SUPFAM is to relate Pfam families with the families in PALI, which is an alignment database of homologous proteins of known structure that is derived largely from SCOP. The second step involves relating Pfam families which could not be associated reliably with a protein superfamily of known structure. The profile matching procedure, IMPALA, has been used in these steps. The first step resulted in identification of 1280 Pfam families (out of 2697, i.e. 47%) which are related, either by close homologous connection to a SCOP family or by distant relationship to a SCOP family, potentially forming new superfamily connections. Using the profiles of 1417 Pfam families with apparently no structural information, an all-against-all comparison involving a sequence-profile match using IMPALA resulted in clustering of 67 homologous protein families of Pfam into 28 potential new superfamilies. Expansion of groups of related proteins of yet unknown structural information, as proposed in SUPFAM, should help in identifying ‘priority proteins’ for structure determination in structural genomics initiatives to expand the coverage of structural information in the protein sequence space. For example, we could assign 858 distinct Pfam domains in 2203 of the gene products in the genome of Mycobacterium tuberculosis. Fifty-one of these Pfam families of unknown structure could be clustered into 17 potentially new superfamilies forming good targets for structural genomics. The SUPFAM database can be accessed at http://pauling.mbu.iisc.ernet.in/~supfam.

12.
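Cladistic analysis of the phylogeny of the superfamily-rank taxa of Hydrachnellae (Acari: Hydrachnellae)   Total citations: 1, self-citations: 0, citations by others: 1
Jin Daochao. Acta Entomologica Sinica (《昆虫学报》), 2000, 43(3): 309-317
A phylogenetic analysis of the nine superfamilies of Hydrachnellae (水螨群) was carried out; the cladistic analysis used 23 morphological characters and 3 biological characters. Based on the phylogenetic and sister-group relationships among the nine superfamilies revealed by the analysis, the nine superfamilies are divided into five groups: 拟水螨类, containing 冥绒螨总科; 始水螨类, containing 溪螨总科; 真水螨类, containing 古水螨类 and 新水螨类; 古水螨类, containing 水螨总科, 盾水螨总科 and 皱喙螨总科; and 新水螨类, containing 刺触螨总科, 腺水螨总科, 湿螨总科 and 雄尾螨总科. The sister-group relationships among the groups are as follows: 拟水螨类 is sister to 始水螨类 + 真水螨类; 始水螨类 is sister to 真水螨类 (古水螨类 + 新水螨类); and 古水螨类 is sister to 新水螨类. The proposed arrangement of Hydrachnellae into five groups and nine superfamilies is also compared with existing views.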

13.
We report the latest release (version 1.6) of the CATH protein domains database (http://www.biochem.ucl.ac.uk/bsm/cath). This is a hierarchical classification of 18 577 domains into evolutionary families and structural groupings. We have identified 1028 homologous superfamilies in which the proteins have both structural, and sequence or functional similarity. These can be further clustered into 672 fold groups and 35 distinct architectures. Recent developments of the database include the generation of 3D templates for recognising structural relatives in each fold group, which has led to significant improvements in the speed and accuracy of updating the database and also means that less manual validation is required. We also report the establishment of the CATH-PFDB (Protein Family Database), which associates 1D sequences with the 3D homologous superfamilies. Sequences showing identifiable homology to entries in CATH have been extracted from GenBank using PSI-BLAST. A CATH-PSIBLAST server has been established, which allows you to scan a new sequence against the database. The CATH Dictionary of Homologous Superfamilies (DHS), which contains validated multiple structural alignments annotated with consensus functional information for evolutionary protein superfamilies, has been updated to include annotations associated with sequence relatives identified in GenBank. The DHS is a powerful tool for considering the variation of functional properties within a given CATH superfamily and in deciding what functional properties may be reliably inherited by a newly identified relative.

14.
SUMMARY: The Cytochrome P450 Engineering Database (CYPED) has been designed to serve as a tool for a comprehensive and systematic comparison of protein sequences and structures within the vast and diverse family of cytochrome P450 monooxygenases (CYPs). The CYPED currently integrates sequence and structure data of 3911 and 25 proteins, respectively. Proteins are grouped into homologous families and superfamilies according to Nelson's classification. Nonclassified CYP sequences are assigned by similarity. Functionally relevant residues are annotated. The web accessible version contains multisequence alignments, phylogenetic trees and HMM profiles. The CYPED is regularly updated and supplies all data for download. Thus, it provides a valuable data source for phylogenetic analysis, investigation of sequence-function relationships and the design of CYPs with improved biochemical properties. Abbreviations: Cytochrome P450 Engineering Database, CYPED; cytochrome P450 monooxygenase, CYP; Hidden Markov Model, HMM. AVAILABILITY: www.cyped.uni-stuttgart.de

15.
Protein sequence alignments are more reliable the shorter the evolutionary distance. Here, we align distantly related proteins using many closely spaced intermediate sequences as stepping stones. Such transitive alignments can be generated between any two proteins in a connected set, whether they are direct or indirect sequence neighbors in the underlying library of pairwise alignments. We have implemented a greedy algorithm, MaxFlow, using a novel consistency score to estimate the relative likelihood of alternative paths of transitive alignment. In contrast to traditional profile models of amino acid preferences, MaxFlow models the probability that two positions are structurally equivalent and retains high information content across large distances in sequence space. Thus, MaxFlow is able to identify sparse and narrow active-site sequence signatures which are embedded in high-entropy sequence segments in the structure-based multiple alignment of large diverse enzyme superfamilies. In a challenging benchmark based on the urease superfamily, MaxFlow yields better reliability and twice the coverage of available sequence alignment software. This promises to increase information returns from functional and structural genomics, where reliable sequence alignment is a bottleneck to transferring the functional or structural characterization of model proteins to entire protein superfamilies.
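The idea of a transitive alignment through intermediate sequences can be sketched as a composition of pairwise residue mappings. This is only an illustration of the stepping-stone principle, not the MaxFlow algorithm: it does not include the consistency score or the choice among alternative paths, and the names `compose` and `transitive_alignment` and the toy data are hypothetical.

```python
from typing import Dict, List

# A pairwise alignment is represented as: residue index in one sequence ->
# index of the aligned residue in the other sequence (unaligned residues omitted).
Alignment = Dict[int, int]


def compose(a_to_b: Alignment, b_to_c: Alignment) -> Alignment:
    """Chain A->B and B->C: keep positions of A whose partner in B is aligned to C."""
    return {i: b_to_c[j] for i, j in a_to_b.items() if j in b_to_c}


def transitive_alignment(path: List[Alignment]) -> Alignment:
    """Compose the pairwise alignments along one path of intermediate sequences."""
    result = path[0]
    for step in path[1:]:
        result = compose(result, step)
    return result


# Hypothetical toy data: A is aligned to B, and B is aligned to C.
a_to_b = {0: 1, 1: 2, 2: 3, 3: 5}
b_to_c = {1: 0, 2: 1, 5: 4}
a_to_c = transitive_alignment([a_to_b, b_to_c])
print(a_to_c)  # position 2 of A is lost because its partner in B is unaligned to C
```

Each composition step can only lose aligned positions, which is one reason a method of this kind needs a way to weigh alternative paths against each other.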

16.
A previous report identified and classified a small family of gram-negative bacterial drug and heavy metal efflux permeases, now commonly referred to as the RND family (TC no. 2.6). We here show that this family is actually a ubiquitous superfamily with representation in all major kingdoms. We report phylogenetic analyses that define seven families within the RND superfamily as follows: (1) the heavy metal efflux (HME) family (gram negative bacteria), (2) the hydrophobe/amphiphile efflux-1 (HAE1) family (gram negative bacteria), (3) the nodulation factor exporter (NFE) family (gram negative bacteria), (4) the SecDF protein-secretion accessory protein (SecDF) family (gram negative and gram positive bacteria as well as archaea), (5) the hydrophobe/amphiphile efflux-2 (HAE2) family (gram positive bacteria), (6) the eukaryotic sterol homeostasis (ESH) family, and (7) the hydrophobe/amphiphile efflux-3 (HAE3) family (archaea and spirochetes). Functionally uncharacterized proteins were identified that are members of the RND superfamily but fall outside of these seven families. Some of the eukaryotic homologues function as enzymes and receptors instead of (or in addition to) transporters. The sizes and topological patterns exhibited by members of all seven families are shown to be strikingly similar, and statistical analyses establish common descent. Multiple alignments of proteins within each family allow derivation of family-specific signature sequences. Structural, functional, mechanistic and evolutionary implications of the reported results are discussed.

17.
Shotgun: getting more from sequence similarity searches.   Total citations: 1, self-citations: 0, citations by others: 1
MOTIVATION: As genomic sequencing reveals the range of structural classes generated through the evolution of proteins, analysis of the superfamilies to which they belong can contribute important insights for understanding their structure-function relationships. Current database search techniques fall short of identifying the majority of distant sequence relationships at statistically significant levels. We developed the Shotgun program in an effort to enhance the sensitivity and utility of current database search output. RESULTS: We have developed and used the Shotgun program both to identify new superfamily members and to reconstruct several known enzyme superfamilies using BLAST database searches. An analysis of the false-positive rates generated in the analysis and other control experiments provides evidence that high Shotgun scores indicate real evolutionary relationships. Shotgun is also a useful tool for identifying subgroup relationships within superfamilies and for testing hypotheses about related protein families. AVAILABILITY: By request from the Babbitt lab homepage: http://mako.cgl.ucsf.edu/babbittlab/ CONTACT: babbitt@cgl.ucsf.edu

18.
The dramatic increase in heterogeneous types of biological data, in particular the abundance of new protein sequences, requires fast and user-friendly methods for organizing this information in a way that enables functional inference. The most widely used strategy to link sequence or structure to function, homology-based function prediction, relies on the fundamental assumption that sequence or structural similarity implies functional similarity. New tools that extend this approach are still urgently needed to associate sequence data with biological information in ways that accommodate the real complexity of the problem, while being accessible to experimental as well as computational biologists. To address this, we have examined the application of sequence similarity networks for visualizing functional trends across protein superfamilies from the context of sequence similarity. Using three large groups of homologous proteins of varying types of structural and functional diversity (GPCRs and kinases from humans, and the crotonase superfamily of enzymes), we show that overlaying networks with orthogonal information is a powerful approach for observing functional themes and revealing outliers. In comparison to other primary methods, networks provide both a good representation of group-wise sequence similarity relationships and a strong visual and quantitative correlation with phylogenetic trees, while enabling analysis and visualization of much larger sets of sequences than trees or multiple sequence alignments can easily accommodate. We also define important limitations and caveats in the application of these networks. As a broadly accessible and effective tool for the exploration of protein superfamilies, sequence similarity networks show great potential for generating testable hypotheses about protein structure-function relationships.
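A sequence similarity network of the kind described above can be sketched as a thresholded graph: sequences are nodes, and an edge joins two sequences whose pairwise score meets a cutoff. The sketch below only finds the connected components of such a graph with a small union-find. The score type, the threshold values, the function name `similarity_clusters` and the toy data are all assumptions for illustration; the abstract does not specify them.

```python
from typing import Dict, List, Tuple


def similarity_clusters(
    names: List[str],
    scores: Dict[Tuple[str, str], float],
    threshold: float,
) -> List[List[str]]:
    """Link sequences whose pairwise score is >= threshold; return connected components."""
    parent = {name: name for name in names}

    def find(x: str) -> str:
        # Follow parent links to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)  # merge the two components

    groups: Dict[str, List[str]] = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical pairwise scores (higher means more similar).
names = ["k1", "k2", "k3", "g1", "g2"]
scores = {("k1", "k2"): 80.0, ("k2", "k3"): 55.0, ("g1", "g2"): 90.0, ("k3", "g1"): 12.0}
loose = similarity_clusters(names, scores, threshold=50.0)
strict = similarity_clusters(names, scores, threshold=60.0)
print(loose)
print(strict)
```

Raising the threshold splits `k3` away from the other `k` sequences, which shows in miniature how the choice of cutoff changes the groupings seen in such a network.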

19.
PALI (release 1.2) contains three-dimensional (3-D) structure-dependent sequence alignments as well as structure-based phylogenetic trees of homologous protein domains in various families. The data set of homologous protein structures has been derived by consulting the SCOP database (release 1.50) and the data set comprises 604 families of homologous proteins involving 2739 protein domain structures with each family made up of at least two members. Each member in a family has been structurally aligned with every other member in the same family (pairwise alignment) and all the members in the family are also aligned using simultaneous superposition (multiple alignment). The structural alignments are performed largely automatically, with manual interventions especially in the cases of distantly related proteins, using the program STAMP (version 4.2). Every family is also associated with two dendrograms, calculated using PHYLIP (version 3.5), one based on a structural dissimilarity metric defined for every pairwise alignment and the other based on similarity of topologically equivalent residues. These dendrograms enable easy comparison of sequence and structure-based relationships among the members in a family. Structure-based alignments with the details of structural and sequence similarities, superposed coordinate sets and dendrograms can be accessed conveniently using a web interface. The database can be queried for protein pairs with sequence or structural similarities falling within a specified range. Thus PALI forms a useful resource to help in analysing the relationship between sequence and structure variation at a given level of sequence similarity. PALI also contains over 653 'orphans' (single member families). Using the web interface involving PSI_BLAST and PHYLIP it is possible to associate the sequence of a new protein with one of the families in PALI and generate a phylogenetic tree combining the query sequence and proteins of known 3-D structure. The database with the web interfaced search and dendrogram generation tools can be accessed at http://pauling.mbu.iisc.ernet.in/~pali.

20.
Evolution and phylogenetic utility of the period gene in Lepidoptera   Total citations: 6, self-citations: 0, citations by others: 6
Evolution and phylogenetic utility of the period gene are explored through sequence analysis of a relatively conserved 909-bp fragment in 26 lepidopteran species. Taxa range from tribes to superfamilies, primarily within the putative clade Macrolepidoptera plus near outgroups, and include both strongly established and problematic groupings. Their divergence dates probably range from the late Cretaceous through much of the Tertiary. Comparisons within the same set of closely related species show that amino acid substitutions in period occur 4.9 and 44 times as frequently as they do in two other nuclear genes, dopa decarboxylase and elongation factor-1 alpha, respectively. In contrast, rates of observed synonymous substitution are within 60% of each other for these three genes. Synonymous changes in period approach saturation by the family level, whereas nonsynonymous and amino acid divergences across the Macrolepidoptera are less than half the maximal values reported for this gene. Phylogenetic analyses of period strongly supported groupings at the family level and below. In contrast to previous analyses at this level with other nuclear genes, much of the information lies in nonsynonymous change. Relationships up to the superfamily level were recovered with decreasing effectiveness, and little, if any, signal was apparent regarding relationships among superfamilies. This could reflect rapid radiation of the superfamilies, however, rather than saturation in the period locus; thus, period, in combination with other genes, remains a plausible candidate for approaching the difficult problems of lepidopteran family and superfamily relationships.
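The distinction between synonymous and nonsynonymous change in the abstract above can be made concrete with a small sketch that classifies codon differences between two aligned coding sequences using the standard genetic code. This is a simplified codon-level count, not the rate estimation used in the study, which the abstract does not describe; the function name `count_codon_changes` and the toy sequences are hypothetical.

```python
from itertools import product
from typing import Tuple

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_changes(seq_a: str, seq_b: str) -> Tuple[int, int]:
    """Return (synonymous, nonsynonymous) counts of differing codons.

    Assumes two gap-free, in-frame DNA sequences of equal length. A differing
    codon pair counts as synonymous if both codons encode the same amino acid.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be equal-length multiples of three")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


# Hypothetical toy sequences: Met-Leu-Lys-Gly versus Met-Leu-Arg-Gly.
changes = count_codon_changes("ATGCTTAAAGGT", "ATGCTCAGAGGA")
print(changes)
```

A count like this ignores multiple changes within one codon and the differing numbers of synonymous and nonsynonymous sites, so it should be read as an illustration of the two categories rather than as a substitution rate.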
