Similar Articles
20 similar articles found.
1.
Han LY, Cai CZ, Ji ZL, Cao ZW, Cui J, Chen YZ. Nucleic Acids Research. 2004;32(21):6437-6444.
The function of a protein that has no sequence homolog of known function is difficult to assign on the basis of sequence similarity. The same problem may arise for homologous proteins of different functions if one is newly discovered and the other is the only known protein of similar sequence. It is therefore desirable to explore methods that are not based on sequence similarity. One approach is to assign a protein to a functional family, which provides a useful hint about its function. Several groups have employed a statistical learning method, support vector machines (SVMs), to predict the functional family of a protein directly from its sequence, irrespective of sequence similarity. These studies showed that SVM prediction accuracy is at a level useful for functional family assignment. However, its capability to assign distantly related proteins and homologous proteins of different functions has not been critically and adequately assessed. Here, SVM is tested for the functional family assignment of two groups of enzymes. The first consists of 50 enzymes that have no homolog of known function in a PSI-BLAST search of protein databases. The second contains eight pairs of homologous enzymes belonging to different families. SVM correctly assigns 72% of the enzymes in the first group and 62% of the enzyme pairs in the second group, suggesting that it is potentially useful for facilitating the functional study of novel proteins. A web version of our software, SVMProt, is accessible at http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi.
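SVM-based family assignment needs each variable-length sequence turned into a fixed-length numeric vector before any classifier can be trained. A minimal sketch of the simplest such encoding, the 20-dimensional amino acid composition, is shown below; it is an illustrative assumption, not the descriptor set SVMProt actually uses, and `aa_composition` is a hypothetical name.

```python
from collections import Counter
from typing import List

# The 20 standard amino acids in one-letter code, in a fixed order so that
# every sequence maps onto the same vector layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence: str) -> List[float]:
    """Return the fraction of each standard amino acid in `sequence`.

    Non-standard letters (e.g. X, B, Z) are ignored, so the fractions are
    computed over standard residues only.
    """
    counts = Counter(residue for residue in sequence.upper() if residue in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    vector = aa_composition("MKTAYIAKQR")
    print(len(vector))            # one feature per standard amino acid
    print(round(sum(vector), 6))  # the fractions sum to 1
```

Vectors like these would then be passed to a per-family SVM (or any other classifier); that training step is not shown.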

2.
Matrix metalloproteinases (MMPs) and a disintegrin and metalloproteinases (ADAMs) belong to the zinc-dependent metalloproteinase family of proteins. These proteins participate in various physiological and pathological states; thus, predicting them from their amino acid sequences would be helpful. We have developed a method to predict these proteins based on features derived from Chou's pseudo amino acid composition (PseAAC) server and the support vector machine (SVM), a powerful machine learning approach. With this method, an overall accuracy of 95.89% and a Matthews correlation coefficient (MCC) of 0.90 were achieved for the ADAM and MMP families. Furthermore, the method is able to predict two major subclasses of the MMP family, furin-activated secreted MMPs and type II transmembrane MMPs, with MCCs of 0.89 and 0.91, respectively. The overall accuracy for furin-activated secreted MMPs and type II transmembrane MMPs was 98.18% and 99.07%, respectively. Our data demonstrate an effective classification of the metalloproteinase family based on the concept of PseAAC and SVM.
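Accuracy is a percentage, whereas the Matthews correlation coefficient (MCC) is a unitless value between -1 and 1. A minimal sketch computing both from the counts of a binary confusion matrix; the counts in the usage block are invented for illustration and are not the study's data.

```python
import math


def accuracy_percent(tp: int, tn: int, fp: int, fn: int) -> float:
    """Overall accuracy as a percentage of all predictions."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)


def matthews_cc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, a unitless value in [-1, 1].

    Returns 0.0 when any marginal total is zero, a common convention for
    the otherwise undefined case.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the formulas.
    tp, tn, fp, fn = 90, 85, 15, 10
    print(f"accuracy = {accuracy_percent(tp, tn, fp, fn):.2f}%")
    print(f"MCC      = {matthews_cc(tp, tn, fp, fn):.2f}")
```

Reporting the MCC alongside accuracy is useful because it stays informative when the positive and negative classes are unbalanced.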

3.
Cai CZ, Han LY, Ji ZL, Chen YZ. Proteins. 2004;55(1):66-76.
One approach for facilitating protein function prediction is to classify proteins into functional families. Recent studies on the classification of G-protein coupled receptors and other proteins suggest that a statistical learning method, support vector machines (SVMs), may be useful for classifying proteins into functional families. In this work, SVM is applied to and tested on the classification of enzymes into the functional families defined by the Enzyme Nomenclature Committee of IUBMB. The SVM classification system for each family is trained on representative enzymes of that family and on seed proteins of Pfam curated protein families. The classification accuracy for enzymes from 46 families ranges from 50.0% to 95.7%, and that for non-enzymes from 79.0% to 100%. The corresponding Matthews correlation coefficient ranges from 0.541 to 0.961. Moreover, 80.3% of the 8,291 correctly classified enzymes are uniquely classified into a specific enzyme family by a scoring function, indicating that SVM may have a certain level of unique prediction capability. Testing results also suggest that SVM is in some cases capable of classifying distantly related enzymes and homologous enzymes of different functions. Efforts are being made to use a more comprehensive set of enzymes as training sets and to incorporate multi-class SVM classification systems to further enhance the unique prediction accuracy. Our results suggest the potential of SVM for enzyme family classification and for facilitating protein function prediction. Our software is accessible at http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi.
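The abstract does not spell out the scoring function used for unique assignment. One common way to resolve overlapping one-vs-rest SVM predictions, shown here purely as an assumed illustration, is to keep the families whose binary classifiers return a positive decision value and pick the one with the largest value; `assign_family` and the family labels are hypothetical.

```python
from typing import Dict, Optional, Tuple


def assign_family(decision_values: Dict[str, float]) -> Tuple[Optional[str], bool]:
    """Resolve one-vs-rest SVM outputs into a single family label.

    `decision_values` maps each family to the signed decision value of its
    binary classifier (positive means "member"). Returns the chosen family,
    or None if no classifier fired, and whether exactly one classifier fired.
    """
    positives = {family: score for family, score in decision_values.items() if score > 0}
    if not positives:
        return None, False
    best = max(positives, key=positives.get)  # highest-scoring positive family
    return best, len(positives) == 1


if __name__ == "__main__":
    # Hypothetical decision values for one query enzyme.
    scores = {"family_A": 0.8, "family_B": -0.4, "family_C": 0.3}
    print(assign_family(scores))
```

In this sketch a query counts as "uniquely classified" only when a single family classifier fires; ties between several positive families are broken by the largest decision value.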

4.
The 2-hydroxycarboxylate transporter (2HCT) family of secondary transporters belongs to a much larger structural class of secondary transporters termed ST3, which contains about 2000 transporters in 32 families. The transporters of the 2HCT family are among the best studied in the class. Here we detect weak sequence similarity between the N- and C-terminal halves of the proteins using a sensitive method that uses a database containing the N- and C-terminal halves of all the sequences in ST3 and involves BLAST searches of each sequence in the database against the whole database. Unrelated families of secondary transporters of the same length and composition were used as controls. The sequence similarity involved major parts of the N- and C-terminal halves, not just a short stretch. The membrane topology of the homologous N- and C-terminal domains was deduced from the experimentally determined topology of the members of the 2HCT family. The domains consist of five transmembrane segments each and have opposite orientations in the membrane. The N terminus of the N-terminal domain is extracellular, while the N terminus of the C-terminal domain is cytoplasmic. The loops between the fourth and fifth transmembrane segments in each domain are well conserved throughout the class and contain a high fraction of residues with small side chains (Gly, Ala and Ser). Experimental work on the citrate transporter CitS in the 2HCT family indicates that the loops are re-entrant or pore loops. The re-entrant loops in the N- and C-terminal domains enter the membrane from opposite sides (trans-re-entrant loops). The combination of inverted membrane topology and trans-re-entrant loops represents a new fold for secondary transporters and resembles the structure of aquaporins and models proposed for Na+/Ca2+ exchangers.
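The study detects the weak N-/C-terminal similarity with BLAST searches against a database of half-sequences, using unrelated families as controls. As a simpler stand-in for that procedure (not the authors' method), the sketch below aligns the two halves of one sequence with Smith-Waterman local alignment and expresses the score as a z-score against shuffled copies of the C-terminal half, which keep its length and composition. The identity scoring scheme (+2/-1, gap -2) and the function names are assumptions.

```python
import random
import statistics


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b with a linear gap penalty."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


def half_similarity_z(sequence: str, shuffles: int = 100, seed: int = 0) -> float:
    """Z-score of the N-half vs C-half alignment score against shuffled C-halves."""
    middle = len(sequence) // 2
    n_half, c_half = sequence[:middle], sequence[middle:]
    observed = smith_waterman(n_half, c_half)
    rng = random.Random(seed)  # fixed seed for reproducible controls
    control_scores = []
    for _ in range(shuffles):
        residues = list(c_half)
        rng.shuffle(residues)  # same length and composition, order destroyed
        control_scores.append(smith_waterman(n_half, "".join(residues)))
    spread = statistics.stdev(control_scores)
    if spread == 0:
        # All controls identical: the z-score is undefined, report 0.0.
        return 0.0
    return (observed - statistics.mean(control_scores)) / spread


if __name__ == "__main__":
    repeat = "MKLVAGSTWERQPLIDFNAYC"
    print(round(half_similarity_z(repeat + repeat), 1))
```

A real analysis would use a substitution matrix such as BLOSUM62 and affine gaps; identity scoring keeps the sketch self-contained.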

5.
The multidrug/oligosaccharidyl-lipid/polysaccharide (MOP) exporter superfamily (TC #2.A.66) consists of four previously recognized families: (a) the ubiquitous multi-drug and toxin extrusion (MATE) family; (b) the prokaryotic polysaccharide transporter (PST) family; (c) the eukaryotic oligosaccharidyl-lipid flippase (OLF) family and (d) the bacterial mouse virulence factor family (MVF). Of these four families, only members of the MATE family have been shown to function mechanistically as secondary carriers, and no member of the MVF family has been shown to function as a transporter. Establishment of a common origin for the MATE, PST, OLF and MVF families suggests a common mechanism of action as secondary carriers catalyzing substrate/cation antiport. Most protein members of these four families exhibit 12 putative transmembrane alpha-helical segments (TMSs), and several have been shown to have arisen by an internal gene duplication event; topological variation is observed for some members of the superfamily. The PST family is more closely related to the MATE, OLF and MVF families than any of these latter three families are related to each other. This fact leads to the suggestion that primordial proteins most closely related to the PST family were the evolutionary precursors of all members of the MOP superfamily. Here, phylogenetic trees and average hydropathy, similarity and amphipathicity plots for members of the four families are derived and provide detailed evolutionary and structural information about these proteins. We show that each family exhibits unique characteristics. For example, the MATE and PST families are characterized by numerous paralogues within a single organism (58 paralogues of the MATE family are present in Arabidopsis thaliana), while the OLF family consists exclusively of orthologues, and the MVF family consists primarily of orthologues. 
Only in the PST family has extensive lateral transfer of the encoding genes occurred, and in this family, as well as in the MVF family, topological variation is a characteristic feature. The results serve to define a large superfamily of transporters whose members we predict export substrates using a monovalent cation antiport mechanism.
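Hydropathy plots of the kind mentioned in this abstract are typically built from a sliding-window average of a hydrophobicity scale. The sketch below uses the Kyte-Doolittle scale with a 19-residue window and flags windows averaging above 1.6 as candidate transmembrane segments, settings commonly used for this purpose; the study itself averaged hydropathy over aligned family members, which is not shown, and the function names are hypothetical.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 19) -> List[float]:
    """Mean hydropathy of each full window; entry k covers residues k..k+window-1.

    Raises KeyError for non-standard residue letters.
    """
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    if len(values) < window:
        return []
    running = sum(values[:window])
    profile = [running / window]
    for k in range(window, len(values)):
        running += values[k] - values[k - window]  # slide the window by one residue
        profile.append(running / window)
    return profile


def candidate_tms(sequence: str, window: int = 19, threshold: float = 1.6) -> List[Tuple[int, int]]:
    """Merge consecutive above-threshold windows into 0-based, end-exclusive residue ranges."""
    segments: List[Tuple[int, int]] = []
    start = None
    profile = hydropathy_profile(sequence, window)
    for k, value in enumerate(profile):
        if value > threshold and start is None:
            start = k
        elif value <= threshold and start is not None:
            segments.append((start, k - 1 + window))
            start = None
    if start is not None:
        segments.append((start, len(profile) - 1 + window))
    return segments


if __name__ == "__main__":
    toy = "D" * 20 + "L" * 25 + "D" * 20  # hydrophilic-hydrophobic-hydrophilic
    print(candidate_tms(toy))
```

Because every window is 19 residues wide, the reported boundaries extend somewhat beyond the truly hydrophobic stretch and should be read as approximate.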

6.
Lipid binding proteins play important roles in signaling, regulation, membrane trafficking, immune response, lipid metabolism, and transport. Because of their functional and sequence diversity, it is desirable to explore additional methods for predicting lipid binding proteins irrespective of sequence similarity. This work explores the use of support vector machines (SVMs) as such a method. SVM prediction systems are developed using 14,776 lipid binding and 133,441 nonlipid binding proteins and are evaluated on an independent set of 6,768 lipid binding and 64,761 nonlipid binding proteins. The computed prediction accuracy is 78.9, 79.5, 82.2, 79.5, 84.4, 76.6, 90.6, 79.0, and 89.9% for lipid degradation, lipid metabolism, lipid synthesis, lipid transport, lipid binding, lipopolysaccharide biosynthesis, lipoprotein, lipoyl, and all lipid binding proteins, respectively. The accuracy for the nonmember proteins of each class is 99.9, 99.2, 99.6, 99.8, 99.9, 99.8, 98.5, 99.9, and 97.0%, respectively. Comparable accuracies are obtained when homologous proteins are counted as one, or when a different SVM kernel function is used. Our method predicts 86.8% of the 76 lipid binding proteins that are nonhomologous to any protein in the Swiss-Prot database, and 89.0% of the 73 known lipid binding domains, as lipid binding. These findings suggest the usefulness of SVMs for facilitating the prediction of lipid binding proteins. Our software can be accessed at the SVMProt server (http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi).
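The two sets of percentages in this abstract correspond to accuracy on the members of each class (sensitivity) and on its non-members (specificity). A minimal sketch of both, plus the pooled overall figure, using invented counts rather than the study's data; `class_accuracy` is a hypothetical name.

```python
from typing import NamedTuple


class ClassAccuracy(NamedTuple):
    members: float     # % of true members predicted as members (sensitivity)
    nonmembers: float  # % of true non-members predicted as non-members (specificity)
    overall: float     # % of all proteins classified correctly


def class_accuracy(tp: int, fn: int, tn: int, fp: int) -> ClassAccuracy:
    """Per-class and pooled accuracy from binary confusion-matrix counts."""
    return ClassAccuracy(
        members=100.0 * tp / (tp + fn),
        nonmembers=100.0 * tn / (tn + fp),
        overall=100.0 * (tp + tn) / (tp + fn + tn + fp),
    )


if __name__ == "__main__":
    # Invented counts with a 1:10 member/non-member ratio.
    print(class_accuracy(tp=60, fn=40, tn=990, fp=10))
```

With these counts the overall figure (about 95.5%) sits much closer to the non-member accuracy (99%) than to the member accuracy (60%), since non-members dominate the test set; this is one reason to report the two separately.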

7.
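Nitrogen is an essential nutrient for plant growth and development and one of the key nutrients limiting plant biomass, especially economic yield. Plants can acquire not only inorganic nitrogen (nitrate, ammonium, urea, etc.) from their environment but also organic nitrogen in the form of amino acids, oligopeptides and similar compounds. Plants have evolved complex transport systems to take up and transport these nitrogen-containing compounds. The nitrate transporter gene families are divided into low-affinity nitrate transporter genes (low-affinity nitrate t...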

8.
Several studies based on the known three-dimensional (3-D) structures of proteins show that two homologous proteins with insignificant sequence similarity can adopt a common fold and may perform the same or similar biochemical functions. Hence, it is appropriate to use similarities in the 3-D structures of proteins, rather than amino acid sequence similarities, in modelling the evolution of distantly related proteins. Here we present an assessment of using 3-D structures in modelling the evolution of homologous proteins. Using a dataset of 108 protein domain families of known structure, with at least 10 members per family, we compare the extent of structural and sequence dissimilarities among pairs of proteins, which are the inputs for constructing phylogenetic trees. We find that the correlation between the structure-based and sequence-based dissimilarity measures is usually good if the sequence similarity among the homologues is about 30% or more. For protein families with low sequence similarity among the members, the correlation coefficient between the sequence-based and structure-based dissimilarities is poor. In these cases the structure-based dendrogram clusters proteins with the most similar biochemical functional properties better than the sequence-similarity-based dendrogram does. In multi-domain protein families and disulphide-rich protein families, the correlation coefficient between sequence-based and structure-based dissimilarity (SDM) measures can be poor even when the sequence identity is higher than 30%. Hence it is suggested that protein evolution is best modelled using 3-D structures if the sequence similarities (SSM) of the homologues are very low.
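The central comparison here is a correlation between two sets of pairwise dissimilarities computed for the same protein pairs. A minimal sketch using the Pearson coefficient, assuming sequence dissimilarity is taken as 1 minus fractional identity and the structural values come from some structure-comparison program; both the conversion and the numbers in the usage block are illustrative assumptions, not the study's SDM/SSM definitions.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples.

    Raises ZeroDivisionError if either sample has zero variance.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def identity_to_dissimilarity(percent_identity: float) -> float:
    """Convert percent sequence identity into a 0-1 dissimilarity (assumed definition)."""
    return 1.0 - percent_identity / 100.0


if __name__ == "__main__":
    # Hypothetical protein pairs: percent identity and a structural dissimilarity score.
    identities = [85.0, 60.0, 42.0, 35.0]
    structural = [0.10, 0.25, 0.40, 0.45]
    sequence_d = [identity_to_dissimilarity(p) for p in identities]
    print(round(pearson_r(sequence_d, structural), 3))
```

Running this per family, once on high-identity and once on low-identity families, is one way to reproduce the kind of contrast the abstract describes.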

9.
Membranes are not freely permeable to bicarbonate. Yet bicarbonate must be moved across membranes as part of CO2 metabolism and to regulate cell pH. Mammalian cells ubiquitously express bicarbonate transport proteins to facilitate transmembrane bicarbonate flux. These bicarbonate transporters, which function by different transport mechanisms, together catalyse transmembrane bicarbonate movement. Recent advances have allowed the identification of several new bicarbonate transporter genes. Bicarbonate transporters cluster into two separate families: (i) the anion exchanger (AE) family of Cl-/HCO3- exchangers, which is related in sequence to the NBC family of Na+/HCO3- cotransporters and to the Na+-dependent Cl-/HCO3- exchangers; and (ii) some members of the SLC26a family of sulfate transporters, which also transport bicarbonate but are not related in sequence to the AE/NBC family of transporters. This review summarizes our understanding of the mammalian bicarbonate transporter superfamily.

10.
Phylogenetic relationships within cation transporter families of Arabidopsis (total citations: 48; self-citations: 0; citations by others: 48)
Uptake and translocation of cationic nutrients play essential roles in physiological processes including plant growth, nutrition, signal transduction, and development. Approximately 5% of the Arabidopsis genome appears to encode membrane transport proteins. These proteins are classified into 46 unique families containing approximately 880 members. In addition, several hundred putative transporters have not yet been assigned to families. In this paper, we have analyzed the phylogenetic relationships of over 150 cation transport proteins. This analysis has focused on cation transporter gene families for which individual members have been initially characterized, including potassium transporters and channels, sodium transporters, calcium antiporters, cyclic nucleotide-gated channels, cation diffusion facilitator proteins, natural resistance-associated macrophage proteins (NRAMP), and Zn-regulated transporter/Fe-regulated transporter-like proteins. Phylogenetic trees of each family define the evolutionary relationships of the members to each other. These families contain numerous members, indicating diverse functions in vivo. Closely related isoforms and separate subfamilies exist within many of these gene families, indicating possible redundancies and specialized functions. To facilitate their further study, the PlantsT database (http://plantst.sdsc.edu) has been created; it includes alignments of the analyzed cation transporters and their chromosomal locations.
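Family trees like these are built from pairwise distances between aligned sequences. The abstract does not say which tree-building method was used; the sketch below uses UPGMA, one of the simplest distance-based clustering methods, purely as an illustrative stand-in, and the taxon names and distances are invented.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def upgma(names: List[str], distances: Dict[FrozenSet[str], float]) -> str:
    """Cluster taxa with UPGMA and return the topology as a Newick string.

    `distances` maps frozenset({a, b}) to the distance between taxa a and b.
    Branch lengths are omitted to keep the sketch short.
    """
    size = {name: 1 for name in names}  # cluster label -> number of taxa
    dist = {frozenset(pair): distances[frozenset(pair)] for pair in combinations(names, 2)}
    while len(size) > 1:
        # Merge the closest pair; sorting labels makes the output deterministic.
        closest = min(dist, key=lambda pair: (dist[pair], sorted(pair)))
        a, b = sorted(closest)
        merged = f"({a},{b})"
        # Size-weighted average distance from the new cluster to every other cluster.
        for other in size:
            if other not in (a, b):
                dist[frozenset((merged, other))] = (
                    size[a] * dist[frozenset((a, other))] + size[b] * dist[frozenset((b, other))]
                ) / (size[a] + size[b])
        # Drop every distance that involves either of the merged clusters.
        dist = {pair: value for pair, value in dist.items() if a not in pair and b not in pair}
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size)) + ";"


if __name__ == "__main__":
    # Invented distances between four hypothetical transporter sequences.
    raw = {("T1", "T2"): 0.1, ("T1", "T3"): 0.5, ("T1", "T4"): 0.9,
           ("T2", "T3"): 0.5, ("T2", "T4"): 0.9, ("T3", "T4"): 0.8}
    print(upgma(["T1", "T2", "T3", "T4"], {frozenset(p): v for p, v in raw.items()}))
```

UPGMA assumes a constant evolutionary rate across lineages; neighbor-joining drops that assumption and is more common for protein family trees.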

11.
Cai CZ, Han LY, Ji ZL, Chen X, Chen YZ. Nucleic Acids Research. 2003;31(13):3692-3697.
Prediction of protein function is of significance in studying biological processes. One approach for function prediction is to classify a protein into a functional family. The support vector machine (SVM) is a useful method for such classification, which may involve proteins with diverse sequence distributions. We have developed a web-based software tool, SVMProt, for the SVM classification of a protein into a functional family from its primary sequence. The SVMProt classification system is trained on representative proteins of a number of functional families and on seed proteins of Pfam curated protein families. It currently covers 54 functional families, and additional families will be added in the near future. The computed accuracy for protein family classification is in the range of 69.1-99.6%. SVMProt shows a certain degree of capability for classifying distantly related proteins and homologous proteins of different functions and thus may be used as a protein function prediction tool that complements sequence alignment methods. SVMProt can be accessed at http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi.
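Alignment-free classifiers of this kind typically describe each sequence by physicochemical descriptors. The sketch below computes two parts of a composition-transition-distribution (CTD) style descriptor for a single property, hydrophobicity, using a commonly used three-group split of the amino acids; the grouping and the function name are assumptions for illustration, not SVMProt's exact feature definition.

```python
from typing import Dict, List

# Assumed three-group hydrophobicity split: 1 = polar, 2 = neutral, 3 = hydrophobic.
GROUPS: Dict[str, str] = {
    residue: label
    for label, members in (("1", "RKEDQN"), ("2", "GASTPHY"), ("3", "CLVIMFW"))
    for residue in members
}


def ctd_hydrophobicity(sequence: str) -> List[float]:
    """Return [C1, C2, C3, T12, T13, T23] for the hydrophobicity property.

    C = fraction of residues in each group; T = fraction of adjacent residue
    pairs that switch between two groups, counted in either direction.
    """
    encoded = [GROUPS[r] for r in sequence.upper() if r in GROUPS]
    n = len(encoded)
    if n < 2:
        raise ValueError("need at least two standard residues")
    composition = [encoded.count(g) / n for g in "123"]
    pairs = list(zip(encoded, encoded[1:]))
    transitions = [
        sum(1 for x, y in pairs if {x, y} == {g, h}) / (n - 1)
        for g, h in (("1", "2"), ("1", "3"), ("2", "3"))
    ]
    return composition + transitions


if __name__ == "__main__":
    print(ctd_hydrophobicity("MKTAYIAKQRQISFVKSHFSRQ"))
```

The "distribution" part of CTD (where along the chain each group first, and last, occurs) is omitted for brevity; a full descriptor would repeat all three parts for several properties such as polarity and charge.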

12.
The available genomic sequences of five closely related hemiascomycetous yeast species (Kluyveromyces lactis, Kluyveromyces waltii, Candida glabrata, Ashbya (Eremothecium) gossypii, with Saccharomyces cerevisiae as a reference) were analysed to identify multidrug resistance (MDR) transport proteins belonging to the ATP-binding cassette (ABC) superfamily and the major facilitator superfamily (MFS). The phylogenetic trees clearly demonstrate that a similar set of gene (sub)families already existed in the common ancestor of all five fungal species studied. However, striking differences exist between the two superfamilies with respect to the evolution of the various subfamilies. Within the ABC superfamily, all six half-size transporters with six transmembrane-spanning domains (TMs) and most full-size transporters with 12 TMs have one and only one gene per genome. An exception is the PDR family, in which gene duplications and deletions have occurred independently in individual genomes. Among the MFS transporters, the DHA2 family (TC 2.A.1.3) is more variable between species than the DHA1 family (TC 2.A.1.2). Conserved gene order relationships make it possible to trace the evolution of most (sub)families, and the Kluyveromyces lactis genome can serve as an optimal scaffold for this. Cross-species alignment of orthologous upstream gene sequences led to the identification of conserved sequence motifs ("phylogenetic footprints"). Almost half of them match known sequence motifs for the MDR regulators described in S. cerevisiae. The biological significance of these and of the novel predicted motifs remains to be confirmed experimentally.
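Phylogenetic footprinting rests on the idea that regulatory motifs stay conserved across orthologous upstream regions while the surrounding sequence drifts. The study used cross-species alignment; the sketch below uses a cruder, alignment-free proxy that reports every k-mer present in all orthologous upstream sequences. The toy sequences and the default k = 6 are illustrative assumptions.

```python
from functools import reduce
from typing import Iterable, List, Set


def kmers(sequence: str, k: int) -> Set[str]:
    """All distinct length-k substrings of `sequence` (case-insensitive)."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def shared_kmers(upstream_sequences: Iterable[str], k: int = 6) -> List[str]:
    """Sorted k-mers found in every input sequence: candidate conserved motifs."""
    sets = [kmers(seq, k) for seq in upstream_sequences]
    if not sets:
        return []
    return sorted(reduce(set.intersection, sets))


if __name__ == "__main__":
    # Toy upstream fragments from three hypothetical orthologues.
    upstream = ["AAATGACTCATTT", "GGTGACTCAGG", "CTGACTCAC"]
    print(shared_kmers(upstream, k=6))
```

Real upstream regions are hundreds of bases long, so a practical version would also scan the reverse complement and score conservation statistically rather than requiring exact matches.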

13.
The Transporter Classification (TC) system is a functional/phylogenetic system designed for the classification of all transmembrane transport proteins found in living organisms on Earth. It parallels but differs from the strictly functional EC system developed decades ago by the Enzyme Commission of the International Union of Biochemistry and Molecular Biology (IUBMB) for the classification of enzymes. Recently, the TC system has been adopted by the IUBMB as the internationally acclaimed system for the classification of transporters. Here we present the characteristics of the nearly 400 families of transport systems included in the TC system and provide statistical analyses of these families and their constituent proteins. Specifically, we analyze the transporter types for size and topological differences and analyze the families for the numbers and organismal sources of their constituent members. We show that channels and carriers exhibit distinctive structural and topological features. Bacterial-specific families outnumber eukaryotic-specific families about 2 to 1, while ubiquitous families, found in all three domains of life, are about half as numerous as eukaryotic-specific families. The results argue against appreciable horizontal transfer of genes encoding transporters between the three domains of life over the last 2 billion years.
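TC identifiers such as 2.A.66 and 2.A.1.3, which appear elsewhere in this listing, encode the classification hierarchy directly. The hypothetical helper below splits an identifier into its levels and tallies distinct families per TC class; it assumes the five-field class.subclass.family.subfamily.system layout and is not part of any official TC tool.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional

LEVELS = ("class", "subclass", "family", "subfamily", "system")


def parse_tc(identifier: str) -> Dict[str, Optional[str]]:
    """Split a TC identifier like '2.A.1.3' or '#2.A.66' into named levels.

    Levels not present in the identifier are returned as None.
    """
    parts: List[Optional[str]] = list(identifier.strip().lstrip("#").split("."))
    if not 2 <= len(parts) <= len(LEVELS):
        raise ValueError(f"unexpected TC identifier: {identifier!r}")
    parts += [None] * (len(LEVELS) - len(parts))
    return dict(zip(LEVELS, parts))


def families_per_class(identifiers: Iterable[str]) -> Counter:
    """Count distinct class.subclass.family prefixes for each TC class."""
    families = set()
    for identifier in identifiers:
        level = parse_tc(identifier)
        if level["family"] is not None:
            families.add((level["class"], level["subclass"], level["family"]))
    return Counter(tc_class for tc_class, _, _ in families)


if __name__ == "__main__":
    print(parse_tc("2.A.1.3"))
    print(families_per_class(["2.A.1.2", "2.A.1.3", "2.A.66", "1.A.8"]))
```

Tallies like this, joined with the organismal source of each family's members, are the raw material for the family-level statistics the abstract describes.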

14.
SET-domain proteins of the Su(var)3-9, E(z) and trithorax families (total citations: 13; self-citations: 0; citations by others: 13)
Alvarez-Venegas R, Avramova Z. Gene. 2002;285(1-2):25-37.
SET-domain (SET: Su(var)3-9, E(z) and Trithorax)-containing proteins were collected through sequence searches of the available databases. After removing redundancies, the proteins belonging to three families, SU(VAR)3-9, E(Z) and Trithorax, were selected. Analysis of the relationship between the different members is based on pairwise alignment, compilation, and comparison of their SET-domains. The level of homology of the SET-domains defined the distribution of the proteins into families and into clades within the families. The architecture of the entire protein supported the distribution pattern built upon SET-domain similarity. Parallel cladistic and protein-architecture analyses outlined two plausible criteria for predicting function.

16.
Eukaryotic zinc transporters and their regulation (total citations: 49; self-citations: 0; citations by others: 49)

17.
The Amino acid-Polyamine-Organocation (APC) superfamily is the main family of amino acid transporters found in all domains of life and one of the largest families of secondary transporters. Here, using a sensitive homology threading approach and modelling, we show that the predicted structure of APC members is extremely similar to the crystal structures of several prokaryotic transporters belonging to evolutionarily distinct protein families with different substrate specificities. All of these proteins, despite having no primary amino acid sequence similarity, share a similar structural core, consisting of two V-shaped domains of five transmembrane domains each, intertwined in an antiparallel topology. Based on this model, we reviewed available data on functional mutations in bacterial, fungal and mammalian APCs and obtained novel mutational data, which provide compelling evidence that the amino acid binding pocket is located in the vicinity of the unwound part of two broken helices, in a position nearly identical to that in the structures of similar transporters. Our analysis is fully supported by the evolutionary conservation and specific amino acid substitutions in the proposed substrate binding domains. Furthermore, it allows predictions concerning residues that might be crucial in determining the specificity profile of APC members. Finally, we show that two cytoplasmic loops constitute important functional elements in APCs. Our work, together with the different kinetic and specificity profiles of APC members in easily manipulated bacterial and fungal model systems, could form a unique framework for combining genetic, in silico and structural studies to understand the function of one of the most important transporter families.

18.
MOTIVATION: Protein families can be defined based on structure or sequence similarity. We wanted to compare two protein family databases, one based on structural and one on sequence similarity, to investigate to what extent they overlap, the similarity in definition of corresponding families, and to create a list of large protein families with unknown structure as a resource for structural genomics. We also wanted to increase the sensitivity of fold assignment by exploiting protein family HMMs. RESULTS: We compared Pfam, a protein family database based on sequence similarity, to Scop, which is based on structural similarity. We found that 70% of the Scop families exist in Pfam while 57% of the Pfam families exist in Scop. Most families that occur in both databases correspond well to each other, but in some cases they are different. Such cases highlight situations in which structure and sequence approaches differ significantly. The comparison enabled us to compile a list of the largest families that do not occur in Scop; these are suitable targets for structure prediction and determination, and may be useful to guide projects in structural genomics. It can be noted that 13 out of the 20 largest protein families without a known structure are likely transmembrane proteins. We also exploited Pfam to increase the sensitivity of detecting homologs of proteins with known structure, by comparing query sequences to Pfam HMMs that correspond to Scop families. For SWISSPROT+TREMBL, this yielded an increase in fold assignment from 31% to 42% compared to using FASTA only. This method assigned a structure to 22% of the proteins in Saccharomyces cerevisiae, 24% in Escherichia coli, and 16% in Methanococcus jannaschii.
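The two overlap figures in this abstract (70% of Scop families in Pfam, 57% of Pfam families in Scop) differ because they are computed over different denominators, and because a many-to-many mapping need not match equal numbers of families on each side. A toy sketch with invented family identifiers and an assumed list of Pfam-Scop correspondences; `coverage` is a hypothetical name.

```python
from typing import Iterable, Set, Tuple


def coverage(correspondences: Iterable[Tuple[str, str]], scop: Set[str], pfam: Set[str]) -> Tuple[float, float]:
    """Return (% of Scop families with a Pfam match, % of Pfam families with a Scop match).

    Correspondences may be many-to-many, so the matched counts on the two
    sides need not be equal.
    """
    pairs = [(s, p) for s, p in correspondences if s in scop and p in pfam]
    matched_scop = {s for s, _ in pairs}
    matched_pfam = {p for _, p in pairs}
    return 100.0 * len(matched_scop) / len(scop), 100.0 * len(matched_pfam) / len(pfam)


if __name__ == "__main__":
    # Invented families: one Scop family (s2) corresponds to two Pfam families.
    scop_families = {"s1", "s2", "s3", "s4"}
    pfam_families = {"p1", "p2", "p3", "p4", "p5"}
    links = [("s1", "p1"), ("s2", "p2"), ("s2", "p3")]
    print(coverage(links, scop_families, pfam_families))
```

Here 2 of 4 Scop families (50%) but 3 of 5 Pfam families (60%) are matched, showing how the two percentages can diverge even from a single set of links.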

19.
Ichimaru T, Kikuchi T. Proteins. 2003;51(4):515-530.
It is a general notion that proteins with very similar three-dimensional structures would show very similar folding kinetics. However, recent studies reveal that the folding kinetic properties of some proteins contradict this notion (i.e., members of the same protein family fold through different pathways). For example, it has been reported that some beta-proteins in the intracellular lipid-binding protein family fold through quite different pathways (Burns et al., Proteins 1998;33:107-118). Similar differences in folding kinetics are also observed in members of the globin family (Nishimura et al., Nat Struct Biol 2000;7:679-686). In our study, we examine the possibility of predicting qualitative differences in the folding kinetics of the intracellular lipid-binding proteins and two globin proteins (i.e., myoglobin and leghemoglobin). The problem is tackled by means of a contact map based on the average distance statistics between residues, the Average Distance Map (ADM), constructed from sequence. The ADMs for the three proteins show overall similarity, but some local differences among the maps are also observed. Our results demonstrate that some properties of the protein folding kinetics are consistent with local differences in the ADMs. We also discuss the general possibility of predicting folding kinetics from sequence information.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417