Similar articles
 20 similar articles found
1.
A structural class in the MemGen classification of membrane proteins is a set of evolutionarily related proteins sharing a similar global fold. A structural class contains both closely related pairs of proteins for which homology is clear from sequence comparison and very distantly related pairs, for which it is not possible to establish homology based on sequence similarity alone. In the latter case the evolutionary link is based on hydropathy profile analysis. Here, we use these evolutionarily related sets of proteins to analyze the relationship between E-values in BLAST searches, sequence similarities in multiple sequence alignments and structural similarities in hydropathy profile analyses. Two structural classes of secondary transporters, termed ST[3], which includes the Ion Transporter (IT) superfamily, and ST[4], which includes the DAACS family (TC# 2.A.23), were extracted from the NCBI protein database. ST[3] contains 2051 unique sequences distributed over 32 families and 59 subfamilies. ST[4] is a smaller class containing 399 unique sequences distributed over 2 families and 7 subfamilies. One subfamily in ST[4] contains a new class of binding-protein-dependent secondary transporters. Comparison of the averaged hydropathy profiles of the subfamilies in ST[3] and ST[4] revealed that the two classes represent different folds. Divergence of the sequences in ST[4] is much smaller than observed in ST[3], suggesting different constraints on the proteins during evolution.
Analysis of the correlation between the evolutionary relationship of pairs of proteins in a class and the BLAST E-value revealed that: (i) the BLAST algorithm is unable to pick up the majority of the links between proteins in structural class ST[3], (ii) "low complexity filtering" and "composition based statistics" improve the specificity, but strongly reduce the sensitivity of BLAST searches for distantly related proteins, indicating that these filters are too stringent for the proteins analyzed, and (iii) the E-value cut-off, which may be used to evaluate evolutionary significance of a hit in a BLAST search, is very different for the two structural classes of membrane proteins.

2.
A classification scheme for membrane proteins is proposed that clusters families of proteins into structural classes based on hydropathy profile analysis. The averaged hydropathy profiles of protein families are taken as fingerprints of the 3D structure of the proteins and, therefore, are able to detect more distant evolutionary relationships than amino acid sequences. A procedure was developed in which hydropathy profile analysis is used initially as a filter in a BLAST search of the NCBI protein database. The strength of the procedure is demonstrated by the classification of 29 families of secondary transporters into a single structural class, termed ST[3]. An exhaustive search of the database revealed that the 29 families contain 568 unique sequences. The proteins are predominantly of prokaryotic origin; most of the characterized transporters in ST[3] transport organic and inorganic anions, and a smaller number are Na(+)/H(+) antiporters. All modes of energy coupling (symport, antiport, uniport) are found in structural class ST[3]. The relevance of the classification for structure/function prediction of uncharacterised transporters in the class is discussed.
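
As a rough illustration of what a hydropathy profile is, the sketch below computes a sliding-window average of per-residue hydropathy values. The Kyte-Doolittle scale, the default window of 19 residues, and the function name `hydropathy_profile` are assumptions made for this example; the abstract does not say which scale or window the classification procedure uses, and the averaging over an aligned family is not shown.

```python
# Kyte-Doolittle hydropathy values per amino acid (assumed scale for this sketch).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Return the mean hydropathy of every full window along the sequence.

    The result has len(sequence) - window + 1 entries, or is empty when the
    sequence is shorter than the window.
    """
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]
```

Two such profiles could then be compared position by position after alignment; how that comparison is scored is outside what the abstract describes.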

3.
The 2-hydroxycarboxylate transporter (2HCT) family is a family of bacterial secondary transporters for substrates like citrate, malate and lactate. The family is in class ST[3] of the MemGen classification system that groups membrane proteins in structural classes based on hydropathy profile analysis. The combination of computational analysis of the proteins in class ST[3] and available experimental data on members of the 2HCT family has yielded a detailed structural model of the transporters. The core of the model is formed by two homologous domains with opposite orientation in the membrane. Each domain consists of 5 transmembrane segments and contains a pore loop between the 4th and 5th segment. The two pore loops enter the membrane-embedded part from opposite sides of the membrane (trans pore loops) and are believed to form the translocation pathway in the 3D structure. A genome-wide study of the cellular location of the C-terminus of all Escherichia coli membrane proteins [Daley et al., 2005. Science 308:1321-1323] showed that the C-termini of the 19 E. coli proteins in class ST[3] were correctly predicted by the structural model.

4.
The MemGen structural classification of membrane proteins groups families of proteins by hydropathy profile alignment. Class ST[3] of the MemGen classification contains 32 families of transporter proteins including the IT superfamily. Transporters from 19 different families in class ST[3] were evaluated by the TopScreen experimental topology screening method to verify the structural classification by MemGen. TopScreen involves the determination of the cellular disposition of three sites in the polypeptide chain of the proteins, which allows for discrimination between different topology models. For nearly all transporters at least one of the predicted localizations is different in the models produced by MemGen and the TMHMM predictor. Comparison to the experimental data showed that in all cases the prediction by MemGen was correct. It is concluded that the structural model available for transporters of the [st324]ESS and [st326]2HCT families is also valid for the other families in class ST[3]. The core structure of the model consists of two homologous domains, each containing 5 transmembrane segments, which have an opposite orientation in the membrane. A reentrant loop is present in between the 4th and 5th segments in each domain. Nearly all of the identified and experimentally confirmed structural variations involve additions of transmembrane segments at the boundaries of the core model, at the N- and C-termini or in between the two domains. Most remarkable is a domain swap in two subfamilies of the [st312]NHAC family that results in an inverted orientation of the proteins in the membrane.

5.
Yang JM, Tung CH. Nucleic Acids Research, 2006, 34(13):3646-3659
As more protein structures become available and structural genomics efforts provide structural models in a genome-wide strategy, there is a growing need for fast and accurate methods for discovering homologous proteins and evolutionary classifications of newly determined structures. We have developed 3D-BLAST, in part, to address these issues. 3D-BLAST is as fast as BLAST and calculates the statistical significance (E-value) of an alignment to indicate the reliability of the prediction. Using this method, we first identified 23 states of the structural alphabet that represent pattern profiles of the backbone fragments and then used them to represent protein structure databases as structural alphabet sequence databases (SADB). Our method enhanced BLAST as a search method, using a new structural alphabet substitution matrix (SASM) to find the longest common substructures with high-scoring structured segment pairs from an SADB database. Using personal computers with Intel Pentium 4 (2.8 GHz) processors, our method searched more than 10 000 protein structures in 1.3 s and achieved a good agreement with search results from detailed structure alignment methods. [3D-BLAST is available at http://3d-blast.life.nctu.edu.tw].

6.
Leucine-rich repeats (LRRs) with 20-30 amino acids in unit length are present in many proteins from prokaryotes to eukaryotes. The LRR-containing proteins include a family of nine small proteoglycans, forming three distinct subfamilies: class I contains biglycan/PG-I and decorin/PG-II; class II: lumican, fibromodulin, PRELP, keratocan, and osteoadherin; and class III: epiphycan/PG-Lb and osteoglycin or osteoinductive factor. Comparative sequence analysis of the 34 available protein sequences reveals that these proteoglycans have two types of LRRs, which we call S and T. The type S LRR is 21 residues long and has the consensus sequence of xxaPzxLPxxLxxLxLxxNxI. The type T LRR has 26 residues; its consensus sequence is zzxxaxxxxFxxaxxLxxLxLxxNxL. In both, "x" indicates a variable residue; "z" is frequently a gap; "a" is Val, Leu, or Ile; and I is Ile or Leu. These type S and T LRRs are ordered into two super-motifs: STT with about 73 residues in classes I and II, and ST with about 47 residues in class III. The 12 LRRs in the small proteoglycans of I and II are best represented as (STT)4; the seven LRRs of class III as (ST)T(ST)2. Our analyses indicate that classes I/II and III evolved along different paths after the establishment of the precursor ST, and classes I and II also diverged after the establishment of the precursor (STT)4.
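
The consensus notation defined in this abstract can be read as a simple pattern language. The sketch below translates the two consensus strings into regular expressions under the stated symbol meanings ("x" any residue, "z" an optional residue, "a" one of V/L/I, "I" either I or L, all other letters literal). The helper name `consensus_to_regex` and the strict, exact-match interpretation are assumptions for illustration only; real repeats deviate from a consensus, so this is not the authors' detection method.

```python
import re

# Consensus strings quoted in the abstract.
TYPE_S = "xxaPzxLPxxLxxLxLxxNxI"
TYPE_T = "zzxxaxxxxFxxaxxLxxLxLxxNxL"

# Symbol meanings as given in the abstract; any other letter is a literal residue.
SYMBOL_TO_REGEX = {
    "x": "[A-Z]",   # variable residue
    "z": "[A-Z]?",  # frequently a gap, modelled here as an optional residue
    "a": "[VLI]",   # Val, Leu, or Ile
    "I": "[IL]",    # Ile or Leu
}


def consensus_to_regex(consensus):
    """Compile a consensus string into a strict regular expression."""
    return re.compile("".join(SYMBOL_TO_REGEX.get(s, s) for s in consensus))


TYPE_S_PATTERN = consensus_to_regex(TYPE_S)
TYPE_T_PATTERN = consensus_to_regex(TYPE_T)
```

Calling `TYPE_S_PATTERN.fullmatch(segment)` on a candidate 20- or 21-residue segment returns a match object only when every fixed position agrees with the consensus.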

7.
Molecular modeling of proteins is confronted with the problem of finding homologous proteins, especially when few identities remain after the process of molecular evolution. Using even the most recent methods based on sequence identity detection, structural relationships are still difficult to establish with high reliability. As protein structures are more conserved than sequences, we investigated the possibility of using protein secondary structure comparison (observed or predicted structures) to discriminate between related and unrelated protein sequences in the range of 10%-30% sequence identity. Pairwise similarity of secondary structures was measured using the structural overlap (Sov) parameter. In this article, we show that if the secondary structure likeness is >50%, most of the pairs are structurally related. Taking into account the secondary structures of proteins that have been detected by BLAST, FASTA, or SSEARCH in the noisy region (with high E-values), we show that distantly related protein sequences (even with <20% identity) can still be identified. This strategy can be used to identify three-dimensional templates in homology modeling by finding unexpected related proteins and to select proteins for experimental investigation in a structural genomics approach, as well as for genome annotation.

8.
Hanukoglu I, Fuchs E. Cell, 1982, 31(1):243-252
We have determined the DNA sequence of a cloned cDNA that is complementary to the mRNA for the 50 kilodalton (kd) human epidermal keratin. This provides the first amino acid sequence for a cytoskeletal keratin. Comparison of this sequence with those of other keratins reveals an evolutionary relationship between the cytoskeletal and the microfibrillar keratins, but shows no homology to matrix or feather keratins. The 50 kd keratin shares 28%-30% homology with partial sequences of other intermediate filament proteins, which suggests that keratins may be the most distantly related members of this class of fibrous proteins. Our computer analyses predict that the 50 kd keratin contains two long alpha-helical domains separated by a cluster of helix-inhibitory residues in the middle of the protein. These findings indicate that despite major sequence divergence among intermediate filament proteins, they retain sequences compatible with secondary structural features that appear to be common to all of them.

9.
The complete genome of severe acute respiratory syndrome coronavirus (SARS-CoV) reveals the existence of putative proteins unique to SARS-CoV. Identification of their function facilitates a mechanistic understanding of SARS infection and drug development for its treatment. The sequence of the majority of these putative proteins has no significant similarity to those of known proteins, which complicates the task of using sequence analysis tools to probe their function. Support vector machines (SVMs), useful for predicting the functional class of distantly related proteins, are employed to ascribe a possible functional class to SARS-CoV proteins. Testing results indicate that SVM is able to predict the functional class of 73% of the known SARS-CoV proteins with available sequences and 67% of 18 other novel viral proteins. A combination of the sequence comparison method BLAST and SVMProt can further improve the prediction accuracy of SVMProt such that the functional class of two additional SARS-CoV proteins is correctly predicted. Our study suggests that the SARS-CoV genome possibly contains a putative voltage-gated ion channel, structural proteins, a carbon-oxygen lyase, oxidoreductases acting on the CH-OH group of donors, and an ATP-binding cassette transporter. A web version of our software, SVMProt, is accessible at http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi.

10.
Sequence alignment is a standard method to infer evolutionary, structural, and functional relationships among sequences. The quality of alignments depends on the substitution matrix used. Here we derive matrices based on superimpositions from protein pairs of similar structure, but of low or no sequence similarity. In a performance test the matrices are compared with 12 other previously published matrices. It is found that the structure-derived matrices are applicable for comparisons of distantly related sequences. We investigate the influence of evolutionary relationships of protein pairs on the alignment accuracy.

11.
Pánek J, Eidhammer I, Aasland R. Proteins, 2005, 58(4):923-934
Structural similarity among proteins is reflected in the distribution of hydropathicity along the amino acids in the protein sequence. Similarities in the hydropathy distributions are obvious for homologous proteins within a protein family. They have also been observed for proteins with related structures, even when sequence similarities were undetectable. Here we present a novel method that employs the hydropathy distribution in proteins for identification of (sub)families in a set of (homologous) proteins. We represent proteins as points in a generalized hydropathy space, represented by vectors of specifically defined features. The features are derived from hydropathy of the individual amino acids. Projection of this space onto principal axes reveals groups of proteins with related hydropathy distributions. The groups identified correspond well to families of structurally and functionally related proteins. We found that this method accurately identifies protein families in a set of proteins, or subfamilies in a set of homologous proteins. Our results show that protein families can be identified by the analysis of hydropathy distribution, without the need for sequence alignment.

12.
MOTIVATION: Likelihood ratio approximants (LRA) have been widely used for model comparison in statistics. The present study was undertaken in order to explore their utility as a scoring (ranking) function in the classification of protein sequences. RESULTS: We used a simple LRA based on the maximal similarity (or minimal distance) scores of the two top ranking sequence classes. The scoring methods (Smith-Waterman, BLAST, local alignment kernel and compression based distances) were compared on datasets designed to test sequence similarities between proteins distantly related in terms of structure or evolution. It was found that LRA-based scoring can significantly outperform simple scoring methods.
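
One possible reading of the scoring idea in this abstract is sketched below: for a query, take the maximal similarity score within each class, then divide the best class score by the runner-up. The function name `lra_score`, the input layout, and the use of a plain ratio are assumptions for illustration; the abstract does not give the exact formula, and the sketch assumes positive similarity scores and at least two classes.

```python
def lra_score(similarities_by_class):
    """Return (best_class, ratio) for one query sequence.

    similarities_by_class maps a class name to the similarity scores between
    the query and each member of that class. The ratio compares the maximal
    score of the top-ranking class with that of the second-ranking class.
    """
    if len(similarities_by_class) < 2:
        raise ValueError("need at least two classes to form a ratio")

    # Maximal similarity of the query to each class.
    class_maxima = {
        name: max(scores) for name, scores in similarities_by_class.items()
    }
    # Rank classes from most to least similar.
    ranked = sorted(class_maxima, key=class_maxima.get, reverse=True)
    best, runner_up = ranked[0], ranked[1]

    if class_maxima[runner_up] <= 0:
        raise ValueError("this sketch assumes positive similarity scores")
    return best, class_maxima[best] / class_maxima[runner_up]
```

A ratio near 1 signals that the two best classes are nearly indistinguishable for the query, while a large ratio signals a confident assignment; for distance-based scores the roles of maximum and minimum would be swapped.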

13.
Structural Features of the Glutamate Transporter Family
Neuronal and glial glutamate transporters remove the excitatory neurotransmitter glutamate from the synaptic cleft and thus prevent neurotoxicity. The proteins belong to a large and widespread family of secondary transporters, including bacterial glutamate, serine, and C4-dicarboxylate transporters; mammalian neutral-amino-acid transporters; and an increasing number of bacterial, archaeal, and eukaryotic proteins that have not yet been functionally characterized. Sixty members of the glutamate transporter family were found in the databases on the basis of sequence homology. The amino acid sequences of the carriers have diverged enormously. Homology between the members of the family is most apparent in a stretch of approximately 150 residues in the C-terminal part of the proteins. This region contains four reasonably well-conserved sequence motifs, all of which have been suggested to be part of the translocation pore or substrate binding site. Phylogenetic analysis of the C-terminal stretch revealed the presence of five subfamilies with characterized members: (i) the eukaryotic glutamate transporters, (ii) the bacterial glutamate transporters, (iii) the eukaryotic neutral-amino-acid transporters, (iv) the bacterial C4-dicarboxylate transporters, and (v) the bacterial serine transporters. A number of other subfamilies that do not contain characterized members have been defined. In contrast to their amino acid sequences, the hydropathy profiles of the members of the family are extremely well conserved. Analysis of the hydropathy profiles has suggested that the glutamate transporters have a global structure that is unique among secondary transporters. Experimentally, the unique structure of the transporters was recently confirmed by membrane topology studies.
Although there is still controversy about part of the topology, the most likely model predicts the presence of eight membrane-spanning α-helices and a loop-pore structure which is unique among secondary transporters but may resemble loop-pores found in ion channels. A second distinctive structural feature is the presence of a highly amphipathic membrane-spanning helix that provides a hydrophilic path through the membrane. Recent data from analysis of site-directed mutants and studies on the mechanism and pharmacology of the transporters are discussed in relation to the structural model.

15.
Detection of homologous proteins by an intermediate sequence search
We developed a variant of the intermediate sequence search method (ISS(new)) for detection and alignment of weakly similar pairs of protein sequences. ISS(new) relates two query sequences by an intermediate sequence that is potentially homologous to both queries. The improvement was achieved by a more robust overlap score for a match between the queries through an intermediate. The approach was benchmarked on a data set of 2369 sequences of known structure with insignificant sequence similarity to each other (BLAST E-value larger than 0.001); 2050 of these sequences had a related structure in the set. ISS(new) performed significantly better than both PSI-BLAST and a previously described intermediate sequence search method. PSI-BLAST could not detect correct homologs for 1619 of the 2369 sequences. In contrast, ISS(new) assigned a correct homolog as the top hit for 121 of these 1619 sequences, while incorrectly assigning homologs for only nine targets; it did not assign homologs for the remainder of the sequences. By estimate, ISS(new) may be able to assign the folds of domains in approximately 29,000 of the approximately 500,000 sequences unassigned by PSI-BLAST, with 90% specificity (1 - false positives fraction). In addition, we show that the 15 alignments with the most significant BLAST E-values include the nearly best alignments constructed by ISS(new).

16.
The gene encoding the 195,000-Da major merozoite surface antigen (gp195) of the FUP (Uganda-Palo Alto) isolate of Plasmodium falciparum, a strain widely used for monkey vaccination experiments, has been cloned and sequenced. The translated amino acid sequence of the FUP gp195 protein is closely related to the sequences of corresponding proteins of the CAMP (Malaysia) and MAD-20 (Papua New Guinea) isolates and more distantly related to those of the Wellcome (West Africa) and K1 (Thailand) isolates, supporting the proposed allelic dimorphism of gp195 within the parasite population. The prevalence of dimorphic sequences within the gp195 protein suggests that many gp195 epitopes would be group-specific. Despite the extensive differences in amino acid sequence between gp195 proteins of these two groups, the hydropathy profiles of proteins representative of both groups are very similar. The conservation of overall secondary structure shown by the hydropathy profile comparison indicates that gp195 proteins of the various P. falciparum isolates are functionally equivalent. This information on the primary structure of the FUP gp195 protein will enable us to evaluate the possible roles of conserved, group-specific and variable epitopes in immunity to the blood stage of the malaria parasite.

17.
SBASE 2.0 is the second release of SBASE, a collection of annotated protein domain sequences. SBASE entries represent various structural, functional, ligand-binding and topogenic segments of proteins [Pongor, S. et al. (1993) Prot. Eng., in press]. This release contains 34,518 entries provided with standardized names and it is cross-referenced to the major protein and nucleic acid databanks as well as to the PROSITE catalog of protein sequence patterns [Bairoch, A. (1992) Nucl. Acids Res., 20 suppl, 2013-2018]. SBASE can be used for establishing domain homologies using different database-search tools such as FASTA [Lipman and Pearson (1985) Science, 227, 1436-1441], FASTDB [Brutlag et al. (1990) Comp. Appl. Biosci., 6, 237-245] or BLAST3 [Altschul and Lipman (1990) Proc. Natl. Acad. Sci. USA, 87, 5509-5513], which is especially useful in the case of loosely defined domain types for which efficient consensus patterns cannot be established. SBASE 2.0 and a set of search and retrieval tools are freely available on request to the authors or by anonymous 'ftp' file transfer from ftp.icgeb.trieste.it.

18.
A method for comparison of protein sequences based on their primary and secondary structure is described. Protein sequences are annotated with predicted secondary structures (using a modified Chou and Fasman method). Two-letter code sequences are generated (Xx, where X is the amino acid and x is its annotated secondary structure). Sequences are compared with a dynamic programming method (STRALIGN) that includes a similarity matrix for both the amino acids and secondary structures. The similarity value for each paired two-letter code is a linear combination of similarity values for the paired amino acids and their annotated secondary structures. The method has been applied to eight globin proteins (28 pairs) for which the X-ray structure is known. For protein pairs with high primary sequence similarity (greater than 45%), STRALIGN alignment is identical to that obtained by a dynamic programming method using only primary sequence information. However, alignment of protein pairs with lower primary sequence similarity improves significantly with the addition of secondary structure annotation. Alignment of the pair with the least primary sequence similarity of 16% was improved from 0 to 37% 'correct' alignment using this method. In addition, STRALIGN was successfully applied to seven pairs of distantly related cytochrome c proteins, and three pairs of distantly related picornavirus proteins.
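
The scoring idea in this abstract, a linear combination of amino acid and secondary structure similarity fed into dynamic programming, can be sketched as follows. Everything concrete here is an assumption for illustration: the identity-based similarity values, the weight `aa_weight`, the linear gap penalty, and the function names are hypothetical and are not taken from STRALIGN itself. The sketch returns only the optimal global alignment score.

```python
def annotate(residues, structures):
    """Pair each residue with its secondary structure label (the 'Xx' code)."""
    if len(residues) != len(structures):
        raise ValueError("residues and structures must have equal length")
    return list(zip(residues, structures))


def combined_similarity(code_a, code_b, aa_weight=0.5):
    """Linear combination of amino acid and secondary structure similarity.

    Toy assumption: each similarity is 1.0 for identical symbols, else 0.0.
    A real implementation would look both values up in similarity matrices.
    """
    (aa_a, ss_a), (aa_b, ss_b) = code_a, code_b
    aa_sim = 1.0 if aa_a == aa_b else 0.0
    ss_sim = 1.0 if ss_a == ss_b else 0.0
    return aa_weight * aa_sim + (1.0 - aa_weight) * ss_sim


def global_alignment_score(seq_a, seq_b, gap=-1.0, aa_weight=0.5):
    """Needleman-Wunsch style global alignment score over two-letter codes."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0.0] * cols for _ in range(rows)]
    # Leading gaps in either sequence cost one gap penalty per position.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = combined_similarity(seq_a[i - 1], seq_b[j - 1], aa_weight)
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align the two codes
                score[i - 1][j] + gap,       # gap in seq_b
                score[i][j - 1] + gap,       # gap in seq_a
            )
    return score[-1][-1]
```

With `aa_weight=1.0` the score depends on the primary sequence alone, which gives a baseline for seeing how much the secondary structure term changes the alignment of low-similarity pairs.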

19.
We have previously reported that a human nuclear factor, probably corresponding to the USF/MLTF protein [1,2], is able to bind specifically to a DNA sequence present in DNA replicated at the onset of S-phase [3]. Here we demonstrate that the same factor binds also to several other similar sequences, present in eukaryotic and viral genomes. Mutations or methylation in a CpG dinucleotide, central in the palindromic binding site, completely abolish binding. Furthermore, we present evidence for the existence of at least two other nuclear proteins in human cells with the same DNA binding specificity. The data presented suggest a strong evolutionary conservation, among distantly related organisms, of the binding motif, which is probably the target of a number of nuclear factors that share the same DNA binding specificity albeit in the context of different functions.

20.
Small heat shock proteins (sHSPs), as one subclass of molecular chaperones, are important for cells to protect proteins under stress conditions. Unlike the large HSPs (represented by Hsp60 and Hsp70), sHSPs are highly divergent in both primary sequences and oligomeric status, with their evolutionary relationships being unresolved. Here the phylogenetic analysis of a representative 51 sHSPs (covering the six subfamilies: bacterial class A, bacterial class B, archaea, fungi, plant, and animal) reveals a close relationship between bacterial class A and animal sHSPs, which form an outgroup. Accumulating data indicate that the oligomers from bacterial class A and animal sHSPs appear to exhibit polydispersity, while those from the rest exhibit monodispersity. Together, the close evolutionary relationship and the similarity in oligomeric polydispersity between bacterial class A and animal sHSPs not only suggest a potential evolutionary origin of the latter from the former, but also imply that their oligomeric polydispersity is somehow a property determined by their primary sequences. [Reviewing Editor: Dr. Martin Kreitman]
