Similar literature
 Found 20 similar articles; search took 15 ms
1.
The degree of chemical shift similarity for homologous proteins has been determined from a chemical shift database of over 50 proteins representing a variety of families and folds, and spanning a wide range of sequence homologies. After sequence alignment, the similarity of the secondary chemical shifts of Cα protons was examined as a function of amino acid sequence identity for 37 pairs of structurally homologous proteins. A correlation between sequence identity and secondary chemical shift rmsd was observed. Important insights are provided by examining the sequence identity of homologous proteins versus the percentage of secondary chemical shifts that fall within 0.1 and 0.3 ppm thresholds. These results begin to establish practical guidelines for the extent of chemical shift similarity to expect among structurally homologous proteins.
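The two summary statistics named in this abstract (the rmsd of secondary chemical shift differences, and the percentage of aligned positions whose difference falls within a threshold) can be illustrated with a minimal sketch. The function name, the use of `None` for unassigned positions, and the inclusive threshold comparison are assumptions made for illustration; the abstract does not specify how the authors computed these values.

```python
import math


def shift_similarity(shifts_a, shifts_b, thresholds=(0.1, 0.3)):
    """Compare secondary chemical shifts (ppm) at aligned positions.

    Positions where either protein has no value (None) are skipped.
    Returns (rmsd, {threshold: percent of positions with |diff| <= threshold}).
    """
    diffs = [a - b for a, b in zip(shifts_a, shifts_b)
             if a is not None and b is not None]
    if not diffs:
        raise ValueError("no aligned positions with shifts in both proteins")
    # Root-mean-square deviation over the compared positions.
    rmsd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    # Percentage of positions whose absolute difference is within each threshold.
    within = {t: 100.0 * sum(abs(d) <= t for d in diffs) / len(diffs)
              for t in thresholds}
    return rmsd, within
```

Both input lists are assumed to be ordered by alignment column, so that index i in one list corresponds to index i in the other.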

2.
An important task in functional genomics is to cluster homologous proteins, which may share common functions. Annotating proteins of unknown function by transferring annotations from their homologues of known annotation is one of the most efficient ways to predict protein function. In this paper, we use a modularity-based method called CD for grouping together homologous proteins. The method employs a global heuristic search strategy to find the partitioning of the weighted adjacency graph with the largest modularity. The weighted adjacency graph is constructed by the sigmoidal transformation of all pairwise sequence similarities between all protein sequences in a given dataset. The method has been extensively tested on several subsets from the superfamily level of the SCOP (Structural Classification of Proteins) database, where some homologous proteins have very low sequence similarity. Compared with the widely used method MCL, we observe that the number of clusters obtained by CD is closer to the number of superfamilies in the dataset, the value of the F-measure given by CD is 10% better than that of MCL on average, and CD is more tolerant of noise in the sequence similarity. The experimental results indicate that CD is ideally suited for clustering homologous proteins when sequence similarity is low.
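The two ingredients described above, a sigmoidal transformation of pairwise similarities into edge weights and the modularity of a partition of the resulting weighted graph, can be sketched as follows. This is not the CD implementation: the `midpoint` and `steepness` parameters are hypothetical, the modularity shown is the standard Newman form for weighted undirected graphs, and the heuristic search over partitions is omitted.

```python
import math


def sigmoid_weight(score, midpoint=50.0, steepness=0.1):
    """Map a raw pairwise similarity score to an edge weight in (0, 1).

    midpoint and steepness are illustrative values, not taken from the paper.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (score - midpoint)))


def modularity(weights, clusters):
    """Newman modularity of a partition of a weighted undirected graph.

    weights:  dict {(u, v): w}, each undirected edge listed once, u != v.
    clusters: dict {node: cluster label}.
    Assumes the graph has at least one edge with positive weight.
    """
    total = sum(weights.values())   # m, the total edge weight
    strength = {}                   # weighted degree of each node
    within = {}                     # edge weight inside each cluster
    for (u, v), w in weights.items():
        strength[u] = strength.get(u, 0.0) + w
        strength[v] = strength.get(v, 0.0) + w
        if clusters[u] == clusters[v]:
            within[clusters[u]] = within.get(clusters[u], 0.0) + w
    cluster_strength = {}           # summed weighted degree per cluster
    for node, k in strength.items():
        c = clusters[node]
        cluster_strength[c] = cluster_strength.get(c, 0.0) + k
    # Q = sum over clusters of (internal weight / m) - (cluster degree / 2m)^2
    return sum(within.get(c, 0.0) / total - (s / (2.0 * total)) ** 2
               for c, s in cluster_strength.items())
```

A clustering method of this kind would evaluate `modularity` for candidate partitions and keep the one with the largest value; placing every protein in a single cluster always scores zero, which gives a natural baseline.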

3.
4.
Identification of homologous core structures   Total citations: 7 (self-citations: 0; citations by others: 7)
Matsuo Y  Bryant SH 《Proteins》1999,35(1):70-79
Using a large database of protein structure-structure alignments, we test a new method for distinguishing homologous and "analogous" structural neighbors. The homologous neighbors included in the test set show no detectable sequence similarity, but they may be well superimposed and show functional similarity or other evidence of evolutionary relationship. Analogous neighbors also show no sequence similarity and may be well superimposed, but they have different functions and their structural similarity may be the result of convergent evolution. Confirming results of other analyses, we find that remote homologs and analogs are not well distinguished by measures of pairwise structural similarity, including the percentage of identical residues and root-mean-square (RMS) superposition residual. We show, however, that structure-structure alignments of analogous neighbors rarely superimpose the particular substructure that is shared among homologous neighbors. We call this characteristic substructure the homologous core structure (HCS), and we show that a cross-validated test for presence of the HCS correctly identifies 75% of remote homologs with a false-positive rate of 16% analogs, significantly better than discrimination by RMS or other measures of pairwise similarity. The HCS describes conservation of spatial structure within a protein family in much the way that a sequence motif describes sequence conservation. We suggest that it may be used in the same way, to identify homologous neighbors at greater evolutionary distance than is possible by pairwise comparison.

5.
Han LY  Cai CZ  Ji ZL  Cao ZW  Cui J  Chen YZ 《Nucleic acids research》2004,32(21):6437-6444
The function of a protein that has no sequence homolog of known function is difficult to assign on the basis of sequence similarity. The same problem may arise for homologous proteins of different functions if one is newly discovered and the other is the only known protein of similar sequence. It is desirable to explore methods that are not based on sequence similarity. One approach is to assign the functional family of a protein to provide a useful hint about its function. Several groups have employed a statistical learning method, support vector machines (SVMs), for predicting protein functional family directly from sequence irrespective of sequence similarity. These studies showed that SVM prediction accuracy is at a level useful for functional family assignment. But its capability for assignment of distantly related proteins and homologous proteins of different functions has not been critically and adequately assessed. Here SVM is tested for functional family assignment of two groups of enzymes. One consists of 50 enzymes that have no homolog of known function from PSI-BLAST search of protein databases. The other contains eight pairs of homologous enzymes of different families. SVM correctly assigns 72% of the enzymes in the first group and 62% of the enzyme pairs in the second group, suggesting that it is potentially useful for facilitating functional study of novel proteins. A web version of our software, SVMProt, is accessible at http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi.

6.
Replication factor C (RFC) is a five-subunit DNA polymerase accessory protein that functions as a structure-specific, DNA-dependent ATPase. The ATPase function of RFC is activated by proliferating cell nuclear antigen. RFC was originally purified from human cells on the basis of its requirement for simian virus 40 DNA replication in vitro. A functionally homologous protein complex from Saccharomyces cerevisiae, called ScRFC, has been identified. Here we report the cloning, by either peptide sequencing or by sequence similarity to the human cDNAs, of the S. cerevisiae genes RFC1, RFC2, RFC3, RFC4, and RFC5. The amino acid sequences are highly similar to the sequences of the homologous human RFC 140-, 37-, 36-, 40-, and 38-kDa subunits, respectively, and also show amino acid sequence similarity to functionally homologous proteins from Escherichia coli and the phage T4 replication apparatus. All five subunits show conserved regions characteristic of ATP/GTP-binding proteins and also have a significant degree of similarity among each other. We have identified eight segments of conserved amino acid sequences that define a family of related proteins. Despite their high degree of sequence similarity, all five RFC genes are essential for cell proliferation in S. cerevisiae. RFC1 is identical to CDC44, a gene identified as a cell division cycle gene encoding a protein involved in DNA metabolism. CDC44/RFC1 is known to interact genetically with the gene encoding proliferating cell nuclear antigen, confirming previous biochemical evidence of their functional interaction in DNA replication.  相似文献   

7.
Detecting homology of distantly related proteins with consensus sequences   Total citations: 15 (self-citations: 0; citations by others: 15)
A simple protocol is described that is suitable for the detection of distantly related members of a protein family. In this procedure, similarity to a consensus sequence is used to distinguish chance similarity from similarity due to common ancestry. The consensus sequence is constructed from the sequences of established members of a protein family and it incorporates features characteristic of the protein fold of this family: conserved residues, the pattern of variable and conserved segments, preferred location of gaps, etc. The database is searched with the consensus sequence, using the unitary matrix or log odds matrix for scoring the alignments, with variable gap penalty. The advantage of the method is that it weights key residues, ignores sequence similarity in variable segments (thus partially eliminating "background noise" coming from chance similarity), and distinguishes gaps disrupting conserved segments from those occurring in positions known to be tolerant of gap events. The utility of the method was demonstrated in the case of the protein family homologous with the internal repeats of complement B as well as the internal repeats identified in fibroblast proteoglycan PG40. The consensus sequence method succeeded in finding some new members of these protein families that could not be detected by earlier methods of sequence comparison.

8.
The increasing number of whole genomic sequences of microorganisms has led to the complexity of genome-wide annotation and gene sequence comparison among multiple microorganisms. To address this problem, we have developed nWayComp software that compares DNA and protein sequences of phylogenetically-related microorganisms. This package integrates a series of bioinformatics tools such as BLAST, ClustalW, ALIGN, PHYLIP and PRIMER3 for sequence comparison. It searches for homologous sequences among multiple organisms and identifies genes that are unique to a particular organism. The homologous gene sets are then ranked in the descending order of the sequence similarity. For each set of homologous sequences, a table of sequence identity among homologous genes along with sequence variations such as SNPs and INDELS is developed, and a phylogenetic tree is constructed. In addition, a common set of primers that can amplify all the homologous sequences are generated. The nWayComp package provides users with a quick and convenient tool to compare genomic sequences among multiple organisms at the whole-genome level.  相似文献   

9.
Many protein classification systems capture homologous relationships by grouping domains into families and superfamilies on the basis of sequence similarity. Superfamilies with similar 3D structures are further grouped into folds. In the absence of discernable sequence similarity, these structural similarities were long thought to have originated independently, by convergent evolution. However, the growth of databases and advances in sequence comparison methods have led to the discovery of many distant evolutionary relationships that transcend the boundaries of superfamilies and folds. To investigate the contributions of convergent versus divergent evolution in the origin of protein folds, we clustered representative domains of known structure by their sequence similarity, treating them as point masses in a virtual 2D space which attract or repel each other depending on their pairwise sequence similarities. As expected, families in the same superfamily form tight clusters. But often, superfamilies of the same fold are linked with each other, suggesting that the entire fold evolved from an ancient prototype. Strikingly, some links connect superfamilies with different folds. They arise from modular peptide fragments of between 20 and 40 residues that co‐occur in the connected folds in disparate structural contexts. These may be descendants of an ancestral pool of peptide modules that evolved as cofactors in the RNA world and from which the first folded proteins arose by amplification and recombination. Our galaxy of folds summarizes, in a single image, most known and many yet undescribed homologous relationships between protein superfamilies, providing new insights into the evolution of protein domains.  相似文献   

10.
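Primers were designed from the conserved structures of disease-resistance genes, and a 703-bp band, RGA1, was amplified from the leaf rust-resistant near-isogenic line TcLr24. After comparison against GenBank, several sequences highly homologous to RGA1 were selected, primers were designed at the positions of their shared conserved sequences, and rapid amplification of cDNA ends (RACE) was used to amplify full-length cDNAs of disease-resistance gene homologs. Three full-length cDNAs were obtained. BLASTp comparison showed that all of these sequences contain a conserved NBS domain and multiple LRR domains, consistent with the corresponding functional regions of many known plant disease-resistance genes. Real-time quantitative PCR analysis of FRGA-1, FRGA-2 and FRGA-3 showed that all three genes are constitutively expressed in wheat leaves. This study obtained three full-length cDNAs of disease-resistance gene homologs from the wheat line TcLr24, laying a foundation for the study of wheat disease-resistance genes.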

11.
Twilight zone of protein sequence alignments   Total citations: 38 (self-citations: 0; citations by others: 38)
Sequence alignments unambiguously distinguish between protein pairs of similar and non-similar structure when the pairwise sequence identity is high (>40% for long alignments). The signal gets blurred in the twilight zone of 20-35% sequence identity. Here, more than a million sequence alignments were analysed between protein pairs of known structures to re-define a line distinguishing between true and false positives for low levels of similarity. Four results stood out. (i) The transition from the safe zone of sequence alignment into the twilight zone is described by an explosion of false negatives. More than 95% of all pairs detected in the twilight zone had different structures. More precisely, above a cut-off roughly corresponding to 30% sequence identity, 90% of the pairs were homologous; below 25% less than 10% were. (ii) Whether or not sequence homology implied structural identity depended crucially on the alignment length. For example, if 10 residues were similar in an alignment of length 16 (>60%), structural similarity could not be inferred. (iii) The 'more similar than identical' rule (discarding all pairs for which percentage similarity was lower than percentage identity) reduced false positives significantly. (iv) Using intermediate sequences for finding links between more distant families was almost as successful: pairs were predicted to be homologous when the respective sequence families had proteins in common. All findings are applicable to automatic database searches.  相似文献   
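The identity ranges quoted in this abstract can be made concrete with a small sketch that computes pairwise percent identity from two aligned sequences and labels the result. Several points are assumptions: identity is normalized by the number of ungapped columns (one of several conventions in use), the function names are hypothetical, and the labels cover only the ranges the abstract states (above 40% for long alignments, and 20-35%). The abstract's length-dependent cut-off is not reproduced here, so the compared length is returned for the caller to inspect.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return (percent identity, number of compared columns).

    Both inputs are rows of the same alignment. Columns with a gap
    in either sequence are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns), len(columns)


def identity_zone(pct):
    """Label a percent identity using only the ranges quoted in the abstract."""
    if pct > 40.0:
        return "safe for long alignments"
    if 20.0 <= pct <= 35.0:
        return "twilight"
    return "not classified by the quoted ranges"
```

As the abstract's second finding stresses, a high identity over a short alignment (for example 10 of 16 residues) does not support structural inference, so the label should never be read without the column count.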

12.
The database PALI (Phylogeny and ALIgnment of homologous protein structures) consists of families of protein domains of known three-dimensional (3D) structure. In a PALI family, every member has been structurally aligned with every other member (pairwise) and also simultaneous superposition (multiple) of all the members has been performed. The database also contains 3D structure-based and structure-dependent sequence similarity-based phylogenetic dendrograms for all the families. The PALI release used in the present analysis comprises 225 families derived largely from the HOMSTRAD and SCOP databases. The quality of the multiple rigid-body structural alignments in PALI was compared with that obtained from COMPARER, which encodes a procedure based on properties and relationships. The alignments from the two procedures agreed very well and variations are seen only in the low sequence similarity cases often in the loop regions. A validation of Direct Pairwise Alignment (DPA) between two proteins is provided by comparing it with Pairwise alignment extracted from Multiple Alignment of all the members in the family (PMA). In general, DPA and PMA are found to vary rarely. The ready availability of pairwise alignments allows the analysis of variations in structural distances as a function of sequence similarities and number of topologically equivalent Calpha atoms. The structural distance metric used in the analysis combines root mean square deviation (r.m.s.d.) and number of equivalences, and is shown to vary similarly to r.m.s.d. The correlation between sequence similarity and structural similarity is poor in pairs with low sequence similarities. A comparison of sequence and 3D structure-based phylogenies for all the families suggests that only a few families have a radical difference in the two kinds of dendrograms. 
The difference could occur when the sequence similarity among the homologues is low or when the structures are subjected to evolutionary pressure for the retention of function. The PALI database is expected to be useful in furthering our understanding of the relationship between sequences and structures of homologous proteins and their evolution.

13.
The dramatic increase in heterogeneous types of biological data—in particular, the abundance of new protein sequences—requires fast and user-friendly methods for organizing this information in a way that enables functional inference. The most widely used strategy to link sequence or structure to function, homology-based function prediction, relies on the fundamental assumption that sequence or structural similarity implies functional similarity. New tools that extend this approach are still urgently needed to associate sequence data with biological information in ways that accommodate the real complexity of the problem, while being accessible to experimental as well as computational biologists. To address this, we have examined the application of sequence similarity networks for visualizing functional trends across protein superfamilies from the context of sequence similarity. Using three large groups of homologous proteins of varying types of structural and functional diversity—GPCRs and kinases from humans, and the crotonase superfamily of enzymes—we show that overlaying networks with orthogonal information is a powerful approach for observing functional themes and revealing outliers. In comparison to other primary methods, networks provide both a good representation of group-wise sequence similarity relationships and a strong visual and quantitative correlation with phylogenetic trees, while enabling analysis and visualization of much larger sets of sequences than trees or multiple sequence alignments can easily accommodate. We also define important limitations and caveats in the application of these networks. As a broadly accessible and effective tool for the exploration of protein superfamilies, sequence similarity networks show great potential for generating testable hypotheses about protein structure-function relationships.  相似文献   

14.
Hao W 《Gene》2011,481(2):57-64
The evolution of influenza viruses is remarkably dynamic. Influenza viruses evolve rapidly in sequence and undergo frequent reassortment of different gene segments. Homologous recombination, although commonly seen as an important component of dynamic genome evolution in many other organisms, is believed to be rare in influenza. In this study, 256 gene segments from 32 influenza A genomes were examined for homologous recombination. Three recombinant H1N1 strains were detected, and they most likely resulted from one recombination event between two closely related parental sequences. These findings suggest that homologous recombination in influenza viruses tends to take place between strains sharing high sequence similarity. The three recombinant strains were isolated at different time periods and they form a clade, indicating that recombinant strains could circulate. In addition, the simulation results showed that many recombinant sequences might not be detectable by currently existing recombinant detection programs when the parental sequences are of high sequence similarity. Finally, possible ways to improve the accuracy of detection of recombinant sequences in influenza are discussed.

15.
Plantaricin 423 is a class IIa bacteriocin produced by Lactobacillus plantarum isolated from sorghum beer. It has been previously determined that plantaricin 423 is encoded by a plasmid designated pPLA4, which is now completely sequenced. The plantaricin 423 operon shares high sequence similarity with the operons of coagulin, pediocin PA-1, and pediocin AcH, with small differences in the DNA sequence encoding the mature bacteriocin peptide and the immunity protein. Apart from the bacteriocin operon, no significant sequence similarity could be detected between the DNA or translated sequence of pPLA4 and the available DNA or translated sequences of the plasmids encoding pediocin AcH, pediocin PA-1, and coagulin, possibly indicating a different origin. In addition to the bacteriocin operon, sequence analysis of pPLA4 revealed the presence of two open reading frames (ORFs). ORF1 encodes a putative mobilization (Mob) protein that is homologous to the pMV158 superfamily of mobilization proteins. Highest sequence similarity occurred between this protein and the Mob protein of L. plantarum NCDO 1088. ORF2 encodes a putative replication protein that revealed low sequence similarity to replication proteins of plasmids pLME300 from Lactobacillus fermentum and pYIT356 from Lactobacillus casei. The immunity protein of plantaricin 423 contains 109 amino acids. Although plantaricin 423 shares high sequence similarity with the pediocin PA-1 operon, no cross-reactivity was recorded between the immunity proteins of plantaricin 423 and pediocin PA-1.  相似文献   

16.
SRS (Sequence Retrieval System) is a widely used keyword search engine for querying biological databases. BLAST2 is the most widely used tool to query databases by sequence similarity search. These tools allow users to retrieve sequences by shared keyword or by shared similarity, with many public web servers available. However, with the increasingly large datasets available it is now quite common that a user is interested in some subset of homologous sequences but has no efficient way to restrict retrieval to that set. By allowing the user to control SRS from the BLAST output, BLAST2SRS (http://blast2srs.embl.de/) aims to meet this need. This server therefore combines the two ways to search sequence databases: similarity and keyword.

17.
The lysA gene encodes meso-diaminopimelate (DAP) decarboxylase (E.C.4.1.1.20), the last enzyme of the lysine biosynthetic pathway in bacteria. We have determined the nucleotide sequence of the lysA gene from Pseudomonas aeruginosa. Comparison of the deduced amino acid sequence of the lysA gene product revealed extensive similarity with the sequences of the functionally equivalent enzymes from Escherichia coli and Corynebacterium glutamicum. Even though both P. aeruginosa and E. coli are Gram-negative bacteria, sequence comparisons indicate a greater similarity between the enzymes of P. aeruginosa and the Gram-positive bacterium C. glutamicum than between those of P. aeruginosa and E. coli. Comparison of DAP decarboxylase with protein sequences present in databases revealed that bacterial DAP decarboxylases are homologous to mouse (Mus musculus) ornithine decarboxylase (E.C.4.1.1.17), the key enzyme in polyamine biosynthesis in mammals. On the other hand, no similarity was detected between DAP decarboxylases and other bacterial amino acid decarboxylases.

18.
19.

Background  

While the pairwise alignments produced by sequence similarity searches are a powerful tool for identifying homologous proteins (proteins that share a common ancestor and a similar structure), pairwise sequence alignments often fail to represent accurately the structural alignments inferred from three-dimensional coordinates. Since sequence alignment algorithms produce optimal alignments, the best structural alignments must reflect suboptimal sequence alignment scores. Thus, we have examined a range of suboptimal sequence alignments and a range of scoring parameters to understand better which sequence alignments are likely to be more structurally accurate.

20.
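The cDNA sequence of the late trypsin coding region was amplified by reverse transcription-polymerase chain reaction (RT-PCR) from total RNA of the Haikou strain of Aedes aegypti 24 h after blood feeding. The sequence was determined with an automated DNA analyzer and compared for homology with the known late trypsin gene of the US strain of Aedes aegypti and its deduced amino acid sequence. The results showed that the late trypsin gene sequence of the Haikou strain shares 98% homology with that of the US strain, with 11 base changes. Amino acid homology is 99%, with only 3 amino acid changes, while the amino acids closely associated with the catalytic site and the N-terminal amino acid sequence are completely identical. These results indicate that minor differences exist in Aedes aegypti trypsin among different geographical strains.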
