Similar literature
1.
MOTIVATION: Blast programs are very efficient in finding relatively strong similarities, but some very distantly related sequences are given a very high Expect value and are ranked very low in Blast results. We have developed Ballast, a program to predict local maximum segments (LMSs, i.e. sequence segments conserved relative to their flanking regions) from a single Blast database search and to highlight these divergent homologues. TBlastN database searches can also be processed with the help of information from a joint BlastP search. RESULTS: We have applied the Ballast algorithm to BlastP searches performed with sequences belonging to well described dispersed families (aminoacyl-tRNA synthetases; helicases) against the SwissProt 38 database. We show that Ballast is able to build an appropriate conservation profile and that the predicted LMSs are consistent with the signatures and motifs described in the literature. Furthermore, by comparing the Blast, PsiBlast and Ballast results obtained on a well defined database of structurally related sequences, we show that the LMSs provide a scoring scheme that can concentrate on top ranking distant homologues better than Blast. Using the graphical user interface available on the Web, specific LMSs may be selected to detect divergent homologues sharing the corresponding properties with the query sequence without requiring any additional database search.

2.
Determination of the structures of fibroblast growth factors and interleukin-1s has previously revealed that they both adopt a beta-trefoil fold, similar to those found in Kunitz soybean trypsin inhibitors, ricin-like toxins, plant agglutinins and hisactophilin. These families possess distinct functions and occur in different subcellular localisations, and they appear to lack significant similarities in their sequences, ligands and modes of ligand binding. We have analysed the significance of sequence identities observed after structure alignment and provide statistical evidence that these beta-trefoil proteins are all homologues, having arisen from a common ancestor. In addition, we have explored the sequence space of all beta-trefoil proteins and have determined that the actin-binding proteins fascins, and other proteins of unknown function, are beta-trefoil family homologues. Unlike other beta-trefoil proteins, the triplicated repeats in each of the four beta-trefoil domains of fascins are significantly similar in sequence. This hints at how the beta-trefoil fold arose from the duplication of an ancestral gene encoding a homotrimeric single-repeat protein. The combined analysis of structure and sequence databases for detecting significant similarities is suggested as a highly sensitive approach to determining the common ancestry of extremely divergent homologues.

3.
Nitric oxide (NO) and nitrous oxide (N2O) are climatically important trace gases that are produced by both nitrifying and denitrifying bacteria. In the denitrification pathway, N2O is produced from NO by the enzyme nitric oxide reductase (NOR). The ammonia-oxidizing bacterium Nitrosomonas europaea also possesses a functional nitric oxide reductase, which was shown recently to serve a unique function. In this study, sequences homologous to the large subunit of nitric oxide reductase (norB) were obtained from eight additional strains of ammonia-oxidizing bacteria, including Nitrosomonas and Nitrosococcus species (i.e., both beta- and gamma-Proteobacterial ammonia oxidizers), showing widespread occurrence of a norB homologue in ammonia-oxidizing bacteria. However, despite efforts to detect norB homologues in Nitrosospira strains, no sequences have yet been obtained. Phylogenetic analysis placed nitrifier norB homologues in a subcluster distinct from denitrifier sequences. The similarities and differences of these sequences highlight the need to understand the variety of metabolisms represented within a "functional group" defined by the presence of a single homologous gene. These results expand the database of norB homologue sequences in nitrifying bacteria.

4.
The database PALI (Phylogeny and ALIgnment of homologous protein structures) consists of families of protein domains of known three-dimensional (3D) structure. In a PALI family, every member has been structurally aligned with every other member (pairwise), and a simultaneous (multiple) superposition of all the members has also been performed. The database also contains 3D structure-based and structure-dependent sequence similarity-based phylogenetic dendrograms for all the families. The PALI release used in the present analysis comprises 225 families derived largely from the HOMSTRAD and SCOP databases. The quality of the multiple rigid-body structural alignments in PALI was compared with that obtained from COMPARER, which encodes a procedure based on properties and relationships. The alignments from the two procedures agreed very well, and variations are seen only in the low sequence similarity cases, often in the loop regions. A validation of Direct Pairwise Alignment (DPA) between two proteins is provided by comparing it with the Pairwise alignment extracted from the Multiple Alignment of all the members in the family (PMA). In general, DPA and PMA are found to differ rarely. The ready availability of pairwise alignments allows the analysis of variations in structural distances as a function of sequence similarities and the number of topologically equivalent Cα atoms. The structural distance metric used in the analysis combines root mean square deviation (r.m.s.d.) and number of equivalences, and is shown to vary similarly to r.m.s.d. The correlation between sequence similarity and structural similarity is poor in pairs with low sequence similarities. A comparison of sequence and 3D structure-based phylogenies for all the families suggests that only a few families have a radical difference in the two kinds of dendrograms. The difference could occur when the sequence similarity among the homologues is low or when the structures are subjected to evolutionary pressure for the retention of function. The PALI database is expected to be useful in furthering our understanding of the relationship between sequences and structures of homologous proteins and their evolution.

5.
There is currently a gap in knowledge between complexes of known three-dimensional structure and those known from other experimental methods such as affinity purifications or the two-hybrid system. This gap can sometimes be bridged by methods that extrapolate interaction information from one complex structure to homologues of the interacting proteins. To do this, it is important to know if and when proteins of the same type (e.g. family, superfamily or fold) interact in the same way. Here, we study interactions of known structure to address this question. We found all instances within the structural classification of proteins database of the same domain pairs interacting in different complexes, and then compared them with a simple measure (interaction RMSD). When plotted against sequence similarity we find that close homologues (30-40% or higher sequence identity) almost invariably interact the same way. Conversely, similarity only in fold (i.e. without additional evidence for a common ancestor) is only rarely associated with a similarity in interaction. The results suggest that there is a twilight zone of sequence similarity where it is not possible to say whether or not domains will interact similarly. We also discuss the rare instances of fold similarities interacting the same way, and those where obviously homologous proteins interact differently.
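The abstract above does not define how interaction RMSD is computed. As a rough illustration of the underlying quantity only, the following sketch computes a plain root mean square deviation over paired 3D coordinates. The function name `rmsd`, the choice of points, and the absence of any superposition step are all assumptions made for illustration, not the authors' procedure.

```python
import math


def rmsd(points_a, points_b):
    """Root mean square deviation between two equally long lists of
    (x, y, z) points that are assumed to be already paired and already
    in a common frame (hypothetical simplification: no superposition)."""
    if len(points_a) != len(points_b) or not points_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(points_a, points_b)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(points_a))


# Two paired points; only the second differs, by 2 units along z.
value = rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 2)])
```

A small value would indicate that the two coordinate sets nearly coincide; what threshold counts as "interacting the same way" is not stated in the abstract.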

6.
7.
The structure of calpain   Cited by: 16 (self-citations: 0; by others: 16)
Recent very rapid developments in genome and EST projects have identified an increasing number of gene products homologous to those that were previously identified by other methods. Calpain is no exception. At the time of writing, 83 genes from 23 living organisms have been identified in the database as encoding amino acid sequences showing significant similarities to the protease domain of "conventional" calpain, which was first purified as a homogeneous protein in 1978. Progress in genome/EST projects has occurred so quickly that there seems to be some confusion as to the identity of each calpain molecule. This review will attempt to clarify all calpain homologues, to describe the common and differing features of calpain homologues in terms of structure-function relationship, and to discuss the evolutionary process of calpain.

8.
9.
The detection of local structural patterns in proteins (e.g. active sites) can provide insights into protein function in the absence of sequence or fold similarity. Methods to detect such similarities are key during structural annotation, for example with results from Structural Genomics initiatives. PINTS (Patterns in Non-homologous Tertiary Structures, http://pints.embl.de) performs database searches for such patterns and most importantly provides a measure of statistical significance for any similarity uncovered. To aid functional annotation of proteins, we allow comparisons of pre-defined patterns against databases of complete structures and of entire structures to databases of particular residues likely to be functionally important.

10.
Koike R, Kinoshita K, Kidera A. Proteins 2007, 66(3): 655-663
Dynamic programming (DP) and its heuristic algorithms are the most fundamental methods for similarity searches of amino acid sequences. Their detection power has been improved by including supplemental information, such as homologous sequences in the profile method. Here, we describe a method, probabilistic alignment (PA), that gives improved detection power but, like the original DP, uses only a pair of amino acid sequences. Receiver operating characteristic (ROC) analysis demonstrated that the PA method is far superior to BLAST, and that its sensitivity and selectivity approach those of PSI-BLAST. Particularly for orphan proteins having few homologues in the database, PA exhibits much better performance than PSI-BLAST. On the basis of this observation, we applied the PA method to a homology search of two orphan proteins, Latexin and Resuscitation-promoting factor domain. Their molecular functions have been described based on structural similarities, but sequence homologues have not been identified by PSI-BLAST. PA successfully detected sequence homologues for the two proteins and confirmed that the observed structural similarities are the result of an evolutionary relationship.
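For readers unfamiliar with the DP baseline that the abstract above builds on, here is a minimal sketch of a local alignment score in the Smith-Waterman style. It is not the PA method, which the abstract does not specify; the function name and the match, mismatch and linear gap scores are illustrative assumptions (real protein searches would use a substitution matrix and affine gaps).

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b using a
    simple DP recurrence with a linear gap penalty (assumed scores)."""
    prev = [0] * (len(b) + 1)  # DP row for the previous character of a
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
        best = max(best, max(cur))
        prev = cur
    return best


score = local_alignment_score("ACGT", "TTACGTTT")
```

Keeping only two rows makes memory linear in the length of `b`, at the cost of not being able to trace back the alignment itself.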

11.
The presence of sequence homologues and the availability of structural information of proteins enable better understanding of the biological function of a protein family. A majority of entries in the protein structural databank are single-member superfamilies for which it is hard to derive motifs due to the paucity of structural homologues. Important conserved segments for these superfamilies have been identified and compiled into a database, SSToSS (Sequence Structural Templates of Single member Superfamily). Conserved regions, recognized by permitted amino acid exchanges, are mapped on the structure and various structural features (solvent accessibility, secondary structure content, hydrogen bonding and residue packing) are examined. These conserved segments with high structural feature content are projected as sequence-structural templates for the particular superfamily member. Interactive three-dimensional displays of the templates in the three-dimensional structure (in Chime and RASMOL) are provided for better understanding and visualization. In the SSToSS database, we also provide the application of sequence-structural templates in three different areas: multiple-motif based sequence search, multiple sequence alignment and homology modeling. In each case, the inclusion of the sequence-structural templates can give rise to sensitive and accurate results. This enables the inclusion of singletons to provide added value to the recognition of additional members, comparative modeling and in designing experiments.

12.
13.
A method for assigning functions to unknown sequences based on finding correlations between short signals and functional annotations in a protein database is presented. This approach is based on keyword (KW) and feature (FT) information stored in the SWISS-PROT database. The former refers to particular protein characteristics and the latter locates these characteristics at a specific sequence position. In this way, a certain keyword is only assigned to a sequence if sequence similarity is found in the position described by the FT field. Exhaustive tests performed over sequences with homologues (cluster set) and without homologues (singleton set) in the database show that assigning functions is much 'cleaner' when information about domains (FT field) is used, than when only the keywords are used.

14.
15.
Phosphatidylcholine-specific phospholipase D (PLD) enzymes catalyze hydrolysis of phospholipid phosphodiester bonds, and also transphosphatidylation of phospholipids to acceptor alcohols. Bacterial and plant PLD enzymes have not been shown previously to be homologues or to be homologous to any other protein. Here we show, using sequence analysis methods, that bacterial and plant PLDs show significant sequence similarities both to each other and to two other classes of phospholipid-specific enzymes, bacterial cardiolipin synthases and eukaryotic and bacterial phosphatidylserine synthases, indicating that these enzymes form a homologous family. This family is suggested also to include two Poxviridae proteins of unknown function (p37K and protein K4), a bacterial endonuclease (nuc), an Escherichia coli putative protein (o338) containing an N-terminal domain showing similarities with helicase motifs V and VI, and a Synechocystis sp. putative protein with a C-terminal domain likely to possess a DNA-binding function. Surprisingly, four regions of sequence similarity that occur once in nuc and o338 appear twice in all other homologues, indicating that the latter molecules are bi-lobed, having evolved from an ancestor or ancestors that underwent a gene duplication and fusion event. It is suggested that, for each of these enzymes, conserved histidine, lysine, aspartic acid, and/or asparagine residues may be involved in a two-step ping pong mechanism involving an enzyme-substrate intermediate.

16.
Gap junctional proteins of animals: the innexin/pannexin superfamily   Cited by: 2 (self-citations: 0; by others: 2)
There has been some controversy as to whether vertebrate pannexins are related to invertebrate innexins. Using statistical, topological and conserved sequence motif analyses, we establish that these proteins belong to a single superfamily. We also demonstrate the occurrence of large homologues with C-terminal proline-rich domains that may have arisen by gene fusion events. Phylogenetic analyses reveal the orthologous and paralogous relationships of these homologues to each other. We show that different sets of orthologous paralogues underwent sequence divergence at markedly different rates, suggesting differential pressures through evolutionary time promoting or restricting sequence divergence. We further show that the first 2 TMS-containing halves of these homologues underwent sequence divergence more slowly than the second 2 TMS-containing halves and analyze these differences. These bioinformatic analyses should serve as useful guides for future studies of structure, function and evolutionary aspects of this important superfamily.

17.
A long-standing goal in protein structure studies is the development of reliable energy functions that can be used both to verify protein models derived from experimental constraints and for theoretical protein folding and inverse folding computer experiments. In that respect, knowledge-based statistical pair potentials have attracted considerable interest recently, mainly because they include the essential features of protein structures as well as solvent effects at a low computing cost. However, the basis on which statistical potentials are derived has been questioned. In this paper, we investigate statistical pair potentials derived from protein three-dimensional structures, addressing in particular questions related to the form of these potentials, as well as to the content of the database from which they are derived. We have shown that statistical pair potentials depend on the size of the proteins included in the database, and that this dependence can be reduced by considering only pairs of residues close in space (i.e., with a cutoff of 8 Å). We have shown also that statistical potentials carry a memory of the quality of the database in terms of the amount and diversity of secondary structure it contains. We find, for example, that potentials derived from a database containing α-proteins will perform best only on α-proteins in fold recognition computer experiments. We believe that this is an overall weakness of these potentials, which must be kept in mind when constructing a database. Proteins 31:139–149, 1998. © 1998 Wiley-Liss, Inc.

18.
The HSSP database of protein structure-sequence alignments.   Cited by: 4 (self-citations: 0; by others: 4)
HSSP is a derived database merging structural (3-D) and sequence (1-D) information. For each protein of known 3-D structure from the Protein Data Bank (PDB), the database has a multiple sequence alignment of all available homologues and a sequence profile characteristic of the family. The list of homologues is the result of a database search in SwissProt using a position-weighted dynamic programming method for sequence profile alignment (MaxHom). The database is updated frequently. The listed homologues are very likely to have the same 3-D structure as the PDB protein to which they have been aligned. As a result, the database is not only a database of aligned sequence families, but also a database of implied secondary and tertiary structures covering 29% of all SwissProt-stored sequences.
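To make the notion of a "sequence profile characteristic of the family" more concrete, the sketch below derives per-column residue frequencies from a small multiple sequence alignment. This is a generic frequency profile under simple assumptions (equal sequence weights, gaps ignored, a hypothetical function name); it is not the HSSP file format or the position-weighted MaxHom procedure.

```python
from collections import Counter


def column_profile(alignment, gap="-"):
    """Per-column residue frequencies for a list of aligned sequences of
    equal length. Gap characters are ignored; an all-gap column yields
    an empty dict. Sequence weighting is deliberately omitted."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("alignment must contain sequences of one common length")
    profile = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != gap)
        total = sum(counts.values())
        profile.append(
            {res: n / total for res, n in counts.items()} if total else {}
        )
    return profile


# Three aligned toy sequences; the last column contains only gaps.
profile = column_profile(["AC-", "AG-", "TC-"])
```

Columns dominated by a single residue indicate conserved positions, which is the kind of signal a family profile is meant to capture.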

19.
The HSSP database of protein structure-sequence alignments.   Cited by: 2 (self-citations: 0; by others: 2)
HSSP is a derived database merging three-dimensional (3-D) structural and one-dimensional (1-D) sequence information. For each protein of known 3-D structure from the Protein Data Bank (PDB), the database has a multiple sequence alignment of all available homologues and a sequence profile characteristic of the family. The list of homologues is the result of a database search in Swissprot using a position-weighted dynamic programming method for sequence profile alignment (MaxHom). The database is updated frequently. The listed homologues are very likely to have the same 3-D structure as the PDB protein to which they have been aligned. As a result, the database is not only a database of aligned sequence families, but also a database of implied secondary and tertiary structures covering 27% of all Swissprot-stored sequences.

20.
We discuss the statistical significance of local similarities found between DNA sequences, and illustrate the procedure with reference to the Queen and Korn algorithm. If the longest similarity found for two sequences has length L, this length is said to be significant at the 5% level if there is a probability of no more than 0.05 of finding a length of L or greater between a pair of sequences consisting of randomly chosen bases with the same overall base frequencies. The distribution of longest lengths is related to that of lengths from any particular pair of starting positions on the two sequences. For our implementation of the Queen and Korn algorithm, this latter distribution is constructed by combining the five different blocks of bases that may be added to extend a similarity. A table is given to assess the significance of longest similarities in sequences of length up to 1000 bases. Quite long similarities are expected to occur by chance alone. The critical values we calculate for assessing significance are preferable to expected numbers of similarities used by some commercial computer packages.
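The 5% significance criterion described above can be approximated by simulation. The sketch below is a Monte Carlo stand-in, not the authors' analytical table: it uses the longest exact common run as the similarity measure (an assumed simplification, since the Queen and Korn algorithm as described extends similarities in blocks), and the function names, trial count and seed are all hypothetical.

```python
import random


def longest_common_run(a, b):
    """Length of the longest exactly matching contiguous stretch
    shared by strings a and b (longest common substring)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def critical_length(freqs, n, trials=200, alpha=0.05, seed=0):
    """Smallest L such that at most a fraction alpha of simulated random
    sequence pairs of length n (bases drawn with the given frequencies)
    share a common run of length L or greater."""
    rng = random.Random(seed)
    bases = list(freqs)
    weights = [freqs[base] for base in bases]
    lengths = []
    for _ in range(trials):
        a = "".join(rng.choices(bases, weights=weights, k=n))
        b = "".join(rng.choices(bases, weights=weights, k=n))
        lengths.append(longest_common_run(a, b))
    length = 1
    while sum(x >= length for x in lengths) / trials > alpha:
        length += 1
    return length


uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
threshold = critical_length(uniform, n=100)
```

An observed longest similarity would then be called significant at the chosen level when it is at least `threshold`. With few trials the estimate is noisy, so in practice the trial count would need to be much larger than the default used here.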
