Similar literature
 Found 20 similar articles; search time: 15 ms
1.
Computational methods such as sequence alignment and motif construction are useful in grouping related proteins into families, as well as helping to annotate new proteins of unknown function. These methods identify conserved amino acids in protein sequences, but cannot determine the specific functional or structural roles of conserved amino acids without additional study. In this work, we present 3MATRIX (http://3matrix.stanford.edu) and 3MOTIF (http://3motif.stanford.edu), a web-based sequence motif visualization system that displays sequence motif information in its appropriate three-dimensional (3D) context. This system is flexible in that users can enter sequences, keywords, structures or sequence motifs to generate visualizations. In 3MOTIF, users can search using discrete sequence motifs such as PROSITE patterns, eMOTIFs, or any other regular expression-like motif. Similarly, 3MATRIX accepts an eMATRIX position-specific scoring matrix, or will convert a multiple sequence alignment block into an eMATRIX for visualization. Each query motif is used to search the protein structure database for matches, in which the motif is then visually highlighted in three dimensions. Important properties of motifs such as sequence conservation and solvent accessible surface area are also displayed in the visualizations, using carefully chosen color shading schemes.
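The regular expression-like motif search mentioned in this abstract can be illustrated with a minimal sketch. It is not the 3MOTIF implementation: the function names `prosite_to_regex` and `find_motif` are hypothetical, and only a common subset of PROSITE pattern syntax is assumed (`x` for any residue, `[..]` for allowed residues, `{..}` for excluded residues, `(n)` or `(n,m)` repeats, and `<` / `>` terminal anchors).

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    regex_parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchor at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchor at the C-terminus
            suffix, element = "$", element[:-1]
        # Separate the residue specification from an optional repeat, e.g. x(2,4).
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if core == "x":                  # x matches any residue
            core = "."
        elif core.startswith("{"):       # {P} means any residue except P
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        regex_parts.append(prefix + core + repeat + suffix)
    return "".join(regex_parts)


def find_motif(pattern: str, sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


# Example with the N-glycosylation site pattern on a made-up sequence.
print(find_motif("N-{P}-[ST]-{P}", "MKNGSAPNPTQ"))
```

Each hit gives a position that a viewer could then highlight on a matching structure; mapping sequence positions to structure residues is outside this sketch.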

2.
La D, Silver M, Edgar RC, Livesay DR. Biochemistry 2003, 42(30): 8988-8998
Protein motifs represent highly conserved regions within protein families and are generally accepted to describe critical regions required for protein stability and/or function. In this comprehensive analysis, we present a robust, unique approach to identify and compare corresponding mesophilic and thermophilic sequence motifs between all orthologous proteins within 44 microbial genomes. Motif similarity is determined through global sequence alignment of mesophilic and thermophilic motif pairs, which are identified by a greedy algorithm. Our results reveal only modest correlation between motif and overall sequence similarity, highlighting the rationale of motif-based approaches in comprehensive multigenome comparisons. Conserved mutations reflect previously suggested physicochemical principles for conferring thermostability. Additionally, comparisons between corresponding mesophilic and thermophilic motif pairs provide key biochemical insights related to thermostability and can be used to test the evolutionary robustness of individual structural comparisons. We demonstrate the ability of our unique approach to provide key insights in two examples: the TATA-box binding protein and glutamate dehydrogenase families. In the latter example, conserved mutations hint at novel origins leading to structural stability differences within the hexamer structures. Additionally, we present amino acid composition data and average protein length comparisons for all 44 microbial genomes.

3.
The sequences of related proteins show the alternation of conserved and variable regions. This fact is generally seen as a reverberation of 3D constraints onto 1D structures. Although the exact meaning of such constraints remains elusive, conserved regions can be extracted from protein chains and used to align them. We developed a program that efficiently performs this task. The program constructs symbolic motifs fitting a target subsequence present in every chain without requiring any insertion or deletion. However, a motif can be obliterated by substitutions when it is found in a sequence. The motifs formally consist of amino acid symbols separated (and virtually preceded and followed) by a variable number of wild-card symbols. A wild-card, which can match any amino acid of the chains (with no increment of score), represents a variable site within conserved regions. Different motifs are progressively built by substituting a wild-card with an amino acid symbol within or beside preexisting motifs. Only those motifs showing an outstanding association of high matching score over all chains, and of low deviation between extreme scores over individual chains, are selected for making the next generation. Starting with a null motif, the construction ends when no new amino acid can be introduced into the current motifs. A surviving motif is then considered valid if it maps without ambiguity a unique region in every sequence, and the motif with the highest score is finally selected. The construction of new motifs is then reinitiated for the left and right parts of the sequences, after these have been split by the previously selected motif. (ABSTRACT TRUNCATED AT 250 WORDS)
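The matching step described in this abstract, a motif of amino acid symbols and wild-cards placed on a chain without insertions or deletions, can be sketched as follows. This is an illustration under stated assumptions, not the authors' program: the name `best_ungapped_match`, the use of `.` as the wild-card symbol, and a score of one point per matching non-wild-card position are all hypothetical choices, since the truncated abstract does not give the scoring scheme.

```python
def best_ungapped_match(motif: str, sequence: str, wildcard: str = ".") -> tuple[int, int]:
    """Slide the motif along the sequence with no gaps and return
    (best 0-based start, best score).

    Wild-card positions match any residue and add nothing to the score.
    Mismatches at non-wild-card positions (substitutions) are tolerated
    but also score nothing. Returns (-1, -1) if the motif is longer than
    the sequence. Ties keep the leftmost position.
    """
    best_pos, best_score = -1, -1
    for start in range(len(sequence) - len(motif) + 1):
        window = sequence[start:start + len(motif)]
        score = sum(
            1 for m, r in zip(motif, window) if m != wildcard and m == r
        )
        if score > best_score:
            best_pos, best_score = start, score
    return best_pos, best_score


# Made-up chains: the second carries a K -> R substitution inside the motif.
for chain in ("AAGSKSTLL", "AAGSRSTLL"):
    print(chain, best_ungapped_match("G.K.T", chain))
```

Summing such scores over all chains, and comparing the best and worst per-chain scores, would give the kind of selection criteria the abstract describes for keeping a motif in the next generation.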

4.
An automatic procedure is proposed to identify, from the protein sequence database, conserved amino acid patterns (or sequence motifs) that are exclusive to a group of functionally related proteins. This procedure is applied to the PIR database, and a dictionary of sequence motifs that relate to specific superfamilies is constructed. The motifs have a practical relevance in identifying the membership of specific superfamilies without the need to perform sequence database searches in 20% of newly determined sequences. The sequence motifs identified represent functionally important sites on protein molecules. When multiple blocks exist in a single motif they are often close together in the 3-D structure. Furthermore, occasionally these motif blocks were found to be split by introns when the correlation with exon structures was examined.

5.
Since our characterization of the slit cDNA sequence, encoding a protein secreted by glial cells and involved in the formation of axonal pathways in Drosophila, we have discovered that the protein contains two additional sequence motifs that are highly conserved in a variety of proteins. A search of the GenPept database with the 73 amino acids at the carboxy terminus of slit revealed that this region contains significant similarity to a carboxy-terminal domain found in six other exported proteins. This observation has allowed us to define a new carboxy-terminal protein motif. In addition, comparisons with a 202 amino acid domain residing between epidermal growth factor (EGF) repeats in slit show this region to be conserved in laminin, agrin and perlecan and, strikingly, also to lie between EGF repeats in both agrin and perlecan. Our analysis suggests this motif is involved in mediating interactions among extracellular proteins. Consistent with our previous characterization of the slit protein, both new motifs are found only in extracellular proteins. The identification of these two conserved motifs in slit reveals that the entire 1469 amino acids of the protein are made up of modular regions similar to those conserved in other extracellular proteins.

6.
The EMOTIF database is a collection of more than 170 000 highly specific and sensitive protein sequence motifs representing conserved biochemical properties and biological functions. These protein motifs are derived from 7697 sequence alignments in the BLOCKS+ database (released on June 23, 2000) and all 8244 protein sequence alignments in the PRINTS database (version 27.0) using the emotif-maker algorithm developed by Nevill-Manning et al. (Nevill-Manning,C.G., Wu,T.D. and Brutlag,D.L. (1998) Proc. Natl Acad. Sci. USA, 95, 5865-5871; Nevill-Manning,C.G., Sethi,K.S., Wu,T. D. and Brutlag,D.L. (1997) ISMB-97, 5, 202-209). Since the amino acids and the groups of amino acids in these sequence motifs represent critical positions conserved in evolution, search algorithms employing the EMOTIF patterns can identify and classify more widely divergent sequences than methods based on global sequence similarity. The emotif protein pattern database is available at http://motif.stanford.edu/emotif/.

7.
The Cdc6 protein is required to load a complex of Mcm2-7 family members (the MCM complex) into prereplicative complexes at budding yeast origins of DNA replication. Cdc6p is a member of the AAA(+) superfamily of proteins, which includes the prokaryotic and eukaryotic clamp loading proteins. These proteins share a number of conserved regions of homology and a common three-dimensional architecture. Two of the conserved sequence motifs are the Walker A and B motifs that are involved in nucleotide metabolism and are essential for Cdc6p function in vivo. Here, we analyse mutants in the other conserved sequence motifs. Several of these mutants are temperature-sensitive for growth and are unable to recruit the MCM complex to chromatin at the restrictive temperature. In one such temperature-sensitive mutant, a highly conserved asparagine residue in the sensor I motif was changed to alanine. Overexpression of this mutant protein is lethal. This phenotype is very similar to the phenotype previously described for a mutation in the Walker B motif, suggesting a common role for sensor I and the Walker B motif in Cdc6 function.

8.
DAtA: database of Arabidopsis thaliana annotation   Cited by: 1 (self-citations: 0, citations by others: 1)
The Database of Arabidopsis thaliana Annotation (DAtA) was created to enable easy access to and analysis of all the Arabidopsis genome project annotation. The database was constructed using the completed A. thaliana genomic sequence data currently in GenBank. An automated annotation process was used to predict coding sequences for GenBank records that do not include annotation. DAtA also contains protein motifs and protein similarities derived from searches of the proteins in DAtA with motif databases and the non-redundant protein database. The database is routinely updated to include new GenBank submissions for Arabidopsis genomic sequences and new Blast and protein motif search results. A web interface to DAtA allows coding sequences to be searched by name, comment, blast similarity or motif field. In addition, browse options present lists of either all the protein names or identified motifs present in the sequenced A. thaliana genome. The database can be accessed at http://baggage.stanford.edu/group/arabprotein/

9.
Structural genomics is the idea of covering protein space so that every protein sequence comes within model building distance of a protein of known structure. Unfortunately, reproducing the structural alignment of distantly related proteins is a difficult challenge to existing sequence alignment and motif search software. We have developed a new transitive alignment algorithm (MaxFlow), which generates accurate alignments between proteins deep in the twilight zone of sequence similarity, below 20% sequence identity. In particular, MaxFlow reliably identifies conserved core motifs between proteins which are only indirect PSI-Blast neighbours. Based on MaxFlow alignments, useful 3D models can be generated for all members of a superfamily from as few as a single structural template – despite hundreds of representatives at 40% sequence identity level and patchy detection of homology by PSI-Blast. We propose novel strategies for target prioritization using MaxFlow scores to predict the optimal templates in a superfamily. Our results support an increase in the granularity of covering protein space that has potentially enormous economic implications for planning the transition to the full production phase of structural genomics.

10.
The arenavirus L protein has the characteristic sequence motifs conserved among the RNA-dependent RNA polymerase L proteins of negative-strand (NS) RNA viruses. Studies based on the use of reverse-genetics approaches have provided direct experimental evidence of the key role played by the arenavirus L protein in viral-RNA synthesis. Sequence alignment shows six conserved domains among L proteins of NS RNA viruses. The proposed polymerase module of L is located within its domain III, which contains highly conserved amino acids within motifs designated A and C. We have examined the role of these conserved residues in the polymerase activity of the L protein of the prototypic arenavirus, lymphocytic choriomeningitis virus (LCMV), in vivo using a minigenome rescue assay. We show here that the presence of sequence SDD, a characteristic of motif C of segmented NS RNA viruses, as well as the presence of the highly conserved D residue within motif A of L proteins, is strictly required for the polymerase activity of the LCMV L protein. The strong dominant negative phenotype associated with many of the mutants examined and results from coimmunoprecipitation studies provided genetic and biochemical evidence, respectively, for the requirement of the L-L interaction for the polymerase activity of the LCMV L protein.

11.
La D, Sutch B, Livesay DR. Proteins 2005, 58(2): 309-320
In this report, we demonstrate that phylogenetic motifs, sequence regions conserving the overall familial phylogeny, represent a promising approach to protein functional site prediction. Across our structurally and functionally heterogeneous data set, phylogenetic motifs consistently correspond to functional sites defined by both surface loops and active site clefts. Additionally, the partially buried prosthetic group regions of cytochrome P450 and succinate dehydrogenase are identified as phylogenetic motifs. In nearly all instances, phylogenetic motifs are structurally clustered, despite little overall sequence proximity, around key functional site features. Based on calculated false-positive expectations and standard motif identification methods, we show that phylogenetic motifs are generally conserved in sequence. This result implies that they can be considered motifs in the traditional sense as well. However, there are instances where phylogenetic motifs are not (overall) well conserved in sequence. This point is enticing, because it implies that phylogenetic motifs are able to identify key sequence regions that traditional motif-based approaches would not. Further, phylogenetic motif results are also shown to be consistent with evolutionary trace results, and bootstrapping is used to demonstrate tree significance.

12.
The analysis of atomic-resolution RNA three-dimensional (3D) structures reveals that many internal and hairpin loops are modular, recurrent, and structured by conserved non-Watson–Crick base pairs. Structurally similar loops define RNA 3D motifs that are conserved in homologous RNA molecules, but can also occur at nonhomologous sites in diverse RNAs, and which often vary in sequence. To further our understanding of RNA motif structure and sequence variability and to provide a useful resource for structure modeling and prediction, we present a new method for automated classification of internal and hairpin loop RNA 3D motifs and a new online database called the RNA 3D Motif Atlas. To classify the motif instances, a representative set of internal and hairpin loops is automatically extracted from a nonredundant list of RNA-containing PDB files. Their structures are compared geometrically, all-against-all, using the FR3D program suite. The loops are clustered into motif groups, taking into account geometric similarity and structural annotations and making allowance for a variable number of bulged bases. The automated procedure that we have implemented identifies all hairpin and internal loop motifs previously described in the literature. All motif instances and motif groups are assigned unique and stable identifiers and are made available in the RNA 3D Motif Atlas (http://rna.bgsu.edu/motifs), which is automatically updated every four weeks. The RNA 3D Motif Atlas provides an interactive user interface for exploring motif diversity and tools for programmatic data access.

13.
Mycobacterium tuberculosis H37Rv is an intracellular pathogen responsible for causing tuberculosis in humans. The M. tuberculosis genome has been shown to contain a very large and unique family of PE proteins made of two sub-families: PE-only and PE_PGRS proteins. These two subtypes of proteins play a crucial role in the pathogenesis of the microbe. However, despite numerous investigations, the role of these proteins in disease development remains obscure. In this study, sequence analysis with a search for short conserved motifs revealed a conserved tetra-peptide motif DEVS/DXXS at the PE domain of almost every PE-only and PE_PGRS protein. The motif was found at a distance of 43–46 amino acids from the N-terminus of PE_PGRS proteins, and at a distance of between 35 and 82 amino acids from the N-terminus of the PE-only proteins. As phosphorylation of the serine residue of this tetra-peptide could yield a motif similar to the caspase-3 binding recognition sequence DEVD/E, the region from a representative PE_PGRS protein (PE_PGRS45) was docked to human caspase-3. Strong interactions of only the protein containing the phosphorylated motif (DEVpS/DXXpS) to caspase-3 were observed. This suggested that the conserved DEVS/DXXS motif could have evolved for phosphorylation and subsequent recognition by caspase-3. These findings have important implications in unravelling the role of these PE proteins in mycobacterial infection.

14.
Lipocalins are β-barrel proteins, which share three conserved motifs in their amino acid sequence. In this study, we identified by a peptide mapping approach, a seven-amino acid sequence related to one of these motifs (motif 2) that modulates cell survival. A synthetic peptide based on an insect lipocalin displayed cytoprotective activity in serum-deprived endothelial cells and leucocytes. This activity was dependent on nitric oxide synthase. This sequence was found within several lipocalins, including apolipoprotein D, retinol binding protein, lipocalin-type prostaglandin D synthase, and many unknown proteins, suggesting that it is a sequence signature and a lipocalin conserved property.

15.
VISTAS is a suite of programs for protein sequence and structure analysis. The system allows the simultaneous display, in separate windows, of multiple sequence alignments, of known or model 3D structures, and of 2D graphic representations of sequence and/or alignment properties. The displays are fully integrated, and therefore manipulations in one window can be reflected in each of the others. Beyond its display facilities, VISTAS brings together a number of existing tools under a single, user-friendly umbrella: these include a fully functional interactive color alignment procedure, conserved motif selection, a range of database-scanning routines, and interactive access to the OWL composite sequence database and to the PRINTS protein fingerprint database. Exploration of the sequence database is thus straightforward, and predefined structural motifs from the fingerprint database may be readily visualized. Of particular note is the ability to calculate conservation criteria from sequence alignments and to display the information in a 3D context: this renders VISTAS a powerful tool for aiding mutagenesis studies and for facilitating refinement of molecular models.
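The abstract does not say which conservation criteria VISTAS calculates. As one common possibility, the sketch below scores each alignment column by normalized Shannon entropy. The function name `column_conservation`, the handling of `-` as a gap, and normalization by log2(20) are assumptions made for illustration only.

```python
import math
from collections import Counter


def column_conservation(alignment: list[str], gap: str = "-") -> list[float]:
    """Score each alignment column from 0 (highly variable) to 1 (invariant).

    The score is 1 - H / log2(20), where H is the Shannon entropy of the
    residue frequencies in the column. Gaps are ignored, and a column made
    only of gaps scores 0.
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("alignment rows must be non-empty and of equal length")
    max_entropy = math.log2(20)  # 20 standard amino acids
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != gap]
        if not residues:
            scores.append(0.0)
            continue
        total = len(residues)
        entropy = -sum(
            (n / total) * math.log2(n / total)
            for n in Counter(residues).values()
        )
        scores.append(1.0 - entropy / max_entropy)
    return scores


# Made-up three-sequence alignment: the first column is invariant.
print(column_conservation(["GKT", "GRT", "GKS"]))
```

Per-column scores of this kind could then be mapped onto a color scale and applied to the corresponding residues of a 3D structure.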

16.
17.
We present the development of a web server, a protein short motif search tool that allows users to simultaneously search for a protein sequence motif and its secondary structure assignments. The web server is able to run very short motif searches against PDB structural data from the RCSB Protein Databank, with the users defining the type of secondary structures of the amino acids in the sequence motif. The output utilises 3D visualisation that highlights the position of the motif in the structure and on the corresponding sequence. Researchers can easily observe the locations and conformation of multiple motifs among the results. Protein short motif search also has an application programming interface (API) for interfacing with other bioinformatics tools. AVAILABILITY: The database is available for free at http://birg3.fbb.utm.my/proteinsms.
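The idea of searching for a sequence motif together with its secondary structure can be sketched in a few lines. This is not the web server's API, which the abstract does not describe: the function `search_motif_with_structure` is hypothetical, and it assumes each residue already has a one-letter secondary structure code (here H for helix, E for strand, C for coil) stored in a string of the same length as the sequence.

```python
import re


def search_motif_with_structure(
    sequence: str, secondary: str, motif_regex: str, structure_regex: str
) -> list[tuple[int, str]]:
    """Return (0-based start, matched residues) for motif hits whose
    per-residue secondary structure codes also match structure_regex."""
    if len(sequence) != len(secondary):
        raise ValueError("sequence and secondary structure must align 1:1")
    hits = []
    for match in re.finditer(motif_regex, sequence):
        codes = secondary[match.start():match.end()]
        # Keep the hit only if the whole matched span has the requested structure.
        if re.fullmatch(structure_regex, codes):
            hits.append((match.start(), match.group()))
    return hits


# Made-up data: GKT occurs twice, but only the second copy lies in a helix.
sequence = "AGKTAAGKTA"
secondary = "CCCCCHHHHH"
print(search_motif_with_structure(sequence, secondary, "GKT", "H{3}"))
```

Expressing the structure constraint as a second regular expression keeps the sketch short and lets a query mix codes, for example `H{2}C` for two helical residues followed by coil.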

18.
Motif3D is a web-based protein structure viewer designed to allow sequence motifs, and in particular those contained in the fingerprints of the PRINTS database, to be visualised on three-dimensional (3D) structures. Additional functionality is provided for the rhodopsin-like G protein-coupled receptors, enabling fingerprint motifs of any of the receptors in this family to be mapped onto the single structure available, that of bovine rhodopsin. Motif3D can be used via the web interface available at: http://www.bioinf.man.ac.uk/dbbrowser/motif3d/motif3d.html.

19.
An J, Wako H, Sarai A. Molecular Biology 2001, 35(6): 905-910
An amino acid sequence pattern conserved among a family of proteins is called a motif. It is usually related to the specific function of the family. On the other hand, functions of proteins are realized through their 3D structures. Specific local structures, called structural motifs, are considered to be related to their functions. However, searching for common structural motifs in different proteins is much more difficult than for common sequence motifs. We are attempting in this study to convert the information about the structural motifs into a set of one-dimensional digital strings, i.e., a set of codes, to compare them more easily by computer and to investigate their relationship to functions more quantitatively. By applying the Delaunay tessellation to a 3D structure of a protein, we can assign each local structure to a unique code that is defined so as to reflect its structural feature. Since a structural motif is defined as a set of the local structures in this paper, the structural motif is represented by a set of the codes. In order to examine the ability of the set of the codes to distinguish differences among the sets of local structures with a given PROSITE pattern that contain both true and false positives, we clustered them by introducing a similarity measure among the set of the codes. The obtained clustering shows a good agreement with other results by direct structural comparison methods such as a superposition method. The structural motifs in homologous proteins are also properly clustered according to their sources. These results suggest that the structural motifs can be well characterized by these sets of the codes, and that the method can be utilized in comparing structural motifs and relating them with function.

20.
PRINTS--a database of protein motif fingerprints.   Cited by: 4 (self-citations: 1, citations by others: 3)
PRINTS is a compendium of protein motif 'fingerprints'. A fingerprint is defined as a group of motifs excised from conserved regions of a sequence alignment, whose diagnostic power or potency is refined by iterative database scanning (in this case the OWL composite sequence database). Generally, the motifs do not overlap, but are separated along a sequence, though they may be contiguous in 3D-space. The use of groups of independent, linearly- or spatially-distinct motifs allows protein folds and functionalities to be characterised more flexibly and powerfully than conventional single-component patterns or regular expressions. The current version of the database contains 200 entries (encoding 950 motifs), covering a wide range of globular and membrane proteins, modular polypeptides, and so on. The growth of the database is influenced by a number of factors; e.g. the use of multiple motifs; the maximisation of sequence information through iterative database scanning; and the fact that the database searched is a large composite. The information contained within PRINTS is distinct from, but complementary to the consensus expressions stored in the widely-used PROSITE dictionary of patterns.
