Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
Short motifs are known to play diverse roles in proteins, such as mediating interactions with other molecules, binding to membranes, or conducting a specific biological function. Standard approaches currently employed to detect short motifs in proteins search for enrichment of amino acid motifs, considering mostly the sequence information. Here, we present a new approach to search for common motifs (protein signatures) that share both physicochemical and structural properties, looking simultaneously at different features. Our method takes an amino acid sequence as input and translates it to a new alphabet that reflects its intrinsic structural and chemical properties. Using the MEME search algorithm, we identified protein signatures within subsets of proteins which encompass common sequence and structural information. We demonstrated that we can detect enriched structural motifs, such as the amphipathic helix, from large datasets of linear sequences, as well as predict common structural properties (such as disorder, surface accessibility, or secondary structure) of known functional motifs. Finally, we applied the method to the yeast protein interactome and identified novel putative interacting motifs. We propose that our approach can be applied to de novo protein function prediction given either sequence or structural information. Proteins 2013; © 2012 Wiley Periodicals, Inc.
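
The alphabet-translation step described in this abstract can be illustrated with a minimal Python sketch. The four-class grouping below (`PROPERTY_CLASSES`) and the function name `translate` are hypothetical choices made for illustration; the abstract does not specify the alphabet the authors used, and their alphabet also encodes structural properties that this sketch ignores.

```python
# Hypothetical reduced alphabet: one letter per coarse physicochemical class.
# This grouping is an illustrative assumption, not the alphabet from the paper.
PROPERTY_CLASSES = {
    "h": "AVLIMFWC",  # hydrophobic
    "p": "STNQYGP",   # polar or small
    "b": "KRH",       # basic
    "a": "DE",        # acidic
}

# Invert the table so each residue maps directly to its class letter.
RESIDUE_TO_CLASS = {
    residue: symbol
    for symbol, residues in PROPERTY_CLASSES.items()
    for residue in residues
}


def translate(sequence: str, unknown: str = "x") -> str:
    """Rewrite an amino acid sequence in the reduced property alphabet."""
    return "".join(RESIDUE_TO_CLASS.get(aa, unknown) for aa in sequence.upper())


if __name__ == "__main__":
    print(translate("MKDEL"))
```

Sequences rewritten this way could then be passed to a motif finder that accepts a custom alphabet; residues outside the 20 standard amino acids are mapped to the placeholder `x`.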

2.
Computational methods such as sequence alignment and motif construction are useful in grouping related proteins into families, as well as in helping to annotate new proteins of unknown function. These methods identify conserved amino acids in protein sequences, but cannot determine the specific functional or structural roles of conserved amino acids without additional study. In this work, we present 3MATRIX (http://3matrix.stanford.edu) and 3MOTIF (http://3motif.stanford.edu), a web-based sequence motif visualization system that displays sequence motif information in its appropriate three-dimensional (3D) context. This system is flexible in that users can enter sequences, keywords, structures or sequence motifs to generate visualizations. In 3MOTIF, users can search using discrete sequence motifs such as PROSITE patterns, eMOTIFs, or any other regular expression-like motif. Similarly, 3MATRIX accepts an eMATRIX position-specific scoring matrix, or will convert a multiple sequence alignment block into an eMATRIX for visualization. Each query motif is used to search the protein structure database for matches, in which the motif is then visually highlighted in three dimensions. Important properties of motifs, such as sequence conservation and solvent-accessible surface area, are also displayed in the visualizations, using carefully chosen color shading schemes.
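
To make the idea of a "regular expression-like motif" concrete, the following Python sketch converts a PROSITE-style pattern into a standard regular expression and scans a sequence with it. This is not the 3MOTIF implementation; the function names are hypothetical, and the converter only handles the common syntax elements (`-` separators, `x` wildcards, `[...]` allowed sets, `{...}` forbidden sets, `(n)` or `(n,m)` repeats, and `<` / `>` terminal anchors outside brackets).

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into Python regex syntax."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # {ABC} means "any residue except A, B, C".
        element = element.replace("{", "[^").replace("}", "]")
        # (n) and (n,m) are repeat counts.
        element = element.replace("(", "{").replace(")", "}")
        # Lowercase x matches any residue.
        element = element.replace("x", ".")
        # < and > anchor the pattern to the N- or C-terminus.
        element = element.replace("<", "^").replace(">", "$")
        parts.append(element)
    return "".join(parts)


def find_motif(pattern: str, sequence: str):
    """Return (start, matched_text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # N-glycosylation site pattern as an example input.
    print(prosite_to_regex("N-{P}-[ST]-{P}"))
    print(find_motif("N-{P}-[ST]-{P}", "AANGSAA"))
```

In a structure-aware viewer, the returned start positions would be mapped onto residue numbers in a 3D structure; that mapping is outside the scope of this sketch.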

3.
We present an algorithm to detect protein sub-structural motifs from primary sequence. The input to the algorithm is a set of aligned multiple protein sequences. It uses wavelet transforms to decompose protein sequences represented numerically by different indices (such as polarity, accessible surface area or electron-ion interaction potentials of the amino acids). The numerical representation of a protein sequence correlates significantly with its biological activity, so common motifs are expected to be observable in the wavelet spectrum. The decomposed signals are then up-sampled, and similarity search techniques are used to identify similar regions across all the proteins at multiple scales. Results indicate that wavelet transform techniques are a promising approach for rapid motif detection.
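
The first two steps of this pipeline (numerical encoding, then multi-scale decomposition) can be sketched in plain Python. Two substitutions are made here and should be read as assumptions: the Kyte-Doolittle hydropathy scale stands in for the indices named in the abstract, and the Haar wavelet stands in for whichever wavelet family the authors used. The up-sampling and similarity search steps are not shown.

```python
import math

# Kyte-Doolittle hydropathy values, used here as a stand-in numerical index.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence):
    """Map each residue to its numerical index (0.0 for unknown residues)."""
    return [HYDROPATHY.get(aa, 0.0) for aa in sequence.upper()]


def haar_step(signal):
    """One level of the Haar transform: (approximation, detail) coefficients."""
    if len(signal) % 2:
        signal = signal + [signal[-1]]  # pad odd-length input with last value
    root2 = math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Apply haar_step repeatedly; returns final approximation and per-level details."""
    approx, details = list(signal), []
    for _ in range(levels):
        if len(approx) < 2:
            break
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


if __name__ == "__main__":
    approx, details = haar_decompose(encode("MKVLAAGIDEKR"), levels=2)
    print(approx)
    print(details)
```

Each entry in `details` describes variation at one scale, so regions with similar coefficients across several aligned proteins would be candidates for a shared motif.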

4.
Bacterial biodiversity at the species level, in terms of gene acquisition or loss, is so immense that it raises the question of how essential chromosomal regions are spared from uncontrolled rearrangements. Protection of the genome likely depends on specific DNA motifs that impose limits on the regions that undergo recombination. Although most such motifs remain unidentified, they are theoretically predictable based on their genomic distribution properties. We examined the distribution of the “crossover hotspot instigator,” or Chi, in Escherichia coli, and found that its exceptional distribution is restricted to the core genome common to three strains. We then formulated a set of criteria that were incorporated in a statistical model to search core genomes for motifs potentially involved in genome stability in other species. Our strategy led us to identify and biologically validate two distinct heptamers that possess Chi properties, one in Staphylococcus aureus, and the other in several streptococci. This strategy paves the way for wide-scale discovery of other important functional noncoding motifs that distinguish core genomes from the strain-variable regions.

5.
Loh E, Loeb LA. DNA Repair, 2005, 4(12): 5921-1398
DNA polymerases of Family A catalyze the addition of deoxynucleotides to a primer with high efficiency, processivity, and selectivity, properties that are critical to their function both in nature and in the laboratory. These polymerases tolerate many amino acid substitutions, even in regions that are evolutionarily conserved. This tolerance can be exploited to create DNA polymerases with novel properties and altered substrate specificities, using rational design and molecular evolution. These efforts have focused mainly on the Family A DNA polymerases (Taq, E. coli Pol I, and T7) because they are widely used in biotechnology today. The redesign of polymerases often requires knowledge of the function of specific residues in the protein, including those located in six evolutionarily conserved regions. The best characterized of these are motifs A and B, which regulate the fidelity of replication and the incorporation of nucleotide analogs such as dideoxynucleotides. Regions that remain to be more thoroughly characterized are motif C, which is critical for catalysis, and motifs 1, 2 and 6, all of which bind to DNA primer or template. Several recently identified mutants able to incorporate nucleotides with bulky adducts have mutations that are not located within conserved regions and warrant further study. Analysis of these mutants will help advance our understanding of how DNA polymerases select bases with high fidelity.

6.
7.
Local structural disorder imparts plasticity on linear motifs   (Cited by: 5; self-citations: 0; citations by others: 5)
MOTIVATION: The dynamic nature of protein interaction networks requires fast and transient molecular switches. The underlying recognition motifs (linear motifs, LMs) are usually short and evolutionarily variable segments, which in several cases, such as phosphorylation sites or SH3-binding regions, fall into locally disordered regions. We probed the generality of this phenomenon by predicting the intrinsic disorder of all LM-containing proteins listed in the Eukaryotic Linear Motif (ELM) database. RESULTS: We demonstrated that LMs are, on average, embedded in locally unstructured regions, while their amino acid composition and charge/hydropathy properties exhibit a mixture characteristic of folded and disordered proteins. Overall, LMs are constructed by grafting a few specificity-determining residues favoring structural order on a highly flexible carrier region. These results establish a connection between LMs and molecular recognition elements of intrinsically unstructured proteins (IUPs), which realize a non-conventional mode of partner binding, mostly in regulatory functions.

8.
Eight amino acid permease genes from the protozoan parasite Leishmania donovani (AAPLDs) were cloned, sequenced, and shown to be expressed in promastigotes. Seven of these belong to the amino acid transporter-1 and one to the amino acid polyamino-choline superfamilies. Using these sequences as well as known and characterized amino acid permease genes from all kingdoms, a training set was established and used to search for motifs, using the MEME motif discovery tool. This study revealed two motifs that are specific to the genus Leishmania, four to the family Trypanosomatidae, and a single motif that is common between Trypanosomatidae and mammalian systems A1 and N. Interestingly, most of these motifs are clustered in two regions of 50-60 amino acids. BLAST search analyses indicated a close relationship between the L. donovani and Trypanosoma brucei amino acid permeases. The results of this work describe the cloning of the first amino acid permease genes in parasitic protozoa and contribute to the understanding of amino acid permease evolution in these organisms. Furthermore, the identification of genus-specific motifs in these proteins might be useful for better understanding parasite physiology within its hosts.

9.
GlobPlot: Exploring protein sequences for globularity and disorder   (Cited by: 2; self-citations: 0; citations by others: 2)
A major challenge in the proteomics and structural genomics era is to predict protein structure and function, including identification of those proteins that are partially or wholly unstructured. Non-globular sequence segments often contain short linear peptide motifs (e.g. SH3-binding sites) which are important for protein function. We present here a new tool for discovery of such unstructured, or disordered, regions within proteins. GlobPlot (http://globplot.embl.de) is a web service that allows the user to plot the tendency within the query protein for order/globularity and disorder. We show examples with known proteins where it successfully identifies inter-domain segments containing linear motifs, and also apparently ordered regions that do not contain any recognised domain. GlobPlot may be useful in domain hunting efforts. The plots indicate that instances of known domains may often contain additional N- or C-terminal segments that appear ordered. Thus GlobPlot may be of use in the design of constructs corresponding to globular proteins, as needed for many biochemical studies, particularly structural biology. GlobPlot has a pipeline interface, GlobPipe, for the advanced user to do whole proteome analysis. GlobPlot can also be used as a generic infrastructure package for graphical display of any possible propensity.
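
One simple way to plot a "tendency" along a sequence is a running sum of per-residue propensities, where rising stretches suggest one state and falling stretches the other. The Python sketch below illustrates that general idea only. The propensity values in `TOY_PROPENSITY` are invented for illustration and are not GlobPlot's parameters, and the function names are hypothetical.

```python
from itertools import accumulate

# Invented toy values: positive favours disorder, negative favours order.
# These are NOT the propensities used by GlobPlot.
TOY_PROPENSITY = {
    "P": 1.0, "S": 0.5, "G": 0.5, "E": 0.25, "K": 0.25,
    "L": -1.0, "I": -1.0, "V": -0.75, "F": -1.0, "W": -1.0,
}


def running_sum(sequence, propensity, default=0.0):
    """Cumulative sum of per-residue propensities along the sequence."""
    return list(accumulate(propensity.get(aa, default) for aa in sequence.upper()))


def rising_segments(curve, min_len=3):
    """Return (start, end) index pairs, inclusive, where the curve keeps rising."""
    segments, start, previous = [], None, 0.0
    for i, value in enumerate(curve):
        if value > previous:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                segments.append((start, i - 1))
            start = None
        previous = value
    if start is not None and len(curve) - start >= min_len:
        segments.append((start, len(curve) - 1))
    return segments


if __name__ == "__main__":
    curve = running_sum("LLPSPSLL", TOY_PROPENSITY)
    print(curve)
    print(rising_segments(curve))
```

Because any per-residue table can be passed as `propensity`, the same two functions work for other scales, which mirrors the abstract's remark about displaying arbitrary propensities.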

10.
11.
In this paper, we present an updated classification of the ubiquitous MIP (Major Intrinsic Protein) family proteins, including 153 fully or partially sequenced members available in public databases. Presently, about 30 of these proteins have been functionally characterized, exhibiting essentially two distinct types of channel properties: (1) specific water transport by the aquaporins, and (2) transport of small neutral solutes, such as glycerol, by the glycerol facilitators. Sequence alignments were used to predict amino acids and motifs that discriminate channel specificity. The protein sequences were also analyzed using statistical tools (comparisons of means and correspondence analysis). Five key positions were clearly identified where the residues are specific for each functional subgroup and exhibit highly dissimilar physico-chemical properties. Moreover, we have found that the putative channels for small neutral solutes clearly differ from the aquaporins in the amino acid content and the length of predicted loop regions, suggesting a substrate filter function for these loops. From these results, we propose a signature pattern for water transport.

12.
Discovering structural correlations in alpha-helices.   (Cited by: 5; self-citations: 2; citations by others: 3)
We have developed a new representation for structural and functional motifs in protein sequences based on correlations between pairs of amino acids and applied it to alpha-helical and beta-sheet sequences. Existing probabilistic methods for representing and analyzing protein sequences have traditionally assumed conditional independence of evidence. In other words, amino acids are assumed to have no effect on each other. However, analyses of protein structures have repeatedly demonstrated the importance of interactions between amino acids in conferring both structure and function. Using Bayesian networks, we are able to model the relationships between amino acids at distinct positions in a protein sequence in addition to the amino acid distributions at each position. We have also developed an automated program for discovering sequence correlations using standard statistical tests and validation techniques. In this paper, we test this program on sequences from secondary structure motifs, namely alpha-helices and beta-sheets. In each case, the correlations our program discovers correspond well with known physical and chemical interactions between amino acids in structures. Furthermore, we show that, using different chemical alphabets for the amino acids, we discover structural relationships based on the same chemical principle used in constructing the alphabet. This new representation of 3-dimensional features in protein motifs, such as those arising from structural or functional constraints on the sequence, can be used to improve sequence analysis tools including pattern analysis and database search.
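
A common way to quantify dependence between two positions in a set of aligned sequences is mutual information, which is zero when the positions are statistically independent. The Python sketch below uses it as a stand-in; the abstract only says "standard statistical tests", so the choice of mutual information and the function name are assumptions, and no significance testing or Bayesian network construction is shown.

```python
import math
from collections import Counter


def mutual_information(sequences, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(sequences)
    pair_counts = Counter((s[i], s[j]) for s in sequences)
    i_counts = Counter(s[i] for s in sequences)
    j_counts = Counter(s[j] for s in sequences)
    total = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = i_counts[a] / n
        p_b = j_counts[b] / n
        # Each observed pair contributes p(a,b) * log2(p(a,b) / (p(a) p(b))).
        total += p_ab * math.log2(p_ab / (p_a * p_b))
    return total


if __name__ == "__main__":
    # Toy alignment: columns 0 and 2 always pair E with K or K with E,
    # as oppositely charged residues might in a salt bridge.
    toy_alignment = ["EAK", "EAK", "KAE", "KAE"]
    print(mutual_information(toy_alignment, 0, 2))
    print(mutual_information(toy_alignment, 0, 1))
```

On small alignments, raw mutual information is biased upward by sampling noise, so in practice a significance test or shuffled-column baseline would be needed before treating a high score as a real correlation.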

13.
MOTIVATION: Identification of short conserved sequence motifs common to a protein family or superfamily can be more useful than overall sequence similarity in suggesting the function of novel gene products. Locating motifs still requires expert knowledge, as automated methods using stringent criteria may not differentiate subtle similarities from statistical noise. RESULTS: We have developed a novel automatic method, based on patterns of conservation of 237 physical-chemical properties of amino acids in aligned protein sequences, to find related motifs in proteins with little or no overall sequence similarity. As an application, our web-server MASIA identified 12 property-based motifs in the apurinic/apyrimidinic endonuclease (APE) family of DNA-repair enzymes of the DNase-I superfamily. Searching with these motifs located distantly related representatives of the DNase-I superfamily, such as Inositol 5'-polyphosphate phosphatases in the ASTRAL40 database, using a Bayesian scoring function. Other proteins containing APE motifs had no overall sequence or structural similarity. However, all were phosphatases and/or had a metal ion binding active site. Thus our automated method can identify discrete elements in distantly related proteins that define local structure and aspects of function. We anticipate that our method will complement existing ones to functionally annotate novel protein sequences from genomic projects. AVAILABILITY: MASIA WEB site: http://www.scsb.utmb.edu/masia/masia.html SUPPLEMENTARY INFORMATION: The dendrogram of 42 APE sequences used to derive motifs is available on http://www.scsb.utmb.edu/comp_biol.html/DNA_repair/publication.html

14.
A direct involvement of the PreS domain of the hepatitis B virus (HBV) large envelope protein, and in particular amino acid residues 21 to 47, in virus attachment to hepatocytes has been suggested by many previous studies. Several PreS-interacting proteins have been identified. However, they share few common sequence motifs, and a bona fide cellular receptor for HBV remains elusive. In this study, we aimed to identify PreS-interacting motifs and to search for novel HBV-interacting proteins and the long-sought receptor. PreS fusion proteins were used as baits to screen a phage display library of random peptides. A group of PreS-binding peptides were obtained. These peptides could bind to amino acids 21 to 47 of PreS1 and shared a linear motif (W1T2X3W4W5) sufficient for binding specifically to PreS and viral particles. Several human proteins with such a motif were identified through BLAST search. Analysis of their biochemical and structural properties suggested that lipoprotein lipase (LPL), a key enzyme in lipoprotein metabolism, might interact with PreS and HBV particles. The interaction of HBV with LPL was demonstrated by in vitro binding, virus capture, and cell attachment assays. These findings suggest that LPL may play a role in the initiation of HBV infection. Identification of peptides and protein ligands corresponding to LPL that bind to the HBV envelope will offer new therapeutic strategies against HBV infection.

15.
Previously we presented the purification, biochemical characterization, and cloning of a cationic peroxidase isoenzyme (CysPrx) from artichoke (Cynara cardunculus subsp. scolymus (L.) Hegi) leaves. The protein was shown to have some interesting properties, suggesting that CysPrx could be considered a potential candidate for industrial application. In addition, from the CysPrx sequence, two full-length cDNAs, CysPrx1 and CysPrx2, differing by three amino acids, were isolated. A three-dimensional model was predicted from CysPrx1 by homology modeling, using two different computational tools. Herein we discuss the roles of particular amino acid residues and structural motifs or regions of both deduced sequences, with the aim of better understanding the relationship between the new plant peroxidase isoenzymes and their physiological substrates. Additionally, the obtained information may lead to new methods for improving the stability of the enzyme in several processes of biotechnological interest for peroxidase applications.

16.
17.
18.
The EMOTIF database is a collection of more than 170 000 highly specific and sensitive protein sequence motifs representing conserved biochemical properties and biological functions. These protein motifs are derived from 7697 sequence alignments in the BLOCKS+ database (released on June 23, 2000) and all 8244 protein sequence alignments in the PRINTS database (version 27.0) using the emotif-maker algorithm developed by Nevill-Manning et al. (Nevill-Manning,C.G., Wu,T.D. and Brutlag,D.L. (1998) Proc. Natl Acad. Sci. USA, 95, 5865-5871; Nevill-Manning,C.G., Sethi,K.S., Wu,T. D. and Brutlag,D.L. (1997) ISMB-97, 5, 202-209). Since the amino acids and the groups of amino acids in these sequence motifs represent critical positions conserved in evolution, search algorithms employing the EMOTIF patterns can identify and classify more widely divergent sequences than methods based on global sequence similarity. The emotif protein pattern database is available at http://motif.stanford.edu/emotif/.

19.
Six putative ATP-binding motifs of SecA protein were altered by oligonucleotide-directed mutagenesis to try to define the ATP-binding regions of this multifunctional protein. The effects of the mutations were analysed by genetic and biochemical assays. The results show that SecA contains two essential ATP-binding domains. One domain is responsible for high-affinity ATP binding and contains motifs A0 and B0, located at amino acid residues 102-109 and 198-210, respectively. A second domain is responsible for low-affinity ATP binding and contains motif A3 and a predicted B motif, located at amino acid residues 503-511 and 631-653, respectively. The ATP-binding properties of both domains were essential for SecA-dependent translocation ATPase and in vitro protein translocation activities. The significance of these findings for the mechanism of SecA-dependent protein translocation is discussed.

20.
Statistical methods have been developed for finding local patterns, also called motifs, in multiple protein sequences. The aligned segments may imply functional or structural core regions. However, the existing methods often have difficulties in aligning multiple proteins when sequence residue identities are low (e.g., less than 25%). In this article, we develop a Bayesian model and Markov chain Monte Carlo (MCMC) methods for identifying subtle motifs in protein sequences. Specifically, a motif is defined not only in terms of specific sites characterized by amino acid frequency vectors, but also as a combination of secondary characteristics such as hydrophobicity, polarity, etc. Markov chain Monte Carlo methods are proposed to search for a motif pattern with high posterior probability under the new model. A special MCMC algorithm is developed, involving transitions between state spaces of different dimensions. The proposed methods were supported by a simulation study. They were then tested on two real datasets: a group of helix-turn-helix proteins, and one set from the CATH Protein Structure Classification Database. Statistical comparisons showed that the new approach worked better than a typical Gibbs sampling approach based only on an amino acid model.
