Similar literature
Found 20 similar articles (search time: 15 ms)
1.
Scorpion toxins are important physiological probes for characterizing ion channels. Molecular databases have limited functional annotation of scorpion toxins. Their function can be inferred by searching for conserved motifs in sequence signature databases that are derived statistically but are not necessarily biologically relevant. Mutation studies provide biological information on residues and positions important for structure-function relationship but are not normally used for extraction of binding motifs. 3D structure analyses also aid in the extraction of peptide motifs in which non-contiguous residues are clustered spatially. Here we present new, functionally relevant peptide motifs for ion channels, derived from the analyses of scorpion toxin native and mutant peptides.

2.
SPLASH: structural pattern localization analysis by sequential histograms (cited 6 times in total: 0 self-citations, 6 by others)
MOTIVATION: The discovery of sparse amino acid patterns that match repeatedly in a set of protein sequences is an important problem in computational biology. Statistically significant patterns, that is, patterns that occur more frequently than expected, may identify regions that have been preserved by evolution and which may therefore play a key functional or structural role. Sparseness can be important because a handful of non-contiguous residues may play a key role, while others, in between, may be changed without significant loss of function or structure. Similar arguments may be applied to conserved DNA patterns. Available sparse pattern discovery algorithms are either inefficient or impose limitations on the type of patterns that can be discovered. RESULTS: This paper introduces a deterministic pattern discovery algorithm, called Splash, which can find sparse amino or nucleic acid patterns matching identically or similarly in a set of protein or DNA sequences. Sparse patterns of any length, up to the size of the input sequence, can be discovered without significant loss in performance. Splash is extremely efficient and embarrassingly parallel by nature. Large databases, such as a complete genome or the non-redundant SWISS-PROT database, can be processed in a few hours on a typical workstation. Alternatively, a protein family or superfamily, with low overall homology, can be analyzed to discover common functional or structural signatures. Some examples of biologically interesting motifs discovered by Splash are reported for the histone I and for the G-Protein Coupled Receptor families. Due to its efficiency, Splash can be used to systematically and exhaustively identify conserved regions in protein family sets. These can then be used to build accurate and sensitive PSSM or HMM models for sequence analysis. AVAILABILITY: Splash is available to non-commercial research centers upon request, conditional on the signing of a test field agreement.
CONTACT: acal@us.ibm.com, Splash main page http://www.research.ibm.com/splash

3.
MOTIVATION: Blast programs are very efficient in finding relatively strong similarities but some very distantly related sequences are given a very high Expect value and are ranked very low in Blast results. We have developed Ballast, a program to predict local maximum segments (LMSs, i.e. sequence segments conserved relative to their flanking regions) from a single Blast database search and to highlight these divergent homologues. The TBlastN database searches can also be processed with the help of information from a joint BlastP search. RESULTS: We have applied the Ballast algorithm to BlastP searches performed with sequences belonging to well described dispersed families (aminoacyl-tRNA synthetases; helicases) against the SwissProt 38 database. We show that Ballast is able to build an appropriate conservation profile and that LMSs are predicted that are consistent with the signatures and motifs described in the literature. Furthermore, by comparing the Blast, PsiBlast and Ballast results obtained on a well defined database of structurally related sequences, we show that the LMSs provide a scoring scheme that can concentrate on top ranking distant homologues better than Blast. Using the graphical user interface available on the Web, specific LMSs may be selected to detect divergent homologues sharing the corresponding properties with the query sequence without requiring any additional database search.

4.
Annotation of the rapidly accumulating body of sequence data relies heavily on the detection of remote homologues and functional motifs in protein families. The most popular methods rely on sequence alignment. These include programs that use a scoring matrix to compare the probability of a potential alignment with random chance and programs that use curated multiple alignments to train profile hidden Markov models (HMMs). Related approaches depend on bootstrapping multiple alignments from a single sequence. However, alignment-based programs have limitations. They make the assumption that contiguity is conserved between homologous segments, which may not be true in genetic recombination or horizontal transfer. Alignments also become ambiguous when sequence similarity drops below 40%. This has kindled interest in classification methods that do not rely on alignment. An approach to classification without alignment based on the distribution of contiguous sequences of four amino acids (4-grams) was developed. Interest in 4-grams stemmed from the observation that almost all theoretically possible 4-grams (20^4) occur in natural sequences and the majority of 4-grams are uniformly distributed. This implies that the probability of finding identical 4-grams by random chance in unrelated sequences is low. A Bayesian probabilistic model was developed to test this hypothesis. For each protein family in Pfam-A and PIR-PSD, a feature vector called a probe was constructed from the set of 4-grams that best characterised the family. In rigorous jackknife tests, unknown sequences from Pfam-A and PIR-PSD were compared with the probes for each family. A classification result was deemed a true positive if the probe match with the highest probability was in first place in a rank-ordered list. This was achieved in 70% of cases. Analysis of false positives suggested that the precision might approach 85% if selected families were clustered into subsets. Case studies indicated that the 4-grams in common between an unknown and the best matching probe correlated with functional motifs from PRINTS. The results showed that remote homologues and functional motifs could be identified from an analysis of 4-gram patterns.
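
The 4-gram idea in this abstract can be illustrated with a small sketch. It only shows how contiguous 4-grams are extracted and intersected with a family "probe"; the Bayesian model of the paper is not reproduced. The probe here is simply the set of 4-grams shared by all family members, which is an assumption made for illustration, and the sequences are invented toy data.

```python
from collections import Counter

def four_grams(seq):
    """Count every contiguous 4-residue window in a protein sequence."""
    seq = seq.upper()
    # For sequences shorter than 4 residues the range is empty.
    return Counter(seq[i:i + 4] for i in range(len(seq) - 3))

def shared_four_grams(query, probe):
    """Return the 4-grams of the query that also occur in a family probe."""
    return set(four_grams(query)) & probe

# Hypothetical toy family, not taken from Pfam-A or PIR-PSD.
family = ["MKTAYIAKQR", "GGTAYIAKLL"]

# Assumption: the probe is the set of 4-grams present in every family member.
probe = set.intersection(*(set(four_grams(s)) for s in family))

hits = shared_four_grams("PPTAYIAKQW", probe)
print(sorted(hits))
```

A real classifier would weight each 4-gram by how strongly it discriminates the family rather than using a plain intersection.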

5.
Plants and metazoans share many similarities in terms of conserved proteins. Antibodies have been used extensively to detect remote homologues, many of which are yet to be identified conclusively. Genome sequencing and the creation of novel sequence or structure comparison programs have assisted greatly in the identification of distant protein homologues. The continuing development of new software algorithms and the combining of bioinformatics with proteomics offer hope that remaining homologues will be soon identified.

6.
Jaffee MB, Imperiali B. Biochemistry 2011, 50(35): 7557-7567
The central enzyme in N-linked glycosylation is the oligosaccharyl transferase (OTase), which catalyzes glycan transfer from a polyprenyldiphosphate-linked carrier to select asparagines within acceptor proteins. PglB from Campylobacter jejuni is a single-subunit OTase with homology to the Stt3 subunit of the complex multimeric yeast OTase. Sequence identity between PglB and Stt3 is low (17.9%); however, both have a similar predicted architecture and contain the conserved WWDxG motif. To investigate the relationship between PglB and other Stt3 proteins, sequence analysis was performed using 28 homologues from evolutionarily distant organisms. Since detection of small conserved motifs within large membrane-associated proteins is complicated by divergent sequences surrounding the motifs, we developed a program to parse sequences according to predicted topology and then analyze topologically related regions. This approach identified three conserved motifs that served as the basis for subsequent mutagenesis and functional studies. This work reveals that several inter-transmembrane loop regions of PglB/Stt3 contain strictly conserved motifs that are essential for PglB function. The recent publication of a 3.4 Å resolution structure of full-length C. lari OTase provides clear structural evidence that these loops play a fundamental role in catalysis [Lizak, C., et al. (2011) Nature 474, 350-355]. The current study provides biochemical support for the role of the inter-transmembrane domain loops in OTase catalysis and demonstrates the utility of combining topology prediction and sequence analysis for exposing buried pockets of homology in large membrane proteins. The described approach allowed detection of the catalytic motifs prior to availability of structural data and reveals additional catalytically relevant residues that are not predicted by structural data alone.
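
As a rough illustration of parsing a sequence by predicted topology and then scanning loop regions for a motif such as WWDxG, the following sketch may help. It is not the authors' program. The per-residue topology string and its labels ('M', 'o', 'i'), the helper name split_by_topology, and the sequence fragment are all hypothetical; a real workflow would take the labels from a topology predictor.

```python
import re
from itertools import groupby

def split_by_topology(seq, topo):
    """Split a sequence into (label, start, segment) runs of identical topology label."""
    if len(seq) != len(topo):
        raise ValueError("sequence and topology string must have equal length")
    regions, pos = [], 0
    for label, run in groupby(topo):
        n = len(list(run))
        regions.append((label, pos, seq[pos:pos + n]))
        pos += n
    return regions

# Hypothetical fragment with per-residue labels:
# 'M' = transmembrane, 'o' = loop outside, 'i' = loop inside.
seq = "LLLLLAAWWDYGKKLLLLL"
topo = "MMMMMoooooooooMMMMM"

# Keep only the loop regions, remembering where each one starts.
loops = [(start, seg) for label, start, seg in split_by_topology(seq, topo)
         if label != "M"]

# 'x' in WWDxG means any residue, which is '.' in a regular expression.
hits = [(start + m.start(), m.group())
        for start, seg in loops
        for m in re.finditer(r"WWD.G", seg)]
print(hits)
```

Restricting the scan to topologically equivalent regions is what lets a short motif stand out against the divergent sequence around it.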

7.
8.
H Jörnvall. FEBS Letters 1999, 456(1): 85-88
Motifer is a software tool able to find directly in nucleotide databases very distant homologues to an amino acid query sequence. It focuses searches on a specific amino acid pattern, scoring the matching and intervening residues as specified by the user. The program has been developed for searching databases of expressed sequence tags (ESTs), but it is also well suited to search genomic sequences. The query sequence can be a variable pattern with alternative amino acids or gaps and the sequences searched can contain introns or sequencing errors with accompanying frame shifts. Other features include options to generate a searchable output, set the maximal sequencing error frequency, limit searches to given species, or exclude already known matches. Motifer can find sequence homologues that other search algorithms would deem unrelated or would not find because of sequencing errors or too large a number of other homologues. The ability of Motifer to find relatives to a given sequence is exemplified by searches for members of the transforming growth factor-beta family and for proteins containing a WW-domain. The functions aimed at enhancing EST searches are illustrated by the 'in silico' cloning of a novel cytochrome P450 enzyme.

9.

Background  

Many proteins are highly modular, being assembled from globular domains and segments of natively disordered polypeptides. Linear motifs, short sequence modules functioning independently of protein tertiary structure, are most abundant in natively disordered polypeptides but are also found in accessible parts of globular domains, such as exposed loops. The prediction of novel occurrences of known linear motifs attempts the difficult task of distinguishing functional matches from stochastically occurring non-functional matches. Although functionality can only be confirmed experimentally, confidence in a putative motif is increased if a motif exhibits attributes associated with functional instances such as occurrence in the correct taxonomic range, cellular compartment, conservation in homologues and accessibility to interacting partners. Several tools now use these attributes to classify putative motifs based on confidence of functionality.

10.
BACKGROUND: The assembly of next-generation short-read sequencing data can result in a fragmented non-contiguous set of genomic sequences. Therefore a common step in a genome project is to join neighbouring sequence regions together and fill gaps. This scaffolding step is non-trivial and requires manually editing large blocks of nucleotide sequence. Joining these sequences together also hides the source of each region in the final genome sequence. Taken together these considerations may make reproducing or editing an existing genome scaffold difficult. METHODS: The software outlined here, "Scaffolder," is implemented in the Ruby programming language and can be installed via the RubyGems software management system. Genome scaffolds are defined using YAML, a data format which is both human and machine-readable. Command line binaries and extensive documentation are available. RESULTS: This software allows a genome build to be defined in terms of the constituent sequences using a relatively simple syntax. This syntax further allows unknown regions to be specified and additional sequence to be used to fill known gaps in the scaffold. Defining the genome construction in a file makes the scaffolding process reproducible and easier to edit compared with large FASTA nucleotide sequences. CONCLUSIONS: Scaffolder is easy-to-use genome scaffolding software which promotes reproducibility and continuous development in a genome project. Scaffolder can be found at http://next.gs.

11.
MOTIVATION: Short linear peptide motifs mediate protein-protein interaction, cell compartment targeting and represent the sites of post-translational modification. The identification of functional motifs by conventional sequence searches, however, is hampered by the short length of the motifs resulting in a large number of hits of which only a small portion is functional. RESULTS: We have developed a procedure for the identification of functional motifs, which scores pattern conservation in homologous sequences by taking explicitly into account the sequence similarity to the query sequence. For a further improvement of this method, sequence filters have been optimized to mask those sequence regions containing little or no linear motifs. The performance of this approach was verified by measuring its ability to identify 576 experimentally validated motifs among a total of 15 563 instances in a set of 415 protein sequences. Compared to a random selection procedure, the joint application of sequence filters and the novel scoring scheme resulted in a 9-fold enrichment of validated functional motifs on the first rank. In addition, only half as many hits need to be investigated to recover 75% of the functional instances in our dataset. Therefore, this motif-scoring approach should be helpful to guide experiments because it allows focusing on those short linear peptide motifs that have a high probability to be functional.
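
One way to picture a conservation score that accounts for similarity to the query is sketched below. The abstract does not give the actual scoring formula, so the weighting used here (each homologue contributes 1 minus its fractional identity to the query, so that conservation in distant homologues counts more) is purely an assumption, as are the function name, the pattern, and the toy sequences. The sketch also ignores alignment position and only checks whether the pattern occurs anywhere in the homologue.

```python
import re

def weighted_conservation(pattern, homologs):
    """Fraction of divergence-weighted homologues that still contain the pattern.

    homologs: list of (sequence, fractional identity to the query in [0, 1]).
    Assumed weighting: weight = 1 - identity.
    """
    regex = re.compile(pattern)
    total = sum(1.0 - ident for _, ident in homologs)
    if total == 0:
        return 0.0
    kept = sum(1.0 - ident for seq, ident in homologs if regex.search(seq))
    return kept / total

# Hypothetical homologues of a query that carries a P.LP-like motif.
homologs = [
    ("AAPPLPAA", 0.9),  # close homologue, motif present
    ("GGPALPGG", 0.5),  # distant homologue, motif present
    ("GGAAAAGG", 0.6),  # distant homologue, motif lost
]
score = weighted_conservation(r"P.LP", homologs)
print(round(score, 2))
```

With this weighting a motif retained only in near-identical sequences scores low, which matches the intuition that such conservation carries little evidence of function.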

12.
Linking similar proteins structurally is a challenging task that may help in finding the novel members of a protein family. In this respect, identification of conserved sequence can facilitate understanding and classifying the exact role of proteins. However, the exact role of these conserved elements cannot be elucidated without structural and physiochemical information. In this work, we present a novel desktop application MotViz designed for searching and analyzing the conserved sequence segments within protein structure. With MotViz, the user can extract a complete list of sequence motifs from loaded 3D structures, annotate the motifs structurally and analyze their physiochemical properties. The conservation value calculated for an individual motif can be visualized graphically. To check the efficiency, predicted motifs from the data sets of 9 protein families were analyzed and the MotViz algorithm was more efficient in comparison to other online motif prediction tools. Furthermore, a database was also integrated for storing, retrieving and performing the detailed functional annotation studies. In summary, MotViz effectively predicts motifs with high sensitivity and simultaneously visualizes them in 3D structures. Moreover, MotViz is user-friendly with optimized graphical parameters and better processing speed due to the inclusion of a database at the back end. MotViz is available at http://www.fi-pk.com/motviz.html.

13.
Soil bacteria are heavily exposed to environmental methylating agents such as methylchloride and may have special requirements for repair of alkylation damage on DNA. We have used functional complementation of an Escherichia coli tag alkA mutant to screen for 3-methyladenine DNA glycosylase genes in genomic libraries of the soil bacterium Bacillus cereus. Three genes were recovered: alkC, alkD and alkE. The amino acid sequence of AlkE is homologous to the E. coli AlkA sequence. AlkC and AlkD represent novel proteins without sequence similarity to any protein of known function. However, iterative and indirect sequence similarity searches revealed that AlkC and AlkD are distant homologues of each other within a new protein superfamily that is ubiquitous in the prokaryotic kingdom. Homologues of AlkC and AlkD were also identified in the amoebas Entamoeba histolytica and Dictyostelium discoideum, but no other eukaryotic counterparts of the superfamily were found. The alkC and alkD genes were expressed in E. coli and the proteins were purified to homogeneity. Both proteins were found to be specific for removal of N-alkylated bases, and showed no activity on oxidized or deaminated base lesions in DNA. B. cereus AlkC and AlkD thus define novel families of alkylbase DNA glycosylases within a new protein superfamily.

14.
15.
16.
Improved sequence alignment at low pairwise identity is important for identifying potential remote homologues in database searches and for obtaining accurate alignments as a prelude to modeling structures by homology. Our work is motivated by two observations: structural data provide superior training examples for developing techniques to improve the alignment of remote homologues; and general substitution patterns for remote homologues differ from those of closely related proteins. We introduce a new set of amino acid residue interchange matrices built from structural superposition data. These matrices exploit known structural homology as a means of characterizing the effect evolution has on residue-substitution profiles. Given their origin, it is not surprising that the individual residue-residue interchange frequencies are chemically sensible. The structural interchange matrices show a significant increase both in pairwise alignment accuracy and in functional annotation/fold recognition accuracy across distantly related sequences. We demonstrate improved pairwise alignment by using superpositions of homologous domains extracted from a structural database as a gold standard and go on to show an increase in fold recognition accuracy using a database of homologous fold families. This was applied to the unassigned open reading frames from the genome of Helicobacter pylori to identify five matches, two of which are not represented by new annotations in the sequence databases. In addition, we describe a new cyclic permutation strategy to identify distant homologues that experienced gene duplication and subsequent deletions. Using this method, we have identified a potential homologue to one additional previously unassigned open reading frame from the H. pylori genome.

17.
MOTIVATION: Most proteins have evolved to perform specific functions that are dependent on the adoption of well-defined three-dimensional (3D) structures. Specific patterns of conserved residues in amino acid sequences of divergently evolved proteins are frequently observed; these may reflect evolutionary restraints arising both from the need to maintain tertiary structure and the requirement to conserve residues more directly involved in function. Databases of such sequence patterns are valuable in identifying distant homologues, in predicting function and in the study of evolution. RESULTS: A fully automated database of protein sequence patterns, Functional Protein Sequence Pattern Database (FPSPD), has been derived from the analysis of the conserved residues that are predicted to be functional in structurally aligned homologous families in the HOMSTRAD database. Environment-dependent substitution tables, evolutionary trace analysis, solvent accessibility calculations and 3D-structures were used to obtain the FPSPD. The method yielded 3584 patterns that are considered functional and 3049 patterns that are probably functional. FPSPD could be useful for assigning a protein to a homologous superfamily and thereby providing clues about function. AVAILABILITY: FPSPD is available at http://www-cryst.bioc.cam.ac.uk/~fpspd/

18.
This work presents DNA sequence motifs from the internal transcribed spacer (ITS) of the nuclear rRNA repeat unit which are useful for the identification of five European and Asiatic truffles (Tuber magnatum, T. melanosporum, T. indicum, T. aestivum, and T. mesentericum). Truffles are edible mycorrhizal ascomycetes that show similar morphological characteristics but that have distinct organoleptic and economic values. A total of 36 out of 46 ITS1 or ITS2 sequence motifs have allowed an accurate in silico distinction of the five truffles to be made (i.e., by pattern matching and/or BLAST analysis on downloaded GenBank sequences and directly against GenBank databases). The motifs considered the intraspecific genetic variability of each species, including rare haplotypes, and assigned their respective species from either the ascocarps or ectomycorrhizas. The data indicate that short ITS1 or ITS2 motifs (≤ 50 bp in size) can be considered promising tools for truffle species identification. A dot blot hybridization analysis of T. magnatum and T. melanosporum compared with other close relatives or distant lineages allowed at least one highly specific motif to be identified for each species. These results were confirmed in a blind test which included new field isolates. The current work has provided a reliable new tool for a truffle oligonucleotide bar code and identification in ecological and evolutionary studies.

19.
20.
Short motifs are known to play diverse roles in proteins, such as in mediating the interactions with other molecules, binding to membranes, or conducting a specific biological function. Standard approaches currently employed to detect short motifs in proteins search for enrichment of amino acid motifs considering mostly the sequence information. Here, we presented a new approach to search for common motifs (protein signatures) which share both physicochemical and structural properties, looking simultaneously at different features. Our method takes as an input an amino acid sequence and translates it to a new alphabet that reflects its intrinsic structural and chemical properties. Using the MEME search algorithm, we identified the protein signatures within subsets of proteins which encompass common sequence and structural information. We demonstrated that we can detect enriched structural motifs, such as the amphipathic helix, from large datasets of linear sequences, as well as predicting common structural properties (such as disorder, surface accessibility, or secondary structures) of known functional motifs. Finally, we applied the method to the yeast protein interactome and identified novel putative interacting motifs. We propose that our approach can be applied for de novo protein function prediction given either sequence or structural information. Proteins 2013; © 2012 Wiley Periodicals, Inc.
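
The translation step described in this abstract, rewriting an amino acid sequence in an alphabet that reflects residue properties, might look roughly like the sketch below. The abstract does not specify the real alphabet, so the four property classes and their one-letter codes here are hypothetical and chosen only to show the mechanics; the translated strings would then be passed to a motif finder.

```python
# Hypothetical 4-class property alphabet; the abstract does not give the real one.
GROUPS = {
    "h": "AVLIMFWC",  # hydrophobic
    "p": "STNQYGPH",  # polar or small
    "b": "KR",        # basic (positively charged)
    "a": "DE",        # acidic (negatively charged)
}

# Invert the table once so each residue maps to its class letter.
RESIDUE_TO_CLASS = {aa: cls for cls, aas in GROUPS.items() for aa in aas}

def translate(seq, unknown="x"):
    """Rewrite a protein sequence in the reduced property alphabet."""
    return "".join(RESIDUE_TO_CLASS.get(aa, unknown) for aa in seq.upper())

print(translate("MKLDE"))
```

Sequences with different residues but the same property pattern collapse to the same string, which is what lets a sequence-based motif finder pick up shared physicochemical signatures.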
