Similar literature
 20 similar articles found (search time: 0 ms)
1.
Several methods have been developed for identifying more or less complex RNA structures in a genome. All these methods are based on the search for conserved primary and secondary sub-structures. In this paper, we present a simple formal representation of a helix, which is a combination of sequence and folding constraints, as a constrained regular expression. This representation allows us to develop a well-founded algorithm that searches for all approximate matches of a helix in a genome. The algorithm is based on an alignment graph constructed from several copies of a pushdown automaton, arranged one on top of another. This is a first attempt to take advantage of the possibilities of pushdown automata in the context of approximate matching. The worst-case time complexity is O(krpn), where k is the error threshold, n the size of the genome, p the size of the secondary expression, and r its number of union symbols. We then extend the algorithm to search for pseudo-knots and secondary structures containing an arbitrary number of helices.
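
To make the notion of an approximate helix match concrete, here is a minimal brute-force sketch in Python. It is not the pushdown-automaton algorithm of the abstract: it only checks the folding constraint (two arms that pair up around a loop), ignores sequence constraints, counts only mismatched base pairs as errors, and accepts only Watson-Crick pairs. The function name `find_helices` and all of its parameters are illustrative assumptions.

```python
# Watson-Crick pairs only; G-U wobble pairs are deliberately left out (assumption).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def find_helices(genome, stem_len, min_loop, max_loop, max_errors):
    """Return (start, loop_len, errors) for every stem-loop window whose
    two arms pair up with at most `max_errors` non-complementary positions."""
    hits = []
    n = len(genome)
    for start in range(n):
        for loop in range(min_loop, max_loop + 1):
            end = start + 2 * stem_len + loop  # exclusive end of the window
            if end > n:
                break  # longer loops cannot fit either
            left = genome[start:start + stem_len]
            right = genome[end - stem_len:end]
            # The i-th base of the left arm pairs with the i-th base
            # counted from the end of the right arm.
            errors = sum(
                1 for a, b in zip(left, reversed(right))
                if COMPLEMENT.get(a) != b
            )
            if errors <= max_errors:
                hits.append((start, loop, errors))
    return hits


if __name__ == "__main__":
    print(find_helices("GGGAAAACCC", stem_len=3, min_loop=3, max_loop=5, max_errors=0))
```

This scan costs time proportional to the genome length times the number of loop lengths times the stem length, and it handles neither insertions nor deletions, which is part of what the alignment-graph construction in the abstract addresses.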

2.
3.
We use methods from Data Mining and Knowledge Discovery to design an algorithm for detecting motifs in protein sequences. The algorithm assumes that a motif is constituted by the presence of a "good" combination of residues in appropriate locations of the motif. The algorithm attempts to compile such good combinations into a "pattern dictionary" by processing an aligned training set of protein sequences. The dictionary is subsequently used to detect motifs in new protein sequences. Statistical significance of the detection results is ensured by statistically determining the various parameters of the algorithm. Based on this approach, we have implemented a program called GYM. The Helix-Turn-Helix motif was used as a model system on which to test our program. The program was also extended to detect Homeodomain motifs. The detection results for the two motifs compare favorably with existing programs. In addition, the GYM program provides further useful information about a given protein sequence.
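
The idea of a pattern dictionary can be sketched as follows. The abstract does not spell out how GYM defines or selects "good" combinations, so this Python example makes its own assumptions: a combination is a pair of (position, residue) items, and it is "good" if it occurs in at least `min_support` training sequences. The names `build_dictionary` and `score_window` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_dictionary(aligned, min_support):
    """Collect pairs of (position, residue) items that co-occur in at
    least `min_support` sequences of an aligned, equal-length training set."""
    counts = Counter()
    for seq in aligned:
        items = list(enumerate(seq))  # (position, residue) items
        for pair in combinations(items, 2):
            counts[pair] += 1
    return {pair for pair, count in counts.items() if count >= min_support}


def score_window(window, dictionary):
    """Count how many dictionary combinations a candidate window contains.
    The window must be as long as the training sequences."""
    return sum(
        1 for (i, a), (j, b) in dictionary
        if window[i] == a and window[j] == b
    )


if __name__ == "__main__":
    training = ["ACDE", "ACDF", "GCDE"]
    dictionary = build_dictionary(training, min_support=2)
    print(score_window("ACDE", dictionary), score_window("GCDA", dictionary))
```

In a real detector the support threshold and the score cutoff would be chosen statistically, as the abstract indicates; here they are fixed by hand.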

4.
SUMMARY: HELM is a web tool designed to automate the analysis of protein sequences in search of alpha-helix motifs. This analysis can be useful in protein engineering studies aimed at identifying regions to be modified in order to obtain more suitable features of local and/or global stability. AVAILABILITY: The tool is available to academic and commercial institutions at the URL http://crisceb.area.na.cnr.it/angelo/PROTEIN_TOOLS/HELM/. CONTACT: angelo@crisceb.area.na.cnr.it

5.

Background

Pattern mining for biological sequences is an important problem in bioinformatics and computational biology. Biological data mining has an impact in diverse biological fields, such as the discovery of co-occurring biosequences, which is important for biological data analyses. Approaches for mining sequential patterns can discover motifs of all lengths in biological sequences. Nevertheless, traditional approaches mine DNA and protein data inefficiently, since these data have small alphabets and long sequences. Furthermore, gap constraints are important in computational biology since they cope with irrelevant regions, which are not conserved in the evolution of biological sequences.

Results

We devise an approach to efficiently mine sequential patterns (motifs) with gap constraints in biological sequences. The approach is the Depth-First Spelling algorithm for mining sequential patterns of biological sequences with Gap constraints (termed DFSG).

Conclusions

PrefixSpan is one of the most efficient methods in traditional approaches of mining sequential patterns, and it is the basis of GenPrefixSpan. GenPrefixSpan is an approach built on PrefixSpan with gap constraints, and therefore we compare DFSG with GenPrefixSpan. In the experimental results, DFSG mines biological sequences much faster than GenPrefixSpan.
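
As an illustration of what mining sequential patterns under gap constraints means, the following Python sketch grows patterns depth-first, one letter at a time. It is a generic pattern-growth routine written for this listing, not the DFSG or GenPrefixSpan algorithm; the function name `mine`, the definition of a gap (number of skipped letters between consecutive matched letters), and the support measure (number of sequences containing the pattern) are all assumptions.

```python
def mine(sequences, min_support, min_gap, max_gap, max_len):
    """Return {pattern: support} for patterns found in at least
    `min_support` sequences, where consecutive pattern letters are
    separated by between `min_gap` and `max_gap` skipped letters."""
    alphabet = sorted(set("".join(sequences)))
    results = {}

    def extend(pattern, ends):
        # ends maps a sequence index to the positions where an
        # occurrence of `pattern` can end in that sequence.
        for letter in alphabet:
            new_ends = {}
            for sid, positions in ends.items():
                seq = sequences[sid]
                nxt = set()
                for p in positions:
                    lo = p + 1 + min_gap
                    hi = min(p + 1 + max_gap, len(seq) - 1)
                    for q in range(lo, hi + 1):
                        if seq[q] == letter:
                            nxt.add(q)
                if nxt:
                    new_ends[sid] = nxt
            if len(new_ends) >= min_support:
                grown = pattern + letter
                results[grown] = len(new_ends)
                if len(grown) < max_len:
                    extend(grown, new_ends)  # depth-first growth

    for letter in alphabet:
        ends = {}
        for sid, seq in enumerate(sequences):
            positions = {i for i, c in enumerate(seq) if c == letter}
            if positions:
                ends[sid] = positions
        if len(ends) >= min_support:
            results[letter] = len(ends)
            if max_len > 1:
                extend(letter, ends)
    return results


if __name__ == "__main__":
    print(mine(["ACGT", "AGGT", "ACT"], min_support=2, min_gap=0, max_gap=1, max_len=3))
```

Because an infrequent pattern cannot become frequent by adding letters, each branch is abandoned as soon as its support drops below the threshold.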

6.
We propose a new algorithm for identifying cis-regulatory modules in genomic sequences. The proposed algorithm, named RISO, uses a new data structure, called box-link, to store the information about conserved regions that occur in a well-ordered and regularly spaced manner in the data set sequences. Conserved regions of this type, called structured motifs, are extremely relevant in the research of gene regulatory mechanisms since they can effectively represent promoter models. The complexity analysis shows a time and space gain over the best known exact algorithms that is exponential in the spacings between binding sites. A full implementation of the algorithm was developed and made available online. Experimental results show that the algorithm is much faster than existing ones, sometimes by more than four orders of magnitude. The application of the method to biological data sets shows its ability to extract relevant consensus sequences.

7.
We present an update of our method for systematic detection and evaluation of potential helix-turn-helix DNA-binding motifs in protein sequences [Dodd, I. and Egan, J. B. (1987) J. Mol. Biol. 194, 557-564]. The new method is considerably more powerful, detecting approximately 50% more likely helix-turn-helix sequences without an increase in false predictions. This improvement is due almost entirely to the use of a much larger reference set of 91 presumed helix-turn-helix sequences. The scoring matrix derived from this reference set has been calibrated against a large protein sequence database so that the score obtained by a sequence can be used to give a practical estimation of the probability that the sequence is a helix-turn-helix motif.
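
A scoring matrix derived from a reference set can be sketched as a position-specific log-odds matrix. The abstract does not give the exact weighting or calibration used by the authors, so the Python example below assumes a common variant: pseudocounted residue frequencies divided by a uniform background. The names `build_matrix` and `best_window` and the toy reference set are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_matrix(reference, background=None, pseudocount=1.0):
    """Build one {residue: log-odds score} column per motif position
    from equal-length reference sequences."""
    background = background or {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    total = len(reference) + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for i in range(len(reference[0])):
        counts = Counter(seq[i] for seq in reference)
        matrix.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background[aa])
            for aa in AMINO_ACIDS
        })
    return matrix


def best_window(sequence, matrix):
    """Slide the matrix along the sequence and return (score, start) of the
    best-scoring window. Only the 20 standard residues are supported."""
    width = len(matrix)
    scores = [
        (sum(matrix[i][sequence[s + i]] for i in range(width)), s)
        for s in range(len(sequence) - width + 1)
    ]
    return max(scores)


if __name__ == "__main__":
    matrix = build_matrix(["ACD", "ACD", "ACE"])
    print(best_window("GGACDGG", matrix))
```

Turning such a score into a probability, as the abstract describes, would additionally require scoring a large sequence database to obtain the background score distribution; that step is not shown.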

8.

Background

False occurrences of functional motifs in protein sequences can be considered as random events due solely to the sequence composition of a proteome. Here we use a numerical approach to investigate the random appearance of functional motifs with the aim of addressing biological questions such as: How are organisms protected from undesirable occurrences of motifs otherwise selected for their functionality? Has the random appearance of functional motifs in protein sequences been affected during evolution?

Results

Here we analyse the occurrence of functional motifs in random sequences and compare it to that observed in biological proteomes; the behaviour of random motifs is also studied. Most motifs exhibit a number of false positives significantly similar to the number of times they appear in randomized proteomes (=expected number of false positives). Interestingly, about 3% of the analysed motifs show a different kind of behaviour and appear in biological proteomes less than they do in random sequences. In some of these cases, a mechanism of evolutionary negative selection is apparent; this helps to prevent unwanted functionalities which could interfere with cellular mechanisms.

Conclusion

Our thorough statistical and biological analysis showed that several mechanisms and evolutionary constraints affect the appearance of functional motifs in protein sequences.
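
The comparison between observed motif counts and counts in randomized proteomes can be sketched as below. The randomization procedure is not specified in the abstract; this Python example assumes that each protein is shuffled independently, which preserves its length and composition. Motifs are written as regular expressions, matches are counted without overlap, and the function names are illustrative.

```python
import random
import re


def count_occurrences(pattern, proteins):
    """Count non-overlapping regex matches of a motif across all proteins."""
    regex = re.compile(pattern)
    return sum(len(regex.findall(p)) for p in proteins)


def expected_in_shuffled(pattern, proteins, trials=100, seed=0):
    """Estimate the expected number of chance occurrences by averaging
    the motif count over `trials` shuffled copies of the proteome."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        shuffled = []
        for protein in proteins:
            letters = list(protein)
            rng.shuffle(letters)  # keeps length and composition
            shuffled.append("".join(letters))
        total += count_occurrences(pattern, shuffled)
    return total / trials


if __name__ == "__main__":
    proteome = ["MANASAKNGTLL", "MNPTKK"]
    motif = "N[^P][ST]"
    print(count_occurrences(motif, proteome), expected_in_shuffled(motif, proteome))
```

A motif whose observed count falls well below the shuffled average would be a candidate for the negative selection discussed in the Results.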

9.
Information about the three-dimensional structure or function of a newly determined protein sequence can be obtained if the protein is found to contain a characterized motif or pattern of residues. Recently a database (PROSITE) has been established that contains 337 known motifs encoded as a list of allowed residue types at specific positions along the sequence. PROMOT is a FORTRAN computer program that takes a protein sequence and examines if it contains any of the motifs in PROSITE. The program also extends the definitions of patterns beyond those used in PROSITE to provide a simple, yet flexible, method to scan either a PROSITE or a user-defined pattern against a protein sequence database. Received on October 17, 1990; accepted on November 15, 1990
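
Scanning a sequence for a PROSITE-style pattern can be illustrated by translating the pattern into a regular expression. This Python sketch is not PROMOT, which is a FORTRAN program with its own extensions; it covers only the basic PROSITE syntax (`x` for any residue, `[...]` for allowed residues, `{...}` for forbidden residues, `(n)` or `(n,m)` for repeats, `<` and `>` for terminal anchors) and does not handle anchors placed inside brackets. The function name `prosite_to_regex` is an assumption.

```python
import re


def prosite_to_regex(pattern):
    """Translate a basic PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = repeat = ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        if element.endswith(")"):  # repetition such as x(3) or x(2,4)
            element, _, count = element[:-1].partition("(")
            repeat = "{" + count + "}"
        if element == "x":
            core = "."  # any residue
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"  # forbidden residues
        else:
            core = element  # a single residue or an [ABC] class
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex("N-{P}-[ST]-{P}.")
    for match in re.finditer(regex, "AANGSAAKNPTA"):
        print(match.start(), match.group())
```

Note that `re.finditer` reports non-overlapping matches only; a scanner that needs every overlapping hit would have to restart the search one position after each match start.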

10.
11.
We find recurring amino-acid residue packing patterns, or spatial motifs, that are characteristic of protein structural families, by applying a novel frequent subgraph mining algorithm to graph representations of protein three-dimensional structure. Graph nodes represent amino acids, and edges are chosen in one of three ways: first, using a threshold for contact distance between residues; second, using Delaunay tessellation; and third, using the recently developed almost-Delaunay edges. For a set of graphs representing a protein family from the Structural Classification of Proteins (SCOP) database, subgraph mining typically identifies several hundred common subgraphs corresponding to spatial motifs that are frequently found in proteins in the family but rarely found outside of it. We find that some of the large motifs map onto known functional regions in the two protein families explored in this study, i.e., serine proteases and kinases. We find that graphs based on almost-Delaunay edges significantly reduce the number of edges in the graph representation and hence offer a computational advantage, yet the patterns extracted from such graphs have a biological interpretation approximately equivalent to that of those extracted from distance-based graphs.
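
The first of the three edge definitions, a contact-distance threshold, is simple enough to sketch directly. The Python example below assumes that each residue is represented by a single coordinate (for instance its C-alpha atom) and that the threshold is given in the same units as the coordinates; the abstract specifies neither, and the function name `contact_graph` is hypothetical. Delaunay and almost-Delaunay edges are not shown.

```python
import math
from itertools import combinations


def contact_graph(residues, threshold):
    """Return the set of index pairs (i, j), i < j, whose residues lie
    within `threshold` of each other.

    residues: list of (name, (x, y, z)) tuples, one point per residue.
    """
    edges = set()
    for (i, (_, a)), (j, (_, b)) in combinations(enumerate(residues), 2):
        if math.dist(a, b) <= threshold:  # Euclidean distance, Python 3.8+
            edges.add((i, j))
    return edges


if __name__ == "__main__":
    toy = [("SER", (0, 0, 0)), ("HIS", (3, 0, 0)), ("ASP", (10, 0, 0))]
    print(contact_graph(toy, threshold=5.0))
```

The all-pairs loop is quadratic in the number of residues, which is acceptable for single proteins; the resulting labeled graphs would then be passed to a frequent subgraph miner.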

12.
A sensitive technique for protein sequence motif recognition based on neural networks has been developed. It involves three major steps. (1) At each appropriate alignment position of a set of N matched sequences, a set of N aligned oligopeptides is specified with preselected window length. N neural nets are subsequently and successively trained on N-1 amino acid spans after eliminating each ith oligopeptide. A test for recognition of each of the ith spans is performed. The average neural net recognition over N such trials is used as a measure of conservation for the particular windowed region of the multiple alignment. This process is repeated for all possible spans of given length in the multiple alignment. (2) The M most conserved regions are regarded as motifs and the oligopeptides within each are used to train intensively M individual neural networks. (3) The M networks are then applied in a search for related primary structures in a databank of known protein sequences. The oligopeptide spans in the database sequence with strongest neural net output for each of the M networks are saved and then scored according to the output signals and the proper combination that follows the expected N- to C-terminal sequence order. The motifs from the database with highest similarity scores can then be used to retrain the M neural nets, which can be subsequently utilized for further searches in the databank, thus providing even greater sensitivity to recognize distant familial proteins. This technique was successfully applied to the integrase, DNA-polymerase and immunoglobulin families.  相似文献   

13.

Background  

A large number of PROSITE patterns select false positives and/or miss known true positives. It is possible that, at least in some cases, the weak specificity and/or sensitivity of a pattern is due to the fact that one or more functional and/or structural key residues are not represented in the pattern. Multiple sequence alignments are commonly used to build functional sequence patterns. If residues structurally conserved in proteins sharing a function cannot be aligned in a multiple sequence alignment, they are likely to be missed in a standard pattern construction procedure.

14.
The problem of detecting DNA motifs with functional relevance in real biological sequences is difficult due to a number of biological, statistical and computational issues and also because of the lack of knowledge about the structure of searched patterns. Many algorithms are implemented in fully automated processes, which are often based upon a guess of input parameters from the user at the very first step. In this paper, we present a novel method for the detection of seeded DNA motifs, composed of regions with a different extent of variability. The method is based on a multi-step approach, which was implemented in a motif searching web tool (MOST). Overrepresented exact patterns are extracted from input sequences and clustered to produce motif core regions, which are then extended and scored to generate seeded motifs. The combination of automated pattern discovery algorithms and different display tools for the evaluation and selection of results at several analysis steps can potentially lead to much more meaningful results than complete automation can produce. Experimental results on different real yeast and human datasets proved the methodology to be a promising solution for finding seeded motifs. The MOST web tool is freely available at http://telethon.bio.unipd.it/bioinfo/MOST.

15.
MOTIVATION: Motif detection is an important component of the classification and annotation of protein sequences. A method for aligning motifs with an amino acid sequence is introduced. The motifs can be described by the secondary (e.g. functional or biophysical) characteristics of a signal or pattern to be detected. The results produced are based on the statistical relevance of the alignment. The method was designed to avoid the problems (e.g. over-fitting, biological interpretation and mathematical soundness) encountered in other methods currently available. RESULTS: The method was tested on lipoprotein signals in B. subtilis, yielding stable results. The results of signal prediction were consistent with other methods where literature was available. AVAILABILITY: An implementation of the motif alignment, refining and bootstrapping is available for public use online at http://www.expasy.org/tools/patoseq/

16.
MOTIVATION: Sequence databases represent an enormous resource of phylogenetic information, but there is a lack of tools for accessing that information in order to assess the amount of evolutionary information in these databases that may be suitable for phylogenetic reconstruction and for identifying areas of the taxonomy that are under-represented for specific gene sequences. RESULTS: We have developed TreeGeneBrowser which allows inspection and evaluation of gene sequence data for phylogenetic reconstruction. This program improves the efficiency of identification of genes that may be useful for particular phylogenetic studies and identifies taxa and taxonomic branches that are under-represented in sequence databases.  相似文献   

17.
We have developed a pattern comparative method for identifying functionally important motifs in protein sequences. The essence of most standard pattern comparative methods is a comparison of patterns occurring in different sequences using an optimized weight matrix. In contrast, our approach is based on a measure of similarity among all the candidate motifs within the same sequence. This method may prove to be particularly efficient for proteins encoding the same biochemical function, but with different primary sequences, and when tertiary structure information from one or more sequences is available. We have applied this method to a special class of zinc-binding enzymes known as endopeptidases.  相似文献   

18.
This work presents a method to compare local clusters of interacting residues as observed in a known three-dimensional protein structure with corresponding clusters inferred from homologous protein sequences, assuming conserved protein folding. For this purpose the local environment of a selected residue in a known protein structure is defined as the ensemble of amino acids in contact with it in the folded state. Using a multiple sequence alignment to identify corresponding residues in homologous proteins, a detailed comparison can be performed between the local environment of a selected amino acid in the template protein structure and the expected local environments at the sets of equivalent residues, derived from the aligned protein sequences. The comparison makes it possible to detect conserved local features such as hydrogen bonding or complementarity in residue substitution. A global measure of environmental similarity is also defined, to search for conserved amino acid clusters subject to functional or structural constraints. The proposed approach is useful for investigating protein function as well as for site-directed mutagenesis experiments, where appropriate amino acid substitutions can be suggested by observing naturally occurring protein variants.

19.
V. Kothekar, FEBS Letters, 1990, 274(1-2): 217-222
We report here a computer simulation of the three-dimensional structures of seven zinc finger motifs from cellular nucleic acid binding protein involved in negative feedback inhibition of cholesterol biosynthesis. The structures are optimised by a molecular mechanics technique, using steric constraints imposed by tetrahedral coordination of the zinc ion with Cys and His residues. We have also optimised the structure of finger I with a GpT sequence. The model for the interaction of the seven-fingered protein with single-stranded d(GTGCGGTG) from the sterol regulatory element (SRE) is given on the basis of these results. We also propose a scheme for recognition of small single-stranded DNA fragments by a multifingered regulatory protein.

20.