
1.

Detection of generic spaced motifs using submotif pattern mining

Wijaya E, Rajaraman K, Yiu SM, Sung WK. Bioinformatics (Oxford, England), 2007, 23(12):1476-1485

MOTIVATION: Identifying motifs is one of the critical stages in studying the regulatory interactions of genes. Motifs can have complicated patterns. In particular, spaced motifs, an important class of motifs, consist of several short segments separated by spacers of different lengths. Locating spaced motifs is not trivial. Existing motif-finding algorithms are either designed for monad motifs (short contiguous patterns with some mismatches), make assumptions about the spacer lengths, or can handle at most two segments. An effective motif finder for generic spaced motifs is therefore highly desirable. RESULTS: This article proposes a novel approach for identifying spaced motifs with any number of spacers of different lengths. We introduce the notion of submotifs to capture the segments in the spaced motif and formulate the motif-finding problem as a frequent submotif mining problem. We provide an algorithm called SPACE to solve the problem. Based on experiments on real biological datasets, synthetic datasets and the motif assessment benchmarks by Tompa et al., we show that our algorithm performs better than existing tools for spaced motifs, with improvements in both sensitivity and specificity, and that for monads SPACE performs as well as other tools. AVAILABILITY: The source code is available upon request from the authors.
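
The frequent-pattern view of spaced motifs can be illustrated with a deliberately simplified sketch. Assumptions made here, not taken from the paper: only two exact-match segments of equal length k, a caller-supplied list of spacer lengths, and support measured as the number of sequences that contain a pattern at least once. This is not SPACE itself, which tolerates mismatches and handles any number of segments via submotif mining; the function names are hypothetical.

```python
from collections import defaultdict

def spaced_pattern_support(seqs, k, gaps):
    """Map each (left k-mer, gap, right k-mer) pattern to the number of sequences containing it."""
    support = defaultdict(set)
    for idx, s in enumerate(seqs):
        for g in gaps:
            span = 2 * k + g
            for start in range(len(s) - span + 1):
                left = s[start:start + k]
                right = s[start + k + g:start + span]
                support[(left, g, right)].add(idx)  # a set, so repeats within one sequence count once
    return {p: len(ids) for p, ids in support.items()}

def frequent_spaced_patterns(seqs, k, gaps, min_support):
    """Patterns present in at least min_support sequences, most frequent first."""
    counts = spaced_pattern_support(seqs, k, gaps)
    hits = [(p, c) for p, c in counts.items() if c >= min_support]
    return sorted(hits, key=lambda pc: (-pc[1], pc[0]))

seqs = ["TTACGAAAATGCGG", "CACGCCCCTGCA", "ACGGGGGTGCTT"]
print(frequent_spaced_patterns(seqs, k=3, gaps=[4], min_support=3))
```

Here the segments ACG and TGC separated by a 4-base spacer occur in all three sequences even though the spacer content differs, which is what makes spaced motifs hard for methods that assume contiguous patterns.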

2.

Finding motifs from all sequences with and without binding sites

Leung HC, Chin FY. Bioinformatics (Oxford, England), 2006, 22(18):2217-2223


3.

BioOptimizer: a Bayesian scoring function approach to motif discovery (total citations: 5; self-citations: 0; citations by others: 5)

Jensen ST, Liu JS. Bioinformatics (Oxford, England), 2004, 20(10):1557-1564


4.

Finding motifs using random projections (total citations: 19; self-citations: 0; citations by others: 19)

Buhler J, Tompa M. Journal of Computational Biology, 2002, 9(2):225-242


5.

A Monte Carlo EM Algorithm for De Novo Motif Discovery in Biomolecular Sequences (total citations: 1; self-citations: 0; citations by others: 1)

Bi C. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2009, 6(3):370-386

Motif discovery methods play pivotal roles in deciphering the genetic regulatory codes (i.e., motifs) in genomes and in locating conserved domains in protein sequences. The Expectation Maximization (EM) algorithm is one of the most popular methods for de novo motif discovery. Building on the position weight matrix (PWM) updating technique, this paper presents a Monte Carlo version of the EM motif-finding algorithm that performs stochastic sampling in local alignment space to overcome the main drawback of conventional EM: becoming trapped in a local optimum. The new algorithm is named the Monte Carlo EM Motif Discovery Algorithm (MCEMDA). MCEMDA starts from an initial model and then iteratively performs Monte Carlo simulation and parameter updates until convergence. A log-likelihood profiling technique, together with a top-k strategy, is introduced to cope with phase shifts and multimodality in the motif discovery problem. A novel grouping motif alignment (GMA) algorithm is designed to select motifs by clustering a population of candidate local alignments, and is successfully applied to subtle motif discovery. MCEMDA compares favorably with other popular PWM-based and word-enumerative motif algorithms when tested on simulated (l, d)-motif cases and on documented prokaryotic and eukaryotic DNA motif sequences. Finally, MCEMDA is applied to detect large blocks of conserved domains in protein benchmarks, where it performs strongly compared with other multiple sequence alignment methods.
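
MCEMDA iteratively updates a position weight matrix. The sketch below shows only that building block: estimating a PWM from equal-length aligned sites and using log-odds scores to find the best window in a sequence. The pseudocount of 0.5 and the uniform background are assumptions chosen here; the Monte Carlo sampling, top-k strategy and GMA clustering described above are not reproduced.

```python
import math

ALPHABET = "ACGT"

def build_pwm(sites, pseudocount=0.5):
    """Estimate per-column base probabilities from equal-length aligned sites."""
    width = len(sites[0])
    pwm = []
    for j in range(width):
        counts = {b: pseudocount for b in ALPHABET}  # pseudocounts avoid zero probabilities
        for site in sites:
            counts[site[j]] += 1
        total = sum(counts.values())
        pwm.append({b: counts[b] / total for b in ALPHABET})
    return pwm

def log_odds(pwm, kmer, background=0.25):
    """Log2 likelihood ratio of kmer under the PWM versus a uniform background (assumed)."""
    return sum(math.log2(col[b] / background) for col, b in zip(pwm, kmer))

def best_site(pwm, seq):
    """Return (score, start) of the highest-scoring window in seq."""
    w = len(pwm)
    return max((log_odds(pwm, seq[i:i + w]), i) for i in range(len(seq) - w + 1))

pwm = build_pwm(["TATAAT", "TATAAT", "TATGAT", "TACAAT"])
score, start = best_site(pwm, "GCGCTATAATGCGC")
print(start, round(score, 2))
```

In an EM-style loop, the best (or, in a Monte Carlo variant, a sampled) site from each sequence would be fed back into `build_pwm` to re-estimate the matrix.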

6.

PhyloGibbs: a Gibbs sampling motif finder that incorporates phylogeny

Siddharthan R, Siggia ED, van Nimwegen E. PLoS Computational Biology, 2005, 1(7):e67


7.

GAME: detecting cis-regulatory elements using a genetic algorithm (total citations: 3; self-citations: 0; citations by others: 3)

Wei Z, Jensen ST. Bioinformatics (Oxford, England), 2006, 22(13):1577-1584


8.

Accurate recognition of cis-regulatory motifs with the correct lengths in prokaryotic genomes

Li G, Liu B, Xu Y. Nucleic Acids Research, 2010, 38(2):e12

We present a new computational method for a classical problem, the identification of cis-regulatory motifs in a given set of promoter sequences, based on one key new idea. Instead of scoring candidate motifs individually, as all existing motif-finding programs do, our method uses a P-value to score groups of candidate motifs with similar sequences, called motif closures; this substantially improves prediction reliability over existing methods. Our new P-value scoring scheme is independent of sequence length, allowing predicted motifs of different lengths to be compared directly on the same footing. We have implemented this method as the Motif Recognition Computer (MREC) program and have tested MREC extensively on both simulated and biological data from prokaryotic genomes. Our test results indicate that MREC accurately picks out the actual motif, with the correct length, as the best-scoring candidate in the vast majority of cases in our test set. We compared our predictions with those of two motif-finding programs, Cosmo and MEME, and found that MREC outperforms both by a large margin across all test cases. The MREC program is available at http://csbl.bmb.uga.edu/~bingqiang/MREC1/.
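
The abstract does not give enough detail to reproduce MREC's closure-based P-value. As a generic illustration of P-value scoring only, the sketch below computes the probability that an exact motif occurs in at least k of n random sequences, assuming i.i.d. uniform bases and treating overlapping windows as independent (an approximation). Note that this naive P-value depends on both sequence length and motif length; the function names are hypothetical.

```python
from math import comb

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def occurrence_pvalue(k_hits, n_seqs, seq_len, motif_len, base_prob=0.25):
    """Chance that an exact motif appears in >= k_hits of n_seqs random sequences.

    Assumptions: i.i.d. uniform bases; windows treated as independent.
    """
    q = base_prob ** motif_len              # chance one window matches exactly
    windows = seq_len - motif_len + 1
    p_seq = 1 - (1 - q) ** windows          # chance a sequence has at least one match
    return binomial_tail(k_hits, n_seqs, p_seq)

print(occurrence_pvalue(5, 10, 100, 8), occurrence_pvalue(5, 10, 1000, 8))
```

The second call uses longer sequences and yields a larger (less significant) P-value for the same hit count, which shows why a scheme that is not tied to sequence length is attractive when comparing candidates.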

9.

Mclip: motif detection based on cliques of gapped local profile-to-profile alignments

Frickey T, Weiller G. Bioinformatics (Oxford, England), 2007, 23(4):502-503

A multitude of motif-finding tools have been published; they can generally be assigned to one of three classes: expectation-maximization, Gibbs sampling or enumeration. Irrespective of this grouping, most motif detection tools consider only similarities across ungapped sequence regions, which may cause short motifs located peripherally and at varying distances from a 'core' motif to be missed. We present a new method, adding to the set of expectation-maximization approaches, that permits the use of gapped alignments for motif elucidation. Availability: The program is available for download from: http://bioinfoserver.rsbs.anu.edu.au/downloads/mclip.jar. Supplementary information: http://bioinfoserver.rsbs.anu.edu.au/utils/mclip/info.php.
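
Mclip's key ingredient is gapped local alignment. The sketch below is a plain Smith-Waterman local alignment of two sequences with a linear gap penalty; the match, mismatch and gap scores are arbitrary assumptions, and it aligns sequences rather than profiles, so it only illustrates how gaps let a peripheral segment stay aligned at a different distance from a core. It is not Mclip's profile-to-profile method or its clique search.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score and aligned strings (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell ends the local alignment.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        score = H[i][j]
        if score == H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))

print(smith_waterman("ACGTTTGCA", "ACGTGCA"))
```

With these scores, the gapped alignment of the two segments ACGT and GCA (score 10) beats the best ungapped alignment (score 8), the kind of signal an ungapped motif finder would miss.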

10.

CoMoDis: composite motif discovery in mammalian genomes

Donaldson IJ, Göttgens B. Nucleic Acids Research, 2007, 35(1):e1


11.

Modeling within-motif dependence for transcription factor binding site predictions (total citations: 8; self-citations: 0; citations by others: 8)

Zhou Q, Liu JS. Bioinformatics (Oxford, England), 2004, 20(6):909-916


12.

Finding motifs in the twilight zone (total citations: 8; self-citations: 0; citations by others: 8)

Keich U, Pevzner PA. Bioinformatics (Oxford, England), 2002, 18(10):1374-1381


13.

A novel Bayesian DNA motif comparison method for clustering and retrieval

Habib N, Kaplan T, Margalit H, Friedman N. PLoS Computational Biology, 2008, 4(2):e1000010


14.

An algorithm for finding protein-DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments (total citations: 19; self-citations: 0; citations by others: 19)

Liu XS, Brutlag DL, Liu JS. Nature Biotechnology, 2002, 20(8):835-839


15.

Assessing the effects of symmetry on motif discovery and modeling

Motlhabi LM, Stormo GD. PLoS ONE, 2011, 6(9):e24908


16.

Transcription factor binding site identification using the self-organizing map (total citations: 4; self-citations: 0; citations by others: 4)

Mahony S, Hendrix D, Golden A, Smith TJ, Rokhsar DS. Bioinformatics (Oxford, England), 2005, 21(9):1807-1814

MOTIVATION: The automatic identification of over-represented motifs present in a collection of sequences continues to be a challenging problem in computational biology. In this paper, we propose a self-organizing map of position weight matrices as an alternative method for motif discovery. The advantage of this approach is that it can be used to simultaneously characterize every feature present in the dataset, thus lessening the chance that weaker signals will be missed. Features identified are ranked in terms of over-representation relative to a background model. RESULTS: We present an implementation of this approach, named SOMBRERO (self-organizing map for biological regulatory element recognition and ordering), which is capable of discovering multiple distinct motifs present in a single dataset. Demonstrated here are the advantages of our approach on various datasets and SOMBRERO's improved performance over two popular motif-finding programs, MEME and AlignACE. AVAILABILITY: SOMBRERO is available free of charge from http://bioinf.nuigalway.ie/sombrero SUPPLEMENTARY INFORMATION: http://bioinf.nuigalway.ie/sombrero/additional.
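
A toy version of a self-organizing map whose nodes are PWMs helps show how one map can capture several distinct motifs at once. Assumptions made here, not taken from SOMBRERO: a 1-D map of four nodes, uniform initialization (so the example is deterministic), a fixed learning rate, and a neighborhood limited to adjacent nodes updated at half rate. SOMBRERO's map topology, training schedule and background-based ranking are not reproduced; all names are hypothetical.

```python
import math

ALPHABET = "ACGT"

def uniform_node(width):
    return [{b: 0.25 for b in ALPHABET} for _ in range(width)]

def log_likelihood(node, kmer):
    return sum(math.log(col[b]) for col, b in zip(node, kmer))

def best_node(nodes, kmer):
    # max() returns the first maximal index, so ties go to the lower-numbered node.
    return max(range(len(nodes)), key=lambda i: log_likelihood(nodes[i], kmer))

def train_som(kmers, n_nodes=4, width=6, rate=0.3, epochs=1):
    """1-D SOM of PWM nodes: the winner moves by `rate`, its direct neighbours by rate/2."""
    nodes = [uniform_node(width) for _ in range(n_nodes)]
    for _ in range(epochs):
        for kmer in kmers:
            w = best_node(nodes, kmer)
            for i in range(n_nodes):
                d = abs(i - w)
                r = rate if d == 0 else rate / 2 if d == 1 else 0.0
                if r == 0.0:
                    continue
                # Pull each column toward the observed base; columns still sum to 1.
                for col, base in zip(nodes[i], kmer):
                    for b in ALPHABET:
                        target = 1.0 if b == base else 0.0
                        col[b] = (1 - r) * col[b] + r * target
    return nodes

nodes = train_som(["AAAAAA", "CCCCCC"] * 20)
print(best_node(nodes, "AAAAAA"), best_node(nodes, "CCCCCC"))
```

Trained on alternating A-rich and C-rich windows, the two signals end up claimed by different nodes (0 and 2 in this run), so a weaker second motif is not absorbed into the first.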

17.

Systematic identification of mammalian regulatory motifs' target genes and functions

Warner JB, Philippakis AA, Jaeger SA, He FS, Lin J, Bulyk ML. Nature Methods, 2008, 5(4):347-353

We developed an algorithm, Lever, that systematically maps metazoan DNA regulatory motifs or motif combinations to sets of genes. Lever assesses whether the motifs are enriched in cis-regulatory modules (CRMs), predicted by our PhylCRM algorithm, in the noncoding sequences surrounding the genes. Lever analysis allows unbiased inference of functional annotations to regulatory motifs and candidate CRMs. We used human myogenic differentiation as a model system to statistically assess more than 25,000 pairings of gene sets and motifs or motif combinations. We assigned functional annotations to candidate regulatory motifs predicted previously and identified gene sets that are likely to be co-regulated via shared regulatory motifs. Lever allows moving beyond the identification of putative regulatory motifs in mammalian genomes, toward understanding their biological roles. This approach is general and can be applied readily to any cell type, gene expression pattern or organism of interest.
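
The abstract does not state which enrichment statistic Lever uses. As one common way to ask whether a motif's scores are higher for a gene set than for background genes, the sketch below computes the area under the ROC curve (the probability that a random gene-set member outscores a random background gene) and a permutation P-value. The per-gene scores, function names and permutation settings are hypothetical assumptions, not Lever's method.

```python
import random

def auc(fg_scores, bg_scores):
    """P(random gene-set member outscores random background gene); ties count 1/2."""
    wins = 0.0
    for f in fg_scores:
        for b in bg_scores:
            if f > b:
                wins += 1
            elif f == b:
                wins += 0.5
    return wins / (len(fg_scores) * len(bg_scores))

def permutation_pvalue(fg_scores, bg_scores, n_perm=1000, seed=0):
    """Fraction of label shuffles whose AUC reaches the observed AUC (with +1 correction)."""
    observed = auc(fg_scores, bg_scores)
    pooled = list(fg_scores) + list(bg_scores)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if auc(pooled[:len(fg_scores)], pooled[len(fg_scores):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical per-gene motif scores: gene set vs. background.
print(auc([10, 11, 12, 13, 14], list(range(10))))
```

A rank-based statistic like this avoids choosing a hard score cutoff, which matters when thousands of motif and gene-set pairings are tested.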

18.

Subtle motifs: defining the limits of motif finding algorithms (total citations: 4; self-citations: 0; citations by others: 4)

Keich U, Pevzner PA. Bioinformatics (Oxford, England), 2002, 18(10):1382-1390

MOTIVATION: What constitutes a subtle motif? Intuitively, it is a motif that is almost indistinguishable, in the statistical sense, from random motifs. This question has important practical consequences. Consider, for example, a biologist who is generating a sample of upstream regulatory sequences with the goal of finding a regulatory pattern shared by these sequences. If the sequences are too short, one risks losing some of the regulatory patterns located further upstream. Conversely, if the sequences are too long, the motif becomes too subtle, and one is likely to encounter random motifs that are at least as statistically significant as the regulatory pattern itself. In practical terms, one would like to recognize the sequence length threshold, or twilight zone, beyond which motifs are in some sense too subtle. RESULTS: The paper defines the motif twilight zone, where every motif-finding algorithm would be exposed to random motifs as significant as the one being sought. We also propose an objective tool for evaluating the performance of subtle motif-finding algorithms. Finally, we apply these tools to evaluate how successfully our MULTIPROFILER algorithm detects subtle motifs.
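
The trade-off described above can be quantified under the (l, d)-motif model mentioned in the MCEMDA abstract: an l-mer "matches" a pattern if it differs in at most d positions. Assuming i.i.d. uniform bases, the chance that a random l-mer matches is p(l, d) = sum over i = 0..d of C(l, i) (3/4)^i (1/4)^(l-i), and the expected number of chance matches grows linearly with sequence length. This is a back-of-the-envelope sketch, not Keich and Pevzner's definition of the twilight zone; the function names are hypothetical.

```python
from math import comb

def match_prob(l, d, alphabet_size=4):
    """P(a uniformly random l-mer lies within Hamming distance d of a fixed l-mer).

    p(l, d) = sum_{i=0..d} C(l, i) * (1 - 1/A)^i * (1/A)^(l - i), with A the alphabet size.
    """
    q = 1 / alphabet_size
    return sum(comb(l, i) * (1 - q) ** i * q ** (l - i) for i in range(d + 1))

def expected_random_hits(l, d, seq_len, n_seqs=1):
    """Expected chance (l, d) matches over n_seqs i.i.d. uniform sequences of length seq_len."""
    return n_seqs * (seq_len - l + 1) * match_prob(l, d)

for length in (300, 600, 1200):
    print(length, round(expected_random_hits(15, 4, length, n_seqs=20), 2))
```

For l = 15, d = 4 and 20 sequences of length 600 this gives about 1.35 chance matches to any fixed pattern, and doubling the sequence length roughly doubles that figure, which is why longer upstream regions make a planted motif harder to distinguish from random ones.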

19.

Binding site discovery from nucleic acid sequences by discriminative learning of hidden Markov models

Maaskola J, Rajewsky N. Nucleic Acids Research, 2014, 42(21):12995-13011


20.

Biological network motif detection: principles and practice

Wong E, Baur B, Quader S, Huang CH. Briefings in Bioinformatics, 2012, 13(2):202-215

Network motifs are statistically overrepresented sub-structures (sub-graphs) in a network and have been recognized as 'the simple building blocks of complex networks'. Studying biological network motifs may answer many important biological questions. The main difficulty in detecting larger network motifs in biological networks lies in two facts: the number of possible sub-graphs increases exponentially with the network or motif size (in general, the node count), and no polynomial-time algorithm is known for deciding whether two graphs are topologically equivalent (isomorphic). This article discusses the biological significance of network motifs, the motivation behind solving the motif-finding problem, and strategies for solving the various aspects of this problem. A simple classification scheme is designed to analyze the strengths and weaknesses of several existing algorithms. Experimental results from a few comparative studies in the literature are discussed, with conclusions that point to future research directions.
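
A minimal sketch of the "statistically overrepresented" test for one fixed 3-node pattern, the feed-forward loop (x→y, y→z, x→z), compares its count in a directed network with counts in degree-preserving randomized copies. Assumptions made here: occurrences are counted non-induced (extra edges among the three nodes are allowed), randomization uses simple edge swaps, and the swap count and number of random graphs are arbitrary. Because one pattern is counted directly, the isomorphism problem noted above is sidestepped; general motif finders must enumerate and classify many sub-graph shapes. All names are hypothetical.

```python
import random
from statistics import mean, pstdev

def count_ffl(edges):
    """Count (non-induced) feed-forward loops x->y, y->z, x->z with x, y, z distinct."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
    count = 0
    for x, ys in succ.items():
        for y in ys:
            if y == x:
                continue
            for z in succ.get(y, ()):
                if z != x and z != y and z in ys:
                    count += 1
    return count

def randomize(edges, n_swaps, rng):
    """Degree-preserving rewiring: swap (a,b),(c,d) -> (a,d),(c,b)."""
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4 or (a, d) in present or (c, b) in present:
            continue  # skip swaps that would create self-loops or duplicate edges
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

def ffl_zscore(edges, n_random=100, seed=0):
    """Observed FFL count, mean count in rewired graphs, and z-score (None if zero spread)."""
    rng = random.Random(seed)
    observed = count_ffl(edges)
    null = [count_ffl(randomize(edges, 10 * len(edges), rng)) for _ in range(n_random)]
    mu, sd = mean(null), pstdev(null)
    return observed, mu, ((observed - mu) / sd if sd > 0 else None)

edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4), (4, 5), (5, 0)]
print(ffl_zscore(edges, n_random=50))
```

Edge swaps keep every node's in-degree and out-degree fixed, so a high z-score indicates that the pattern is more common than the degree sequence alone would explain.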