
1.

A novel Bayesian DNA motif comparison method for clustering and retrieval

Habib N Kaplan T Margalit H Friedman N 《PLoS computational biology》2008,4(2):e1000010


2.

Prediction of transcription factor binding sites using genetical genomics methods

von Rohr P Friberg MT Kadarmideen HN 《Journal of bioinformatics and computational biology》2007,5(3):773-793


3.

WildSpan: mining structured motifs from protein sequences

Hsu CM Chen CY Liu BJ 《Algorithms for molecular biology : AMB》2011,6(1):6

Background

Automatic extraction of motifs from biological sequences is an important research problem in the study of molecular biology. For proteins, it is desirable to discover sequence motifs containing a large number of wildcard symbols, as the residues associated with functional sites are usually widely separated in sequences. Discovering such patterns is time-consuming because abundant combinations exist when long gaps (a gap consists of one or more successive wildcards) are considered. Mining algorithms often employ constraints to narrow down the search space in order to increase efficiency. However, improper constraint models might degrade the sensitivity and specificity of the motifs discovered by computational methods. We previously proposed a new constraint model to handle large wildcard regions for discovering functional motifs of proteins. The patterns that satisfy the proposed constraint model are called W-patterns. A W-pattern is a structured motif that groups motif symbols into pattern blocks interleaved with large irregular gaps. Considering large gaps reflects the fact that functional residues are not always from a single region of protein sequences, and restricting motif symbols into clusters corresponds to the observation that short motifs are frequently present within protein families. To efficiently discover W-patterns for large-scale sequence annotation and function prediction, this paper first formally introduces the problem to solve and proposes an algorithm named WildSpan (sequential pattern mining across large wildcard regions) that incorporates several pruning strategies to greatly reduce the mining cost.
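To make the idea of pattern blocks interleaved with gaps concrete, the following minimal sketch matches one such pattern against a protein sequence. It is not the WildSpan mining algorithm: the function name `w_pattern_to_regex`, the lowercase `x` wildcard notation, the bounded-gap representation, and the example sequence are all assumptions chosen for illustration.

```python
import re

def w_pattern_to_regex(blocks, gaps):
    """Compile pattern blocks separated by bounded gaps into a regex.

    blocks: list of strings; a lowercase 'x' marks one wildcard inside a block.
    gaps:   list of (min_len, max_len) pairs, one between consecutive blocks.
    Both conventions are hypothetical, not taken from the abstract.
    """
    if len(gaps) != len(blocks) - 1:
        raise ValueError("need exactly one gap between consecutive blocks")
    parts = []
    for i, block in enumerate(blocks):
        # Translate each block symbol: wildcard -> '.', residue -> literal.
        parts.append("".join("." if ch == "x" else re.escape(ch) for ch in block))
        if i < len(gaps):
            lo, hi = gaps[i]
            # A gap is a run of wildcards whose length lies in [lo, hi].
            parts.append(".{%d,%d}" % (lo, hi))
    return re.compile("".join(parts))

# Two blocks, CxxC and HxH, separated by a gap of 5 to 12 residues.
pattern = w_pattern_to_regex(["CxxC", "HxH"], [(5, 12)])
match = pattern.search("MKACAACGSTLLKDHAHVV")
```

Here `match` spans the two blocks and the seven-residue gap between them. Checking a known pattern is the easy part; the cost the abstract describes comes from enumerating candidate patterns, which this sketch does not attempt.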

4.

Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae (total citations: 12; self-citations: 0; citations by others: 12)

Hughes JD Estep PW Tavazoie S Church GM 《Journal of molecular biology》2000,296(5):1205-1214


5.

A cluster refinement algorithm for motif discovery

Li G Chan TM Leung KS Lee KH 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2010,7(4):654-668


6.

MUSA: a parameter free algorithm for the identification of biologically significant motifs

Mendes ND Casimiro AC Santos PM Sá-Correia I Oliveira AL Freitas AT 《Bioinformatics (Oxford, England)》2006,22(24):2996-3002

MOTIVATION: The ability to identify complex motifs, i.e. non-contiguous nucleotide sequences, is a key feature of modern motif finders. Addressing this problem is extremely important, not only because these motifs can accurately model biological phenomena but because their extraction is highly dependent upon the appropriate selection of numerous search parameters. Currently available combinatorial algorithms have proved to be highly efficient in exhaustively enumerating motifs (including complex motifs) that fulfill certain extraction criteria. However, one major problem with these methods is the large number of parameters that need to be specified. RESULTS: We propose a new algorithm, MUSA (Motif finding using an UnSupervised Approach), that can be used either to autonomously find over-represented complex motifs or to estimate search parameters for modern motif finders. This method relies on a biclustering algorithm that operates on a matrix of co-occurrences of small motifs. The performance of this method is independent of the composite structure of the motifs being sought, making few assumptions about their characteristics. The MUSA algorithm was applied to two datasets involving the bacterium Pseudomonas putida KT2440. The first one was composed of 70 sigma(54)-dependent promoter sequences and the second dataset included 54 promoter sequences of up-regulated genes in response to phenol, as suggested by quantitative proteomics. The results obtained indicate that this approach is very effective at identifying complex motifs of biological significance. AVAILABILITY: The MUSA algorithm is available upon request from the authors, and will be made available via a Web based interface.
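The abstract does not say how the matrix of co-occurrences is built. As one simple reading, the sketch below counts, for every pair of distinct k-mers, how many input sequences contain both. The function name `kmer_cooccurrence`, the choice of k, and the toy sequences are assumptions, and the biclustering step that MUSA applies to the matrix is not shown.

```python
from collections import defaultdict
from itertools import combinations

def kmer_cooccurrence(seqs, k):
    """Count, for each pair of distinct k-mers, the sequences containing both.

    Returns a dict keyed by (kmer_a, kmer_b) with kmer_a < kmer_b, which is a
    sparse form of a symmetric co-occurrence matrix.
    """
    counts = defaultdict(int)
    for s in seqs:
        # Collect the distinct k-mers of this sequence, in sorted order so
        # that every pair is stored under a single canonical key.
        kmers = sorted({s[i:i + k] for i in range(len(s) - k + 1)})
        for a, b in combinations(kmers, 2):
            counts[(a, b)] += 1
    return dict(counts)

# Toy input: two sequences share the 4-mers TGGC and TGCA, the third does not.
co = kmer_cooccurrence(["TGGCACGTTGCA", "ATGGCATTTGCA", "CCCCCCCC"], 4)
```

Pairs of small motifs that co-occur in many sequences, such as `("TGCA", "TGGC")` in the toy input, are the kind of signal a biclustering step could then group into larger non-contiguous motifs.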

7.

A Gibbs sampling algorithm for variable-length motif identification (in English)

Chen Xiaolin, Wang Sishui 《Journal of Biomathematics》2010,(3):442-448

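Motif identification is an important problem in computational biology. Methods for handling missing data, such as the EM algorithm and Gibbs sampling, are widely applied to motif identification in biological sequences. Existing motif-finding methods first assume that the motif length is given, but in practice the length is unknown. In this paper, we use a Gibbs sampling algorithm to determine the motif length while simultaneously locating the motif positions.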
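For orientation, here is a minimal sketch of the standard fixed-width Gibbs site sampler that such methods build on. It is not the variable-length algorithm of this paper: the width is supplied by the caller, a uniform background is assumed, and the function name `gibbs_motif_positions` and all parameter defaults are hypothetical.

```python
import random
from collections import Counter

ALPHABET = "ACGT"

def gibbs_motif_positions(seqs, width, iters=200, pseudo=1.0, seed=0):
    """Sample one motif start per sequence for a motif of fixed width."""
    rng = random.Random(seed)
    # Initialise every sequence with a random motif start.
    pos = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iters):
        for i, s in enumerate(seqs):
            # Build per-column counts from all sequences except the held-out one.
            counts = [Counter() for _ in range(width)]
            for j, t in enumerate(seqs):
                if j != i:
                    for c in range(width):
                        counts[c][t[pos[j] + c]] += 1
            total = len(seqs) - 1 + pseudo * len(ALPHABET)
            # Weight each candidate start by its probability under the profile;
            # with a uniform background the background term is constant.
            weights = []
            for start in range(len(s) - width + 1):
                w = 1.0
                for c in range(width):
                    w *= (counts[c][s[start + c]] + pseudo) / total
                weights.append(w)
            # Resample the held-out position in proportion to the weights.
            pos[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return pos

starts = gibbs_motif_positions(["ACGTACGTTT", "TTTACGTACG", "GGACGTACGG"], 4)
```

The result is a random sample, so repeated runs with different seeds can return different positions. Treating the width as an additional unknown, as the paper proposes, would require a further sampling step that is not attempted here.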

8.

Identification of co-regulated genes through Bayesian clustering of predicted regulatory binding sites

Qin ZS McCue LA Thompson W Mayerhofer L Lawrence CE Liu JS 《Nature biotechnology》2003,21(4):435-439


9.

Refining motifs by improving information content scores using neighborhood profile search

Chandan K Reddy Yao-Chung Weng Hsiao-Dong Chiang 《Algorithms for molecular biology : AMB》2006,1(1):23-14


10.

Combining phylogenetic motif discovery and motif clustering to predict co-regulated genes (total citations: 2; self-citations: 0; citations by others: 2)

Jensen ST Shen L Liu JS 《Bioinformatics (Oxford, England)》2005,21(20):3832-3839


11.

Finding motifs with insufficient number of strong binding sites.

Henry C M Leung Francis Y L Chin S M Yiu Roni Rosenfeld W W Tsang 《Journal of computational biology》2005,12(6):686-701


12.

Combining phylogenetic data with co-regulated genes to identify regulatory motifs (total citations: 17; self-citations: 0; citations by others: 17)

Wang T Stormo GD 《Bioinformatics (Oxford, England)》2003,19(18):2369-2380

MOTIVATION: Discovery of regulatory motifs in unaligned DNA sequences remains a fundamental problem in computational biology. Two categories of algorithms have been developed to identify common motifs from a set of DNA sequences. The first can be called a 'multiple genes, single species' approach. It proposes that a degenerate motif is embedded in some or all of the otherwise unrelated input sequences and tries to describe a consensus motif and identify its occurrences. It is often used for co-regulated genes identified through experimental approaches. The second approach can be called 'single gene, multiple species'. It requires orthologous input sequences and tries to identify unusually well conserved regions by phylogenetic footprinting. Both approaches perform well, but each has some limitations. It is tempting to combine the knowledge of co-regulation among different genes and conservation among orthologous genes to improve our ability to identify motifs. RESULTS: Based on the Consensus algorithm previously established by our group, we introduce a new algorithm called PhyloCon (Phylogenetic Consensus) that takes into account both conservation among orthologous genes and co-regulation of genes within a species. This algorithm first aligns conserved regions of orthologous sequences into multiple sequence alignments, or profiles, then compares profiles representing non-orthologous sequences. Motifs emerge as common regions in these profiles. Here we present a novel statistic to compare profiles of DNA sequences and a greedy approach to search for common subprofiles. We demonstrate that PhyloCon performs well on both synthetic and biological data. AVAILABILITY: Software available upon request from the authors. http://ural.wustl.edu/softwares.html

13.

CMfinder--a covariance model based RNA motif finding algorithm (total citations: 5; self-citations: 0; citations by others: 5)

Yao Z Weinberg Z Ruzzo WL 《Bioinformatics (Oxford, England)》2006,22(4):445-452


14.

Consensus folding of unaligned RNA sequences revisited.

Vineet Bafna Haixu Tang Shaojie Zhang 《Journal of computational biology》2006,13(2):283-295

As one of the earliest problems in computational biology, the RNA secondary structure prediction problem (sometimes referred to as "RNA folding") has attracted attention again, thanks to the recent discoveries of many novel non-coding RNA molecules. The two common approaches to this problem are de novo prediction of RNA secondary structure based on energy minimization and the consensus folding approach (computing the common secondary structure for a set of unaligned RNA sequences). Consensus folding algorithms work well when the correct seed alignment is part of the input to the problem. However, seed alignment itself is a challenging problem for diverged RNA families. In this paper, we propose a novel framework to predict the common secondary structure for unaligned RNA sequences. By matching putative stacks in RNA sequences, we make use of both primary sequence information and thermodynamic stability for prediction at the same time. We show that our method can predict the correct common RNA secondary structures even when we are given only a limited number of unaligned RNA sequences, and it outperforms current algorithms in sensitivity and accuracy.

15.

Finding motifs from all sequences with and without binding sites

Leung HC Chin FY 《Bioinformatics (Oxford, England)》2006,22(18):2217-2223


16.

Large multiple organism gene finding by collapsed Gibbs sampling.

Sourav Chatterji Lior Pachter 《Journal of computational biology》2005,12(6):599-608

The Gibbs sampling method has been widely used for sequence analysis after it was successfully applied to the problem of identifying regulatory motif sequences upstream of genes. Since then, numerous variants of the original idea have emerged; however, in all cases the application has been to finding short motifs in collections of short sequences (typically less than 100 nucleotides long). In this paper, we introduce a Gibbs sampling approach for identifying genes in multiple large genomic sequences up to hundreds of kilobases long. This approach leverages the evolutionary relationships between the sequences to improve the gene predictions, without explicitly aligning the sequences. We have applied our method to the analysis of genomic sequence from 14 genomic regions, totaling roughly 1.8 Mb of sequence in each organism. We show that our approach compares favorably with existing ab initio approaches to gene finding, including pairwise comparison based gene prediction methods which make explicit use of alignments. Furthermore, excellent performance can be obtained with as few as four organisms, and the method overcomes a number of difficulties of previous comparison based gene finding approaches: it is robust with respect to genomic rearrangements, can work with draft sequence, and is fast (linear in the number and length of the sequences). It can also be seamlessly integrated with Gibbs sampling motif detection methods.

17.

Mining frequent stem patterns from unaligned RNA sequences (total citations: 1; self-citations: 0; citations by others: 1)

Hamada M Tsuda K Kudo T Kin T Asai K 《Bioinformatics (Oxford, England)》2006,22(20):2480-2487

MOTIVATION: In the detection of non-coding RNAs, it is often necessary to identify the secondary structure motifs from a set of putative RNA sequences. Most of the existing algorithms aim to provide the best motif or a few good motifs, but biologists often need to inspect all the possible motifs thoroughly. RESULTS: Our method RNAmine employs a graph theoretic representation of RNA sequences and detects all the possible motifs exhaustively using a graph mining algorithm. The motif detection problem boils down to finding frequently appearing patterns in a set of directed and labeled graphs. In the tasks of common secondary structure prediction and local motif detection from long sequences, our method compared favorably in both accuracy and efficiency with state-of-the-art methods such as CMFinder. AVAILABILITY: The software is available upon request.

18.

Computational discovery of regulatory elements in a continuous expression space

Mathieu Lajoie Olivier Gascuel Vincent Lefort Laurent Bréhélin 《Genome biology》2012,13(11):R109

Approaches for regulatory element discovery from gene expression data usually rely on clustering algorithms to partition the data into clusters of co-expressed genes. Gene regulatory sequences are then mined to find overrepresented motifs in each cluster. However, this ad hoc partition rarely fits the biological reality. We propose a novel method called RED² that avoids data clustering by estimating motif densities locally around each gene. We show that RED² detects numerous motifs not detected by clustering-based approaches, and that most of these correspond to characterized motifs. RED² can be accessed online through a user-friendly interface.

19.

INCLUSive: integrated clustering, upstream sequence retrieval and motif sampling (total citations: 7; self-citations: 0; citations by others: 7)

Thijs G Moreau Y De Smet F Mathys J Lescot M Rombauts S Rouze P De Moor B Marchal K 《Bioinformatics (Oxford, England)》2002,18(2):331-332

INCLUSive allows automatic multistep analysis of microarray data (clustering and motif finding). The clustering algorithm (adaptive quality-based clustering) groups together genes with highly similar expression profiles. The upstream sequences of the genes belonging to a cluster are automatically retrieved from GenBank and can be fed directly into Motif Sampler, a Gibbs sampling algorithm that retrieves statistically over-represented motifs in sets of sequences, in this case upstream regions of co-expressed genes.

20.

Searching RNA motifs and their intermolecular contacts with constraint networks (total citations: 2; self-citations: 0; citations by others: 2)

Thébault P de Givry S Schiex T Gaspin C 《Bioinformatics (Oxford, England)》2006,22(17):2074-2080

MOTIVATION: Searching for RNA gene occurrences in genomic sequences is a task whose importance has been renewed by the recent discovery of numerous functional RNAs, often interacting with other ligands. Even though several programs exist for RNA motif search, none can represent and solve the problem of searching for occurrences of RNA motifs in interaction with other molecules. RESULTS: We present a constraint network formulation of this problem. RNAs are represented as structured motifs that can occur on more than one sequence and which are related together by possible hybridization. The implemented tool MilPat is used to search for several sRNA families in genomic sequences. Results show that MilPat allows efficient searching for interacting motifs in large genomic sequences and offers a simple and extensible framework to solve such problems. New and known sRNAs are identified as H/ACA candidates in Methanocaldococcus jannaschii. AVAILABILITY: http://carlit.toulouse.inra.fr/MilPaT/MilPat.pl.