20 similar articles found (search time: 0 ms)
1.
2.
3.
Finding conserved motifs in genomic sequences is one of the essential problems in bioinformatics. However, achieving high discovery performance without imposing substantial auxiliary constraints on possible motif features remains a key algorithmic challenge. This work describes BAMBI, a sequential Monte Carlo motif-identification algorithm based on a position weight matrix model that requires no additional constraints and can estimate motif properties such as length, logo, number of instances and their locations solely from primary nucleotide sequence data. Furthermore, should biologically meaningful information about motif attributes be available, BAMBI takes advantage of this knowledge to further refine the discovery results. In practical applications, we show that the proposed approach can be used to find sites of such diverse DNA-binding molecules as the cAMP receptor protein (CRP) and Din-family site-specific serine recombinases. Results obtained by BAMBI in these and other settings demonstrate better statistical performance than any of four widely used profile-based motif discovery methods (MEME, BioProspector with BioOptimizer, SeSiMCMC and Motif Sampler), as measured by the nucleotide-level correlation coefficient. Additionally, in the case of Din-family recombinase target site discovery, the BAMBI-inferred motif is found to be the only one that is functionally accurate from the standpoint of the underlying biochemical mechanism. C++ and Matlab code is available at http://www.ee.columbia.edu/~guido/BAMBI or http://genomics.lbl.gov/BAMBI/.
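The abstract above ranks methods by the nucleotide-level correlation coefficient without defining it. As a rough illustration, the sketch below assumes the usual definition, nCC = (TP·TN − FN·FP) / sqrt((TP+FN)(TN+FP)(TP+FP)(TN+FN)), with counts taken over individual nucleotide positions. The function name `nucleotide_cc` and its arguments are hypothetical and are not part of BAMBI's code.

```python
from math import sqrt

def nucleotide_cc(true_positions, predicted_positions, total_length):
    """Nucleotide-level correlation coefficient (assumed standard definition).

    true_positions / predicted_positions: iterables of nucleotide indices
    covered by annotated and predicted sites; total_length: number of
    nucleotides in the data set.
    """
    true_set = set(true_positions)
    pred_set = set(predicted_positions)
    tp = len(true_set & pred_set)   # annotated and predicted
    fp = len(pred_set - true_set)   # predicted only
    fn = len(true_set - pred_set)   # annotated only
    tn = total_length - tp - fp - fn
    denom = sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    # Convention assumed here: return 0.0 when the coefficient is undefined.
    return (tp * tn - fn * fp) / denom if denom else 0.0
```

A perfect prediction gives 1.0, a prediction that covers exactly the complement of the annotated positions gives -1.0, and values near 0 indicate no better than chance agreement.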
4.
5.
6.
7.
8.
Assessing computational tools for the discovery of transcription factor binding sites (total citations: 34; self-citations: 0; citations by others: 34)
Tompa M Li N Bailey TL Church GM De Moor B Eskin E Favorov AV Frith MC Fu Y Kent WJ Makeev VJ Mironov AA Noble WS Pavesi G Pesole G Régnier M Simonis N Sinha S Thijs G van Helden J Vandenbogaert M Weng Z Workman C Ye C Zhu Z 《Nature biotechnology》2005,23(1):137-144
The prediction of regulatory elements is a problem where computational methods offer great hope. Over the past few years, numerous tools have become available for this task. The purpose of the current assessment is twofold: to provide some guidance to users regarding the accuracy of currently available tools in various settings, and to provide a benchmark of data sets for assessing future tools.
9.
10.
Evolution of transcription factor DNA binding sites (total citations: 2; self-citations: 0; citations by others: 2)
11.
12.
13.
A fundamental challenge facing biologists is to identify DNA binding sites for unknown regulatory factors, given a collection of genes believed to be coregulated. The program YMF identifies good candidates for such binding sites by searching for statistically overrepresented motifs. More specifically, YMF enumerates all motifs in the search space and is guaranteed to produce those motifs with greatest z-scores. This note describes the YMF web software, available at http://bio.cs.washington.edu/software.html.
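The enumerate-and-rank idea in this abstract can be illustrated with a much simplified sketch. It is not YMF's actual statistic: the abstract does not specify the background model or the variance calculation, so the code below assumes an independent, uniform nucleotide background and a binomial variance that ignores word self-overlap. The name `kmer_zscores` is hypothetical.

```python
from itertools import product
from math import sqrt

def kmer_zscores(sequences, k, background=None):
    """Z-score of every DNA k-mer under a simple i.i.d. background (assumption)."""
    if background is None:
        background = {base: 0.25 for base in "ACGT"}
    counts = {}
    n_windows = 0
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            counts[word] = counts.get(word, 0) + 1
            n_windows += 1
    scores = {}
    # Exhaustively enumerate the search space of 4**k words.
    for letters in product("ACGT", repeat=k):
        word = "".join(letters)
        p = 1.0
        for base in word:
            p *= background[base]
        expected = n_windows * p
        sd = sqrt(n_windows * p * (1.0 - p))  # binomial approximation
        scores[word] = (counts.get(word, 0) - expected) / sd if sd else 0.0
    return scores
```

For example, `max(scores, key=scores.get)` on the returned dictionary picks the most overrepresented word. Because every word is scored, the top-ranked words are exact for this toy statistic, which mirrors the guarantee described in the abstract.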
14.
15.
16.
The helix-turn-helix DNA binding motif (total citations: 98; self-citations: 0; citations by others: 98)
17.
18.
Raphael B Liu LT Varghese G 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2004,1(2):91-94
Buhler and Tompa (2002) introduced the random projection algorithm for the motif discovery problem and demonstrated that this algorithm performs well on both simulated and biological samples. We describe a modification of the random projection algorithm, called the uniform projection algorithm, which utilizes a different choice of projections. We replace the random selection of projections by a greedy heuristic that approximately equalizes the coverage of the projections. We show that this change in selection of projections leads to improved performance on motif discovery problems. Furthermore, the uniform projection algorithm is directly applicable to other problems where the random projection algorithm has been used, including comparison of protein sequence databases.
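As a hedged illustration of the projection step this abstract builds on, the sketch below hashes every l-mer by the letters at k randomly chosen positions, so that l-mers agreeing at those positions share a bucket. It covers only the random choice of a single projection; the greedy coverage-equalizing selection of the uniform projection algorithm and any later refinement of buckets are not described in the abstract and are not implemented here. The name `projection_buckets` and its parameters are hypothetical.

```python
import random
from collections import defaultdict

def projection_buckets(sequences, l, k, rng=None):
    """Group all l-mers by their letters at k randomly chosen positions."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    positions = sorted(rng.sample(range(l), k))
    buckets = defaultdict(list)
    for seq_index, seq in enumerate(sequences):
        for start in range(len(seq) - l + 1):
            lmer = seq[start:start + l]
            key = "".join(lmer[p] for p in positions)
            buckets[key].append((seq_index, start))
    return positions, buckets
```

Buckets holding unusually many l-mers are candidate motif locations: instances of a planted motif that differ only at positions outside the projection hash to the same key.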
19.
DNA binding sites: representation and discovery (total citations: 60; self-citations: 0; citations by others: 60)
Stormo GD 《Bioinformatics (Oxford, England)》2000,16(1):16-23
The purpose of this article is to provide a brief history of the development and application of computer algorithms for the analysis and prediction of DNA binding sites. This problem can be conveniently divided into two subproblems. The first is, given a collection of known binding sites, develop a representation of those sites that can be used to search new sequences and reliably predict where additional binding sites occur. The second is, given a set of sequences known to contain binding sites for a common factor, but not knowing where the sites are, discover the location of the sites in each sequence and a representation for the specificity of the protein.
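The first subproblem described above (build a representation from known sites, then search new sequences) is often handled with a log-odds position weight matrix. The sketch below is one minimal way to do that, assuming aligned sites of equal length, a uniform background and a pseudocount of 1. The function names and the example sites are hypothetical and are not taken from the article.

```python
from math import log2

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds weight matrix from aligned, equal-length DNA sites."""
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(length):
        column = [site[i] for site in sites]
        pwm.append({
            base: log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm

def best_hit(pwm, sequence):
    """Return (score, start) of the best-scoring window in an ACGT-only sequence."""
    width = len(pwm)
    scored = [
        (sum(pwm[i][sequence[start + i]] for i in range(width)), start)
        for start in range(len(sequence) - width + 1)
    ]
    return max(scored)

# Illustrative, made-up sites; not real binding-site data.
pwm = build_pwm(["TGTGA", "TGTGA", "TGTGC", "TGAGA"])
score, start = best_hit(pwm, "CCCCTGTGACCCC")
```

Here the best window starts at index 4, where the sequence matches the consensus of the example sites at every column.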