
MOTIVATION: The sequence specificity of DNA-binding proteins is typically represented as a position weight matrix (PWM) in which each base position contributes independently to relative affinity. Assessment of the accuracy and broad applicability of this representation has been limited by the lack of extensive DNA-binding data. However, new microarray techniques, in which preferences for all possible K-mers are measured, enable a broad comparison of both motif representation and methods for motif discovery. Here, we consider the problem of accounting for all of the binding data in such experiments, rather than only the highest-affinity binding data. We introduce RankMotif++, an algorithm designed for finding motifs whenever sequences are associated with a semi-quantitative measure of protein-DNA-binding affinity. RankMotif++ learns motif models by maximizing the likelihood of a set of binding preferences under a probabilistic model of how sequence binding affinity translates into binding preference observations. Because RankMotif++ makes few assumptions about the relationship between binding affinity and the semi-quantitative readout, it is applicable to a wide variety of experimental assays of DNA-binding preference. RESULTS: By several criteria, RankMotif++ predicts binding affinity better than two widely used motif finding algorithms (MDScan, MatrixREDUCE) or more recently developed algorithms (PREGO, Seed and Wobble), and its performance is comparable to a motif model that separately assigns affinities to 8-mers. Our results validate the PWM model and provide an approximation of the precision and recall that can be expected in a genomic scan. AVAILABILITY: RankMotif++ is available upon request. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
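The independence assumption behind the PWM representation mentioned in this abstract can be illustrated with a minimal sketch. This is not RankMotif++ itself: the matrix values, the uniform background, and the function names `pwm_log_odds` and `best_window` are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical 4-position PWM: per-position base probabilities.
# The numbers are illustrative only and do not come from any real factor.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def pwm_log_odds(site, pwm=PWM, background=BACKGROUND):
    """Sum per-position log-odds; each position contributes independently."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal PWM width")
    return sum(math.log2(col[base] / background) for col, base in zip(pwm, site))


def best_window(sequence, pwm=PWM):
    """Scan every window of PWM width and return (start, score) of the best one."""
    width = len(pwm)
    if len(sequence) < width:
        raise ValueError("sequence shorter than PWM width")
    scored = [
        (pwm_log_odds(sequence[i:i + width], pwm), i)
        for i in range(len(sequence) - width + 1)
    ]
    score, start = max(scored)
    return start, score
```

Because the score is a plain sum over positions, changing one base shifts the score by the same amount whatever the other positions contain; that additivity is the property the abstract's evaluation puts to the test.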

8.

TherMos: Estimating protein–DNA binding energies from in vivo binding profiles

Wenjie Sun Xiaoming Hu Michael H. K. Lim Calista K. L. Ng Siew Hua Choo Diogo S. Castro Daniela Drechsel François Guillemot Prasanna R. Kolatkar Ralf Jauch Shyam Prabhakar 《Nucleic acids research》2013,41(11):5555-5568


9.

SOMEA: self-organizing map based extraction algorithm for DNA motif identification with heterogeneous model

Lee NK Wang D 《BMC bioinformatics》2011,12(Z1):S16


10.

On the role of structural information in remote homology detection and sequence alignment: new methods using hybrid sequence profiles

Tang CL Xie L Koh IY Posy S Alexov E Honig B 《Journal of molecular biology》2003,334(5):1043-1062

Structural alignments often reveal relationships between proteins that cannot be detected using sequence alignment alone. However, profile search methods based entirely on structural alignments have not been found to be effective in finding remote homologs. Here, we explore the role of structural information in remote homolog detection and sequence alignment. To this end, we develop a series of hybrid multidimensional alignment profiles that combine sequence, secondary and tertiary structure information. Sequence-based profiles are those whose position-specific scoring matrix is derived from sequence alignment alone; structure-based profiles are those derived from multiple structure alignments. We compare pure sequence-based profiles to pure structure-based profiles, as well as to hybrid profiles in which sequence-based profiles are used in loop/motif regions and structural information is used in core structural regions. All of the hybrid methods offer significant improvement over simple profile-to-profile alignment. We demonstrate that both sequence-based and structure-based profiles contribute to remote homology detection and alignment accuracy, and that each contains some unique information. We discuss the implications of these results for further improvements in amino acid sequence and structural analysis.

11.

BLSSpeller to discover novel regulatory motifs in maize

Razgar Seyed Rahmani Dries Decap Jan Fostier Kathleen Marchal 《DNA research》2022,29(4)


12.

A Systems Biology Approach to Transcription Factor Binding Site Prediction

Xiang Zhou Pavel Sumazin Presha Rajbhandari Andrea Califano 《PloS one》2010,5(3)


13.

iGibbs: improving Gibbs motif sampler for proteins by sequence clustering and iterative pattern sampling

Kim S Wang Z Dalkilic M 《Proteins》2007,66(3):671-681

The motif prediction problem is to predict short, conserved subsequences that are part of a family of sequences, and it is a very important biological problem. Gibbs is one of the first successful motif algorithms. It runs very fast compared with other algorithms, and its search behavior is based on the well-studied Gibbs random sampling. However, motif prediction is a very difficult problem and Gibbs may not predict true motifs in some cases. Thus, the authors explored the possibility of improving the prediction accuracy of Gibbs while retaining its fast runtime performance. In this paper, the authors considered Gibbs only for proteins, not for DNA binding sites. The authors have developed iGibbs, an integrated motif search framework for proteins that employs two previous techniques of their own: one for guiding motif search by clustering sequences and another by pattern refinement. These two techniques are combined into a new double clustering approach to guiding motif search. The unique feature of their framework is that users do not have to specify the number of motifs to be predicted when motifs occur in different subsets of the input sequences, since it automatically groups input sequences into clusters and predicts motifs from those clusters. Tests on the PROSITE database show that their framework improved the prediction accuracy of Gibbs significantly. Compared with more exhaustive search methods like MEME, iGibbs predicted motifs more accurately and ran one order of magnitude faster.
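For readers unfamiliar with the Gibbs random sampling this abstract builds on, the following is a minimal sketch of a basic site sampler (one site per sequence). It is not iGibbs and omits the clustering and pattern-refinement steps the authors describe; the uniform background, the pseudocount, and all names (`column_probs`, `gibbs_site_sampler`) are assumptions made for illustration.

```python
import random


def column_probs(sites, width, alphabet, pseudocount=1.0):
    """Per-position residue probabilities from aligned sites, with pseudocounts."""
    probs = []
    for j in range(width):
        counts = {res: pseudocount for res in alphabet}
        for site in sites:
            counts[site[j]] += 1
        total = sum(counts.values())
        probs.append({res: c / total for res, c in counts.items()})
    return probs


def gibbs_site_sampler(seqs, width, iterations=200, seed=0):
    """Return one sampled motif start per sequence (basic Gibbs site sampler)."""
    rng = random.Random(seed)
    alphabet = sorted(set("".join(seqs)))
    background = 1.0 / len(alphabet)  # assumed uniform background
    # Start from a random site in every sequence.
    positions = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for i, seq in enumerate(seqs):
            # Build the motif model from all sequences except the held-out one.
            others = [
                s[p:p + width]
                for k, (s, p) in enumerate(zip(seqs, positions))
                if k != i
            ]
            probs = column_probs(others, width, alphabet)
            # Weight each candidate start by its likelihood ratio vs. background.
            weights = []
            for start in range(len(seq) - width + 1):
                w = 1.0
                for j in range(width):
                    w *= probs[j][seq[start + j]] / background
                weights.append(w)
            # Sample (rather than take the maximum) a new site for this sequence.
            positions[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return positions
```

Sampling a new site in proportion to its weight, instead of always taking the best one, is what lets the search escape poor starting alignments. It is also why a single run may miss the true motif, which is the limitation the abstract cites as motivation.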

14.

Discriminative motif discovery in DNA and protein sequences using the DEME algorithm

Emma Redhead Timothy L Bailey 《BMC bioinformatics》2007,8(1):385


15.

RNA 3D Modules in Genome-Wide Predictions of RNA 2D Structure

Corinna Theis Craig L. Zirbel Christian Höner zu Siederdissen Christian Anthon Ivo L. Hofacker Henrik Nielsen Jan Gorodkin 《PloS one》2015,10(10)

Recent experimental and computational progress has revealed a large potential for RNA structure in the genome. This has been driven by computational strategies that exploit multiple genomes of related organisms to identify common sequences and secondary structures. However, these computational approaches face two main challenges: they are computationally expensive and they have a relatively high false discovery rate (FDR). Simultaneously, RNA 3D structure analysis has revealed modules composed of non-canonical base pairs which occur in non-homologous positions, apparently by independent evolution. These modules can, for example, occur inside structural elements which in RNA 2D predictions appear as internal loops. Hence one question is whether the use of such RNA 3D information can improve the prediction accuracy of RNA secondary structure at a genome-wide level. Here, we use RNAz in combination with 3D module prediction tools and apply them to a 13-way vertebrate sequence-based alignment. We find that RNA 3D modules predicted by metaRNAmodules and JAR3D are significantly enriched in the screened windows compared to their shuffled counterparts. The initially estimated FDR of 47.0% is lowered to below 25% when certain 3D module predictions are present in the window of the 2D prediction. We discuss the implications and prospects for further development of computational strategies for detection of RNA 2D structure in genomic sequence.

16.

CoMoDis: composite motif discovery in mammalian genomes

Donaldson IJ Göttgens B 《Nucleic acids research》2007,35(1):e1


17.

GAME: detecting cis-regulatory elements using a genetic algorithm (total citations: 3; self-citations: 0; citations by others: 3)

Wei Z Jensen ST 《Bioinformatics (Oxford, England)》2006,22(13):1577-1584


18.

Computational discovery of soybean promoter cis‐regulatory elements for the construction of soybean cyst nematode‐inducible synthetic promoters

Wusheng Liu Mitra Mazarei Yanhui Peng Michael H. Fethe Mary R. Rudis Jingyu Lin Reginald J. Millwood Prakash R. Arelli Charles Neal Stewart Jr. 《Plant biotechnology journal》2014,12(8):1015-1026

Computational methods offer great hope but limited accuracy in the prediction of functional cis‐regulatory elements; improvements are needed to enable synthetic promoter design. We applied an ensemble strategy for de novo soybean cyst nematode (SCN)‐inducible motif discovery among promoters of 18 co‐expressed soybean genes that were selected from six reported microarray studies involving a compatible soybean–SCN interaction. A total of 116 overlapping motif regions (OMRs), each identified by at least four out of seven bioinformatic tools, were discovered. Using synthetic promoters, the inducibility of each OMR or motif itself was evaluated by co‐localization of gain of function of an orange fluorescent protein reporter and the presence of SCN in transgenic soybean hairy roots. Among 16 OMRs detected from two experimentally confirmed SCN‐inducible promoters, 11 OMRs (i.e. 68.75%) were experimentally confirmed to be SCN‐inducible, leading to the discovery of 23 core motifs of 5‐ to 7‐bp length, of which 14 are novel in plants. We found that a combination of the three best tools (i.e. SCOPE, W‐AlignACE and Weeder) could detect all 23 core motifs. Thus, this strategy is a high‐throughput approach for de novo motif discovery in soybean and offers great potential for novel motif discovery and synthetic promoter engineering for any plant and trait in crop biotechnology.

19.

Data augmentation algorithms for detecting conserved domains in protein sequences: a comparative study

Bi C 《Journal of proteome research》2008,7(1):192-201

Protein conserved domains are distinct units of molecular structure, usually associated with particular aspects of molecular function such as catalysis or binding. These conserved subsequences are often unobserved and thus in need of detection. Motif discovery methods can be used to find these unobserved domains given a set of sequences. This paper presents the data augmentation (DA) framework, which unifies a suite of motif-finding algorithms that maximize the same likelihood function by imputing the unobserved data. Data augmentation refers to methods that formulate iterative optimization by exploiting the unobserved data. Two categories of maximum-likelihood-based motif-finding algorithms are illustrated under the DA framework. The first comprises deterministic algorithms, which maximize the likelihood function by performing an iteratively optimal local search in the alignment space. The second comprises stochastic algorithms, which iteratively draw motif location samples via Monte Carlo simulation and simultaneously keep track of the superior solution with the best likelihood. As a result, four DA motif discovery algorithms are described, evaluated, and compared by aligning real and simulated protein sequences.

20.

Finding motifs using random projections. (total citations: 19; self-citations: 0; citations by others: 19)

Jeremy Buhler Martin Tompa 《Journal of computational biology》2002,9(2):225-242
