1.

Fast, sensitive discovery of conserved genome-wide motifs

Ihuegbu NE Stormo GD Buhler J 《Journal of computational biology》2012,19(2):139-147

Regulatory sites that control gene expression are essential to the proper functioning of cells, and identifying them is critical for modeling regulatory networks. We have developed Magma (Multiple Aligner of Genomic Multiple Alignments), a software tool for multiple species, multiple gene motif discovery. Magma identifies putative regulatory sites that are conserved across multiple species and occur near multiple genes throughout a reference genome. Magma takes as input multiple alignments that can include gaps. It uses efficient clustering methods that make it about 70 times faster than PhyloNet, a previous program for this task, with slightly greater sensitivity. We ran Magma on all non-coding DNA conserved between Caenorhabditis elegans and five additional species, about 70 Mbp in total, in <4 h. We obtained 2,309 motifs with lengths of 6-20 bp, each occurring at least 10 times throughout the genome, which collectively covered about 566 kbp of the genomes, approximately 0.8% of the input. Predicted sites occurred in all types of non-coding sequence but were especially enriched in the promoter regions. Comparisons to several experimental datasets show that Magma motifs correspond to a variety of known regulatory motifs.
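
The coverage figure quoted in the abstract can be checked with a line of arithmetic. The sketch below uses only the rounded values from the abstract (about 566 kbp covered, about 70 Mbp of input); the variable and function names are illustrative and do not come from Magma.

```python
# Approximate figures quoted in the abstract above.
covered_bp = 566_000        # ~566 kbp covered by predicted motif sites
input_bp = 70_000_000       # ~70 Mbp of conserved non-coding input


def coverage_fraction(covered: int, total: int) -> float:
    """Return the fraction of the input covered by predicted sites."""
    return covered / total


fraction = coverage_fraction(covered_bp, input_bp)
print(f"{fraction:.2%}")  # roughly 0.8% of the input
```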

2.

PSSMTS: position specific scoring matrices on tree structures

Sato K Morita K Sakakibara Y 《Journal of mathematical biology》2008,56(1-2):201-214

Identifying non-coding RNA regions on the genome using computational methods is currently receiving a lot of attention. In general, it is essentially more difficult than the problem of detecting protein-coding genes because non-coding RNA regions have only weak statistical signals. On the other hand, most functional RNA families have conserved sequences and secondary structures which are characteristic of their molecular function in a cell. These are known as sequence motifs and consensus structures, respectively. In this paper, we propose an improved method which extends a pairwise structural alignment method for RNA sequences to handle position specific scoring matrices and hence to incorporate motifs into structural alignment of RNA sequences. To model sequence motifs, we employ position specific scoring matrices (PSSMs). Experimental results show that PSSMs enable us to find individual RNA families efficiently, especially if we have biological knowledge such as sequence motifs. K. Sato and K. Morita contributed equally to this work.
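
To make the PSSM idea concrete, here is a minimal sketch of a plain sequence-level log-odds PSSM, built from a few aligned motif instances and used to scan a sequence. It does not implement the tree-structure extension described in the abstract; the uniform background, the pseudocount, the toy instances, and the function names `build_pssm` and `best_window` are all assumptions made for illustration.

```python
import math

ALPHABET = "ACGU"


def build_pssm(instances, pseudocount=1.0, background=0.25):
    """Build a log-odds PSSM (one dict per column) from equal-length instances."""
    length = len(instances[0])
    pssm = []
    for col in range(length):
        # Start every base at the pseudocount so no probability is zero.
        counts = {base: pseudocount for base in ALPHABET}
        for seq in instances:
            counts[seq[col]] += 1
        total = sum(counts.values())
        pssm.append(
            {b: math.log2((counts[b] / total) / background) for b in ALPHABET}
        )
    return pssm


def best_window(pssm, sequence):
    """Return (start, score) of the highest-scoring window in the sequence."""
    width = len(pssm)
    scored = [
        (sum(pssm[i][sequence[start + i]] for i in range(width)), start)
        for start in range(len(sequence) - width + 1)
    ]
    score, start = max(scored)
    return start, score


# Toy aligned motif instances (hypothetical data).
instances = ["GACG", "GACG", "GACA", "GUCG"]
pssm = build_pssm(instances)
start, score = best_window(pssm, "AAUGACGCCU")
print(start, round(score, 2))
```

Summing per-column log-odds treats positions as independent, which is the usual simplifying assumption behind a PSSM.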

3.

A structure-based flexible search method for motifs in RNA.

Isana Veksler-Lublinsky Michal Ziv-Ukelson Danny Barash Klara Kedem 《Journal of computational biology》2007,14(7):908-926

4.

Combining phylogenetic data with co-regulated genes to identify regulatory motifs (Cited by: 17 total; 0 self-citations; 17 by others)

Wang T Stormo GD 《Bioinformatics (Oxford, England)》2003,19(18):2369-2380

MOTIVATION: Discovery of regulatory motifs in unaligned DNA sequences remains a fundamental problem in computational biology. Two categories of algorithms have been developed to identify common motifs from a set of DNA sequences. The first can be called a 'multiple genes, single species' approach. It proposes that a degenerate motif is embedded in some or all of the otherwise unrelated input sequences and tries to describe a consensus motif and identify its occurrences. It is often used for co-regulated genes identified through experimental approaches. The second approach can be called 'single gene, multiple species'. It requires orthologous input sequences and tries to identify unusually well conserved regions by phylogenetic footprinting. Both approaches perform well, but each has some limitations. It is tempting to combine the knowledge of co-regulation among different genes and conservation among orthologous genes to improve our ability to identify motifs. RESULTS: Based on the Consensus algorithm previously established by our group, we introduce a new algorithm called PhyloCon (Phylogenetic Consensus) that takes into account both conservation among orthologous genes and co-regulation of genes within a species. This algorithm first aligns conserved regions of orthologous sequences into multiple sequence alignments, or profiles, then compares profiles representing non-orthologous sequences. Motifs emerge as common regions in these profiles. Here we present a novel statistic to compare profiles of DNA sequences and a greedy approach to search for common subprofiles. We demonstrate that PhyloCon performs well on both synthetic and biological data. AVAILABILITY: Software available upon request from the authors. http://ural.wustl.edu/softwares.html

5.

Informatic Resources for Identifying and Annotating Structural RNA Motifs

George AD Tenenbaum SA 《Molecular biotechnology》2009,41(2):180-193

6.

Novel GACG-hairpin pair motif in the 5' untranslated region of type C retroviruses related to murine leukemia virus. (Cited by: 6 total; 5 self-citations; 1 by others)

D A Konings M A Nash J V Maizel R B Arlinghaus 《Journal of virology》1992,66(2):632-640

We searched for the presence of common RNA structural motifs in mammalian type C retroviruses related to murine leukemia viruses and the closely related avian spleen necrosis virus. A novel motif consisting of a pair of hairpins, called hairpin pair motif, was detected in the 5' untranslated regions of the genomes of these retroviruses. A combination of computational analyses that included the assessment of phylogenetic sequence conservation by multiple alignment, the search for regions with unusual RNA folding properties, and the analysis of RNA secondary structure by suboptimal free-energy calculations highlighted the significance of this hairpin pair motif. The hairpin pair motif encompasses 70 to 80 nucleotides between the splice donor site and the gag translational initiation codon of these viruses. The motif is composed of two adjacent hairpins both with a perfectly conserved GACG tetraloop. We propose that the novel GACG-hairpin pair motif described here constitutes an essential component of the regulatory machinery in these type C retroviruses.

7.

CMfinder--a covariance model based RNA motif finding algorithm (Cited by: 5 total; 0 self-citations; 5 by others)

Yao Z Weinberg Z Ruzzo WL 《Bioinformatics (Oxford, England)》2006,22(4):445-452

8.

Evidence-ranked motif identification

Stoyan Georgiev Alan P Boyle Karthik Jayasurya Xuan Ding Sayan Mukherjee 《Genome biology》2010,11(2):R19

cERMIT is a computationally efficient motif discovery tool based on analyzing genome-wide quantitative regulatory evidence. Instead of pre-selecting promising candidate sequences, it utilizes information across all sequence regions to search for high-scoring motifs. We apply cERMIT on a range of direct binding and overexpression datasets; it substantially outperforms state-of-the-art approaches on curated ChIP-chip datasets, and easily scales to current mammalian ChIP-seq experiments with data on thousands of non-coding regions.

9.

A network of conserved co-occurring motifs for the regulation of alternative splicing

Mikita Suyama Eoghan D. Harrington Svetlana Vinokourova Magnus von Knebel Doeberitz Osamu Ohara Peer Bork 《Nucleic acids research》2010,38(22):7916-7926

Cis-acting short sequence motifs play important roles in alternative splicing. It is now possible to identify such sequence motifs as conserved sequence patterns in genome sequence alignments. Here, we report the systematic search for motifs in the neighboring introns of alternatively spliced exons by using comparative analysis of mammalian genome alignments. We identified 11 conserved sequence motifs that might be involved in the regulation of alternative splicing. These motifs are not only significantly overrepresented near alternatively spliced exons, but they also co-occur with each other, thus, forming a network of cis-elements, likely to be the basis for context-dependent regulation. Based on this finding, we applied the motif co-occurrence to predict alternatively skipped exons. We verified exon skipping in 29 cases out of 118 predictions (25%) by EST and mRNA sequences in the databases. For the predictions not verified by the database sequences, we confirmed exon skipping in 10 additional cases by using both RT–PCR experiments and the publicly available RNA-Seq data. These results indicate that even more alternative splicing events will be found with the progress of large-scale and high-throughput analyses for various tissue samples and developmental stages.

10.

Predicting candidate genomic sequences that correspond to synthetic functional RNA motifs

Laserson U Gan HH Schlick T 《Nucleic acids research》2005,33(18):6057-6069

Riboswitches and RNA interference are important emerging mechanisms found in many organisms to control gene expression. To enhance our understanding of such RNA roles, finding small regulatory motifs in genomes presents a challenge on a wide scale. Many simple functional RNA motifs have been found by in vitro selection experiments, which produce synthetic target-binding aptamers as well as catalytic RNAs, including the hammerhead ribozyme. Motivated by the prediction of Piganeau and Schroeder [(2003) Chem. Biol., 10, 103–104] that synthetic RNAs may have natural counterparts, we develop and apply an efficient computational protocol for identifying aptamer-like motifs in genomes. We define motifs from the sequence and structural information of synthetic aptamers, search for sequences in genomes that will produce motif matches, and then evaluate the structural stability and statistical significance of the potential hits. Our application to aptamers for streptomycin, chloramphenicol, neomycin B and ATP identifies 37 candidate sequences (in coding and non-coding regions) that fold to the target aptamer structures in bacterial and archaeal genomes. Further energetic screening reveals that several candidates exhibit energetic properties and sequence conservation patterns that are characteristic of functional motifs. Besides providing candidates for experimental testing, our computational protocol offers an avenue for expanding natural RNA's functional repertoire.

11.

cis-Acting Signals in Bromovirus RNA Replication and Gene Expression: Networking with Viral Proteins and Host Factors

《Seminars in Virology》1997,8(3):221-230

Bromoviruses are representative members of the alphavirus-like superfamily of animal and plant positive-strand RNA viruses. Tractable biochemical and genetic features have made bromoviruses useful systems for in vivo and in vitro studies of cis-acting RNA sequences and trans-acting factors in RNA replication, subgenomic mRNA synthesis, translation, encapsidation, and virus–host interactions. Among other findings, bromovirus cis-acting RNA replication signals are large, structurally complex, and conserve potential conformational switches that may coordinate RNA replication with other infection processes. The tRNA-like 3′ ends of bromovirus RNAs are required for negative-strand synthesis and recognized by multiple tRNA-specific host enzymes. The presence of additional host regulatory sequence motifs in other bromovirus cis-acting regions suggests that their function also involves interaction with host as well as viral factors.

12.

Functional Analysis of the Core Human Immunodeficiency Virus Type 1 Packaging Signal in a Permissive Cell Line (Cited by: 7 total; 5 self-citations; 2 by others)

Geoffrey P. Harrison Gino Miele Eric Hunter Andrew M. L. Lever 《Journal of virology》1998,72(7):5886-5896

Packaging of type C retrovirus genomic RNAs into budding virions requires a highly specific interaction between the viral Gag precursor and unique cis-acting packaging signals on the full-length RNA genome, allowing the selection of this RNA species from among a pool of spliced viral RNAs and similar cellular RNAs. This process is thought to involve RNA secondary and tertiary structural motifs since there is little conservation of the primary sequence of this region between retroviruses. To confirm RNA secondary structures, which we and others have predicted for this region, disruptive, compensatory, and deletion mutations were introduced into proviral constructs, which were then assayed in a permissive cell line. Disruption of either of two predicted stem-loops was found to greatly reduce RNA encapsidation and replication, whereas compensatory mutations restoring base pairing to these stem-loops had a wild-type phenotype. A GGNGR motif was identified in the loops of three hairpins in this region. Results were consistent with the hypothesis that the process of efficient RNA encapsidation is linked to dimerization. Replication and encapsidation were shown to occur at a reduced rate in the absence of the previously described kissing hairpin motif.

13.

A Monte Carlo EM Algorithm for De Novo Motif Discovery in Biomolecular Sequences (Cited by: 1 total; 0 self-citations; 1 by others)

Bi Chengpeng 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2009,6(3):370-386

Motif discovery methods play pivotal roles in deciphering the genetic regulatory codes (i.e., motifs) in genomes as well as in locating conserved domains in protein sequences. The Expectation Maximization (EM) algorithm is one of the most popular methods used in de novo motif discovery. Based on the position weight matrix (PWM) updating technique, this paper presents a Monte Carlo version of the EM motif-finding algorithm that carries out stochastic sampling in local alignment space to overcome the conventional EM's main drawback of being trapped in a local optimum. The newly implemented algorithm is named the Monte Carlo EM Motif Discovery Algorithm (MCEMDA). MCEMDA starts from an initial model, and then it iteratively performs Monte Carlo simulation and parameter update until convergence. A log-likelihood profiling technique together with the top-k strategy is introduced to cope with the phase shifts and multiple modal issues in the motif discovery problem. A novel grouping motif alignment (GMA) algorithm is designed to select motifs by clustering a population of candidate local alignments and is successfully applied to subtle motif discovery. MCEMDA compares favorably to other popular PWM-based and word enumerative motif algorithms tested using simulated (l, d)-motif cases and documented prokaryotic and eukaryotic DNA motif sequences. Finally, MCEMDA is applied to detect large blocks of conserved domains using protein benchmarks and exhibits excellent capacity when compared with other multiple sequence alignment methods.
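
The alternation between stochastic sampling of a local alignment and PWM re-estimation can be sketched as follows. This is a deliberately simplified illustration, not MCEMDA itself: it assumes exactly one motif occurrence per sequence, a uniform background, and a fixed iteration count instead of a convergence test, and it omits the log-likelihood profiling, top-k, and GMA steps. All function names and the toy sequences are hypothetical.

```python
import random

ALPHABET = "ACGT"


def update_pwm(seqs, starts, width, pseudocount=0.5):
    """Parameter update: re-estimate the PWM from the sampled alignment."""
    pwm = []
    for col in range(width):
        counts = {b: pseudocount for b in ALPHABET}
        for seq, s in zip(seqs, starts):
            counts[seq[s + col]] += 1
        total = sum(counts.values())
        pwm.append({b: counts[b] / total for b in ALPHABET})
    return pwm


def sample_starts(seqs, pwm, rng, background=0.25):
    """Monte Carlo step: draw one start per sequence, weighted by likelihood ratio."""
    width = len(pwm)
    starts = []
    for seq in seqs:
        weights = []
        for s in range(len(seq) - width + 1):
            w = 1.0
            for i in range(width):
                w *= pwm[i][seq[s + i]] / background
            weights.append(w)
        starts.append(rng.choices(range(len(weights)), weights=weights)[0])
    return starts


def mcem(seqs, width, iterations=50, seed=0):
    """Alternate sampling and PWM updates from a random initial alignment."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        pwm = update_pwm(seqs, starts, width)
        starts = sample_starts(seqs, pwm, rng)
    return update_pwm(seqs, starts, width), starts


# Toy sequences, each containing the hypothetical motif TTGACG.
seqs = ["ACGTTGACGTAC", "TTGACGGGCATA", "CCATTGACGTTA", "GGTTGACGCATC"]
pwm, starts = mcem(seqs, width=6)
print(starts)
```

Because start positions are sampled rather than chosen greedily, a run can move away from a poor initial alignment, which is the motivation the abstract gives for the Monte Carlo variant.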

14.

Using RNA secondary structures to guide sequence motif finding towards single-stranded regions (Cited by: 2 total; 0 self-citations; 2 by others)

Hiller M Pudimat R Busch A Backofen R 《Nucleic acids research》2006,34(17):e117

RNA binding proteins recognize RNA targets in a sequence specific manner. Apart from the sequence, the secondary structure context of the binding site also affects the binding affinity. Binding sites are often located in single-stranded RNA regions and it was shown that the sequestration of a binding motif in a double-strand abolishes protein binding. Thus, it is desirable to include knowledge about RNA secondary structures when searching for the binding motif of a protein. We present the approach MEMERIS for searching sequence motifs in a set of RNA sequences and simultaneously integrating information about secondary structures. To abstract from specific structural elements, we precompute position-specific values measuring the single-strandedness of all substrings of an RNA sequence. These values are used as prior knowledge about the motif starts to guide the motif search. Extensive tests with artificial and biological data demonstrate that MEMERIS is able to identify motifs in single-stranded regions even if a stronger motif located in double-strand parts exists. The discovered motif occurrences in biological datasets mostly coincide with known protein-binding sites. This algorithm can be used for finding the binding motif of single-stranded RNA-binding proteins in SELEX or other biological sequence data.
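
One way to picture single-strandedness values acting as a prior over motif starts is the sketch below. It assumes per-base unpaired probabilities are already available (for example from an RNA folding tool, which is not shown) and simply averages them over each window; the actual measure used by MEMERIS may differ. The function name `start_prior`, the small `pseudo` term, and the example values are hypothetical.

```python
def start_prior(unpaired, width, pseudo=0.01):
    """Turn per-base unpaired probabilities into a prior over motif starts.

    Each candidate start gets the mean unpaired probability of its window,
    plus a small pseudo term so no start has zero prior mass; the scores
    are then normalised to sum to 1.
    """
    scores = [
        sum(unpaired[s:s + width]) / width + pseudo
        for s in range(len(unpaired) - width + 1)
    ]
    total = sum(scores)
    return [x / total for x in scores]


# Hypothetical unpaired probabilities for a 7-nt RNA; bases 2..4 are mostly free.
unpaired = [0.1, 0.1, 0.9, 0.9, 0.9, 0.2, 0.1]
prior = start_prior(unpaired, width=3)
print([round(p, 3) for p in prior])
```

In a motif search, such a prior would multiply the sequence likelihood of each candidate start, so that weaker motifs in accessible regions can outrank stronger ones buried in double-stranded parts.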

15.

Finding motifs from all sequences with and without binding sites

Leung HC Chin FY 《Bioinformatics (Oxford, England)》2006,22(18):2217-2223

16.

rMotifGen: random motif generator for DNA and protein sequences

Eric C Rouchka C Timothy Hardin 《BMC bioinformatics》2007,8(1):292

17.

Analyses of the functional regions of DEAD-box RNA "helicases" with deletion and chimera constructs tested in vivo and in vitro

Banroques J Cordin O Doère M Linder P Tanner NK 《Journal of molecular biology》2011,413(2):451-472

The DEAD-box family of putative RNA helicases is composed of ubiquitous proteins that are found in nearly all organisms and that are involved in virtually all processes involving RNA. They are characterized by two tandemly linked, RecA-like domains that contain 11 conserved motifs and highly variable amino- and carboxy-terminal flanking sequences. For this reason, they are often considered to be modular multi-domain proteins. We tested this by making extensive BLASTs and sequence alignments to elucidate the minimal functional unit in nature. We then used this information to construct chimeras and deletions of six essential yeast proteins that were assayed in vivo. We purified many of the different constructs and characterized their biochemical properties in vitro. We found that sequence elements can only be switched between closely related proteins and that the carboxy-terminal sequences are important for high ATPase and strand displacement activities and for high RNA binding affinity. The amino-terminal elements were often toxic when overexpressed in vivo, and they may play regulatory roles. Both the amino and the carboxyl regions have a high frequency of sequences that are predicted to be intrinsically disordered, indicating that the flanking regions do not form distinct modular domains but probably assume an ordered structure with ligand binding. Finally, the minimal functional unit of the DEAD-box core starts two amino acids before the isolated phenylalanine of the Q motif and extends to about 35 residues beyond motif VI. These experiments provide evidence for how a highly conserved structural domain can be adapted to different cellular needs.

18.

A profile-based deterministic sequential Monte Carlo algorithm for motif discovery

Liang KC Wang X Anastassiou D 《Bioinformatics (Oxford, England)》2008,24(1):46-55

19.

A generic motif discovery algorithm for sequential data

Jensen KL Styczynski MP Rigoutsos I Stephanopoulos GN 《Bioinformatics (Oxford, England)》2006,22(1):21-28

MOTIVATION: Motif discovery in sequential data is a problem of great interest and with many applications. However, previous methods have been unable to combine exhaustive search with complex motif representations and are each typically only applicable to a certain class of problems. RESULTS: Here we present a generic motif discovery algorithm (Gemoda) for sequential data. Gemoda can be applied to any dataset with a sequential character, including both categorical and real-valued data. As we show, Gemoda deterministically discovers motifs that are maximal in composition and length. In addition, the algorithm allows any choice of similarity metric for finding motifs. Finally, Gemoda's output motifs are representation-agnostic: they can be represented using regular expressions, position weight matrices or any number of other models for any type of sequential data. We demonstrate a number of applications of the algorithm, including the discovery of motifs in amino acid sequences, a new solution to the (l,d)-motif problem in DNA sequences and the discovery of conserved protein substructures. AVAILABILITY: Gemoda is freely available at http://web.mit.edu/bamel/gemoda
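
For readers unfamiliar with the (l,d)-motif problem mentioned above: the task is to find a word of length l that occurs in every input sequence with at most d mismatches. The brute-force sketch below only states the problem in code; it is not Gemoda's algorithm, it enumerates all 4^l candidate words and is therefore only practical for small l. The function names and toy sequences are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs(candidate, seq, d):
    """True if some window of seq is within d mismatches of candidate."""
    l = len(candidate)
    return any(
        hamming(candidate, seq[i:i + l]) <= d
        for i in range(len(seq) - l + 1)
    )


def ld_motifs(seqs, l, d, alphabet="ACGT"):
    """Enumerate every l-mer that occurs in all sequences with <= d mismatches."""
    candidates = ("".join(c) for c in product(alphabet, repeat=l))
    return [c for c in candidates if all(occurs(c, s, d) for s in seqs)]


# Toy input: ACGT appears exactly once and with one mismatch in the other two.
seqs = ["TTACGTAA", "GGACCTCC", "CAAGGTTG"]
found = ld_motifs(seqs, l=4, d=1)
print(found)
```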

20.

Mining frequent stem patterns from unaligned RNA sequences (Cited by: 1 total; 0 self-citations; 1 by others)

Hamada M Tsuda K Kudo T Kin T Asai K 《Bioinformatics (Oxford, England)》2006,22(20):2480-2487

MOTIVATION: In detection of non-coding RNAs, it is often necessary to identify the secondary structure motifs from a set of putative RNA sequences. Most of the existing algorithms aim to provide the best motif or a few good motifs, but biologists often need to inspect all the possible motifs thoroughly. RESULTS: Our method RNAmine employs a graph theoretic representation of RNA sequences and detects all the possible motifs exhaustively using a graph mining algorithm. The motif detection problem boils down to finding frequently appearing patterns in a set of directed and labeled graphs. In the tasks of common secondary structure prediction and local motif detection from long sequences, our method compared favorably, both in accuracy and in efficiency, with state-of-the-art methods such as CMFinder. AVAILABILITY: The software is available upon request.