Found 20 similar articles; search took 15 ms
1.
Finding the most significant common sequence and structure motifs in a set of RNA sequences. Total citations: 12, self-citations: 4, citations by others: 12
We present a computational scheme to locally align a collection of RNA sequences using sequence and structure constraints. In addition, the method searches for the resulting alignments with the most significant common motifs, among all possible collections. The first part utilizes a simplified version of the Sankoff algorithm for simultaneous folding and alignment of RNA sequences, but maintains tractability by constructing multi-sequence alignments from pairwise comparisons. The algorithm finds the multiple alignments using a greedy approach and has similarities to both CLUSTAL and CONSENSUS, but the core algorithm assures that the pairwise alignments are optimized for both sequence and structure conservation. The choice of scoring system and the method of progressively constructing the final solution are important considerations that are discussed. Example solutions, and comparisons with other approaches, are provided. The solutions include finding consensus structures identical to published ones.
2.
3.
Carvalho AM Freitas AT Oliveira AL Sagot MF 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2006,3(2):126-140
We propose a new algorithm for identifying cis-regulatory modules in genomic sequences. The proposed algorithm, named RISO, uses a new data structure, called box-link, to store the information about conserved regions that occur in a well-ordered and regularly spaced manner in the data set sequences. This type of conserved region, called a structured motif, is extremely relevant in the research of gene regulatory mechanisms since it can effectively represent promoter models. The complexity analysis shows a time and space gain over the best known exact algorithms that is exponential in the spacings between binding sites. A full implementation of the algorithm was developed and made available online. Experimental results show that the algorithm is much faster than existing ones, sometimes by more than four orders of magnitude. The application of the method to biological data sets shows its ability to extract relevant consensi.
4.
5.
In phylogenetic foot-printing, putative regulatory elements are found in upstream regions of orthologous genes by searching for common motifs. Motifs in different upstream sequences are subject to mutations along the edges of the corresponding phylogenetic tree; consequently, taking advantage of the tree in the motif search is an appealing idea. We describe the Motif Yggdrasil sampler, the first Gibbs sampler based on a general tree that uses unaligned sequences. Previous tree-based Gibbs samplers have assumed a star-shaped tree or partially aligned upstream regions. We give a probabilistic model (MY model) describing upstream sequences with regulatory elements and build a Gibbs sampler with respect to this model. The model allows toggling, i.e., the restriction of a position to a subset of nucleotides, but requires neither aligned sequences nor edge lengths, which may be difficult to come by. We apply the collapsing technique to eliminate the need to sample nuisance parameters, and give a derivation of the predictive update formula. We show that the MY model improves the modeling of difficult motif instances and that the use of the tree achieves a substantial increase in nucleotide level correlation coefficient both for synthetic data and 37 bacterial lexA genes. We investigate the sensitivity to errors in the tree and show that using random trees the MY sampler still has a performance similar to the original version.
6.
Mehmet Serkan Apaydin Douglas L Brutlag Carlos Guestrin David Hsu Jean-Claude Latombe Chris Varma 《Journal of computational biology》2003,10(3-4):257-281
Classic molecular motion simulation techniques, such as Monte Carlo (MC) simulation, generate motion pathways one at a time and spend most of their time in the local minima of the energy landscape defined over a molecular conformation space. Their high computational cost prevents them from being used to compute ensemble properties (properties requiring the analysis of many pathways). This paper introduces stochastic roadmap simulation (SRS) as a new computational approach for exploring the kinetics of molecular motion by simultaneously examining multiple pathways. These pathways are compactly encoded in a graph, which is constructed by sampling a molecular conformation space at random. This computation, which does not trace any particular pathway explicitly, circumvents the local-minima problem. Each edge in the graph represents a potential transition of the molecule and is associated with a probability indicating the likelihood of this transition. By viewing the graph as a Markov chain, ensemble properties can be efficiently computed over the entire molecular energy landscape. Furthermore, SRS converges to the same distribution as MC simulation. SRS is applied to two biological problems: computing the probability of folding, an important order parameter that measures the "kinetic distance" of a protein's conformation from its native state; and estimating the expected time to escape from a ligand-protein binding site. Comparison with MC simulations on protein folding shows that SRS produces arguably more accurate results, while reducing computation time by several orders of magnitude. Computational studies on ligand-protein binding also demonstrate SRS as a promising approach to study ligand-protein interactions.
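The idea of treating a roadmap as a Markov chain can be illustrated with a small sketch. It is not the authors' implementation: the five-state chain, the function name `folding_probability`, and the fixed sweep count are all hypothetical, and the probability of folding is assumed here to mean the chance of reaching a folded state before an unfolded one. The value at each node is obtained by repeatedly averaging over its outgoing transitions (first-step analysis), so no individual pathway is ever simulated.

```python
def folding_probability(transitions, folded, unfolded, sweeps=2000):
    """Estimate, for every node, the probability of reaching a folded node
    before an unfolded one.

    transitions[s] is a dict {t: P(s -> t)} whose values sum to 1.
    folded / unfolded are sets of absorbing nodes (hypothetical labels).
    """
    p = {s: 0.0 for s in transitions}
    for s in folded:
        p[s] = 1.0
    # Repeated in-place sweeps over the first-step equations
    # p[s] = sum_t P(s -> t) * p[t] for every non-absorbing node s.
    for _ in range(sweeps):
        for s in transitions:
            if s in folded or s in unfolded:
                continue
            p[s] = sum(prob * p[t] for t, prob in transitions[s].items())
    return p


# Toy roadmap (assumed data): a line of five conformations,
# node 0 is unfolded, node 4 is folded, inner nodes step left or right.
toy = {
    0: {0: 1.0},
    1: {0: 0.5, 2: 0.5},
    2: {1: 0.5, 3: 0.5},
    3: {2: 0.5, 4: 0.5},
    4: {4: 1.0},
}
pfold = folding_probability(toy, folded={4}, unfolded={0})
```

On a real roadmap the same linear system would normally be solved with a sparse solver rather than fixed sweeps; the sweep loop is used here only to keep the sketch dependency-free.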
7.
We present a fast algorithm to produce a graphic matrix representation of sequence homology. The algorithm is based on lexicographical ordering of fragments. It preserves most of the options of a simple naive algorithm with a significant increase in speed. This algorithm was the basis for a program, called DNAMAT, that has been extensively tested during the last three years at the Weizmann Institute of Science and has proven to be very useful. In addition we suggest a way to extend our approach to analyse a series of related DNA or RNA sequences, in order to determine certain common structural features. The analysis is done by summing a set of dot-matrices to produce an overall matrix that displays structural elements common to most of the sequences. We give an example of this procedure by analysing tRNA sequences. Received on June 26, 1986; accepted on September 28, 1986
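The summation of dot-matrices described above can be sketched as follows. This is a minimal illustration under stated assumptions, not DNAMAT: it uses the simple naive comparison rather than the lexicographical ordering of fragments, it compares each sequence with itself using exact window matches, and it assumes all sequences have the same length so the matrices can be added cell by cell. The names `dot_matrix` and `sum_matrices` are hypothetical.

```python
def dot_matrix(a, b, window=1):
    """Naive dot-matrix: cell (i, j) is 1 when the windows starting at
    a[i] and b[j] are identical, otherwise 0."""
    rows = len(a) - window + 1
    cols = len(b) - window + 1
    return [
        [1 if a[i:i + window] == b[j:j + window] else 0 for j in range(cols)]
        for i in range(rows)
    ]


def sum_matrices(matrices):
    """Add equally sized dot-matrices cell by cell; high cells mark
    features shared by most of the input sequences."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [
        [sum(m[i][j] for m in matrices) for j in range(cols)]
        for i in range(rows)
    ]


# Assumed toy input: two equal-length sequences, each compared with itself.
sequences = ["ACGT", "ACGA"]
total = sum_matrices([dot_matrix(s, s) for s in sequences])
```

Cells whose sum approaches the number of sequences correspond to positions that match in nearly every sequence, which is the sense in which the overall matrix highlights common elements.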
8.
Optimal reconstruction of a sequence from its probes. Total citations: 4, self-citations: 0, citations by others: 4
An important combinatorial problem, motivated by DNA sequencing in molecular biology, is the reconstruction of a sequence over a small finite alphabet from the collection of its probes (the sequence spectrum), obtained by sliding a fixed sampling pattern over the sequence. Such construction is required for Sequencing-by-Hybridization (SBH), a novel DNA sequencing technique based on an array (SBH chip) of short nucleotide sequences (probes). Once the sequence spectrum is biochemically obtained, a combinatorial method is used to reconstruct the DNA sequence from its spectrum. Since technology limits the number of probes on the SBH chip, a challenging combinatorial question is the design of a smallest set of probes that can sequence an arbitrary DNA string of a given length. We present in this work a novel probe design, crucially based on the use of universal bases [bases that bind to any nucleotide (Loakes and Brown, 1994)] that drastically improves the performance of the SBH process and asymptotically approaches the information-theoretic bound up to a constant factor. Furthermore, the sequencing algorithm we propose is substantially simpler than the Eulerian path method used in previous solutions of this problem.
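How a spectrum arises from sliding a fixed sampling pattern over a sequence can be shown with a short sketch. The representation is an assumption made for illustration: the pattern is written as a string of `1` (a position read by a specific base) and `0` (a position occupied by a universal base), and universal-base positions appear as `*` in the resulting probes. The function name `spectrum` is hypothetical, and the sketch covers only how the spectrum is formed, not the reconstruction algorithm or the probe design proposed in the paper.

```python
def spectrum(sequence, pattern):
    """Return the set of probes obtained by sliding `pattern` over `sequence`.

    pattern: string of '1' (sampled position) and '0' (universal base,
    rendered as '*'); this encoding is an illustrative assumption.
    """
    width = len(pattern)
    probes = set()
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        probes.add(
            "".join(base if mark == "1" else "*"
                    for base, mark in zip(window, pattern))
        )
    return probes


# Contiguous 3-mers versus a gapped pattern on the same toy sequence.
solid = spectrum("ACGTAC", "111")
gapped = spectrum("ACGTAC", "101")
```

The spectrum is a set, so repeated probes collapse to one entry and their order along the sequence is lost; recovering that order is the reconstruction problem the abstract refers to.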
9.
Rapid and efficient protocol for DNA extraction and molecular identification of the basidiomycete Crinipellis perniciosa. Total citations: 1, self-citations: 0, citations by others: 1
DNA isolation from some fungal organisms is difficult because they have cell walls or capsules that are relatively unsusceptible to lysis. Beginning with a yeast Saccharomyces cerevisiae genomic DNA isolation method, we developed a 30-min DNA isolation protocol for filamentous fungi by combining cell wall digestion with cell disruption by glass beads. High-quality DNA was isolated with good yield from the hyphae of Crinipellis perniciosa, which causes witches' broom disease in cacao, from three other filamentous fungi, Lentinus edodes, Agaricus blazei, Trichoderma stromaticum, and from the yeast S. cerevisiae. Genomic DNA was suitable for PCR of specific actin primers of C. perniciosa, allowing it to be differentiated from fungal contaminants, including its natural competitor, T. stromaticum.
10.
11.
Here we present an algorithm designed to carry out multiple structure alignment and to detect recurring substructural motifs. So far we have implemented it for comparison of protein structures. However, this general method is applicable to comparisons of RNA structures and to detection of a pharmacophore in a series of drug molecules. Further, its sequence order independence permits its application to detection of motifs on protein surfaces, interfaces, and binding/active sites. While there are many methods designed to carry out pairwise structure comparisons, there are only a handful geared toward the multiple structure alignment task. Most of these tackle multiple structure comparison as a collection of pairwise structure comparison tasks. The multiple structural alignment algorithm presented here automatically finds the largest common substructure (core) of atoms that appears in all the molecules in the ensemble. The detection of the core and the structural alignment are done simultaneously. The algorithm begins by finding small substructures that are common to all the proteins in the ensemble. One of the molecules is considered the reference; the others are the source molecules. The small substructures are stored in special arrays termed combinatorial buckets, which define sets of multistructural alignments from the source molecules that coincide with the same small set of reference atoms (C(alpha)-atoms here). These substructures are initial small fragments that have congruent copies in each of the proteins. The substructures are extended, through the processing of the combinatorial buckets, by clustering the superpositions (transformations). The method is very efficient.
12.
Storage of sequence data is a major concern because the amount of data generated at several locations is growing exponentially. Therefore, there is a need to develop techniques to store data using compression algorithms. Here we describe an optimal storage algorithm (OPTSDNA) for storing large amounts of DNA sequences of varying length. This paper provides a performance analysis of OPTSDNA in a distributed bioinformatics computing system for the analysis of DNA sequences. The OPTSDNA algorithm is used to store DNA sequences of various sizes, from very small to very large, in a database. The algorithm calculates the storage size, and the response time is also measured. The efficiency and performance of the algorithm are high (in size calculation, by percentage) when compared with other known sequential approaches.
13.
OBSTRUCT: a program to obtain largest cliques from a protein sequence set according to structural resolution and sequence similarity. Total citations: 4, self-citations: 0, citations by others: 4
Heringa Jaap; Sommerfeldt Hubert; Higgins Desmond; Argos Patrick 《Bioinformatics (Oxford, England)》1992,8(6):599-600
A program OBSTRUCT has been developed to obtain the largest possible subset according to specific constraints from a set of protein sequences whose tertiary structures have been determined crystallographically. The user can request a range in sequence similarity level and/or structural resolution. The program optionally includes sequences with known three-dimensional folds elicited from NMR data.
14.
15.
Study of bread wheat (Triticum aestivum) may help to resolve several questions related to polyploid evolution. One such question regards the possibility that the component genomes of polyploids may themselves be polyphyletic, resulting from hybridization and introgression among different polyploid species sharing a single genome. We used the B genome of wheat as a model system to test hypotheses that bear on the monophyly or polyphyly of the individual constituent genomes. By using aneuploid wheat stocks, combined with PCR-based cloning strategies, we cloned and sequenced two single-copy-DNA sequences from each of the seven chromosomes of the wheat B genome and the homologous sequences from representatives of the five diploid species in section Sitopsis previously suggested as sister groups to the B genome. Phylogenetic comparisons of sequence data suggested that the B genome of wheat underwent a genetic bottleneck and has diverged from the diploid B genome donor. The extent of genetic diversity among the Sitopsis diploids and the failure of any of the Sitopsis species to group with the wheat B genome indicated that these species have also diverged from the ancestral B genome donor. Our results support monophyly of the wheat B genome.
16.
Giuseppina Di Fruscio Angela Schulz Rossella De Cegli Marco Savarese Margherita Mutarelli Giancarlo Parenti Sandro Banfi Thomas Braulke Vincenzo Nigro Andrea Ballabio 《Autophagy》2015,11(6):928-938
The autophagy-lysosomal pathway (ALP) regulates cell homeostasis and plays a crucial role in human diseases, such as lysosomal storage disorders (LSDs) and common neurodegenerative diseases. Therefore, the identification of DNA sequence variations in genes involved in this pathway and their association with human diseases would have a significant impact on health. To this aim, we developed Lysoplex, a targeted next-generation sequencing (NGS) approach, which allowed us to obtain a uniform and accurate coding sequence coverage of a comprehensive set of 891 genes involved in lysosomal, endocytic, and autophagic pathways. Lysoplex was successfully validated on 14 different types of LSDs and then used to analyze 48 mutation-unknown patients with a clinical phenotype of neuronal ceroid lipofuscinosis (NCL), a genetically heterogeneous subtype of LSD. Lysoplex allowed us to identify pathogenic mutations in 67% of patients, most of whom had been unsuccessfully analyzed by several sequencing approaches. In addition, in 3 patients, we found potential disease-causing variants in novel NCL candidate genes. We then compared the variant detection power of Lysoplex with data derived from public whole exome sequencing (WES) efforts. On average, a 50% higher number of validated amino acid changes and truncating variations per gene were identified. Overall, we identified 61 truncating sequence variations and 488 missense variations with a high probability to cause loss of function in a total of 316 genes. Interestingly, some loss-of-function variations of genes involved in the ALP pathway were found in homozygosity in the normal population, suggesting that their role is not essential. Thus, Lysoplex provided a comprehensive catalog of sequence variants in ALP genes and allows the assessment of their relevance in cell biology as well as their contribution to human disease.
17.
Rose HL Dewey CA Ely MS Willoughby SL Parsons TM Cox V Spencer PM Weller SA 《PloS one》2011,6(7):e22668
Eight DNA extraction products or methods (Applied Biosystems PrepFiler Forensic DNA Extraction Kit; Bio-Rad Instagene Only; Bio-Rad Instagene & Spin Column Purification; EpiCentre MasterPure DNA & RNA Kit; FujiFilm QuickGene Mini80; Idaho Technologies 1-2-3 Q-Flow Kit; MoBio UltraClean Microbial DNA Isolation Kit; Sigma Extract-N-Amp Plant and Seed Kit) were adapted to facilitate extraction of DNA under BSL3 containment conditions. DNA was extracted from 12 common interferents or sample types, spiked with spores of Bacillus atrophaeus. Resulting extracts were tested by real-time PCR. No one method was the best, in terms of DNA extraction, across all sample types. Statistical analysis indicated that the PrepFiler method was the best method from six dry powders (baking, biological washing, milk, plain flour, filler and talcum) and one solid (underarm deodorant), the UltraClean method was the best from four liquids (aftershave, cola, nutrient broth, vinegar), and the MasterPure method was the best from the swab sample type. The best overall method, in terms of DNA extraction, across all sample types evaluated was the UltraClean method.
18.
I. Métais C. Aubry B. Hamon D. Peltier R. Jalouzot 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》1998,97(1-2):232-237
We describe the cloning and the characterization of a 130-bp DNA fragment, called OPG9-130, amplified from bean (Phaseolus vulgaris L.) genomic DNA. This fragment corresponds to a minisatellite DNA sequence containing seven repeats of 15 bp which differ slightly from each other in their sequence. Southern analysis showed that the core sequence of 15 bp is repeated in clusters dispersed throughout the genome. The use of this fragment as a probe allowed us to identify common bean lines by their DNA fingerprints. We suggest that OPG9-130 will be useful for line identification as well as for the analysis of genetic relatedness between bean species and lines.
Received: 14 February 1998 / Accepted: 10 February 1998
19.
20.
Janna Radtke Steven Dooley Nikolaus Blin Gerhard Unteregger 《Molecular biology reports》1991,15(2):87-92
We investigated DNA-protein interactions occurring in the promoter region of c-fos using two-dimensional electrophoresis and south-western blotting. When nuclear extracts from the human glioblastoma cell line HeRoSV were tested for their DNA-binding behaviour to a 650 bp fragment within the promoter region of c-fos, we found 4 proteins, designated 120/6.6, 75/5.4, 65/6.4 and 55/5.0, interacting with this fragment. An additional protein, 60/6.0, was detected by using a digoxigenin-labelled probe. These observations led us to assume that besides the well-characterized SRF and FOS-JUN proteins, additional factors recognize the promoter sequence and may play a role in c-fos regulation.
Abbreviations: DRE, direct repeat element; DSE, dyad symmetry element; DTE, dithiothreitol; EGF, epidermal growth factor; FCS, fetal calf serum; PA, polyacrylamide; PMSF, phenylmethylsulfonyl fluoride; SDS, sodium dodecyl sulfate; SRF, serum response factor; TCF, ternary complex factor; TPA, 12-O-tetradecanoyl phorbol-13-acetate; 2D, two-dimensional