Similar literature
 Found 20 similar documents (search time: 15 ms)
1.

Background  

Extracting motifs from sequences is a mainstay of bioinformatics. We look at the problem of mining structured motifs, which allow variable-length gaps between simple motif components. We propose an efficient algorithm, called EXMOTIF, that, given one or more sequences and a structured motif template, extracts all frequent structured motifs that have quorum q. Potential applications of our method include the extraction of single/composite regulatory binding sites in DNA sequences.
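To make the problem statement concrete, here is a minimal brute-force sketch of structured motif extraction. It is not the EXMOTIF algorithm itself (the abstract gives no implementation details). It assumes a template made of fixed component lengths plus a (min, max) gap range between consecutive components, and it assumes that quorum means "occurs in at least q distinct sequences". The function name and parameters are hypothetical.

```python
from collections import defaultdict


def extract_structured_motifs(sequences, lengths, gaps, quorum):
    """Brute-force sketch (assumed semantics, not the published algorithm).

    lengths: length of each simple component, e.g. [3, 3]
    gaps:    (min_gap, max_gap) between consecutive components, e.g. [(1, 3)]
    quorum:  minimum number of distinct sequences that must contain a motif
    """
    # motif (tuple of component strings) -> ids of sequences containing it
    support = defaultdict(set)

    def extend(seq, seq_id, pos, idx, parts):
        # Component idx must start exactly at position pos.
        end = pos + lengths[idx]
        if end > len(seq):
            return
        parts = parts + (seq[pos:end],)
        if idx == len(lengths) - 1:
            support[parts].add(seq_id)
            return
        lo, hi = gaps[idx]
        # Try every allowed gap length before the next component.
        for gap in range(lo, hi + 1):
            extend(seq, seq_id, end + gap, idx + 1, parts)

    for seq_id, seq in enumerate(sequences):
        for start in range(len(seq)):
            extend(seq, seq_id, start, 0, ())

    return {m: len(ids) for m, ids in support.items() if len(ids) >= quorum}


if __name__ == "__main__":
    seqs = ["ACGTTTGCA", "ACGAAGCA", "TTTTTTTT"]
    print(extract_structured_motifs(seqs, [3, 3], [(1, 3)], quorum=2))
```

This enumeration is exponential in the number of components and gap widths, so it only illustrates what "frequent structured motif" means; an efficient method would need an index over component positions.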

2.
MOTIVATION AND RESULTS: Motivated by the recent rise of interest in small regulatory RNAs, we present Locomotif, a new approach for locating RNA motifs that goes beyond the previous ones in three ways: (1) Motif search is based on efficient dynamic programming algorithms, incorporating the established thermodynamic model of RNA secondary structure formation. (2) Motifs are described graphically, using a Java-based editor, and search algorithms are derived from the graphics in a fully automatic way. The editor allows us to draw secondary structures, annotated with size and sequence information. These drawings closely resemble the established, but informal, way in which RNA motifs are communicated in the literature. Thus, the learning effort for Locomotif users is minimal. (3) Locomotif employs a client-server approach. Motifs are designed by the user locally. Search programs are generated and compiled on a bioinformatics server. They are made available both for execution on the server and for download as C source code plus an appropriate makefile. AVAILABILITY: Locomotif is available at http://bibiserv.techfak.uni-bielefeld.de/locomotif.

3.
This paper presents a language for describing arrangements of motifs in biological sequences, and a program that uses the language to find the arrangements in motif match databases. The program does not by itself search for the constituent motifs, and is thus independent of how they are detected, which allows it to use motif match data of various origins. AVAILABILITY: The program can be tested online at http://hits.isb-sib.ch and the distribution is available from ftp://ftp.isrec.isb-sib.ch/pub/software/unix/mmsearch-1.0.tar.gz CONTACT: Thomas.Junier@isrec.unil.ch SUPPLEMENTARY INFORMATION: The full documentation about mmsearch is available from http://hits.isb-sib.ch/~tjunier/mmsearch/doc.

4.
The establishment and rapid expansion of microarray databases have created a need for new search tools. Here we present CellMontage, the first server for expression profile similarity search over a large database: 69 000 microarray experiments derived from NCBI's GEO site. CellMontage provides a novel, content-based search engine for accessing gene expression data. Microarray experiments with similar overall expression to a user-provided expression profile (e.g. a microarray experiment) are computed and displayed, usually within 20 s. The core search engine software is downloadable from the site.

5.
MOTIVATION: The discovery of patterns shared by several sequences that differ greatly is a basic task in sequence analysis, and still a challenge. Several methods have been developed for detecting patterns. Methods commonly used for motif search include the Gibbs sampler, the Expectation-Maximization (EM) algorithm and some intuitive greedy approaches. One cannot guarantee the optimality of the result produced by the Gibbs sampler in a single run. The deterministic EM methods tend to get trapped in local optima. Solutions found by greedy approaches are rarely sufficiently good. RESULTS: A simple model describing a motif or a portion of a local multiple sequence alignment is the weight matrix model, in which a motif is characterized by position-specific probabilities. Two substitution matrices are proposed to relate the sequence similarity to the weight matrix. Combining the substitution matrix and weight matrix, we examine three typical sets of protein sequences with increasing complexity. At a low score threshold for pair similarity, sliding windows are compared with a seed window to find the score sum, which provides a measure of statistical significance for multiple sequence comparison. Such a similarity analysis reveals many aspects of motifs. Blocks determined by similarity can be used to deduce a primary weight matrix or an improved substitution matrix. The algorithm successfully obtains the optimal solution for the test sets by greedy iteration alone.
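The weight matrix model mentioned above can be sketched in a few lines. The example below is a generic illustration, not the authors' method: it estimates position-specific probabilities from an aligned block and scores sliding windows by log-odds against a uniform background. The pseudocount, the uniform background, and the DNA alphabet (the abstract works with proteins) are all simplifying assumptions, and the function names are hypothetical.

```python
import math


def build_weight_matrix(block, alphabet="ACGT", pseudocount=1.0):
    """Position-specific probabilities from equal-length aligned segments."""
    width = len(block[0])
    matrix = []
    for col in range(width):
        # Start every symbol at the pseudocount so no probability is zero.
        counts = {a: pseudocount for a in alphabet}
        for row in block:
            counts[row[col]] += 1
        total = sum(counts.values())
        matrix.append({a: counts[a] / total for a in alphabet})
    return matrix


def best_window(sequence, matrix, alphabet="ACGT"):
    """Return (start, log-odds score) of the best-scoring sliding window."""
    background = 1.0 / len(alphabet)  # assumed uniform background
    width = len(matrix)
    best = None
    for start in range(len(sequence) - width + 1):
        score = sum(
            math.log(matrix[i][sequence[start + i]] / background)
            for i in range(width)
        )
        if best is None or score > best[1]:
            best = (start, score)
    return best


if __name__ == "__main__":
    pwm = build_weight_matrix(["ACG", "ACG", "ACT"])
    print(best_window("TTACGTT", pwm))
```

A greedy iteration in the spirit of the abstract would rebuild the matrix from the best windows and repeat, but the stopping rule and the substitution matrices are not specified in the text, so they are left out here.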

6.
Family pairwise search with embedded motif models.   Total citations: 1 (self-citations: 0, citations by others: 1)
MOTIVATION: Statistical models of protein families, such as position-specific scoring matrices, profiles and hidden Markov models, have been used effectively to find remote homologs when given a set of known protein family members. Unfortunately, training these models typically requires a relatively large set of training sequences. Recent work (Grundy, J. Comput. Biol., 5, 479-492, 1998) has shown that, when only a few family members are known, several theoretically justified statistical modeling techniques fail to provide homology detection performance on a par with Family Pairwise Search (FPS), an algorithm that combines scores from a pairwise sequence similarity algorithm such as BLAST. RESULTS: The present paper provides a model-based algorithm that improves FPS by incorporating hybrid motif-based models of the form generated by Cobbler (Henikoff and Henikoff, Protein Sci., 6, 698-705, 1997). For the 73 protein families investigated here, this cobbled FPS algorithm provides better homology detection performance than either Cobbler or FPS alone. This improvement is maintained when BLAST is replaced with the full Smith-Waterman algorithm. AVAILABILITY: http://fps.sdsc.edu

7.
Zoology: a search for pattern in form and function   Total citations: 1 (self-citations: 1, citations by others: 0)
Journal of Zoology, 2007, 271(1): 1-2

8.
McManus SA, Li Y. Biochemistry, 2007, 46(8): 2198-2204
The catalytic and structural characteristics of two new self-phosphorylating deoxyribozymes (referred to as deoxyribozyme kinases), denoted "Dk3" and "Dk4", are compared to those of Dk2, a previously reported deoxyribozyme kinase. All three deoxyribozymes not only utilize GTP as the source of activated phosphate and Mn(II) as the divalent metal cofactor but also share a common secondary structure with significant sequence variations. Multiple Watson-Crick helices are identified within the secondary structure, and these helical interactions confine three extremely conserved sequence elements of 8, 5, and 14 nucleotides in length, presumably for the formation of the catalytic core for GTP binding and the self-phosphorylating reaction. The locations of the conserved regions suggest that these three deoxyribozymes arose independently from in vitro selection. The existence of three sequence variants of the same deoxyribozyme from the same in vitro selection experiment implies that these catalytic DNAs may represent the simplest structural solution for the DNA self-phosphorylation reaction when GTP is used as the substrate.

9.
A program has been written to determine T-score profile patterns, thus allowing the immediate assessment of morphological similarities and differences between taxa. For a given data set, the program will produce T-scores, T-score correlation coefficients and associated significance statistics, weighted means and standard deviations for all input variables, and a graphic display of the T-score profile pattern(s). This program should prove to be of great interest to physical anthropologists, zoologists and other researchers in the life sciences with an interest in taxonomy, systematics and phylogeny.

10.
MOTIVATION: Profile HMMs are a powerful tool for modeling conserved motifs in proteins. These models are widely used by search tools to classify new protein sequences into families based on domain architecture. However, the proliferation of known motifs and new proteomic sequence data poses a computational challenge for search, requiring days of CPU time to annotate an organism's proteome. RESULTS: We use PROSITE-like patterns as a filter to speed up the comparison between protein sequence and profile HMM. A set of patterns is designed starting from the HMM, and only sequences matching one of these patterns are compared to the HMM by full dynamic programming. We give an algorithm to design patterns with maximal sensitivity subject to a bound on the false positive rate. Experiments show that our patterns typically retain at least 90% of the sensitivity of the source HMM while accelerating search by an order of magnitude. AVAILABILITY: Contact the first author at the address below.
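The filter-then-score idea can be illustrated with a short sketch. The code below translates a subset of PROSITE pattern syntax into a Python regular expression and runs a costly scoring function only on sequences that pass the filter. It does not show how the patterns are designed from the HMM, which is the paper's actual contribution. The function names are hypothetical, and `expensive_score` is a stand-in for full dynamic programming against a profile HMM.

```python
import re

# One PROSITE element: x, a residue, [allowed set] or {forbidden set},
# optionally followed by a repeat count (n) or range (n,m).
_ELEMENT = re.compile(
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
)


def prosite_to_regex(pattern):
    """Translate a subset of PROSITE pattern syntax to a compiled regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, lo, hi = m.groups()
        if residue == "x":
            core = "."
        elif residue.startswith("{"):
            core = "[^" + residue[1:-1] + "]"
        else:
            core = residue
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + core + suffix)
    return re.compile("".join(parts))


def filter_then_score(sequences, patterns, expensive_score):
    """Score only the sequences that match at least one filter pattern."""
    regexes = [prosite_to_regex(p) for p in patterns]
    return {
        seq: expensive_score(seq)
        for seq in sequences
        if any(r.search(seq) for r in regexes)
    }


if __name__ == "__main__":
    pats = ["C-x(2,4)-C-x(3)-[LIVMFYWC]"]
    print(prosite_to_regex(pats[0]).pattern)
    print(filter_then_score(["ACAACAAAL", "GGGG"], pats, len))
```

The speedup in such a scheme comes from the regex scan being far cheaper than the full alignment, at the cost of missing any true hit that matches none of the patterns.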

11.
Ribosomal RNA (rRNA) genes are probably the most frequently used data source in phylogenetic reconstruction. Individual columns of rRNA alignments are not independent as a consequence of their highly conserved secondary structures. Unless explicitly taken into account, these correlations can distort the phylogenetic signal and/or lead to gross overestimates of tree stability. Maximum likelihood and Bayesian approaches are of course amenable to using RNA-specific substitution models that treat conserved base pairs appropriately, but require accurate secondary structure models as input. So far, however, no accurate and easy-to-use tool has been available for computing structure-aware alignments and consensus structures that can deal with the large rRNAs. The RNAsalsa approach is designed to fill this gap. Capitalizing on the improved accuracy of pairwise consensus structures and informed by a priori knowledge of group-specific structural constraints, the tool provides both alignments and consensus structures that are of sufficient accuracy for routine phylogenetic analysis based on RNA-specific substitution models. The power of the approach is demonstrated using two rRNA data sets: a mitochondrial rRNA set of 26 Mammalia, and a collection of 28S nuclear rRNAs representative of the five major echinoderm groups.

12.
Sequence elements, at all levels (DNA, RNA and protein), play a central role in mediating molecular recognition and thereby molecular regulation and signaling. Studies that focus on measuring and investigating sequence-based recognition make use of statistical and computational tools, including approaches to searching sequence motifs. State-of-the-art motif searching tools are limited in their coverage and ability to address large motif spaces. We develop and present statistical and algorithmic approaches that take as input ranked lists of sequences and return significant motifs. The efficiency of our approach, based on suffix trees, allows searches over motif spaces that are not covered by existing tools. This includes searching variable gap motifs (two half sites with a flexible length gap in between) and searching long motifs over large alphabets. We used our approach to analyze several high-throughput measurement data sets and report some validation results as well as novel suggested motifs and motif refinements. We suggest a refinement of the known estrogen receptor 1 motif in humans, where we observe gaps other than three nucleotides that also serve as significant recognition sites, as well as a variable length motif related to potential tyrosine phosphorylation.

13.
In this paper, we propose an efficient, reliable shotgun sequence assembly algorithm based on a fingerprinting scheme that is robust to both noise and repetitive sequences in the data, two primary roadblocks to effective whole-genome shotgun sequencing. Our algorithm uses exact matches of short patterns randomly selected from fragment data to identify fragment overlaps, construct an overlap map, and deliver a consensus sequence. We show how statistical clues made explicit in our approach can easily be exploited to correctly assemble results even in the presence of extensive repetitive sequences. Our approach is both accurate and exceptionally fast in practice: e.g., we have correctly assembled the whole Mycoplasma genitalium genome (approximately 580 kbp) in roughly 8 min of 64-MB 200-MHz Pentium Pro CPU time from real shotgun data, where most existing algorithms can be expected to run for several hours to a day on the same data. Moreover, experiments with artificially shotgunned data prepared from real DNA sequences from a wide range of organisms (including human DNA) and containing complex repeating regions demonstrate our algorithm's robustness to input noise and the presence of repetitive sequences. For example, we have correctly assembled a 238-kbp human DNA sequence in less than 3 min of 64-MB 200-MHz Pentium Pro CPU time.
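The overlap-detection step described above (exact matches of randomly selected short patterns) can be sketched as follows. This is only an illustration of the idea, under the assumption that fragment pairs sharing sampled patterns are treated as overlap candidates; it omits the statistical handling of repeats, the overlap map, and consensus construction. Function and parameter names are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations


def candidate_overlaps(fragments, pattern_length=8, num_patterns=50, seed=0):
    """Count sampled short patterns shared (as exact matches) by fragment pairs.

    Returns {(i, j): shared_pattern_count} for fragment indices i < j.
    """
    rng = random.Random(seed)

    # Sample short patterns at random positions of random fragments.
    patterns = set()
    for _ in range(num_patterns):
        frag = rng.choice(fragments)
        if len(frag) < pattern_length:
            continue
        start = rng.randrange(len(frag) - pattern_length + 1)
        patterns.add(frag[start:start + pattern_length])

    # Fragments containing the same pattern are candidate overlaps.
    shared = defaultdict(int)
    for pattern in patterns:
        hits = [i for i, frag in enumerate(fragments) if pattern in frag]
        for a, b in combinations(hits, 2):
            shared[(a, b)] += 1
    return dict(shared)


if __name__ == "__main__":
    frags = ["ACGTACGGTCAATG", "GGTCAATGCCTTAG", "TTTTTTTTTTTTTT"]
    print(candidate_overlaps(frags, pattern_length=6, num_patterns=200))
```

The linear scan over fragments for every pattern keeps the sketch short; a practical implementation would index pattern occurrences instead, and would need extra evidence to tell true overlaps from repeats that merely share a pattern.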

14.
15.
Kim S, Wang Z, Dalkilic M. Proteins, 2007, 66(3): 671-681
The motif prediction problem is to predict short, conserved subsequences that are part of a family of sequences, and it is a very important biological problem. Gibbs is one of the first successful motif algorithms; it runs very fast compared with other algorithms, and its search behavior is based on the well-studied Gibbs random sampling. However, motif prediction is a very difficult problem and Gibbs may not predict true motifs in some cases. Thus, the authors explored the possibility of improving the prediction accuracy of Gibbs while retaining its fast runtime performance. In this paper, the authors considered Gibbs only for proteins, not for DNA binding sites. The authors have developed iGibbs, an integrated motif search framework for proteins that employs two previous techniques of their own: one for guiding motif search by clustering sequences and another by pattern refinement. These two techniques are combined into a new double clustering approach for guiding motif search. The unique feature of their framework is that users do not have to specify the number of motifs to be predicted when motifs occur in different subsets of the input sequences, since it automatically groups input sequences into clusters and predicts motifs from the clusters. Tests on the PROSITE database show that their framework improved the prediction accuracy of Gibbs significantly. Compared with more exhaustive search methods like MEME, iGibbs predicted motifs more accurately and ran one order of magnitude faster.

16.
17.
The angular and spectral reflectance of single scales of five different butterfly species was measured and related to the scale anatomy. The scales of the pierids Pieris rapae and Delias nigrina scatter white light randomly, in close agreement with Lambert’s cosine law, which can be well understood from the randomly organized beads on the scale crossribs. The reflectance of the iridescent blue scales of Morpho aega is determined by multilayer structures in the scale ridges, causing diffraction in approximately a plane. The purple scales in the dorsal wing tips of the male Colotis regina act similarly to the Morpho scale in the blue, due to multilayers in the ridges, but the scattering in the red occurs as in the Pieris scale, because the scales contain beads with pigment that does not absorb in the red wavelength range. The green–yellow scales of Urania fulgens backscatter light in a narrow spatial angle, because of a multilayer structure in the scale body.

18.
19.
Many classes of non-coding RNAs (ncRNAs; including Y RNAs, vault RNAs, RNase P RNAs, and MRP RNAs, as well as a novel class recently discovered in Dictyostelium discoideum) can be characterized by a pattern of short but well-conserved sequence elements that are separated by poorly conserved regions of sometimes highly variable lengths. Local alignment algorithms such as BLAST are therefore ill-suited for the discovery of new homologs of such ncRNAs in genomic sequences. The Fragrep tool instead implements an efficient algorithm for detecting the pattern fragments that occur in a given order. For each pattern fragment, the mismatch tolerance and bounds on the length of the intervening sequences can be specified separately. Furthermore, matches can be ranked by a statistically well-motivated scoring scheme.
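The search problem described above (ordered fragments, a per-fragment mismatch tolerance, and bounds on the intervening lengths) can be sketched with a simple recursive scan. This is a naive illustration, not Fragrep's algorithm or its input format: the tuple layout, the use of Hamming distance as the mismatch measure, and the function names are assumptions, and the scoring scheme is omitted.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_fragment_chains(sequence, fragments):
    """Find start positions of fragments occurring in the given order.

    fragments: list of (pattern, max_mismatches, min_gap, max_gap); the gap
    bounds constrain the sequence between this fragment and the previous
    one and are ignored for the first fragment.
    Returns a list of tuples, one start position per fragment.
    """
    results = []

    def extend(idx, lo_start, hi_start, starts):
        if idx == len(fragments):
            results.append(tuple(starts))
            return
        pattern, max_mismatches, _, _ = fragments[idx]
        hi_start = min(hi_start, len(sequence) - len(pattern))
        for pos in range(lo_start, hi_start + 1):
            window = sequence[pos:pos + len(pattern)]
            if hamming(window, pattern) > max_mismatches:
                continue
            end = pos + len(pattern)
            if idx + 1 < len(fragments):
                # The next fragment must start within its allowed gap range.
                min_gap, max_gap = fragments[idx + 1][2:]
                extend(idx + 1, end + min_gap, end + max_gap, starts + [pos])
            else:
                extend(idx + 1, end, end, starts + [pos])

    extend(0, 0, len(sequence), [])
    return results


if __name__ == "__main__":
    seq = "TTGACCAAAAATGGCTT"
    frags = [("GACC", 0, 0, 0), ("TGGA", 1, 3, 6)]
    print(find_fragment_chains(seq, frags))
```

In the usage example the second fragment is allowed one mismatch and must begin 3 to 6 positions after the first fragment ends, which is the kind of per-fragment setting the abstract describes.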

20.
The idea that populations are spatially structured has become a very powerful concept in ecology, raising interest in many research areas. However, despite dispersal being a core component of the concept, it typically does not consider the movement behavior underlying any dispersal. Using individual-based simulations in continuous space, we explored the emergence of a spatially structured population in landscapes with spatially heterogeneous resource distribution and with organisms following simple area-concentrated search (ACS); individuals do not, however, perceive or respond to any habitat attributes per se but only to their foraging success. We investigated the effects of different resource clustering patterns in landscapes (single large cluster vs. many small clusters) and different resource densities on the spatial structure of populations and on the movement of individuals between resource clusters. We found that foraging success increased with increasing resource density and decreasing number of resource clusters. In a wide parameter space, the system exhibited attributes of a spatially structured population, with individuals concentrated in areas of high resource density, searching within areas of resources, and “dispersing” in a straight line between resource patches. “Emigration” was more likely from patches that were small or of low quality (low resource density), but we observed an interaction effect between these two parameters. With the ACS implemented, individuals tended to move deeper into a resource cluster in scenarios with moderate resource density than in scenarios with high resource density. “Looping” from patches was more likely if patches were large and of high quality. Our simulations demonstrate that spatial structure in populations may emerge if critical resources are heterogeneously distributed and if individuals follow simple movement rules (such as ACS). Neither the perception of habitat nor an explicit decision to emigrate from a patch on the part of acting individuals is necessary for the emergence of such spatial structure.
