20 similar documents found; search time: 15 ms
1.
Mining frequent stem patterns from unaligned RNA sequences (total citations: 1; self-citations: 0; citations by others: 1)
MOTIVATION: In detection of non-coding RNAs, it is often necessary to identify the secondary structure motifs from a set of putative RNA sequences. Most of the existing algorithms aim to provide the best motif or a few good motifs, but biologists often need to inspect all the possible motifs thoroughly. RESULTS: Our method RNAmine employs a graph theoretic representation of RNA sequences and detects all the possible motifs exhaustively using a graph mining algorithm. The motif detection problem boils down to finding frequently appearing patterns in a set of directed and labeled graphs. In the tasks of common secondary structure prediction and local motif detection from long sequences, our method compared favorably, in both accuracy and efficiency, with state-of-the-art methods such as CMFinder. AVAILABILITY: The software is available upon request.
2.
Recognition of characteristic patterns in sets of functionally equivalent DNA sequences (total citations: 2; self-citations: 0; citations by others: 2)
An algorithm has been developed for the identification of unknown patterns which are distinctive for a set of short DNA sequences believed to be functionally equivalent. A pattern is defined as being a string, containing fully or partially specified nucleotides at each position of the string. The advantage of this vague definition of the pattern is that it imposes minimum constraints on the characterization of patterns. A new feature of the approach developed here is that it allows a fair simultaneous testing of patterns of all degrees of degeneracy. This analysis is based on an evaluation of inhomogeneity in the empirical occurrence distribution of any such pattern within a set of sequences. The use of the nonparametric kernel density estimation of Parzen allows one to assess small disturbances among the sequence alignments. The method also makes it possible to identify sequence subsets with different characteristic patterns. This algorithm was implemented in the analysis of patterns characteristic of sets of promoters, terminators and splice junction sequences. The results are compared with those obtained by other methods.
Received on November 17, 1986; accepted on June 15, 1987
3.
A new method to predict the consensus secondary structure of a set of unaligned RNA sequences (total citations: 3; self-citations: 0; citations by others: 3)
MOTIVATION: To predict the consensus secondary structure, possibly including pseudoknots, of a set of unaligned RNA sequences. RESULTS: We have designed a method based on a new representation of any RNA secondary structure as a set of structural relationships between the helices of the structure. We refer to this representation as a structural pattern. In a first step, we use thermodynamic parameters to select, for each sequence, the best secondary structures according to energy minimization and we represent each of them using its corresponding structural pattern. In a second step, we search for the repeated structural patterns, i.e. the largest structural patterns that occur in at least one sequence, i.e. included in at least one of the structural patterns associated with each sequence. Thanks to an efficient encoding of structural patterns, this search comes down to identifying the largest repeated word suffixes in a dictionary. In a third step, we compute the plausibility of each repeated structural pattern by checking if it occurs more frequently in the studied sequences than in random RNA sequences. We then suppose that the consensus secondary structure corresponds to the repeated structural pattern that displays the highest plausibility. We present several experiments concerning tRNA, fragments of 16S rRNA and 10Sa RNA (including pseudoknots); in each of them, we found the putative consensus secondary structure.
4.
Identification of functional elements in unaligned nucleic acid sequences by a novel tuple search algorithm (total citations: 7; self-citations: 1; citations by others: 7)
Wolfertstetter Franz; Frech Kornelie; Herrmann Gunter; Werner Thomas 《Bioinformatics (Oxford, England)》1996,12(1):71-80
We present an algorithm to identify potential functional elements like protein binding sites in DNA sequences, solely from nucleotide sequence data. Prerequisites are a set of at least seven not closely related sequences with a common biological function which is correlated to one or more unknown sequence elements present in most but not necessarily all of the sequences. The algorithm is based on a search for n-tuples which occur at least in a minimum percentage of the sequences with no or one mismatch, which may be at any position of the tuple. In contrast to functional tuples, random tuples show no preferred pattern of mismatch locations within the tuple nor is the conservation extended beyond the tuple. Both features of functional tuples are used to eliminate random tuples. Selection is carried out by maximization of the information content first for the n-tuple, then for a region containing the tuple and finally for the complete binding site. Further matches are found in an additional selection step, using the ConsInd method previously described. The algorithm is capable of identifying and delimiting elements (e.g. protein binding sites) represented by single short cores (e.g. TATA box) in sets of unaligned sequences of about 500 nucleotides using no information other than the nucleotide sequences. Furthermore, we show its ability to identify multiple elements in a set of complete LTR sequences (more than 600 nucleotides per sequence).
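The first stage described in this abstract, finding n-tuples that occur with at most one mismatch in a minimum percentage of the sequences, can be illustrated with a brute-force sketch. This is not the authors' implementation: the function names, the choice of candidates (every n-mer that literally occurs in the input), and the exhaustive scan are assumptions made for illustration, and the later selection steps (mismatch-position patterns, information content, ConsInd) are not modelled.

```python
def hamming_le_one(a: str, b: str) -> bool:
    """True if two equal-length strings differ at no more than one position."""
    return sum(x != y for x, y in zip(a, b)) <= 1


def frequent_tuples(seqs, n, min_fraction):
    """Return {tuple: number of sequences containing it with <= 1 mismatch}.

    Hypothetical helper: only tuples reaching min_fraction of the
    sequences are kept. Candidates are the n-mers present in the input.
    """
    candidates = {s[i:i + n] for s in seqs for i in range(len(s) - n + 1)}
    hits = {}
    for cand in candidates:
        # Count sequences (not occurrences) that contain a near-match.
        count = sum(
            any(hamming_le_one(cand, s[i:i + n]) for i in range(len(s) - n + 1))
            for s in seqs
        )
        if count / len(seqs) >= min_fraction:
            hits[cand] = count
    return hits


if __name__ == "__main__":
    # Toy input, invented for this example.
    toy = ["GGTATAAAGG", "CCTATATACC", "AATATAAATT", "GGGGGGGGGG"]
    print(frequent_tuples(toy, n=6, min_fraction=0.75))
```

The scan is quadratic in the total sequence length per candidate, which is acceptable for a toy set but would need an index for inputs of the size mentioned in the abstract.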
5.
We consider the problem of comparing several nucleic acid sequences to identify words occurring imperfectly (patterns with no gap) with unusual frequency. Methods for computing, representing, and inspecting interactively the structure of such repeating motifs in nucleic acids and more generally any text are described. Multiple sequences are treated as one large concatenate. In a preprocessing step, a lexical index is created to provide rapid string matching for the enumeration of the words matching a pattern. For given word features (word length, minimal frequency), a sequence profile is displayed. The profile can be inspected interactively with on-line algorithms. Applications to the identification of regulatory elements in DNA regions involved in the control of gene expression are presented. Our program (DNA-Lexemics) runs on the Macintosh.
6.
7.
As one of the earliest problems in computational biology, the RNA secondary structure prediction problem (sometimes referred to as "RNA folding") has attracted attention again, thanks to the recent discoveries of many novel non-coding RNA molecules. The two common approaches to this problem are de novo prediction of RNA secondary structure based on energy minimization and the consensus folding approach (computing the common secondary structure for a set of unaligned RNA sequences). Consensus folding algorithms work well when the correct seed alignment is part of the input to the problem. However, seed alignment itself is a challenging problem for diverged RNA families. In this paper, we propose a novel framework to predict the common secondary structure for unaligned RNA sequences. By matching putative stacks in RNA sequences, we make use of both primary sequence information and thermodynamic stability for prediction at the same time. We show that our method can predict the correct common RNA secondary structures even when we are given only a limited number of unaligned RNA sequences, and it outperforms current algorithms in sensitivity and accuracy.
8.
9.
Closely related DNA sequences specify distinct patterns of developmental expression in Drosophila melanogaster.
Three short synthetic DNA sequences, which are closely related to one another, confer three distinct patterns of developmental expression on the heat shock hsp70 gene in transgenic Drosophila melanogaster lines. These results show that small variations or even single base pair changes in a repeated element of a regulatory sequence can create promoters that display new specificities of tissue and developmental regulation. Interestingly, the three patterns of developmental expression conferred by the synthetic DNAs resemble in part those of the known developmental genes: glucose dehydrogenase (Gld), Dopa decarboxylase (Ddc), and salivary gland secretory proteins (Sgs), respectively. In each case, the defined regulatory region of the known developmental gene contains multiple sequences that are similar or identical to the synthetic sequence that confers a similar pattern of developmental expression on the hsp70 gene. Thus, these results are congruent with the view that short sequence elements in multiple copies can confer either simple or relatively complex patterns of developmental expression on a receptive promoter like that of hsp70. Furthermore, the fact that the three variants tested produced three distinct patterns of expression in transgenic animals suggests that the number of different elements is large.
10.
L. S. Guzevatykh 《Russian Journal of Bioorganic Chemistry》2008,34(5):526-543
The occurrence of individual amino acids and dipeptide fragments in the sequences of 60 known atypical opioid peptides was analyzed. A pronounced predominance of the Tyr-Pro fragment suggested a high probability of analgesic activity for this dipeptide, and it was studied experimentally. Models of somatic and visceral pain sensitivity showed that Tyr-Pro, administered i.p. in doses of 1.0–10 mg/kg of body mass, exhibits an analgesic activity that is eliminated by naloxone and naloxone methiodide. However, in in vitro tests on guinea pig ileum and mouse vas deferens preparations, Tyr-Pro was devoid of opioid activity, which proved its indirect influence on opioid receptors.
11.
We describe an algorithm (IRSA) for identification of common regulatory signals in samples of unaligned DNA sequences. The algorithm was tested on randomly generated sequences of fixed length with an implanted signal of length 15 with 4 mutations, and on natural upstream regions of bacterial genes regulated by PurR, ArgR and CRP. Then it was applied to upstream regions of orthologous genes from Escherichia coli and related genomes. Some new palindromic binding signals and direct-repeat signals were identified. Finally we present a parallel version suitable for computers supporting the MPI protocol. This implementation is not strictly bounded by the number of available processors. The computation speed depends linearly on the number of processors.
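The synthetic benchmark mentioned in this abstract (random fixed-length sequences, each carrying an implanted signal of length 15 with 4 mutations) can be sketched as follows. The abstract does not say how the authors generated their data, so the uniform base composition, the random insertion point, the requirement that each mutation changes the base, and all names below are assumptions for illustration only.

```python
import random

ALPHABET = "ACGT"


def mutate(signal, n_mut, rng):
    """Return a copy of signal with exactly n_mut positions changed."""
    chars = list(signal)
    for pos in rng.sample(range(len(chars)), n_mut):
        # Pick a base different from the current one so the site really changes.
        chars[pos] = rng.choice([b for b in ALPHABET if b != chars[pos]])
    return "".join(chars)


def implant(signal, n_seqs, seq_len, n_mut, seed=0):
    """Build (sequence, start) pairs: random background plus a mutated signal."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_seqs):
        background = [rng.choice(ALPHABET) for _ in range(seq_len)]
        start = rng.randrange(seq_len - len(signal) + 1)
        background[start:start + len(signal)] = mutate(signal, n_mut, rng)
        samples.append(("".join(background), start))
    return samples


if __name__ == "__main__":
    # Hypothetical 15-base signal; sequence count and length are also invented.
    for seq, start in implant("ACGTACGTACGTACG", n_seqs=3, seq_len=60, n_mut=4):
        print(start, seq)
```

Returning the insertion position alongside each sequence makes it possible to score a motif finder against the known answer.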
12.
13.
Finding composite regulatory patterns in DNA sequences (total citations: 1; self-citations: 0; citations by others: 1)
Pattern discovery in unaligned DNA sequences is a fundamental problem in computational biology with important applications in finding regulatory signals. Current approaches to pattern discovery focus on monad patterns that correspond to relatively short contiguous strings. However, many of the actual regulatory signals are composite patterns that are groups of monad patterns that occur near each other. A difficulty in discovering composite patterns is that one or both of the component monad patterns in the group may be 'too weak'. Since the traditional monad-based motif finding algorithms usually output one (or a few) high scoring patterns, they often fail to find composite regulatory signals consisting of weak monad parts. In this paper, we present a MITRA (MIsmatch TRee Algorithm) approach for discovering composite signals. We demonstrate that MITRA performs well for both monad and composite patterns by presenting experiments over biological and synthetic data.
14.
G.A. Parker 《Journal of theoretical biology》1982,96(2):281-294
It is suggested that sperm competition (competition between the sperm from two or more males over the fertilization of ova) may account for the fact that sperm are so small and so numerous. In the entire absence of sperm competition, selection may favour an increase in sperm size so that the sperm contributes nutriment to the subsequent viability and success of the zygote. However, an extremely low incidence of sperm competition is adequate to prevent sperm size increasing. Vertebrate sperm should remain at minimal size provided that double matings (one female mated by two males) occur more often than about 4 times the ratio of sperm size: ovum size. The classical theory that sperm are small simply because of the difficulties of ensuring that ova do get fertilized may also explain sperm size, and both effects (sperm competition and ensuring fertilization) are likely to contribute to the stability of anisogamy. Large numbers of sperm can be produced because sperm are tiny and the optimal allocation of reproductive reserves to ejaculates is not trivially small even when double matings are rather rare. It is suggested that of its total mating effort, a male vertebrate should spend a fraction on sperm that is roughly equivalent to a quarter of the probability of double mating.
15.
A computer search of the pBR322 DNA sequence identified five sites matching reported glucocorticoid regulatory element (GRE) DNA consensus sequences and three related sites. A pBR322 DNA fragment containing one GRE site was shown to bind immobilized HeLa S3 cell glucocorticoid receptor and to compete for receptor binding in a competitive binding assay. Conversely, a pBR322 DNA fragment devoid of GRE sites showed barely detectable interaction with glucocorticoid receptor in either of these assays. These results demonstrate the importance of GRE consensus sequences in glucocorticoid receptor interactions with DNA, and further identify a cause for high background binding observed when pBR322 DNA is used as a negative control in studies of glucocorticoid receptor-DNA interactions.
16.
Taylor Philip; Rosenberg Paul; Samsonova Mary.G. 《Bioinformatics (Oxford, England)》1991,7(4):495-500
We describe a fast computer algorithm for identifying consensus patterns in DNA sequences. The method requires no prior assumptions about the consensus pattern other than its length. In particular no previous knowledge of the frequency or spacing of consensus patterns is required. However, a priori information about the shape of the consensus pattern, or invariability of individual positions, or the overall conservation level, can be utilized to enhance the selectivity and sensitivity of search. As the number of all possible consensus words increases very rapidly with length, comprehensive searches have usually been restricted to a maximum of 10-12 nucleotides, even when large mainframes are used. Our algorithm enables searching for consensus patterns of this order on current mid-range and powerful microcomputers. Searches may be conducted on single, long sequences or a set of possibly aligned shorter sequences. We give examples of identified consensus patterns in both prokaryotic and eukaryotic DNA sequences, along with some typical program timings.
Received on January 14, 1991; accepted on March 5, 1991
17.
18.
George N. Rudenko Caius M. T. Rommens H. John J. Nijkamp Jacques Hille 《Plant molecular biology》1993,21(4):723-728
We describe a novel modification of the polymerase chain reaction for efficient in vitro amplification of genomic DNA sequences flanking short stretches of known sequence. The technique utilizes a target enrichment step, based on the selective isolation of biotinylated fragments from the bulk of genomic DNA on a streptavidin-containing support. Subsequently, following ligation with a second universal linker primer, the selected fragments can be amplified to amounts suitable for further molecular studies. The procedure has been applied to recover T-DNA flanking sequences in transgenic tomato plants, which could subsequently be used to assign the positions of T-DNA to the molecular map of tomato. The method, called supported PCR (sPCR), is a simple and efficient alternative to techniques used in the isolation of specific sequences flanking a known DNA segment.
19.