
1.

Algorithms for extracting structured motifs using a suffix tree with an application to promoter and regulatory site consensus identification. Total citations: 2; self-citations: 0; citations by others: 2

L Marsan M F Sagot 《Journal of computational biology》2000,7(3-4):345-362

This paper introduces two exact algorithms for extracting conserved structured motifs from a set of DNA sequences. Structured motifs may be described as an ordered collection of p ≥ 1 "boxes" (each box corresponding to one part of the structured motif), p substitution rates (one for each box), and p − 1 intervals of distance (one for each pair of successive boxes in the collection). The contents of the boxes (that is, the motifs themselves) are unknown at the start of the algorithm; finding them is precisely what the algorithms are meant to do. A suffix tree is used for finding such motifs. The algorithms are efficient enough to infer site consensi, such as promoter sequences or regulatory sites, from a set of unaligned sequences corresponding to the noncoding regions upstream from all genes of a genome. In particular, both algorithms' time complexity scales linearly with N²n, where n is the average length of the sequences and N their number. An application to the identification of promoter and regulatory consensus sequences in bacterial genomes is shown.
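The structured-motif model in this abstract (boxes, per-box substitution limits, and distance intervals between successive boxes) can be illustrated with a small brute-force matcher. This is only a sketch of the model, not the paper's suffix-tree algorithm: it checks whether known boxes occur in a sequence, whereas the paper's algorithms infer unknown boxes. The function names, the use of absolute substitution counts instead of rates, and the inclusive `(min_gap, max_gap)` convention are assumptions made for the example.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_structured_motif(seq, boxes, max_subs, gaps):
    """Return start positions in seq where the structured motif occurs.

    boxes    : list of p box strings (assumed known here, unlike in the paper)
    max_subs : list of p maximum substitution counts, one per box
    gaps     : list of p - 1 inclusive (min_gap, max_gap) tuples, one per
               pair of successive boxes
    """
    def extend(pos, i):
        # Try to match box i at position pos, then recurse on later boxes.
        box = boxes[i]
        window = seq[pos:pos + len(box)]
        if len(window) < len(box) or hamming(window, box) > max_subs[i]:
            return False
        if i == len(boxes) - 1:
            return True
        end = pos + len(box)
        lo, hi = gaps[i]
        return any(extend(end + g, i + 1) for g in range(lo, hi + 1))

    return [start for start in range(len(seq)) if extend(start, 0)]


# Toy promoter-like sequence: a -35-like box, a 17-base spacer, a -10-like box.
toy = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "CC"
hits = find_structured_motif(toy, ["TTGACA", "TATAAT"], [1, 1], [(16, 18)])
```

The brute-force scan costs time proportional to the sequence length times the total width of the allowed gap ranges for each sequence, which is workable for toy inputs only.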

2.

Mutational analysis of the ompA promoter from Flavobacterium johnsoniae

Chen S Bagdasarian M Kaufman MG Bates AK Walker ED 《Journal of bacteriology》2007,189(14):5108-5118


3.

Regulatory motif discovery using a population clustering evolutionary algorithm. Total citations: 2; self-citations: 0; citations by others: 2

Lones MA Tyrrell AM 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2007,4(3):403-414

This paper describes a novel evolutionary algorithm for regulatory motif discovery in DNA promoter sequences. The algorithm uses data clustering to logically distribute the evolving population across the search space. Mating then takes place within local regions of the population, promoting overall solution diversity and encouraging discovery of multiple solutions. Experiments using synthetic data sets have demonstrated the algorithm's capacity to find position frequency matrix models of known regulatory motifs in relatively long promoter sequences. These experiments have also shown the algorithm's ability to maintain diversity during search and discover multiple motifs within a single population. The utility of the algorithm for discovering motifs in real biological data is demonstrated by its ability to find meaningful motifs within muscle-specific regulatory sequences.
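For readers unfamiliar with the position frequency matrix models mentioned in this abstract, the following minimal sketch shows how such a matrix can be built from a set of already aligned binding sites. It does not reproduce the evolutionary algorithm itself; the function names and the toy sites are hypothetical.

```python
from collections import Counter


def position_frequency_matrix(sites, alphabet="ACGT"):
    """Relative base frequencies per position for equal-length aligned sites."""
    if not sites or len({len(s) for s in sites}) != 1:
        raise ValueError("sites must be non-empty and of equal length")
    matrix = []
    for column in zip(*sites):
        counts = Counter(column)
        # Counter returns 0 for bases that never occur in this column.
        matrix.append({base: counts[base] / len(sites) for base in alphabet})
    return matrix


def consensus(matrix):
    """Most frequent base at each position of the matrix."""
    return "".join(max(col, key=col.get) for col in matrix)


# Hypothetical aligned sites, used only to exercise the functions.
toy_sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
pfm = position_frequency_matrix(toy_sites)
```

A motif finder of the kind described would search for the site positions that make such a matrix as informative as possible; here the positions are simply given.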

4.

MUSA: a parameter free algorithm for the identification of biologically significant motifs

Mendes ND Casimiro AC Santos PM Sá-Correia I Oliveira AL Freitas AT 《Bioinformatics (Oxford, England)》2006,22(24):2996-3002

MOTIVATION: The ability to identify complex motifs, i.e. non-contiguous nucleotide sequences, is a key feature of modern motif finders. Addressing this problem is extremely important, not only because these motifs can accurately model biological phenomena but because their extraction is highly dependent upon the appropriate selection of numerous search parameters. Currently available combinatorial algorithms have proved to be highly efficient in exhaustively enumerating motifs (including complex motifs) that fulfill certain extraction criteria. However, one major problem with these methods is the large number of parameters that need to be specified. RESULTS: We propose a new algorithm, MUSA (Motif finding using an UnSupervised Approach), that can be used either to autonomously find over-represented complex motifs or to estimate search parameters for modern motif finders. This method relies on a biclustering algorithm that operates on a matrix of co-occurrences of small motifs. The performance of this method is independent of the composite structure of the motifs being sought, making few assumptions about their characteristics. The MUSA algorithm was applied to two datasets involving the bacterium Pseudomonas putida KT2440. The first was composed of 70 sigma(54)-dependent promoter sequences and the second included 54 promoter sequences of genes up-regulated in response to phenol, as suggested by quantitative proteomics. The results obtained indicate that this approach is very effective at identifying complex motifs of biological significance. AVAILABILITY: The MUSA algorithm is available upon request from the authors, and will be made available via a Web-based interface.

5.

Tsukuba BB: a branch and bound algorithm for local multiple alignment of DNA and protein sequences.

P Horton 《Journal of computational biology》2001,8(3):283-303

In this paper we present a branch and bound algorithm for local gapless multiple sequence alignment (motif alignment) and its implementation. The algorithm uses both score-based bounding and a novel bounding technique based on the "consistency" of the alignment. A sequence order independent search tree is used in conjunction with a technique for avoiding redundant calculations inherent in the structure of the tree. This is the first program to exploit the fact that the motif alignment problem is easier for short motifs. Indeed, for a short fixed motif width, the running time of the algorithm is asymptotically linear in the size of the input. We tested the performance of the program on a dataset of 300 E. coli promoter sequences and a dataset of 85 lipocalin protein sequences. For a motif width of 4, the optimal alignment of the entire set of sequences can be found. For the more natural motif width of 6, the program can align 21 sequences of length 100, more than twice the number of sequences which can be aligned by the best previous exact algorithm. The algorithm can relax the constraint of requiring each sequence to be aligned, and align 105 of the 300 promoter sequences with a motif width of 6. For the lipocalin dataset, we introduce a technique for reducing the effective alphabet size with a minimal loss of useful information. With this technique, we show that the program can find meaningful motifs in a reasonable amount of time by optimizing the score over three motif positions.

6.

E. coli promoter spacer regions contain nonrandom sequences which correlate to spacer length. Total citations: 4; self-citations: 1; citations by others: 3


B A Beutel M T Record Jr 《Nucleic acids research》1990,18(12):3597-3603

The -10 and -35 regions of E. coli promoter sequences are separated by a spacer region which has a consensus length of 17 base-pairs. This region is thought to contribute to promoter function by correctly positioning the two conserved regions. We have performed a statistical evaluation of 224 spacer sequences and found that spacers which deviate from the 17 base-pair consensus length have nonrandom sequences in their upstream ends. Spacer regions which are shorter than 17 base-pairs in length have a significantly higher than expected frequency of purine-purine and pyrimidine-pyrimidine homo-dinucleotides at the six upstream positions. Spacer regions which are longer than 17 base-pairs in length have a significantly higher than expected frequency of purine-pyrimidine and pyrimidine-purine hetero-dinucleotides at these positions. This suggests that the nature of the purine-pyrimidine sequence at the upstream end of spacer regions affects promoter function in a manner which is related to the spacer length. We examine the spacer sequences as a function of spacer length and discuss some possible explanations for the observed relationship between sequence and length.
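The homo- versus hetero-dinucleotide distinction used in this abstract can be made concrete with a short sketch. It classifies each dinucleotide step as purine-purine or pyrimidine-pyrimidine ("homo") or as mixed ("hetero") and reports the homo fraction over the upstream end of a spacer. How the study counted dinucleotides at the six upstream positions is not stated here, so the overlapping-step convention, the function names, and the toy spacer are assumptions.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def dinucleotide_class(pair):
    """Return 'homo' for R-R or Y-Y steps and 'hetero' for R-Y or Y-R steps."""
    a, b = pair.upper()
    for base in (a, b):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"unexpected base: {base!r}")
    return "homo" if (a in PURINES) == (b in PURINES) else "hetero"


def homo_fraction(spacer, n_positions=6):
    """Fraction of homo steps among overlapping dinucleotides in the
    first n_positions bases of the spacer (assumed counting convention)."""
    region = spacer[:n_positions]
    steps = [region[i:i + 2] for i in range(len(region) - 1)]
    if not steps:
        raise ValueError("need at least two bases")
    return sum(dinucleotide_class(s) == "homo" for s in steps) / len(steps)


# Hypothetical spacer; only its first six bases (AGGATC) are examined.
toy_fraction = homo_fraction("AGGATCTTTTGCATGCA")
```

Comparing such fractions between spacers shorter and longer than 17 base-pairs, against the value expected from base composition, is the kind of test the abstract describes; the statistical test itself is not sketched here.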

7.

Spacer promoters are essential for efficient enhancement of X. laevis ribosomal transcription. Total citations: 27; self-citations: 0; citations by others: 27

R F De Winter T Moss 《Cell》1986,44(2):313-318


8.

PO149, a new member of pollen pectate lyase-like gene family from alfalfa. Total citations: 5; self-citations: 0; citations by others: 5

Yongzhong Wu Xiao Qiu Sarah Du Larry Erickson 《Plant molecular biology》1996,32(6):1037-1042

PO149 is a low-copy-number gene expressed in the late stages of pollen development. The promoter region contains no similarities in DNA sequence to those of other pollen-specific genes, except for a tobacco sequence (AAATGA), which occurs four times in this alfalfa gene and much further upstream than in tobacco. Four distinct TATA boxes were detected in the promoter, with the distal and proximal TATA boxes being separated by a spacer of 269 nucleotides. Hairpin loop structures were found in the 5′- and 3′-untranslated regions of PO149 mRNA. The coding region of PO149 is interrupted by two introns and encodes a putative prepeptide of 450 amino acids with homology to pollen pectate lyase-like proteins and pollen allergens. The coding region also contains sequences characteristic of both a signal peptide and a nuclear localization signal.

9.

Molecular Evolution and Phylogenetic Utility of the Internal Transcribed Spacer 2 (ITS2) in Calyptratae (Diptera: Brachycera)

Song ZK Wang XZ Liang GQ 《Journal of molecular evolution》2008,67(5):448-464

The resolution potential of internal transcribed spacer 2 (ITS2) at deeper levels remains controversial. In this study, 105 ITS2 sequences of 55 species in Calyptratae were analyzed to examine the phylogenetic utility of the spacer above the subfamily level and to further understand its evolutionary characteristics. We predicted the secondary structure of each sequence using the minimum-energy algorithm and constructed two data matrices for phylogenetic analysis. The ITS2 regions of Calyptratae display strong A-T bias and slight variation in length. The tandem and dispersed repeats embedded in the spacers possibly resulted from replication slippage or transposition. Most foldings conformed to the four-domain model. Sequence comparison in combination with the secondary structures revealed six conserved motifs. Covariation analysis from the conserved motifs indicated that the secondary structure restrains the sequence evolution of the spacer. The deep-level phylogeny derived from the ITS2 data largely agreed with the phylogenetic hypotheses from morphologic and other molecular evidence. Our analyses suggest that the accordant resolutions generated from different analyses can be used to infer deep-level phylogenetic relations.

10.

SIGffRid: A tool to search for sigma factor binding sites in bacterial genomes using comparative approach and biologically driven statistics

Fabrice Touzain Sophie Schbath Isabelle Debled-Rennesson Bertrand Aigle Gregory Kucherov Pierre Leblond 《BMC bioinformatics》2008,9(1):73


11.

Genomic identification of microRNA promoters and their cis-acting elements in Populus

Min Chen Ming Wei Zhanghui Dong Hai Bao Yanwei Wang 《Genes & genomics》2016,38(4):377-387


12.

Deletion analysis of a tobacco pollen-specific polygalacturonase promoter

S. J. Tebbutt D. M. Lonsdale 《Sexual plant reproduction》1995,8(4):242-246


13.

Artificial ants deposit pheromone to search for regulatory DNA elements

Yunlong Liu Hiroki Yokota 《BMC genomics》2006,7(1):221


14.

Mutational Analysis of the cbb Operon (CO2 Assimilation) Promoter of Ralstonia eutropha. Total citations: 1; self-citations: 0; citations by others: 1

Thomas Jeffke Niels-Holger Gropp Claudia Kaiser Claudia Grzeszik Bernhard Kusian Botho Bowien 《Journal of bacteriology》1999,181(14):4374-4380


15.

Mining protein sequences for motifs.

Giri Narasimhan Changsong Bu Yuan Gao Xuning Wang Ning Xu Kalai Mathee 《Journal of computational biology》2002,9(5):707-720

We use methods from Data Mining and Knowledge Discovery to design an algorithm for detecting motifs in protein sequences. The algorithm assumes that a motif is constituted by the presence of a "good" combination of residues in appropriate locations of the motif. The algorithm attempts to compile such good combinations into a "pattern dictionary" by processing an aligned training set of protein sequences. The dictionary is subsequently used to detect motifs in new protein sequences. Statistical significance of the detection results is ensured by statistically determining the various parameters of the algorithm. Based on this approach, we have implemented a program called GYM. The Helix-Turn-Helix motif was used as a model system on which to test our program. The program was also extended to detect Homeodomain motifs. The detection results for the two motifs compare favorably with existing programs. In addition, the GYM program provides a lot of useful information about a given protein sequence.

16.

Expectation maximization algorithm for identifying protein-binding sites with variable lengths from unaligned DNA fragments.

L R Cardon G D Stormo 《Journal of molecular biology》1992,223(1):159-170

An Expectation Maximization algorithm for identification of DNA binding sites is presented. The approach predicts the location of binding regions while allowing variable length spacers within the sites. In addition to predicting the most likely spacer length for a set of DNA fragments, the method identifies individual sites that differ in spacer size. No alignment of DNA sequences is necessary. The method is illustrated by application to 231 Escherichia coli DNA fragments known to contain promoters with variable spacings between their consensus regions. Maximum-likelihood tests of the differences between the spacing classes indicate that the consensus regions of the spacing classes are not distinct. Further tests suggest that several positions within the spacing region may contribute to promoter specificity.

17.

Alignment of U3 region sequences of mammalian type C viruses: identification of highly conserved motifs and implications for enhancer design. Total citations: 29; self-citations: 21; citations by others: 8


E A Golemis N A Speck N Hopkins 《Journal of virology》1990,64(2):534-542

We aligned published sequences for the U3 region of 35 type C mammalian retroviruses. The alignment reveals that certain sequence motifs within the U3 region are strikingly conserved. A number of these motifs correspond to previously identified sites. In particular, we found that the enhancer region of most of the viruses examined contains a binding site for leukemia virus factor b, a viral corelike element, the consensus motif for nuclear factor 1, and the glucocorticoid response element. Most viruses containing more than one copy of enhancer sequences include these binding sites in both copies of the repeat. We consider this set of binding sites to constitute a framework for the enhancers of this set of viruses. Other highly conserved motifs in the U3 region include the retrovirus inverted repeat sequence, a negative regulatory element, and the CCAAT and TATA boxes. In addition, we identified two novel motifs in the promoter region that were exceptionally highly conserved but have not been previously described.

18.

Modeling of DNA local parameters predicts encrypted architectural motifs in Xenopus laevis ribosomal gene promoter. Total citations: 1; self-citations: 1; citations by others: 0

Roux-Rouquie M Marilley M 《Nucleic acids research》2000,28(18):3433-3441


19.

Structure and function of the promoter of the carrot V-type H(+)-ATPase catalytic subunit gene. Total citations: 4; self-citations: 0; citations by others: 4

I Struve T Rausch P Bernasconi L Taiz 《The Journal of biological chemistry》1990,265(14):7927-7932


20.

PASSIM – an open source software system for managing information in biomedical studies

Juris Viksna Edgars Celms Martins Opmanis Karlis Podnieks Peteris Rucevskis Andris Zarins Amy Barrett Sudeshna Guha Neogi Maria Krestyaninova Mark I McCarthy Alvis Brazma Ugis Sarkans 《BMC bioinformatics》2007,8(1):1-7
