20 similar articles found (search time: 0 ms)
1.
Jonathan M Carlson, Arijit Chakravarty, Robert H Gross. Journal of Computational Biology, 2006, 13(3):686-701
The identification of potential protein binding sites (cis-regulatory elements) in the upstream regions of genes is key to understanding the mechanisms that regulate gene expression. To this end, we present a simple, efficient algorithm, BEAM (beam-search enumerative algorithm for motif finding), aimed at the discovery of cis-regulatory elements in the DNA sequences upstream of a related group of genes. This algorithm dramatically limits the search space of expanded sequences, converting the problem from one that is exponential in the length of motifs sought to one that is linear. Unlike sampling algorithms, our algorithm converges and is capable of finding statistically overrepresented motifs with a low failure rate. Further, our algorithm is not dependent on the objective function or the organism used. Limiting the space of candidate motifs enables the algorithm to focus only on those motifs that are most likely to be biologically relevant and enables the algorithm to use direct evaluations of background frequencies instead of resorting to probabilistic estimates. In addition, limiting the space of candidate motifs makes it possible to use computationally expensive objective functions that are able to correctly identify biologically relevant motifs.
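The beam-search idea in this abstract can be illustrated with a minimal sketch. It is not the published BEAM implementation: the objective function `support` (number of sequences containing the motif exactly), the prefix-extension strategy, and the name `beam_search_motifs` are all assumptions chosen for brevity. The point it shows is that each extension step scores only `beam_width * 4` candidates, so the work grows linearly with motif length instead of exponentially.

```python
from typing import List, Tuple

ALPHABET = "ACGT"


def support(motif: str, sequences: List[str]) -> int:
    """Hypothetical objective: how many sequences contain the motif exactly."""
    return sum(motif in seq for seq in sequences)


def beam_search_motifs(
    sequences: List[str], motif_len: int, beam_width: int = 10
) -> List[Tuple[str, int]]:
    """Grow candidate motifs one base at a time, keeping only the best few.

    Each round extends every motif in the beam by one base, scores the
    extensions, and keeps the top `beam_width`. This is a heuristic: a motif
    whose prefixes score poorly can be pruned before it is ever completed.
    """
    def rank(candidates: List[str]) -> List[str]:
        # Highest support first; ties broken alphabetically for determinism.
        return sorted(candidates, key=lambda m: (-support(m, sequences), m))

    beam = rank(list(ALPHABET))[:beam_width]
    for _ in range(motif_len - 1):
        extended = [motif + base for motif in beam for base in ALPHABET]
        beam = rank(extended)[:beam_width]
    return [(motif, support(motif, sequences)) for motif in beam]


if __name__ == "__main__":
    # Toy upstream regions with the planted motif GACGT (made-up data).
    upstream = ["TTGACGTAA", "CCGACGTCC", "AAGACGTGG", "GGGACGTTT"]
    for motif, score in beam_search_motifs(upstream, motif_len=5, beam_width=4):
        print(motif, score)
```

A more realistic objective would compare motif frequency against a background set, which the abstract notes becomes affordable once the candidate space is limited.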
2.
Although cis-regulatory binding sites (CRBSs) are at least as important as the coding sequences in a genome, our general understanding of them in most sequenced genomes is very limited due to the lack of efficient and accurate experimental and computational methods for their characterization, which has largely hindered our understanding of many important biological processes. In this article, we describe a novel algorithm for genome-wide de novo prediction of CRBSs with high accuracy. We designed our algorithm to circumvent three identified difficulties for CRBS prediction using comparative genomics principles based on a new method for the selection of reference genomes, a new metric for measuring the similarity of CRBSs, and a new graph clustering procedure. When operon structures are correctly predicted, our algorithm can predict 81% of known individual binding sites belonging to 94% of known cis-regulatory motifs in the Escherichia coli K12 genome, while achieving high prediction specificity. Our algorithm has also achieved similar prediction accuracy in the Bacillus subtilis genome, suggesting that it is very robust, and thus can be applied to any other sequenced prokaryotic genome. When compared with the prior state-of-the-art algorithms, our algorithm outperforms them in both prediction sensitivity and specificity.
3.
4.
Bioinformatic identification of candidate cis-regulatory elements involved in human mRNA polyadenylation (total citations: 5; self-citations: 0; citations by others: 5)
Polyadenylation is an essential step for the maturation of almost all cellular mRNAs in eukaryotes. In human cells, most poly(A) sites are flanked by the upstream AAUAAA hexamer or a close variant, and downstream U/GU-rich elements. In yeast and plants, additional cis elements have been found to be located upstream of the poly(A) site, including UGUA, UAUA, and U-rich elements. In this study, we have developed a computer program named PROBE (Polyadenylation-Related Oligonucleotide Bidimensional Enrichment) to identify cis elements that may play regulatory roles in mRNA polyadenylation. By comparing human genomic sequences surrounding frequently used poly(A) sites with those surrounding less frequently used ones, we found that cis elements occurring in yeast and plants also exist in human poly(A) regions, including the upstream U-rich elements, and UAUA and UGUA elements. In addition, several novel elements were found to be associated with human poly(A) sites, including several G-rich elements. Thus, we suggest that many cis elements are evolutionarily conserved among eukaryotes, and human poly(A) sites have an additional set of cis elements that may be involved in the regulation of mRNA polyadenylation.
5.
6.
7.
8.
9.
10.
A targeted-replacement system for identification of signals for de novo methylation in Neurospora crassa. (total citations: 2; self-citations: 1; citations by others: 2)
Transformation of eukaryotic cells can be used to test potential signals for DNA methylation. This approach is not always reliable, however, because of chromosomal position effects and because integration of multiple and/or rearranged copies of transforming DNA can influence DNA methylation. We developed a robust system to evaluate the potential of DNA fragments to function as signals for de novo methylation in Neurospora crassa. The requirements of the system were (i) a location in the N. crassa genome that becomes methylated only in the presence of a bona fide methylation signal and (ii) an efficient gene replacement protocol. We report here that the am locus fulfills these requirements, and we demonstrate its utility with the identification of a 2.7-kb fragment from the psi 63 locus as a new portable signal for de novo methylation.
11.
12.
13.
For the identification of novel proteins using MS/MS, de novo sequencing software computes one or several possible amino acid sequences (called sequence tags) for each MS/MS spectrum. Those tags are then matched, accounting for amino acid mutations, against the sequences in a protein database. If the de novo sequencing gives correct tags, the homologs of the proteins can be identified by this approach, and software such as MS-BLAST is available for the matching. However, de novo sequencing very often gives only partially correct tags. The most common error is that a segment of amino acids is replaced by another segment with approximately the same mass. We developed a new efficient algorithm to match sequence tags with errors to database sequences for the purpose of protein and peptide identification. A software package, SPIDER, was developed and made available on the Internet for free public use. This paper describes the algorithms and features of the SPIDER software.
14.
15.
16.
Motivation: The key to MS-based proteomics is peptide sequencing. The major challenge in peptide sequencing, whether library search or de novo, is to better infer statistical significance and better attain noise reduction. Since the noise in a spectrum depends on experimental conditions, the instrument used and many other factors, it cannot be predicted even if the peptide sequence is known. The characteristics of the noise can only be uncovered once a spectrum is given. We wish to overcome such issues. Results: We designed RAId to identify peptides from their associated tandem mass spectrometry data. RAId performs a novel de novo sequencing followed by a search in a peptide library that we created. Through de novo sequencing, we establish the spectrum-specific background score statistics for the library search. When the database search fails to return significant hits, the top-ranking de novo sequences become potential candidates for new peptides that are not yet in the database. The use of spectrum-specific background statistics seems to enable RAId to perform well even when the spectral quality is marginal. Other important features of RAId include its potential in de novo sequencing alone and the ease of incorporating post-translational modifications. Availability: Programs implementing the methods described are available from the authors on request. Contact: yyu{at}ncbi.nlm.nih.gov Supplementary information: ftp://ftp.ncbi.nih.gov/pub/yyu/Proteomics/MSMS/RAId/MSMS_bioinfo_supp.pdf
17.
18.
Campagna D, Romualdi C, Vitulo N, Del Favero M, Lexa M, Cannata N, Valle G. Bioinformatics (Oxford, England), 2005, 21(5):582-588
MOTIVATION: DNA repeats are a common feature of most genomic sequences. Their de novo identification is still difficult despite being a crucial step in genomic analysis and oligonucleotide design. Several efficient algorithms based on word counting are available, but words that are too short decrease specificity while long words decrease sensitivity, particularly in degenerate repeats. RESULTS: The Repeat Analysis Program (RAP) is based on a new word-counting algorithm optimized for high resolution repeat identification using gapped words. Many different overlapping gapped words can be counted at the same genomic position, thus producing a better signal than the single ungapped word. This results in better specificity both in terms of low-frequency detection, being able to identify sequences repeated only once, and highly divergent detection, producing a generally high score in most intron sequences. AVAILABILITY: The program is freely available for non-profit organizations, upon request to the authors. CONTACT: giorgio.valle@unipd.it SUPPLEMENTARY INFORMATION: The program has been tested on the Caenorhabditis elegans genome using word lengths of 12, 14 and 16 bases. The full analysis has been implemented in the UCSC Genome Browser and is accessible at http://genome.cribi.unipd.it.
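A small sketch may help show why gapped words can detect degenerate repeats that ungapped words miss. It is not RAP's algorithm: the mask notation (`1` for a position that must match, `0` for a wildcard), the scoring rule (occurrence count minus one, summed over masks), and the function names are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List


def apply_mask(window: str, mask: str) -> str:
    """Keep bases where the mask has '1'; replace wildcard positions with '-'."""
    return "".join(base if bit == "1" else "-" for base, bit in zip(window, mask))


def count_gapped_words(seq: str, masks: List[str]) -> Dict[str, Counter]:
    """Count every masked word in the sequence, separately for each mask."""
    counts: Dict[str, Counter] = {}
    for mask in masks:
        span = len(mask)
        counts[mask] = Counter(
            apply_mask(seq[i:i + span], mask) for i in range(len(seq) - span + 1)
        )
    return counts


def position_scores(seq: str, masks: List[str]) -> List[int]:
    """Score each start position by how often its masked words recur elsewhere.

    One is subtracted from each count so that a word occurring only once
    contributes nothing; contributions from all masks are summed.
    """
    counts = count_gapped_words(seq, masks)
    scores = [0] * len(seq)
    for mask in masks:
        span = len(mask)
        for i in range(len(seq) - span + 1):
            word = apply_mask(seq[i:i + span], mask)
            scores[i] += counts[mask][word] - 1
    return scores


if __name__ == "__main__":
    # Made-up sequence with two copies of a repeat that differ at one base:
    # ACGTAC at position 0 and ACGAAC at position 10.
    seq = "ACGTACTTGGACGAAC"
    print(position_scores(seq, ["111111"]))  # ungapped word of length 6
    print(position_scores(seq, ["111011"]))  # gapped word ignoring the 4th base
```

In this toy sequence the ungapped word sees the two copies as different, while the gapped word that ignores the mismatching base scores both copies.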
19.
The Drosophila gene Serrate encodes a membrane spanning protein, which is expressed in a complex pattern during embryogenesis and larval stages. Loss of Serrate function leads to larval lethality, which is associated with several morphogenetic defects, including the failure to develop wings and halteres. Serrate has been suggested to act as a short-range signal during wing development. It is required for the induction of the organising centre at the dorsal/ventral compartment boundary, from which growth and patterning of the wing is controlled. In order to understand the regulatory network required to control the spatially and temporally dynamic expression of Serrate, we analysed its cis-regulatory elements by fusing various genomic fragments upstream of the reporter gene lacZ. Enhancer elements reflecting the expression pattern of endogenous Serrate in embryonic and postembryonic tissues could be confined to 26 kb of genomic DNA, including 9 kb of transcribed region. Expression in some embryonic tissues is under the control of multiple enhancers located in the 5′ region and in intron sequences. The data presented here provide the tools to unravel the genetic network which regulates Serrate during different developmental stages in diverse tissues.
Received: 27 March 1998 / Accepted: 17 May 1998