Found 20 similar articles (search time: 15 ms)
1.
Finding composite regulatory patterns in DNA sequences (total citations: 1; self-citations: 0; citations by others: 1)
Pattern discovery in unaligned DNA sequences is a fundamental problem in computational biology with important applications in finding regulatory signals. Current approaches to pattern discovery focus on monad patterns that correspond to relatively short contiguous strings. However, many of the actual regulatory signals are composite patterns that are groups of monad patterns that occur near each other. A difficulty in discovering composite patterns is that one or both of the component monad patterns in the group may be 'too weak'. Since the traditional monad-based motif finding algorithms usually output one (or a few) high scoring patterns, they often fail to find composite regulatory signals consisting of weak monad parts. In this paper, we present a MITRA (MIsmatch TRee Algorithm) approach for discovering composite signals. We demonstrate that MITRA performs well for both monad and composite patterns by presenting experiments over biological and synthetic data.
2.
3.
Zhang LH Liu DP Liang CC 《The international journal of biochemistry & cell biology》2003,35(1):95-103
DNA regulatory sequences control gene expression by forming DNA-protein complexes with specific DNA-binding proteins. A major task in the study of gene regulation is to identify DNA regulatory sequences genome-wide. With the rapid pace of genome projects in particular, the function of DNA regulatory sequences has become one of the focuses of the functional genome era. Several approaches for screening and characterizing DNA regulatory sequences have emerged one by one, from initial low-throughput methods to high-throughput strategies. Even though bioinformatics tools at present facilitate the process of screening regulatory fragments, the most reliable results will come from experimental tests. This article highlights some experimental methods for the identification of regulatory sequences. A brief review of the history and procedures of selection methods is provided. Trends, limitations, and extensions of these methods are also presented.
4.
Bacterial plasmids with stringently regulated copy numbers have directly repeated DNA sequences, termed iterons, in the vicinity of their replication origins. These sequences bind a specific protein exerting a key role in the initiation of plasmid replication. Plasmids P1, pSC101 and RFS1010 have different iteron sequences and belong to three different incompatibility groups. Used as DNA probes each of these plasmids generates specific patterns in mammals similar to those obtained by the DNA fingerprinting technique. The iteron-containing regions were identified as the part of the plasmids responsible for those patterns by using polymerase chain reaction (PCR) amplified DNA segments that contained the iteron regions as probes.
5.
6.
Rajasekhar Kakumani Omair Ahmad Vijay Devabhaktuni 《EURASIP Journal on Bioinformatics and Systems Biology》2012,2012(1):12
CpG dinucleotide clusters, also referred to as CpG islands (CGIs), are usually located in the promoter regions of genes in a deoxyribonucleic acid (DNA) sequence. CGIs play a crucial role in gene expression and cell differentiation; as such, they are normally used as gene markers. The earlier CGI identification methods used the rich CpG dinucleotide content of CGIs as a characteristic measure to identify their locations. Some recent methods exploit the fact that the probability of nucleotide G following nucleotide C is greater in a CGI than in a non-CGI. These methods use the difference in transition probabilities between subsequent nucleotides to distinguish a CGI from a non-CGI. These transition probabilities vary with the data being analyzed, and several of them have been reported in the literature, sometimes leading to contradictory results. In this article, we propose a new and efficient scheme for identification of CGIs using statistically optimal null filters. We formulate a new CGI identification characteristic, devoid of any ambiguities, to reliably and efficiently identify CGIs in a given DNA sequence. Our proposed scheme combines maximum signal-to-noise ratio and least squares optimization criteria to estimate the CGI identification characteristic in the DNA sequence. The proposed scheme is tested on a number of DNA sequences taken from human chromosomes 21 and 22, and proved to be highly reliable as well as efficient in identifying the CGIs.
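The transition-probability idea mentioned in the abstract above can be illustrated with a minimal Python sketch. It only estimates the empirical probability that G follows C in a given stretch of sequence; it is not the null-filter scheme the authors propose, and the function name `prob_g_after_c` is hypothetical.

```python
def prob_g_after_c(seq: str) -> float:
    """Empirical estimate of P(next base = G | current base = C) in seq.

    Illustrative only: a CGI would be expected to give a higher value
    than a non-CGI region, but no threshold is assumed here.
    """
    seq = seq.upper()
    c_total = 0  # number of C's that have a following base
    cg = 0       # number of those C's that are followed by G
    for cur, nxt in zip(seq, seq[1:]):
        if cur == "C":
            c_total += 1
            if nxt == "G":
                cg += 1
    # Return 0.0 when the sequence has no C with a successor.
    return cg / c_total if c_total else 0.0
```

Comparing the value computed over a candidate window with the value over flanking sequence gives a rough feel for why such transition statistics separate CpG-rich regions from the rest.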
7.
8.
9.
Recognition of characteristic patterns in sets of functionally equivalent DNA sequences (total citations: 2; self-citations: 0; citations by others: 2)
An algorithm has been developed for the identification of unknown patterns which are distinctive for a set of short DNA sequences believed to be functionally equivalent. A pattern is defined as being a string, containing fully or partially specified nucleotides at each position of the string. The advantage of this vague definition of the pattern is that it imposes minimum constraints on the characterization of patterns. A new feature of the approach developed here is that it allows a fair simultaneous testing of patterns of all degrees of degeneracy. This analysis is based on an evaluation of inhomogeneity in the empirical occurrence distribution of any such pattern within a set of sequences. The use of the nonparametric kernel density estimation of Parzen allows one to assess small disturbances among the sequence alignments. The method also makes it possible to identify sequence subsets with different characteristic patterns. This algorithm was implemented in the analysis of patterns characteristic of sets of promoters, terminators and splice junction sequences. The results are compared with those obtained by other methods.
Received on November 17, 1986; accepted on June 15, 1987
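As an illustration of a pattern with "fully or partially specified nucleotides", the sketch below matches a degenerate pattern against a sequence. It assumes the standard IUPAC ambiguity codes as the way to write partially specified positions, which the abstract does not state; the function name `find_matches` is hypothetical, and the statistical evaluation described in the abstract is not reproduced.

```python
# Standard IUPAC nucleotide ambiguity codes (assumed representation).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def find_matches(pattern: str, seq: str) -> list[int]:
    """Return 0-based start positions where the degenerate pattern matches seq."""
    pattern = pattern.upper()
    seq = seq.upper()
    positions = []
    for i in range(len(seq) - len(pattern) + 1):
        # Every pattern symbol must allow the base at the aligned position.
        if all(seq[i + j] in IUPAC[p] for j, p in enumerate(pattern)):
            positions.append(i)
    return positions
```

For example, `find_matches("ANT", "ACTAGTAAA")` returns `[0, 3]`: the fully specified positions A and T must match exactly, while N accepts any base.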
10.
11.
Finding gene-expression patterns in bacterial biofilms (total citations: 5; self-citations: 0; citations by others: 5)
The production of biofilms by bacteria is a lifestyle that is thought to require or involve a differential gene expression compared with that of planktonic bacteria. Recently, we have witnessed a change of focus from the simple hunt for hypothetical essential biofilm genes to the identification of late and more complex biofilm functions. However, finding common bacterial biofilm gene-expression patterns through global expression analysis remains difficult. Owing to the apparently minimal overlap between functions involved in biofilm formation by different bacteria, exploring the biofilm lifestyle could prove to be a case-by-case task for which global approaches show their limits.
12.
Christianson ML 《American journal of botany》2005,92(8):1221-1233
13.
The presence of the genes for Escherichia coli adherence factor (EAF), attaching and effacing lesion (eae) and bundle-forming pili (bfp) in 72 strains identified as enteropathogenic E. coli (EPEC) by slide agglutination was evaluated using hybridization and PCR. The adherence property of these strains was assayed using 3h HeLa cells adherence assay. The results obtained indicated that virulence-associated genes were present in 65% of the strains but only ten (13.9%) isolates were positive for all the three markers (typical EPEC), 37 (51.4%) isolates carried either one or two of these determinants (atypical EPEC) and the remaining 25 (34.7%) were negative for all these genes. In vitro adherence assay showed that 44 (61.1%) strains adhered to HeLa cells with a defined pattern, 13 (18.1%) isolates adhered loosely with no definite pattern and the remaining 15 (20.8%) were non-adherent. Analysis of the results showed a statistically significant association between the presence of the virulence-related genes with adherence of the strains with a defined pattern (P=0.0001). These results indicated that since over 60% of the strains identified by serogrouping carried at least one of the putative virulence markers, it therefore seems that this simple test is still of value in our setting although the need for a confirmatory test is also indicated.
14.
Finding approximate tandem repeats in genomic sequences. (total citations: 1; self-citations: 0; citations by others: 1)
Ydo Wexler Zohar Yakhini Yechezkel Kashi Dan Geiger 《Journal of computational biology》2005,12(7):928-942
An efficient algorithm is presented for detecting approximate tandem repeats in genomic sequences. The algorithm is based on a flexible statistical model which allows a wide range of definitions of approximate tandem repeats. The ideas and methods underlying the algorithm are described and its effectiveness on genomic data is demonstrated.
15.
16.
Identification of consensus patterns in unaligned DNA sequences known to be functionally related (total citations: 16; self-citations: 0; citations by others: 16)
Hertz Gerald Z.; Hartzell George W. III; Stormo Gary D. 《Bioinformatics (Oxford, England)》1990,6(2):81-92
We have developed a method for identifying consensus patterns in a set of unaligned DNA sequences known to bind a common protein or to have some other common biochemical function. The method is based on a matrix representation of binding site patterns. Each row of the matrix represents one of the four possible bases, each column represents one of the positions of the binding site and each element is determined by the frequency the indicated base occurs at the indicated position. The goal of the method is to find the most significant matrix (i.e. the one with the lowest probability of occurring by chance) out of all the matrices that can be formed from the set of related sequences. The reliability of the method improves with the number of sequences, while the time required increases only linearly with the number of sequences. To test this method, we analysed 11 DNA sequences containing promoters regulated by the Escherichia coli LexA protein. The matrices we found were consistent with the known consensus sequence, and could distinguish the generally accepted LexA binding sites from other DNA sequences.
Received on November 6, 1989; accepted on December 20, 1989
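The matrix representation described in the abstract above (one row per base, one column per binding-site position, each element a frequency) can be sketched in Python as follows. The sketch assumes the sites have already been aligned and have equal length, which sidesteps the search over unaligned sequences that the method itself performs, and it does not compute any significance score. The function name `frequency_matrix` and the example sites are hypothetical.

```python
BASES = "ACGT"

def frequency_matrix(sites: list[str]) -> dict[str, list[float]]:
    """Build a base-by-position frequency matrix from pre-aligned sites.

    Rows are keyed by base (A, C, G, T); each row holds one frequency per
    site position. Assumes only A/C/G/T characters; any other character
    raises KeyError.
    """
    if not sites:
        raise ValueError("at least one site is required")
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")

    counts = {base: [0] * width for base in BASES}
    for site in sites:
        for pos, base in enumerate(site.upper()):
            counts[base][pos] += 1

    n = len(sites)
    # Convert raw counts to frequencies so that each column sums to 1.
    return {base: [c / n for c in row] for base, row in counts.items()}

# Made-up example sites, for illustration only.
example = frequency_matrix(["ACGT", "ACGA", "TCGT", "ACTT"])
```

In this representation, a column dominated by a single base (such as position 1 in the example, where every site has C) corresponds to a strongly conserved position of the consensus.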
17.
Nucleosome formation and positioning, which play important roles in a number of biological processes, are thought to be related to the distinctive periodic dinucleotide patterns observed in the DNA sequence wrapped around the protein octamer. Previous research shows that flexibility is a key structural property of a nucleosomal DNA sequence. However, the relationship between the flexibility and the periodic dinucleotide patterns has received little attention in research in the past. In this study, we propose the use of three different models to measure the flexibility of yeast DNA sequences. Although the three models involve different parameters, they deliver consistent results showing that yeast nucleosomal DNA sequences are more flexible than non-nucleosomal ones. In contrast to random flexibility values along non-nucleosomal DNA sequences, the flexibility of nucleosomal DNA sequences shows a clear periodicity of 10.14 base pairs, which is consistent with the periodicity of dinucleotide distributions. We also demonstrate that there is a strong relationship between the peak positions of the flexibility and the dinucleotide frequencies. Correlation between the flexibility and the dinucleotide patterns of CA/TG, CG, GC, GG/CC, AG/CT, AC/GT and GA/TC are positive with an average value of 0.5946. The highest correlation is shown by CA/TG with a value of 0.7438 and the lowest correlation is shown by AA/TT with a value of −0.7424. The source codes and data sets are available for downloading on http://www.hy8.com/bioinformatics.htm.
18.
Methylation patterns of repetitive DNA sequences in germ cells of Mus musculus. (total citations: 10; self-citations: 4; citations by others: 10)
The major and the minor satellite sequences of Mus musculus were undermethylated in both sperm and oocyte DNAs relative to the amount of undermethylation observed in adult somatic tissue DNA. This hypomethylation was specific for satellite sequences in sperm DNA. Dispersed repetitive and low copy sequences show a high degree of methylation in sperm DNA; however, a dispersed repetitive sequence was undermethylated in oocyte DNA. This finding suggests a difference in the amount of total genomic DNA methylation between sperm and oocyte DNA. The methylation levels of the minor satellite sequences did not change during spermiogenesis, and were not associated with the onset of meiosis or a specific stage in sperm development.
19.
MOTIVATION: Finding common patterns, or motifs, in the promoter regions of co-expressed genes is an important problem in bioinformatics. A common representation of the motif is by probability matrix or PSSM (position specific scoring matrix). However, even for a motif of length six or seven, there is no algorithm that can guarantee finding the exact optimal matrix from an infinite number of possible matrices. RESULTS: This paper introduces the first algorithm, called EOMM, for finding the exact optimal matrix-represented motif, or simply optimal motif. Based on branch-and-bound searching by partitioning the solution space recursively, EOMM can find the optimal motif of size up to eight or nine, and a motif of larger size with any desired accuracy on the principle that the smaller the error bound, the longer the running time. Experiments show that for some real and simulated data sets, EOMM finds the motif despite very weak signals when existing software, such as MEME and MITRA-PSSM, fails to do so.
20.
The PCR technique can use protein-derived oligonucleotide sequences as primers to develop probes for screening recombinant libraries. Here we report a method with highly degenerate mixtures of oligonucleotides as primers for the PCR that eliminates the need to identify or isolate the DNA sequences derived by PCR. The method uses the pool of PCR-generated DNA sequences radiolabeled during the extension reaction as a probe, combined with highly stringent hybridization and wash conditions that permit only homologous sequences to hybridize and therefore target desired clones. This technique was used successfully to clone the receptor for tumor necrosis factor.