Similar literature
 20 similar articles found; search took 15 ms
1.
Finding composite regulatory patterns in DNA sequences   Total citations: 1 (self-citations: 0; citations by others: 1)
Pattern discovery in unaligned DNA sequences is a fundamental problem in computational biology with important applications in finding regulatory signals. Current approaches to pattern discovery focus on monad patterns that correspond to relatively short contiguous strings. However, many of the actual regulatory signals are composite patterns, that is, groups of monad patterns that occur near each other. A difficulty in discovering composite patterns is that one or both of the component monad patterns in the group may be 'too weak'. Since the traditional monad-based motif finding algorithms usually output one (or a few) high-scoring patterns, they often fail to find composite regulatory signals consisting of weak monad parts. In this paper, we present a MITRA (MIsmatch TRee Algorithm) approach for discovering composite signals. We demonstrate that MITRA performs well for both monad and composite patterns by presenting experiments over biological and synthetic data.
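
To make the notion of a monad pattern with mismatches concrete, the following is a minimal brute-force sketch. It is not MITRA (the abstract does not describe the mismatch tree in enough detail to reproduce it); it simply enumerates every k-mer and keeps those that occur, with at most d mismatches, in enough sequences. The function names and the parameters `k`, `d` and `min_support` are illustrative assumptions, and the exhaustive search is only practical for small `k`.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_with_mismatches(pattern: str, seq: str, d: int) -> bool:
    """True if some window of seq is within d mismatches of pattern."""
    k = len(pattern)
    return any(hamming(pattern, seq[i:i + k]) <= d
               for i in range(len(seq) - k + 1))


def monad_patterns(seqs: list[str], k: int, d: int, min_support: int) -> list[str]:
    """Exhaustively list k-mers found (with <= d mismatches) in at least
    min_support of the input sequences. Hypothetical helper, not MITRA."""
    hits = []
    for letters in product("ACGT", repeat=k):
        pattern = "".join(letters)
        support = sum(occurs_with_mismatches(pattern, s, d) for s in seqs)
        if support >= min_support:
            hits.append(pattern)
    return hits
```

A composite pattern in the sense of the abstract would then be a pair of such monad patterns that occur near each other in the same sequences; that pairing step is left out of this sketch.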

2.
We present a new method for the identification of conserved patterns in a set of unaligned related protein sequences. It is able to discover patterns of a quite general form, allowing for both ambiguous positions and for variable length wildcard regions. It allows the user to define a class of patterns (e.g., the degree of ambiguity allowed and the length and number of gaps), and the method is then guaranteed to find the conserved patterns in this class that score highest according to a defined significance measure. Identified patterns may be refined using one of two new algorithms. We present a new (nonstatistical) significance measure for flexible patterns. The method is shown to recover known motifs for PROSITE families and is also applied to some recently described families from the literature.

3.
Cluster-Buster: Finding dense clusters of motifs in DNA sequences   Total citations: 15 (self-citations: 2; citations by others: 13)
Frith MC, Li MC, Weng Z. Nucleic Acids Research, 2003, 31(13): 3666-3668

4.
DNA regulatory sequences control gene expression by forming DNA-protein complexes with specific DNA-binding proteins. A major task in the study of gene regulation is to identify DNA regulatory sequences genome-wide. With the rapid pace of genome projects in particular, the function of DNA regulatory sequences has become one of the focuses of the functional genomics era. Several approaches for screening and characterizing DNA regulatory sequences have emerged in succession, from initial low-throughput methods to high-throughput strategies. Although bioinformatics tools currently facilitate the screening of regulatory fragments, the most reliable results still come from experimental tests. This article highlights some experimental methods for the identification of regulatory sequences. A brief review of the history and procedures of the selection methods is provided. Trends in these methods, as well as their limitations and extensions, are also presented.

5.
Bacterial plasmids with stringently regulated copy numbers have directly repeated DNA sequences, termed iterons, in the vicinity of their replication origins. These sequences bind a specific protein that plays a key role in the initiation of plasmid replication. Plasmids P1, pSC101 and RFS1010 have different iteron sequences and belong to three different incompatibility groups. Used as DNA probes, each of these plasmids generates specific patterns in mammals similar to those obtained by the DNA fingerprinting technique. The iteron-containing regions were identified as the part of the plasmids responsible for those patterns by using polymerase chain reaction (PCR) amplified DNA segments that contained the iteron regions as probes.

6.
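Building on concepts such as the degree of approximation, this paper constructs frequent approximate patterns and proves their relevant properties, and proposes a corresponding algorithm for mining frequent approximate patterns (SFAP). Experimental results show that the algorithm can effectively mine frequent approximate patterns in DNA sequences. Mining such patterns in DNA sequences provides a basis for related biological experiments.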

7.
8.
CpG dinucleotide clusters, also referred to as CpG islands (CGIs), are usually located in the promoter regions of genes in a deoxyribonucleic acid (DNA) sequence. CGIs play a crucial role in gene expression and cell differentiation; as such, they are normally used as gene markers. The earlier CGI identification methods used the rich CpG dinucleotide content in CGIs as a characteristic measure to identify the locations of CGIs. Some of the recent methods exploit the fact that the probability of nucleotide G following nucleotide C is greater in a CGI than in a non-CGI. These methods use the difference in transition probabilities between subsequent nucleotides to distinguish a CGI from a non-CGI. These transition probabilities vary with the data being analyzed, and several of them have been reported in the literature, sometimes leading to contradictory results. In this article, we propose a new and efficient scheme for identification of CGIs using statistically optimal null filters. We formulate a new CGI identification characteristic that is devoid of ambiguity and reliably and efficiently identifies CGIs in a given DNA sequence. Our proposed scheme combines maximum signal-to-noise ratio and least squares optimization criteria to estimate the CGI identification characteristic in the DNA sequence. The proposed scheme is tested on a number of DNA sequences taken from human chromosomes 21 and 22, and proved to be highly reliable as well as efficient in identifying the CGIs.
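
As an illustration of the "earlier" content-based methods the abstract mentions, here is a small sketch of a sliding-window scan using the classical GC-fraction and observed/expected CpG criteria. This is not the null-filter scheme proposed in the article. The function names are hypothetical, and the default window size and thresholds (200 bp, GC fraction above 0.5, observed/expected ratio above 0.6) are the commonly quoted textbook values, used here as assumptions.

```python
def cpg_stats(window: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for one window."""
    window = window.upper()
    n = len(window)
    c = window.count("C")
    g = window.count("G")
    cpg = window.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count if C and G were placed independently is (C * G) / N.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def candidate_windows(seq: str, size: int = 200, step: int = 1,
                      min_gc: float = 0.5, min_ratio: float = 0.6) -> list[int]:
    """Start positions of windows that pass both content-based thresholds."""
    starts = []
    for i in range(0, len(seq) - size + 1, step):
        gc, ratio = cpg_stats(seq[i:i + size])
        if gc > min_gc and ratio > min_ratio:
            starts.append(i)
    return starts
```

Overlapping windows that pass would normally be merged into island calls; that step is omitted to keep the sketch short.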

9.
10.
An algorithm has been developed for the identification of unknown patterns which are distinctive for a set of short DNA sequences believed to be functionally equivalent. A pattern is defined as being a string, containing fully or partially specified nucleotides at each position of the string. The advantage of this ‘vague’ definition of the pattern is that it imposes minimum constraints on the characterization of patterns. A new feature of the approach developed here is that it allows a ‘fair’ simultaneous testing of patterns of all degrees of degeneracy. This analysis is based on an evaluation of inhomogeneity in the empirical occurrence distribution of any such pattern within a set of sequences. The use of the nonparametric kernel density estimation of Parzen allows one to assess small disturbances among the sequence alignments. The method also makes it possible to identify sequence subsets with different characteristic patterns. This algorithm was implemented in the analysis of patterns characteristic of sets of promoters, terminators and splice junction sequences. The results are compared with those obtained by other methods. Received on November 17, 1986; accepted on June 15, 1987

11.
12.
13.
We consider the problem of predicting alternative splicing patterns from a set of expressed sequences (cDNAs and ESTs). Some of these expressed sequences may be erroneous, thus forming incorrect exons/introns. These incorrect exons/introns may cause many false positives. For example, we examined a popular alternative splicing database, ECgene, which predicts alternative splicing patterns from expressed sequences. The result shows that about 81.3%-81.6% (sensitivity) of known patterns are found, but the specificity can be as low as 5.9%. Based on the idea that erroneous sequences are usually not consistent with other sequences, in this paper we provide an alternative approach for finding alternative splicing patterns which ensures that individual exons/introns of the reported patterns have enough support from the expressed sequences. On the same dataset, our approach can achieve a much higher specificity and a slight increase in sensitivity (38.9% and 84.9%, respectively). Our approach also gives better results compared with popular alternative splicing databases (ASD, ECgene, SpliceNest) and the software ClusterMerge.

14.
15.
Finding gene-expression patterns in bacterial biofilms   Total citations: 5 (self-citations: 0; citations by others: 5)
The production of biofilms by bacteria is a lifestyle that is thought to require or involve a differential gene expression compared with that of planktonic bacteria. Recently, we have witnessed a change of focus from the simple hunt for hypothetical essential biofilm genes to the identification of late and more complex biofilm functions. However, finding common bacterial biofilm gene-expression patterns through global expression analysis remains difficult. Owing to the apparently minimal overlap between functions involved in biofilm formation by different bacteria, exploring the biofilm lifestyle could prove to be a case-by-case task for which global approaches show their limits.

16.
The presence of the genes for Escherichia coli adherence factor (EAF), attaching and effacing lesion (eae) and bundle-forming pili (bfp) in 72 strains identified as enteropathogenic E. coli (EPEC) by slide agglutination was evaluated using hybridization and PCR. The adherence property of these strains was assayed using a 3 h HeLa cell adherence assay. The results indicated that virulence-associated genes were present in 65% of the strains, but only ten (13.9%) isolates were positive for all three markers (typical EPEC), 37 (51.4%) isolates carried either one or two of these determinants (atypical EPEC) and the remaining 25 (34.7%) were negative for all these genes. The in vitro adherence assay showed that 44 (61.1%) strains adhered to HeLa cells with a defined pattern, 13 (18.1%) isolates adhered loosely with no definite pattern and the remaining 15 (20.8%) were non-adherent. Analysis of the results showed a statistically significant association between the presence of the virulence-related genes and adherence of the strains with a defined pattern (P

17.
Benns et al. have recently combined a chemoproteomic profiling method with a CRISPR-based gene-editing method to identify chemically targetable residues essential for fitness in the parasite Toxoplasma gondii. The result is a strategy that enables rapid discovery of new drug targets to combat T. gondii and other related parasites.

18.
Finding approximate tandem repeats in genomic sequences.   Total citations: 1 (self-citations: 0; citations by others: 1)
An efficient algorithm is presented for detecting approximate tandem repeats in genomic sequences. The algorithm is based on a flexible statistical model which allows a wide range of definitions of approximate tandem repeats. The ideas and methods underlying the algorithm are described and its effectiveness on genomic data is demonstrated.

19.
We have developed a method for identifying consensus patterns in a set of unaligned DNA sequences known to bind a common protein or to have some other common biochemical function. The method is based on a matrix representation of binding site patterns. Each row of the matrix represents one of the four possible bases, each column represents one of the positions of the binding site and each element is determined by the frequency with which the indicated base occurs at the indicated position. The goal of the method is to find the most significant matrix (i.e. the one with the lowest probability of occurring by chance) out of all the matrices that can be formed from the set of related sequences. The reliability of the method improves with the number of sequences, while the time required increases only linearly with the number of sequences. To test this method, we analysed 11 DNA sequences containing promoters regulated by the Escherichia coli LexA protein. The matrices we found were consistent with the known consensus sequence, and could distinguish the generally accepted LexA binding sites from other DNA sequences. Received on November 6, 1989; accepted on December 20, 1989
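
The matrix representation described in this abstract can be sketched as follows: rows are the four bases, columns are binding-site positions, and each element is the frequency of that base at that position. The sketch assumes the sites are already aligned and of equal length, which is a simplification; the method in the abstract works on unaligned sequences and searches for the most significant matrix, a step not shown here. The function names are hypothetical.

```python
def frequency_matrix(sites: list[str]) -> dict[str, list[float]]:
    """Rows are bases A, C, G, T; columns are site positions; each entry is
    the fraction of sites carrying that base at that position."""
    if not sites:
        raise ValueError("need at least one site")
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    counts = {base: [0] * width for base in "ACGT"}
    for site in sites:
        for pos, base in enumerate(site.upper()):
            counts[base][pos] += 1  # raises KeyError on non-ACGT symbols
    return {base: [c / len(sites) for c in row] for base, row in counts.items()}


def consensus(matrix: dict[str, list[float]]) -> str:
    """Most frequent base at each position (ties go to the first of A, C, G, T)."""
    width = len(matrix["A"])
    return "".join(max("ACGT", key=lambda b: matrix[b][pos])
                   for pos in range(width))
```

For example, `consensus(frequency_matrix(sites))` gives a single consensus string for a list of aligned sites, which is the kind of output the abstract compares against the known LexA consensus sequence.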

20.
Nucleosome formation and positioning, which play important roles in a number of biological processes, are thought to be related to the distinctive periodic dinucleotide patterns observed in the DNA sequence wrapped around the protein octamer. Previous research shows that flexibility is a key structural property of a nucleosomal DNA sequence. However, the relationship between the flexibility and the periodic dinucleotide patterns has received little attention in past research. In this study, we propose the use of three different models to measure the flexibility of yeast DNA sequences. Although the three models involve different parameters, they deliver consistent results showing that yeast nucleosomal DNA sequences are more flexible than non-nucleosomal ones. In contrast to random flexibility values along non-nucleosomal DNA sequences, the flexibility of nucleosomal DNA sequences shows a clear periodicity of 10.14 base pairs, which is consistent with the periodicity of dinucleotide distributions. We also demonstrate that there is a strong relationship between the peak positions of the flexibility and the dinucleotide frequencies. Correlations between the flexibility and the dinucleotide patterns of CA/TG, CG, GC, GG/CC, AG/CT, AC/GT and GA/TC are positive, with an average value of 0.5946. The highest correlation is shown by CA/TG with a value of 0.7438 and the lowest correlation is shown by AA/TT with a value of −0.7424. The source codes and data sets are available for downloading on http://www.hy8.com/bioinformatics.htm.
