Similar articles
 20 similar articles found (search time: 0 ms)
1.
2.
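Building on the concept of degree of approximation and related notions, this paper constructs frequent approximate patterns and proves their properties, and proposes a corresponding mining algorithm for them (SFAP). Experimental results show that the algorithm can effectively mine frequent approximate patterns in DNA sequences, and that mining such patterns provides a basis for related biological experiments.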

3.
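Qin Dan  Xu Cunshuan 《遗传》 (Hereditas) 2013,35(11):1253-1264
Non-coding DNA sequences are DNA sequences in the genome that do not encode proteins. These sequences can bind regulatory factors, be transcribed into functional RNAs, and regulate physiological activities and pathological processes, either alone or cooperatively. Focusing on their role in regulating gene expression, this article summarizes recent research on non-coding DNA sequences, gives a preliminary account of their structure, function, and possible mechanisms of action, introduces the computational methods and experimental techniques currently used to identify functional elements in non-coding DNA sequences, and discusses prospects for future research on non-coding DNA.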

4.
5.
We study the length distribution functions for the 16 possible distinct dimeric tandem repeats in DNA sequences of diverse taxonomic partitions of GenBank (known human and mouse genomes, and complete genomes of Caenorhabditis elegans and yeast). For coding DNA, we find that all 16 distribution functions are exponential. For non-coding DNA, the distribution functions for most of the dimeric repeats have surprisingly long tails that fit a power-law function. We hypothesize that: (i) the exponential distributions of dimeric repeats in protein coding sequences indicate strong evolutionary pressure against tandem repeat expansion in coding DNA sequences; and (ii) long tails in the distributions of dimers in non-coding DNA may be a result of various mutational mechanisms. These long, non-exponential tails in the distribution of dimeric repeats in non-coding DNA are hypothesized to be due to the higher tolerance of non-coding DNA to mutations. By comparing genomes of various phylogenetic types of organisms, we find that the shapes of the distributions are not universal, but rather depend on the specific class of species and the type of dimer.
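As an illustration of the measurement described in this abstract, the following minimal Python sketch tabulates tract lengths for each of the 16 dimers. It is not the authors' code: the function name `dimer_repeat_lengths` is hypothetical, and it assumes that a tract is a maximal in-frame run of one dimer and that its length is counted in dimer copies.

```python
import re
from collections import Counter
from itertools import product


def dimer_repeat_lengths(seq):
    """Map each of the 16 dimers to a Counter {copies per tract: number of tracts}.

    Assumption: a tract is a maximal, non-overlapping run of the same dimer,
    found by scanning left to right.
    """
    seq = seq.upper()
    result = {}
    for a, b in product("ACGT", repeat=2):
        dimer = a + b
        # Greedy matches of the dimer repeated one or more times.
        tracts = re.finditer(f"(?:{dimer})+", seq)
        result[dimer] = Counter(len(m.group()) // 2 for m in tracts)
    return result


lengths = dimer_repeat_lengths("ACACACGTTTT")
print(lengths["AC"])  # tract-length histogram for the AC repeat
```

Plotting such a histogram on semi-log and log-log axes is one simple way to see whether a distribution looks closer to exponential or to a power law.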

6.
Putative synapomorphy assessment (primary homology assessment) is distinct for DNA strings having a codon structure (hereafter, coding DNA) versus those lacking it (hereafter, non-coding DNA). The first requires the identification of a reading frame and of usually few in-frame insertions and deletions. In non-coding DNA, where length variation is much more common, putative synapomorphy assessment is considerably less straightforward and highly depends on the alignment method. Appreciating the existence of evolutionary constraints, alignments that consider patterns associated with specific putative evolutionary events are favored. Once the sequences have been aligned, the postulated putative evolutionary events need to be coded as an additional step. In order for the alignments and the alignment coding to be falsifiable, they should be carried out using justified and explicitly formulated criteria. Alternative coding methods for the most common patterns present in alignments of non-coding DNA are discussed here. Simpler putative synapomorphy assessment will not always correlate with more reliable phylogenetic information because simplicity does not necessarily correlate with the degree of homoplasy. The use of non-coding DNA can result in more laborious coding, but at the same time in more corroborated hypotheses, mirroring their accuracy for phylogenetic inference.

7.
8.
9.
10.
11.
The most rapidly renaturing sequences in the main-band DNA of Mus musculus, isolated on hydroxyapatite, are found to consist of two discrete families: a presumed “foldback” DNA fraction and a fraction renaturing bimolecularly. The latter family, which we call “main-band hydroxyapatite-isolated rapidly renaturing DNA”, has a kinetic complexity about an order of magnitude greater than that of mouse satellite DNA. It shows about twice as much mismatching as renatured mouse satellite, as judged by its thermal denaturation curve. In situ hybridization localizes the sequences to all chromosomes in the mouse karyotype, and to at least several regions of each chromosome. The in situ result and solution hybridization studies eliminate the possibility that the main-band rapidly renaturing DNA is composed of mouse satellite sequences attached to sequences of higher buoyant density. Nuclease S1 digestion experiments disclose that even at low molecular weight there are unrenatured “tails” attached to the rapidly renaturing sequences. When the main-band DNA fragment size is increased, the amount of rapidly renaturing sequences remains constant, but the amount of attached tails of unrenatured DNA increases as judged by S1 nuclease digestibility, hyperchromicity and buoyant density. It is concluded that at least 5% of the mouse genome is composed of segments of the rapidly renaturing sequences averaging about 1500 base pairs, alternating with segments of more complex DNA averaging about 2200 base pairs. This interspersion of sequences is compared to that found in several other organisms. The properties of the foldback DNA are similarly investigated as a function of DNA fragment size.

12.
Assembling millions of short DNA sequences using SSAKE   Total citations: 7, self-citations: 0, citations by others: 7
Novel DNA sequencing technologies with the potential for up to three orders of magnitude more sequence throughput than conventional Sanger sequencing are emerging. The instrument now available from Solexa Ltd produces millions of short DNA sequences of 25 nt each. Due to ubiquitous repeats in large genomes and the inability of short sequences to uniquely and unambiguously characterize them, the short read length limits applicability for de novo sequencing. However, given the sequencing depth and the throughput of this instrument, stringent assembly of highly identical sequences can be achieved. We describe SSAKE, a tool for aggressively assembling millions of short nucleotide sequences by progressively searching through a prefix tree for the longest possible overlap between any two sequences. SSAKE is designed to help leverage the information from short sequence reads by stringently assembling them into contiguous sequences that can be used to characterize novel sequencing targets. Availability: http://www.bcgsc.ca/bioinfo/software/ssake.
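The idea of extending a sequence by the longest available overlap can be sketched as follows. This is a simplified illustration, not SSAKE's implementation: a dictionary keyed by read prefixes stands in for the prefix tree, only rightward extension on one strand is handled, duplicate reads are collapsed, and the names `build_prefix_index`, `extend_seed`, and `min_overlap` are hypothetical.

```python
from collections import defaultdict


def build_prefix_index(reads, min_overlap):
    """Index every read under each of its proper prefixes of length >= min_overlap."""
    index = defaultdict(set)
    for read in reads:
        for n in range(min_overlap, len(read)):
            index[read[:n]].add(read)
    return index


def extend_seed(seed, reads, min_overlap):
    """Greedily extend `seed` to the right, trying the longest overlap first."""
    unused = set(reads) - {seed}
    index = build_prefix_index(unused, min_overlap)
    contig = seed
    extended = True
    while extended:
        extended = False
        longest_read = max((len(r) for r in unused), default=0)
        max_n = min(len(contig), longest_read - 1)
        for n in range(max_n, min_overlap - 1, -1):
            candidates = index.get(contig[-n:], set()) & unused
            if len(candidates) == 1:
                # Stringent rule (an assumption here): extend only when unambiguous.
                read = candidates.pop()
                contig += read[n:]
                unused.discard(read)
                extended = True
                break
            if len(candidates) > 1:
                break  # several reads compete at this overlap length: stop
    return contig


print(extend_seed("ACGTAC", ["ACGTAC", "GTACGG", "ACGGTT"], min_overlap=3))
```

Stopping at the first ambiguous overlap is a deliberately conservative choice for this sketch; it trades contig length for fewer misjoins across repeats.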

13.
A statistical analysis of the occurrence of particular nucleotide runs (1-10 nucleotides long) in DNA sequences of different species has been carried out. There are considerable differences in run distributions in DNA sequences of prokaryotes, invertebrates and vertebrates. The distribution of various types of runs has been found to be different in coding and non-coding sequences. There is an abundance of short runs 1-2 nucleotides long in coding sequences, and there is a deficiency of such runs in the non-coding regions. However, some interesting exceptions to this rule exist: for the run distribution of adenine in prokaryotes and for the distribution of purine-pyrimidine runs in eukaryotes. This may be explained by the fact that the distribution of runs is predetermined by structural peculiarities of the entire DNA molecule. Runs of guanine or cytosine three to six nucleotides long occur predominantly in the non-coding DNA regions in eukaryotes, especially in vertebrates.
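A run-length tabulation of the kind this abstract describes can be sketched in a few lines of Python. The sketch is illustrative only: the function name `run_length_distribution` and the `max_len` cutoff are assumptions, it covers single-nucleotide runs only (not purine-pyrimidine runs), and it ignores symbols other than A, C, G and T.

```python
from collections import Counter
from itertools import groupby


def run_length_distribution(seq, max_len=10):
    """Return {base: Counter {run length: number of runs}} for runs up to max_len."""
    dist = {base: Counter() for base in "ACGT"}
    # groupby yields one group per maximal stretch of identical symbols.
    for base, group in groupby(seq.upper()):
        n = sum(1 for _ in group)
        if base in dist and n <= max_len:
            dist[base][n] += 1
    return dist


dist = run_length_distribution("AAGGGCATTTT")
print(dist["A"])  # histogram of adenine run lengths
```

Running the same function separately on coding and non-coding regions would give the two distributions that the abstract compares.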

14.
The ability of peptide nucleic acid (PNA) to open up duplex DNA in a highly sequence-specific manner makes it possible to detect short DNA sequences on the background of or within genomic DNA under non-denaturing conditions. To do so, chosen marker sites in double-stranded DNA are locally opened by a pair of PNA openers, thus transforming one strand within the target region (20-30 bp) into the single-stranded form. Onto this accessible DNA sequence a circular oligonucleotide probe is assembled, which serves as a template for rolling circle amplification (RCA). Both homogeneous and heterogeneous assay formats are investigated, as are different formats for fluorescence-based amplicon detection. Our recent data with immobilized analytes suggest that marker sequences in plasmid and bacterial chromosomal DNA can be successfully detected.

15.
Reddy MS  Hardin SH 《Biochemistry》2003,42(2):350-362
We have discovered that short guanine-rich oligonucleotides are able to self-associate into higher order structures that stimulate DNA synthesis in vitro without the addition of a conventional template [Ying, J., Bradley, R. K., Jones, L. B., Reddy, M. S., Colbert, D. T., Smalley, R. E., and Hardin, S. H. (1999) Biochemistry 38, 16461-16468]. Our initial analysis indicated the importance of the presence of three contiguous guanines (G) in an oligonucleotide that stimulates DNA polymerization. To gain insight into and to refine sequence requirements for the unexpected DNA synthesis, we analyzed a 231-member guanine-rich octamer library in a fluorescent nucleotide polymerization assay. We observe that, in addition to three contiguous Gs, the presence of a secondary G cluster within the octamer is essential. Furthermore, the location of the primary G cluster in the center of the molecule is most stimulatory. The majority of the octamers that form extended DNA products have a single non-G base separating the primary and secondary G clusters, the identity of which is predominantly thymine (T). Further, a T 5' or 3' of the primary G cluster positively influences the stimulatory function of the oligonucleotide. Overall, the occurrence of bases in the octamer is in the descending order of G > T > A > C. Our studies demonstrate that structures stabilized by noncanonical base pairings are recognized by a DNA polymerase in vitro, and these findings may have relevance within the cell. In particular, the features of these G-rich stimulatory sequences show striking similarities to telomeric sequences that form diverse G-quartet structures in vitro.

16.
The rates of evolution of purified long and short repetitive DNA sequences were examined by hybridisation analysis between the DNAs from several species of sea urchins. We find that the rates of nucleotide substitution are very comparable within mutually retained sequences for the two classes of repetitive DNA. The loss of hybridisable sequences between species also occurs at similar rates among both the short and long repetitive DNA sequences. Between species that separated less than 50 million years ago, hybridisable short repetitive sequences are lost all through the spectrum of reiteration frequencies. The long repeats contain a few sequences which are highly conserved within all of the species examined, and which amount to approximately 1% of the total genome. The short repetitive class, on the other hand, does not seem to contain any such highly conserved elements. The long repetitive sequences internally appear to contain short 'units' of reiteration, which may comprise families within the long repetitive class. We find no evidence to indicate that the majority of long and short repetitive sequences evolve by different mechanisms or at different rates.

17.
Extending assembly of short DNA sequences to handle error   Total citations: 2, self-citations: 0, citations by others: 2
Inexpensive de novo genome sequencing, particularly in organisms with small genomes, is now possible using several new sequencing technologies. Some of these technologies, such as Illumina's Solexa Sequencing, produce high genomic coverage by generating a very large number of small reads (approximately 30 bp). While prior work shows that partial assembly can be performed by k-mer extension in error-free reads, this algorithm is unsuccessful with the sequencing error rates found in practice. We present VCAKE (Verified Consensus Assembly by K-mer Extension), a modification of simple k-mer extension that overcomes error by using high depth coverage. Though it is a simple modification of a previous approach, we show significant improvements in assembly results on simulated and experimental datasets that include error. AVAILABILITY: http://152.2.15.114/~labweb/VCAKE
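The general idea of using depth of coverage to outvote sequencing errors during k-mer extension can be sketched as below. This is not VCAKE's actual algorithm: it is a minimal illustration in which every read containing the contig's terminal k-mer votes for the base that follows it, and the names `extend_by_vote`, `min_votes`, and `max_steps` are hypothetical parameters introduced here.

```python
from collections import Counter


def extend_by_vote(seed, reads, k, min_votes=2, max_steps=1000):
    """Extend `seed` one base at a time using a majority vote among overlapping reads."""
    contig = seed
    for _ in range(max_steps):  # guard against looping forever on repeats
        tail = contig[-k:]
        votes = Counter()
        for read in reads:
            pos = read.find(tail)
            while pos != -1:
                nxt = pos + k
                if nxt < len(read):
                    votes[read[nxt]] += 1  # this read supports read[nxt] as the next base
                pos = read.find(tail, pos + 1)
        if not votes:
            break
        base, count = votes.most_common(1)[0]
        if count < min_votes:
            break  # too little support to trust the extension
        contig += base
    return contig


reads = ["ACGTTG", "CGTTGC", "GTTGCA", "TTGCAA", "TGCAAG"] * 2 + ["GTTGAA"]
print(extend_by_vote("ACG", reads, k=3))
```

In the usage example the last read carries a single-base error; with two copies of each correct read, the erroneous base is outvoted at the position where it competes.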

18.
19.
SUMMARY: Many biological papers describe short, functional DNA sites without specifying their exact positions in the genome. We have developed a Web server that automates the tedious task of locating such sites in eukaryotic genomes, thus giving access to the context of rich annotations that are increasingly available for genome sequences. AVAILABILITY: http://zlab.bu.edu/site2genome/

20.
Adenosine to inosine (A-to-I) RNA editing is the most abundant editing event in animals. It converts adenosine to inosine in double-stranded RNA regions through the action of the adenosine deaminase acting on RNA (ADAR) proteins. Editing of pre-mRNA coding regions can alter the protein codon and increase functional diversity. However, most of the A-to-I editing sites occur in the non-coding regions of pre-mRNA or mRNA and non-coding RNAs. Untranslated regions (UTRs) and introns are located in pre-mRNA non-coding regions, thus A-to-I editing can influence gene expression by nuclear retention, degradation, alternative splicing, and translation regulation. Non-coding RNAs such as microRNA (miRNA), small interfering RNA (siRNA) and long non-coding RNA (lncRNA) are related to pre-mRNA splicing, translation, and gene regulation. A-to-I editing could therefore affect the stability, biogenesis, and target recognition of non-coding RNAs. Finally, it may influence the function of non-coding RNAs, resulting in regulation of gene expression. This review focuses on the function of ADAR-mediated RNA editing on mRNA non-coding regions (UTRs and introns) and non-coding RNAs (miRNA, siRNA, and lncRNA).
