Found 20 similar articles (search time: 15 ms)
1.
We study the length distribution functions for the 16 possible distinct dimeric tandem repeats in DNA sequences of diverse taxonomic partitions of GenBank (known human and mouse genomes, and complete genomes of Caenorhabditis elegans and yeast). For coding DNA, we find that all 16 distribution functions are exponential. For non-coding DNA, the distribution functions for most of the dimeric repeats have surprisingly long tails that fit a power-law function. We hypothesize that: (i) the exponential distributions of dimeric repeats in protein coding sequences indicate strong evolutionary pressure against tandem repeat expansion in coding DNA sequences; and (ii) long tails in the distributions of dimers in non-coding DNA may be a result of various mutational mechanisms. These long, non-exponential tails in the distribution of dimeric repeats in non-coding DNA are hypothesized to be due to the higher tolerance of non-coding DNA to mutations. By comparing genomes of various phylogenetic types of organisms, we find that the shapes of the distributions are not universal, but rather depend on the specific class of species and the type of dimer.
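As a rough illustration of the kind of measurement this abstract describes, the sketch below tallies how often a given dimer occurs as a tandem repeat of each length. The function name `dimer_repeat_lengths` and the counting convention (non-overlapping, greedy, left-to-right matches, with single copies counted as length 1) are assumptions made for this example and are not taken from the paper.

```python
import re
from collections import Counter
from itertools import product

# The 16 possible dimers over the DNA alphabet.
ALL_DIMERS = ["".join(p) for p in product("ACGT", repeat=2)]

def dimer_repeat_lengths(seq, dimer):
    """Count tandem repeats of `dimer` in `seq`, keyed by number of copies.

    Assumed convention: repeats are found as maximal non-overlapping
    matches scanning left to right; a lone copy counts as length 1.
    """
    pattern = re.compile(f"(?:{re.escape(dimer)})+")
    return Counter(len(m.group(0)) // 2 for m in pattern.finditer(seq.upper()))

# Example: one (AC)x3 repeat and one (AC)x2 repeat.
print(dimer_repeat_lengths("ACACACGGTTACAC", "AC"))
```

Plotting the resulting counts against repeat length on semi-log and log-log axes is one simple way to see whether a distribution looks closer to exponential or to a power law.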
2.
Helga Ochoterena 《Plant Systematics and Evolution》2009,282(3-4):151-168
Putative synapomorphy assessment (primary homology assessment) is distinct for DNA strings having a codon structure (hereafter, coding DNA) versus those lacking it (hereafter, non-coding DNA). The first requires the identification of a reading frame and of usually few in-frame insertions and deletions. In non-coding DNA, where length variation is much more common, putative synapomorphy assessment is considerably less straightforward and depends heavily on the alignment method. Appreciating the existence of evolutionary constraints, alignments that consider patterns associated with specific putative evolutionary events are favored. Once the sequences have been aligned, the postulated putative evolutionary events need to be coded as an additional step. In order for the alignments and the alignment coding to be falsifiable, they should be carried out using justified and explicitly formulated criteria. Alternative coding methods for the most common patterns present in alignments of non-coding DNA are discussed here. Simpler putative synapomorphy assessment will not always correlate to more reliable phylogenetic information because simplicity does not necessarily correlate to the degree of homoplasy. The use of non-coding DNA can result in more laborious coding, but at the same time in more corroborated hypotheses, mirroring their accuracy for phylogenetic inference.
3.
Characteristics of nucleotide blocks in coding and non-coding DNA sequences from different organisms
Iu A Sprizhitskiĭ Iu D Nechipurenko A A Aleksandrov M V Vol'kenshteĭn 《Molekuliarnaia biologiia》1988,22(2):338-356
A statistical analysis of the occurrence of particular nucleotide runs (1 to 10 nucleotides long) in DNA sequences of different species has been carried out. There are considerable differences in run distributions in DNA sequences of prokaryotes, invertebrates and vertebrates. The distribution of various types of runs has been found to differ between coding and non-coding sequences. There is an abundance of short runs 1 to 2 nucleotides long in coding sequences, and a deficiency of such runs in the non-coding regions. However, some interesting exceptions to this rule exist: the run distribution of adenine in prokaryotes and the distribution of purine-pyrimidine runs in eukaryotes. This may be explained by the fact that the distribution of runs is predetermined by structural peculiarities of the entire DNA molecule. Runs of guanine or cytosine three to six nucleotides long occur predominantly in the non-coding DNA regions in eukaryotes, especially in vertebrates.
4.
G S Mani 《Journal of theoretical biology》1992,158(4):429-445
In this paper various aspects of codon usage and k-tuple correlations in the DNA are compared. It is shown that the correlation structures of the coding and the non-coding regions are very similar and that codon usage is reasonably specific for large groups of organisms. These results suggest that the origin of codon usage is related to the origin and structure of the DNA.
5.
Organization of gene and non-gene sequences in micronuclear DNA of Oxytricha nova.
In order to study the derivation of the macronuclear genome from the micronuclear genome in Oxytricha nova, micronuclear DNA was partially digested with EcoRI, size fractionated, and then cloned in the lambda phage Charon 8. Clones were selected (a) at random, (b) by hybridization with macronuclear DNA, or (c) by hybridization with clones of macronuclear DNA. One group of these clones contains only unique sequence DNA, and all of these had sequences that were homologous to macronuclear sequences. The number of macronuclear genes with sequences homologous to these micronuclear clones indicates that macronuclear sequences are clustered in the micronuclear genome. Many micronuclear clones contain repetitive DNA sequences and hybridize to numerous EcoRI fragments of total micronuclear DNA, yielding similar but non-identical patterns. Some micronuclear clones containing these repetitive sequences also contained unique sequence DNA that hybridized to a macronuclear sequence. These clones define a major interspersed repetitive sequence family in the micronuclear genome that is eliminated during formation of the macronuclear genome.
6.
Dominique Mouchiroud Christian Gautier Giorgio Bernardi 《Journal of molecular evolution》1988,27(4):311-320
Summary: The compositional distributions of coding sequences and DNA molecules (in the 50-100-kb range) are remarkably narrower in murids (rat and mouse) compared to humans (as well as to all other mammals explored so far). In murids, both distributions begin at higher and end at lower GC values. A comparison of homologous coding sequences from murids and humans revealed that their different compositional distributions are due to differences in GC levels in all three codon positions, particularly of genes located at both ends of the distribution. In turn, these differences are responsible for differences in both codon usage and amino acids. When GC levels at first+second codon positions and third codon positions, respectively, of murid genes are plotted against corresponding GC levels of homologous human genes, linear relationships (with very high correlation coefficients and slopes of about 0.78 and 0.60, respectively) are found. This indicates a conservation of the order of GC levels in homologous genes from humans and murids. (The same comparison for mouse and rat genes indicates a conservation of GC levels of homologous genes.) A similar linear relationship was observed when plotting GC levels of corresponding DNA fractions (as obtained by density gradient centrifugation in the presence of a sequence-specific ligand) from mouse and human. These findings indicate that orderly compositional changes affecting not only coding sequences but also noncoding sequences took place since the divergence of murids. Such directional fixations of mutations point to the existence of selective pressures affecting the genome as a whole.
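The GC level at each codon position, the quantity compared between murid and human genes in this abstract, can be computed along the following lines. This is a minimal sketch: the function name `gc_by_codon_position` is hypothetical, and it assumes the input is an in-frame coding sequence starting at codon position 1, with any trailing partial codon ignored.

```python
def gc_by_codon_position(cds):
    """Return (GC1, GC2, GC3): the G+C fraction at codon positions 1, 2, 3.

    Assumes `cds` is in frame; trailing bases that do not complete
    a codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop any incomplete final codon
    fractions = []
    for pos in range(3):
        bases = cds[pos:usable:3]     # every third base, starting at `pos`
        gc = sum(b in "GC" for b in bases)
        fractions.append(gc / len(bases) if bases else 0.0)
    return tuple(fractions)

gc1, gc2, gc3 = gc_by_codon_position("ATGGCCTAA")
gc12 = (gc1 + gc2) / 2  # combined first+second position GC level
print(gc12, gc3)
```

Averaging GC1 and GC2 gives the combined first+second position level because both positions contain the same number of bases.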
7.
B. Edwin Blaisdell 《Journal of molecular evolution》1983,19(2):122-133
Summary: Coding sequences of eucaryotic nuclear DNA were characterized by an excess of short runs and a deficit of long runs of weak and of strong hydrogen bonding bases; non-coding sequences by a deficit of short runs and an excess of long runs, in the same of purines and of pyrimidines. The conservation of these attributes across DNA sequences coding for proteins of widely different function, across widely different eucaryotic species for the same protein and across related genes that diverged a long time ago and that now show large differences in base and, if coding, amino acid sequence suggested that these attributes have survival value. It was concluded that these attributes constitute probabilistic constraints on the primary structure (base sequence) of both coding and non-coding DNA.
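To make the notion of "runs" concrete, the following sketch recodes a sequence into two-letter alphabets (weak/strong hydrogen bonding, or purine/pyrimidine) and counts runs by length. The names `WEAK_STRONG`, `PURINE_PYRIMIDINE` and `run_length_counts` are illustrative assumptions; the abstract does not specify how runs were tallied or how ambiguous bases were handled.

```python
from collections import Counter
from itertools import groupby

# Two-letter recodings: W = weak (A, T), S = strong (G, C);
# R = purine (A, G), Y = pyrimidine (C, T).
WEAK_STRONG = {"A": "W", "T": "W", "G": "S", "C": "S"}
PURINE_PYRIMIDINE = {"A": "R", "G": "R", "C": "Y", "T": "Y"}

def run_length_counts(seq, alphabet):
    """Count runs of identical recoded symbols, keyed by (symbol, length).

    Bases missing from `alphabet` (e.g. N) are dropped, so runs on
    either side of them are merged; this is a simplifying assumption.
    """
    recoded = (alphabet[b] for b in seq.upper() if b in alphabet)
    counts = Counter()
    for symbol, group in groupby(recoded):
        counts[(symbol, sum(1 for _ in group))] += 1
    return counts

print(run_length_counts("AATGCCGA", WEAK_STRONG))
print(run_length_counts("AATGCCGA", PURINE_PYRIMIDINE))
```

Comparing these counts with the run lengths expected for a random sequence of the same composition would show the excess or deficit of short and long runs that the abstract refers to.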
8.
New statistical approach to discriminate between protein coding and non-coding regions in DNA sequences and its evaluation
C J Michel 《Journal of theoretical biology》1986,120(2):223-236
We propose a new approach to studying protein coding and non-coding regions in DNA sequences, making use of two complementary statistical methods. Principal component analysis (PCA) is a graphical method for representing DNA sequences characterized by quantitative parameters; it serves as an aid to intuition. Discriminant analysis (DA) is a quantitative method that allows the DNA sequences to be classified. It provides an evaluation of the first method and leads to a decision. The value of this approach has been confirmed, since we have also reproduced some results recently described in the literature. Furthermore, this general methodology has allowed us to show the existence of parameters that identify the functional domains of nucleic acid sequences, without having to make use of the properties of the genetic code.
9.
Micronuclear DNA sequences of Oxytricha fallax homologous to the macronuclear inverted terminal repeat.
The macronucleus of the protozoan Oxytricha fallax is generated from a micronucleus following conjugation. While the micronucleus contains high molecular weight DNA, the macronucleus contains only short linear DNA molecules which all end in the same 20 bp inverted terminal repeat (Ma-ITR). The Ma-ITR was radioactively labeled and purified for use as a probe in hybridizations to micronuclear and macronuclear DNA. Sequences homologous to the Ma-ITR were detected in micronuclear DNA. The copy number of the repeat in the micronuclear genome is approximately that required to encode the macronuclear DNA termini. The micronuclear copies are found embedded in repeated long sequence blocks.
10.
Gene-sized macronuclear DNA molecules are clustered in micronuclear chromosomes of the ciliate Oxytricha nova.
L A Klobutcher A M Vailonis-Walsh K Cahill R M Ribas-Aparicio 《Molecular and cellular biology》1986,6(11):3606-3613
Following the sexual phase of its life cycle, the hypotrichous ciliate Oxytricha nova transforms a copy of its chromosomal micronucleus into a macronucleus containing short, linear DNA molecules with an average size of 2.2 kilobase pairs. In addition, more than 90% of the DNA sequences in the micronuclear genome are eliminated during this process. We have examined the organization of macronuclear DNA molecules in the micronuclear chromosomes. Macronuclear DNA molecules were found to be clustered and separated by less than 550 base pairs in two cloned segments of micronuclear DNA. Recombinant clones of two macronuclear DNA molecules that are adjacent in the micronucleus were also isolated and examined by DNA sequencing. The two macronuclear DNA molecules were found to be separated by only 90 base pairs in the micronuclear genome.
11.
Identification of coding regions in DNA sequences remains challenging. Various methods have been proposed, but these are limited by species-dependence and the need for adequate training sets. The elements in DNA coding regions are known to be distributed in a quasi-random way, while those in non-coding regions have typical similar structures. For short sequences, these statistical characteristics cannot be extracted correctly and cannot even be detected. This paper introduces a new way to solve the problem: balanced estimation of diffusion entropy (BEDE).
12.
Identification of coding and non-coding sequences using local Holder exponent formalism
Kulkarni OC Vigneshwar R Jayaraman VK Kulkarni BD 《Bioinformatics (Oxford, England)》2005,21(20):3818-3823
MOTIVATION: Accurate prediction of genes in genomes has always been a challenging task for bioinformaticians and computational biologists. The discovery of distinct scaling relations in coding and non-coding sequences has led to new perspectives in the understanding of DNA sequences. This has motivated us to exploit the differences in the local singularity distributions for characterization and classification of coding and non-coding sequences. RESULTS: The local singularity density distribution in the coding and non-coding sequences of four genomes was first estimated using the wavelet transform modulus maxima methodology. A support vector machine classifier was then trained with the extracted features. The trained classifier is able to provide an average test accuracy of 97.7%. The local singularity features in a DNA sequence can be exploited for successful identification of coding and non-coding sequences. CONTACT: Available on request from bd.kulkarni@ncl.res.in.
13.
The purpose of this work is to determine the most frequent short sequences in non-coding DNA. They may play a role in maintaining the structure and function of eukaryotic chromosomes. We present a simple method for the detection and analysis of such sequences in several genomes, including Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster and Homo sapiens. We also study two chromosomes of man and mouse with a length similar to the whole genomes of the other species. We provide a list of the most common sequences of 9–14 bases in each genome. As expected, they are present in human Alu sequences. Our programs may also give a graph and a list of their position in the genome. Detection of clusters is also possible. In most cases, these sequences contain few alternating regions. Their intrinsic structure and their influence on nucleosome formation are not known. In particular, we have found new features of short sequences in C. elegans, which are distributed in heterogeneous clusters. They appear as punctuation marks in the chromosomes. Such clusters are not found in either A. thaliana or D. melanogaster. We discuss the possibility that they play a role in centromere function and homolog recognition in meiosis.
14.
15.
Micronuclear DNA of Oxytricha nova contains sequences with autonomously replicating activity in Saccharomyces cerevisiae.
Oxytricha nova is a hypotrichous ciliate with micronuclei and macronuclei. Micronuclei, which contain large, chromosomal-sized DNA, are genetically inert but undergo meiosis and exchange during cell mating. Macronuclei, which contain only small, gene-sized DNA molecules, provide all of the nuclear RNA needed to run the cell. After cell mating the macronucleus is derived from a micronucleus, a derivation that includes excision of the genes from chromosomes and elimination of the remaining DNA. The eliminated DNA includes all of the repetitious sequences and approximately 95% of the unique sequences. We cloned large restriction fragments from the micronucleus that confer replication ability on a replication-deficient plasmid in Saccharomyces cerevisiae. Sequences that confer replication ability are called autonomously replicating sequences. The frequency and effectiveness of autonomously replicating sequences in micronuclear DNA are similar to those reported for DNAs of other organisms introduced into yeast cells. Of the 12 micronuclear fragments with autonomously replicating sequence activity, 9 also showed homology to macronuclear DNA, indicating that they contain a macronuclear gene sequence. We conclude from this that autonomously replicating sequence activity is nonrandomly distributed throughout micronuclear DNA and is preferentially associated with those regions of micronuclear DNA that contain genes.
16.
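Non-coding DNA sequences are the DNA sequences in a genome that do not encode proteins. These sequences can bind regulatory factors, be transcribed into functional RNAs, and regulate physiological activities and pathological processes, either individually or cooperatively. Focusing on their role in regulating gene expression, this article summarizes recent research on non-coding DNA sequences, gives a preliminary account of their structure, function and possible mechanisms of action, introduces the computational methods and experimental techniques currently used to identify functional elements in non-coding DNA sequences, and discusses prospects for future research on non-coding DNA.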
17.
R H Stanley N V Dokholyan S V Buldyrev S Havlin H E Stanley 《Journal of biomolecular structure & dynamics》1999,17(1):79-87
We develop a quantitative method for analyzing repetitions of identical short oligomers in coding and noncoding DNA sequences. We analyze sequences presently available in the GenBank separately for primate, mammal, vertebrate, rodent, invertebrate and plant taxonomic partitions. We find that some oligomers "cluster" more than they would if randomly distributed, while other oligomers "repel" each other. To quantify this degree of clustering, we define clustering measures. We find that (i) clustering significantly differs in coding and noncoding DNA; (ii) in most cases, monomers, dimers and tetramers cluster in noncoding DNA but appear to repel each other in coding DNA; (iii) the degree of clustering for different sources (primates, invertebrates, and plants) is more conserved among these sources in the case of coding DNA than in the case of noncoding DNA; (iv) in contrast to other oligomers, trimers always prefer to cluster; and (v) clustering of each particular oligomer is conserved within the same organism.
18.
Arrangement of nucleotide sequences in adeno-associated virus DNA
There are two types of adeno-associated virus virions which contain complementary single-stranded DNA genomes of about 1.4 × 10^6 daltons. The purified complementary single polynucleotide chains anneal to form duplex linear monomers, circular monomers, and linear dimers, in addition to other less well-defined structures, as identified by sedimentation and electron microscopy. All duplex species are formed by linear single polynucleotide chains of unit length, thus duplex circles and linear dimers are assumed to be held together by relatively short overlapping hydrogen-bonded regions. The initial linear monomers present after annealing the complementary single strands do not form duplex circles or oligomers when re-exposed to annealing conditions, but DNA which sediments as linear monomers after heating linear dimers at a temperature from 7 to 25 °C below the Tm of duplex AAV2 DNA does re-form oligomers and circles when exposed to annealing conditions. Denaturation and reannealing of any duplex species leads to the formation of all forms, indicating that the over-all single strand composition of all species is equivalent. Disruption of duplex circles by limited exonucleolytic digestion using either 3′ or 5′ exonucleases leads to the conclusion that the overlap region may have either 3′ or 5′ termini and that the overlap region represents less than 6% of the length of the genome. Exonucleolytic digestion of linear monomers to the extent of 50% leaves polynucleotide chains which cannot reanneal after denaturation, thus AAV DNA is not randomly circularly permuted. Duplex linear monomers which do not form circles when exposed to annealing conditions do form duplex circles after 1% exonuclease III digestion. A model consistent with these data is one in which the linear single polynucleotide chains present in AAV virions consist of two or more permutations. All of these chains contain terminal repetitions and their start points occur within a limited region, representing < 6% of the length of the genome. According to this interpretation AAV DNA would exist within the virion as a linear single polynucleotide chain or is cleaved at a few specific sites during extraction.
19.
S M Ossadnik S V Buldyrev A L Goldberger S Havlin R N Mantegna C K Peng M Simons H E Stanley 《Biophysical journal》1994,67(1):64-70
Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.
20.
Many years ago compositional correlations were found to hold between coding and contiguous non-coding sequences. These correlations were essentially studied in whole genomes of mammals, which are characterized by strong compositional heterogeneities. Here we investigated whether these correlations also hold within the much more homogeneous isochore families. This point was checked not only in the case of mammals, but also in that of phylogenetically distant vertebrates, which are characterized by very different compositional patterns. Indeed, these are remarkably different in cold- and warm-blooded vertebrates. Fish genomes, for instance, are much more homogeneous than those of mammals and birds. The compositional correlations between coding sequences and the corresponding introns, or their 5′ and 3′ flanking regions, were studied in the isochore families of the fully sequenced genomes from four fishes (Brachydanio rerio, Oryzias latipes, Gasterosteus aculeatus and Tetraodon nigroviridis), human and chicken.