Similar articles
 Found 20 similar articles
1.
Summary: The compositional distribution of coding sequences from five vertebrates (Xenopus, chicken, mouse, rat, and human) is shifted toward higher GC values compared to that of the DNA molecules (in the 35–85-kb size range) isolated from the corresponding genomes. This shift is due to the lower GC levels of intergenic sequences compared to coding sequences. In the cold-blooded vertebrate, the two distributions are similar in that GC-poor genes and GC-poor DNA molecules are largely predominant. In contrast, in the warm-blooded vertebrates, GC-rich genes are largely predominant over GC-poor genes, whereas GC-poor DNA molecules are largely predominant over GC-rich DNA molecules. As a consequence, the genomes of warm-blooded vertebrates show a compositional gradient of gene concentration. The compositional distributions of coding sequences (as well as of DNA molecules) showed remarkable differences between chicken and mammals, and between mouse (or rat) and human. Differences were also detected in the compositional distribution of housekeeping and tissue-specific genes, the former being more abundant among GC-rich genes.
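The GC levels discussed in this abstract can be illustrated with a small calculation. The following is a minimal sketch, not the authors' procedure: the function names `gc_fraction` and `gc_by_codon_position` are hypothetical, and the input is assumed to be an in-frame coding sequence written with the letters A, C, G and T.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return gc / total if total else 0.0


def gc_by_codon_position(cds: str) -> tuple:
    """GC fraction at codon positions 1, 2 and 3 of an in-frame sequence."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    cds = cds[:usable]
    # cds[i::3] collects every base that sits at codon position i + 1.
    return tuple(gc_fraction(cds[i::3]) for i in range(3))


if __name__ == "__main__":
    toy_cds = "ATGGCCGCGTAA"  # hypothetical toy sequence: ATG GCC GCG TAA
    print(gc_fraction(toy_cds))
    print(gc_by_codon_position(toy_cds))
```

Applying the same function to coding and to intergenic sequences would give the kind of comparison the abstract describes, although real analyses work on large curated data sets rather than single toy sequences.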

2.
We describe here the cloning, characterization and expression in E. coli of the gene coding for a DNA methylase from Spiroplasma sp. strain MQ1 (M.SssI). This enzyme methylates completely and exclusively CpG sequences. The Spiroplasma gene was transcribed in E. coli using its own promoter. Translation of the entire message required the use of an opal suppressor, suggesting that UGA triplets code for tryptophan in Spiroplasma. Sequence analysis of the gene revealed several UGA triplets in a 1158-bp-long open reading frame. The deduced amino acid sequence revealed in M.SssI all common domains characteristic of bacterial cytosine DNA methylases. The putative sequence recognition domain of M.SssI showed no obvious similarities with that of the mouse DNA methylase, in spite of their common sequence specificity. The cloned enzyme methylated exclusively CpG sequences both in vivo and in vitro. In contrast to the mammalian enzyme, which is primarily a maintenance methylase, M.SssI displayed de novo methylase activity, characteristic of prokaryotic cytosine DNA methylases.

3.
A method for measuring the non-random bias of a codon usage table   Cited by: 7 (self-citations: 3, citations by others: 4)
We describe a new statistical method for measuring bias in the codon usage table of a gene. The test is based on the multinomial and Poisson distributions. The method is used to scan DNA sequences and measure the strength of codon preference. For E. coli we show that the strength of codon preference is related to levels of gene expression. The method can also be used to compare base triplet frequencies with those expected from the base composition. This second type of codon bias test is useful for distinguishing coding from non-coding regions.
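The second use mentioned in this abstract, comparing triplet frequencies with those expected from base composition, can be sketched as follows. This is an illustration only: it substitutes a plain Pearson chi-square statistic for the multinomial/Poisson test of the paper, and the function name `triplet_chi_square` is hypothetical.

```python
from collections import Counter
from itertools import product


def triplet_chi_square(seq: str) -> float:
    """Chi-square distance between in-frame triplet counts and the counts
    expected if bases were drawn independently from the base composition."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    codons = [c for c in codons if set(c) <= set("ACGT")]  # skip ambiguous
    n = len(codons)
    if n == 0:
        return 0.0
    base_counts = Counter("".join(codons))
    total = sum(base_counts.values())
    freq = {b: base_counts[b] / total for b in "ACGT"}
    observed = Counter(codons)
    chi2 = 0.0
    for triplet in map("".join, product("ACGT", repeat=3)):
        expected = n * freq[triplet[0]] * freq[triplet[1]] * freq[triplet[2]]
        if expected > 0:  # triplets that cannot occur contribute nothing
            chi2 += (observed[triplet] - expected) ** 2 / expected
    return chi2
```

A larger value indicates triplet usage further from what base composition alone would predict; turning it into a significance level would need the degrees of freedom and, for short sequences, the exact treatment that the paper addresses.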

4.
M.J. Bibb, P.R. Findlay, M.W. Johnson. Gene 1984, 30(1-3):157-166
Bacterial genes that code for proteins appear to possess a codon usage characteristic of their overall base composition. This results in different but predictable non-random distributions of nucleotides within codons, permitting the recognition of protein-coding sequences in a wide range of bacterial species. The nature of this distribution depends on the base composition of the coding sequence. The position-specific differences are especially conspicuous in genes of extreme G + C content, allowing the particularly reliable prediction of the reading frame and coding strand of experimentally determined DNA sequences. This finding has been exploited to identify the coding sequence of the viomycin phosphotransferase (vph) gene of Streptomyces vinaceus. An easily applied computer program (“Frame”) has been written to carry out and display such analyses.
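The position-specific analysis described here can be approximated with a sliding window. The sketch below is not the "Frame" program itself, whose details are not given in the abstract; the function name `phase_gc_profile` and the default window and step sizes are assumptions chosen for illustration.

```python
def phase_gc_profile(seq: str, window: int = 120, step: int = 30) -> list:
    """For each window, return (start, (gc0, gc1, gc2)), where gcK is the
    G+C fraction of bases whose absolute index in seq equals K modulo 3."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = []
        for phase in range(3):
            # First index inside the chunk that falls in this absolute phase.
            offset = (phase - start) % 3
            bases = chunk[offset::3]
            gc.append(sum(b in "GC" for b in bases) / len(bases) if bases else 0.0)
        profile.append((start, tuple(gc)))
    return profile
```

In a G+C-rich genome, a stretch where one phase stays well above the other two suggests a coding region, with the high phase marking the third codon position; running the same function on the reverse complement covers the other strand.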

5.
T.L. Sitnikova, A.A. Zharkikh. Bio Systems 1993, 30(1-3):113-135
This work is an attempt to study the structural features and evolutionary patterns of nucleotide sequences by analyzing their 1- through 4-plet frequencies and the statistical relations between them. We present a mathematical apparatus for this analysis. In particular, we introduce criteria to estimate the degree of homogeneity of L-plet composition in a given set of sequences and the dependence of the L-plet frequencies on the composition of lower orders. We apply these criteria to the study of eubacteria, mitochondria and chloroplasts. We demonstrate that L-plet frequencies are quite useful for revealing evolutionary relationships between DNA sequences and that a non-random distribution is more typical for doublets than for triplets. Non-randomness of triplet composition is more characteristic of coding than of non-coding regions, while no significant differences in dinucleotide composition can be observed. The obtained results can be used for revealing possible mechanisms of the codon usage phenomena.

6.
Bacterial DNA sequences in the GenBank database were divided into coding and noncoding regions and examined for the base-trimer distribution in every triplet frame on the sense and antisense strands. The results revealed that for the noncoding region, both strands have very similar base-trimer distributions and have no frame specificity; that is, DNA is symmetric in the noncoding region. For the coding region, on the other hand, the symmetry is broken only in the triplet framework, and we found a special triplet-frame-specific symmetry which appears when the two complementary strands of the coding region are read from their 5′ ends. In addition, the following frame specificity was also observed in the distribution of stop codons on the antisense strand of the coding region. When the antisense sequences of the open reading frames (ORFs) in the database are read in the three reading frames, the same reading frame as the corresponding ORF contains a significantly larger amount of long open frames without stop codons (i.e., nonstop frames [NSFs]) than expected, while the number of NSFs in the other two reading frames is similar to that of the expected one. That is, NSFs as well as ORFs are maintained in a frame-specific manner, and in this sense, DNA becomes symmetrical even in the coding region. These two kinds of frame-specific symmetries indicate that only an ORF and its complementary triplets are specifically recognized and maintained in DNA. We suppose that the antisense strands as well as the sense strands in the coding region may be transcribed, thereby producing various kinds of proteins corresponding to NSFs, though their amount may not be large. The presence of these proteins should have some benefits for living organisms, and therefore we propose that these proteins are upcoming enzymes having novel functions. Correspondence to: I. Urabe
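The stop-codon count on the antisense strand can be sketched as below. This is an illustrative reading of the abstract rather than the authors' pipeline: the function names are hypothetical, the standard stop codons TAA, TAG and TGA are assumed, and frame 0 is taken to be the frame whose triplets are complementary to the ORF's codons (which holds when the ORF length is a multiple of three).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code assumed


def reverse_complement(seq: str) -> str:
    """Antisense strand written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def antisense_stop_counts(orf: str) -> list:
    """Number of stop codons in frames 0, 1 and 2 of the antisense strand."""
    anti = reverse_complement(orf)
    counts = []
    for frame in range(3):
        codons = (anti[i:i + 3] for i in range(frame, len(anti) - 2, 3))
        counts.append(sum(codon in STOP_CODONS for codon in codons))
    return counts


if __name__ == "__main__":
    # Hypothetical toy ORF: ATG CTA TAA; the CTA codon becomes TAG on the
    # antisense strand, so frame 0 is not a nonstop frame here.
    print(antisense_stop_counts("ATGCTATAA"))
```

A frame with a count of zero corresponds to a nonstop frame (NSF) in the abstract's terminology; the reported result concerns how often frame 0 is nonstop across many ORFs compared with expectation.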

7.
Abstract: The influence of local base composition on mutations in chloroplast DNA (cpDNA) is studied in detail and the resulting, empirically derived, mutation dynamics are used to analyze both base composition and codon usage bias. A 4 × 4 substitution matrix is generated for each of the 16 possible flanking base combinations (contexts) using 17,253 noncoding sites, 1309 of which are variable, from an alignment of three complete grass chloroplast genome sequences. It is shown that substitution bias at these sites is correlated with flanking base composition and that the A+T content of these flanking sites as well as the number of flanking pyrimidines on the same strand appears to have general influences on substitution properties. The context-dependent equilibrium base frequencies predicted from these matrices are then applied to two analyses. The first examines whether or not context dependency of mutations is sufficient to generate average compositional differences between noncoding cpDNA and silent sites of coding sequences. It is found that these two classes of sites exist, on average, in very different contexts and that the observed mutation dynamics are expected to generate significant differences in overall composition bias that are similar to the differences observed in cpDNA. Context dependency, however, cannot account for all of the observed differences: although silent sites in coding regions appear to be at the equilibrium predicted, noncoding cpDNA has a significantly lower A+T content than expected from its own substitution dynamics, possibly due to the influence of indels. The second study examines the codon usage of low-expression chloroplast genes. When context is accounted for, codon usage is very similar to what is predicted by the substitution dynamics of noncoding cpDNA. However, certain codon groups show significant deviation when followed by a purine in a manner suggesting some form of weak selection other than translation efficiency.
Overall, the findings indicate that a full understanding of mutational dynamics is critical to understanding the role selection plays in generating composition bias and sequence structure.

8.
Gene prediction relies on the identification of characteristic features of coding sequences that distinguish them from non-coding DNA. The recent large-scale sequencing of entire genomes from higher eukaryotes, in conjunction with currently used gene prediction algorithms, has provided an abundance of putative genes that can now be analysed for their compositional properties. Strong, systematic differences still exist, in several species, between the compositional properties of sets of ex novo predicted genes and genes that have been experimentally detected and/or verified. This is particularly evident in the estimated gene set (>45,000 genes) of the recently sequenced rice genome, where roughly half the predicted genes are compositionally unusual and have no known orthologues in the dicot Arabidopsis. In a few cases such differences might suggest a bias in experimental gene-finding protocols, but the quasi-random nature of the compositionally aberrant predicted genes is a strong indication that many, if not most, of them are false positives. It therefore appears that some important features of coding regions have not yet been taken into account in existing gene prediction programs. Statistical base compositional properties of curated gene data sets from vertebrates, which we briefly review here, should therefore provide a useful benchmark for fine-tuning probabilistic gene models and model parameters that are currently in use.

9.
The compositional distributions of large DNA fragments reflect those of the isochores that make up vertebrate genomes and can provide novel phylogenetic insights in the case of mammalian genomes (see Sabeur et al. 1993). This approach has been complemented here by an analysis of the compositional patterns of coding sequences and their codon positions (which also reflect the isochore pattern) and by a comparison of the base compositions of codon positions from homologous genes in a number of pairs of species. The results obtained using these two approaches support the existence of a general compositional pattern for mammalian genomes and of a distinct pattern for Myomorpha. The other two “special” patterns identified in a megachiropteran and in pangolin could not be tested here. Presented at the NATO Advanced Research Workshop on Genome Organization and Evolution, Spetsai, Greece, 16–22 September 1992

10.
Summary: We have investigated the compositional properties of coding sequences from cold-blooded vertebrates and we have compared them with those from warm-blooded vertebrates. Moreover, we have studied the compositional correlations of coding sequences with the genomes in which they are contained, as well as the compositional correlations among the codon positions of the genes analyzed. The distributions of GC levels of the third codon positions of genes from cold-blooded vertebrates are distinctly different from those of warm-blooded vertebrates in that they do not reach the high values attained by the latter. Moreover, coding sequences from cold-blooded vertebrates are either equal, or, in most cases, lower in GC (not only in third, but also in first and second codon positions) than homologous coding sequences from warm-blooded vertebrates; higher values are exceptional. These results at the gene level are in agreement with the compositional differences between cold-blooded and warm-blooded vertebrates previously found at the whole genome (DNA) level (Bernardi and Bernardi 1990a,b). Two linear correlations were found: one between the GC levels of coding sequences (or of their third codon positions) and the GC levels of the genomes of cold-blooded vertebrates containing them; and another between the GC levels of third and first + second codon positions of genes from cold-blooded vertebrates. The first correlation applies to the genomes (or genome compartments) of all vertebrates and the second to the genes of all living organisms. These correlations are tantamount to a genomic code.

11.
The "ovalbumin Y" gene, one of three which constitute the ovalbumin gene family in chicken, has been completely sequenced. The exact location of exons can be derived from the comparison with the ovalbumin gene sequence and from the map previously established by electron microscopy analysis. During evolution of the Y gene, selective pressure has operated to retain a sequence coding for an ovalbumin-like protein. The location of splice junctions, the length of protein coding exons and the reading phase are as in the ovalbumin gene. The overall homology between the Y and ovalbumin protein coding sequences is 72.6% (resulting in a 58% homology for the amino acid sequences). A significantly high number of base changes within coding sequences are present in clusters, which appear in several cases to be correlated with the occurrence of direct repeats. The 3' untranslated sequences of the Y and ovalbumin mRNAs have diverged much more, and the Y sequence contains a peculiar U(T) rich region. Corresponding introns of the ovalbumin and Y genes differ extensively both in sequence and in length. They share, however, characteristic biases in their base distribution.

12.
A simple model is put forward to explain the long-known three-base periodicity in coding DNA. We propose the concept of same-phase triplet clustering, i.e. a condition wherein a triplet appears several times in one phase without interruption by the two other possible phases. For instance, in the sequence (i): NTT_GNN_NTT_GNN_NTT_GNN_NNN_NTT_GNN (where N is any nucleotide but combinations producing TTG are excluded) there would be clustering of same-phase TTG because this triplet appears uninterruptedly in phase 2. In contrast, in the sequence (ii): TTG_NTT_GNN_NNT_TGN_NNN_NTT_GNN there is no same-phase clustering because neighboring TTGs are all in different phases. Observe also that in sequence (i) TTG triplets are separated by 3, 3 and 6 nucleotides (3n distances), while in sequence (ii) they are separated by 1, 4 and 5 nucleotides (non-3n distances). In this work, we demonstrate that in coding DNA the 3n distances generated by (i)-type sequences proportionally outnumber the non-3n distances generated by (ii)-type sequences; this condition would be the basis of three-base periodicity. Randomized sequences had (i)- and (ii)-type sequences too, but clustering was statistically different. To prove our model we generated (i)-type sequences in a randomized sequence by inducing clustering of same-phase triplets. In agreement with the model, this sequence displayed three-base periodicity. Furthermore, two- and four-base periodicities could also be induced by artificially inducing clustering of duplets and tetraplets.
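The distances quoted for sequences (i) and (ii) can be reproduced with a short script. This is a minimal sketch under stated assumptions: the function names are hypothetical, every N is replaced by A (which creates no extra TTG), and only distances between consecutive occurrences are counted, which may differ from the full distance distribution used in the paper.

```python
def triplet_gaps(seq: str, triplet: str) -> list:
    """Number of nucleotides separating consecutive occurrences of triplet.
    Self-overlapping triplets such as AAA can yield negative gaps."""
    seq, triplet = seq.upper(), triplet.upper()
    starts = []
    pos = seq.find(triplet)
    while pos != -1:
        starts.append(pos)
        pos = seq.find(triplet, pos + 1)
    return [b - a - len(triplet) for a, b in zip(starts, starts[1:])]


def fraction_3n(gaps: list) -> float:
    """Share of gaps that are multiples of three (same-phase neighbours)."""
    return sum(g % 3 == 0 for g in gaps) / len(gaps) if gaps else 0.0


if __name__ == "__main__":
    seq_i = "ATT GAA ATT GAA ATT GAA AAA ATT GAA".replace(" ", "")
    seq_ii = "TTG ATT GAA AAT TGA AAA ATT GAA".replace(" ", "")
    print(triplet_gaps(seq_i, "TTG"))   # → [3, 3, 6]
    print(triplet_gaps(seq_ii, "TTG"))  # → [1, 4, 5]
```

In the model's terms, a high `fraction_3n` for a triplet in a coding sequence, relative to a randomized version of the same sequence, would indicate same-phase clustering.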

13.
The general property of asymmetry in word use in meaningful texts written in a variety of languages motivates a quantification of the differences in the use of mutually symmetric triplets in genomic sequences. When this is done in the three reading frames, high values found for one of them are used as an indication that the sequence is coding for a protein. Moreover, a similar quantification of the differences in the use of complementary triplets is introduced, again with predictive power of the coding character of a sequence. This method reflects the non-equivalence between sense and anti-sense strand of a coding segment. In both approaches, "linguistic asymmetry" in coding sequences is related to the form of the genetic code and to the bias in codon usage and amino acid use skews.

14.
A Markov analysis of DNA sequences   Cited by: 12 (self-citations: 0, citations by others: 12)
We present a model by which we look at the DNA sequence as a Markov process. It has been suggested by several workers that some basic biological or chemical features of nucleic acids underlie the frequencies of dinucleotides (doublets) in these chains. Comparing patterns of doublet frequencies in DNA of different organisms was shown to be a fruitful approach to some phylogenetic questions (Russel & Subak-Sharpe, 1977). Grantham (1978) formulated mRNA sequence indices, some of which involve certain doublet frequencies. He suggested that using these indices may provide indications of the molecular constraints existing during gene evolution. Nussinov (1981) has shown that a set of dinucleotide preference rules holds consistently for eukaryotes, and suggested a strong correlation between these rules and degenerate codon usage. Gruenbaum, Cedar & Razin (1982) found that methylation in eukaryotic DNA occurs exclusively at C-G sites. Important biological information thus seems to be contained in the doublet frequencies. One of the basic questions to be asked (the "correlation question") is to what extent the 64 trinucleotide (triplet) frequencies measured in a sequence are determined by the 16 doublet frequencies in the same sequence. The DNA is described here as a Markov process, with the nucleotides being outcomes of a sequence generator. Answering the correlation question mentioned above means finding the order of the Markov process. The difficulty is that natural sequences are of finite length, and statistical noise is quite strong. We show that even for a 16,000-nucleotide-long sequence (like that of the human mitochondrial genome) the finite length effect cannot be neglected. Using the Markov chain model, the correlation between doublet and triplet frequencies can, however, be determined even for finite sequences, taking proper account of the finite length.
Two natural DNA sequences, the human mitochondrial genome and the SV40 DNA, are analysed as examples of the method.
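The "correlation question" can be made concrete with the standard first-order Markov estimate, in which the expected count of a triplet xyz is N(xy)·N(yz)/N(y). The sketch below uses hypothetical function names and deliberately omits the finite-length correction that the abstract identifies as essential, so it should be read as a starting point rather than as the paper's method.

```python
from collections import Counter


def observed_triplets(seq: str) -> Counter:
    """Counts of all overlapping triplets in the sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + 3] for i in range(len(seq) - 2))


def first_order_expected_triplets(seq: str) -> dict:
    """Expected triplet counts under a first-order Markov chain:
    N(xyz) is approximated by N(xy) * N(yz) / N(y). No finite-length
    correction is applied."""
    seq = seq.upper()
    mono = Counter(seq)
    doublets = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    expected = {}
    for left, n_left in doublets.items():
        for right, n_right in doublets.items():
            if left[1] == right[0]:  # doublets must share the middle base
                expected[left + right[1]] = n_left * n_right / mono[left[1]]
    return expected
```

Large, systematic gaps between observed and expected counts would suggest that doublet frequencies alone do not determine triplet frequencies, that is, that the process is of order higher than one; judging whether a gap is significant is where the finite-length treatment matters.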

15.
Sánchez J. Bioinformation 2011, 6(9):327-329
All coding DNAs exhibit 3-base periodicity (TBP), which may be defined as the tendency of nucleotides and higher order n-tuples, e.g. trinucleotides (triplets), to be preferentially spaced by 3, 6, 9, etc. bases, and we have proposed an association between TBP and clustering of same-phase triplets. Here we investigated whether TBP was affected by intercodon dinucleotide tendencies and whether clustering of same-phase triplets was involved. Under a constant protein sequence, intercodon dinucleotide frequencies depend on the distribution of synonymous codons. So, possible effects were revealed by randomly exchanging synonymous codons without altering protein sequences to subsequently document changes in TBP via frequency distribution of distances (FDD) of DNA triplets. A tripartite positive correlation was found between intercodon dinucleotide frequencies, clustering of same-phase triplets and TBP. So, intercodon C|A (where "|" indicates the boundary between codons) was more frequent in native human DNA than in the codon-shuffled sequences; higher C|A frequency occurred along with more frequent clustering of C|AN triplets (where N jointly represents A, C, G and T) and with intense CAN TBP. The opposite was found for C|G, which was less frequent in native than in shuffled sequences; lower C|G frequency occurred together with reduced clustering of C|GN triplets and with less intense CGN TBP. We hence propose that intercodon dinucleotides affect TBP via same-phase triplet clustering. A possible biological relevance of our findings is briefly discussed.
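The codon-shuffling control can be sketched as follows. This is one possible reading of "randomly exchanging synonymous codons": codons are permuted among positions that encode the same amino acid, which preserves both the protein sequence and overall codon usage. The function names are hypothetical and the standard genetic code is assumed.

```python
import random
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def split_codons(cds: str) -> list:
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def translate(cds: str) -> str:
    return "".join(CODON_TABLE[codon] for codon in split_codons(cds))


def shuffle_synonymous(cds: str, rng: random.Random) -> str:
    """Permute codons among positions that encode the same amino acid."""
    codons = split_codons(cds)
    positions = defaultdict(list)
    for idx, codon in enumerate(codons):
        positions[CODON_TABLE[codon]].append(idx)
    shuffled = codons[:]
    for idxs in positions.values():
        pool = [codons[i] for i in idxs]
        rng.shuffle(pool)
        for i, codon in zip(idxs, pool):
            shuffled[i] = codon
    return "".join(shuffled)


def intercodon_dinucleotides(cds: str) -> Counter:
    """Counts of 'X|Y': last base of a codon and first base of the next."""
    codons = split_codons(cds)
    return Counter(a[2] + "|" + b[0] for a, b in zip(codons, codons[1:]))
```

Comparing `intercodon_dinucleotides` for a native sequence against many shuffled versions (for example with `random.Random(seed)` for different seeds) gives the kind of contrast reported for C|A and C|G.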

16.
17.
18.
19.
In this paper, we first present a new concept of ‘weight’ for the 64 triplets and define a different weight for each kind of triplet. Then, we give a novel 2D graphical representation for DNA sequences, which can transform a DNA sequence into a plot set to facilitate quantitative comparisons of DNA sequences. Thereafter, using a newly designed measure of similarity, we introduce a novel approach to analyzing similarities/dissimilarities of DNA sequences. Finally, applications to the similarity/dissimilarity analysis of the complete coding sequences of β-globin genes of 11 species illustrate the utility of our newly proposed method.

20.
Carels N, Bernardi G. FEBS letters 2000, 472(2-3):302-306
The base composition patterns of genes, coding sequences and gene expression levels were analyzed in the available long sequences (contigs) of Arabidopsis. Chromosome 5 was analyzed in detail and all chromosomes for which sequence data are now available show essentially the same large-scale compositional properties. Guanine+cytosine levels of genes and of their coding regions, as well as gene densities and expression levels, all show a marked tendency to be higher in the distal regions of Arabidopsis chromosomes.
