Similar articles
 20 similar articles found (search time: 15 ms)
1.
For almost 50 years, a conclusive explanation of Chargaff’s second parity rule (CSPR), the equality of the frequencies of nucleotides A=T and C=G or the equality of direct and reverse complement trinucleotides in the same DNA strand, has not been found. Here, we relate CSPR to the interstrand mirror symmetry in 20 symbolic quadruplets of trinucleotides (direct, reverse complement, complement, and reverse) mapped to the double-stranded genome. The symmetries of the Q-box corresponding to quadruplets can be obtained as a consequence of Watson–Crick base pairing and CSPR together. Alternatively, assuming a natural symmetry law for DNA creation, that each trinucleotide in one strand of DNA must simultaneously appear in the opposite strand, automatically leads to Q-box direct-reverse mirror symmetry, which in conjunction with Watson–Crick base pairing generates CSPR. We demonstrate the quadruplets’ symmetries in chromosomes of a wide range of organisms, from Escherichia coli to Neanderthal and human genomes, introducing novel quadruplet-frequency histograms and 3D diagrams with combined interstrand frequencies. These “landscapes” are mutually similar in all mammals, including extinct Neanderthals, and somewhat different in most older species. In human chromosomes 1–12, X, and Y the “landscapes” are almost identical, and they are slightly different in the remaining smaller and telocentric chromosomes. Quadruplet frequencies could provide a new robust tool for characterization and classification of genomes and their evolutionary trajectories.
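The quadruplet grouping described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`reverse_complement`, `quadruplet`, `trinucleotide_counts`) and the toy sequence in the usage note are assumptions made for illustration only. The sketch builds the four members of a quadruplet for one trinucleotide and counts overlapping trinucleotides on a single strand, so that direct and reverse-complement counts can be compared in the way CSPR suggests.

```python
from collections import Counter

# Watson-Crick complement table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID = set("ACGT")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]


def quadruplet(tri: str) -> tuple:
    """Return (direct, reverse complement, complement, reverse) of a trinucleotide."""
    return (tri, reverse_complement(tri), tri.translate(COMPLEMENT), tri[::-1])


def trinucleotide_counts(seq: str) -> Counter:
    """Count overlapping trinucleotides on one strand, skipping ambiguous bases."""
    seq = seq.upper()
    windows = (seq[i:i + 3] for i in range(len(seq) - 2))
    return Counter(w for w in windows if set(w) <= VALID)
```

For the hypothetical sequence `AACGTT`, `trinucleotide_counts` finds `AAC`, `ACG`, `CGT` and `GTT` once each, so each trinucleotide and its reverse complement have equal counts. Real genomes would only approximate this equality.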

2.
Distribution of repetitious sequences in chick nuclear DNA   Total citations: 7 (self-citations: 3; citations by others: 4)
By an improved method of hydroxylapatite chromatography, the reassociated sequences of chick nuclear DNA were isolated, and their base composition analysed. As the amount of reassociation increased, the G + C content of the renatured sequences decreased progressively to reach a mean value corresponding to that of the total DNA. In order to study the distribution of the families, or groups of families, having different amounts of reassociation, DNA was fractionated by CsCl density gradient centrifugation. Fractions having different G + C content were obtained, and their reassociation rates analysed. At a high C(o)t value of renaturation (C(o)t=50) the amount of reassociated sequences included in the high or in the low buoyant density DNA fractions was approximately the same, but their G + C content was, as expected, different. At lower C(o)t values of renaturation (between a C(o)t of 0.2 and a C(o)t of 10), the results indicated a heterogeneity of the repeated sequences in the A + T rich DNA fractions, as compared to the G + C rich ones.

3.
It has been hypothesized that a large fraction of the 24% noncoding DNA in R. prowazekii consists of degraded genes. This hypothesis has been based on the relatively high G+C content of noncoding DNA. However, a comparison with other genomes also having a low overall G+C content shows that this argument would also apply to other bacteria. To test this hypothesis, we study the coding potential in sets of genes, pseudogenes, and intergenic regions. We find that the correlation function and the χ²-measure are clearly indicative of the coding function of genes and pseudogenes. However, both coding potentials give almost no indication of a preexisting reading frame in the remaining 23% of noncoding DNA. We simulate the degradation of genes due to single-nucleotide substitutions and insertions/deletions and quantify the number of mutations required to remove indications of the reading frame. We discuss a reduced selection pressure as another possible origin of this comparatively large fraction of noncoding sequences. Received: 27 December 1999 / Accepted: 5 July 2000

4.
In recent years, the amount of molecular sequencing data from Tetrahymena thermophila has dramatically increased. We analyzed G + C content, codon usage, initiator codon context and stop codon sites in the extremely A + T rich genome of this ciliate. Average G + C content was 38% for protein coding regions, 21% for 5' non-coding sequences, 19% for 3' non-coding sequences, 15% for introns, 19% for micronuclear limited sequences and 17% for macronuclear retained sequences flanking micronuclear specific regions. The 75 available T. thermophila protein coding sequences favored codons ending in T and, where possible, avoided those with G in the third position. Highly expressed genes were relatively G + C-rich and exhibited an extremely biased pattern of codon usage while developmentally regulated genes were more A + T-rich and showed less codon usage bias. Regions immediately preceding Tetrahymena translation initiator codons were generally A-rich. For the 60 stop codons examined, the frequency of G in the end + 1 site was much higher than expected whereas C never occupied this position.

5.
A frequently used approach for detecting potential coding regions is to search for stop codons. In the standard genetic code 3 out of 64 trinucleotides are stop codons. Hence, in random or non-coding DNA one can expect every 21st trinucleotide to have the same sequence as a stop codon. In contrast, the open reading frames (ORFs) of most protein-coding genes are considerably longer. Thus, the stop codon frequency in coding sequences deviates from the background frequency of the corresponding trinucleotides. This has been utilized for gene prediction, in particular, in detecting protein-coding ORFs. Traditional methods based on stop codon frequency rely on the assumption that the GC content is about 50%. However, many genomes show significant deviations from that value. With the presented method we can describe the effects of GC content on the selection of appropriate length thresholds of potentially coding ORFs. Conversely, for a given length threshold, we can calculate the probability of observing it in a random sequence. Thus, we can derive the maximum GC content for which ORF length is practicable as a feature for gene prediction methods and the resulting false positive rates. A rough estimate for an upper limit is a GC content of 80%. This estimate can be made more precise by including further parameters and by taking into account start codons as well. We demonstrate the feasibility of this method by applying it to the genomes of the bacteria Rickettsia prowazekii, Escherichia coli and Caulobacter crescentus, exemplifying the effect of GC content variations according to our predictions. We have adapted the method for predicting coding ORFs by stop codon frequency to the case of GC contents different from 50%. Usually, several methods for gene finding need to be combined. Thus, our results concern a specific part within a package of methods. Interestingly, for genomes with low GC content such as that of R. prowazekii, the presented method provides remarkably good results even when applied alone.
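The dependence of stop codon frequency on GC content can be made concrete with a short Python sketch. It rests on a simplifying assumption that is not taken from the abstract: bases are drawn independently, with G and C each at frequency gc/2 and A and T each at (1 - gc)/2. The function names are hypothetical, and the authors' actual method may include further parameters such as start codons.

```python
import math


def stop_codon_probability(gc: float) -> float:
    """Probability that a random trinucleotide is TAA, TAG or TGA.

    Assumes independent bases with P(G) = P(C) = gc / 2 and
    P(A) = P(T) = (1 - gc) / 2 (an illustrative model, not the paper's).
    """
    if not 0.0 <= gc < 1.0:
        raise ValueError("gc must satisfy 0 <= gc < 1")
    at_base = (1.0 - gc) / 2.0   # probability of A, and of T
    gc_base = gc / 2.0           # probability of G, and of C
    p_taa = at_base ** 3
    p_tag = at_base * at_base * gc_base
    p_tga = at_base * gc_base * at_base
    return p_taa + p_tag + p_tga


def expected_codons_between_stops(gc: float) -> float:
    """Mean number of codons until a stop appears in a random reading frame."""
    return 1.0 / stop_codon_probability(gc)


def min_orf_codons(gc: float, false_positive_rate: float) -> int:
    """Smallest n for which a stop-free run of n random codons has
    probability at most false_positive_rate."""
    if not 0.0 < false_positive_rate < 1.0:
        raise ValueError("false_positive_rate must lie strictly between 0 and 1")
    p = stop_codon_probability(gc)
    return math.ceil(math.log(false_positive_rate) / math.log(1.0 - p))
```

With a GC content of 0.5 the sketch reproduces the background probability of 3/64 quoted in the abstract, about one stop per 21 codons. At 0.8 the per-codon stop probability under this model falls to 0.009, so random stop-free runs become much longer and ORF length alone discriminates less well.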

6.
Thalassiosira weissflogii (Grun.) Fryxell et Hasle is one of the more commonly studied centric diatoms, and yet molecular studies of this organism are still in their infancy. The ability to identify open reading frames and thus distinguish between introns and exons, coding and noncoding sequence is essential to move from nuclear DNA sequences to predicted amino acid sequences. To facilitate the identification of open reading frames in T. weissflogii, two newly identified nuclear genes encoding β-tubulin and t-complex polypeptide (TCP)-γ, along with six previously published nuclear DNA sequences, were examined for general structural features. The coding region of the nuclear open reading frames had a G + C content of about 49% and could readily be distinguished from noncoding sequence due to a significant difference in G + C content. The introns were uniformly small, about 100 base pairs in size. Furthermore, the 5' and 3' splice sites of introns displayed the canonical GT/AG sequence, further facilitating recognition of noncoding regions. Six of the nuclear open reading frames displayed relatively little bias in the use of synonymous codons, as exemplified by the cDNAs encoding β-tubulin and TCP-γ. Two open reading frames displayed strong bias in the use of particular codons (although the codons used were different), as exemplified by the cDNA encoding fucoxanthin chlorophyll a/c binding protein. Knowledge of codon bias should facilitate, for example, design of degenerate PCR primers and potential heterologous reporter gene constructs.

7.
A Dictyostelium discoideum DNA fragment, isolated on the basis of its ability to complement the ura1 mutation of yeast, codes for a dihydroorotate dehydrogenase activity. The complete nucleotide sequence of this 1898 bp fragment has been determined and reveals an open reading frame capable of coding for a 369 amino acid polypeptide of molecular mass 47,000. The gene shows preferential use of codons with weak pairing forces. Eleven codons, mainly those with a G in the third position, are absent. The flanking sequences are unusually rich in A + T (80%). Several direct and inverted repeats exist in the 5' flanking sequence.

8.
9.
10.
Summary: The G+C content of DNA varies widely in different organisms, especially microorganisms. This variation is accompanied by changes in the nucleotide composition of silent positions in codons. (Silent positions are defined and explained in the text.) These changes are mostly neutral or near neutral, and appear to result from mutation pressure in the direction of increasing either A+T (AT pressure) or G+C (GC pressure) content. Variations in G+C content are also accompanied by substitutions at replacement positions in codons. These substitutions produce changes in the amino acid content of homologous proteins. The examples studied were genes for 13 mitochondrial proteins in five species, and A and B genes for bacterial tryptophan synthase in four species. In microorganisms, varying AT and GC mutational pressures, presumably resulting from shifts in the DNA polymerase system, exert strong effects on molecular evolution by changing the G+C content of DNA. These effects may be greater than those of random drift. The effects of GC pressure on silent substitutions in the systems examined are several times as great as the effects on replacement substitutions. GC pressure is exerted on noncoding as well as coding regions in mitochondrial DNA. This is shown by the close correlation (correlation coefficient, 0.99) of the G+C content of the noncoding D loop of mitochondria with the G+C content of silent positions in the corresponding mitochondrial genes.

11.
The mean (G + C) composition (51.0%) and standard deviation (+/- 3.8%) of published DNA sequences accounting for 10% of the E. coli genome is in excellent agreement with the principal overall distribution determined by high resolution melting. While differences in base and neighbor characteristics are small and uniform throughout all regions of the genome, it is found that the (G + C) content of sequences varies in segmented fashion within boundaries corresponding to coding (53% G + C) and noncoding (46% G + C) regions; with variances in the latter being six-fold greater than in coding regions. The variance in different regions shows a strong negative dependence on (G + C) content of the region, reflecting the condition that A-T and G-C base pairs are preferred neighbors of A-T and C-G pairs, respectively; with the bias increasing with decreasing (G + C) content. Neighbor analysis indicates the most extreme positive biases occur in AA, TT, GC and CG throughout all regions, but particularly in noncoding regions. Extraordinary numbers of oligomeric strings of (A)n, etc., are the further consequence of this bias. These and other characteristics point to the existence of inherent biases in neighbor frequencies levied during replication or repair, and which reflect, in turn, neighbor influences during mutation. The bias in codon usage noted by Grantham and others is seen here as due, in part, to the adaptation of coding sequences to this microenvironment through selection among synonymous codons so as to preserve inherent neighbor biases.

12.
Primary structure of the herpesvirus saimiri genome.   Total citations: 55 (self-citations: 41; citations by others: 14)
This report describes the complete nucleotide sequence of the genome of herpesvirus saimiri, the prototype of gammaherpesvirus subgroup 2 (rhadinoviruses). The unique low-G + C-content DNA region has 112,930 bp with an average base composition of 34.5% G + C and is flanked by about 35 noncoding high-G + C-content DNA repeats of 1,444 bp (70.8% G + C) in tandem orientation. We identified 76 major open reading frames and a set of seven U-RNA genes for a total of 83 potential genes. The genes are closely arranged, with only a few regions of sizable noncoding sequences. For 60 of the predicted proteins, homologous sequences are found in other herpesviruses. Genes conserved between herpesvirus saimiri and Epstein-Barr virus (gammaherpesvirus subgroup 1) show that their genomes are generally collinear, although conserved gene blocks are separated by unique genes that appear to determine the particular phenotype of these viruses. Several deduced protein sequences of herpesvirus saimiri without counterparts in most of the other sequenced herpesviruses exhibited significant homology with cellular proteins of known function. These include thymidylate synthase, dihydrofolate reductase, complement control proteins, the cell surface antigen CD59, cyclins, and G protein-coupled receptors. Searching for functional protein motifs revealed that the virus may encode a cytosine-specific methylase and a tyrosine-specific protein kinase. Several herpesvirus saimiri genes are potential candidates to cooperate with the gene for saimiri transformation-associated protein of subgroup A (STP-A) in T-lymphocyte growth stimulation.

13.
A portion of human satellite I DNA is digested by HinfI into three fragments of 775, 875 and 820 bp in length which form a tandemly repeated unit 2.47 kb in length, specific to male DNA. One Alu family member per repeat is found within the relatively G+C rich 775 bp fragment. The 875 and 820 bp fragments are highly A+T rich and consist of long stretches of poly dAdT and related sequences.

14.
15.
Li Z. Bio Systems, 1999, 52(1-3): 55-61
Any DNA strand can be identified with a word in the language X* where X = {A, C, G, T}. By encoding A as 000, C as 010, G as 101, and T as 111, we treat the DNA operations concatenation, union, reverse, complement, annealing and melting, from the algebraic point of view. The concatenation and union play the roles of multiplication and addition over some algebraic structures, respectively. Then the rest of the operations turn out to be the homomorphisms or anti-homomorphisms of these algebraic structures. Using this technique, we find the relationship among these DNA operations.
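The 3-bit encoding quoted in this abstract has two properties that a small Python sketch can show. Complementary bases have bitwise-inverted codes (000 and 111, 010 and 101), and every code reads the same forwards and backwards, so reversing the whole bit string reverses the strand. The helper names below are hypothetical, and the sketch only illustrates the encoding itself; it does not reproduce the paper's algebraic structures for union, annealing or melting.

```python
# Encoding taken from the abstract: A=000, C=010, G=101, T=111.
CODE = {"A": "000", "C": "010", "G": "101", "T": "111"}
DECODE = {bits: base for base, bits in CODE.items()}


def encode(strand: str) -> str:
    """Map a DNA word to its bit string, three bits per base."""
    return "".join(CODE[base] for base in strand)


def decode(bits: str) -> str:
    """Inverse of encode; expects a length that is a multiple of three."""
    return "".join(DECODE[bits[i:i + 3]] for i in range(0, len(bits), 3))


def complement_bits(bits: str) -> str:
    """Flip every bit, which swaps A with T and C with G."""
    return "".join("1" if b == "0" else "0" for b in bits)


def reverse_bits(bits: str) -> str:
    """Reverse the bit string; each 3-bit code is a palindrome, so this
    reverses the order of the bases without changing them."""
    return bits[::-1]
```

In this representation complement distributes over concatenation in order (a homomorphism), whereas reverse swaps the order of the two factors (an anti-homomorphism), which matches the distinction the abstract draws.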

16.
CpG deficiency, dinucleotide distributions and nucleosome positioning   Total citations: 2 (self-citations: 0; citations by others: 2)
The dinucleotide CpG is deficient in (A + T)-rich regions of vertebrate DNA in both coding and non-coding sequences and there is a corresponding increase above expectation in the occurrence of TpG and CpA. By contrast in (G + C)-rich regions no deficiency of CpG is found. Such (G + C)-rich sequences, containing the expected number of CpG dinucleotides, alternate along the genome with (A + T)-rich sequences which have a lower than expected CpG content. The G + C content of vertebrate DNA can oscillate with a period of 150-200 bp and this may be a factor in positioning nucleosomes. The role of mutagenesis in loss of CpG and increase of A + T, particularly in non-coding regions, is discussed.
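A CpG "deficiency" of the kind described here is usually quantified as an observed-to-expected ratio. The Python sketch below uses one widely used definition, (number of CG dinucleotides × sequence length) / (number of C × number of G). That formula is an assumption brought in for illustration; the abstract does not say which measure the authors used, and the function name is hypothetical.

```python
def cpg_observed_expected(seq: str) -> float:
    """Observed/expected CpG ratio for one strand.

    Expected CpG count is taken as (count of C * count of G) / length,
    i.e. what independent base placement would give. Values well below 1
    indicate CpG depletion. Returns 0.0 when C or G is absent.
    """
    seq = seq.upper()
    n_c = seq.count("C")
    n_g = seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    # "CG" cannot overlap with itself, so str.count finds every occurrence.
    n_cpg = seq.count("CG")
    return n_cpg * len(seq) / (n_c * n_g)
```

Applied in sliding windows along a sequence, a ratio like this would separate the (G + C)-rich stretches with the expected CpG content from the (A + T)-rich stretches in which CpG is depleted.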

17.
Intergenic sequences represent 63% of the mitochondrial 'long' (85 kb) genome of Saccharomyces cerevisiae. They comprise 170-200 AT spacers that correspond to 47% of the genome and are separated from each other by GC clusters, ORFs, ori sequences, as well as by protein-coding genes. Intergenic AT spacers have an average size of 190 bp, and a GC level of 5%; they are formed by short (20-30 nt on the average) A/T stretches separated by C/G mono- to trinucleotides. An analysis of the primary structures of all intergenic AT spacers already sequenced (32 kb; 80% of the total) has shown that they are characterized by an extremely high level of short sequence repetitiveness and by a characteristic sequence pattern; the frequencies of A/T isostichs conspicuously deviate from statistical expectations, and exponentially decrease when their (AT + TA)/(AA + TT) ratio, R, decreases. A situation basically identical was found in the AT spacers of the mitochondrial genome (19 kb) of Torulopsis glabrata. The sequence features of the AT spacers indicate that they were built in evolution by an expansion process mainly involving rounds of duplication, inversion and translocation events which affected an initial oligodeoxynucleotide (endowed with a particular R ratio) and the sequences derived from it. In turn, the initial oligodeoxynucleotide appears to have arisen from an ancestral promoter-replicator sequence which was at the origin of the nonanucleotide promoters present in the mitochondrial genomes of several yeasts. Common sequence patterns indicate that the AT spacers so formed gave rise to the var1 gene (by linking and phasing of short ORFs), to the DNA stretches corresponding to the untranslated mRNA sequences and to the central stretches of ori sequences from S. cerevisiae.

18.
I show that the recognition sequences of Type II restriction systems are correlated with the G + C content of the host bacterial DNA. Almost all restriction systems with G + C rich tetranucleotide recognition sequences are found in species with A + T rich genomes, whereas G + C rich hexanucleotide and octanucleotide recognition sequences are found almost exclusively in species with G + C rich genomes. Most hexanucleotide recognition sequences found in species with A + T rich genomes are A + T rich. This distribution eliminates a substantial proportion of the potential variance in the frequency of restriction recognition sequences in the host genomes. As a consequence, almost all restriction recognition sequences, including those eight base pairs in length (Not I and Sfi I), are predicted to occur with a frequency ranging from once every 300 to once every 5,000 base pairs in the host genome. Since the G + C content of bacteriophage DNA and of the host genome are also correlated, the data presented is evidence that most Type II "restriction systems" are indeed involved in phage restriction.
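The link between genome G + C content and the expected frequency of a recognition sequence can be sketched in Python. The calculation assumes independent bases with G and C each at frequency gc/2 and A and T each at (1 - gc)/2, which is an illustrative simplification rather than the author's stated model. The function name is hypothetical; the Not I recognition sequence GCGGCCGC is used as the example.

```python
def expected_site_spacing(site: str, gc: float) -> float:
    """Expected number of base pairs between occurrences of `site`.

    Assumes independent bases with P(G) = P(C) = gc / 2 and
    P(A) = P(T) = (1 - gc) / 2. Only unambiguous A/C/G/T sites are supported.
    """
    if not 0.0 < gc < 1.0:
        raise ValueError("gc must lie strictly between 0 and 1")
    probability = 1.0
    for base in site.upper():
        if base in "GC":
            probability *= gc / 2.0
        elif base in "AT":
            probability *= (1.0 - gc) / 2.0
        else:
            raise ValueError(f"unsupported base: {base!r}")
    return 1.0 / probability
```

Under this model the eight-base Not I site is expected once per 65,536 bp at 50% G + C, but roughly once per 4,400 bp at 70% G + C, which is in line with the abstract's point that G + C rich octanucleotide sites fall into the common frequency range only in G + C rich hosts.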

19.
CpG islands in vertebrate genomes   Total citations: 120 (self-citations: 0; citations by others: 120)

20.
We investigated the occurrence of gene conversions between paralogous sequences of Salmoninae derived from ancestral tetraploidization and their effect on the evolutionary history of DNA sequences. A microsatellite with long flanking regions (750 bp) including both coding and noncoding sequences was analyzed. Microsatellite size polymorphism was used to detect the alleles of both paralogous counterparts and infer linkage arrangement between loci. DNA sequencing of seven Salmoninae species revealed that paralogous sequences were highly differentiated within species, especially for noncoding regions. Ten gene conversion events between paralogous sequences were inferred. While these events appear to have homogenized regions of otherwise highly differentiated paralogous sequences, they amplified the differentiation among orthologous sequences. Their effects were larger on coding than on noncoding regions. As a consequence, noncoding sequences grouped by orthologous lineages in phylogenetic trees, whereas coding regions grouped by taxa. Based upon these results, we present a model showing how gene conversion events may also result in the PCR amplification of nonorthologous sequences in different taxa, with obvious complications for phylogenetic inferences, comparative mapping, and population genetic studies. Received: 11 October 2000 / Accepted: 18 September 2001
