Similar Articles
20 similar articles found.
1.
CpG deficiency, dinucleotide distributions and nucleosome positioning   Total citations: 2 (self-citations: 0, by others: 2)
The dinucleotide CpG is deficient in (A + T)-rich regions of vertebrate DNA, in both coding and non-coding sequences, and there is a corresponding increase above expectation in the occurrence of TpG and CpA. By contrast, in (G + C)-rich regions no deficiency of CpG is found. Such (G + C)-rich sequences, containing the expected number of CpG dinucleotides, alternate along the genome with (A + T)-rich sequences that have a lower-than-expected CpG content. The G + C content of vertebrate DNA can oscillate with a period of 150-200 bp, and this may be a factor in positioning nucleosomes. The role of mutagenesis in the loss of CpG and the increase of A + T, particularly in non-coding regions, is discussed.
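The contrast drawn above between (A + T)-rich and (G + C)-rich regions can be made concrete with a sliding-window scan. The sketch below is a minimal Python illustration, not the authors' method: for each window it reports the G + C fraction and the CpG observed/expected ratio N(CG) × L / (N(C) × N(G)). The function names and the 200 bp window with a 100 bp step are illustrative assumptions.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G + C bases in seq (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: N(CG) * L / (N(C) * N(G))."""
    seq = seq.upper()
    n_c, n_g = seq.count("C"), seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the exact dinucleotide count.
    return seq.count("CG") * len(seq) / (n_c * n_g)


def window_profile(seq: str, window: int = 200, step: int = 100):
    """Return (start, G+C fraction, CpG obs/exp) for each sliding window."""
    return [
        (start, gc_fraction(seq[start:start + window]), cpg_obs_exp(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy input: an (A + T)-only block followed by a CpG-rich block.
for row in window_profile("AT" * 100 + "CG" * 100, window=200, step=200):
    print(row)
```

On this toy input the (A + T)-only window has a ratio of 0 and the alternating CG window a ratio of 2; real genomic windows fall between these extremes.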

2.
Simmen MW. Genomics. 2008;92(1):33-40
In mammalian genomes CpGs occur at one-fifth their expected frequency. This is accepted as resulting from cytosine methylation and deamination of 5-methylcytosine leading to TpG and CpA dinucleotides. The corollary that a CpG deficit should correlate with TpG excess has not hitherto been systematically tested at a genomic level. I analyzed genome sequences (human, chimpanzee, mouse, pufferfish, zebrafish, sea squirt, fruitfly, mosquito, and nematode) to test this and, more generally, to assess the hypothesis that CpG deficit, TpG excess, and other data are accountable in terms of 5-methylcytosine mutation. In all methylated genomes local CpG deficit decreases with higher G + C content. Local TpG surplus, while positively associated with G + C level in mammalian genomes but negatively associated with G + C in nonmammalian methylated genomes, is always explicable in terms of the CpG trend under the methylation model. Covariance of dinucleotide abundances with G + C demonstrates that correlation analyses should control for G + C. Doing this reveals a strong negative correlation between local CpG and TpG abundances in methylated genomes, in accord with the methylation hypothesis. CpG deficit also correlates with CpT excess in mammals, which may reflect enhanced cytosine mutation in the context 5'-YCG-3'. Analyses with repeat-masked sequences show that the results are not attributable to repetitive elements.
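Simmen's point that dinucleotide abundances covary with G + C, so that correlations between them should control for G + C, can be illustrated with a first-order partial correlation. This is one standard way to control for a third variable and is offered as an assumption, not as the paper's exact procedure; the per-window values below are hypothetical numbers in arbitrary units.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x: Sequence[float], y: Sequence[float], z: Sequence[float]) -> float:
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Hypothetical per-window values: both abundances track G + C, but their
# G + C-independent parts move in opposite directions.
gc = [1.0, 2.0, 3.0, 4.0]
cpg = [2.0, 1.0, 2.0, 5.0]
tpg = [0.0, 3.0, 4.0, 3.0]
print(round(pearson(cpg, tpg), 3))           # 0.111 (raw correlation)
print(round(partial_corr(cpg, tpg, gc), 3))  # -1.0 (controlling for G + C)
```

Here the raw correlation is weakly positive only because both series track G + C; removing the shared G + C component exposes the negative CpG/TpG relationship that the methylation model predicts.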

3.
Zhao Z, Zhang F. Gene. 2006;366(2):316-324
We analyzed n-mers (n = 3-8) in the local environment of 8,249,446 human SNPs and compared their distribution with that in the genome reference sequences. The results revealed that short sequences containing at least one CpG dinucleotide occurred more frequently in the local SNP sequences than in the genome sequences. To exclude the hypermutability effect of methylated CpG dinucleotides on the sequence context of SNPs, we examined the distribution patterns for each of the six categories of substitution. We observed a similar pattern (i.e., CpG-containing n-mers vs. non-CpG-containing n-mers) in SNP categories A/G, C/T and C/G, but the opposite pattern in category A/T. We next identified 34,928 putative CpG islands in the human genome and located 133,591 SNPs within these islands. In the CpG islands, CpG SNPs were 3.92-fold less prevalent relative to the presence of CpG dinucleotides. Conversely, in the human genome as a whole, the frequency of CpG dinucleotides at polymorphic sites was 6.09 times that in the genome reference sequences. These results support previous views of mutational suppression at CpG sites in CpG islands and of hypermutability of the methylated CpG dinucleotides that are prevalent in non-CpG-island sequences of the human genome. Our study represents a comprehensive investigation of the sequence context of SNPs in the human genome and in human CpG islands.
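The comparison of n-mer frequencies around SNPs with those in the reference genome can be sketched as follows. This is a simplified, hypothetical version: it pools all k-mers and reports the share that contain a CpG, whereas the study examined individual n-mers (n = 3-8) and each substitution category separately. The sequences in the usage line are toy data.

```python
from collections import Counter
from typing import Iterable


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers, skipping any that contain an ambiguous base (N)."""
    seq = seq.upper()
    return Counter(
        seq[i:i + k] for i in range(len(seq) - k + 1) if "N" not in seq[i:i + k]
    )


def cpg_kmer_fraction(counts: Counter) -> float:
    """Share of all k-mer occurrences that contain at least one CpG."""
    total = sum(counts.values())
    with_cpg = sum(c for kmer, c in counts.items() if "CG" in kmer)
    return with_cpg / total if total else 0.0


def flank_vs_genome_ratio(flanks: Iterable[str], genome: str, k: int = 3) -> float:
    """Ratio > 1 means CpG-containing k-mers are overrepresented around SNPs."""
    flank_counts = Counter()
    for flank in flanks:
        flank_counts.update(kmer_counts(flank, k))
    return cpg_kmer_fraction(flank_counts) / cpg_kmer_fraction(kmer_counts(genome, k))


print(flank_vs_genome_ratio(["ACGTA"], "ACGTTTTTTT", k=3))
```

The function divides by zero if the genome sample contains no CpG-containing k-mers, so a real analysis would need a guard or a larger background sample.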

4.
Why does the human factor IX gene have a G + C content of 40%?   Total citations: 20 (self-citations: 2, by others: 18)
The factor IX gene has a G + C content of approximately 40% in all mammalian species examined. In human factor IX, C→T and G→A transitions at the dinucleotide CpG are elevated at least 24-fold relative to other transitions. Can the G + C content be explained solely by this hot spot of mutation? Using our mathematical model, we show that the elevation of mutation at CpG cannot alone lower the G + C content below 45%. To search for other hot spots of mutation that might contribute to the reduction of G + C content, we assessed the relative rates of base substitution in our sample of 160 families with hemophilia B. Seventeen independent single-base substitutions are reported herein, for a total of 96 independent point mutations in our sample. The following conclusions emerge from the analysis of our data and, where appropriate, the data of others: (1) Transversions at CpG are elevated an estimated 7.7-fold relative to other transversions. (2) The mutation rates at non-CpG dinucleotides are remarkably uniform; none of the observed rates is more than twofold above the median for transitions or more than threefold above the median for transversions. (3) The pattern of recent mutation is compatible with the pattern during mammalian evolution that has maintained the G + C content of the factor IX gene at approximately 40%.
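The distinctions used in the factor IX analysis (transition vs. transversion, CpG vs. non-CpG) and the notion of a fold elevation in per-site rates can be expressed as a small helper. This is a bookkeeping sketch with hypothetical names; it does not reproduce the authors' mathematical model of G + C content, and the counts in the test are invented.

```python
TRANSITION_PAIRS = {frozenset("AG"), frozenset("CT")}


def classify_substitution(ref_seq: str, pos: int, alt: str) -> tuple[str, bool]:
    """Return ('transition' | 'transversion', at_cpg) for ref_seq[pos] -> alt.

    A site counts as CpG if it is the C of a CG (C->T on this strand) or the G
    of a CG (the G->A seen when the C on the other strand mutates).
    """
    ref, alt = ref_seq[pos].upper(), alt.upper()
    if ref == alt:
        raise ValueError("alt must differ from the reference base")
    kind = "transition" if frozenset((ref, alt)) in TRANSITION_PAIRS else "transversion"
    at_cpg = (ref == "C" and ref_seq[pos + 1:pos + 2].upper() == "G") or (
        ref == "G" and pos > 0 and ref_seq[pos - 1].upper() == "C"
    )
    return kind, at_cpg


def fold_elevation(events_a: int, sites_a: int, events_b: int, sites_b: int) -> float:
    """Per-site mutation rate in class a divided by the per-site rate in class b."""
    return (events_a / sites_a) / (events_b / sites_b)


print(classify_substitution("ACGT", 1, "T"))  # C->T at a CpG
print(classify_substitution("ACGT", 0, "C"))  # A->C outside a CpG
```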

5.
Parvoviruses are rapidly evolving viruses that infect a wide range of hosts, including vertebrates and invertebrates. Extensive methylation of the parvovirus genome has been recently demonstrated. A global pattern of methylation of CpG dinucleotides is seen in vertebrate genomes, compared to “fractional” methylation patterns in invertebrate genomes. It remains unknown if the loss of CpG dinucleotides occurs in all viruses of a given DNA virus family that infect host species spanning across vertebrates and invertebrates. We investigated the link between the extent of CpG dinucleotide depletion among autonomous parvoviruses and the evolutionary lineage of the infected host. We demonstrate major differences in the relative abundance of CpG dinucleotides among autonomous parvoviruses which share similar genome organization and common ancestry, depending on the infected host species. Parvoviruses infecting vertebrate hosts had significantly lower relative abundance of CpG dinucleotides than parvoviruses infecting invertebrate hosts. The strong correlation of CpG dinucleotide depletion with the gain in TpG/CpA dinucleotides and the loss of TpA dinucleotides among parvoviruses suggests a major role for CpG methylation in the evolution of parvoviruses. Our data present evidence that links the relative abundance of CpG dinucleotides in parvoviruses to the methylation capabilities of the infected host. In sum, our findings support a novel perspective of host-driven evolution among autonomous parvoviruses.
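Relative dinucleotide abundance in double-stranded genomes is often computed in Karlin's symmetrized form, ρ*XY = f*XY / (f*X × f*Y), with frequencies counted on the sequence and its reverse complement. Whether the parvovirus study used exactly this form is an assumption; the sketch and the toy genome are illustrative only, and the function names are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def symmetrized_rho(seq: str, dinuc: str) -> float:
    """Karlin-style rho*_XY = f*_XY / (f*_X * f*_Y), counted on both strands."""
    mono, di = Counter(), Counter()
    for strand in (seq.upper(), reverse_complement(seq)):
        mono.update(b for b in strand if b in "ACGT")
        # Count each strand separately so that no pair spans the two strands.
        di.update(strand[i:i + 2] for i in range(len(strand) - 1))
    n_mono, n_di = sum(mono.values()), sum(di.values())
    f_x, f_y = mono[dinuc[0]] / n_mono, mono[dinuc[1]] / n_mono
    return (di[dinuc] / n_di) / (f_x * f_y)


genome = "AACCGTTGCA"  # toy sequence, not a viral genome
for d in ("CG", "TG", "CA", "TA"):
    print(d, round(symmetrized_rho(genome, d), 3))
```

Because TpG on one strand is CpA on the other, the symmetrized values for TG and CA are always equal, which is why the two are usually reported together as TpG/CpA.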

6.
Asymmetrical distribution of CpG in an 'average' mammalian gene.   Total citations: 24 (self-citations: 7, by others: 17)
The frequency and distribution of the rare dinucleotide CpG was examined in 15 mammalian genes. CpG is highly methylated at cytosine in mammalian DNA (1,2), and 5-methylcytosine (5mC) is thought to undergo a transition mutation via deamination to produce thymine (3). This would result in the accumulation of TpG and CpA and the depletion of CpG during evolution (4). Consistent with this hypothesis, the gene sample of 26,541 dinucleotides contained CpG at 40% of the frequency expected from base composition, and the CpG transition products, TpG + CpA, were significantly elevated at 124% of the expected random frequency. However, because CpG occurs at only 25% of the expected random frequency in the genome, the sampled genes were considerably enriched in this dinucleotide. CpGs were asymmetrically distributed in sequences flanking the genes. 5'-flanking sequences were enriched in CpG at 135% of the frequency expected assuming a symmetrical distribution of all the CpGs in the sampled genes (p < 0.01), while 3'-flanking regions were depleted in CpG at 40% of expected values (p < 0.0001). This asymmetry may reflect the role of 5-methylcytosine in gene expression. In contrast, the frequencies of GpC and GpT + ApC did not differ significantly from those predicted by base composition, and these dinucleotides were not asymmetrically distributed.

7.
8.
Ornithine transcarbamylase (OTC) deficiency, the most common inborn error of the urea cycle, shows X-linked inheritance with frequent new mutations. Investigations of patients with OTC deficiency have indicated a disproportionate share of mutations at CpG dinucleotides. These statistics may, however, be biased because CpG mutations are easily detected by screening for TaqI and MspI restriction sites. In the present study, we investigated 30 patients diagnosed with OTC deficiency for new sites with an increased probability of mutation, by complete DNA sequence analysis of all ten exons of the OTC gene. In six patients, two codons in exons 2 and 5, respectively, contained novel recurrent mutations, all of them affecting CpG dinucleotides. They included C to T and G to A transitions in codon 40, changing an arginine to cysteine and histidine, respectively, and a C to T transition in codon 178 causing the substitution of threonine by methionine. The first two mutations were characterized by a mild clinical course with a high risk of sudden death in late childhood or early adulthood, whereas the third mutation showed a more severe phenotypic expression. In addition to these novel mutations, we identified four patients with the known R277W mutation, making it the most common point mutation of the OTC gene.
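The screening bias mentioned above arises because both enzyme recognition sequences contain a CpG: TaqI recognizes TCGA and MspI recognizes CCGG, so a C→T or G→A transition at that CpG abolishes the site. The sketch below, with hypothetical helper names and a toy sequence, shows the effect.

```python
import re

# Recognition sequences: TaqI cuts T^CGA, MspI cuts C^CGG; both contain a CpG.
SITES = {"TaqI": "TCGA", "MspI": "CCGG"}


def find_sites(seq: str) -> dict[str, list[int]]:
    """0-based start positions of each recognition site (overlaps allowed)."""
    seq = seq.upper()
    return {
        enzyme: [m.start() for m in re.finditer(f"(?={motif})", seq)]
        for enzyme, motif in SITES.items()
    }


def substitute(seq: str, pos: int, alt: str) -> str:
    """Return seq with a single-base substitution at pos."""
    return seq[:pos] + alt + seq[pos + 1:]


ref = "AATCGAACCGGT"
print(find_sites(ref))                     # both sites present
print(find_sites(substitute(ref, 3, "T")))  # C->T in TCGA destroys the TaqI site
print(find_sites(substitute(ref, 9, "A")))  # G->A in CCGG destroys the MspI site
```

Mutations elsewhere in the gene leave these sites intact, so a screen based only on loss of TaqI or MspI sites sees CpG transitions preferentially.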

9.
CpG islands in vertebrate genomes   Total citations: 120 (self-citations: 0, by others: 120)

10.
The CpG island in the 5' region of the G6PD gene of man and mouse   Total citations: 1 (self-citations: 0, by others: 1)
Toniolo D, Filippi M, Dono R, Lettieri T, Martini G. Gene. 1991;102(2):197-203
The nucleotide (nt) sequence of the entire CpG island in the 5' region of the human glucose-6-phosphate dehydrogenase-encoding gene (G6PD) and of the corresponding region in mouse was determined. In comparison to the human gene, the 5' region of the mouse G6PD gene has highly reduced G + C and CpG dinucleotide content, but maintains the functional features of a CpG island, as it is differentially methylated on the active vs. the inactive X chromosome. In addition to the expected conservation of exons, nt sequence comparison showed that several boxes are highly conserved between the two species in the 5' flanking DNA and in the first intron. Moreover, the conservation of the position of most CpG dinucleotides in the promoter region and in one of the upstream boxes, at about -900, gives support to the hypothesis that, in each island, specific CpGs play a major role in the regulation of gene expression.
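For readers unfamiliar with how CpG islands are delimited computationally, the sketch below applies the widely used window criteria (length ≥ 200 bp, G + C > 50%, CpG observed/expected > 0.6, commonly attributed to Gardiner-Garden and Frommer) and merges overlapping qualifying windows. It is a simplified illustration, not the procedure used for G6PD, where the island was characterized by sequencing and methylation analysis; the function name and toy input are hypothetical.

```python
def cpg_islands(seq: str, window: int = 200, min_gc: float = 0.5, min_oe: float = 0.6):
    """Merge all windows with G+C > min_gc and CpG obs/exp > min_oe into (start, end) intervals."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - window + 1):  # slide one base at a time
        w = seq[start:start + window]
        n_c, n_g = w.count("C"), w.count("G")
        gc = (n_c + n_g) / window
        oe = w.count("CG") * window / (n_c * n_g) if n_c and n_g else 0.0
        if gc > min_gc and oe > min_oe:
            hits.append((start, start + window))
    # Merge overlapping or touching qualifying windows into islands.
    islands = []
    for start, end in hits:
        if islands and start <= islands[-1][1]:
            islands[-1] = (islands[-1][0], max(islands[-1][1], end))
        else:
            islands.append((start, end))
    return islands


toy = "A" * 300 + "CG" * 150 + "T" * 300  # CpG-rich block at positions 300-599
print(cpg_islands(toy))
```

Because whole windows are merged, the reported interval overhangs the CpG-rich block by nearly half a window on each side; practical tools trim or refine these edges.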

11.
Increased G + C content of DNA stabilizes methyl CpG dinucleotides.   Total citations: 3 (self-citations: 1, by others: 2)
The vertebrate genome is a mosaic of regions differing dramatically in their G + C content. Those regions with a high G + C content contain the expected number of CpG dinucleotides and we propose that following methylation these have been protected from deamination by the increased stability of the surrounding DNA duplex. This argument applies both to the microenvironment of the CpG dinucleotide and to whole gene regions.

12.
Summary: The extent to which CpG dinucleotides were depleted in a large set of angiosperm genes was, on average, very similar to the extent of CpG depletion in total angiosperm genomic DNA and far less than the extent of CpG depletion in vertebrate genes. Gene sequences from Arabidopsis thaliana, a dicotyledonous species with relatively low levels of total 5-methylcytosine, were just as CpG-depleted as angiosperm genes in general. Furthermore, levels of TpG and CpA, the potential deamination mutation products of methylated CpG, were elevated in A. thaliana genes, supporting a high rate of deamination mutation as the cause of the CpG deficiency. Using a method that takes into account the dinucleotide frequencies within each sequence of interest, we calculated the expected frequencies of CpNpG trinucleotides, which are also highly methylated in angiosperm genomes. CpNpG trinucleotides were not extensively enriched or depleted in the angiosperm genes. Two hypotheses could account for our results: differential depletion of CpG and CpNpG within angiosperm genes, and differential depletion of CpG in angiosperm and vertebrate genes, could arise either from different efficiencies of mismatch repair or from different levels of cytosine methylation in the cell lineages that contribute to germ cells. Offprint requests to: M. Gardiner-Garden.
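The expected CpNpG frequencies referred to above can be computed from dinucleotide counts with a first-order Markov approximation, E[CNG] = N(CN) × N(NG) / N(N). That this matches the authors' exact method is an assumption; the function name is hypothetical and the input is a toy sequence.

```python
from collections import Counter


def cnpg_observed_expected(seq: str) -> dict[str, tuple[int, float]]:
    """For each CNG, return (observed count, expected count from dinucleotides).

    Expected count uses a first-order Markov approximation:
        E[CNG] = N(CN) * N(NG) / N(N)
    """
    seq = seq.upper()
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    tri = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    result = {}
    for n in "ACGT":
        if mono[n] == 0:
            continue  # expectation undefined when the middle base never occurs
        word = "C" + n + "G"
        result[word] = (tri[word], di["C" + n] * di[n + "G"] / mono[n])
    return result


print(cnpg_observed_expected("CAGCAG"))
```

Comparing observed with expected counts for each of CAG, CCG, CGG and CTG gives the enrichment or depletion of each CpNpG trinucleotide after allowing for the sequence's own dinucleotide biases.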

13.
Hyde JE, Sims PF. Gene. 1987;61(2):177-187
We have statistically analysed the distribution of nucleotides and dinucleotides in 21 genes of the 81% A + T-rich human malaria parasite Plasmodium falciparum. The mRNA-synonymous strands of this protozoan show in general a marked excess of purines over pyrimidines, correlated with abnormally high levels of Lys and Glu. We have used the large differences in base composition between coding and non-coding regions to estimate that the parasite possesses in the range of 2700-5400 genes. The dinucleotide preference patterns are compared with consensus patterns derived from other organisms [Nussinov, Nucl. Acids Res. 12 (1984) 1749-1763]. Patterns in the coding regions surprisingly resemble those of higher, rather than lower eukaryotes, particularly with respect to TG elevation and CG suppression. The latter is correlated with an abnormally low level of Arg in these parasites. In the non-coding regions, the four dinucleotides made up of C and/or G are found with significantly higher frequencies than expected (approx. 50-150%), specifically to the 5' side of the coding regions. The possible role of these dinucleotides in control sequences is discussed.

14.
Zhao Z, Zhang F. Genomics. 2006;87(1):68-74
A genome-wide view of sequence mutability in mice is still limited, although biologists usually assume the same scenario for mice as for humans. In this study, we examined the sequence context in the local environment of 482,528 mouse single nucleotide polymorphisms (SNPs). We found that CpG-containing short sequences, in general, were more highly represented in the local sequences of SNPs than in the genome sequences. The extent of this overrepresentation was stronger in mice than in humans, which is inconsistent with previous observations of weaker neighboring-nucleotide biases at mouse SNPs. To exclude the CpG effect, we compared the distribution patterns of short sequences among the six categories of SNPs. The results revealed an even stronger pattern in the CpG-containing group for C/G substitutions than for A/G or C/T substitutions. We next performed the first genome-wide sequence context analysis of SNPs in mouse CpG islands. SNPs occurring at CpG sites were 3.14-fold less prevalent than expected, suggesting the suppression of methylation-dependent deamination in the CpG islands. The extent of this suppression was smaller in mice than in humans. Finally, compared with humans, the observations of a greater deficit of CpG dinucleotides, a stronger overrepresentation of CpG-containing n-mers surrounding the polymorphic sites, and a higher SNP/genome ratio of CpG dinucleotides in the mouse genome support the "loss of CpG islands" model in the mouse lineage.
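The SNP/genome comparison for CpG sites used above can be sketched as the share of SNPs falling on CpG positions divided by the share of all positions that are CpG. Counting both the C and the G of each CpG as a CpG site is an assumption, as are the helper names; the inputs are toy values. Under this reading, "3.14-fold less prevalent than expected" corresponds to a ratio of about 1/3.14 ≈ 0.32.

```python
def cpg_positions(seq: str) -> set[int]:
    """0-based positions of both the C and the G of every CpG."""
    seq = seq.upper()
    positions = set()
    for i in range(len(seq) - 1):
        if seq[i:i + 2] == "CG":
            positions.update((i, i + 1))
    return positions


def snp_cpg_ratio(seq: str, snp_positions: list[int]) -> float:
    """Share of SNPs at CpG sites divided by the share of positions that are CpG."""
    cpg = cpg_positions(seq)
    observed = sum(p in cpg for p in snp_positions) / len(snp_positions)
    expected = len(cpg) / len(seq)
    return observed / expected


ratio = snp_cpg_ratio("AACGTTAACGTT", [2, 5])
print(ratio)  # > 1 means SNPs fall on CpGs more often than base composition predicts
```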

15.

Background

Papillomaviruses and polyomaviruses are small ds-DNA viruses infecting a wide range of vertebrate hosts. Evidence supporting co-evolution of the virus with the host does not fully explain the evolutionary path of papillomaviruses and polyomaviruses. Studies analyzing CpG dinucleotide frequencies in virus genomes have provided interesting insights into virus evolution. CpG dinucleotide depletion has not been extensively studied among papillomaviruses and polyomaviruses. We sought to analyze the relative abundance of dinucleotides and the relative roles of evolutionary pressures in papillomaviruses and polyomaviruses.

Methods

We studied 127 full-length sequences from papillomaviruses and 56 full-length sequences from polyomaviruses. We analyzed the relative abundance of dinucleotides, the effective number of codons (ENC), and differences in synonymous codon usage. We examined the association, if any, between the extent of CpG dinucleotide depletion and the evolutionary lineage of the infected host. We also investigated the contribution of mutational pressure and translational selection to the evolution of papillomaviruses and polyomaviruses.

Results

All papillomaviruses and polyomaviruses are CpG depleted. Interestingly, the evolutionary lineage of the infected host determines the extent of CpG depletion among papillomaviruses and polyomaviruses. CpG dinucleotide depletion was more pronounced among papillomaviruses and polyomaviruses infecting human and other mammals as compared to those infecting birds. Our findings demonstrate that CpG depletion among papillomaviruses is linked to mutational pressure; while CpG depletion among polyomaviruses is linked to translational selection. We also present evidence that suggests methylation of CpG dinucleotides may explain, at least in part, the depletion of CpG dinucleotides among papillomaviruses but not polyomaviruses.

Conclusions

The extent of CpG depletion among papillomaviruses and polyomaviruses is linked to the evolutionary lineage of the infected host. Our results highlight the existence of divergent evolutionary pressures leading to CpG dinucleotide depletion among small ds-DNA viruses infecting vertebrate hosts.
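The Methods of entry 15 mention the effective number of codons (ENC). A minimal sketch of Wright's commonly cited formulation follows: codon homozygosity F = (nΣp² - 1)/(n - 1) is computed per amino acid, averaged within each degeneracy class of the standard genetic code, and combined as Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6. Published implementations differ in how they handle absent amino acids and small counts, so this should not be read as the authors' exact procedure; all names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        SYNONYMS[aa].append(codon)

# Number of amino acids in each degeneracy class of the standard code.
CLASS_SIZES = {2: 9, 3: 1, 4: 5, 6: 3}


def effective_number_of_codons(cds: str) -> float:
    """Wright-style ENC: 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, capped at 61."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    f_values = defaultdict(list)
    for aa, codons in SYNONYMS.items():
        k = len(codons)
        n = sum(counts[c] for c in codons)
        if k == 1 or n < 2:
            continue  # Met/Trp, or too few observations to estimate F
        sum_sq = sum((counts[c] / n) ** 2 for c in codons)
        f_values[k].append((n * sum_sq - 1) / (n - 1))  # codon homozygosity F
    f_avg = {k: sum(v) / len(v) for k, v in f_values.items()}
    if 3 not in f_avg and 2 in f_avg and 4 in f_avg:
        f_avg[3] = (f_avg[2] + f_avg[4]) / 2  # Ile absent: interpolate
    nc = 2.0  # Met and Trp contribute one codon each
    for k, n_aa in CLASS_SIZES.items():
        if f_avg.get(k, 0) <= 0:
            raise ValueError(f"cannot estimate F for the {k}-fold class")
        nc += n_aa / f_avg[k]
    return min(nc, 61.0)
```

ENC ranges from 20 (one codon used per amino acid) to 61 (all synonymous codons used equally).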

16.
Stallings RL. Genomics. 1992;13(3):890-891
Simple microsatellite repetitive sequences are widely distributed in eukaryotic genomes. Using the GCG Find program, the distribution of each type of mono- and dinucleotide repetitive sequence has been examined in GenBank sequences. Examples of each type of simple satellite sequence could be found, although the frequency of (CpG)n with n ≥ 8 repeats was extremely low. The suppression of CpG dinucleotides in vertebrates does not adequately explain the rarity of this repeat, since (CpG)n repeats are also extremely infrequent in the genomes of species in which CpG dinucleotides are not suppressed. Instead, it is proposed that (CpG)n repeats must possess a DNA conformation that has a deleterious structural effect.
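The GCG Find search for (CpG)n runs can be approximated with a regular expression. This is an equivalent search written for illustration, not the GCG program; the function name is hypothetical and min_units = 8 mirrors the n ≥ 8 threshold quoted above.

```python
import re


def find_cg_repeats(seq: str, min_units: int = 8) -> list[tuple[int, int]]:
    """Return (start, number of CG units) for each uninterrupted (CG)n run with n >= min_units."""
    pattern = re.compile(f"(?:CG){{{min_units},}}")  # e.g. "(?:CG){8,}"
    return [(m.start(), len(m.group()) // 2) for m in pattern.finditer(seq.upper())]


print(find_cg_repeats("AA" + "CG" * 8 + "TT" + "CG" * 7))  # [(2, 8)]; the 7-unit run is too short
```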

17.
CpG dinucleotides mutate at a high rate because cytosine is vulnerable to deamination, cytosines in CpG dinucleotides are often methylated, and deamination of 5-methylcytosine (5mC) produces thymidine. Previous experiments have shown that DNA melting is the rate-limiting step in cytosine deamination. Here we show, through the analysis of human single-nucleotide polymorphisms (SNPs), that the mutation rate produced by 5mC deamination is highly dependent on local GC content. In fact, linear regression analysis showed that the log10 of the 5mC mutation rates (inferred from SNP frequencies) had slopes of -3 when graphed with respect to the GC content of neighboring sequences. This is the ideal slope that would be expected if the correlation between CpG underrepresentation and GC content had been caused solely by DNA melting. Moreover, the same result was obtained regardless of SNP location (all SNPs versus only SNPs in noncoding intergenic regions, excluding CpG islands) and regardless of the length over which GC content was calculated (SNP sequences with a modal length of 564 bp versus genomic contigs with a modal length of 163 kb). Several alternative interpretations are discussed.
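The regression described above can be sketched as an ordinary least-squares fit of log10(rate) against local G + C content. Two assumptions apply: G + C is expressed as a fraction between 0 and 1, and the rates below are synthetic values constructed to have a slope of exactly -3, not data from the study. The function name is hypothetical.

```python
import math


def log10_slope(gc_fractions: list[float], rates: list[float]) -> float:
    """Ordinary least-squares slope of log10(rate) against G + C fraction."""
    ys = [math.log10(r) for r in rates]
    n = len(ys)
    mx, my = sum(gc_fractions) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(gc_fractions, ys))
    sxx = sum((x - mx) ** 2 for x in gc_fractions)
    return sxy / sxx


# Synthetic rates constructed to follow log10(rate) = 1 - 3 * GC exactly.
gc = [0.35, 0.40, 0.45, 0.50, 0.55]
rates = [10 ** (1 - 3 * g) for g in gc]
print(round(log10_slope(gc, rates), 6))  # -3.0
```

On that scale a slope of -3 means that each 0.1 rise in G + C multiplies the rate by 10^-0.3 ≈ 0.5, i.e. roughly halves it.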

18.
19.
20.
MOTIVATION: It has been speculated that CpG dinucleotide deficiency in genomes is a consequence of DNA methylation. However, this hypothesis does not adequately explain CpG deficiency in bacteria. The hypothesis of a DNA structure constraint was therefore examined as an alternative explanation. RESULTS: By comparing real bacterial genomes with second-order Markov artificial genomes, we found that the core structure of a restricted pattern, the TTCGAA pattern, was underrepresented in low-GC-content bacterial genomes regardless of CpG dinucleotide level. This is in contrast to the AACGTT pattern, indicating that the counterselection is context-dependent. Further study revealed nine underrepresented patterns that are thought to be capable of inducing a DNA structure constraint; most of them fall into the TTCGNA and TTCGAN patterns on both DNA strands. An explanation is also proposed for the strong correlation between GC content and CpG deficiency. Random sequence simulation showed that the occurrences of these patterns were correlated with GC content, as was the percentage of CpG dinucleotides trapped in these patterns. Finally, we suggest that the degree of counterselection against these restricted patterns could be influenced by the global GC content of a genome.
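The study compared real genomes with artificial genomes generated from second-order Markov models. An analytical counterpart to that simulation is the expected word count under a maximal-order Markov model, E(w) = N(w1..w3) × N(w2..w4)/N(w2w3) × ... for order 2. The sketch below uses this closed form instead of simulation, which is a substitution of technique; the function names and toy sequence are hypothetical.

```python
from collections import Counter


def word_counts(seq: str, k: int) -> Counter:
    """Count overlapping words of length k."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def markov_expected(seq: str, word: str, order: int = 2) -> float:
    """Expected count of word under an order-m Markov model fitted to seq.

    E(w) = N(w[0:m+1]) * prod_{i>=1} N(w[i:i+m+1]) / N(w[i:i+m])
    """
    seq, word = seq.upper(), word.upper()
    longer = word_counts(seq, order + 1)
    shorter = word_counts(seq, order)
    expected = float(longer[word[:order + 1]])
    for i in range(1, len(word) - order):
        denom = shorter[word[i:i + order]]
        if denom == 0:
            return 0.0  # the overlapping (m+1)-mer is then absent too
        expected *= longer[word[i:i + order + 1]] / denom
    return expected


def observed_over_expected(seq: str, word: str, order: int = 2) -> float:
    expected = markov_expected(seq, word, order)
    observed = word_counts(seq.upper(), len(word))[word.upper()]
    return observed / expected if expected else float("nan")


print(observed_over_expected("TTCGTTCGAA", "TTCGAA"))
```

Applied to a real genome, a ratio well below 1 for TTCGAA alongside a ratio near 1 for AACGTT would correspond to the context-dependent underrepresentation described above.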


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417