Similar literature
 20 similar articles found.
1.
Vertebrate genomes are characterized by CpG deficiency, particularly in GC-poor regions. The GC content-related CpG deficiency is probably caused by context-dependent deamination of methylated CpG sites. This hypothesis was examined in this study by comparing nucleotide frequencies at CpG flanking positions among invertebrate and vertebrate genomes. The finding is a transition of nucleotide preference from 5' T to 5' A at the invertebrate-vertebrate boundary, indicating that a large number of CpG sites with 5' Ts were depleted because of the global DNA methylation that developed in vertebrates. At the genome level, we investigated CpG observed/expected (obs/exp) values in 500 bp fragments, and found that higher CpG obs/exp values occur in GC-poor regions of invertebrate genomes (except sea urchin) but in GC-rich sequences of vertebrate genomes. We next compared GC content at CpG flanking positions with the genomic average, showing that the GC content is lower than the average in invertebrate genomes, but higher than the average in vertebrate genomes. These results indicate that although 5' T and 5' A differ in inducing deamination of methylated CpG sites, GC content is even more important in affecting the deamination rate. In all the tests, the results for sea urchin are similar to those for vertebrates, perhaps due to its fractional DNA methylation. CpG deficiency is therefore suggested to be mainly a result of high mutation rates of methylated CpG sites in GC-poor regions.
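
The windowed CpG obs/exp measurement described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not state the formula, so the commonly used ratio (CpG count × length) / (C count × G count) is assumed, and the function names `cpg_obs_exp` and `windowed_cpg` are hypothetical.

```python
def cpg_obs_exp(seq):
    """CpG observed/expected ratio for one sequence.

    Assumed formula (not given in the abstract):
    obs/exp = (#CpG * length) / (#C * #G).
    Returns None when the fragment has no C or no G.
    """
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return None
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return cpg_count * len(seq) / (c_count * g_count)


def windowed_cpg(seq, window=500):
    """Split seq into non-overlapping windows and report
    (start, GC fraction, CpG obs/exp) for each full window."""
    seq = seq.upper()
    rows = []
    for start in range(0, len(seq) - window + 1, window):
        fragment = seq[start:start + window]
        gc_fraction = (fragment.count("G") + fragment.count("C")) / window
        rows.append((start, gc_fraction, cpg_obs_exp(fragment)))
    return rows


if __name__ == "__main__":
    toy_genome = "ACGT" * 250  # 1000 bp toy sequence, two 500 bp windows
    for start, gc_fraction, ratio in windowed_cpg(toy_genome):
        print(start, gc_fraction, ratio)
```

Plotting the third column against the second for real genomic fragments would give the kind of GC content versus obs/exp comparison the abstract reports.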

2.
The relationship between the silent substitution rate (Ks) and the GC content along the genome is a focal point of the debate about the origin of the isochore structure in vertebrates. Recent estimation of the silent substitution rate showed a positive correlation between Ks and GC content, in contradiction with the predictions of both the regional mutation bias model and the selection or biased gene conversion model. The aim of this paper is to help resolve this contradiction between theoretical studies and data. We analyzed the relationship between Ks and GC content under (1) uniform mutation bias, (2) a regional mutation bias, and (3) mutation bias and selection. We report that an increase in Ks with GC content is expected under mutation bias because of either nonequilibrium of the isochore structure or an increasing mutation rate from AT toward GC nucleotides in GC-richer isochores. We show by simulations that CpG deamination tends to increase the mutation rate with GC content in a regional mutation bias model. We also demonstrate that the relationship between Ks and GC under the selectionist or biased gene conversion model is positive under weak selection if the mutation selection equilibrium GC frequency is less than 0.5. Received: 28 March 2001 / Accepted: 16 May 2001

3.
We assess the similarity of base substitution processes, described by empirically derived 4 × 4 matrices, using chi-square homogeneity tests. Such significance analyses allow us to assess variation in sequence evolution across sites and we apply them to matrices derived from noncoding sites in different contexts in grass chloroplast DNA. We show that there is statistically significant variation in rates and patterns of mutation among noncoding sites in different contexts and then demonstrate a similar and significant influence of context on substitutions at fourfold degenerate sites of coding regions from grass chloroplast DNA. These results show that context has the same general effect on substitution bias in coding and noncoding DNA: the A+T content of flanking bases is correlated with rate of substitution, transition bias, and GC → AT pressure, while the number of flanking pyrimidines on a single strand is correlated with a mutational bias, or skew, toward pyrimidines. Despite the similarity in general trends, however, when we compare coding and noncoding matrices we find that there is a statistically significant difference between them even when we control for context. Most noticeably, fourfold degenerate sites in coding sequences are undergoing substitution at a higher rate and there are also significant differences in the relationship between pyrimidine skew and the number of flanking pyrimidines. Possible reasons for the differences between coding and noncoding sites are discussed. Furthermore, our analysis illustrates a simple statistical way of comparing substitution processes across sites, allowing us to better study variation in evolutionary processes across a genome. [Reviewing Editor: Dr. Martin Kreitman]
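
A chi-square homogeneity comparison of the kind this abstract mentions might look like the sketch below. It assumes, purely for illustration, that one row of a 4 × 4 substitution count matrix (the fate of one starting base) is compared between two contexts as a 2 × k contingency table; the abstract does not say how the matrices were partitioned, and `chi_square_homogeneity` is a hypothetical name. Only the statistic and degrees of freedom are computed, with no p-value.

```python
def chi_square_homogeneity(counts_a, counts_b):
    """Pearson chi-square statistic for a 2 x k table of counts.

    counts_a and counts_b are equal-length lists of outcome counts
    (for example, counts of A->A, A->C, A->G, A->T in two contexts).
    Outcomes that were never observed in either sample are dropped.
    Returns (statistic, degrees_of_freedom).
    """
    columns = [(a, b) for a, b in zip(counts_a, counts_b) if a + b > 0]
    total_a = sum(a for a, _ in columns)
    total_b = sum(b for _, b in columns)
    if total_a == 0 or total_b == 0:
        raise ValueError("each sample needs at least one observation")
    grand_total = total_a + total_b

    statistic = 0.0
    for a, b in columns:
        column_total = a + b
        expected_a = total_a * column_total / grand_total
        expected_b = total_b * column_total / grand_total
        statistic += (a - expected_a) ** 2 / expected_a
        statistic += (b - expected_b) ** 2 / expected_b
    return statistic, len(columns) - 1


if __name__ == "__main__":
    # Hypothetical counts for one starting base in two flanking contexts.
    at_rich_context = [900, 30, 40, 30]
    gc_rich_context = [920, 10, 60, 10]
    stat, dof = chi_square_homogeneity(at_rich_context, gc_rich_context)
    # With 3 degrees of freedom, the 5% critical value is 7.815.
    print(stat, dof)
```

The counts in the usage example are invented; real input would come from an alignment-derived substitution matrix.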

4.
Schmegner C  Hoegel J  Vogel W  Assum G 《Genetics》2007,175(1):421-428
The human genome is composed of long stretches of DNA with distinct GC contents, called isochores or GC-content domains. A boundary between two GC-content domains in the human NF1 gene region is also a boundary between domains of early- and late-replicating sequences and of regions with high and low recombination frequencies. The perfect conservation of the GC-content distribution in this region between human and mouse demonstrates that GC-content stabilizing forces must act regionally on a fine scale at this locus. To further elucidate the nature of these forces, we report here on the spectrum of human SNPs and base pair substitutions between human and chimpanzee. The results show that the mutation rate changes exactly at the GC-content transition zone from low values in the GC-poor sequences to high values in GC-rich ones. The GC content of the GC-poor sequences can be explained by a bias in favor of GC > AT mutations, whereas the GC content of the GC-rich segment may result from a fixation bias in favor of AT > GC substitutions. This fixation bias may be explained by direct selection by the GC content or by biased gene conversion.

5.
Jiang C  Zhao Z 《Genomics》2006,88(5):527-534
So far, there is no genome-wide estimation of the mutational spectrum in humans. In this study, we systematically examined the directionality of the point mutations and maintenance of GC content in the human genome using approximately 1.8 million high-quality human single nucleotide polymorphisms and their ancestral sequences in chimpanzees. The frequency of C-->T (G-->A) changes was the highest among all mutation types and the frequency of each type of transition was approximately fourfold that of each type of transversion. In intergenic regions, when the GC content increased, the frequency of changes from G or C increased. In exons, the frequency of G:C-->A:T was the highest among the genomic categories and was contributed mainly by the frequent mutations at the CpG sites. In contrast, mutations at the CpG sites, or CpG-->TpG/CpA mutations, occurred less frequently in the CpG islands relative to intergenic regions with similar GC content. Our results suggest that the GC content is overall not in equilibrium in the human genome, with a trend toward shifting the human genome to be AT rich and shifting the GC content of a region to approach the genome average. Our results, which differ from previous estimates based on limited loci or on the rodent lineage, provide the first representative and reliable mutational spectrum in the recent human genome and categorized genomic regions.

6.
Compositional evolution of noncoding DNA in the human and chimpanzee genomes   Total citations: 11, self-citations: 0, citations by others: 11
We have examined the compositional evolution of noncoding DNA in the primate genome by comparison of lineage-specific substitutions observed in 1.8 Mb of genomic alignments of human, chimpanzee, and baboon with 6542 human single-nucleotide polymorphisms (SNPs) rooted using chimpanzee sequence. The pattern of compositional evolution, measured in terms of the numbers of GC-->AT and AT-->GC changes, differs significantly between fixed and polymorphic sites, and indicates that there is a bias toward fixation of AT-->GC mutations, which could result from weak directional selection or biased gene conversion in favor of high GC content. Comparison of the frequency distributions of a subset of the SNPs revealed no significant difference between GC-->AT and AT-->GC polymorphisms, although AT-->GC polymorphisms in regions of high GC segregate at slightly higher frequencies on average than GC-->AT polymorphisms, which is consistent with a fixation bias favoring high GC in these regions. However, the substitution data suggest that this fixation bias is relatively weak, because the compositional structure of the human and chimpanzee genomes is becoming homogenized, with regions of high GC decreasing in GC content and regions of low GC increasing in GC content. The rate and pattern of nucleotide substitution in 333 Alu repeats within the human-chimpanzee-baboon alignments are not significantly affected by the GC content of the region in which they are inserted, providing further evidence that, since the time of the human-chimpanzee ancestor, there has been little or no regional variation in mutation bias.

7.
Differences in the regional substitution patterns in the human genome created patterns of large-scale variation of base composition known as genomic isochores. To gain insight into the origin of the genomic isochores, we develop a maximum-likelihood approach to determine the history of substitution patterns in the human genome. This approach utilizes the vast amount of repetitive sequence deposited in the human genome over the past approximately 250 Myr. Using this approach, we estimate the frequencies of seven types of substitutions: the four transversions, two transitions, and the methyl-assisted transition of cytosine in CpG. Comparing substitutional patterns in repetitive elements of various ages, we reconstruct the history of the base-substitutional process in the different isochores for the past 250 Myr. At around 90 MYA (around the time of the mammalian radiation), we find an abrupt fourfold to eightfold increase of the cytosine transition rate in CpG pairs compared with that of the reptilian ancestor. Further analysis of nucleotide substitutions in regions with different GC content reveals concurrent changes in the substitutional patterns. Although the substitutional pattern was dependent on the regional GC content in such ways that it preserved the regional GC content before the mammalian radiation, it lost this dependence afterward. The substitutional pattern changed from an isochore-preserving to an isochore-degrading one. We conclude that isochores have been established before the radiation of the eutherian mammals and have been subject to the process of homogenization since then.

8.
Jiang Z  Wu XL  Zhang M  Michal JJ  Wright RW 《Genetics》2008,180(1):639-647
Bayesian analysis was performed to examine single-nucleotide polymorphism (SNP) neighborhood patterns in cattle using 15,110 SNPs, each with a flanking sequence of 500 bp. Our analysis confirmed three well-known features reported in plants and/or other animals: (1) the transition is the most abundant type of SNP, accounting for 69.8% in cattle; (2) the transversion occurs most frequently (38.56%) in cattle when the A + T content equals two at their immediate adjacent sites; and (3) C <--> T and A <--> G transitions have reverse complementary neighborhood patterns and so do A <--> C and G <--> T transversions. Our study also revealed several novel SNP neighborhood patterns that have not been reported previously. First, cattle and humans share an overall SNP pattern, indicating a common mutation system in mammals. Second, unlike C <--> T/A <--> G and A <--> C/G <--> T, the true neighborhood patterns for A <--> T and C <--> G might remain mysterious because the sense and antisense sequences flanking these mutations are not actually recognizable. Third, among the reclassified four types of SNPs, the neighborhood ratio between A + T and G + C was quite different. The ratio was lowest for C <--> G, but increased for C <--> T/A <--> G, further for A <--> C/G <--> T, and the most for A <--> T. Fourth, when two immediate adjacent sites provide structures for CpG, it significantly increased transitions compared to the structures without the CpG. Finally, unequal occurrence between A <--> G and C <--> T in five paired neighboring structures indicates that the methylation-induced deamination reactions were responsible for approximately 20% of total transitions. In addition, conversion can occur at both CpG sites and non-CpG sites. Our study provides new insights into understanding molecular mechanisms of mutations and genome evolution.

9.
It has been known that in noncoding regions of the chloroplast genome, the pattern of nucleotide substitution is influenced by the two nucleotides flanking the substitution site. In a GC-rich environment, a bias toward transition was observed, whereas in an AT-rich environment, a bias toward transversion was observed. In this study, the influence of the two adjacent neighbors on the substitution pattern was observed in the first intron of the mitochondrial nad4 gene, although the AT content of this intron is only 48%. The proportion of transversions increases from 0.32 to 0.75 as the A + T content (number of A's + T's) of the two nearest neighbors increases from 0 to 2. This trend was also observed in another mitochondrial group I intron with an AT content of 64%. In addition, a similar, though weaker, effect was observed in vertebrate pseudogenes. So this effect is present in all three types of genomes. Furthermore, in contrast to the situation in the noncoding regions of chloroplast DNA, where most nucleotide substitutions occurred in the categories with an A + T content of either 1 or 2, nucleotide substitutions in the mitochondrial first nad4 intron occurred more evenly in three categories of different A + T contents. This might be due largely to the difference in the AT content (0.48 vs. 0.72) between the mitochondrial first nad4 intron and the chloroplast DNA regions studied.
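
The tabulation this abstract describes, the proportion of transversions as a function of the A + T content (0, 1, or 2) of the two nearest neighbors, can be sketched as below. The input layout, a list of `(left, old, new, right)` single-base tuples, and the function names are assumptions made for illustration; the abstract does not describe how substitutions were recorded.

```python
PURINES = {"A", "G"}


def is_transversion(old_base, new_base):
    """A substitution is a transversion when it swaps purine and pyrimidine."""
    return (old_base in PURINES) != (new_base in PURINES)


def transversion_proportion_by_context(substitutions):
    """Group substitutions by the number of A/T bases among the two
    immediate neighbors and return {0: p0, 1: p1, 2: p2}, where each p
    is the fraction of transversions (None if the class is empty)."""
    tallies = {0: [0, 0], 1: [0, 0], 2: [0, 0]}  # [transversions, total]
    for left, old_base, new_base, right in substitutions:
        at_neighbors = int(left in ("A", "T")) + int(right in ("A", "T"))
        tallies[at_neighbors][1] += 1
        if is_transversion(old_base, new_base):
            tallies[at_neighbors][0] += 1
    return {
        at_count: (tv / total if total else None)
        for at_count, (tv, total) in tallies.items()
    }


if __name__ == "__main__":
    # Invented toy data: (left neighbor, old base, new base, right neighbor).
    toy = [
        ("G", "A", "G", "C"),  # transition, 0 A/T neighbors
        ("G", "A", "C", "C"),  # transversion, 0 A/T neighbors
        ("A", "C", "T", "T"),  # transition, 2 A/T neighbors
        ("A", "A", "T", "T"),  # transversion, 2 A/T neighbors
        ("A", "C", "G", "G"),  # transversion, 1 A/T neighbor
    ]
    print(transversion_proportion_by_context(toy))
```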

10.
Vanishing GC-rich isochores in mammalian genomes   Total citations: 25, self-citations: 0, citations by others: 25
Duret L  Semon M  Piganeau G  Mouchiroud D  Galtier N 《Genetics》2002,162(4):1837-1847
To understand the origin and evolution of isochores (the peculiar spatial distribution of GC content within mammalian genomes), we analyzed the synonymous substitution pattern in coding sequences from closely related species in different mammalian orders. In primates and cetartiodactyls, GC-rich genes are undergoing a large excess of GC --> AT substitutions over AT --> GC substitutions: GC-rich isochores are slowly disappearing from the genome of these two mammalian orders. In rodents, our analyses suggest both a decrease in GC content of GC-rich isochores and an increase in GC-poor isochores, but more data will be necessary to assess the significance of this pattern. These observations question the conclusions of previous works that assumed that base composition was at equilibrium. Analysis of allele frequency in human polymorphism data, however, confirmed that in the GC-rich parts of the genome, GC alleles have a higher probability of fixation than AT alleles. This fixation bias appears not strong enough to overcome the large excess of GC --> AT mutations. Thus, whatever the evolutionary force (neutral or selective) at the origin of GC-rich isochores, this force is no longer effective in mammals. We propose a model based on the biased gene conversion hypothesis that accounts for the origin of GC-rich isochores in the ancestral amniote genome and for their decline in present-day mammals.
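
Whether a region's base composition is at equilibrium can be checked with a simple two-state calculation. The sketch below is not taken from the paper: it assumes a model with a per-site GC→AT rate u and AT→GC rate v, for which the equilibrium GC fraction is v / (u + v), and estimates both rates from substitution counts divided by the current fraction of GC or AT sites. The function name `equilibrium_gc` is hypothetical.

```python
def equilibrium_gc(n_gc_to_at, n_at_to_gc, gc_content):
    """Equilibrium GC fraction under a two-state substitution model.

    n_gc_to_at, n_at_to_gc: observed substitution counts in each direction.
    gc_content: current GC fraction of the region, strictly between 0 and 1.

    Counts are converted to per-site rates (the common sequence length
    cancels out), then the equilibrium is v / (u + v).
    """
    if not 0.0 < gc_content < 1.0:
        raise ValueError("gc_content must be strictly between 0 and 1")
    u = n_gc_to_at / gc_content          # relative GC->AT rate per GC site
    v = n_at_to_gc / (1.0 - gc_content)  # relative AT->GC rate per AT site
    if u + v == 0:
        raise ValueError("no substitutions observed")
    return v / (u + v)


if __name__ == "__main__":
    # Invented counts for a GC-rich region (current GC = 0.6) with a
    # threefold excess of GC->AT over AT->GC substitutions.
    print(equilibrium_gc(300, 100, 0.6))
```

An equilibrium value below the current GC content, as in the invented example, corresponds to the declining GC-rich isochores the abstract describes.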

11.
The nature of the forces affecting base composition is a key question in genome evolution. There is uncertainty as to whether differences in the GC contents of non-coding sequences reflect differences in mutational bias, or in the intensity of selection or biased gene conversion. We have used a polymorphism dataset for non-coding sequences on the X chromosome of Drosophila simulans to examine this question. The proportion of GC-->AT versus AT-->GC polymorphic mutations in a locus is correlated with its GC content. This implies the action of forces that favour GC over AT base pairs, which are apparently strongest in GC-rich sequences.

12.
Recently, numerous genome analyses revealed the existence of a universal G:C → A:T mutation bias in bacteria, fungi, plants and animals. To explore the molecular basis for this mutation bias, we examined the three well-known DNA mutation models, i.e., the oxidative damage model, the UV-radiation damage model and the CpG hypermutation model. It was revealed that these models cannot provide a sufficient explanation for the universal mutation bias. Therefore, we resorted to a DNA mutation model proposed by Löwdin 40 years ago, which was based on inter-base double proton transfers (DPT). Since DPT is a fundamental and spontaneous chemical process and occurs much more frequently within GC pairs than AT pairs, the Löwdin model offers a common explanation for the observed universal mutation bias and thus has broad biological implications.

13.
Recently, numerous genome analyses revealed the existence of a universal G:C→A:T mutation bias in bacteria, fungi, plants and animals. To explore the molecular basis for this mutation bias, we examined the three well-known DNA mutation models, i.e., oxidative damage model, UV-radiation damage model and CpG hypermutation model. It was revealed that these models cannot provide a sufficient explanation to the universal mutation bias. Therefore, we resorted to a DNA mutation model proposed by Löwdin 40 years ago, which was based on inter-base double proton transfers (DPT). Since DPT is a fundamental and spontaneous chemical process and occurs much more frequently within GC pairs than AT pairs, Löwdin model offers a common explanation for the observed universal mutation bias and thus has broad biological implications.

14.
The influence of local base composition on mutations in chloroplast DNA (cpDNA) is studied in detail and the resulting, empirically derived, mutation dynamics are used to analyze both base composition and codon usage bias. A 4 × 4 substitution matrix is generated for each of the 16 possible flanking base combinations (contexts) using 17,253 noncoding sites, 1309 of which are variable, from an alignment of three complete grass chloroplast genome sequences. It is shown that substitution bias at these sites is correlated with flanking base composition and that the A+T content of these flanking sites as well as the number of flanking pyrimidines on the same strand appears to have general influences on substitution properties. The context-dependent equilibrium base frequencies predicted from these matrices are then applied to two analyses. The first examines whether or not context dependency of mutations is sufficient to generate average compositional differences between noncoding cpDNA and silent sites of coding sequences. It is found that these two classes of sites exist, on average, in very different contexts and that the observed mutation dynamics are expected to generate significant differences in overall composition bias that are similar to the differences observed in cpDNA. Context dependency, however, cannot account for all of the observed differences: although silent sites in coding regions appear to be at the equilibrium predicted, noncoding cpDNA has a significantly lower A+T content than expected from its own substitution dynamics, possibly due to the influence of indels. The second study examines the codon usage of low-expression chloroplast genes. When context is accounted for, codon usage is very similar to what is predicted by the substitution dynamics of noncoding cpDNA. However, certain codon groups show significant deviation when followed by a purine in a manner suggesting some form of weak selection other than translation efficiency. 
Overall, the findings indicate that a full understanding of mutational dynamics is critical to understanding the role selection plays in generating composition bias and sequence structure.

15.
This study presents compelling evidence that recombination significantly increases the silent GC content of a genome in a selectively neutral manner, resulting in a highly significant positive correlation between recombination and "GC3s" in the yeast Saccharomyces cerevisiae. Neither selection nor mutation can explain this relationship. A highly significant GC-biased mismatch repair system is documented for the first time in any member of the Kingdom Fungi. Much of the variation in the GC3s within yeast appears to result from GC-biased gene conversion. Evidence suggests that GC-biased mismatch repair exists in numerous organisms spanning six kingdoms. This transkingdom GC mismatch repair bias may have evolved in response to a ubiquitous AT mutational bias. A significant positive correlation between recombination and GC content is found in many of these same organisms, suggesting that the processes influencing the evolution of the yeast genome may be a general phenomenon. Nonrecombining regions of the genome and nonrecombining genomes would not be subject to this type of molecular drive. It is suggested that the low GC content characteristic of many nonrecombining genomes may be the result of three processes (1) a prevailing AT mutational bias, (2) random fixation of the most common types of mutation, and (3) the absence of the GC-biased gene conversion which, in recombining organisms, permits the reversal of the most common types of mutation. A model is proposed to explain the observation that introns, intergenic regions, and pseudogenes typically have lower GC content than the silent sites of corresponding open reading frames. This model is based on the observation that the greater the heterology between two sequences, the less likely it is that recombination will occur between them. 
According to this "Constraint" hypothesis, the formation and propagation of heteroduplex DNA is expected to occur, on average, more frequently within conserved coding and regulatory regions of the genome. In organisms possessing GC-biased mismatch repair, this would enhance the GC content of these regions through biased gene conversion. These findings have a number of important implications for the way we view genome evolution and suggest a new model for the evolution of sex.

16.
张乃心  张玉娟  余果  陈斌 《昆虫学报》2013,56(4):398-407
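This study examines the structural features of dipteran mitochondrial genomes and designs universal primers for sequencing them, to provide a reference and basis for future research on dipteran mitochondrial genomes. Using comparative genomics and bioinformatics methods, we analyzed the structural features, base composition, and conserved regions of 26 completely sequenced dipteran mitochondrial genomes, and on that basis designed universal primers for sequencing dipteran genomes. The results show that dipteran mitochondrial genomes are 14,503 to 19,517 bp long and structurally conserved, containing 37 genes (13 protein-coding genes, 22 tRNA genes, and 2 rRNA genes) plus a noncoding region (the AT-rich region) that varies greatly in length. Gene order within the genome is stable; apart from a few genes, it matches the gene order of Drosophila melanogaster. Base composition is uneven, with AT content between 72.59% and 85.15%, and base usage is biased, favoring A and C. Nucleotide and amino acid sequences are conserved across the genome, and 11 conserved regions were identified. Within the conserved regions, 26 pairs of universal primers for sequencing dipteran mitochondrial genomes were designed, all amplifying target fragments of no more than 1,200 bp. This primer set was used to sequence the complete mitochondrial genome of the onion fly Delia antiqua and proved efficient and suitable.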

17.
Codon usage in the G+C-rich Streptomyces genome.   Total citations: 45, self-citations: 0, citations by others: 45
F Wright  M J Bibb 《Gene》1992,113(1):55-65
The codon usage (CU) patterns of 64 genes from the Gram-positive prokaryotic genus Streptomyces were analysed. Despite the extremely high overall G+C content of the Streptomyces genome (estimated at 0.74), individual genes varied in G+C content from 0.610 to 0.797, and had third codon position G+C contents (GC3s) that varied from 0.764 to 0.983. The variation in GC3s explains a significant proportion of the variation in CU patterns. This is consistent with an evolutionary model of the Streptomyces genome where biased mutation pressure has led to a high average G+C content with random variation about the mean, although the variation observed is greater than that expected from a simple binomial model. The only gene in the sample that can be confidently predicted to be highly expressed, EF-Tu of Streptomyces coelicolor A3(2) (GC3s = 0.927), shows a preference for a third position C in several of the four codon families, and for CGY and GGY for Arg and Gly codons, respectively (Y = pyrimidine); similar CU patterns are found in highly expressed genes of the G+C-rich Micrococcus luteus genome. It thus appears that codon usage in Streptomyces is determined predominantly by mutation bias, with weak translational selection operating only in highly expressed genes. We discuss the possible consequences of the extreme codon bias of Streptomyces and consider how it may have evolved. A set of CU tables is provided for use with computer programs that locate protein-coding regions.
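
The GC3s statistic used in this abstract can be computed along the following lines. The sketch assumes the usual definition (G+C fraction at the third position of synonymously variable codons, so ATG, TGG, and the three standard stop codons are excluded); the abstract itself does not spell out the definition, and the function name `gc3s` and the toy sequence are illustrative only.

```python
# Codons with no synonymous alternative in the standard genetic code
# (Met, Trp) plus the three stop codons; assumed to be excluded from GC3s.
EXCLUDED_CODONS = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def gc3s(coding_sequence):
    """G+C fraction at third codon positions of synonymous codons.

    The sequence is read in frame from its first base; a trailing
    partial codon and codons containing non-ACGT symbols are ignored.
    Returns None if no usable codon is found.
    """
    sequence = coding_sequence.upper()
    usable_codons = 0
    gc_third = 0
    for start in range(0, len(sequence) - len(sequence) % 3, 3):
        codon = sequence[start:start + 3]
        if codon in EXCLUDED_CODONS or not set(codon) <= set("ACGT"):
            continue
        usable_codons += 1
        if codon[2] in "GC":
            gc_third += 1
    return gc_third / usable_codons if usable_codons else None


if __name__ == "__main__":
    # Toy open reading frame: ATG GCC GCG AAA TAA
    print(gc3s("ATGGCCGCGAAATAA"))
```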

18.
The mean (G + C) composition (51.0%) and standard deviation (+/- 3.8%) of published DNA sequences accounting for 10% of the E. coli genome is in excellent agreement with the principal overall distribution determined by high resolution melting. While differences in base and neighbor characteristics are small and uniform throughout all regions of the genome, it is found that the (G + C) content of sequences varies in segmented fashion within boundaries corresponding to coding (53% G + C) and noncoding (46% G + C) regions; with variances in the latter being six-fold greater than in coding regions. The variance in different regions shows a strong negative dependence on (G + C) content of the region, reflecting the condition that A-T and G-C base pairs are preferred neighbors of A-T and C-G pairs, respectively; with the bias increasing with decreasing (G + C) content. Neighbor analysis indicates the most extreme positive biases occur in AA, TT, GC and CG throughout all regions, but particularly in noncoding regions. Extraordinary numbers of oligomeric strings of (A)n, etc., are the further consequence of this bias. These and other characteristics point to the existence of inherent biases in neighbor frequencies levied during replication or repair, and which reflect, in turn, neighbor influences during mutation. The bias in codon usage noted by Grantham and others is seen here as due, in part, to the adaptation of coding sequences to this microenvironment through selection among synonymous codons so as to preserve inherent neighbor biases.

19.
CpG deficiency, dinucleotide distributions and nucleosome positioning   Total citations: 2, self-citations: 0, citations by others: 2
The dinucleotide CpG is deficient in (A + T)-rich regions of vertebrate DNA in both coding and non-coding sequences and there is a corresponding increase above expectation in the occurrence of TpG and CpA. By contrast in (G + C)-rich regions no deficiency of CpG is found. Such (G + C)-rich sequences, containing the expected number of CpG dinucleotides, alternate along the genome with (A + T)-rich sequences which have a lower than expected CpG content. The G + C content of vertebrate DNA can oscillate with a period of 150-200 bp and this may be a factor in positioning nucleosomes. The role of mutagenesis in loss of CpG and increase of A + T, particularly in non-coding regions, is discussed.

20.
Asymmetrical distribution of CpG in an 'average' mammalian gene.   Total citations: 24, self-citations: 7, citations by others: 17
The frequency and distribution of the rare dinucleotide CpG was examined in 15 mammalian genes. CpG is highly methylated at cytosine in mammalian DNA (1,2) and 5-methylcytosine (5mC) is thought to undergo a transition mutation via deamination to produce thymine (3). This would result in the accumulation of TpG and CpA and depletion of CpG during evolution (4). Consistent with this hypothesis, the gene sample of 26,541 dinucleotides contained CpG at 40% the frequency expected by base composition and the CpG transition products, TpG+CpA, were significantly elevated at 124% of expected random frequency. However, because CpG occurs at only 25% of expected random frequency in the genome, the sampled genes were considerably enriched in this dinucleotide. CpGs were asymmetrically distributed in sequences flanking the genes. 5'-flanking sequences were enriched in CpG at 135% of the frequency expected assuming a symmetrical distribution of all the CpGs in the sampled genes (p < 0.01), while 3'-flanking regions were depleted in CpG at 40% of expected values (p < 0.0001). This asymmetry may reflect the role of 5-methylcytosine in gene expression. In contrast the frequencies of GpC and GpT+ApC did not differ significantly from that predicted by base composition and these dinucleotides were not asymmetrically distributed.
