Similar articles
Found 20 similar articles (search time: 31 ms)
1.
CpG and TpA dinucleotides are underrepresented in the human genome. The CpG deficiency is due to the high mutation rate from C to T in methylated CpG's. The TpA suppression was thought to reflect a counterselection against TpA's destabilizing effect in RNA. Unexpectedly, the TpA and CpG deficiencies vary according to the G+C contents of sequences. It has been proposed that the variation in CpG suppression was correlated with a particular chromatin organization in G+C-rich isochores. Here, we present an improved model of dinucleotide evolution accounting for the overlap between successive dinucleotides. We show that an increased mutation rate from CpG to TpG or CpA induces both an apparent TpA deficiency and a correlation between CpG and TpA deficiencies and G+C content. Moreover, this model shows that the ratio of observed over expected CpG frequency underestimates the real CpG deficiency in G+C-rich sequences. The predictions of our model fit well with observed frequencies in human genomic data. This study suggests that previously published selectionist interpretations of patterns of dinucleotide frequencies should be taken with caution. Moreover, we propose new criteria to identify unmethylated CpG islands taking into account this bias in the measure of CpG depletion.
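The abstract above refers to the ratio of observed over expected dinucleotide frequency. As a rough illustration, the sketch below computes the conventional ratio, assuming the expected count is the product of the two base counts divided by the sequence length. It does not implement the improved overlap-aware model the authors describe, and the function names `gc_content` and `obs_exp_ratio` are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def obs_exp_ratio(seq: str, dinuc: str) -> float:
    """Conventional observed/expected ratio for a dinucleotide such as 'CG'.

    Assumption: expected count = count(x) * count(y) / N, so the ratio is
    count(xy) * N / (count(x) * count(y)).
    """
    seq = seq.upper()
    n = len(seq)
    x, y = dinuc[0], dinuc[1]
    # Scan every adjacent pair so that overlapping occurrences are counted.
    observed = sum(1 for i in range(n - 1) if seq[i:i + 2] == dinuc)
    denom = seq.count(x) * seq.count(y)
    return observed * n / denom if denom else float("nan")
```

Calling `obs_exp_ratio(seq, "CG")` and `obs_exp_ratio(seq, "TA")` on sequences binned by `gc_content(seq)` would give the kind of CpG and TpA deficiency measures whose dependence on G+C content the abstract discusses.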

2.
We have isolated a new family of moderately repetitive nucleotide sequences (about 2500 copies per haploid genome) specific to the genus Zea and absent in other graminaceous species. These sequences are interspersed in the genome and they show the same genomic organization pattern and similar copy number in all the Zea species examined. These two facts, consistency in the copy number and the same organization pattern, would indicate on the one hand that these sequences were amplified before the divergence of Zea species, and on the other hand that maize and all the teosintes could be considered as the same evolutionary population. Independent clones corresponding to the repetitive sequences have been isolated and sequenced from a genomic library of the teosinte, Zea diploperennis. The repeats, flanked by HaeIII sites, are more than 70% G + C-rich, on average 253 bp long and show 78% similarity to each other. These repetitive sequences are in a highly methylated-C context and they present some features resembling those of coding sequences, such as high CpG and low TpA content, and similar codon usage to maize genes in one of the reading frames. Moreover, the repetitive probe hybridizes with RNA extracted from different tissues of maize and from teosinte, indicating that these repeats or similar ones are present in transcribed sequences.

3.
A detailed computer analysis of the untranslated regions, 5′-UTR and 3′-UTR, of human mRNA sequences is reported. The compositional properties of these regions, compared with those of the corresponding coding regions, indicate that 5′-UTR and 3′-UTR are less affected by the isochore compartmentalization than the corresponding third codon positions of mRNAs. The presence of higher functional constraints in 5′-UTR is also reported. Dinucleotide analysis shows a depletion of CpG and TpA in both sequences. A search for significant sequence motifs using the WORDUP algorithm reveals the patterns already known to have a functional role in the mRNA UTR, and several other motifs whose functional roles remain to be demonstrated. This type of analysis may be particularly useful for guiding site-directed mutagenesis experiments. In addition, it can be used for assessing the nature of anonymous sequences now produced in large amounts in megabase sequencing projects.

4.
Some bacterial genomes are known to have low CpG dinucleotide frequencies. While the causes are not clearly understood, the frequency of CpG is suppressed significantly in the genome of Mycoplasma genitalium, but not in that of Mycoplasma pneumoniae. We compared orthologous gene pairs of the two closely related species to analyze CpG substitution patterns between these two genomes. We also divided genome sequences into three regions: protein-coding, noncoding, and RNA-coding, and obtained the CpG frequencies for each region for each organism. It was found that the observed/expected ratio of CpG dinucleotides is low in both the protein-coding and noncoding regions, while that ratio is in the normal range in the RNA-coding region. Our results indicate that CpG suppression of the Mycoplasma genome is not caused by (1) biased usage of amino acids; (2) biased usage of synonymous codons; or (3) methylation effects by the CpG methyltransferase in the genomes of their hosts. Instead, we consider it likely that a certain global pressure, such as genome-wide pressure for the advantages of DNA stability or replication, has the effect of decreasing CpG over the entire genome, which, in turn, resulted in the biased codon usage.

5.
Summary: We have investigated the compositional properties of coding sequences from cold-blooded vertebrates and we have compared them with those from warm-blooded vertebrates. Moreover, we have studied the compositional correlations of coding sequences with the genomes in which they are contained, as well as the compositional correlations among the codon positions of the genes analyzed. The distributions of GC levels of the third codon positions of genes from cold-blooded vertebrates are distinctly different from those of warm-blooded vertebrates in that they do not reach the high values attained by the latter. Moreover, coding sequences from cold-blooded vertebrates are either equal, or, in most cases, lower in GC (not only in third, but also in first and second codon positions) than homologous coding sequences from warm-blooded vertebrates; higher values are exceptional. These results at the gene level are in agreement with the compositional differences between cold-blooded and warm-blooded vertebrates previously found at the whole genome (DNA) level (Bernardi and Bernardi 1990a,b). Two linear correlations were found: one between the GC levels of coding sequences (or of their third codon positions) and the GC levels of the genomes of cold-blooded vertebrates containing them; and another between the GC levels of third and first + second codon positions of genes from cold-blooded vertebrates. The first correlation applies to the genomes (or genome compartments) of all vertebrates and the second to the genes of all living organisms. These correlations are tantamount to a genomic code.

6.
A correspondence analysis of codon usage in human genes revealed, as expected, that the first axis is strongly correlated with the base composition at synonymous third codon positions. At one extreme of the second axis were localized genes with a high frequency of NCG and CGN codons. The great majority of these sequences were embedded in CpG islands, while the opposite is true for the genes placed at the other extreme. The two main conclusions of this paper are: (1) CpG islands influence codon usage, and (2) since the second axis is orthogonal to (and therefore independent of) the first, GC3-rich genes are not necessarily associated with CpG islands.

7.
K G Ford, L H Pearl, S Neidle. Nucleic Acids Research, 1987, 15(16): 6553-6562
The molecular structure of the DNA-intercalating ligand tetra-(4-N-methylpyridyl) porphin has been determined by X-ray crystallography. The porphyrin has a precise centre of symmetry; the central core is planar, with the N-methylpyridyl groups inclined to it at angles of 66-72 degrees. Molecular modelling of this structure into TpA and CpG sites of intercalated DNA has been performed, and approximate energetics calculated. It has been shown that only the CpG site can have full ligand intercalation, since the thymine methyl group sterically hinders such geometry at TpA sites. Modelling indicates the importance of electrostatic effects in the low-energy forms of intercalated and part-intercalated complexes at both sequences.

8.
G D'Onofrio, G Bernardi. Gene, 1992, 110(1): 81-88
We have investigated the compositional distributions of third codon positions of genes from the 16 prokaryotes and seven eukaryotes for which the largest numbers of coding sequences are available in data banks. In prokaryotes, both narrow and broad distributions were found. In eukaryotes, distributions were very broad (except for Saccharomyces cerevisiae) and remarkably different for different genomes. In low-GC genomes, third codon positions were lower in GC than first + second codon positions and trailed towards high GC; the opposite situation was found for high-GC genomes. In all genomes, first codon positions were higher in GC than second codon positions. We then investigated the compositional correlations between third and first + second codon positions in prokaryotic genomes (the 16 mentioned above plus 87 additional ones) and in genome compartments of eukaryotes. A general, common relationship was found, which also holds within the same (heterogeneous) genomes. This universal correlation is due to the fact that the relative effects of compositional constraints on different codon positions are the same, on the average, whatever the genome under consideration.
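The comparison of GC levels across codon positions described above can be illustrated with a short sketch. It assumes an in-frame coding sequence starting at codon position 1 and simply ignores any trailing partial codon. The function name `gc_by_codon_position` is hypothetical, and the sketch makes no attempt to restrict the third position to synonymous sites.

```python
def gc_by_codon_position(cds: str) -> tuple:
    """Return (GC1, GC2, GC3): the GC fraction at each codon position.

    Assumes cds is in frame; a trailing partial codon is dropped.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    fractions = []
    for pos in range(3):
        # Every third base, starting at codon position pos + 1.
        bases = cds[pos:usable:3]
        gc = sum(1 for b in bases if b in "GC")
        fractions.append(gc / len(bases) if bases else 0.0)
    return tuple(fractions)
```

For a set of genes, plotting the third value against the mean of the first two would give a picture of the third versus first + second position correlation that the abstract reports.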

9.
Virus-host biological interaction is a continuous coevolutionary process involving both host immune system and viral escape mechanisms. The Flaviviridae family is composed of fast-evolving RNA viruses that infect vertebrate (mammals and birds) and/or invertebrate (ticks and mosquitoes) organisms. These host groups are very distinct life forms separated by a long evolutionary time, so lineage-specific anti-viral mechanisms are likely to have evolved. Flaviviridae viruses which infect a single host lineage would be subjected to specific host-induced pressures and, therefore, selected by them. In this work we compare the genomic evolutionary patterns of Flaviviridae viruses and their hosts in an attempt to uncover coevolutionary processes inducing common features in such disparate groups. In particular, we have analyzed dinucleotide and codon usage patterns in the coding regions of vertebrate and invertebrate organisms as well as in Flaviviridae viruses which specifically infect one or both host types. The two host groups possess very distinctive dinucleotide and codon usage patterns. A pronounced CpG under-representation was found in the vertebrate group, possibly induced by the methylation-deamination process, as well as a prominent TpA decrease. The invertebrate group displayed only a TpA frequency reduction bias. Flaviviridae viruses mimicked host nucleotide motif usage in a host-specific manner. Vertebrate-infecting viruses possessed under-representation of CpG and TpA, and insect-only viruses displayed only a TpA under-representation bias. Single-host Flaviviridae members which persistently infect mammals or insect hosts (Hepacivirus and insect-only Flavivirus, respectively) were found to possess a codon usage profile more similar to that of their hosts than to related Flaviviridae. We demonstrated that vertebrate and mosquito genomes are under very distinct lineage-specific constraints, and Flaviviridae viruses which specifically infect these lineages appear to be subject to the same evolutionary pressures that shaped their host coding regions, evidencing the lineage-specific coevolutionary processes between the viral and host groups.

10.
Intercodon dinucleotides affect codon choice in plant genes (cited by 2: 0 self-citations, 2 by others)
In this work, 710 CDSs corresponding to over 290 000 codons equally distributed between Brassica napus, Arabidopsis thaliana, Lycopersicon esculentum, Nicotiana tabacum, Pisum sativum, Glycine max, Oryza sativa, Triticum aestivum, Hordeum vulgare and Zea mays were considered. For each amino acid, synonymous codon choice was determined in the presence of A, G, C or T as the initial nucleotide of the subsequent triplet; data were statistically analysed under the hypothesis of an independent assortment of codons. In 33.4% of cases, a frequency significantly (P = 0.01) different from that expected was recorded. This was mainly due to a pervasive intercodon TpA and CpG deficiency. As a general rule, intercodon TpAs and CpGs were preferably replaced by CpAs and TpGs, respectively. In several instances, codon frequencies were also modified to avoid homotetramer and homotrimer formation, to reduce intercodon ApCs downstream {1,2} GG or AG dinucleotides, as well as to increase GpA or ApG intercodons under certain contexts. Since TpA, CpG and homotetra(tri)mer deficiency directly or indirectly accounted for 77% of significant variation in the codon frequency, it can be concluded that codon usage mirrors precise needs at the DNA structure level. Plant species exhibited a phylogenetically-related adaptation to structural constraints. Codon usage flexibility was reflected in strikingly different arrays of optimum codons for probe design.

11.
To understand the variation in genomic composition and its effect on codon usage, we performed a comparative analysis of codon usage and nucleotide usage in the genes of three dicots, Glycine max, Arabidopsis thaliana and Medicago truncatula. The dicot genes were found to be A/T rich and to have predominantly A-ending and/or T-ending codons. GC3s directly mimic the usage pattern of global GC content. Relative synonymous codon usage analysis suggests that the high usage frequency of A/T over G/C mononucleotide-containing codons in AT-rich dicot genomes is due to compositional constraint as a factor of codon usage bias. Odds ratio analysis identified the dinucleotides TpG, TpC, GpA, CpA and CpT as over-represented, whereas CpG and TpA were under-represented. The results of the (NcExp − NcObs)/NcExp plot suggest that selection pressure other than mutation played a significant role in influencing the pattern of codon usage in these dicots. PR2 analysis revealed the significant role of selection pressure on codon usage. Analysis of variance on codon usage at start and stop sites showed variation in codon selection at these sites. This study provides evidence that the dicot genes were subjected to compositional selection pressure.
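Relative synonymous codon usage (RSCU), mentioned in the abstract above, is usually defined as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The following is a minimal sketch under that definition, assuming the standard genetic code and an in-frame sequence; the names `CODON_TABLE` and `rscu` are hypothetical and this is not the authors' pipeline.

```python
from collections import Counter, defaultdict

# Standard genetic code, built in TCAG order ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(cds: str) -> dict:
    """RSCU per codon: count * (number of synonyms) / (total for the amino acid).

    Stop codons and amino acids absent from cds are skipped; codons that
    contain ambiguous bases are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)
    values = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        if aa == "*" or total == 0:
            continue
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values
```

A value above 1 indicates a codon used more often than expected under equal synonymous usage, and a value below 1 indicates the opposite.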

12.
We screened plant genome sequences, primarily from rice and Arabidopsis thaliana, for CpG islands, and identified DNA segments rich in CpG dinucleotides within these sequences. These CpG-rich clusters appeared in the analysed sequences as discrete peaks and occurred at the frequencies of one per 4.7 kb in rice and one per 4.0 kb in A. thaliana. In rice and A. thaliana, most of the CpG-rich clusters were associated with genes, which suggests that these clusters are useful landmarks in genome sequences for identifying genes in plants with small genomes. In contrast, in plants with larger genomes, only a few of the clusters were associated with genes. These plant CpG-rich clusters satisfied the criteria used for identifying human CpG islands, which suggests that these CpG clusters may be regarded as plant CpG islands. The position of each island relative to the 5'-end of its associated gene varied considerably. Genes in the analysed sequences were grouped into five classes according to the position of the CpG islands within their associated genes. A large proportion of the genes belonged to one of two classes, in which a CpG island occurred near the 5'-end of the gene or covered the whole gene region. The position of a plant CpG island within its associated gene appeared to be related to the extent of tissue-specific expression of the gene; the CpG islands of most of the widely expressed rice genes occurred near the 5'-end of the genes.
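The abstract does not spell out which criteria for human CpG islands were applied. As an illustration only, the sketch below scans a sequence with a sliding window and flags windows meeting thresholds commonly attributed to Gardiner-Garden and Frommer (window of at least 200 bp, G+C fraction of at least 0.5, observed/expected CpG of at least 0.6). These defaults are an assumption rather than values taken from the abstract, and the function name `cpg_island_windows` is hypothetical.

```python
def cpg_island_windows(seq: str, window: int = 200, step: int = 1,
                       min_gc: float = 0.5, min_oe: float = 0.6) -> list:
    """Return start offsets of windows that pass the GC and CpG o/e thresholds.

    The default thresholds are assumed, commonly cited values; adjust them
    to match whichever published criteria are being reproduced.
    """
    seq = seq.upper()
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        c, g = w.count("C"), w.count("G")
        cpg = w.count("CG")  # "CG" cannot overlap itself, so count() is exact
        gc = (c + g) / window
        oe = cpg * window / (c * g) if c and g else 0.0
        if gc >= min_gc and oe >= min_oe:
            hits.append(start)
    return hits
```

Merging overlapping hit windows into contiguous segments would be a natural next step toward the discrete CpG-rich clusters described above.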

13.
The structurally correlated dihedral angles epsilon and zeta are known for their large variability within the B-DNA backbone. We have used molecular modelling to study both energetic and mechanical features of these variations which can produce BI/BII transitions. Calculations were carried out on DNA oligomers containing either YpR or RpY dinucleotide steps within various sequence environments. The results indicate that CpA and CpG steps favour the BI/BII transition more than TpA or any RpY step. The stacking energy and its intra- and inter-strand components explain these effects. Analysis of neighbouring base pairs reveals that BI/BII transitions of CpG and CpA are easiest within (Y)n(R)n sequences. These can also induce a large vibrational amplitude for TpA steps within the BI conformation.

14.
Parvoviruses are rapidly evolving viruses that infect a wide range of hosts, including vertebrates and invertebrates. Extensive methylation of the parvovirus genome has been recently demonstrated. A global pattern of methylation of CpG dinucleotides is seen in vertebrate genomes, compared to “fractional” methylation patterns in invertebrate genomes. It remains unknown if the loss of CpG dinucleotides occurs in all viruses of a given DNA virus family that infect host species spanning across vertebrates and invertebrates. We investigated the link between the extent of CpG dinucleotide depletion among autonomous parvoviruses and the evolutionary lineage of the infected host. We demonstrate major differences in the relative abundance of CpG dinucleotides among autonomous parvoviruses which share similar genome organization and common ancestry, depending on the infected host species. Parvoviruses infecting vertebrate hosts had significantly lower relative abundance of CpG dinucleotides than parvoviruses infecting invertebrate hosts. The strong correlation of CpG dinucleotide depletion with the gain in TpG/CpA dinucleotides and the loss of TpA dinucleotides among parvoviruses suggests a major role for CpG methylation in the evolution of parvoviruses. Our data present evidence that links the relative abundance of CpG dinucleotides in parvoviruses to the methylation capabilities of the infected host. In sum, our findings support a novel perspective of host-driven evolution among autonomous parvoviruses.

15.
NMR evidence is presented indicating that the exceptional conformational dynamics found at TpA steps in DNA is general to all immediate sequence contexts. One easily tractable NMR parameter that is sensitive to TpA base dynamics is the resonance linewidth of the TpA adenine H2 proton. This resonance experiences a temperature-dependent broadening due to conformational dynamics. Unusual dynamics at TpA steps were originally observed in the sequence context (T)pTpTpApAp(A). We have since shown that the evidence for TpA dynamics persists when either the thymine preceding the TpA step or the adenine following the TpA step is preserved [McAteer et al., Nucleic Acids Res. 23, 3962-3966 (1995)]. Here, in order to establish whether or not exceptional TpA dynamics occurs in all DNA sequence contexts, we investigated a series of DNA sequences of the form GCNaTANbNbTANaGC, where N=A,T,C,G. In this family of sequences, all 16 possible immediate sequence context environments of the form NaTANb were examined using 10 DNA sequences. Our NMR results show that the TpA adenine H2 resonance contains a temperature-dependent excess linewidth indicative of dynamics in all 16 sequence context environments. By studying a complete set of sequence contexts, it was possible to recognize trends relating resonance parameters and sequence environment. For example, the magnitude of the maximum linewidth is largely determined by the identity of the nucleotide following the TpA step, and the magnitude of the linewidth maximum is moderately correlated (r=0.56) with the temperature of the linewidth maximum. The physical basis for these correlations is discussed.

16.
Depletion of CpG dinucleotides in severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) genomes has been linked to virus evolution, host-switching, virus replication, and innate immune responses. Temporal variations, if any, in the rate of CpG depletion during virus evolution in the host remain poorly understood. Here, we analyzed the CpG content of over 1.4 million full-length SARS-CoV-2 genomes representing over 170 million documented infections during the first 17 months of the pandemic. Our findings suggest that the extent of CpG depletion in SARS-CoV-2 genomes is modest. Interestingly, the rate of CpG depletion is highest during early evolution in humans and it gradually tapers off, almost reaching an equilibrium; this is consistent with adaptations to the human host. Furthermore, within the coding regions, CpG depletion occurs predominantly at codon positions 2-3 and 3-1. Loss of ZAP (Zinc-finger antiviral protein)-binding motifs in SARS-CoV-2 genomes is primarily driven by the loss of the terminal CpG within the motifs. Nonetheless, the majority of the CpG depletion in SARS-CoV-2 genomes occurs outside ZAP-binding motifs. SARS-CoV-2 genomes selectively lose CpG motifs from a U-rich context; this may help avoid immune recognition by TLR7. SARS-CoV-2 alpha-, beta-, and delta-variants of concern have reduced CpG content compared to sequences from the beginning of the pandemic. In sum, we provide evidence that the rate of CpG depletion in virus genomes is not uniform and it greatly varies over time and during adaptations to the host. This work highlights how temporal variations in selection pressures during virus adaptation may impact the rate and the extent of CpG depletion in virus genomes.
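The distinction between CpGs at codon positions 1-2, 2-3 and 3-1 (the last spanning two adjacent codons) can be made concrete with a small sketch. It assumes an in-frame coding sequence beginning at codon position 1; the function name `cpg_by_codon_position` is hypothetical and the code is not taken from the study.

```python
def cpg_by_codon_position(cds: str) -> dict:
    """Count CpG dinucleotides by the codon positions of their C and G.

    '1-2' and '2-3' lie within one codon; '3-1' spans two adjacent codons.
    Assumes cds starts at codon position 1.
    """
    cds = cds.upper()
    labels = {0: "1-2", 1: "2-3", 2: "3-1"}
    counts = {label: 0 for label in labels.values()}
    for i in range(len(cds) - 1):
        if cds[i:i + 2] == "CG":
            # i % 3 is the zero-based codon position of the C.
            counts[labels[i % 3]] += 1
    return counts
```

Comparing these counts between early and later genome samples is one simple way to see where in the reading frame CpG loss is concentrated.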

17.
It has been reported earlier that the relative di-nucleotide frequency (RDF) in different parts of a genome is similar while the frequency is variable among different genomes. RDF is therefore termed the genome signature in bacteria. It is not known if the constancy in RDF is governed by genome-wide mutational bias or by selection. Here we performed a comparative analysis of RDF between the inter-genic and the coding sequences in seventeen bacterial genomes whose gene expression data were available. The constraint on di-nucleotides was found to be higher in the coding sequences than in the inter-genic regions, and the constraint at the 2nd codon position was greater than that at the 3rd position within a genome. Further analysis revealed that the constraint on di-nucleotides at the 2nd codon position is greater in the high expression genes (HEG) than in the whole genomes as well as in the low expression genes (LEG). We analyzed RDF at the 2nd and the 3rd codon positions in simulated coding sequences that were computationally generated by keeping the codon usage bias (CUB) according to genome G+C composition and the sequence of amino acids unaltered. In the simulated coding sequences, the constraint observed was significantly low and no significant difference was observed between the HEG and the LEG in terms of di-nucleotide constraint. This indicated that the greater constraint on di-nucleotides in the HEG was due to the stronger selection on CUB in these genes in comparison to the LEG within a genome. Further, we performed comparative analyses of the RDF in the HEG rpoB and rpoC of 199 bacteria, which revealed a common pattern of constraints on di-nucleotides at the 2nd codon position across these bacteria. To validate the role of CUB in di-nucleotide constraint, we analyzed RDF at the 2nd and the 3rd codon positions in simulated rpoB/rpoC sequences. The analysis revealed that selection on CUB is an important attribute for the constraint on di-nucleotides at these positions in bacterial genomes. We believe that this study provides major findings on the role of CUB in di-nucleotide constraint in bacterial genomes.

18.
19.
Compositional distributions in three different codon positions as well as codon usage biases of all available DNA sequences of the Buchnera aphidicola genome have been analyzed. It was observed that the GC levels of the three codon positions follow the order I>II>III, as observed in other extremely AT-rich organisms. B. aphidicola, being an AT-rich organism, is expected to have A and/or T at the third positions of codons. Overall codon usage analyses indicate that A- and/or T-ending codons are predominant in this organism and that some particular amino acids are abundant in the coding region of genes. However, multivariate statistical analysis indicates two major trends in the codon usage variation among the genes: one being strongly correlated with the GC contents at the third synonymous positions of codons, and the other being associated with the expression level of genes. Moreover, codon usage biases of the highly expressed genes are almost identical with the overall codon usage biases of all the genes of this organism. These observations suggest that mutational bias is the main factor in determining the codon usage variation among the genes in B. aphidicola.

20.
Adenine nucleotides have been found to appear preferentially in the regions after the initiation codons or before the termination codons of bacterial genes. Our previous experiments showed that AAA and AAT, the two most frequent second codons in Escherichia coli, significantly enhance translation efficiency. To determine whether such a characteristic feature of base frequencies exists in eukaryote genes, we performed a comparative analysis of the base biases at the gene terminal portions using the proteomes of seven eukaryotes. Here we show that the base appearance at the codon third positions of gene terminal regions is highly biased in eukaryote genomes, although the codon third positions are almost free from amino acid preference. The bias changes depending on its position in a gene, and is characteristic of each species. We also found that bias is most outstanding at the second codon, the codon after the initiation codon. NCN is preferred in every genome; in particular, GCG is strongly favored in human and plant genes. The presence of the bias implies that the base sequences at the second codon affect translation efficiency in eukaryotes as well as bacteria.
