Found 20 similar articles; search took 15 ms
1.
Intrastrand parity rules of DNA base composition and usage biases of synonymous codons (Total citations: 10; self-citations: 0; citations by others: 10)
Noboru Sueoka. Journal of molecular evolution, 1995, 40(3): 318-325
2.
Near Homogeneity of PR2-Bias Fingerprints in the Human Genome and Their Implications in Phylogenetic Analyses (Total citations: 1; self-citations: 0; citations by others: 1)
Noboru Sueoka. Journal of molecular evolution, 2001, 53(4-5): 469-476
Genes of a multicellular organism are heterogeneous in the G+C content, which is particularly true in the third codon position.
The extent of deviation from intra-strand equality rule of A = T and G = C (Parity Rule 2, or PR2) is specific for individual amino acids and has been expressed as the PR2-bias fingerprint. Previous
results suggested that the PR2-bias fingerprints tend to be similar among the genes of an organism, and the fingerprint of
the organism is specific for different taxa, reflecting phylogenetic relationships of organisms. In this study, using coding
sequences of a large number of human genes, we examined the intragenomic heterogeneity of their PR2-bias fingerprints in relation
to the G+C content of the third codon position (P3). The results show that the PR2-bias fingerprint is similar over a wide range of the G+C content at the third codon position
(0.30–0.80). This range covers approximately 89% of the genes, and further analysis of the high G+C range (0.80–1.00), where
genes with normal PR2-bias fingerprints and those with anomalous fingerprints are mixed, shows that a total of 95% of genes
have similar fingerprints. The result indicates that the PR2-bias fingerprint is a unique property of an organism and
represents the overall characteristics of the genome. Combined with the previous results that the evolutionary change of the
PR2-bias fingerprint is a slow process, PR2-bias fingerprints may be used for the phylogenetic analyses to supplement and
augment the conventional methods that use the differences of the sequences of orthologous proteins and nucleic acids. Potential
advantages and disadvantages of the PR2-bias fingerprint analysis are discussed.
Received: 21 December 2000 / Accepted: 16 February 2001
3.
The human genome, as in other eukaryotes, has a wide heterogeneity in the DNA base composition. The evolutionary basis for this heterogeneity has been unknown. A previous study of the human genome (846 genes analyzed) has shown that, in the major range of the G+C content in the third codon position (0.25-0.75), biases from the Parity Rule 2 (PR2) among the synonymous codons of the four-codon amino acids are similar except in the highest G+C range (Sueoka, N., 1999. Translation-coupled violation of Parity Rule 2 in human genes is not the cause of heterogeneity of the DNA G+C content of third codon position. Gene 238, 53-58.). PR2 is an intra-strand rule where A=T and G=C are expected when there are no biases between the two complementary strands of DNA in mutation and selection rates (substitution rates). In this study, 14,026 human genes were analyzed. In addition, the third codon positions of two-codon amino acids were analyzed. New results show the following: (a) The G+C contents of the third codon position of human genes are scattered in the G+C range of 0.22-0.96 in the third codon position. (b) The PR2 biases are similar in the range of 0.25-0.75, whereas, in the high G+C range (0.75-0.96; 13% of the genes), the PR2-bias fingerprints are different from those of the major range. (c) Unlike the PR2 biases, the G+C contents of the third codon position for both four-codon and two-codon amino acids are all correlated almost perfectly with the G+C content of the third codon position over the total G+C ranges. These results support the notion that the directional mutation pressure, rather than the directional selection pressure, is mainly responsible for the heterogeneity of the G+C content of the third codon position.
4.
The intra-strand Parity Rule 2 of DNA (PR2) states that A=T and G=C within each strand. Useful corollaries of PR2 are G/(G+C)=A/(A+T)=0.5, G/(G+A)=C/(C+T)=G+C, G/(G+T)=C/(C+A)=G+C. Here, A, T, G, and C represent relative contents of the four nucleotide residues in a specific strand of DNA, so that A+T+G+C=1. Thus, deviations from PR2 are a sign of strand-specific (or asymmetric) mutation and/or selection pressures. The present study delineates the symmetric and asymmetric effects of mutations on the intra-genomic heterogeneity of the G+C content in the human genome. The results of this study on the human genome are: (1) When both two- and four-codon amino acids were combined, only slight departures from the PR2 were observed in the total ranges of G+C content of the third-codon position. Thus, the G+C heterogeneity is likely to be caused by symmetric mutagenesis between the two strands. (2) The above result makes the deamination of cytosine due to double-strand breathing of DNA [Mol. Biol. Evol. 17 (2000) 1371] and/or incorporation of the oxidized guanine (8-oxo-guanine) opposite adenine during DNA replication (dGTP-oxidation hypothesis) the most likely candidates for the major cause of the diversities of the G+C content. (3) Patterns of amino acid-specific PR2-biases detected by plotting PR2 corollaries against the G+C content of third codon position revealed that eight four-codon amino acids can be divided into three types by the second codon letter: (a) C2-type (Ala, Pro, Ser4, and Thr), (b) G2-type (Arg4 and Gly), and (c) T2-type (Leu4 and Val). (4) Most of the asymmetric plot patterns of the above three classes in PR2 biases can be explained by C2→T2 deamination of C2pG3 of C2-type to T2pG3 (T2-type) in both human and chicken. This explains the existence of some preferred codons in human and chicken. However, these biases (asymmetric) hardly contribute to the overall G+C content diversity of the third codon position.
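As an illustration of the PR2 corollaries quoted in this abstract, the following is a minimal Python sketch that computes the six ratios for a single strand. The function names `base_fractions` and `pr2_ratios` are hypothetical and do not come from the paper; the sketch only restates the arithmetic. Under PR2 (A=T, G=C) the first two ratios equal 0.5 and the remaining four equal the G+C content.

```python
from collections import Counter


def base_fractions(seq):
    """Relative contents of A, T, G, C in one strand, so that A+T+G+C = 1."""
    counts = Counter(b for b in seq.upper() if b in "ATGC")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/T/G/C bases")
    return {b: counts[b] / total for b in "ATGC"}


def _ratio(numerator, denominator):
    """Safe division; returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def pr2_ratios(seq):
    """Return the six PR2 corollary ratios plus the G+C content (hypothetical helper)."""
    f = base_fractions(seq)
    a, t, g, c = f["A"], f["T"], f["G"], f["C"]
    return {
        "G+C": g + c,
        "G/(G+C)": _ratio(g, g + c),
        "A/(A+T)": _ratio(a, a + t),
        "G/(G+A)": _ratio(g, g + a),
        "C/(C+T)": _ratio(c, c + t),
        "G/(G+T)": _ratio(g, g + t),
        "C/(C+A)": _ratio(c, c + a),
    }


if __name__ == "__main__":
    # A toy strand that satisfies PR2 exactly: A=T=2, G=C=3.
    for name, value in pr2_ratios("AATTGGGCCC").items():
        print(f"{name}: {value:.3f}")
```

Departures of the first two ratios from 0.5, or of the last four from the G+C content, would then indicate a strand-specific bias in the sense described above.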
5.
Jenkins GM, Pagel M, Gould EA, de A Zanotto PM, Holmes EC. Journal of molecular evolution, 2001, 52(4): 383-390
The extent to which base composition and codon usage vary among RNA viruses, and the possible causes of this bias, is undetermined
in most cases. A maximum-likelihood statistical method was used to test whether base composition and codon usage bias covary
with arthropod association in the genus Flavivirus, a major source of disease in humans and animals. Flaviviruses are transmitted by mosquitoes, by ticks, or directly between
vertebrate hosts. Those viruses associated with ticks were found to have a significantly lower G+C content than non-vector-borne
flaviviruses and this difference was present throughout the genome at all amino acids and codon positions. In contrast, mosquito-borne
viruses had an intermediate G+C content which was not significantly different from those of the other two groups. In addition,
biases in dinucleotide and codon usage that were independent of base composition were detected in all flaviviruses, but these
did not covary with arthropod association. However, the overall effect of these biases was slight, suggesting only weak selection
at synonymous sites. A preliminary analysis of base composition, codon usage, and vector specificity in other RNA virus families
also revealed a possible association between base composition and vector specificity, although with biases different from
those seen in the Flavivirus genus.
Received: 29 August 2000 / Accepted: 19 December 2000
6.
Matthew Bellgard, David Schibeci, Edward Trifonov, Takashi Gojobori. Journal of molecular evolution, 2001, 53(4-5): 465-468
Identifying the G + C difference between closely related bacterial species or between different strains of the same species
is one of the first steps in understanding the evolutionary mechanisms accounting for the differences observed among bacterial
species. The G + C content can be one of the most important factors in the evolution of genomic structures. In this paper,
we describe a new method for detecting an initial stage of differentiation of the G + C content at the third codon base position
between two strains of the same bacterial species. We apply this method to the two strains of Helicobacter pylori. A group of genes is detected with large variations of G + C in the third positions—apparently genes of early response to
pressures of changing G + C. We discuss our findings from the viewpoint of genomic evolution.
Received: 26 February 2001 / Accepted: 16 May 2001
7.
Abdelaziz Heddi, Hubert Charles, Chaqué Khatchadourian, Guy Bonnot, Paul Nardon. Journal of molecular evolution, 1998, 47(1): 52-61
The principal intracellular symbiotic bacteria of the cereal weevil Sitophilus oryzae were characterized using the sequence of the 16S rDNA gene (rrs gene) and G + C content analysis. Polymerase chain reaction amplification with universal eubacterial primers of the rrs gene showed a single expected sequence of 1,501 bp. Comparison of this sequence with the available database sequences placed
the intracellular bacteria of S. oryzae as members of the Enterobacteriaceae family, closely related to the free-living bacteria, Erwinia herbicola and Escherichia coli, and the endocytobiotic bacteria of the tsetse fly and aphids. Moreover, by high-performance liquid chromatography, we measured
the genomic G + C content of the S. oryzae principal endocytobiotes (SOPE) as 54%, while the known genomic G + C content of most intracellular bacteria is about 39.5%.
Furthermore, based on the third codon position G + C content and the rrs gene G + C content, we demonstrated that most intracellular bacteria except SOPE are A + T biased irrespective of their phylogenetic
position. Finally, using the hsp60 gene sequence, the codon usage of SOPE was compared with that of two phylogenetically closely related bacteria: E. coli, a free-living bacterium, and Buchnera aphidicola, the intracellular symbiotic bacteria of aphids. Taken together, these results show a peculiar and distinctly different DNA
composition of SOPE with respect to the other obligate intracellular bacteria, and, combined with biological and biochemical
data, they elucidate the evolution of symbiosis in S. oryzae.
Received: 8 September 1997 / Accepted: 24 October 1997
8.
We compared the codon usage of sequences of transposable elements (TEs) with that of host genes from the species Drosophila melanogaster, Arabidopsis thaliana, Caenorhabditis elegans, Saccharomyces cerevisiae, and Homo sapiens. Factorial correspondence analysis showed that, regardless of the base composition of the genome, the TEs differed from the
genes of their host species by their AT-richness. In all species, the percentage of A + T on the third codon position of the
TEs was higher than that on the first codon position and lower than that in the noncoding DNA of the genomes. This indicates
that the codon choice is not simply the outcome of mutational bias but is also subject to selection constraints. A tendency
toward higher A + T on the third position than on the first position was also found in the host genes of A. thaliana, C. elegans, and S. cerevisiae but not in those of D. melanogaster and H. sapiens. This strongly suggests that the AT choice is a host-independent characteristic common to all TEs. The codon usage of TEs
generally appeared to be different from the mean of the host genes. In the AT-rich genomes of Arabidopsis thaliana, Caenorhabditis elegans, and Saccharomyces cerevisiae, the codon usage bias of TEs was similar to that of weakly expressed genes. In the GC-rich genome of D. melanogaster, however, the bias in codon usage of the TEs clearly differed from that of weakly expressed genes. These findings suggest
that selection acts on TEs and that TEs may display specific behavior within the host genomes.
Received: 2 May 2001 / Accepted: 29 October 2001
9.
Base Composition Skews, Replication Orientation, and Gene Orientation in 12 Prokaryote Genomes (Total citations: 21; self-citations: 0; citations by others: 21)
Michael J. McLean, Kenneth H. Wolfe, Kevin M. Devine. Journal of molecular evolution, 1998, 47(6): 691-696
Variation in GC content, GC skew and AT skew along genomic regions was examined at third codon positions in completely sequenced
prokaryotes. Eight out of nine eubacteria studied show GC and AT skews that change sign at the origin of replication. The
leading strand in DNA replication is G-T rich at codon position 3 in six eubacteria, but C-T rich in two Mycoplasma species. In M. genitalium the AT and GC skews are symmetrical around the origin and terminus of replication, whereas its GC content variation has been
shown to have a centre of symmetry elsewhere in the genome. Borrelia burgdorferi and Treponema pallidum show extraordinary extents of base composition skew correlated with direction of DNA replication. Base composition skews
measured at third codon positions probably reflect mutational biases, whereas those measured over all bases in a sequence
(or at codon positions 1 and 2) can be strongly affected by protein considerations due to the tendency in some bacteria for
genes to be transcribed in the same direction that they are replicated. Consequently in some species the direction of skew
for total genomic DNA is opposite to that for codon position 3.
Received: 2 February 1998 / Accepted: 15 June 1998
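The abstract does not spell out how the skews were computed. A minimal Python sketch, assuming the commonly used definitions GC skew = (G−C)/(G+C) and AT skew = (A−T)/(A+T) applied to codon position 3 of an in-frame coding sequence, might look like this. The helper names `third_positions` and `skews` are hypothetical.

```python
def third_positions(cds):
    """Bases at codon position 3 of an in-frame coding sequence.

    Any trailing partial codon is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return cds[2:usable:3]


def skews(bases):
    """Return (GC skew, AT skew) for a string of bases.

    Assumed definitions: (G-C)/(G+C) and (A-T)/(A+T).
    NaN is returned for a skew whose denominator is zero.
    """
    g, c, a, t = (bases.count(x) for x in "GCAT")
    gc_skew = (g - c) / (g + c) if g + c else float("nan")
    at_skew = (a - t) / (a + t) if a + t else float("nan")
    return gc_skew, at_skew


if __name__ == "__main__":
    # Toy coding sequence: codons ATG GCG TTG AAT, third positions G, G, G, T.
    gc, at = skews(third_positions("ATGGCGTTGAAT"))
    print(f"GC3 skew = {gc:+.2f}, AT3 skew = {at:+.2f}")
```

Computing these values gene by gene along a chromosome and looking for the point where they change sign is one way to visualize the replication-associated pattern the abstract describes.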
10.
Base composition is not uniform across the genome of Drosophila melanogaster. Earlier analyses have suggested that there is variation in composition in D. melanogaster on both a large scale and a much smaller, within-gene, scale. Here we present analyses on 117 genes which have reliable intron/exon
boundaries and no known alternative splicing. We detect significant heterogeneity in G+C content among intron segments from
the same gene, as well as a significant positive correlation between the intron and the third codon position G+C content within
genes. Both of these observations appear to be due, in part, to an overall decline in intron and third codon position G+C
content along Drosophila genes with introns. However, there is also evidence of an increase in third codon position G+C content at the start of genes;
this is particularly evident in genes without introns. This is consistent with selection acting against preferred codons at
the start of genes.
Received: 24 February 1997 / Accepted: 10 November 1997
11.
Nucleotide Composition Bias Affects Amino Acid Content in Proteins Coded by Animal Mitochondria (Total citations: 16; self-citations: 0; citations by others: 16)
We show that in animal mitochondria homologous genes that differ in guanine plus cytosine (G + C) content code for proteins
differing in amino acid content in a manner that relates to the G + C content of the codons. DNA sequences were analyzed using
square plots, a new method that combines graphical visualization and statistical analysis of compositional differences in
both DNA and protein. Square plots divide codons into four groups based on first and second position A + T (adenine plus thymine)
and G + C content and indicate differences in amino acid content when comparing sequences that differ in G + C content. When
sequences are compared using these plots, the amino acid content is shown to correlate with the nucleotide bias of the genes.
This amino acid effect is shown in all protein-coding genes in the mitochondrial genome, including cox I, cox II, and cyt b, mitochondrial genes which are commonly used for phylogenetic studies. Furthermore, nucleotide content differences are shown
to affect the content of all amino acids with A + T- and G + C-rich codons. We speculate that phylogenetic analysis of genes
so affected may tend erroneously to indicate relatedness (or lack thereof) based only on amino acid content.
Received: 3 July 1996 / Accepted: 6 November 1996
12.
Amino acid residues arginine (R) and lysine (K) have similar physicochemical characteristics and are often mutually substituted
during evolution without affecting protein function. Statistical examinations on human proteins show that more R than K residues
are used in the proximity of R residues, whereas more K than R are used near K residues. This biased use occurs on both a
global and a local scale (shorter than ∼100 residues). Even within a given exon, G + C-rich and A + T-rich short DNA segments
preferentially encode R and K, respectively. The biased use of R and K on a local scale is also seen in Saccharomyces cerevisiae and Caenorhabditis elegans, which lack global-scale mosaic structures with varying GC%, or isochores. Besides R and K, several amino acids are also used
with a positive or negative correlation with the local GC% of third codon bases. The local-, or "within-gene"-, scale heterogeneity
of the DNA sequence may influence the sequence of the encoded protein segment.
Received: 2 March 1998 / Accepted: 23 April 1998
13.
In many unicellular organisms, invertebrates, and plants, synonymous codon usage biases result from a coadaptation between
codon usage and tRNA abundance to optimize the efficiency of protein synthesis. However, it remains unclear whether natural
selection acts at the level of the speed or the accuracy of mRNA translation. Here we show that codon usage can improve the
fidelity of protein synthesis in multicellular species. As predicted by the model of selection for translational accuracy,
we find that the frequency of codons optimal for translation is significantly higher at codons encoding conserved amino
acids than at codons encoding nonconserved amino acids in 548 genes compared between Caenorhabditis elegans and Homo sapiens. Although this model predicts that codon bias correlates positively with gene length, a negative correlation between codon
bias and gene length has been observed in eukaryotes. This suggests that selection for fidelity of protein synthesis is not
the main factor responsible for codon biases. The relationship between codon bias and gene length remains unexplained. Exploring
the differences in gene expression process in eukaryotes and prokaryotes should provide new insights to understand this key
question of codon usage.
Received: 18 June 2000 / Accepted: 10 November 2000
14.
In bacteria, synonymous codon usage can be considerably affected by base composition at neighboring sites. Such context-dependent
biases may be caused by either selection against specific nucleotide motifs or context-dependent mutation biases. Here we
consider the evolutionary conservation of context-dependent codon bias across 11 completely sequenced bacterial genomes. In
particular, we focus on two contextual biases previously identified in Escherichia coli: the avoidance of out-of-frame stop codons and AGG motifs. By identifying homologues of E. coli genes, we also investigate the effect of gene expression level in Haemophilus influenzae and Mycoplasma genitalium. We find that while context-dependent codon biases are widespread in bacteria, few are conserved across all species considered.
Avoidance of out-of-frame stop codons does not apply to all stop codons or amino acids in E. coli, does not hold for different species, does not increase with gene expression level, and is not relaxed in Mycoplasma spp., in which the canonical stop codon, TGA, is recognized as tryptophan. Avoidance of AGG motifs shows some evolutionary
conservation and increases with gene expression level in E. coli, suggestive of the action of selection, but the cause of the bias differs between species. These results demonstrate that
strong context-dependent forces, both selective and mutational, operate on synonymous codon usage but that these differ considerably
between genomes.
Received: 6 May 1999 / Accepted: 29 October 1999
15.
Relationships Between Genomic G+C Content, RNA Secondary Structures, and Optimal Growth Temperature in Prokaryotes (Total citations: 11; self-citations: 0; citations by others: 11)
G:C pairs are more stable than A:T pairs because they have an additional hydrogen bond. This has led to many studies on the
correlation between the guanine+cytosine (G+C) content of nucleic acids and temperature over the last 20 years. We collected
the optimal growth temperatures (Topt) and the G+C contents of genomic DNA; 23S, 16S, and 5S ribosomal RNAs; and transfer RNAs for 764 prokaryotic species. No
correlation was found between genomic G+C content and Topt, but there were striking correlations between the G+C content of ribosomal and transfer RNA stems and Topt. Two explanations have been proposed—neutral evolution and selection pressure—for the approximate equalities of G and C (respectively,
A and T) contents within each strand of DNA molecules. Our results do not support the notion that selection pressure induces
complementary oligonucleotides in close proximity and therefore numerous secondary structures in prokaryotic DNA, as the genomic
G+C content does not behave in the same way as that of folded RNA with respect to optimal growth temperature.
Received: 25 September 1996 / Accepted: 21 January 1997
16.
Codon Usage in Plastid Genes Is Correlated with Context, Position Within the Gene, and Amino Acid Content (Total citations: 5; self-citations: 0; citations by others: 5)
Highly expressed plastid genes display codon adaptation, which is defined as a bias toward a set of codons which are complementary
to abundant tRNAs. This type of adaptation is similar to what is observed in highly expressed Escherichia coli genes and is probably the result of selection to increase translation efficiency. In the current work, the codon adaptation
of plastid genes is studied with regard to three specific features that have been observed in E. coli and which may influence translation efficiency. These features are (1) a relatively low codon adaptation at the 5′ end of
highly expressed genes, (2) an influence of neighboring codons on codon usage at a particular site (codon context), and (3)
a correlation between the level of codon adaptation of a gene and its amino acid content. All three features are found in
plastid genes. First, highly expressed plastid genes have a noticeable decrease in codon adaptation over the first 10–20 codons.
Second, for the twofold degenerate NNY codon groups, highly expressed genes have an overall bias toward the NNC codon, but
this is not observed when the 3′ neighboring base is a G. At these sites highly expressed genes are biased toward NNT instead
of NNC. Third, plastid genes that have higher codon adaptations also tend to have an increased usage of amino acids with a
high G + C content at the first two codon positions and GNN codons in particular. The correlation between codon adaptation
and amino acid content exists separately for both cytosolic and membrane proteins and is not related to any obvious functional
property. It is suggested that at certain sites selection discriminates between nonsynonymous codons based on translational,
not functional, differences, with the result that the amino acid sequence of highly expressed proteins is partially influenced
by selection for increased translation efficiency.
Received: 21 July 1999 / Accepted: 5 November 1999
17.
A+T content, phylogenetic relationships, codon usage, evolutionary rates, and ratio of synonymous versus non-synonymous substitutions
have been studied in partial sequences of the atpD and aroQ/pheA genes of primary (Buchnera) and secondary symbionts of aphids and a set of selected non-symbiotic bacteria, belonging to the five subdivisions of the
Proteobacteria. Compared to the homologous genes of the last group, both genes belonging to Buchnera behave in a similar way, showing a higher A+T content, forming a monophyletic group, a loss in codon bias, especially in
third base position, an evolutionary acceleration and an increase in the number of non-synonymous substitutions, confirming
previous results reported elsewhere for other genes. When available, these properties have been partly observed with the secondary
symbionts, but with values that are intermediate between Buchnera and free-living Proteobacteria. They show high A+T content, but not as high as Buchnera, an unresolved phylogenetic position between Buchnera and the other γ-Proteobacteria, a loss in codon bias, again not as high as in Buchnera, and a significant evolutionary acceleration in the case of the three atpD genes, but not when considering aroQ/pheA genes. These results give support to the hypothesis that they are symbionts at different stages of the symbiotic accommodation
to the host.
18.
Along the gene, nucleotides in various codon positions tend to exert a slight but observable influence on the nucleotide
choice at neighboring positions. Such context biases are different in different organisms and can be used as genomic signatures.
In this paper, we will focus specifically on the dinucleotide composed of a third codon position nucleotide and its succeeding
first position nucleotide. Using the 16 possible dinucleotide combinations, we calculate how well individual genes conform
to the observed mean dinucleotide frequencies of an entire genome, forming a distance measure for each gene. It is found that
genes from different genomes can be separated with a high degree of accuracy, according to these distance values.
In particular, we address the problem of recent horizontal gene transfer, and how imported genes may be evaluated by their
poor assimilation to the host's context biases. By concentrating on the third- and succeeding first position nucleotides,
we eliminate most spurious contributions from codon usage and amino-acid requirements, focusing mainly on mutational effects.
Since imported genes are expected to converge only gradually to genomic signatures, it is possible to question whether a gene
present in only one of two closely related organisms has been imported into one organism or deleted in the other. Striking
correlations between the proposed distance measure and poor homology are observed when Escherichia coli genes are compared to Salmonella typhi, indicating that sets of outlier genes in E. coli may contain a high number of genes that have been imported into E. coli, and not deleted in S. typhi.
Received: 16 January 2001 / Accepted: 30 August 2001
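The abstract describes a per-gene distance from the genome-wide frequencies of the 16 dinucleotides formed by a third codon position and the following first codon position, but it does not give the formula. The sketch below is one possible reading, with two labeled assumptions: the genome profile is obtained by pooling counts over all genes, and the distance is Euclidean. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# The 16 possible dinucleotides, in a fixed order.
DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]


def context_counts(cds):
    """Count dinucleotides made of codon position 3 and the next codon's position 1."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    pairs = (cds[i:i + 2] for i in range(2, usable - 1, 3))
    return Counter(p for p in pairs if p in DINUCS)


def to_freqs(counts):
    """Convert dinucleotide counts to relative frequencies over the 16 classes."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no usable third/first-position dinucleotides")
    return {d: counts[d] / total for d in DINUCS}


def genome_profile(genes):
    """Assumption: genome-wide frequencies are pooled counts over all genes."""
    pooled = Counter()
    for cds in genes:
        pooled.update(context_counts(cds))
    return to_freqs(pooled)


def gene_distance(cds, profile):
    """Assumption: Euclidean distance between a gene's frequencies and the profile."""
    freqs = to_freqs(context_counts(cds))
    return math.sqrt(sum((freqs[d] - profile[d]) ** 2 for d in DINUCS))


if __name__ == "__main__":
    genes = ["ATGGCATTA", "ATGGCGTTC", "ATAACAGGA"]  # toy sequences
    profile = genome_profile(genes)
    for g in genes:
        print(g, round(gene_distance(g, profile), 3))
```

Genes with unusually large distances would be the outlier candidates in the sense of the abstract; any threshold for calling a gene an outlier would have to come from the paper itself.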
19.
Analysis of DNA sequences of 132 introns and 140 exons from 42 pairs of orthologous genes of mouse and rat was used to compare
patterns of evolutionary change between introns and exons. The mean of the absolute difference in length (measured in base
pairs) between the two species was nearly five times as high in the case of introns as in the case of exons. The average rate
of nucleotide substitution in introns was very similar to the rate of synonymous substitution in exons, and both were about
three times the rate of substitution at nonsynonymous sites in exons. G+C content of introns and exons of the same gene were
correlated; but mean G+C content at the third positions of exons was significantly higher than that of introns or positions
1–2 of exons from the same gene. G+C content was conserved over evolutionary time, as indicated by strong correlations between
mouse and rat; but the change in G+C content was greatest at position 3 of exons, intermediate in introns, and lowest at positions
1–2 of exons.
Received: 23 December 1996 / Accepted: 1 April 1997
20.
Characteristics of Nucleotide Substitution in the Hepatitis C Virus Genome: Constraints on Sequence Change in Coding Regions at Both Ends of the Genome (Total citations: 19; self-citations: 0; citations by others: 19)
Comparison of complete genome sequences for different variants of hepatitis C virus (HCV) reveals several different constraints
on sequence change. Synonymous changes are suppressed in coding regions at both 5′ and 3′ ends of the genome. No evidence
was found for the existence of alternative reading frames or for a lower mutation frequency in these regions. Instead, suppression
may be due to constraints imposed by RNA secondary structures identified within the core and NS5b genes. Nonsynonymous substitutions
are less frequent than synonymous ones except in the hypervariable region of E2 and, to a lesser extent, in E1, NS2, and NS5b.
Transitions are more frequent than transversions, particularly at the third position of codons where the bias is 16:1. In
addition, nucleotide substitutions may not occur symmetrically since there is a bias toward G or C at the third position of
codons, while T ↔ C transitions were twice as frequent as A ↔ G transitions. These different biases do not affect the phylogenetic
analysis of HCV variants but need to be taken into account in interpreting sequence change in longitudinal studies.
Received: 9 September 1996 / Accepted: 20 April 1997
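To make the transition/transversion comparison at third codon positions concrete, here is a minimal Python sketch that counts both kinds of difference between two aligned, in-frame coding sequences of equal length. It is a simple pairwise count, not the method used in the paper, and the function names are hypothetical. U is treated as T so that RNA sequences can be passed in directly.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(x, y):
    """Classify a base difference as 'transition', 'transversion', or None if identical."""
    if x == y:
        return None
    pair = {x, y}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"  # purine<->purine or pyrimidine<->pyrimidine
    return "transversion"    # purine<->pyrimidine


def third_position_changes(seq_a, seq_b):
    """Count transitions and transversions at codon position 3 of two aligned sequences.

    Positions where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    a = seq_a.upper().replace("U", "T")
    b = seq_b.upper().replace("U", "T")
    counts = {"transition": 0, "transversion": 0}
    for i in range(2, len(a), 3):
        x, y = a[i], b[i]
        if x not in "ACGT" or y not in "ACGT":
            continue
        kind = classify(x, y)
        if kind is not None:
            counts[kind] += 1
    return counts


if __name__ == "__main__":
    # Toy alignment: differences at third positions are G/A, A/G and A/C.
    print(third_position_changes("ATGGCATTA", "ATAGCGTTC"))
```

The ratio of the two counts gives a crude transition:transversion bias for a pair of sequences; it ignores multiple substitutions at the same site, which a proper evolutionary model would correct for.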