Similar articles
 Found 20 similar articles (search time: 156 ms)
1.
We present theoretical considerations that suggest that synonymous-codon usage might be expected to be close to an equilibrium distribution given a very homogeneous process of silent substitution. By homogeneous we mean that substitution depends only on the two bases involved, so that 12 base-substitution rates completely describe the silent substitution process. We have developed a method of statistically testing for such homogeneous equilibrium and applied it to reported data on the codon usages of different classes of organisms. Weakly expressed bacterial sequences and both mammalian and nonmammalian eukaryotic sequences deviate significantly from a random pattern of codon usage, in the direction of homogeneous equilibrium. On the other hand, highly expressed bacterial sequences do not exhibit homogeneous equilibrium, which may be correlated with recent experimental results showing that they are optimized to accept the most abundant tRNAs. To examine the effect of amino acid replacements on the homogeneous model of silent substitution, we divided the amino acids with degenerate codes into two classes, those with high mutabilities and those with low, and performed the same analysis on bacterial and eukaryotic data sets. The codon sets of the highly mutable class of amino acids are not further from homogeneous equilibrium than are the codon sets of the class with low mutabilities. We also found for the eukaryotic data that these independent classes of codon sets show very similar equilibrium patterns. The various results suggest a high level of uniformity in the process of silent fixation in the different synonymous-codon sets, especially in eukaryotes.
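
The idea that 12 base-substitution rates fix an equilibrium base composition can be illustrated with a small sketch. This is not the statistical test described in the abstract; it only shows, under the assumption of a 4-state continuous-time Markov chain, how an equilibrium distribution follows from the rates. The function names and the example rates are hypothetical.

```python
BASES = "ACGT"

def rate_matrix(rates):
    """Build a 4x4 generator matrix Q from 12 off-diagonal rates.

    `rates` maps (from_base, to_base) to a non-negative rate.
    Each diagonal entry is set so that its row sums to zero.
    """
    q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            if i != j:
                q[i][j] = rates[(a, b)]
        q[i][i] = -sum(q[i])
    return q

def equilibrium(q, tol=1e-12, max_iter=100_000):
    """Approximate the stationary distribution pi with pi * Q = 0.

    Iterates pi <- pi * (I + step * Q); the step is chosen so that
    I + step * Q is a valid stochastic matrix with a positive diagonal.
    Assumes every base can eventually be reached from every other base.
    """
    step = 0.5 / max(-q[i][i] for i in range(4))
    pi = [0.25] * 4
    for _ in range(max_iter):
        new = [pi[j] + step * sum(pi[i] * q[i][j] for i in range(4))
               for j in range(4)]
        if max(abs(n - p) for n, p in zip(new, pi)) < tol:
            return new
        pi = new
    return pi

# Hypothetical rates: the rate of a -> b depends only on the target base b.
target = {"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4}
example_rates = {(a, b): target[b] for a in BASES for b in BASES if a != b}
example_pi = equilibrium(rate_matrix(example_rates))
```

With rates that depend only on the target base, the equilibrium frequencies equal the target frequencies, which makes the sketch easy to check by hand.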

2.
We study the equilibrium in the use of synonymous codons by eukaryotic organisms and find five equations involving substitution rates that we believe embody the important implications of equilibrium for the process of silent substitution. We then combine these five equations with additional criteria to determine sets of substitution rates applicable to eukaryotic organisms. One method employs the equilibrium equations and a principle of maximum entropy to find the most uniform set of rates consistent with equilibrium. In a second method we combine the equilibrium equations with data on the man-mouse divergence to determine that set of rates that is most neutral yet consistent with both types of data (i.e., equilibrium and divergence data). Simulations show this second method to be quite reliable in spite of significant saturation in the substitution process. We find that when divergence data are included in the calculation of rates, even though these rates are chosen to be as neutral as possible, the strength of selection inferred from the nonuniformity of the rates is approximately doubled. Both sets of rates are applied to estimate the human-mouse divergence time based on several independent subsets of the divergence data consisting of the quartet, C- or T-ending duet, and A- or G-ending duet codon sets. Both rate sets produce patterns of divergence times that are shortest for the quartet data, intermediate for the CT-ending duets, and longest for the AG-ending duets. This indicates that rates of transitions in the duet-codon sets are significantly higher than those in the quartet-codon sets; this effect is especially marked for A→G, the rate of which in duets must be about double that in quartets.

4.
X Liu, H Liu, W Guo, K Yu. Gene 2012, 509(1): 136-141
Codon models are now widely used to draw evolutionary inferences from alignments of homologous sequence data. Incorporating physicochemical properties of amino acids into codon models, two novel codon substitution models describing the evolution of protein-coding DNA sequences are presented based on the similarity scores of amino acids. To describe substitutions between codons, a continuous-time Markov process is used. Transition/transversion rate bias and nonsynonymous codon usage bias are allowed in the models. In our implementation, the parameters are estimated by the maximum-likelihood (ML) method, as in previous studies. Furthermore, instantaneous mutations involving more than one nucleotide position of a codon are considered in the second model. The two suggested models are then applied to five real data sets. The analytic results indicate that the new codon models considering physicochemical properties of amino acids can provide a better fit to the data compared with existing codon models, and can produce more reliable estimates of certain biologically important measures than existing methods.

5.
Gene sequences from Escherichia coli, Salmonella typhimurium, and other members of the Enterobacteriaceae show a negative correlation between the degree of synonymous-codon usage bias and the rate of nucleotide substitution at synonymous sites. In particular, very highly expressed genes have very biased codon usage and accumulate synonymous substitutions very slowly. In contrast, there is little correlation between the degree of codon bias and the rate of protein evolution. It is concluded that both the rate of synonymous substitution and the degree of codon usage bias largely reflect the intensity of selection at the translational level. Because of the high variability among genes in rates of synonymous substitution, separate molecular clocks of synonymous substitution might be required for different genes.

6.
The influence of local base composition on mutations in chloroplast DNA (cpDNA) is studied in detail and the resulting, empirically derived, mutation dynamics are used to analyze both base composition and codon usage bias. A 4 × 4 substitution matrix is generated for each of the 16 possible flanking base combinations (contexts) using 17,253 noncoding sites, 1309 of which are variable, from an alignment of three complete grass chloroplast genome sequences. It is shown that substitution bias at these sites is correlated with flanking base composition and that the A+T content of these flanking sites as well as the number of flanking pyrimidines on the same strand appears to have general influences on substitution properties. The context-dependent equilibrium base frequencies predicted from these matrices are then applied to two analyses. The first examines whether or not context dependency of mutations is sufficient to generate average compositional differences between noncoding cpDNA and silent sites of coding sequences. It is found that these two classes of sites exist, on average, in very different contexts and that the observed mutation dynamics are expected to generate significant differences in overall composition bias that are similar to the differences observed in cpDNA. Context dependency, however, cannot account for all of the observed differences: although silent sites in coding regions appear to be at the equilibrium predicted, noncoding cpDNA has a significantly lower A+T content than expected from its own substitution dynamics, possibly due to the influence of indels. The second study examines the codon usage of low-expression chloroplast genes. When context is accounted for, codon usage is very similar to what is predicted by the substitution dynamics of noncoding cpDNA. However, certain codon groups show significant deviation when followed by a purine in a manner suggesting some form of weak selection other than translation efficiency. Overall, the findings indicate that a full understanding of mutational dynamics is critical to understanding the role selection plays in generating composition bias and sequence structure.

7.
Lightfield J, Fram NR, Ely B. PLoS ONE 2011, 6(3): e17677
The GC content of bacterial genomes ranges from 16% to 75% and wide ranges of genomic GC content are observed within many bacterial phyla, including both gram negative and gram positive phyla. Thus, divergent genomic GC content has evolved repeatedly in widely separated bacterial taxa. Since genomic GC content influences codon usage, we examined codon usage patterns and predicted protein amino acid content as a function of genomic GC content within eight different phyla or classes of bacteria. We found that similar patterns of codon usage and protein amino acid content have evolved independently in all eight groups of bacteria. For example, in each group, use of amino acids encoded by GC-rich codons increased by approximately 1% for each 10% increase in genomic GC content, while the use of amino acids encoded by AT-rich codons decreased by a similar amount. This consistency within every phylum and class studied led us to conclude that GC content appears to be the primary determinant of the codon and amino acid usage patterns observed in bacterial genomes. These results also indicate that selection for translational efficiency of highly expressed genes is constrained by the genomic parameters associated with the GC content of the host genome.

8.
A codon-based model of nucleotide substitution for protein-coding DNA sequences (total citations: 34; self-citations: 23; citations by others: 11)
A codon-based model for the evolution of protein-coding DNA sequences is presented for use in phylogenetic estimation. A Markov process is used to describe substitutions between codons. Transition/transversion rate bias and codon usage bias are allowed in the model, and selective restraints at the protein level are accommodated using physicochemical distances between the amino acids coded for by the codons. Analyses of two data sets suggest that the new codon-based model can provide a better fit to data than can nucleotide-based models and can produce more reliable estimates of certain biologically important measures such as the transition/transversion rate ratio and the synonymous/nonsynonymous substitution rate ratio.

9.
Analysis of the occurrence of simple amino acid repeats in a large ensemble of prokaryotic and eukaryotic sequences reveals that nearly all amino acids found in the repeats belong to those which have in their codon repertoires aggressively expanding triplets, all of three known pathologically expanding classes GCU (GCU, CUG, UGC, AGC, GCA, CAG), GCC (GCC, CCG, CGC, GGC, GCG, CGG), and AAG (AAG, AGA, GAA, CTT, TTC, TCT). This is observed especially clearly in the first exons of proteins of higher eukaryotes. The data are interpreted as a manifestation of everlasting triplet expansions, which, presumably, started from the very origin of the triplet code. The spontaneous expansions continued to occur throughout evolution, leaving their footprints in the protein-coding sequences as still visible simple amino acid repeats, as preferred triplets encoding the repeats, and as preferred codons in the codon usage tables.

10.
11.
12.
H. Akashi. Genetics 1996, 144(3): 1297-1307
Both natural selection and mutational biases contribute to variation in codon usage bias within Drosophila species. This study addresses the cause of codon bias differences between the sibling species, Drosophila melanogaster and D. simulans. Under a model of mutation-selection-drift, variation in mutational processes between species predicts greater base composition differences in neutrally evolving regions than in highly biased genes. Variation in selection intensity, however, predicts larger base composition differences in highly biased loci. Greater differences in the G+C content of 34 coding regions than 46 intron sequences between D. melanogaster and D. simulans suggest that D. melanogaster has undergone a reduction in selection intensity for codon bias. Computer simulations suggest at least a fivefold reduction in N(e)s at silent sites in this lineage. Other classes of molecular change show lineage effects between these species. Rates of amino acid substitution are higher in the D. melanogaster lineage than in D. simulans in 14 genes for which outgroup sequences are available. Surprisingly, protein sizes are larger in D. melanogaster than in D. simulans in the 34 genes compared between the two species. A substantial fraction of silent, replacement, and insertion/deletion mutations in coding regions may be weakly selected in Drosophila.

13.
Highly expressed genes in many bacteria and small eukaryotes often have a strong compositional bias, in terms of codon usage. Two widely used numerical indices, the codon adaptation index (CAI) and the codon usage, use this bias to predict the expression level of genes. When these indices were first introduced, they were based on fairly simple assumptions about which genes are most highly expressed: the CAI was originally based on the codon composition of a set of only 24 highly expressed genes, and the codon usage on assumptions about which functional classes of genes are highly expressed in fast-growing bacteria. Given the recent advent of genome-wide expression data, we should be able to improve on these assumptions. Here, we measure, in yeast, the degree to which consideration of the current genome-wide expression data sets improves the performance of both numerical indices. Indeed, we find that by changing the parameterization of each model its correlation with actual expression levels can be somewhat improved, although both indices are fairly insensitive to the exact way they are parameterized. This insensitivity indicates a consistent codon bias amongst highly expressed genes. We also attempt direct linear regression of codon composition against genome-wide expression levels (and protein abundance data). This has some similarity with the CAI formalism and yields an alternative model for the prediction of expression levels based on the coding sequences of genes. More information is available at http://bioinfo.mbb.yale.edu/expression/codons.

14.
Gu W, Zhou T, Ma J, Sun X, Lu Z. Bio Systems 2004, 73(2): 89-97
The role of silent position in the codon on the protein structure is an interesting and yet unclear problem. In this paper, 563 Homo sapiens genes and 417 Escherichia coli genes coding for proteins with four different folding types have been analyzed using variance analysis, a multivariate analysis method newly used in codon usage analysis, to find the correlation between amino acid composition, synonymous codon, and protein structure in different organisms. It has been found that in E. coli, both amino acid compositions in differently folded proteins and synonymous codon usage in different gene classes coding for differently folded proteins are significantly different. It was also found that only amino acid composition is different in different protein classes in H. sapiens. There is no universal correlation between synonymous codon usage and protein structure in these two different organisms. Further analysis has shown that GC content on the second codon position can distinguish coding genes for different folded proteins in both organisms.

15.
An evolutionary perspective on synonymous codon usage in unicellular organisms (total citations: 64; self-citations: 0; citations by others: 64)
Observed patterns of synonymous codon usage are explained in terms of the joint effects of mutation, selection, and random drift. Examination of the codon usage in 165 Escherichia coli genes reveals a consistent trend of increasing bias with increasing gene expression level. Selection on codon usage appears to be unidirectional, so that the pattern seen in lowly expressed genes is best explained in terms of an absence of strong selection. A measure of directional synonymous-codon usage bias, the Codon Adaptation Index, has been developed. In enterobacteria, rates of synonymous substitution are seen to vary greatly among genes, and genes with a high codon bias evolve more slowly. A theoretical study shows that the patterns of extreme codon bias observed for some E. coli (and yeast) genes can be generated by rather small selective differences. The relative plausibilities of various theoretical models for explaining nonrandom codon usage are discussed. Presented at the FEBS Symposium on Genome Organization and Evolution, held in Crete, Greece, September 1–5, 1986.
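
The abstract names the Codon Adaptation Index without giving its formula. The sketch below follows the commonly cited definition (the geometric mean of each codon's usage relative to the most used synonymous codon in a reference set of highly expressed genes). The reference counts, the pseudocount of 0.5 for unobserved codons, and the function names are assumptions made for illustration, not values taken from the paper.

```python
import math
from collections import defaultdict

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def relative_adaptiveness(reference_counts, pseudocount=0.5):
    """Weight of each codon: its reference count over the family maximum.

    Codons absent from the reference set get `pseudocount` (an assumption).
    Single-codon amino acids (Met, Trp) and stop codons are left out.
    """
    families = defaultdict(dict)
    for codon, aa in CODE.items():
        if aa != "*":
            families[aa][codon] = reference_counts.get(codon, 0) or pseudocount
    weights = {}
    for codons in families.values():
        if len(codons) > 1:
            top = max(codons.values())
            for codon, count in codons.items():
                weights[codon] = count / top
    return weights

def cai(sequence, weights):
    """Geometric mean of codon weights over an in-frame DNA coding sequence."""
    codons = [sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    return math.exp(sum(logs) / len(logs))

# Hypothetical reference counts for two codon families (Ala and Lys).
toy_reference = {"GCT": 10, "GCC": 5, "AAA": 8, "AAG": 2}
toy_weights = relative_adaptiveness(toy_reference)
```

A gene built only from the most used codons of the reference set scores 1.0; any use of rarer synonymous codons pulls the index toward 0.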

16.
The Selective Advantage of Synonymous Codon Usage Bias in Salmonella (total citations: 1; self-citations: 0; citations by others: 1)
The genetic code in mRNA is redundant, with 61 sense codons translated into 20 different amino acids. Individual amino acids are encoded by up to six different codons but within codon families some are used more frequently than others. This phenomenon is referred to as synonymous codon usage bias. The genomes of free-living unicellular organisms such as bacteria have an extreme codon usage bias and the degree of bias differs between genes within the same genome. The strong positive correlation between codon usage bias and gene expression levels in many microorganisms is attributed to selection for translational efficiency. However, this putative selective advantage has never been measured in bacteria and theoretical estimates vary widely. By systematically exchanging optimal codons for synonymous codons in the tuf genes we quantified the selective advantage of biased codon usage in highly expressed genes to be in the range 0.2–4.2 × 10⁻⁴ per codon per generation. These data quantify for the first time the potential for selection on synonymous codon choice to drive genome-wide sequence evolution in bacteria, and in particular to optimize the sequences of highly expressed genes. This quantification may have predictive applications in the design of synthetic genes and for heterologous gene expression in biotechnology.
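
The counts quoted in the abstract (61 sense codons, 20 amino acids, up to six codons per amino acid) can be checked against the standard genetic code, and uneven use within a codon family can be expressed with relative synonymous codon usage (RSCU). RSCU is a standard descriptive measure brought in here for illustration; it is not the fitness assay used in the study, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def synonymous_families():
    """Group the sense codons by the amino acid they encode."""
    families = defaultdict(list)
    for codon, aa in CODE.items():
        if aa != "*":
            families[aa].append(codon)
    return dict(families)

def rscu(sequence):
    """Relative synonymous codon usage for an in-frame DNA coding sequence.

    A value of 1.0 means a codon is used exactly as often as expected if
    all codons of its family were used equally; families that do not occur
    in the sequence are omitted.
    """
    counts = Counter(sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3))
    values = {}
    for codons in synonymous_families().values():
        total = sum(counts[c] for c in codons)
        if total:
            for c in codons:
                values[c] = counts[c] * len(codons) / total
    return values

families = synonymous_families()
sense_codons = sum(len(codons) for codons in families.values())
largest_family = max(len(codons) for codons in families.values())
```

For example, a sequence with two AAA codons and one AAG codon gives lysine RSCU values of 4/3 for AAA and 2/3 for AAG.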

17.
The definition of a typical sec-dependent bacterial signal peptide contains a positive charge at the N-terminus, thought to be required for membrane association. In this study the amino acid distribution of all Escherichia coli secretory proteins was analysed. This revealed that there was a statistically significant bias for lysine at the second codon position (P2), consistent with a role for the positive charge in secretion. Removal of the positively charged residue at P2 in two different model systems revealed that a positive charge is not required for protein export. A well-characterized feature of large amino acids like lysine at P2 is inhibition of N-terminal methionine removal by methionyl amino-peptidase (MAP). Substitution of lysine at P2 for other large or small amino acids did not affect protein export. Analysis of codon usage revealed that there was a bias for the AAA lysine codon at P2, suggesting that a non-coding function for the AAA codon may be responsible for the strong bias for lysine at P2 of secretory signal sequences. We conclude that the selection for high translation initiation efficiency may be the selective pressure that has led to codon and consequent amino acid usage at P2 of secretory proteins.

18.
Bayes prediction quantifies uncertainty by assigning posterior probabilities. It was used to identify amino acids in a protein under recurrent diversifying selection indicated by higher nonsynonymous (d(N)) than synonymous (d(S)) substitution rates or by omega = d(N)/d(S) > 1. Parameters were estimated by maximum likelihood under a codon substitution model that assumed several classes of sites with different omega ratios. The Bayes theorem was used to calculate the posterior probabilities of each site falling into these site classes. Here, we evaluate the performance of Bayes prediction of amino acids under positive selection by computer simulation. We measured the accuracy by the proportion of predicted sites that were truly under selection and the power by the proportion of true positively selected sites that were predicted by the method. The accuracy was slightly better for longer sequences, whereas the power was largely unaffected by the increase in sequence length. Both accuracy and power were higher for medium or highly diverged sequences than for similar sequences. We found that accuracy and power were unacceptably low when data contained only a few highly similar sequences. However, sampling a large number of lineages improved the performance substantially. Even for very similar sequences, accuracy and power can be high if over 100 taxa are used in the analysis. We make the following recommendations: (1) prediction of positive selection sites is not feasible for a few closely related sequences; (2) using a large number of lineages is the best way to improve the accuracy and power of the prediction; and (3) multiple models of heterogeneous selective pressures among sites should be applied in real data analysis.
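
The two calculations the abstract describes in words can be sketched briefly: the posterior probability that a site belongs to each omega class (Bayes' theorem applied to class proportions and per-class site likelihoods), and the accuracy and power measures as defined above. The function names and all numbers are hypothetical; in a real analysis the likelihoods would come from a fitted codon substitution model.

```python
def site_class_posteriors(proportions, likelihoods):
    """Posterior probability of each site class for one site.

    proportions: prior proportion of sites in each class (sums to 1).
    likelihoods: probability of the site's data given each class.
    """
    joint = [p * l for p, l in zip(proportions, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

def accuracy_and_power(predicted, truly_selected):
    """Accuracy and power as defined in the abstract.

    accuracy: share of predicted sites that are truly under selection.
    power: share of truly selected sites that were predicted.
    Returns NaN for a measure whose denominator is zero.
    """
    hits = len(set(predicted) & set(truly_selected))
    accuracy = hits / len(predicted) if predicted else float("nan")
    power = hits / len(truly_selected) if truly_selected else float("nan")
    return accuracy, power

# Hypothetical three-class model: omega = 0.1, 1.0 and 3.0.
example_proportions = [0.6, 0.3, 0.1]
example_likelihoods = [1e-5, 2e-5, 8e-5]
example_posteriors = site_class_posteriors(example_proportions,
                                           example_likelihoods)
```

A site would typically be reported as positively selected only when the posterior for a class with omega > 1 exceeds a chosen cutoff; the cutoff itself is a design choice and is not specified in the abstract.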

19.
Synonymous codons are unevenly distributed among genes, a phenomenon termed codon usage bias. Understanding the patterns of codon bias and the forces shaping them is a major step towards elucidating the adaptive advantage codon choice can confer at the level of individual genes and organisms. Here, we perform a large-scale analysis to assess codon usage bias pattern of pyrimidine-ending codons in highly expressed genes in prokaryotes. We find a bias pattern linked to the degeneracy of the encoded amino acid. Specifically, we show that codon-pairs that encode two- and three-fold degenerate amino acids are biased towards the C-ending codon while codons encoding four-fold degenerate amino acids are biased towards the U-ending codon. This codon usage pattern is widespread in prokaryotes, and its strength is correlated with translational selection both within and between organisms. We show that this bias is associated with an improved correspondence with the tRNA pool, avoidance of mis-incorporation errors during translation and moderate stability of codon-anticodon interaction, all consistent with more efficient translation.

20.
The striking mutational specificity of N-methyl-N'-nitro-N-nitrosoguanidine (MNNG) exhibited in the lacI gene in Escherichia coli allows comment on the phenotypic consequences of mutation at specific sequences that are not recovered after MNNG mutagenesis. We predict that the I+ phenotype is maintained when such silent positions are substituted by amino acids whose codons are generated by the MNNG-directed G:C→A:T transition. We chose the mutationally silent Gly200 codon (an MNNG hotspot motif sequence) to test this prediction. Through MNNG mutagenesis we have generated, identified and isolated a G:C→A:T transition at position 627 (5'-G-G-C-3') under non-selective conditions which creates the Gly200→Asp substitution. The I+ phenotype is retained for this altered repressor.
