Similar literature
1.
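Taking plant potassium outward-rectifying channel (K+ channel outward rectifier, KCO) genes as the subject, the codon usage patterns of 75 plant KCO genes were analyzed with CodonW, and the usage patterns and the factors that may influence codon usage were examined. The results show that differences in base composition (r = 0.961, P < 0.01) and natural selection (r = 0.568, P < 0.01) are the main factors influencing codon usage, and that highly expressed genes strongly prefer codons ending in G or C. Twenty-six codons, including UUC and CUC, all ending in G/C, were identified as the optimal codons for high expression of plant KCO genes.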

2.
Codon Usage Bias and Base Composition of Nuclear Genes in Drosophila   Total citations: 16 (self-citations: 8, cited by others: 8)
E. N. Moriyama  D. L. Hartl 《Genetics》1993,134(3):847-858
The nuclear genes of Drosophila evolve at various rates. This variation seems to correlate with codon-usage bias. In order to elucidate the determining factors of the various evolutionary rates and codon-usage bias in the Drosophila nuclear genome, we compared patterns of codon-usage bias with base compositions of exons and introns. Our results clearly show the existence of selective constraints at the translational level for synonymous (silent) sites and, on the other hand, the neutrality or near neutrality of long stretches of nucleotide sequence within noncoding regions. These features were found for comparisons among nuclear genes in a particular species (Drosophila melanogaster, Drosophila pseudoobscura and Drosophila virilis) as well as in a particular gene (alcohol dehydrogenase) among different species in the genus Drosophila. The patterns of evolution of synonymous sites in Drosophila are more similar to those in the prokaryotes than they are to those in mammals. If a difference in the level of expression of each gene is a main reason for the difference in the degree of selective constraint, the evolution of synonymous sites of Drosophila genes would be sensitive to the level of expression among genes and would change as the level of expression becomes altered in different species. Our analysis verifies these predictions and also identifies additional selective constraints at the translational level in Drosophila.

3.
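Base composition and codon usage bias in the coding regions of pseudorabies virus genes   Total citations: 6 (self-citations: 0, cited by others: 6)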
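Because the G+C content of pseudorabies virus (PRV) is as high as 74%, no strain has yet had its whole genome sequenced. A statistical analysis of base composition and codon usage in the coding sequences of 68 known PRV genes revealed very strong codon usage bias. The overall G+C content at the third codon position across all 68 PRV coding regions is 96.24%, and it reaches 99.52% in the UL48 gene. PRV genes favor GC-rich codons, especially those ending in C or G. In addition, the regions around genes with large variation in G+C content, such as UL48, UL40, UL14 and IE180, coincide with the known replication origins of the PRV genome. When the PRV genes were divided into six classes by function, genes with the same or similar functions showed similar codon usage patterns. The relative synonymous codon usage (RSCU) of regulatory genes differed significantly from that of other genes: in regulatory genes, C-ending codons had RSCU values far higher than those of the other synonymous codons. Finally, multivariate analysis of differences in amino acid composition showed that PRV genes with different functions are distributed differently on the correspondence analysis plot, indicating that the codon usage pattern of PRV genes may be related to gene function.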
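The two statistics this abstract relies on, third-position G+C content and RSCU, can be illustrated with a minimal Python sketch. It is not the analysis pipeline of the study; the function names and the toy sequence are assumptions made for illustration, and the standard genetic code is assumed.

```python
from collections import Counter

# Standard genetic code in the conventional TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(seq):
    """Split a coding sequence into complete codons (RNA 'U' is read as 'T')."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def gc3(seq):
    """Fraction of codons whose third base is G or C."""
    codons = split_codons(seq)
    return sum(codon[2] in "GC" for codon in codons) / len(codons)


def rscu(seq):
    """RSCU per codon: observed count divided by the count expected if all
    synonymous codons of that amino acid were used equally."""
    counts = Counter(c for c in split_codons(seq) if c in CODON_TABLE)
    families = {}
    for codon, amino_acid in CODON_TABLE.items():
        families.setdefault(amino_acid, []).append(codon)
    values = {}
    for amino_acid, family in families.items():
        total = sum(counts[c] for c in family)
        if amino_acid == "*" or total == 0:
            continue  # skip stop codons and amino acids absent from the sequence
        for codon in family:
            values[codon] = counts[codon] * len(family) / total
    return values


example = "CTGCTGCTCTTA"  # hypothetical toy sequence: four leucine codons
print(gc3(example))          # 0.75
print(rscu(example)["CTG"])  # 3.0
```

An RSCU of 1 means a codon is used exactly as often as expected under equal synonymous usage; values above 1 mark preferred codons, which is the sense in which the C-ending codons of the regulatory genes stand out.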

4.
5.
We compared the codon usage of sequences of transposable elements (TEs) with that of host genes from the species Drosophila melanogaster, Arabidopsis thaliana, Caenorhabditis elegans, Saccharomyces cerevisiae, and Homo sapiens. Factorial correspondence analysis showed that, regardless of the base composition of the genome, the TEs differed from the genes of their host species by their AT-richness. In all species, the percentage of A + T on the third codon position of the TEs was higher than that on the first codon position and lower than that in the noncoding DNA of the genomes. This indicates that the codon choice is not simply the outcome of mutational bias but is also subject to selection constraints. A tendency toward higher A + T on the third position than on the first position was also found in the host genes of A. thaliana, C. elegans, and S. cerevisiae but not in those of D. melanogaster and H. sapiens. This strongly suggests that the AT choice is a host-independent characteristic common to all TEs. The codon usage of TEs generally appeared to be different from the mean of the host genes. In the AT-rich genomes of Arabidopsis thaliana, Caenorhabditis elegans, and Saccharomyces cerevisiae, the codon usage bias of TEs was similar to that of weakly expressed genes. In the GC-rich genome of D. melanogaster, however, the bias in codon usage of the TEs clearly differed from that of weakly expressed genes. These findings suggest that selection acts on TEs and that TEs may display specific behavior within the host genomes. Received: 2 May 2001 / Accepted: 29 October 2001

6.
Synonymous codon usage bias is a broadly observed phenomenon in bacteria, plants, and invertebrates and may result from selection. However, the role of selective pressures in shaping codon bias is still controversial in vertebrates, particularly for mammals. The myosin heavy-chain (MyHC) gene family comprises multiple isoforms of the major force-producing contractile protein in cardiac and skeletal muscles. Slow and fast genes are tandemly arrayed on separate chromosomes, and have distinct patterns of functionality and expression in muscle. We analyze both full-length MyHC genes (~5400 bp) and a larger collection of partial sequences at the 3' end (~500 bp). The MyHC isoforms are an interesting system in which to study codon usage bias because of their length, expression, and critical importance to organismal mobility. Codon bias and GC content differ among MyHC genes with regard to functional type, isoform, and position within the gene. Codon bias even varies by isoform within a species. We find evidence in favor of both chromosomal influences on nucleotide composition and selection against nonsense errors (SANE) acting on codon usage in MyHC genes. Intragenic variation in codon bias and elongation rate is significant, with a strong trend for increasing codon bias and elongation rate towards the 3' end of the gene, although the trend is dependent upon the degeneracy class of the codons. Therefore, patterns of codon usage in MyHC genes are consistent with models supporting SANE as a major force shaping codon usage.

7.
Genes of adenine-specific DNA-methyltransferase M.BspLU11IIIa and cytosine-specific DNA-methyltransferase M.BspLU11IIIb of the type IIG BspLU11III restriction-modification system from the thermophilic strain Bacillus sp. LU11 were expressed in E. coli. They contain a large number of codons that are rare in E. coli and are characterized by equal values of codon adaptation index (CAI) and expression level measure (E(g)). Rare codons are either diffused (M.BspLU11IIIa) or located in clusters (M.BspLU11IIIb). The expression level of the cytosine-specific DNA-methyltransferase was increased by a factor of 7.3 and that of the adenine-specific DNA-methyltransferase only by a factor of 1.25 after introduction of the plasmid pRARE supplying tRNA genes for six rare codons in E. coli. It can be assumed that the plasmid supplying minor tRNAs can strongly increase the expression level only of genes with a clustered distribution of rare codons. Using heparin-Sepharose and phosphocellulose chromatography and gel filtration on Sephadex G-75, both DNA-methyltransferases were isolated as electrophoretically homogeneous proteins (according to the results of SDS-PAGE).

8.
ABSTRACT A codon usage table for the intestinal parasite Giardia lamblia was generated by analysis of the nucleotide sequences of eight genes comprising 3,135 codons. Codon usage revealed a biased use of synonymous codons with a preference for NNC codons (42.1%). The codon usage of G. lamblia more closely resembles that of the prokaryote Halobacterium halobium (correlation coefficient r = 0.73) than that of other eukaryotic protozoans, i.e. Trypanosoma brucei (r = 0.434) and Plasmodium falciparum (r = −0.31). These observations are consistent with the view that G. lamblia represents the first line of descent from the ancestral cells that first took on eukaryotic features.

9.
Fran Supek  Tomislav Šmuc 《Genetics》2010,185(3):1129-1134
A recent investigation concluded that codon bias did not affect expression of green fluorescent protein (GFP) variants in Escherichia coli, while stability of an mRNA secondary structure near the 5′ end played a dominant role. We demonstrate that combining the two variables using regression trees or support vector regression yields a biologically plausible model with better support in the GFP data set and in other experimental data: codon usage is relevant for protein levels if the 5′ mRNA structures are not strong. Natural E. coli genes had weaker 5′ mRNA structures than the examined set of GFP variants and did not exhibit a correlation between the folding free energy of 5′ mRNA structures and protein expression.

In genomes, natural selection may act on silent sites of codons to make translation of highly expressed genes more efficient, an effect linked primarily to abundances of tRNA isoacceptor molecules (Ikemura 1985; Bulmer 1987; Kanaya et al. 1999). Codon choice may also be linked to formation of secondary structures in mRNA that reduce protein levels, as has been shown with haplotypes of the human COMT gene (Nackley et al. 2006). Kudla et al. (2009) have recently reported an experiment that contributes toward understanding how synonymous codon usage shapes gene expression. They have constructed a library of 154 synthetic variants of a green fluorescent protein (GFP) gene that varied randomly at synonymous sites while retaining the original amino acid sequence. The authors concluded that codon usage (CU) bias did not correlate with protein levels measured as fluorescence of the GFP, but also that the minimum free energy of an mRNA secondary structure in a 42-nucleotide region at [−4,37] that overlaps the start codon (“hairpin stability”) bears a great significance.
CU bias was quantified by the widely used codon adaptation index (CAI) method (Sharp and Li 1987), essentially a measure of the distance of a gene's codon usage to the codon usage of a predefined set of highly expressed genes. The CAI and some of its more recent alternatives, such as measure independent of length and composition (MILC) (Supek and Vlahovicek 2005), have been shown to be a viable surrogate for gene expression in various unicellular organisms. Additionally, in a multiple linear regression of rank fluorescence against a number of sequence-derived attributes, including CAI and the abovementioned hairpin stability, Kudla et al. (2009) did not find CAI to contribute significantly toward the prediction of protein levels, in contrast to the hairpin stability.
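As a rough illustration of what the CAI measures, the Python sketch below follows the usual formulation: each codon receives a relative adaptiveness weight (its count in a reference set of highly expressed genes divided by the count of the most frequent synonymous codon), and the CAI of a gene is the geometric mean of the weights of its codons. The function names, the toy reference sequences and the 0.5 pseudocount for codons absent from the reference set are assumptions of this sketch, not details taken from the text above.

```python
import math
from collections import Counter

# Standard genetic code in the conventional TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(seq):
    """Split a coding sequence into complete codons (RNA 'U' is read as 'T')."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_genes, pseudocount=0.5):
    """Weight of each codon relative to the most used synonymous codon
    in the reference set (assumed here to be highly expressed genes)."""
    counts = Counter()
    for gene in reference_genes:
        counts.update(c for c in split_codons(gene) if c in CODON_TABLE)
    families = {}
    for codon, amino_acid in CODON_TABLE.items():
        families.setdefault(amino_acid, []).append(codon)
    weights = {}
    for amino_acid, family in families.items():
        if amino_acid == "*" or len(family) == 1:
            continue  # stop codons, Met and Trp offer no synonymous choice
        adjusted = {c: counts[c] or pseudocount for c in family}
        top = max(adjusted.values())
        for codon in family:
            weights[codon] = adjusted[codon] / top
    return weights


def cai(gene, weights):
    """Geometric mean of the weights of the gene's scorable codons."""
    logs = [math.log(weights[c]) for c in split_codons(gene) if c in weights]
    return math.exp(sum(logs) / len(logs))


# Hypothetical toy reference set: CTG is the preferred Leu codon, AAA the preferred Lys codon.
weights = relative_adaptiveness(["CTGCTGCTGCTC", "AAAAAAAAG"])
print(cai("CTGAAA", weights))  # gene built only from preferred codons
print(cai("CTCAAG", weights))  # gene built from less preferred codons
```

A gene that uses only the reference set's preferred codons scores 1, and the score falls as rarer synonymous codons are used. A gene with no scorable codons would raise a division error in this sketch.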

Both the codon adaptation index and the 5′ mRNA secondary structures influence protein levels in the Kudla et al. data:

The described statistical analyses, however, failed to address the case in which a nonlinear three-way dependency between hairpin stability, codon usage, and fluorescence might exist; data are visualized in Figure 1, A–C, and in figure 2B in Kudla et al. Such complex patterns in data are readily captured by the support vector machines (SVM) algorithm, reviewed in Noble (2006) and Ben-Hur et al. (2008). We have employed the SVM with a radial basis function kernel to regress fluorescence against both hairpin stability and CAI simultaneously (Figure 1B) and computed the Pearson's correlation coefficient in cross-validation (here denoted as Q) between true and predicted values of fluorescence (see File S1). A linear model based solely on hairpin stability as employed by Kudla et al. (Figure 1A) can explain Q² = 38.6% of variance in protein levels, while the nonlinear SVM regression that takes CAI into account explains Q² = 52.2% of variance. The difference in Q is statistically significant at P = 10⁻¹⁹⁰ (paired t-test). Note that Kudla et al. utilize the Spearman rank correlation coefficient (ρ) in their article; the hairpin stability would explain ρ² = 44.6% of the variance in expression levels if the requirement for a linear relationship was abandoned in this manner.

Figure 1. Regression of protein levels against folding free energy of an mRNA hairpin at nucleotides −4 through 37 (A), against the hairpin free energy and the codon adaptation index (Sharp and Li 1987) (B and C), or against the hairpin free energy and the codon frequencies (D and E). The colors show the measured protein levels, while the background shading reflects the protein levels predicted by the specific model. (A) Predictions by linear regression. (B and E) Predictions by a support vector machine with a radial basis function kernel. (C) Predictions by an M5′ regression tree. (D) A schematic of the M5′ model, where coefficients in the terminal nodes are derived from data where protein levels, all codon frequencies, and hairpin free energies were normalized to [0,1] to facilitate comparison between the influence of codons, the hairpin stability, and the constant in the regression equation. All coefficients ≥0.1 are in boldface type. In the plots (A–C and E), a slight amount of random “jitter” was introduced to the point positions (at most, 3% of the range of each axis) to better visualize overlapping points. In the plot in E, a single outlying point is not shown. See Figure S2 for the same plots without jitter and with the outlier in E included. R² is the squared Pearson's correlation coefficient between actual and model-predicted protein levels; Q² is similar, but obtained in cross-validation (10-fold, 100 runs), and is a more conservative estimate of regression accuracy.

Figure 2. The distributions of RNA folding free energies of a 42-nucleotide window in the mRNA between positions −4 and +37, where the “A” in the “AUG” start codon has index zero. The distributions are shown separately for the 154 gene variants from Kudla et al. (2009) and for the genes from the E. coli K12 genome. The dotted line indicates the 5th percentile of the E. coli values at −10.9 kcal/mol.

Compared to the SVM, a more interpretable generalization of the data can be achieved by a different nonlinear regression approach, the M5′ tree (Wang and Witten 1997), which recursively divides the data to reduce the variance of the dependent variable within each partition and then builds separate linear models for the partitions. The resulting regression tree (Figure 1C; supporting information, Figure S1) better explains the correlation between protein levels on one side and hairpin stability and CAI on the other side when compared to a linear model employed by Kudla et al. that regresses protein levels against hairpin stability only [see figure 2B in Kudla et al. (2009) and Figure 1A]; 9.3% more variance is explained by the M5′, P = 10⁻⁹¹ (paired t-test). An interpretation that follows from the general structure of the M5′ tree (Figure S1) is that, at high mRNA hairpin stability, protein levels will generally be quite low and not dependent on CAI; in contrast, with less stable mRNA hairpins, both hairpin stability and CAI play a role in determining protein levels. In the interpretation of the M5′ tree structure, we would place less emphasis on the exact coefficients of the linear models in the leaves because the reliability of these fine-grained features of the M5′ model can strongly depend on the good coverage of all parts of the mRNA–CAI space by data points.
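The idea behind a model tree can be sketched with a deliberately simplified, one-level version in Python: pick the hairpin-energy threshold that minimizes the pooled within-partition squared error of the protein level, then fit a separate line (protein level against CAI) on each side. This is not the M5′ algorithm itself, which adds recursion, pruning and smoothing; the function names and the eight data points are hypothetical and only mimic the qualitative pattern described above.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def squared_error(values):
    """Sum of squared deviations from the mean (n times the variance)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys, min_leaf=2):
    """Threshold on x that minimizes the pooled within-partition squared error of y."""
    pairs = sorted(zip(xs, ys))
    best_cost, best_threshold = None, None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = squared_error(left) + squared_error(right)
        if best_cost is None or cost < best_cost:
            best_cost = cost
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_threshold


def fit_one_split_tree(energy, cai, level):
    """One split on hairpin energy, then one linear model (level ~ CAI) per leaf."""
    threshold = best_split(energy, level)
    strong = [(c, y) for e, c, y in zip(energy, cai, level) if e < threshold]
    weak = [(c, y) for e, c, y in zip(energy, cai, level) if e >= threshold]
    return {
        "threshold": threshold,
        "strong": fit_line(*zip(*strong)),
        "weak": fit_line(*zip(*weak)),
    }


def predict(model, energy, cai):
    leaf = "strong" if energy < model["threshold"] else "weak"
    slope, intercept = model[leaf]
    return slope * cai + intercept


# Hypothetical data: stable hairpins (very negative energy) give no protein,
# weaker hairpins give protein levels that rise with CAI.
energy = [-14.0, -13.0, -12.0, -11.0, -8.0, -7.0, -6.0, -5.0]
cai = [0.2, 0.8, 0.4, 0.6, 0.2, 0.4, 0.6, 0.8]
level = [0.0, 0.0, 0.0, 0.0, 6.0, 7.0, 8.0, 9.0]
model = fit_one_split_tree(energy, cai, level)
print(model["threshold"])         # -9.5
print(predict(model, -7.0, 0.5))  # close to 7.5
```

On this toy data the strong-hairpin leaf has a slope of zero, so CAI is irrelevant there, while the weak-hairpin leaf has a clearly positive CAI slope. That is the qualitative structure the paragraph above attributes to the fitted M5′ tree.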

The CAI may not be an optimal summary of codon usage for predicting expression of overexpressed genes:

Regarding use of CAI in the present context, it should be noted that CAI's original purpose was to serve as a proxy for gene expression in conditions of abundance that result in fast growth in the organism's environmental niche. The CAI or related approaches (Supek and Vlahovicek 2005) may not, however, be an ideal representation of codon usage when examining overexpression of a foreign protein at levels that exceed the natural abundances of the host's most highly expressed proteins. This was indeed shown to be the case in a recent article by Welch et al. (2009) in which the authors reported an experiment with heterologous expression of variants of two proteins in E. coli: an antibody fragment and a phage DNA polymerase. Welch et al. found that codon frequencies in general, but not CAI specifically, correlated well with protein levels and postulated that for overexpressed proteins optimal codons would correspond to the codons translated efficiently under amino acid starvation (Elf et al. 2003; Dittmar et al. 2005). Analogously to Welch et al., we now apply our regression algorithms not to the CAI, but directly to the codon frequencies that CAI attempts to summarize in the Kudla et al. data (see File S2). An M5′ regression tree trained on the hairpin stability and codon frequencies (Figure 1D) explains 10.6% more variance (P = 10⁻⁸³, paired t-test) in protein levels than an M5′ tree trained on hairpin stability and CAI (Figure 1C, Figure S1). An SVM regression model trained on the hairpin stability and a simple linear combination of selected codon frequencies (Figure 1E) explains 8.8% more variance (P = 10⁻⁸², paired t-test) than the SVM that uses CAI (Figure 1B). An SVM trained on the hairpin stability and the full set of codon frequencies (not shown in Figure 1) explains Q² = 65.0% of variance in the protein abundances, a sizable increase (P ≈ 10⁻²⁶⁰, paired t-test) compared to a linear regression on solely the [−4,37] hairpin stability (Q² = 38.6%) as originally employed by Kudla et al. and also as compared to a set of randomized controls (Q² = 20.1–30.7%; Table S1). Therefore, not relying on a predefined notion of codon optimality (as embodied in the CAI) further strengthens the argument that the correlation of CU and protein levels is far from negligible in this data set.

Additionally, we found some correlation between codon frequencies and 5′ mRNA hairpin stability in the Kudla et al. gene variants (Figure S4). The fact that the two factors were not completely independent adds weight to the relevance of CU to protein levels since one could not be certain that even the variance in protein levels explained by 5′ mRNA structures is wholly due to the structures themselves and not to the confounding variables (here, the codon frequencies).

The M5′ tree trained on codon frequencies (Figure 1D) follows the same general structure as the M5′ tree trained on the CAI (Figure S1) where the codon frequencies become relevant with mRNA hairpins weaker than −9.75 kcal/mol, while with stronger [−4,37] mRNA hairpins protein levels are generally low. Our interpretation is that the lack of a stable secondary structure that could obstruct translational initiation is a necessary but not a sufficient condition for high protein expression. When the initiation phase is unhindered, the bottleneck would shift to the elongation phase in which codon optimality plays an important role. In the literature, theoretical models of translation may consider either the initiation (Bulmer 1991) or the elongation phase (Xia 1998) as the rate-limiting step of translation under physiological conditions; we are not aware of such analyses describing translation of artificially overexpressed genes.

The codons identified as relevant by our M5′ model of the Kudla et al. data are different from, but not inconsistent with, those proposed by Welch et al. (Table S2). We anticipate that the rules for codon optimality for overexpression in an Escherichia coli host will become better defined as more large-scale experiments, such as the two discussed here (Kudla et al. 2009; Welch et al. 2009), are carried out.

The “RNA structure + codon usage” model agrees with independent experimental data and is robust to removal of extreme values:

Our reanalysis of the Kudla et al. data should be viewed in light of the conclusions of Welch et al. (2009) who find that codon usage, but not the 5′ hairpin stability, correlates with protein levels in their data, while noting that their gene variants generally have considerably weaker 5′ mRNA hairpins than the sequences in Kudla et al. Welch et al. reconcile the different outcomes of the two experiments by noting that “inhibition of initiation by especially strong mRNA structure would obscure effects resulting from factors that influence elongation, such as codon usage” (page 9). Here we propose that precisely the same model can be derived solely from the Kudla et al. data. Furthermore, we find that the 154 gene variants from Kudla et al. indeed do have unusually stable 5′ mRNA hairpins (mean free energy = −9.68 kcal/mol) in comparison to natural E. coli genes (mean free energy = −6.15 kcal/mol) (P = 10⁻³⁸ by Mann–Whitney U-test; see Figure 2). The part of the distribution of Kudla et al. gene variants that overlaps with the bulk of the E. coli genes, with 5′ mRNA hairpin free energies lower than ∼ −10 kcal/mol, corresponds to the range where our M5′ model indicates a stronger influence of CU on protein levels (Figure S1, Figure 1D).

We investigate to what extent the presence of a group of sequences extreme in their 5′ mRNA hairpin stabilities in the Kudla et al. data set (left peak in Figure 2) influenced the authors' conclusion that the hairpin stabilities have an overarching influence on protein levels. After removing the sequences below the 5th percentile of the E. coli natural hairpin stabilities (−10.9 kcal/mol), we were left with 109 of the original 154 Kudla et al. sequences. The accuracy of regressing protein levels against mRNA hairpin stability deteriorates greatly (Q² = 18.5%) after removing the 45 sequences, but less so with SVM and M5′ regression that take into account both CU and the hairpin stability (Table 1). Kudla et al. basically captured the difference between these extreme cases (in which very strong 5′ mRNA secondary structures blocked expression) and all other sequences. However, to explain the variation in protein levels within the nonextreme set, hairpin stabilities by themselves are not sufficient and need to be complemented with CU.

TABLE 1

Accuracy of the regression of protein levels against 5′ mRNA hairpin stability or against 5′ mRNA hairpin stability and codon frequencies
Data set | Linear regression, hairpin stability only (%) | SVM, hairpin stability + codon frequencies (%) | M5′, hairpin stability + codon frequencies (%)
Full (n = 154) | 38.6 | 65.0 | 56.7
No strong hairpins (n = 109) | 18.5 | 53.0 | 40.4
The cross-validation correlation coefficient squared (Q²) is compared between the full Kudla et al. data set (154 proteins) and the reduced data set (109 proteins) where mRNA hairpin folding energies are ≥ −10.9 kcal/mol, the 5th percentile of natural E. coli genes.

In addition to measuring protein levels in the 154-sequence data set, Kudla et al. performed an additional experiment where an unstructured 28-codon tag was fused to 5′ ends of 72 (of 154) GFP sequence variants. Adding the tag was found to enhance protein levels, supporting the conclusion of Kudla et al. that 5′ structure of mRNA had a strong influence on protein production. After an analysis of the data, we found (see File S3) that data from this specific experiment are not well suited to serve as a direct verification of our existing M5′ and SVM regression models. Still, we can compare the protein level predictions of our existing SVM model on the same set of sequences before and after adding the unstructured tag. We found that the predicted expression levels have increased for 67 of 72 sequences (Table S3) after adding the tag that fixes 5′ mRNA folding energy at a weak −6.1 kcal/mol, a result consistent with the Kudla et al. experiment. Additionally, we have trained a new SVM regression model only on the tagged 72-sequence set (see File S2) and found that, within this set, SVM regression can again predict GFP levels solely from codon usage (5′ mRNA structure is invariant among these sequences) at Q² = 37.7%. This amount of variance is similar to, or even somewhat larger than, the difference in the variance explained by mRNA vs. mRNA+codons (38.6% vs. 65.0%) in the original data. Therefore, codon usage is of similar importance in shaping the protein levels within the tagged 72-sequence set, as it was in the original 154-sequence set.
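The Q² statistic reported in Table 1 is the squared Pearson correlation between measured values and predictions made for held-out data. A minimal Python sketch of that procedure follows, using a one-variable least-squares line in place of the SVM and M5′ models of the paper; the function names, the single shuffled run (the paper averages 100 runs) and the toy data are assumptions of this sketch.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def cross_validated_q2(xs, ys, folds=10, seed=0):
    """Squared correlation between y and out-of-fold predictions of y."""
    order = list(range(len(xs)))
    random.Random(seed).shuffle(order)
    predicted = [0.0] * len(xs)
    for fold in range(folds):
        held_out = order[fold::folds]
        held_out_set = set(held_out)
        train = [i for i in order if i not in held_out_set]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            # each point is predicted by a model that never saw it
            predicted[i] = slope * xs[i] + intercept
    return pearson(ys, predicted) ** 2


xs = [float(i) for i in range(30)]          # hypothetical predictor values
ys = [2.0 * x + 1.0 + (-1) ** i for i, x in enumerate(xs)]  # line plus alternating noise
print(cross_validated_q2(xs, ys))
```

Because every prediction comes from a model fitted without the point being predicted, Q² cannot be inflated by overfitting in the way an in-sample R² can, which is why the figure caption calls it the more conservative estimate.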

mRNA 5′ end secondary structure stabilities do not correlate with protein levels for natural E. coli genes:

To further verify our proposed model, we analyzed the relative contributions of mRNA hairpin stabilities and CU on expression levels of natural E. coli genes (see File S2). If the hairpin stabilities were limiting for expression in the range of folding free energies spanned by the E. coli mRNAs, one would expect to see a correlation between the free energy of mRNA 5′ end folding and the abundance of the corresponding protein. We found no such correlation using the folding free energies of the [−4,37] mRNA region (Figure 3) or equal-sized regions centered around the start codon at [−20,21] or on the expected location of a Shine–Dalgarno sequence (Shultzaberger et al. 2001) at [−30,11] (see Figure S3). Unsurprisingly, CAI correlated well with protein levels (Figure 3) in all examined experimental data sets (Lopez-Campistrous et al. 2005; Lu et al. 2007; Ishihama et al. 2008). Therefore, within the boundaries of the mRNA folding free energies spanned by E. coli genes, the CU plays a dominant role in shaping gene expression (or the CU may possibly be shaped by the expression; see Concluding remarks). As for the stronger mRNA hairpins with < −11 kcal/mol, they are present in the Kudla et al. data, but are very rare in the E. coli genome, which could be explained by one of two scenarios: (i) above a certain threshold, the mRNA hairpin stability may become so detrimental to expression that all the mutants having such hairpins are subject to very strong negative selection and therefore are absent from the genome; and/or (ii) the Kudla et al. data set may not be representative of the genes in the E. coli genome or the mutational processes they undergo; for example, the amino acid sequence of the GFP's beginning might be unusually conducive to forming RNA hairpins. Unless further analyses prove differently, it seems reasonable to surmise that in natural E. coli genes mRNA secondary structures would shape expression if they were highly stable, consistent with the finding of a universal (albeit not particularly strong) trend toward avoidance of 5′ mRNA structures in genomes (Gu et al. 2010). However, it can also be concluded at this point, and with more confidence, that at lower secondary structure stabilities the CU has an overarching influence on expression. Such a model of expression-related gene sequence determinants in E. coli is fully consistent with our interpretation of the M5′ regression tree that we have derived from the Kudla et al. data.

Figure 3. Correlations between the E. coli absolute protein abundances measured in three independent experiments (Lopez-Campistrous et al. 2005; Lu et al. 2007; Ishihama et al. 2008) and the codon adaptation index (CAI) or the free energy of folding of a secondary structure in the mRNA [−4,37] region (in kcal/mol; more negative values denote a more stable RNA secondary structure). “ρ” is the Spearman's rank correlation coefficient.
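The Spearman rank correlation used for the comparisons with protein abundance is the Pearson correlation of the ranks of the two variables, with tied values sharing their average rank. A minimal Python sketch is given below; the function names and the toy values are illustrative assumptions, not data from the paper.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical values: abundance rises monotonically but not linearly with CAI.
cai_values = [0.2, 0.4, 0.6, 0.8]
abundance = [1.0, 4.0, 9.0, 160.0]
print(spearman(cai_values, abundance))
```

Because only the ordering matters, any monotonic relationship gives ρ = ±1 regardless of its shape, which suits protein abundances that span several orders of magnitude.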

Concluding remarks:

We argue that Kudla et al. worked with a set of gene sequences in which strong mRNA secondary structures (that effectively abolished expression) were frequent enough to mask the relevance of codon frequencies on protein levels when examined only with linear regression methods. While mRNA secondary structures can certainly occur when designing synthetic genes, it is highly questionable to what extent Kudla et al.'s conclusion that CU is of little importance for expression would be generally valid for biotechnological applications, especially since we have shown that the influence of CU is nevertheless present even in the Kudla et al. data. What is beyond doubt, however, is that a strong 5′ mRNA secondary structure can be a roadblock in heterologous expression, and therefore the synthetic gene variants harboring such structures should be avoided. The more specific rules regarding the exact location of the hairpin on the gene sequence, the hairpin's length, or the tolerable levels of folding free energy will have to be established by further experimentation.

A recent algorithm for estimating the efficiency of ribosomal binding sites from the mRNA sequence (Salis et al. 2009) explicitly takes into account the folding free energy of RNA secondary structures, along with other factors. When protein overexpression is desired, the conclusions of Welch et al. and (by our reanalysis) the Kudla et al. data indicate that CU should be optimized in addition to the ribosome binding site sequence to ensure that both initiation and elongation phases of translation are free of impediments.

On the basis of their results, Kudla et al. also discuss the evolutionary link between the CU of natural genes and the expression levels of proteins for which they code. They propose that selection for translational efficiency acts at a global level in cells; the codons that accelerate elongation would be preferred in a highly expressed gene not because they facilitate production of that particular protein, but to free up ribosomes for the rate-determining initiation phase of translation of the total cellular mRNA pool. Effectively, the flow of causality between CU and expression would be reversed in comparison to the established view. This hypothesis should be critically reevaluated because it depends on the assertion that manipulating a gene's CU cannot cause protein levels to increase, an assertion poorly supported by the Kudla et al. data.

10.
The regulatory mechanisms that determine which genes are specifically expressed in which tissues are still not fully elucidated, especially in plants. Using internal correspondence analysis, I first establish that tissue-specific genes exhibit significantly different synonymous codon usage in rice, although this effect is weak. The variability of synonymous codon usage between tissues accounts for 5.62% of the total codon usage variability and has mainly arisen from neutral evolutionary forces, such as GC content variation among tissues. Moreover, tissue-specific genes are under differential selective constraints, implying that natural selection also contributes to the codon usage divergence between tissues. These findings may add further evidence toward understanding the differentiation and regulation of tissue-specific gene products in plants.

11.
Summary An analysis of 4680 codons expressed by pathogenic Entamoeba histolytica showed the A+U content of coding sequences to be 67%. The preference for A+U resulted in an unusual codon usage with an A+U content of 84% in the third codon position. The data show a remarkable similarity to those obtained for Plasmodium falciparum.

12.
Analysis of codon usage in Pichia pastoris   Total citations: 130 (self-citations: 5, cited by others: 130)
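By analyzing the synonymous codon usage of 28 protein-coding genes of Pichia pastoris and calculating the codon usage of this yeast, 19 optimal codons for high expression in P. pastoris were identified for the first time. These results are broadly similar to the known codon usage of Saccharomyces cerevisiae and Kluyveromyces lactis, but the codon choice for the amino acid glutamate is the exact opposite, suggesting that this may be a codon usage preference specific to P. pastoris.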

13.
A correlation was shown between the length of introns and the codon usage of the coding sequences of the corresponding genes, which in some cases can be related to the level of gene expression. The link is positive in unicellular organisms, i.e., genes with longer introns show a higher bias of codon usage. It is most pronounced in baker's yeast, where it is definitely related to the level of gene expression: genes with a higher level of expression have longer introns. The correlation is inverted in multicellular organisms as compared to unicellular ones. Some organisms, however, do not show the link. The presence or absence of the link does not seem to be related to the GC percent of the coding sequences. Received: 7 December 1999 / Accepted: 10 May 2000

14.
15.
Transposable elements (TEs) are mobile genetic entities ubiquitously distributed in nearly all genomes. High frequency of codons ending in A/T in TEs has been previously observed in some species. In this study, the biases in nucleotide composition and codon usage of TE transposases and host nuclear genes were investigated in the AT-rich genome of Arabidopsis thaliana and the GC-rich genome of Oryza sativa. Codons ending in A/T are more frequently used by TEs compared with their host nuclear genes. A remarkable p...

16.
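Analysis of factors influencing codon usage in the Yersinia pestis genome   Total citations: 2, self-citations: 2, citations by others: 2
Analyzing the factors that influence genomic codon usage helps to reveal the evolutionary dynamics that shape codon usage, and it plays an important role in discovering and predicting the direction and mode of evolution. In addition, analyzing a complete genome can reveal the codon usage pattern of that genome, so that efficient PCR primers and exogenous genes can be redesigned to promote high-level expression of foreign genes in the target organism. The complete genome sequence of Yersinia pestis, which causes plague and other exogenous infectious diseases, has been sequenced and published. To gain a deeper understanding of the evolutionary pattern of synonymous codon usage in Y. pestis, the codon usage pattern of its genome and the factors influencing codon usage were analyzed in detail. The results show that, although the G+C content of the Y. pestis genome is relatively low (47.64%), highly expressed genes use cytosine (C) at the third codon position significantly more often than lowly expressed genes, whereas lowly expressed genes tend to use guanine (G) at the third codon position. In both the high- and low-expression gene groups, the use of adenine (A) and thymine (T) at the third codon position is, overall, close to random. Gene expression level is highly correlated with the first axis of the correspondence analysis (R = 0.63, P < 0.0001). Comparison of the codon usage patterns of the high- and low-expression groups shows that gene expression level has a significant effect on codon usage. GC skew analysis indicates that selection at the replication and transcription stage has some effect on codon usage. At different lengths...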

17.
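To analyze the codon usage pattern of squalene synthase (SS) genes and the factors influencing it, multivariate statistical analysis and correspondence analysis were performed on 47 SS genes from different species using CodonW and SPSS 16.0. The GC contents at codon positions 1 to 3 (GC1, GC2 and GC3) were 51.33%, 34.65% and 54.37%, respectively, and the GC contents of the three positions were all highly significantly correlated (p<0.01). Correspondence analysis showed that the first axis accounted for 30.71% of the variation, and the correlations between the effective number of codons and GC3, and between the mean of GC1 and GC2 and GC3, were both highly significant (p<0.01). All 26 optimal codons identified have G or C at the third position. The phylogenetic tree built with MEGA 5.0 from SS protein sequences agrees better with the conventional phylogenetic view than the clustering based on RSCU. SS gene codons preferentially end in G/C; the usage pattern is influenced by both selection and mutation, with mutation having the larger effect on codon bias.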
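The position-specific GC contents (GC1, GC2, GC3) reported above can be computed directly from a coding sequence. The following is a minimal sketch, assuming an in-frame sequence that starts at codon position 1; the function name `gc_by_position` and the example sequence are hypothetical and are not taken from CodonW.

```python
def gc_by_position(cds):
    """Return (GC1, GC2, GC3) as percentages for an in-frame coding sequence.

    Hypothetical helper: a trailing partial codon is ignored, and the
    sequence is assumed to start at the first base of a codon.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    if usable == 0:
        raise ValueError("sequence is shorter than one codon")

    result = []
    for pos in range(3):
        bases = cds[pos:usable:3]  # every third base, starting at pos
        gc = sum(b in "GC" for b in bases)
        result.append(100.0 * gc / len(bases))
    return tuple(result)


# Toy example: codons ATG, GCC, AAG.
gc1, gc2, gc3 = gc_by_position("ATGGCCAAG")
```

In the toy sequence every third-position base is G or C, so GC3 is 100%, while GC1 and GC2 are each one third.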

18.
Codon Usage in Tetrahymena and Other Ciliates   Total citations: 6, self-citations: 0, citations by others: 6
Codon usage in ciliates was examined by analyzing the coding regions of 22 ciliate genes corresponding to a total of 26,142 nucleotides (8,714 codons). It was found that Tetrahymena, Paramecium and the hypotrichs (Oxytricha and Stylonychia) differed in which synonymous codons were used most frequently by their genes. In fact, the codon choices in highly expressed Tetrahymena genes were more similar to those of yeast genes than those of Paramecium genes. The ciliates do not appear to have unusually strong biases in codon usage frequency when compared to other protists such as yeast. The analysis of the Tetrahymena genes indicated that genes which are highly expressed during normal cell growth have a stronger bias towards using the "preferred" codons than those expressed at lower levels during growth or for brief periods during processes such as conjugation. This conforms to what is found in other protists.

19.
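Codon pair usage and genome evolution   Total citations: 6, self-citations: 0, citations by others: 6
Using the genomes of 5 eukaryotes, 20 bacteria and 10 archaea as samples, the distribution of the relative number of patterns against frequency was analyzed for codon pairs in coding sequences and for triplet pairs in intergenic sequences, and this distribution was verified to follow a Γ(α,β) distribution. The shape parameter α was found to correlate clearly with genome evolution, and coding sequences and intergenic sequences evolve in entirely different ways. In the course of evolution, the distribution for coding sequences gradually approaches a random distribution (α gradually increases). For intergenic sequences, the distributions of archaea and eukaryotes are similar in shape and differ markedly from those of bacteria.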
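The quantity being fitted above can be sketched as follows: count how often each adjacent codon pair occurs, then tabulate how many distinct pairs occur with each frequency. This is only an illustration under the assumption that a "codon pair" means two adjacent in-frame codons; the function names are hypothetical, and the Γ(α,β) fitting step is not shown.

```python
from collections import Counter


def codon_pair_counts(cds):
    """Count adjacent in-frame codon pairs in a coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return Counter(zip(codons, codons[1:]))


def patterns_by_frequency(pair_counts):
    """Map each observed frequency to the number of distinct pairs having it."""
    return Counter(pair_counts.values())


# Toy example: codons ATG GCC ATG GCC AAA.
pairs = codon_pair_counts("ATGGCCATGGCCAAA")
histogram = patterns_by_frequency(pairs)
```

Dividing each histogram value by the number of distinct pairs would give the relative number of patterns at each frequency, which is the distribution the abstract describes.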

20.
Selection pressures on proteins are usually measured by comparing homologous nucleotide sequences (Zuckerkandl and Pauling 1965). Recently we introduced a novel method, termed volatility, to estimate selection pressures on proteins on the basis of their synonymous codon usage (Plotkin and Dushoff 2003; Plotkin et al. 2004). Here we provide a theoretical foundation for this approach. Under the Fisher-Wright model, we derive the expected frequencies of synonymous codons as a function of the strength of selection on amino acids, the mutation rate, and the effective population size. We analyze the conditions under which we can expect to draw inferences from biased codon usage, and we estimate the time scales required to establish and maintain such a signal. We find that synonymous codon usage can reliably distinguish between negative selection and neutrality only for organisms, such as some microbes, that experience large effective population sizes or periods of elevated mutation rates. The power of volatility to detect positive selection is also modest (requiring approximately 100 selected sites), but it depends less strongly on population size. We show that phenomena such as transient hyper-mutators can improve the power of volatility to detect selection, even when the neutral site heterozygosity is low. We also discuss several confounding factors, neglected by the Fisher-Wright model, that may limit the applicability of volatility in practice. Electronic Supplementary Material: Electronic supplementary material is available for this article and accessible for authorised users. [Reviewing Editor: Dr. Lauren Meyers]
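To make the idea of volatility concrete, the sketch below scores a codon by the fraction of its single-nucleotide, non-stop neighbors that encode a different amino acid. This follows the commonly cited definition and is an assumption here, since the abstract itself gives no formula. The per-gene mean is a simplification for illustration, not the statistical test used by the authors, and all function names are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the usual TCAG ordering (assumed here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_volatility(codon):
    """Fraction of non-stop point-mutation neighbors that change the amino acid.

    Intended for sense codons only.
    """
    codon = codon.upper().replace("U", "T")
    changed = total = 0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            neighbor = codon[:pos] + base + codon[pos + 1:]
            if CODE[neighbor] == "*":
                continue  # stop-codon neighbors are excluded
            total += 1
            changed += CODE[neighbor] != CODE[codon]
    return changed / total


def mean_volatility(cds):
    """Average codon volatility over the sense codons of a coding sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    sense = [c for c in codons if CODE.get(c, "*") != "*"]
    if not sense:
        raise ValueError("no sense codons found")
    return sum(codon_volatility(c) for c in sense) / len(sense)


# Leucine codon CTT: only its three third-position neighbors are synonymous.
v = codon_volatility("CTT")
```

Synonymous codons can differ in this score, which is why codon choice carries a signal at all: a gene that favors codons with many amino-acid-changing neighbors has a higher mean volatility than one that encodes the same protein with less volatile codons.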
