Similar literature
1.
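Using the genomes of six model organisms as samples, we analyzed the usage of optimal and rare codon pairs from two perspectives: the base composition of codon pairs and codon usage. In all six organisms, the dinucleotide TA has the lowest count among optimal codon pairs, whereas the most frequent dinucleotide differs among archaea, bacteria, and eukaryotes. Among rare codon pairs, TA has the highest count, whereas the least frequent dinucleotide differs among archaea, bacteria, and eukaryotes. This indicates that dinucleotide distribution is strongly correlated with codon-pair preference and is also linked to genome evolution. We also analyzed, for the coding sequences of these six organisms, the relationship between the occurrence counts of optimal and rare codon pairs and the relative usage frequency of codons, and found that the occurrence count of a codon pair correlates with the usage of its constituent codons.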

2.
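Base usage flanking codons in Escherichia coli genes and protein structure   Total citations: 4, self-citations: 0, citations by others: 4
We statistically analyzed and compared the usage of the bases immediately before and after codons that encode each class of protein secondary structure (α-helix, β-sheet, random coil, and turn) in a set of E. coli genes. The usage of bases flanking some codons is biased, and these biases are associated with protein secondary structure. This also indicates that the choice of synonymous codons in E. coli genes is somewhat associated with protein secondary structure. The model is of auxiliary value for improving protein structure prediction algorithms and for genetic engineering research.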

3.
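Codon-pair usage and genome evolution   Total citations: 6, self-citations: 0, citations by others: 6
Using the genomes of 5 eukaryotes, 20 bacteria, and 10 archaea as samples, we analyzed how the relative number of patterns varies with occurrence count for codon pairs in coding sequences and for triplet pairs in intergenic sequences, and verified that this distribution follows a Γ(α, β) distribution. The shape parameter α of the distribution is clearly correlated with genome evolution, and coding sequences and intergenic sequences evolve in markedly different ways. In the course of evolution, the distribution for coding sequences gradually approaches a random distribution (α gradually increases). For intergenic sequences, the distributions of archaea and eukaryotes are close to each other and differ markedly from that of bacteria.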

4.
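Phylogenetic trees were constructed from codon-pair usage bias and from dinucleotide frequencies within codon pairs, respectively. The tree built from the dinucleotide frequencies of codon pairs in the coding sequences of 40 model organisms clearly separates the organisms, in line with evolution, into bacteria, archaea, and eukaryotes. The tree built from the codon-pair usage bias index is largely consistent with the tree based on dinucleotide frequencies within codon pairs. The results indicate that the dinucleotide composition of codon pairs is one of the determinants of codon-pair bias.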

5.
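We analyzed codon-pair usage in the genome of the anaerobic myxobacterium Anaeromyxobacter_dehalogenans_2N-C and found that 5.2% of codon-pair patterns are absent from its whole genome. The analysis suggests that its codon-pair bias may result from pressures of at least four kinds: (1) the local and overall GC content of the genome, (2) the dinucleotide composition within codon pairs, (3) the hydrophilicity or hydrophobicity of amino acids, and (4) the level of dipeptide conservation in the genome.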

6.
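Producing recombinant proteins in heterologous expression systems has become a focus of modern genetic engineering and bioengineering research. However, researchers have found that not all genes can be expressed efficiently in a heterologous host. Besides factors such as the host, the secretion pathway, and the promoter, the sequence of the gene itself carries many factors that affect protein expression, such as codon bias, codon-pair bias, GC content, mRNA secondary structure, and mRNA stability. This review covers, from the perspective of gene design, the factors and methods that affect protein expression, especially codon optimization and codon-pair optimization, and discusses in detail recent advances such as codon harmonization and codon-pair harmonization, which differ fundamentally from the traditional concept of gene optimization.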

7.
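Objective: To analyze codon-pair usage in the genes of an anaerobic myxobacterium and a Rickettsia, whose genome compositions are extremely biased, and to study the asymmetry of codon-pair usage between the two DNA strands. Methods: Biostatistics. Results: In the genomes of the dehalogenating anaerobic myxobacterium and the Rickettsia, 17% and 21% of codon pairs, respectively, show exactly opposite usage preferences on the two DNA strands, which indicates that codon-pair usage preferences differ between the leading and lagging strands. Therefore, one important factor affecting codon pairing is the nature of the strand on which a gene lies. These characteristics may include gene orientation bias, codon usage bias, and codon context. Conclusion: The asymmetry in codon-pair usage between the two DNA strands in these two species may be caused by strand-specific mutational bias and by natural-selection constraints at the levels of replication, transcription, and translation.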

8.
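Effect of transposable elements on synonymous codon usage bias in rice   Total citations: 1, self-citations: 0, citations by others: 1
Using 635 japonica rice CDS sequences containing complete transposable element insertions, we analyzed in detail how transposable elements affect the base composition of gene coding regions and gene expression levels, and thereby synonymous codon usage bias. The results show that transposable element insertion affects synonymous codon usage in coding regions at a highly significant level but is not the only factor. Transposable elements have multiple effects on the expression levels of different genes, suppressing some and enhancing others, but overall they reduce the extent to which expression level influences synonymous codon usage.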

9.
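Characteristics of synonymous codon bias in human genes and its relationship with gene GC content   Total citations: 24, self-citations: 0, citations by others: 24
728 human genes were divided into four groups according to the GC content of their coding regions (from GC < 0.43 to GC > 0.58), and the characteristics of synonymous codon preference were examined in each group. In all samples, NTG codons (N denotes any of the four bases) are strongly preferred and NCG codons are avoided as far as possible. Correlation analysis between the GC content of the gene environment and C3/G3 content (the C and G content at the third codon position), together with the codon preferences of the four groups, supports the view that codons ending in C have a special advantage in coding, an advantage that helps ensure translational accuracy. The trends in the content of each amino acid as coding-region GC content varies were also examined.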
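The quantities named in this abstract (coding-region GC content, third-position G/C content, and the share of NTG or NCG codons) can be sketched in a few lines of Python. This is only an illustration: the function names and the example sequence are hypothetical, and the abstract does not describe how the original study computed these values.

```python
def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_content(cds):
    """Fraction of G and C at the third position of each codon (in frame)."""
    third_positions = cds.upper()[2::3]
    if not third_positions:
        raise ValueError("sequence shorter than one codon")
    return sum(base in "GC" for base in third_positions) / len(third_positions)


def codon_pattern_fraction(cds, second, third):
    """Fraction of codons whose 2nd and 3rd bases match, e.g. NTG or NCG."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        raise ValueError("sequence shorter than one codon")
    hits = sum(codon[1] == second and codon[2] == third for codon in codons)
    return hits / len(codons)


# Hypothetical toy sequence: ATG CTG GCG ACG TAA
example = "ATGCTGGCGACGTAA"
ntg_share = codon_pattern_fraction(example, "T", "G")
ncg_share = codon_pattern_fraction(example, "C", "G")
```

Comparing `ntg_share` with `ncg_share` across genes grouped by `gc_content` would mirror the kind of comparison the abstract describes, under the assumption that sequences are in frame.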

10.
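Analysis of genetic codon usage frequency in different species across sections of Populus   Total citations: 1, self-citations: 0, citations by others: 1
Zhou Meng  Tong Chunfa  Shi Jisen 《遗传学报》2007,34(6):555-561
The degeneracy of the genetic code leads to codon preferences that differ among species. Understanding the codon usage of different species can guide gene modification when foreign genes are introduced, and thus enable their efficient expression. Poplar is one of the most widely cultivated and important afforestation trees in the world and has become a model plant for forest-tree genetic engineering. In this study, high-frequency codon analysis was applied to the protein-coding gene sequences (CDS) of four poplar species: P. tremuloides, P. tomentosa, P. deltoides, and P. trichocarpa. The relative frequency of synonymous codons (RFSC) was calculated and the high-frequency codons of the four species were determined. Although codon usage differs somewhat among the poplar species, their preferred codons differ very little, and the great majority of preferred codons are shared. Only a few amino acids, such as Pro, Thr, and Cys, have differing preferred codons. This commonality suggests that a foreign gene designed with the preferred codons of any one of these poplar species can also be used in the others.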
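A minimal sketch of a relative-frequency-of-synonymous-codons calculation follows. It assumes that RFSC is the count of a codon divided by the total count of all codons for the same amino acid, and it uses the standard genetic code. The abstract does not give the threshold that defines a "high-frequency" codon, so `preferred_codons` (a hypothetical helper) simply returns the most frequent synonymous codon.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_counts(cds):
    """Count in-frame codons, ignoring a trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def rfsc(cds_list):
    """Map codon -> count / total count of codons for the same amino acid."""
    counts = Counter()
    for cds in cds_list:
        counts.update(codon_counts(cds))
    totals = Counter()
    for codon, n in counts.items():
        if codon in CODON_TABLE:  # skip codons with ambiguous bases
            totals[CODON_TABLE[codon]] += n
    return {
        codon: counts[codon] / totals[aa]
        for codon, aa in CODON_TABLE.items()
        if totals[aa] > 0
    }


def preferred_codons(freqs):
    """Pick the most frequent synonymous codon for each observed amino acid."""
    best = {}
    for codon, freq in freqs.items():
        aa = CODON_TABLE[codon]
        if aa not in best or freq > freqs[best[aa]]:
            best[aa] = codon
    return best


# Hypothetical toy input: four proline codons (CCT, CCT, CCA, CCG).
freqs = rfsc(["CCTCCTCCACCG"])
```

Running `rfsc` separately on the CDS sets of two species and comparing the outputs of `preferred_codons` would reproduce the kind of between-species comparison the abstract reports.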

11.
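Cloning and expression of the Escherichia coli trpBA genes   Total citations: 1, self-citations: 0, citations by others: 1
Objective: To increase the expression level and activity of tryptophan synthase in Escherichia coli. Methods: The tightly linked trpB and trpA genes (trpBA for short) were cloned directly from the genome of E. coli K-12 by PCR and ligated into the prokaryotic expression vector pET22b(+) to give the recombinant plasmid pET22b(+)-trpBA, which was transformed into E. coli BL21. Recombinant protein expression was induced with IPTG; the expression products were analyzed by SDS-PAGE and their activity was measured colorimetrically. Results: Gel electrophoresis showed a PCR product of about 2 kb. SDS-PAGE identified target proteins with Mr of about 29000 and 44000, respectively. The α and β subunits of tryptophan synthase were both expressed at high levels, and tryptophan synthase activity rose to 3.7 times that of the control strain. Conclusion: The recombinant plasmid pET22b(+)-trpBA was constructed successfully, and the expression level and activity of tryptophan synthase in E. coli were increased, laying a foundation for constructing genetically engineered strains with high tryptophan yield.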

12.
The regularities of gamma-induced excision of transposon Tn10 in different rec strains of E. coli after gamma-irradiation have been studied. The survival of cells and the relative frequency of Tn10 elimination were investigated as a function of the 137Cs gamma-radiation dose. recN and recA mutants of E. coli were used to study the role of rec genes in gamma-induced transposon excision. Induced excision was reduced in the recN mutant, and no transposon excision was detected in the recA mutant. The results lead to the conclusion that the recA and recN genes are involved not only in DNA repair processes but also in gamma-induced transposon excision in bacteria.

13.
F P Lindberg  B Lund    S Normark 《The EMBO journal》1984,3(5):1167-1173
Most pyelonephritic Escherichia coli strains bind to digalactoside-containing glycolipids on uroepithelial cells. Purified Pap pili (pili associated with pyelonephritis) show the same binding specificity. A non-polar mutation early in the papA pilin gene abolishes formation of Pap pili but does not affect the degree of digalactoside-specific hemagglutination. Three novel pap genes, papE, papF and papG, are defined in this report. The papF and papG gene products are both required for digalactoside-specific agglutination by whole bacterial cells as well as for agglutination by pilus preparations. Pili prepared from a papE mutant have lost their binding ability although whole cells from this mutant retain it, implying an adhesin anchoring role for the papE gene product. A mutant with lesions both in the papA and the papE genes does not mediate digalactoside-specific agglutination. The implications of this finding for pilus biogenesis are discussed.

14.
We have examined codon bias in 20 Brassica gene sequences collected from the literature. A comparison with the codon usage profile derived from 207 plant genes showed that Brassica genes distinctly differ from the plant genes with respect to Gly, Asp, Arg, Ile, Tyr, Thr, Leu and Gln. Codon preferences for various amino acids did not differ among the three Brassica species considered in the present analysis: B. napus, B. oleracea and B. campestris. G-ending codons for Thr, Ala, Pro and Ser are avoided by Brassica genes, as in plant genes in general. However, the avoidance of CG and TA doublets in Brassica genes is less than that observed in plant genes.

15.
Fran Supek  Tomislav Šmuc 《Genetics》2010,185(3):1129-1134
A recent investigation concluded that codon bias did not affect expression of green fluorescent protein (GFP) variants in Escherichia coli, while stability of an mRNA secondary structure near the 5′ end played a dominant role. We demonstrate that combining the two variables using regression trees or support vector regression yields a biologically plausible model with better support in the GFP data set and in other experimental data: codon usage is relevant for protein levels if the 5′ mRNA structures are not strong. Natural E. coli genes had weaker 5′ mRNA structures than the examined set of GFP variants and did not exhibit a correlation between the folding free energy of 5′ mRNA structures and protein expression.

In genomes, natural selection may act on silent sites of codons to make translation of highly expressed genes more efficient, an effect linked primarily to abundances of tRNA isoacceptor molecules (Ikemura 1985; Bulmer 1987; Kanaya et al. 1999). Codon choice may also be linked to formation of secondary structures in mRNA that reduce protein levels, as has been shown with haplotypes of the human COMT gene (Nackley et al. 2006). Kudla et al. (2009) have recently reported an experiment that contributes toward understanding how synonymous codon usage shapes gene expression. They have constructed a library of 154 synthetic variants of a green fluorescent protein (GFP) gene that varied randomly at synonymous sites while retaining the original amino acid sequence. The authors concluded that codon usage (CU) bias did not correlate with protein levels measured as fluorescence of the GFP, but also that the minimum free energy of an mRNA secondary structure in a 42-nucleotide region at [−4,37] that overlaps the start codon (“hairpin stability”) bears a great significance.
CU bias was quantified by the widely used codon adaptation index (CAI) method (Sharp and Li 1987), essentially a measure of the distance of a gene's codon usage to the codon usage of a predefined set of highly expressed genes. The CAI and some of its more recent alternatives, such as measure independent of length and composition (MILC) (Supek and Vlahovicek 2005), have been shown to be a viable surrogate for gene expression in various unicellular organisms. Additionally, in a multiple linear regression of rank fluorescence against a number of sequence-derived attributes, including CAI and the abovementioned hairpin stability, Kudla et al. (2009) did not find CAI to contribute significantly toward the prediction of protein levels, in contrast to the hairpin stability.
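For readers unfamiliar with the CAI, the following is a minimal sketch of the usual formulation: each codon gets a relative adaptiveness w (its count in a reference set of highly expressed genes divided by the count of the most frequent synonymous codon), and the CAI of a gene is the geometric mean of w over its codons. The handling of zero counts (a pseudocount of 0.5) and the exclusion of single-codon amino acids and stop codons are assumptions of this sketch, and the toy reference set is hypothetical; it is not the reference set used in the article.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def split_codons(cds):
    """Split an in-frame coding sequence into codons, dropping a partial tail."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def relative_adaptiveness(reference_cds, pseudocount=0.5):
    """w = codon count / count of the most used synonymous codon in the reference."""
    counts = Counter()
    for cds in reference_cds:
        counts.update(split_codons(cds))
    synonymous = {}
    for codon, aa in CODON_TABLE.items():
        synonymous.setdefault(aa, []).append(codon)
    weights = {}
    for aa, codons in synonymous.items():
        if aa == "*" or len(codons) == 1:
            continue  # stop codons, Met and Trp carry no usage information
        # Assumption: unseen codons get a small pseudocount to avoid log(0).
        adjusted = {codon: max(counts[codon], pseudocount) for codon in codons}
        top = max(adjusted.values())
        for codon in codons:
            weights[codon] = adjusted[codon] / top
    return weights


def cai(cds, weights):
    """Geometric mean of relative adaptiveness over the scored codons of a gene."""
    logs = [math.log(weights[c]) for c in split_codons(cds) if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


# Hypothetical reference set: CTG x3, CTT x1, AAA x2, AAG x1.
weights = relative_adaptiveness(["CTGCTGCTGCTTAAAAAAAAG"])
optimal_gene_cai = cai("CTGAAA", weights)
suboptimal_gene_cai = cai("CTTAAG", weights)
```

A gene built only from the most frequent reference codons scores 1, and any other synonymous choice lowers the score, which is why the CAI reads as a distance to the reference codon usage.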

Both the codon adaptation index and the 5′ mRNA secondary structures influence protein levels in the Kudla et al. data:

The described statistical analyses, however, failed to address the case in which a nonlinear three-way dependency between hairpin stability, codon usage, and fluorescence might exist; data are visualized in Figure 1, A–C, and in figure 2B in Kudla et al. Such complex patterns in data are readily captured by the support vector machines (SVM) algorithm, reviewed in Noble (2006) and Ben-Hur et al. (2008). We have employed the SVM with a radial basis function kernel to regress fluorescence against both hairpin stability and CAI simultaneously (Figure 1B) and computed the Pearson's correlation coefficient in cross-validation (here denoted as Q) between true and predicted values of fluorescence (see File S1). A linear model based solely on hairpin stability as employed by Kudla et al. (Figure 1A) can explain Q² = 38.6% of variance in protein levels, while the nonlinear SVM regression that takes CAI into account explains Q² = 52.2% of variance. The difference in Q is statistically significant at P = 10^−190 (paired t-test). Note that Kudla et al. utilize the Spearman rank correlation coefficient (ρ) in their article; the hairpin stability would explain ρ² = 44.6% of the variance in expression levels if the requirement for a linear relationship was abandoned in this manner.

Figure 1. Regression of protein levels against folding free energy of an mRNA hairpin at nucleotides −4 through 37 (A), against the hairpin free energy and the codon adaptation index (Sharp and Li 1987) (B and C), or against the hairpin free energy and the codon frequencies (D and E). The colors show the measured protein levels, while the background shading reflects the protein levels predicted by the specific model. (A) Predictions by linear regression. (B and E) Predictions by a support vector machine with a radial basis function kernel. (C) Predictions by an M5′ regression tree. (D) A schematic of the M5′ model, where coefficients in the terminal nodes are derived from data where protein levels, all codon frequencies, and hairpin free energies were normalized to [0,1] to facilitate comparison between the influence of codons, the hairpin stability, and the constant in the regression equation. All coefficients ≥0.1 are in boldface type. In the plots (A–C and E), a slight amount of random “jitter” was introduced to the point positions (at most, 3% of the range of each axis) to better visualize overlapping points. In the plot in E, a single outlying point is not shown. See Figure S2 for the same plots without jitter and with the outlier in E included. R² is the squared Pearson's correlation coefficient between actual and model-predicted protein levels; Q² is similar, but obtained in cross-validation (10-fold, 100 runs), and is a more conservative estimate of regression accuracy.

Figure 2. The distributions of RNA folding free energies of a 42-nucleotide window in the mRNA between positions −4 and +37, where the “A” in the “AUG” start codon has index zero. The distributions are shown separately for the 154 gene variants from Kudla et al. (2009) and for the genes from the E. coli K12 genome. The dotted line indicates the 5th percentile of the E. coli values at −10.9 kcal/mol.

Compared to the SVM, a more interpretable generalization of the data can be achieved by a different nonlinear regression approach, the M5′ tree (Wang and Witten 1997), which recursively divides the data to reduce the variance of the dependent variable within each partition and then builds separate linear models for the partitions. The resulting regression tree (Figure 1C; supporting information, Figure S1) better explains the correlation between protein levels on one side and hairpin stability and CAI on the other side when compared to a linear model employed by Kudla et al. that regresses protein levels against hairpin stability only [see figure 2B in Kudla et al. (2009) and Figure 1A]; 9.3% more variance is explained by the M5′, P = 10^−91 (paired t-test). An interpretation that follows from the general structure of the M5′ tree (Figure S1) is that, at high mRNA hairpin stability, protein levels will generally be quite low and not dependent on CAI; in contrast, with less stable mRNA hairpins, both hairpin stability and CAI play a role in determining protein levels. In the interpretation of the M5′ tree structure, we would place less emphasis on the exact coefficients of the linear models in the leaves because the reliability of these fine-grained features of the M5′ model can strongly depend on the good coverage of all parts of the mRNA–CAI space by data points.
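The cross-validated statistic Q² used throughout this analysis can be illustrated with a short sketch: each data point is predicted by a model trained without it, and Q² is the squared Pearson correlation between measured and held-out predictions. For brevity the sketch swaps in ordinary least squares on a single predictor rather than the SVM or M5′ models of the article, and it performs one k-fold run rather than the 100 repeated runs the authors report; the function names and the seed are hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def cross_validated_q2(xs, ys, folds=10, seed=0):
    """Squared Pearson correlation between ys and out-of-fold predictions."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    predictions = [0.0] * len(xs)
    for fold in range(folds):
        test = indices[fold::folds]          # every folds-th shuffled index
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in test:                        # predict only unseen points
            predictions[i] = slope * xs[i] + intercept
    return pearson(ys, predictions) ** 2


# Hypothetical noise-free data: the held-out predictions are exact.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
q2 = cross_validated_q2(xs, ys)
```

Because every prediction comes from a model that never saw the point being predicted, Q² cannot be inflated by overfitting in the way an in-sample R² can, which is the sense in which the figure legend calls it more conservative.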

The CAI may not be an optimal summary of codon usage for predicting expression of overexpressed genes:

Regarding use of CAI in the present context, it should be noted that CAI's original purpose was to serve as a proxy for gene expression in conditions of abundance that result in fast growth in the organism's environmental niche. The CAI or related approaches (Supek and Vlahovicek 2005) may not, however, be an ideal representation of codon usage when examining overexpression of a foreign protein at levels that exceed the natural abundances of the host's most highly expressed proteins. This was indeed shown to be the case in a recent article by Welch et al. (2009) in which the authors reported an experiment with heterologous expression of variants of two proteins in E. coli: an antibody fragment and a phage DNA polymerase. Welch et al. found that codon frequencies in general, but not CAI specifically, correlated well with protein levels and postulated that for overexpressed proteins optimal codons would correspond to the codons translated efficiently under amino acid starvation (Elf et al. 2003; Dittmar et al. 2005). Analogously to Welch et al., we now apply our regression algorithms not to the CAI, but directly to the codon frequencies that CAI attempts to summarize in the Kudla et al. data (see File S2). An M5′ regression tree trained on the hairpin stability and codon frequencies (Figure 1D) explains 10.6% more variance (P = 10^−83, paired t-test) in protein levels than an M5′ tree trained on hairpin stability and CAI (Figure 1C, Figure S1). An SVM regression model trained on the hairpin stability and a simple linear combination of selected codon frequencies (Figure 1E) explains 8.8% more variance (P = 10^−82, paired t-test) than the SVM that uses CAI (Figure 1B). An SVM trained on the hairpin stability and the full set of codon frequencies (not shown in Figure 1) explains Q² = 65.0% of variance in the protein abundances, a sizable increase (P ≈ 10^−260, paired t-test) compared to a linear regression on solely the [−4,37] hairpin stability (Q² = 38.6%) as originally employed by Kudla et al. and also as compared to a set of randomized controls (Q² = 20.1–30.7%; Table S1). Therefore, not relying on a predefined notion of codon optimality (as embodied in the CAI) further strengthens the argument that the correlation of CU and protein levels is far from negligible in this data set.

Additionally, we found some correlation between codon frequencies and 5′ mRNA hairpin stability in the Kudla et al. gene variants (Figure S4). The fact that the two factors were not completely independent adds weight to the relevance of CU to protein levels since one could not be certain that even the variance in protein levels explained by 5′ mRNA structures is wholly due to the structures themselves and not to the confounding variables (here, the codon frequencies).

The M5′ tree trained on codon frequencies (Figure 1D) follows the same general structure as the M5′ tree trained on the CAI (Figure S1) where the codon frequencies become relevant with mRNA hairpins weaker than −9.75 kcal/mol, while with stronger [−4,37] mRNA hairpins protein levels are generally low. Our interpretation is that the lack of a stable secondary structure that could obstruct translational initiation is a necessary but not a sufficient condition for high protein expression. When the initiation phase is unhindered, the bottleneck would shift to the elongation phase in which codon optimality plays an important role. In the literature, theoretical models of translation may consider either the initiation (Bulmer 1991) or the elongation phase (Xia 1998) as the rate-limiting step of translation under physiological conditions; we are not aware of such analyses describing translation of artificially overexpressed genes.

The codons identified as relevant by our M5′ model of the Kudla et al. data are different from, but not inconsistent with, those proposed by Welch et al. (Table S2). We anticipate that the rules for codon optimality for overexpression in an Escherichia coli host will become better defined as more large-scale experiments, such as the two discussed here (Kudla et al. 2009; Welch et al. 2009), are carried out.

The “RNA structure + codon usage” model agrees with independent experimental data and is robust to removal of extreme values:

Our reanalysis of the Kudla et al. data should be viewed in light of the conclusions of Welch et al. (2009) who find that codon usage, but not the 5′ hairpin stability, correlates with protein levels in their data, while noting that their gene variants generally have considerably weaker 5′ mRNA hairpins than the sequences in Kudla et al. Welch et al. reconcile the different outcomes of the two experiments by noting that “inhibition of initiation by especially strong mRNA structure would obscure effects resulting from factors that influence elongation, such as codon usage” (page 9). Here we propose that precisely the same model can be derived solely from the Kudla et al. data. Furthermore, we find that the 154 gene variants from Kudla et al. indeed do have unusually stable 5′ mRNA hairpins (mean free energy = −9.68 kcal/mol) in comparison to natural E. coli genes (mean free energy = −6.15 kcal/mol) (P = 10^−38 by Mann–Whitney U-test; see Figure 2). The part of the distribution of Kudla et al. gene variants that overlaps with the bulk of the E. coli genes, with 5′ mRNA hairpin free energies lower than ∼ −10 kcal/mol, corresponds to the range where our M5′ model indicates a stronger influence of CU on protein levels (Figure S1, Figure 1D).

We investigate to what extent the presence of a group of sequences extreme in their 5′ mRNA hairpin stabilities in the Kudla et al. data set (left peak in Figure 2) influenced the authors' conclusion that the hairpin stabilities have an overarching influence on protein levels. After removing the sequences below the 5th percentile of the E. coli natural hairpin stabilities (−10.9 kcal/mol), we were left with 109 of the original 154 Kudla et al. sequences. The accuracy of regressing protein levels against mRNA hairpin stability deteriorates greatly (Q² = 18.5%) after removing the 45 sequences, but less so with SVM and M5′ regression that take into account both CU and the hairpin stability (Table 1). Kudla et al. basically captured the difference between these extreme cases (in which very strong 5′ mRNA secondary structures blocked expression) and all other sequences. However, to explain the variation in protein levels within the nonextreme set, hairpin stabilities by themselves are not sufficient and need to be complemented with CU.

TABLE 1

Accuracy of the regression of protein levels against 5′ mRNA hairpin stability or against 5′ mRNA hairpin stability and codon frequencies
Data set                      Linear regression,           SVM, hairpin stability +    M5′, hairpin stability +
                              hairpin stability only (%)   codon frequencies (%)       codon frequencies (%)
Full (n = 154)                38.6                         65.0                        56.7
No strong hairpins (n = 109)  18.5                         53.0                        40.4
The cross-validation correlation coefficient squared (Q²) is compared between the full Kudla et al. data set (154 proteins) and the reduced data set (109 proteins) where mRNA hairpin folding energies are ≥ −10.9 kcal/mol, the 5th percentile of natural E. coli genes.

In addition to measuring protein levels in the 154-sequence data set, Kudla et al. performed an additional experiment where an unstructured 28-codon tag was fused to 5′ ends of 72 (of 154) GFP sequence variants. Adding the tag was found to enhance protein levels, supporting the conclusion of Kudla et al. that 5′ structure of mRNA had a strong influence on protein production. After an analysis of the data, we found (see File S3) that data from this specific experiment are not well suited to serve as a direct verification of our existing M5′ and SVM regression models. Still, we can compare the protein level predictions of our existing SVM model on the same set of sequences before and after adding the unstructured tag. We found that the predicted expression levels have increased for 67 of 72 sequences (Table S3) after adding the tag that fixes 5′ mRNA folding energy at a weak −6.1 kcal/mol, a result consistent with the Kudla et al. experiment. Additionally, we have trained a new SVM regression model only on the tagged 72-sequence set (see File S2) and found that, within this set, SVM regression can again predict GFP levels solely from codon usage (5′ mRNA structure is invariant among these sequences) at Q² = 37.7%. This amount of variance is similar to, or even somewhat larger than, the difference in the variance explained by mRNA vs. mRNA+codons (38.6% vs. 65.0%) in the original data. Therefore, codon usage is of similar importance in shaping the protein levels within the tagged 72-sequence set, as it was in the original 154-sequence set.

mRNA 5′ end secondary structure stabilities do not correlate with protein levels for natural E. coli genes:

To further verify our proposed model, we analyzed the relative contributions of mRNA hairpin stabilities and CU on expression levels of natural E. coli genes (see File S2). If the hairpin stabilities were limiting for expression in the range of folding free energies spanned by the E. coli mRNAs, one would expect to see a correlation between the free energy of mRNA 5′ end folding and the abundance of the corresponding protein. We found no such correlation using the folding free energies of the [−4,37] mRNA region (Figure 3) or equal-sized regions centered around the start codon at [−20,21] or on the expected location of a Shine–Dalgarno sequence (Shultzaberger et al. 2001) at [−30,11] (see Figure S3). Unsurprisingly, CAI correlated well with protein levels (Figure 3) in all examined experimental data sets (Lopez-Campistrous et al. 2005; Lu et al. 2007; Ishihama et al. 2008). Therefore, within the boundaries of the mRNA folding free energies spanned by E. coli genes, the CU plays a dominant role in shaping gene expression (or the CU may possibly be shaped by the expression; see Concluding remarks). As for the stronger mRNA hairpins with < −11 kcal/mol, they are present in the Kudla et al. data, but are very rare in the E. coli genome, which could be explained by one or both of two scenarios: (i) above a certain threshold, the mRNA hairpin stability may become so detrimental to expression that all the mutants having such hairpins are subject to very strong negative selection and therefore are absent from the genome; (ii) the Kudla et al. data set may not be representative of the genes in the E. coli genome or the mutational processes they undergo; for example, the amino acid sequence of the GFP's beginning might be unusually conducive to forming RNA hairpins. Unless further analyses prove differently, it seems reasonable to surmise that in natural E. coli genes mRNA secondary structures would shape expression if they were highly stable, consistent with the finding of a universal (albeit not particularly strong) trend toward avoidance of 5′ mRNA structures in genomes (Gu et al. 2010). However, it can also be concluded at this point, and with more confidence, that at lower secondary structure stabilities the CU has an overarching influence on expression. Such a model of expression-related gene sequence determinants in E. coli is fully consistent with our interpretation of the M5′ regression tree that we have derived from the Kudla et al. data.

Figure 3. Correlations between the E. coli absolute protein abundances measured in three independent experiments (Lopez-Campistrous et al. 2005; Lu et al. 2007; Ishihama et al. 2008) and the codon adaptation index (CAI) or the free energy of folding of a secondary structure in the mRNA [−4,37] region (in kcal/mol; more negative values denote a more stable RNA secondary structure). “ρ” is the Spearman's rank correlation coefficient.

Concluding remarks:

We argue that Kudla et al. worked with a set of gene sequences in which strong mRNA secondary structures (that effectively abolished expression) were frequent enough to mask the relevance of codon frequencies on protein levels when examined only with linear regression methods. While mRNA secondary structures can certainly occur when designing synthetic genes, it is highly questionable to what extent Kudla et al.'s conclusion that CU is of little importance for expression would be generally valid for biotechnological applications, especially since we have shown that the influence of CU is nevertheless present even in the Kudla et al. data. What is beyond doubt, however, is that a strong 5′ mRNA secondary structure can be a roadblock in heterologous expression, and therefore the synthetic gene variants harboring such structures should be avoided. The more specific rules regarding the exact location of the hairpin on the gene sequence, the hairpin's length, or the tolerable levels of folding free energy will have to be established by further experimentation.

A recent algorithm for estimating the efficiency of ribosomal binding sites from the mRNA sequence (Salis et al. 2009) explicitly takes into account the folding free energy of RNA secondary structures, along with other factors. When protein overexpression is desired, the conclusions of Welch et al. and (by our reanalysis) the Kudla et al. data indicate that CU should be optimized in addition to the ribosome binding site sequence to ensure that both initiation and elongation phases of translation are free of impediments.

On the basis of their results, Kudla et al. also discuss the evolutionary link between the CU of natural genes and the expression levels of proteins for which they code. They propose that selection for translational efficiency acts at a global level in cells; the codons that accelerate elongation would be preferred in a highly expressed gene not because they facilitate production of that particular protein, but to free up ribosomes for the rate-determining initiation phase of translation of the total cellular mRNA pool. Effectively, the flow of causality between CU and expression would be reversed in comparison to the established view. This hypothesis should be critically reevaluated because it depends on the assertion that manipulating a gene's CU cannot cause protein levels to increase, an assertion poorly supported by the Kudla et al. data.

16.
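cDNA fragments of the soybean catalase genes GmCAT3 and GmCAT5 were obtained by RT-PCR. The sequences are 1 492 bp long and encode 492 amino acids, with relative molecular masses of 56.9 and 56.7 kD, respectively, and an isoelectric point (pI) of 6.77 for both. The secondary and higher-order structures of the proteins were also predicted. The target fragments were ligated into the pBV220 expression vector and transformed into E. coli BL21(DE3) for induced expression. SDS-PAGE analysis showed that protein expression was highest after induction at 42℃ for 6 h. The induced target protein, at 57 kD, was present mainly as inclusion bodies, and the recombinant proteins accounted for 47% and 35% of total protein, respectively. These results provide an experimental basis for further study of the structure and function of soybean catalase.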

17.
Autotransporter (AT) protein-encoding genes of diarrheagenic Escherichia coli (DEC) pathotypes (cah, eatA, ehaABCDJ, espC, espI, espP, pet, pic, sat, and tibA) were detected in typical and atypical enteropathogenic E. coli (EPEC) in frequencies between 0.8% and 39.3%. Although these ATs have been described in particular DEC pathotypes, their presence in EPEC indicates that they should not be considered specific virulence markers.

18.
Biofilm-forming cells are distinct from the well-investigated planktonic cells and exhibit a different type of gene expression. Several new Escherichia coli genes related to biofilm formation have recently been identified through genomic approaches such as DNA microarray analysis. However, many others involved in this process might have escaped detection due to poor expression, regulatory mechanism, or genetic backgrounds. Here, we screened a collection of single-gene deletion mutants of E. coli named ‘Keio collection’ to identify genes required for biofilm formation. Of the 3985 mutants of non-essential genes in the collection thus examined, 110 showed a reduction in biofilm formation, nine of which have not been well characterized yet. Systematic and quantitative analysis revealed the involvement of genes of various functions and reinforced the importance in biofilm formation of the genes for cell surface structures and cell membrane. Characterization of the nine mutants of function-unknown genes indicated that some of them, such as yfgA that genetically interacts with a periplasmic chaperone gene surA together with yciB and yciM, might be required for the integrity of outer membrane.

19.
To synthesize a protein, a ribosome moves along a messenger RNA (mRNA), reads it codon by codon, and takes up the corresponding ternary complexes which consist of aminoacylated transfer RNAs (aa-tRNAs), elongation factor Tu (EF-Tu), and GTP. During this process of translation elongation, the ribosome proceeds with a codon-specific rate. Here, we present a general theoretical framework to calculate codon-specific elongation rates and error frequencies based on tRNA concentrations and codon usages. Our theory takes three important aspects of in-vivo translation elongation into account. First, non-cognate, near-cognate and cognate ternary complexes compete for the binding sites on the ribosomes. Second, the corresponding binding rates are determined by the concentrations of free ternary complexes, which must be distinguished from the total tRNA concentrations as measured in vivo. Third, for each tRNA species, the difference between total tRNA and ternary complex concentration depends on the codon usages of the corresponding cognate and near-cognate codons. Furthermore, we apply our theory to two alternative pathways for tRNA release from the ribosomal E site and show how the mechanism of tRNA release influences the concentrations of free ternary complexes and thus the codon-specific elongation rates. Using a recently introduced method to determine kinetic rates of in-vivo translation from in-vitro data, we compute elongation rates for all codons in Escherichia coli. We show that for some tRNA species only a few tRNA molecules are part of ternary complexes and, thus, available for the translating ribosomes. In addition, we find that codon-specific elongation rates strongly depend on the overall codon usage in the cell, which could be altered experimentally by overexpression of individual genes.

20.
《Journal of molecular biology》2019,431(13):2434-2441
Usage of sequential codon-pairs is non-random and unique to each species. Codon-pair bias is related to but clearly distinct from individual codon usage bias. Codon-pair bias is thought to affect translational fidelity and efficiency and is presumed to be under selective pressure. It was suggested that changes in codon-pair utilization may affect human disease more significantly than changes in single codons. Although recombinant gene technologies often take codon-pair usage bias into account, codon-pair usage data and tables are not readily available, thus potentially impeding research efforts. The present computational resource (https://hive.biochemistry.gwu.edu/review/codon2) systematically addresses this issue. Building on our recent HIVE-Codon Usage Tables, we constructed a new database to include genomic codon-pair and dinucleotide statistics of all organisms with sequenced genomes available in GenBank. We believe that the growing understanding of the importance of codon-pair usage will make this resource an invaluable tool to many researchers in academia and the pharmaceutical industry.
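To make the distinction between codon-pair bias and single-codon bias concrete, here is a simplified sketch that counts adjacent codon pairs and compares each observed count with the count expected if neighboring codons were chosen independently. This is an assumption-laden illustration, not the statistic used by the resource described above: published codon-pair scores typically also normalize by amino acid pair frequency. The junction dinucleotide counter (third base of one codon plus first base of the next) is likewise only one possible reading of "dinucleotide statistics".

```python
import math
from collections import Counter


def split_codons(cds):
    """Split an in-frame coding sequence into codons, dropping a partial tail."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def codon_pair_log_ratios(cds_list):
    """ln(observed / expected) per adjacent codon pair, assuming independence."""
    codon_counts = Counter()
    pair_counts = Counter()
    for cds in cds_list:
        codons = split_codons(cds)
        codon_counts.update(codons)
        pair_counts.update(zip(codons, codons[1:]))  # neighbors within a gene
    n_codons = sum(codon_counts.values())
    n_pairs = sum(pair_counts.values())
    ratios = {}
    for (first, second), observed in pair_counts.items():
        expected = (
            codon_counts[first] / n_codons
            * codon_counts[second] / n_codons
            * n_pairs
        )
        ratios[(first, second)] = math.log(observed / expected)
    return ratios


def junction_dinucleotides(cds_list):
    """Count dinucleotides spanning the boundary between adjacent codons."""
    counts = Counter()
    for cds in cds_list:
        codons = split_codons(cds)
        counts.update(a[2] + b[0] for a, b in zip(codons, codons[1:]))
    return counts


# Hypothetical toy gene: AAA GGG AAA GGG
ratios = codon_pair_log_ratios(["AAAGGGAAAGGG"])
junctions = junction_dinucleotides(["AAAGGGAAAGGG"])
```

A positive log ratio marks a pair seen more often than its two codons' individual frequencies predict, and a negative one marks an avoided pair, which is how pair bias can exist even when single-codon usage is held fixed.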

