Similar literature
 20 similar articles found (search time: 15 ms)
1.
The genetic code is universal, but recombinant protein expression in heterologous systems is often hampered by divergent codon usage. Here, we demonstrate that reprogramming by standardized multi‐parameter gene optimization software and de novo gene synthesis is a suitable general strategy to improve heterologous protein expression. This study compares expression levels of 94 full‐length human wt and sequence‐optimized genes coding for pharmaceutically important proteins such as kinases and membrane proteins in E. coli. Fluorescence‐based quantification revealed increased protein yields for 70% of in vivo expressed optimized genes compared to the wt DNA sequences and also resulted in increased amounts of protein that can be purified. The improvement in transgene expression correlated with higher mRNA levels in our analyzed examples. In all cases tested, expression levels using wt genes in tRNA‐supplemented bacterial strains were outperformed by optimized genes expressed in non‐supplemented host cells.

2.
3.
The nucleotide sequence running from the genetic left end of bacteriophage T7 DNA to within the coding sequence of gene 4 is given, except for the internal coding sequence for the gene 1 protein, which has been determined elsewhere. The sequence presented contains nucleotides 1 to 3342 and 5654 to 12,100 of the approximately 40,000 base-pairs of T7 DNA. This sequence includes: the three strong early promoters and the termination site for Escherichia coli RNA polymerase: eight promoter sites for T7 RNA polymerase; six RNAase III cleavage sites; the primary origin of replication of T7 DNA; the complete coding sequences for 13 previously known T7 proteins, including the anti-restriction protein, protein kinase, DNA ligase, the gene 2 inhibitor of E. coli RNA polymerase, single-strand DNA binding protein, the gene 3 endonuclease, and lysozyme (which is actually an N-acetylmuramyl-l-alanine amidase); the complete coding sequences for eight potential new T7-coded proteins; and two apparently independent initiation sites that produce overlapping polypeptide chains of gene 4 primase. More than 86% of the first 12,100 base-pairs of T7 DNA appear to be devoted to specifying amino acid sequences for T7 proteins, and the arrangement of coding sequences and other genetic elements is very efficient. There is little overlap between coding sequences for different proteins, but junctions between adjacent coding sequences are typically close, the termination codon for one protein often overlapping the initiation codon for the next. For almost half of the potential T7 proteins, the sequence in the messenger RNA that can interact with 16 S ribosomal RNA in initiation of protein synthesis is part of the coding sequence for the preceding protein. The longest non-coding region, about 900 base-pairs, is at the left end of the DNA. The right half of this region contains the strong early promoters for E. coli RNA polymerase and the first RNAase III cleavage site. 
The left end contains the terminal repetition (nucleotides 1 to 160), followed by a striking array of repeated sequences (nucleotides 175 to 340) that might have some role in packaging the DNA into phage particles, and an A · T-rich region (nucleotides 356 to 492) that contains a promoter for T7 RNA polymerase, and which might function as a replication origin.

4.
Over the past decade, evidence has accumulated that new protein‐coding genes can emerge de novo from previously non‐coding DNA. Most studies have focused on large scale computational predictions of de novo protein‐coding genes across a wide range of organisms. In contrast, experimental data concerning the folding and function of de novo proteins are scarce. This might be due to difficulties in handling de novo proteins in vitro, as most are short and predicted to be disordered. Here, we propose a guideline for the effective expression of eukaryotic de novo proteins in Escherichia coli. For heterologous expression, we used 11 sequences from Drosophila melanogaster and 10 from Homo sapiens that were predicted to be de novo proteins in earlier studies. The candidate de novo proteins have varying secondary structure and disorder content. Using multiple combinations of purification tags, E. coli expression strains, and chaperone systems, we were able to increase the number of solubly expressed putative de novo proteins from 30% to 62%. Our findings indicate that the best combination for expressing putative de novo proteins in E. coli is a GST‐tag with T7 Express cells and co‐expressed chaperones. We found that, overall, proteins with higher predicted disorder were easier to express. Statement: Today, we know that proteins do not only evolve by duplication and divergence of existing proteins but also arise from previously non‐coding DNA. These proteins are called de novo proteins. Their properties are still poorly understood and their experimental analysis faces major obstacles. Here, we aim to present a starting point for soluble expression of de novo proteins with the help of chaperones and thereby enable further characterization.

5.
We developed a system to monitor the transfer of heterologous DNA from a genetically manipulated strain of Saccharomyces cerevisiae to Escherichia coli. This system is based on a yeast strain that carries multiple integrated copies of a pUC-derived plasmid. The bacterial sequences are maintained in the yeast genome by selectable markers for lactose utilization. Lysates of the yeast strain were used to transform E. coli. Transfer of DNA was measured by determining the number of ampicillin-resistant E. coli clones. Our results show that transmission of the Amp^r gene to E. coli by genetic transformation, caused by DNA released from the yeast, occurs at a very low frequency (about 50 transformants per μg of DNA) under optimal conditions (a highly competent host strain and a highly efficient transformation procedure). These results suggest that under natural conditions, spontaneous transmission of chromosomal genes from genetically modified organisms is likely to be rare.

6.
The cDNA sequence for human renin was modified for use in the expression of the mature protein in E. coli. This was accomplished by the removal of the 5′ untranslated region and sequences coding for the signal peptide and a portion of the mature protein. An oligonucleotide linker was inserted which supplied the deleted coding information for the mature protein in a form optimized for translation in E. coli, in addition to an initiation codon. The modified gene was cloned into an expression vector consisting of the promoter from the tryptophan operon of E. coli and trp L Shine-Dalgarno sequence. In an appropriate host strain the expressed protein is the most prominent species present, and accounts for at least 10% of the total cellular protein. The expressed protein was verified to be renin by its molecular weight, ability to bind a renin antibody, and N-terminal amino acid sequence.

7.

Background

Production of proteins as therapeutic agents, research reagents and molecular tools frequently depends on expression in heterologous hosts. Synthetic genes are increasingly used for protein production because sequence information is easier to obtain than the corresponding physical DNA. Protein-coding sequences are commonly re-designed to enhance expression, but there are no experimentally supported design principles.

Principal Findings

To identify sequence features that affect protein expression we synthesized and expressed in E. coli two sets of 40 genes encoding two commercially valuable proteins, a DNA polymerase and a single chain antibody. Genes differing only in synonymous codon usage expressed protein at levels ranging from undetectable to 30% of cellular protein. Using partial least squares regression we tested the correlation of protein production levels with parameters that have been reported to affect expression. We found that the amount of protein produced in E. coli was strongly dependent on the codons used to encode a subset of amino acids. Favorable codons were predominantly those read by tRNAs that are most highly charged during amino acid starvation, not codons that are most abundant in highly expressed E. coli proteins. Finally we confirmed the validity of our models by designing, synthesizing and testing new genes using codon biases predicted to perform well.

Conclusion

The systematic analysis of gene design parameters shown in this study has allowed us to identify codon usage within a gene as a critical determinant of achievable protein expression levels in E. coli. We propose a biochemical basis for this, as well as design algorithms to ensure high protein production from synthetic genes. Replication of this methodology should allow similar design algorithms to be empirically derived for any expression system.

8.
Recombinant proteins can be targeted to the Escherichia coli periplasm by fusing them to signal peptides. The popular pET vectors facilitate fusion of target proteins to the PelB signal. A systematic comparison of the PelB signal with native E. coli signal peptides for recombinant protein expression and periplasmic localization has not been reported. We chose the Bacillus stearothermophilus maltogenic amylase (MA), an industrial enzyme widely used in the baking and brewing industries, as a model protein and analyzed the competence of seven codon-optimized E. coli signal sequences to translocate MA to the E. coli periplasm compared to PelB. MA fusions to three of the signals facilitated enhanced periplasmic localization of MA compared to the PelB fusion. Interestingly, these three fusions showed greatly improved MA yields and between 18- and 50-fold improved amylase activities compared to the PelB fusion. Previously, non-optimal codon usage in native E. coli signal peptide sequences has been reported to be important for protein stability and activity. Our results suggest that E. coli signal peptides with optimal codon usage could also be beneficial for heterologous protein secretion to the periplasm. Moreover, such fusions could even enhance activity rather than diminish it. To our knowledge, this effect has not been previously documented. In addition, the seven-vector platform reported here could also be used as a screen to identify the best signal peptide partner for other recombinant targets of interest.

9.
Long stretches of “rare” codons are known to severely inhibit the efficiency of translation. Understanding the distribution of such rare codons is of critical importance in improving the efficiency of heterologous gene expression systems. Accurate estimates of codon usage take the abundance of each protein into consideration. In this paper, we analyze the correlation between approximate measures of codon usage and the availability of tRNA at various growth rates in E. coli. We show that the computationally derived estimates of tRNA isoacceptor concentration enable the finding of poorly translated codons.

10.
A gene encoding a predicted mitochondrially targeted single-stranded DNA binding protein (mtSSB) was identified in the Arabidopsis thaliana genome sequence. This gene (At4g11060) codes for a protein of 201 amino acids, including a 28-residue putative mitochondrial targeting transit peptide. Protein sequence alignment shows high similarity between the mtSSB protein and single-stranded DNA binding proteins (SSB) from bacteria, including residues conserved for SSB function. Phylogenetic analysis indicates a close relationship between this protein and other mitochondrially targeted SSB proteins. The predicted targeting sequence was fused with the GFP coding region, and the organellar localization of the expressed fusion protein was determined. Specific targeting to mitochondria was observed in in-vitro import experiments and by transient expression of a GFP fusion construct in Arabidopsis leaves after microprojectile bombardment. The mature mtSSB coding region was overexpressed in Escherichia coli and the protein was purified for biochemical characterization. The purified protein binds single-stranded, but not double-stranded, DNA. MtSSB stimulates the homologous strand-exchange activity of E. coli RecA. These results indicate that mtSSB is a functional homologue of the E. coli SSB, and that it may play a role in mitochondrial DNA recombination.

11.
The development of codon bias indices (CBIs) remains an active field of research due to their myriad applications in computational biology. Recently, the relative codon usage bias (RCBS) was introduced as a novel CBI able to estimate codon bias without using a reference set. The results of this new index when applied to Escherichia coli and Saccharomyces cerevisiae led the authors of the original publications to conclude that natural selection favours higher expression and enhanced codon usage optimization in short genes. Here, we show that this conclusion was flawed and based on the systematic oversight of an intrinsic bias for short sequences in the RCBS index and of biases in the small data sets used for validation in E. coli. Furthermore, we reveal how the RCBS can be corrected to produce useful results and how its underlying principle, which we here term relative codon adaptation (RCA), can be made into a powerful reference-set-based index that directly takes into account the genomic base composition. Finally, we show that RCA outperforms the codon adaptation index (CAI) as a predictor of gene expression when operating on the CAI reference set and that this improvement is significantly larger when analysing genomes with high mutational bias.
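
For orientation, the codon adaptation index (CAI) that serves as the baseline in the abstract above can be sketched in a few lines. This is a minimal illustration of the standard definition (geometric mean of each codon's relative adaptiveness within its synonymous family), not the RCA index, whose formula the abstract does not give. The function names, the toy reference counts, and the choice to skip zero-weight codons are assumptions made only for this sketch.

```python
import math


def relative_adaptiveness(ref_counts, families):
    """Weight of each codon = its reference count / count of the most
    used synonymous codon in the same family (hypothetical helper)."""
    weights = {}
    for family in families:
        top = max(ref_counts.get(codon, 0) for codon in family)
        if top == 0:
            continue  # amino acid absent from the reference set
        for codon in family:
            weights[codon] = ref_counts.get(codon, 0) / top
    return weights


def cai(codons, weights):
    """Geometric mean of the weights of the codons in a gene.
    Assumption: codons with no weight or a zero weight are skipped."""
    logs = [math.log(weights[c]) for c in codons if weights.get(c, 0) > 0]
    return math.exp(sum(logs) / len(logs))


# Toy reference set with a single two-codon family (made-up counts).
toy_weights = relative_adaptiveness({"GCT": 4, "GCC": 1}, [("GCT", "GCC")])
toy_score = cai(["GCT", "GCC"], toy_weights)
```

A real calculation would use all synonymous families and a reference set of highly expressed genes; single-codon amino acids (Met, Trp) are conventionally left out.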

12.
A unique serpin, kallistatin, displays vasodilatory, antiangiogenic, anti-inflammatory, and antioxidant activity. The difficulty and low efficiency of obtaining recombinant kallistatin limit the wide investigation of its biological and pathological function. The present study employed a codon optimization algorithm to redesign the kallistatin gene and achieved a high yield of recombinant kallistatin protein. The kallistatin codons were redesigned to better suit an Escherichia coli host without altering amino acids. Base composition and GC content were compared between synthetic optimized kallistatin (opti-kallistatin) and wild-type kallistatin (wt-kallistatin). Both opti-kallistatin and wt-kallistatin were purified using Ni-NTA His-binding resins through fast protein liquid chromatography (FPLC). The identity and purity of kallistatin were confirmed by Coomassie blue staining, sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE), and Western blot analysis. The yield of opti-kallistatin protein (2.09 ± 0.23 mg/L) was ~2-fold higher than that of wt-kallistatin (1.05 ± 0.2 mg/L). These results suggest that optimization toward codons more common in the E. coli host significantly increases the yield of heterologous human proteins. This approach will remarkably facilitate the further investigation of kallistatin in vitro and in vivo.

13.
The tendency for repetitiveness of nucleotides in DNA sequences has been reported for a variety of organisms. We show that the tendency for repetitive use of amino acids is widespread and is observed even for segments conserved between human and Drosophila melanogaster at the level of >50% amino acid identity. This indicates that repetitiveness influences not only the weakly constrained segments but also those sequence segments conserved among phyla. Not only glutamine (Q) but also many of the 20 amino acids show a comparable level of repetitiveness. Repetitiveness in bases at codon position 3 is stronger for human than for D. melanogaster, whereas local repetitiveness in intron sequences is similar between the two organisms. While genes for immune system-specific proteins, but not ancient human genes (i.e. human homologs of Escherichia coli genes), have repetitiveness at codon bases 1 and 2, repetitiveness at codon base 3 for these groups is similar, suggesting that the human genome has at least two mechanisms generating local repetitiveness. Neither amino acid nor nucleotide repetitiveness is observed beyond the exon boundary, denying the possibility that such repetitiveness could mainly stem from natural selection on mRNA or protein sequences. Analyses of mammalian sequence alignments show that while the ‘between gene’ GC content heterogeneity, which is linked to ‘isochores’, is a principal factor associated with the bias in substitution patterns in human, ‘within gene’ heterogeneity in nucleotide composition is also associated with such bias on a more local scale. The relationship amongst the various types of repetitiveness is discussed.

14.
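In this study, young leaves from greenhouse hydroponic cultures of tender shoots of two wild Russian olive (Elaeagnus angustifolia Linn.) plants were used as material. Total DNA was extracted from each by the CTAB method and sequenced de novo with second-generation sequencing; assembly yielded the complete chloroplast genome sequences of the two plants. Codon usage bias in the protein-coding genes and its causes were analyzed in detail, laying a foundation for chloroplast genetic engineering and molecular phylogenetic studies of E. angustifolia. The results showed that: (1) The assembled chloroplast genome is 150,546 bp long and consists of a large single-copy (LSC) region of 81,113 bp and a small single-copy (SSC) region of 25,494 bp, separated by a pair of 18,445 bp inverted repeats (IRs); annotation identified 132 genes, including 86 protein-coding genes, 38 tRNA genes and 8 rRNA genes. (2) The GC content at the third codon position (GC3) of the protein-coding genes was 28.47%, clearly lower than the GC content of the whole chloroplast genome (37%) and also lower than that at the first (GC1) and second (GC2) positions, indicating a preference for codons ending in A or T; UCU, CCU, UGU, GCU, CUU, GAU, UCA and UAA were the optimal codons. (3) Relative synonymous codon usage (RSCU) analysis showed that codon usage patterns are not shaped by a single factor: codon bias is jointly influenced by mutation, selection and other factors, and sequence differences arising from natural selection affect codon bias more markedly than mutation does; neutrality plot, effective number of codons (ENC-plot) and parity rule 2 (PR2-plot) analyses indicated that codon usage bias in the E. angustifolia chloroplast genome is influenced more strongly by selection. (4) Phylogenetic trees built from the chloroplast gene sequences of six Elaeagnaceae species and one jujube species using maximum likelihood, maximum parsimony and Bayesian methods agreed with the clustering based on codon usage bias, indicating that chloroplast codon usage bias is related to the phylogenetic relationships among species.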
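
The two statistics this abstract relies on, GC3 and RSCU, are simple to compute. The following is a minimal sketch under stated assumptions: it uses the standard genetic code (the one normally applied to plant chloroplast genes is NCBI table 11, which has the same codon assignments), it counts every codon including Met, Trp and stop codons in GC3 (published analyses often exclude these), and all function names are made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def split_codons(seq):
    """Split an in-frame coding sequence into codons (RNA 'U' read as 'T')."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # drop an incomplete trailing codon
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def gc3(seq):
    """Fraction of codons whose third base is G or C."""
    codons = split_codons(seq)
    return sum(codon[2] in "GC" for codon in codons) / len(codons)


def rscu(seq):
    """RSCU = observed codon count / count expected if all synonymous
    codons of that amino acid were used equally."""
    counts = Counter(c for c in split_codons(seq) if c in CODON_TABLE)
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)
    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid not present in the sequence
        for codon in family:
            values[codon] = counts[codon] * len(family) / total
    return values
```

An RSCU above 1 means the codon is used more often than expected under equal usage; a GC3 well below the genome-wide GC content, as reported above, points to a preference for A/T-ending codons.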

15.
Gene synthesis is becoming more important with the growing availability of low-cost commercial services. The coding sequences are often “optimized” with respect to relative synonymous codon usage (RSCU) before synthesis, a step that is generally included in the commercial services. However, the codon optimization processes are different among different providers and are often hidden from the users. Here, the d'Hondt method, which is widely adopted as a method for determining the number of seats for each party in proportional-representation public elections, is applied to RSCU fitting. This allowed me to make a set of electronic spreadsheets for manual design of protein coding sequences for expression in Escherichia coli, with which users can see the process of codon optimization and can manually edit the codons after the automatic optimization. The spreadsheets may also be useful for molecular biology education.
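
The d'Hondt rule itself is easy to state in code. The sketch below shows one plausible way to use it for codon allocation: the synonymous codons of an amino acid play the role of parties, their target usage values play the role of votes, and the number of occurrences of the amino acid in the protein is the number of seats. How the author's spreadsheets apply the rule in detail is not described in the abstract, so the function name, the mapping and the usage values here are assumptions.

```python
def dhondt(weights, seats):
    """Allocate `seats` among the keys of `weights` by the d'Hondt rule:
    each seat goes to the key with the largest weight / (seats held + 1)."""
    allocation = {key: 0 for key in weights}
    for _ in range(seats):
        winner = max(weights, key=lambda k: weights[k] / (allocation[k] + 1))
        allocation[winner] += 1
    return allocation


# Hypothetical target usage for three leucine codons (made-up numbers),
# for a protein that contains five leucines.
leu_targets = {"CTG": 50, "TTA": 30, "CTC": 20}
leu_codons = dhondt(leu_targets, seats=5)
```

With these made-up weights the five leucine positions are split 3:1:1 among CTG, TTA and CTC, which is the closest integer fit the rule can give to a 50:30:20 target.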

16.
The yield of human alpha 2b interferon in Escherichia coli was optimized by replacement of low-usage arginine codons located in the mRNA 5′ end. The differences observed among the various gene variants suggest that codon usage, Shine-Dalgarno-like sequences, and mRNA secondary structure contribute to the performance of E. coli translation machinery.

17.
Summary: The complete DNA sequence of the Micrococcus luteus spectinomycin (spc) operon and its adjacent regions has been determined. The sequence has revealed the presence of genes that are homologous to those of the Escherichia coli ribosomal and related proteins, L14, L24, L5, S8, L6, L18, S5, L30, L15, and secretion protein Y (secY), and the gene for adenylate kinase (adk). The gene arrangement in the spc operon is essentially the same as that of E. coli except for the absence in the M. luteus spc operon of the genes for S14 and X protein that exist in the E. coli spc operon. secY and adk appear to form another operon (adk operon) together with at least one open reading frame. The deduced amino acid sequences for these ribosomal proteins are well conserved between the two species (40–65% identity). Reflecting the high genomic guanine and cytosine (GC) content of M. luteus (74%), the codon usage of the genes is extremely biased toward use of G and C, about 94% of the codon third positions being G or C. Seven codons, AUA, AAA, AGA, UUA, GUA, CUA, and CAA, all of which have A at the codon third positions, are completely absent in the M. luteus genes examined. Out of 11 genes in the M. luteus spc and adk operons, 5 (10) use GUG (UGA) and 6 (1) use AUG (UAA) as an initiation (termination) codon.

18.

Background

The construction of customized nucleic acid sequences allows us to have greater flexibility in gene design for recombinant protein expression. Among the various parameters considered for such DNA sequence design, individual codon usage (ICU) has been implicated as one of the most crucial factors affecting mRNA translational efficiency. However, previous works have also reported the significant influence of codon pair usage, also known as codon context (CC), on the level of protein expression.

Results

In this study, we have developed novel computational procedures for evaluating the relative importance of optimizing ICU and CC for enhancing protein expression. By formulating appropriate mathematical expressions to quantify the ICU and CC fitness of a coding sequence, optimization procedures based on genetic algorithm were employed to maximize its ICU and/or CC fitness. Surprisingly, the in silico validation of the resultant optimized DNA sequences for Escherichia coli, Lactococcus lactis, Pichia pastoris and Saccharomyces cerevisiae suggests that CC is a more relevant design criterion than the commonly considered ICU.

Conclusions

The proposed CC optimization framework can complement and enhance the capabilities of current gene design tools, with potential applications to heterologous protein production and even vaccine development in synthetic biotechnology.
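
The difference between individual codon usage (ICU) and codon context (CC) comes down to what is counted: single codons versus adjacent codon pairs. The sketch below only tabulates the two kinds of counts; the fitness expressions and the genetic algorithm used in the study are not given in the abstract and are not reproduced here. The function names are hypothetical.

```python
from collections import Counter


def split_codons(seq):
    """Split an in-frame coding sequence into codons."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # drop an incomplete trailing codon
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def individual_codon_usage(seq):
    """ICU: how often each codon occurs."""
    return Counter(split_codons(seq))


def codon_pair_usage(seq):
    """CC: how often each ordered pair of adjacent codons occurs."""
    codons = split_codons(seq)
    return Counter(zip(codons, codons[1:]))


example = "ATGGCTGCTTAA"
icu = individual_codon_usage(example)
cc = codon_pair_usage(example)
```

Two sequences can share identical ICU counts and still differ in their pair counts, which is why the two criteria can pull a design in different directions.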

19.

Background  

The expression of heterologous proteins in Escherichia coli is strongly affected by codon bias. This phenomenon occurs when the codon usage of the mRNA coding for the foreign protein differs from that of the bacterium. The ribosome pauses upon encountering a rare codon and may detach from the mRNA, thereby reducing the yield of protein expression. Several bacterial strains have been engineered to overcome this effect. However, the increased rate of translation may lead to protein misfolding and insolubilization. In order to prove this assumption, the solubility of several recombinant proteins from plants was studied in a codon bias-adjusted E. coli strain.

20.
The evolutionary selection forces acting on a protein are commonly inferred using evolutionary codon models by contrasting the rate of synonymous to nonsynonymous substitutions. Most widely used models are based on theoretical assumptions and ignore the empirical observation that distinct amino acids differ in their replacement rates. In this paper, we develop a general method that allows assimilation of empirical amino acid replacement probabilities into a codon-substitution matrix. In this way, the resulting codon model takes into account not only the transition-transversion bias and the nonsynonymous/synonymous ratio, but also the different amino acid replacement probabilities as specified in empirical amino acid matrices. Different empirical amino acid replacement matrices, such as secondary structure-specific matrices or organelle-specific matrices (e.g., mitochondria and chloroplasts), can be incorporated into the model, making it context dependent. Using a diverse set of coding DNA sequences, we show that the novel model better fits biological data as compared with either mechanistic or empirical codon models. Using the suggested model, we further analyze human immunodeficiency virus type 1 protease sequences obtained from drug-treated patients and reveal positive selection in sites that are known to confer drug resistance to the virus.
