Similar Articles
1.

Background

Analyzing the amino acid sequence of an intrinsically disordered protein (IDP) in an evolutionary context can yield novel insights into the functional role of disordered regions and sequence element(s). However, for many IDPs the lack of evolutionary conservation of the primary sequence can hamper the study of functionality, because the conservation of their disorder profile and ensuing function(s) may not be apparent in a traditional analysis of the evolutionary history of the protein.

Results

Here we present DisCons (Disorder Conservation), a novel pipelined tool that combines the quantification of sequence and disorder conservation to classify disordered residue positions. According to this scheme, the most interesting categories for functional purposes are constrained disordered residues and flexible disordered residues. The former show conservation of both the sequence and the property of disorder and are associated mainly with specific binding functionalities (e.g., short linear motifs, SLiMs), whereas the latter correspond to segments where disorder as a feature, rather than the identity of the underlying sequence, is important for function (e.g., entropic chains and linkers). DisCons therefore helps elucidate the function(s) arising from the disordered state, both for individual proteins and for large-scale proteomics datasets.

Conclusions

DisCons is an openly accessible sequence analysis tool that identifies and highlights structurally disordered segments of proteins where the conformational flexibility is conserved across homologs, and therefore potentially functional. The tool is freely available both as a web application and as stand-alone source code hosted at http://pedb.vib.be/discons.
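The two-axis classification of disordered positions can be illustrated with a minimal sketch. It is not DisCons' implementation: the per-position scores are assumed to be precomputed and scaled to [0, 1], the 0.5 cutoffs are arbitrary, and the "ordered" and "non-conserved disordered" labels are hypothetical placeholders. Only the constrained and flexible disordered categories come from the abstract.

```python
from typing import List


def classify_disordered_positions(
    seq_conservation: List[float],
    disorder_conservation: List[float],
    disordered: List[bool],
    seq_threshold: float = 0.5,       # hypothetical cutoff, not a DisCons value
    disorder_threshold: float = 0.5,  # hypothetical cutoff, not a DisCons value
) -> List[str]:
    """Assign each alignment position to a coarse category.

    seq_conservation: per-position sequence conservation score in [0, 1].
    disorder_conservation: fraction of homologs predicted disordered at the position.
    disordered: whether the query residue itself is predicted disordered.
    """
    labels = []
    for s, d, is_dis in zip(seq_conservation, disorder_conservation, disordered):
        if not is_dis:
            labels.append("ordered")
        elif d >= disorder_threshold and s >= seq_threshold:
            # Both the sequence and the disorder are conserved (e.g. SLiMs).
            labels.append("constrained disordered")
        elif d >= disorder_threshold:
            # Disorder is conserved but the sequence is not (e.g. linkers).
            labels.append("flexible disordered")
        else:
            # Hypothetical catch-all for disorder that is not conserved.
            labels.append("non-conserved disordered")
    return labels
```

For example, a position whose residues vary across homologs but which is disordered in most of them would be labeled "flexible disordered" under these assumed cutoffs.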

2.
As more and more protein sequences are uncovered by increasingly inexpensive sequencing techniques, an urgent task is to find their functions. This work presents a highly reliable computational technique for predicting DNA-binding function at the level of protein-DNA complex structures, rather than the low-resolution two-state prediction of DNA binding that most existing techniques perform. The method first predicts the protein-DNA complex structure using the template-based structure prediction technique HHblits, followed by binding affinity prediction based on a knowledge-based energy function (Distance-scaled finite ideal-gas reference state for protein-DNA interactions). A leave-one-out cross-validation of the method on 179 DNA-binding and 3797 non-binding protein domains achieves a Matthews correlation coefficient (MCC) of 0.77 with high precision (94%) and high sensitivity (65%). We further found 51% sensitivity for 82 newly determined structures of DNA-binding proteins and 56% sensitivity for the human proteome. In addition, the method provides a reasonably accurate prediction of DNA-binding residues in proteins based on the predicted DNA-binding complex structures. Its application to the human proteome leads to more than 300 novel DNA-binding proteins; some of these predicted structures were validated by known structures of homologous proteins in apo forms. The method [SPOT-Seq (DNA)] is available as an online server at http://sparks-lab.org.
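The precision, sensitivity, and MCC figures quoted above all derive from a binary confusion matrix. The sketch below uses the standard definitions, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); reporting MCC as 0 when a marginal total is zero is a common convention assumed here, not something the abstract specifies.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Precision, sensitivity (recall), and Matthews correlation coefficient."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: MCC is reported as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "sensitivity": sensitivity, "mcc": mcc}
```

Because MCC uses all four cells, it stays informative on heavily imbalanced sets such as 179 binding versus 3797 non-binding domains, where accuracy alone would be dominated by the negatives.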

3.
Intrinsically disordered proteins and regions (IDPs and IDRs) lack a stable 3D structure under physiological conditions in vitro, are common in eukaryotes, and facilitate interactions with RNA, DNA and proteins. Current methods for prediction of IDPs and IDRs do not provide insights into their functions, except for a handful of methods that predict protein-binding regions. We report DisoRDPbind, a first-of-its-kind computational method for high-throughput prediction of RNA-, DNA- and protein-binding residues located in IDRs from protein sequences. DisoRDPbind is implemented using a runtime-efficient multi-layered design that utilizes information extracted from physicochemical properties of amino acids, sequence complexity, putative secondary structure and disorder, and sequence alignment. Empirical tests demonstrate that it provides accurate predictions that are competitive with other predictors of disorder-mediated protein-binding regions and complementary to the methods that predict RNA- and DNA-binding residues annotated based on crystal structures. Application to the Homo sapiens, Mus musculus, Caenorhabditis elegans and Drosophila melanogaster proteomes reveals that the RNA- and DNA-binding proteins predicted by DisoRDPbind complement and overlap with the corresponding known binding proteins collected from several sources. Also, the number of putative protein-binding regions predicted with DisoRDPbind correlates with the promiscuity of proteins in the corresponding protein–protein interaction networks. Webserver: http://biomine.ece.ualberta.ca/DisoRDPbind/
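A common way to turn an amino acid property scale into per-residue input features is a sliding-window average. The sketch below shows that generic technique. The choice of the Kyte-Doolittle hydropathy scale, the window size, and the truncation at the termini are assumptions for illustration, not DisoRDPbind's actual feature set.

```python
from typing import Dict, List

# Kyte-Doolittle hydropathy scale: one of many possible physicochemical scales.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_average(seq: str, scale: Dict[str, float], half_window: int = 7) -> List[float]:
    """Per-residue feature: mean scale value over a window centred on each residue.

    The window is truncated at the termini; unknown residues contribute 0.
    """
    values = [scale.get(aa, 0.0) for aa in seq.upper()]
    out = []
    for i in range(len(values)):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out
```

Several such profiles, one per property, can be stacked into a per-residue feature vector for a downstream classifier.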

4.
The specificity of protein-protein interactions is encoded in those parts of the sequence that compose the binding interface. Therefore, understanding how changes in protein sequence influence interaction specificity, and possibly the phenotype, requires knowing the location of binding sites in those sequences. However, large-scale detection of protein interfaces remains a challenge. Here, we present a sequence- and interactome-based approach to mine interaction motifs from the recently published Arabidopsis thaliana interactome. The resulting proteome-wide predictions are available via www.ab.wur.nl/sliderbio and set the stage for further investigations of protein-protein binding sites. To assess our method, we first show that using a priori information calculated from protein sequences, such as evolutionary conservation and residue surface accessibility, improves the performance of interface prediction compared with using interactome data alone. Next, we present evidence for the functional importance of the predicted sites, which are under stronger selective pressure than the rest of the protein sequence. We also observe a tendency for compensatory mutations in the binding sites of interacting proteins. Subsequently, we interrogate the interactome data to formulate testable hypotheses for the molecular mechanisms underlying the effects of protein sequence mutations. Examples include proteins relevant to various developmental processes. Finally, by analysing pairs of paralogs, we observe a correlation between functional divergence and sequence divergence in interaction sites. This analysis suggests that large-scale prediction of binding sites can cast light on the evolutionary processes that shape protein-protein interaction networks.

5.
We introduce a new representation and feature extraction method for biological sequences. Named bio-vectors (BioVec) for biological sequences in general, with protein-vectors (ProtVec) for proteins (amino-acid sequences) and gene-vectors (GeneVec) for gene sequences, this representation can be widely used in applications of deep learning in proteomics and genomics. In the present paper, we focus on protein-vectors, which can be utilized in a wide array of bioinformatics investigations such as family classification, protein visualization, structure prediction, disordered protein identification, and protein-protein interaction prediction. In this method, we adopt artificial neural network approaches and represent a protein sequence with a single dense n-dimensional vector. To evaluate this method, we apply it to the classification of 324,018 protein sequences obtained from Swiss-Prot belonging to 7,027 protein families, where an average family classification accuracy of 93%±0.06% is obtained, outperforming existing family classification methods. In addition, we use the ProtVec representation to distinguish disordered proteins from structured proteins. Two databases of disordered sequences are used: the DisProt database and a database featuring the disordered regions of nucleoporins rich in phenylalanine-glycine repeats (FG-Nups). Using support vector machine classifiers, FG-Nup sequences are distinguished from structured protein sequences found in the Protein Data Bank (PDB) with 99.8% accuracy, and unstructured DisProt sequences are differentiated from structured DisProt sequences with 100.0% accuracy. These results indicate that by providing only sequence data to this model, accurate information about protein structure can be determined. Importantly, this model needs to be trained only once and can then be applied to extract a comprehensive set of information regarding proteins of interest.
Moreover, this representation can be considered as pre-training for various applications of deep learning in bioinformatics. The related data are available at the Life Language Processing website (http://llp.berkeley.edu) and Harvard Dataverse (http://dx.doi.org/10.7910/DVN/JMFHTN).
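To make the "single dense n-dimensional vector" concrete, the sketch below shows one way a protein vector can be assembled from 3-gram embeddings: the sequence is split into three lists of non-overlapping 3-grams (one per starting offset), and the embeddings of all 3-grams are summed. Treat this splitting-and-summing scheme as an assumption about how ProtVec is usually described. The embedding table is a hypothetical toy dictionary, whereas in the original work the vectors are learned by a neural network.

```python
from typing import Dict, List, Sequence


def split_into_shifted_kgrams(seq: str, k: int = 3) -> List[List[str]]:
    """Break a sequence into k lists of non-overlapping k-grams, one per start offset.

    Trailing fragments shorter than k are dropped.
    """
    return [[seq[i:i + k] for i in range(offset, len(seq) - k + 1, k)]
            for offset in range(k)]


def protein_vector(seq: str, embeddings: Dict[str, Sequence[float]], dim: int) -> List[float]:
    """Sum the embedding of every 3-gram in every shifted frame.

    3-grams missing from `embeddings` (a hypothetical lookup table) are skipped.
    """
    total = [0.0] * dim
    for frame in split_into_shifted_kgrams(seq, 3):
        for gram in frame:
            vec = embeddings.get(gram)
            if vec is None:
                continue
            for j in range(dim):
                total[j] += vec[j]
    return total
```

The resulting fixed-length vector can then be fed to any standard classifier, such as the support vector machines mentioned above.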

6.

Background

Vitamins are typical ligands that play critical roles in various metabolic processes. The accurate identification of the vitamin-binding residues solely based on a protein sequence is of significant importance for the functional annotation of proteins, especially in the post-genomic era, when large volumes of protein sequences are accumulating quickly without being functionally annotated.

Results

In this paper, a new predictor called TargetVita is designed and implemented for predicting protein-vitamin binding residues using protein sequences. In TargetVita, features derived from the position-specific scoring matrix (PSSM), predicted protein secondary structure, and vitamin binding propensity are combined to form the original feature space; then, several feature subspaces are selected by performing different feature selection methods. Finally, based on the selected feature subspaces, heterogeneous SVMs are trained and then ensembled for performing prediction.
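The "selected feature subspaces, then ensembled classifiers" idea can be sketched as follows. Each ensemble member sees only its own subset of feature indices and returns a binding score, and the member scores are averaged. The scoring functions stand in for the trained SVMs, and simple averaging with a 0.5 threshold is an assumed combination rule; the abstract does not state how TargetVita combines its SVMs.

```python
from typing import Callable, List, Sequence, Tuple

# Each member: (indices of its feature subspace, scoring function returning a score in [0, 1]).
Member = Tuple[Sequence[int], Callable[[List[float]], float]]


def ensemble_score(members: List[Member], features: List[float]) -> float:
    """Average the scores of members that each see only their own feature subspace."""
    scores = []
    for subspace, score_fn in members:
        projected = [features[i] for i in subspace]
        scores.append(score_fn(projected))
    return sum(scores) / len(scores)


def predict_binding(members: List[Member], features: List[float], threshold: float = 0.5) -> bool:
    """Call a residue vitamin-binding if the averaged score reaches the (assumed) threshold."""
    return ensemble_score(members, features) >= threshold
```

Training members on different subspaces is what makes them heterogeneous: their errors are less correlated, so averaging tends to help more than it would with copies of one model.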

Conclusions

The experimental results obtained with four separate vitamin-binding benchmark datasets demonstrate that the proposed TargetVita is superior to the state-of-the-art vitamin-specific predictor, with an average improvement of 10% in the Matthews correlation coefficient (MCC) in independent validation tests. The TargetVita web server and the datasets used are freely available for academic use at http://csbio.njust.edu.cn/bioinf/TargetVita or http://www.csbio.sjtu.edu.cn/bioinf/TargetVita.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-297) contains supplementary material, which is available to authorized users.

7.
8.
9.
SPOR domains are ∼70 amino acids long and occur in >1,500 proteins identified by sequencing of bacterial genomes. The SPOR domains in the FtsN cell division proteins from Escherichia coli and Caulobacter crescentus have been shown to bind peptidoglycan. Besides FtsN, E. coli has three additional SPOR domain proteins: DamX, DedD, and RlpA. We show here that all three of these proteins localize to the septal ring in E. coli. The loss of DamX or DedD, either alone or in combination with mutations in genes encoding other division proteins, resulted in a variety of division phenotypes, demonstrating that DamX and DedD participate in cytokinesis. In contrast, RlpA mutants divided normally. Follow-up studies revealed that the SPOR domains themselves localize to the septal ring in vivo and bind peptidoglycan in vitro. Even SPOR domains from heterologous organisms, including Aquifex aeolicus, localized to septal rings when produced in E. coli and bound to purified E. coli peptidoglycan sacculi. We speculate that SPOR domains localize to the division site by binding preferentially to septal peptidoglycan. We further suggest that SPOR domain proteins are a common feature of the division apparatus in bacteria. DamX was characterized further and found to interact with multiple division proteins in a bacterial two-hybrid assay. One interaction partner is FtsQ, and several synthetic phenotypes suggest that DamX is a negative regulator of FtsQ function.

Cell division in Escherichia coli is mediated by a collection of approximately 20 proteins, all of which localize to the midcell, where they form a structure called the septal ring, or divisome. About half of these proteins are essential for cell division. The corresponding temperature-sensitive mutants or depletion strains become filamentous and die under nonpermissive conditions. The remaining proteins are not essential under most laboratory conditions.
In some cases null mutations reveal modest division defects, but in other cases division defects become apparent only under certain growth conditions or in combination with mutations in genes for other division proteins. For reviews of this topic, see references 18, 22, 29, and 67.

One of the essential cell division proteins is a bitopic membrane protein named FtsN (see Fig. 1A) (13, 14). How FtsN facilitates cell division is not clear. Because overproduction of FtsN rescues a variety of mutants with lesions in genes for other cell division proteins [ftsA(Ts), ftsI(Ts), ftsQ(Ts), ftsEX null, ftsK null, and ftsP (sufI) null strains], it seems likely that one function of FtsN is to improve the assembly and/or stability of the septal ring (13, 20, 24, 30, 58, 63). Very recent evidence indicates that FtsN plays an important role in triggering constriction, probably by allosteric activation of some other component of the septal ring (26).

FIG. 1. SPOR domain proteins included in this study. (A) Membrane topology and number of amino acids in each domain as retrieved from UniProt release 15.7 (http://www.uniprot.org) or the GTOP update of 15 December 2008 (http://spock.genes.nig.ac.jp/∼genome/gtop.html). N, amino terminus; CM, cytoplasmic membrane; OM, outer membrane. RlpA and VPA1294 have a covalently attached lipid at their amino termini. (B) Multiple-sequence alignment of SPOR domains shown in the present study to localize to the septal ring of E. coli. Sequences were aligned manually to the position-specific scoring matrix (PSSM) from http://www.ncbi.nlm.nih.gov/Class/Structure/pssm/pssm_viewer.cgi with the SPOR domain (Pfam accession no. 05036) as the PSSM identifier (PSSM ID). Residues with identity to those in the consensus sequence from the PSSM alignment are shaded gray.
Numbers to the left refer to the first positions of the SPOR domains in the indicated proteins.

A notable feature of FtsN is that it contains at its C terminus a peptidoglycan (PG) binding domain known as a SPOR domain (Pfam accession no. 05036) (23, 65, 72). SPOR domains are both common and widespread in bacteria. At the time of this writing (August 2009), over 1,500 proteins that contain a SPOR domain are listed in the Pfam database (23). These proteins come from over 500 bacterial species. The domain is named after the founding member of the protein family, a Bacillus subtilis protein named CwlC that is produced relatively late in the process of sporulation (41). CwlC, which comprises an N-terminal amidase domain and a C-terminal SPOR domain, facilitates release of the mature spore by degrading PG in the mother cell (48, 61).

Our interest in SPOR domain proteins was piqued during a study of Vibrio parahaemolyticus (in collaboration with Linda McCarter) when we observed that a gene of unknown function, designated vpa1294, is highly induced in V. parahaemolyticus swarmer cells. The VPA1294 protein was annotated as a “putative DamX-related protein” (44; http://genome.gen-info.osaka-u.ac.jp/bacteria/vpara/). To learn about DamX, we turned to the EcoGene website (http://ecogene.org/) (57), which noted that (i) DamX from E. coli has an essentially unknown function, (ii) overproduction of DamX inhibits cell division (43), and (iii) DamX is one of four E. coli proteins that contain a SPOR domain, the others being the cell division protein FtsN and two proteins of unknown function, DedD and RlpA. Based on this information, we decided to investigate whether DamX, DedD, and RlpA are involved in cell division in E. coli. While this work was in progress, the Thanbichler laboratory demonstrated that Caulobacter crescentus has an FtsN-like protein that is needed for cell division (49) and the de Boer laboratory published a report on DamX, DedD, and RlpA from E. coli (26).
We also learned that J. Maddock's laboratory has been investigating DamX, DedD, and RlpA from E. coli (personal communication). Importantly, the major findings from all four laboratories are in general agreement: SPOR domain proteins are widespread in bacteria, many of these proteins are involved in cell division, and SPOR domains are sufficient for septal localization, probably because SPOR domains bind to septal PG.

10.
Protein-nucleotide interactions are ubiquitous in a wide variety of biological processes. Accurately identifying interaction residues solely from protein sequences is useful for both protein function annotation and drug design, especially in the post-genomic era, as large volumes of protein data have not been functionally annotated. Protein-nucleotide binding residue prediction is a typical imbalanced learning problem, in which binding residues are far fewer than non-binding residues. Alleviating class imbalance has been shown to be a promising way to improve the performance of machine-learning-based predictors on imbalanced problems. However, little attention has been paid to the negative impact of class imbalance on protein-nucleotide binding residue prediction. In this study, we propose a new supervised over-sampling algorithm that synthesizes additional minority class samples to address class imbalance. Experimental results on protein-nucleotide interaction datasets demonstrate that the proposed supervised over-sampling algorithm relieves the severity of class imbalance and helps to improve prediction performance. Based on this over-sampling algorithm, a predictor called TargetSOS is implemented for protein-nucleotide binding residue prediction. Cross-validation tests and independent validation tests demonstrate the effectiveness of TargetSOS. The web server and datasets used in this study are freely available at http://www.csbio.sjtu.edu.cn/bioinf/TargetSOS/.
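The idea of "synthesizing additional minority class samples" is easiest to see in the generic SMOTE-style interpolation scheme sketched below. Each synthetic sample lies on the segment between a minority (binding) sample and one of its nearest minority neighbours. This is a swapped-in, well-known technique shown for illustration. It is not the supervised over-sampling algorithm proposed for TargetSOS, whose details are not given here, and the parameter names are hypothetical.

```python
import random
from typing import List


def interpolate_minority(
    minority: List[List[float]], n_new: int, k: int = 3, seed: int = 0
) -> List[List[float]]:
    """SMOTE-style over-sampling (not the TargetSOS algorithm).

    Each synthetic point is base + gap * (neighbour - base), where neighbour is
    one of the k nearest other minority samples and gap is drawn from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)  # seeded for reproducibility

    def sq_dist(a: List[float], b: List[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [minority[j] for j in range(len(minority)) if j != i]
        neighbours = sorted(others, key=lambda s: sq_dist(base, s))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic
```

The synthetic samples are added only to the training set. Evaluation data should keep the natural class ratio so that reported metrics such as MCC reflect the real imbalance.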

11.
Linear motifs mediate a wide variety of cellular functions, which makes their characterization in protein sequences crucial to understanding cellular systems. However, the short length and degenerate nature of linear motifs make their discovery a difficult problem. Here, we introduce MotifHound, an algorithm particularly suited for the discovery of small and degenerate linear motifs. MotifHound performs an exact and exhaustive enumeration of all motifs present in proteins of interest, including all of their degenerate forms, and scores the overrepresentation of each motif based on its occurrence in proteins of interest relative to a background (e.g., proteome) using the hypergeometric distribution. To assess MotifHound, we benchmarked it together with state-of-the-art algorithms. The benchmark consists of 11,880 sets of proteins from S. cerevisiae; in each set, we artificially spiked in one motif varying in three key parameters: (i) number of occurrences, (ii) length, and (iii) number of degenerate or “wildcard” positions. The benchmark enabled the evaluation of the impact of these three properties on the performance of the different algorithms. The results showed that MotifHound and SLiMFinder were the most accurate in detecting degenerate linear motifs. Interestingly, MotifHound was 15 to 20 times faster at comparable accuracy and performed best in the discovery of highly degenerate motifs. We complemented the benchmark with an analysis of proteins experimentally shown to bind the FUS1 SH3 domain from S. cerevisiae. Using the full-length protein partners as sole information, MotifHound recapitulated most experimentally determined motifs binding to the FUS1 SH3 domain. Moreover, these motifs exhibited properties typical of SH3-binding peptides, e.g., high intrinsic disorder and evolutionary conservation, even though none of these properties were used as prior information.
MotifHound is available (http://michnick.bcm.umontreal.ca or http://tinyurl.com/motifhound) together with the benchmark, which can be used as a reference to assess future developments in motif discovery.
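Hypergeometric overrepresentation scoring asks how likely it is to see at least k motif-containing proteins among n proteins of interest drawn from a background of N proteins, K of which contain the motif. The sketch below computes that upper-tail probability exactly with integer arithmetic. Counting at the protein level, rather than counting motif occurrences, is an assumption; MotifHound's exact counting scheme is not described in the abstract.

```python
from math import comb


def hypergeometric_pvalue(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, successes K, draws n).

    N: proteins in the background (e.g. the proteome)
    K: background proteins containing the motif
    n: proteins of interest (assumed to be a subset of the background)
    k: proteins of interest containing the motif
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("inconsistent counts")
    k = max(k, 0)
    upper = min(n, K)
    # math.comb returns 0 when the lower argument exceeds the upper one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)
```

When every enumerated motif, including each degenerate form, gets its own p-value, many tests are performed, so the raw values would normally need multiple-testing correction before being interpreted.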

12.
The aim of de novo protein design is to find the amino acid sequences that will fold into a desired 3-dimensional structure with improvements in specific properties, such as binding affinity, agonist or antagonist behavior, or stability, relative to the native sequence. Protein design lies at the center of current advances in drug design and discovery. Not only does protein design provide predictions for potentially useful drug targets, but it also enhances our understanding of the protein folding process and of protein-protein interactions. Experimental methods such as directed evolution have shown success in protein design. However, such methods are restricted by the limited sequence space that can be searched tractably. In contrast, computational design strategies allow for the screening of a much larger set of sequences covering a wide variety of properties and functionality. We have developed a range of computational de novo protein design methods capable of tackling several important areas of protein design. These include the design of monomeric proteins for increased stability and of complexes for increased binding affinity.

To disseminate these methods for broader use, we present Protein WISDOM (http://www.proteinwisdom.org), a tool that provides automated methods for a variety of protein design problems. Structural templates are submitted to initialize the design process. The first stage of design is an optimization-based sequence selection stage that aims to improve stability by minimizing potential energy in sequence space. Selected sequences are then run through a fold specificity stage and a binding affinity stage. A rank-ordered list of the sequences for each step of the process, along with relevant designed structures, provides the user with a comprehensive quantitative assessment of the design. Here we provide the details of each design method, as well as several notable experimental successes attained through the use of the methods.
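The sequence selection stage, which minimizes potential energy in sequence space, can be illustrated on a toy problem. The sketch enumerates every sequence for a handful of positions and picks the one with the lowest energy E = Σᵢ Eᵢ(aᵢ) + Σᵢ<ⱼ Eᵢⱼ(aᵢ, aⱼ). The single-site and pairwise energies are hypothetical inputs, and brute-force enumeration is used only because it is transparent. Protein WISDOM's actual energy functions and optimizer are not described here.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple


def best_sequence(
    allowed: List[List[str]],
    single: List[Dict[str, float]],
    pair: Dict[Tuple[int, int, str, str], float],
) -> Tuple[Optional[Tuple[str, ...]], float]:
    """Exhaustively minimise E(seq) = sum_i single_i(a_i) + sum_{i<j} pair(i, j, a_i, a_j).

    allowed[i]: residues permitted at position i (hypothetical design space).
    pair: energies for specific residue pairs; missing pairs contribute 0.
    Only feasible for tiny problems; the search space grows exponentially.
    """
    best, best_e = None, float("inf")
    for seq in product(*allowed):
        e = sum(single[i][aa] for i, aa in enumerate(seq))
        for i in range(len(seq)):
            for j in range(i + 1, len(seq)):
                e += pair.get((i, j, seq[i], seq[j]), 0.0)
        if e < best_e:
            best, best_e = seq, e
    return best, best_e
```

In the toy case of two positions where leucine is favourable at each site but two adjacent leucines clash, the minimum places leucine at only one position.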

13.
Large quantities of reliable protein interaction data are available for model organisms in public repositories (e.g., MINT, DIP, HPRD, INTERACT). Most data correspond to experiments with proteins of Saccharomyces cerevisiae, Drosophila melanogaster, Homo sapiens, Caenorhabditis elegans, Escherichia coli and Mus musculus. For other important organisms, data availability is poor or non-existent. Here we present NASCENT, a completely automatic web-based tool, also available as a downloadable Java program, capable of modeling and generating protein interaction networks even for non-model organisms. The tool models protein interaction networks through gene-name mapping and outputs the resulting network both in graphical form and in computer-readable graph formats that can be used directly by popular network modeling software.

Availability

http://nascent.pitgroup.org.
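Network modeling through gene-name mapping can be sketched as follows. Interactions from a model organism are kept when both gene names also occur in the target organism's gene list, and the result is written as tab-separated Simple Interaction Format (SIF) lines, a format that network software such as Cytoscape can read. The case-insensitive matching, the treatment of edges as undirected, and the choice of SIF output are assumptions; NASCENT's actual mapping rules are not described in the abstract.

```python
from typing import Iterable, List, Set, Tuple


def map_interactions(
    model_edges: Iterable[Tuple[str, str]],
    target_genes: Iterable[str],
) -> List[Tuple[str, str]]:
    """Transfer model-organism interactions to a target organism by gene name.

    An edge is kept only if both names (compared case-insensitively) occur in
    the target gene list. Edges are treated as undirected; duplicates are dropped.
    """
    target: Set[str] = {g.upper() for g in target_genes}
    seen: Set[Tuple[str, str]] = set()
    mapped: List[Tuple[str, str]] = []
    for a, b in model_edges:
        a, b = a.upper(), b.upper()
        if a not in target or b not in target:
            continue
        key = (a, b) if a <= b else (b, a)  # canonical order for undirected edges
        if key not in seen:
            seen.add(key)
            mapped.append(key)
    return mapped


def to_sif(edges: List[Tuple[str, str]], relation: str = "pp") -> str:
    """Serialise edges as tab-separated SIF lines: nodeA, relation, nodeB."""
    return "\n".join(f"{a}\t{relation}\t{b}" for a, b in edges)
```

Pure name matching is cheap but fragile, since paralogs and renamed genes can map incorrectly. Orthology-based mapping is the usual, more careful alternative.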

14.
15.
Escherichia coli (E. coli) is the most widely used expression system for the production of recombinant proteins for structural and functional studies. However, purifying proteins is sometimes challenging, since many proteins are expressed in an insoluble form. When working with difficult or multiple targets, it is therefore recommended to use high-throughput (HTP) protein expression screening on a small scale (1-4 ml cultures) to quickly identify conditions for soluble expression. To cope with the various structural genomics programs of the lab, a quantitative (0.1-100 mg of recombinant protein per L of culture) HTP protein expression screening protocol was implemented and validated on thousands of proteins. The protocols were automated with the use of a liquid handling robot but can also be performed manually without specialized equipment.

Disulfide-rich venom proteins are gaining increasing recognition for their potential as therapeutic drug leads. They can be highly potent and selective, but their complex disulfide bond networks make them challenging to produce. As a member of the FP7 European Venomics project (www.venomics.eu), our challenge is to develop successful production strategies with the aim of producing thousands of novel venom proteins for functional characterization. Aided by the redox properties of the disulfide bond isomerase DsbC, we adapted our HTP production pipeline for the expression of oxidized, functional venom peptides in the E. coli cytoplasm. The protocols are also applicable to the production of diverse disulfide-rich proteins. Here we demonstrate our pipeline applied to the production of animal venom proteins. With the protocols described herein, it is likely that soluble disulfide-rich proteins will be obtained in as little as a week.
Even from a small scale, there is the potential to use the purified proteins for validating the oxidation state by mass spectrometry, for characterization in pilot studies, or for sensitive micro-assays.

16.
17.
Polycomb group response elements (PREs) play an essential role in gene regulation by the Polycomb group (PcG) repressor proteins in Drosophila. PREs are required for the recruitment and maintenance of repression by the PcG proteins. PREs are made up of binding sites for multiple DNA-binding proteins, but it is still unclear what combination(s) of binding sites is required for PRE activity. Here we compare the binding sites and activities of two closely linked yet separable PREs of the Drosophila engrailed (en) gene, PRE1 and PRE2. Both PRE1 and PRE2 contain binding sites for multiple PRE–DNA-binding proteins, but the number, arrangement, and spacing of the sites differ between the two PREs. These differences have functional consequences. Both PRE1 and PRE2 mediate pairing-sensitive silencing of mini-white, a functional assay for PcG repression; however, PRE1 requires two binding sites for Pleiohomeotic (Pho), whereas PRE2 requires only one Pho-binding site for this activity. Furthermore, for full pairing-sensitive silencing activity, PRE1 requires an AT-rich region not found in PRE2. The two PREs also behave differently when placed in an embryonic and larval reporter construct inserted at an identical location in the genome. Our data illustrate the diversity of architecture and function of PREs.

18.
The human pathogen Campylobacter jejuni is naturally competent for transformation with its own DNA. Genes required for efficient transformation in C. jejuni include those similar to components of type II secretion systems found in many Gram-negative bacteria (R. S. Wiesner, D. R. Hendrixson, and V. J. DiRita, J Bacteriol 185:5408–5418, 2003, http://dx.doi.org/10.1128/JB.185.18.5408-5418.2003). Two of these, ctsE and ctsP, encode proteins annotated as putative nucleotide binding nucleoside triphosphatases (NTPases) or nucleoside triphosphate (NTP) binding proteins. Here we demonstrate that the nucleotide binding motifs of both proteins are essential for their function in transformation of C. jejuni. Localization experiments demonstrated that CtsE is a soluble protein while CtsP is membrane associated in C. jejuni. A bacterial two-hybrid screen identified an interaction between CtsP and CtsX, an integral membrane protein also required for transformation. Topological analysis of CtsX by the use of LacZ and PhoA fusions demonstrated it to be a bitopic, integral membrane protein with a cytoplasmic amino terminus and a periplasmic carboxyl terminus. Notwithstanding its interaction with membrane-localized CtsX, CtsP inherently associates with the membrane, requiring neither CtsX nor several other Cts proteins for this association.

19.
To eliminate unavoidable contamination of purified recombinant proteins by DnaK, we present a unique approach employing a BL21(DE3) ΔdnaK strain of Escherichia coli. Selected representative purified proteins remained soluble, correctly assembled, and active. This finding establishes that DnaK is dispensable for protein production in BL21(DE3), a strain that lacks Lon, the protease key to eliminating unfolded proteins.

Obtaining substantial amounts of pure protein is essential in innumerable biological studies and indispensable to the biochemical characterization of proteins. The ease of growth, well-characterized genetics, and the large number of tools for gene expression have long made Escherichia coli the organism of choice for protein overproduction. The BL21(DE3) strain is widely used for recombinant protein production because of its engineered capacity to produce T7 polymerase and its deficiency in Lon and OmpT proteases.

DnaK is an abundant protein (about 1% of the total protein of E. coli) (17) that interacts with a wide range of newly synthesized polypeptides (28) and assists their proper folding and assembly into oligomers by preventing protein aggregation. DnaK, together with ClpB ATPase, is also required to disaggregate preformed protein aggregates (12, 20), and it participates in the degradation of damaged proteins by Lon and ClpP (26, 27).

The inactivation of dnaK has been shown previously to increase the insoluble fractions of certain aggregation-prone recombinant proteins (7), and DnaK alleviates the aggregation of certain heterologous proteins when coproduced with the protein of interest (8).
However, fruitful coproduction of recombinant proteins with chaperones has been challenged by recent findings demonstrating that chaperones increase the solubility but not necessarily the quality of proteins (13, 15).

The DnaK binding site, a five-residue hydrophobic core flanked by two basic residue-enriched regions, occurs on average every 36 residues in protein sequences (25). One consequence is unwanted DnaK contamination of recombinant proteins during purification in E. coli, even after several chromatographic steps (1, 3, 11, 14, 16, 21, 23).

One challenge in protein purification is to obtain the highest level of purity in the fewest steps. Biologically active impurities can jeopardize research or therapeutic applications even if present in trace amounts. One approach developed to circumvent DnaK contamination is extensive washing of columns with ATP, since DnaK in its ATP-bound state has low affinity for protein (3). However, this strategy lengthens the purification procedure, is expensive, and is of inconsistent effectiveness (1, 14).

In an attempt to eliminate DnaK contamination, we have investigated whether recombinant proteins could be produced in the absence of DnaK. Toward that end, we constructed a ΔdnaK derivative of the extensively employed E. coli B host strain BL21(DE3). The consequences of the absence of DnaK for the production, solubilities, correct assembly, and activities of several recombinant proteins in BL21(DE3) have been studied. Obtaining a BL21(DE3) ΔdnaK strain has allowed us to elucidate to what extent such a major E. coli chaperone is indispensable to protein overproduction in the particular genetic context of an E. coli strain that lacks Lon, an ATP-dependent protease responsible for degrading unfolded proteins (10).

dnaK in BL21(DE3) was inactivated by the introduction of a null allele, ΔdnaK::Kan, from the E. coli PopC4617 strain by P1 transduction (see Table S1 in the supplemental material).
Transductants were selected at 30°C in Luria-Bertani medium supplemented with kanamycin. The absence of dnaK was verified by colony PCR using the specific primers dnaK-Nter (5′-GGTAAAATAATTGGTATCGACCTGG-3′) and dnaK-Cter (5′-GTCTTTGACTTCTTCAAATTCAGCG-3′) (see Fig. S1 in the supplemental material). Immunoblotting using an anti-DnaK antibody showed that the obtained transductant (EN2) did not produce DnaK (see Fig. S1 in the supplemental material). The EN2 strain has been deposited at the Collection Nationale de Cultures de Microorganismes at the Institut Pasteur (with identification number CNCM I-3863).

E. coli K-12 dnaK mutants usually have a narrow range of permissive temperatures for growth (around 30°C) and exhibit multiple cellular defects, such as impaired cell division and the inhibition of DNA and RNA synthesis (4, 6, 18, 22). Inactivating dnaK in the genetic background of BL21(DE3), an E. coli B strain which is already deficient in OmpT and Lon proteases, did not lead to a dramatic difference in the exponential growth rate at 30°C, but at stationary phase, EN2 cells exhibited a slightly reduced ability to form colonies on plates (data not shown). As expected for dnaK mutants, EN2 cells demonstrated impaired growth at 42°C (data not shown). Inactivating dnaK in BL21(DE3) did not induce major morphological defects, and EN2 cells were never found to form long filaments, as dnaK mutants with other genetic backgrounds have previously been reported to do (5) (data not shown). Therefore, the EN2 strain can easily be cultivated at 30°C.

We next investigated whether inactivating dnaK in BL21(DE3) would impair the production and solubilities of different recombinant proteins (whose features are summarized in Table S2 in the supplemental material).
These proteins come from organisms of different kingdoms, and their molecular masses range from 19 to 51 kDa; they are therefore potential DnaK substrates, since the polypeptides that interact with DnaK range from 14 to 90 kDa (28). Many of them exist as oligomers and may require DnaK for proper assembly. The proteins were also chosen for their different levels of production and solubility in E. coli. Four of them (CpxP, ClpP1, PA28α, and proteasome-activating nucleotidase [PAN]) are fully soluble, and two (ClpP2 and green fluorescent protein [GFP]) are aggregation prone and may require DnaK to prevent their aggregation. Importantly, all of these proteins were contaminated by DnaK when purified from E. coli (see Fig. 3).
FIG. 3. Purification of recombinant proteins in the absence of DnaK. Aliquots of 10 μg of CpxP, ClpP1, ClpP2, GFP, and PAN purified from BL21(DE3) cells (lanes 1, 3, 5, 7, and 9) or EN2 cells (lanes 2, 4, 6, 8, and 10) were loaded onto an SDS-12% PAGE gel. (A) Proteins were revealed by Coomassie blue staining. (B and C) DnaK (B) and GroEL (C) were detected by Western blotting. Sizes of molecular mass markers (lanes MW) are given in kilodaltons and indicated to the left of the gel. The asterisk indicates the position of DnaK on the gel. The two major bands in the purified PAN sample correspond to full-length 50-kDa His-PAN and the 40-kDa PAN fragment resulting from internal initiation of translation, which copurify as oligomeric complexes (30). The gels shown are representative of results from at least three independent experiments.
The production of recombinant proteins in exponentially growing BL21(DE3) and EN2 cells in Luria-Bertani medium at 30°C was induced with 1 mM isopropyl-β-D-thiogalactopyranoside (IPTG) for 2 h.
Equal biomasses of BL21(DE3) and EN2 cells were sonicated in 1 ml of lysis buffer (50 mM Tris, pH 7.5, 100 mM KCl, 1 mM dithiothreitol). Soluble proteins were separated from aggregated proteins and cellular debris by centrifugation for 30 min at 14,000 × g and 4°C. Pellets containing protein aggregates were resuspended in 1 ml of Tris, pH 7.5, containing 1% sodium dodecyl sulfate (SDS). Total extracts and soluble and insoluble fractions were analyzed by SDS-polyacrylamide gel electrophoresis (PAGE) (Fig. 1).
FIG. 1. Levels of production and solubility of recombinant proteins in the absence of DnaK. Aliquots of 10 μg of total extracts (T-un and T) and soluble (S) and insoluble (P) fractions from uninduced (T-un) and IPTG-induced (T, S, and P) BL21(DE3) and EN2 cells overexpressing CpxP (A), ClpP1 (B), PA28α (C), ClpP2 (D), GFP (E), or PAN (F) were analyzed by SDS-12% PAGE on gels stained with Coomassie blue. Sizes of molecular mass markers (lanes MW) are given in kilodaltons and indicated to the left of each gel. Arrowheads indicate the positions of recombinant proteins. The gels shown are representative of results from at least three independent experiments.
All tested proteins were produced at similar levels in EN2 and BL21(DE3) cells, as shown by the protein amounts in total extracts (Fig. 1, lanes T). Moreover, dnaK inactivation did not affect the solubility of the recombinant proteins, whether they were produced in high amounts, such as CpxP (Fig. 1A), ClpP1 (Fig. 1B), and PA28α (Fig. 1C), or were prone to aggregation, such as ClpP2 (Fig. 1D) and GFP (Fig. 1E). These findings were surprising, since DnaK prevents protein aggregation during synthesis and cooperates with DnaJ, GrpE, and ClpB to disaggregate protein aggregates.
It therefore seems that, even for aggregation-prone recombinant proteins, solubility does not necessarily depend on endogenous DnaK. This finding may reflect the different folding requirements of specific proteins. Another explanation may be the presence of other chaperones with overlapping functions. Indeed, the absence of DnaK leads to increased production of heat shock proteins such as GroEL/GroES (29). Consistent with these data, EN2 cells produced more GroEL than BL21(DE3) cells (data not shown), and this excess may compensate for the absence of DnaK in preventing protein aggregation, as shown previously for endogenous E. coli proteins and other recombinant proteins (8, 28). An abundance of chaperones playing nonspecialized roles in recombinant protein folding in E. coli may allow cells to tolerate the loss of DnaK without impairing their capacity as protein production factories.
Since the examined proteins could fold and assemble independently of DnaK, we next tested whether a protein known to interact with DnaK could be produced in the absence of this chaperone. Nemo, the regulatory component of the IκB kinase complex in the eukaryotic NF-κB signaling pathway, was previously shown to bind DnaK tightly and to be contaminated by it when produced in E. coli (1). When recombinant His-tagged Nemo was produced in BL21(DE3) under our conditions, it was barely detectable on electrophoresis gels (Fig. 2A and B). However, immunodetection with an anti-His6 antibody (Roche) at a 1:2,000 dilution showed that the absence of DnaK increased Nemo production (Fig. 2D). When produced in higher amounts, Nemo was found mostly in the soluble fraction, indicating that it could be produced as a soluble species in the absence of DnaK (Fig. 2D, lane 5). The increased production of Nemo in the absence of DnaK could be explained by a role of this chaperone in Nemo degradation.
Producing Nemo in a ClpP-deficient BL21(DE3) strain did not increase its cellular amount (data not shown), indicating that if Nemo is degraded in a DnaK-dependent manner in BL21(DE3) (which already lacks the Lon protease), either ClpP is not responsible for this proteolysis or the absence of ClpP is compensated for by another protease.
FIG. 2. Levels of production and solubility of recombinant Nemo in the absence of DnaK. Aliquots of 10 μg (A and C) or 20 μg (B and D) of total extracts (T) and soluble (S) and insoluble (P) fractions from uninduced and IPTG-induced BL21(DE3) and EN2 cells overexpressing Nemo were loaded onto an SDS-10% PAGE gel. Proteins were detected by Coomassie blue staining (A and C), and His-tagged Nemo was detected by Western blotting (B and D). Sizes of molecular mass markers (lanes MW) are given in kilodaltons and indicated to the left of each gel. The gels shown are representative of results from at least three independent experiments.
We next tested whether recombinant proteins produced in the absence of DnaK would remain soluble and active during purification. Cells from 200 ml of culture overproducing CpxP, ClpP1, ClpP2, or GFP, or from 500 ml of PAN-overproducing culture, were sonicated in 2 ml of lysis buffer (50 mM NaH2PO4, pH 8.0, 300 mM NaCl, 10 mM imidazole). The soluble fraction obtained after centrifugation for 30 min at 38,000 × g and 4°C was loaded onto 400 μl of nickel-nitrilotriacetic acid resin, and His-tagged proteins were purified according to the resin manufacturer's recommendations (Qiagen). After elution, the His-tagged proteins were dialyzed against 50 mM Tris, pH 7.5, concentrated, and analyzed by electrophoresis.
This procedure yielded recombinant proteins with the levels of homogeneity shown in Fig. 3A. Samples of 10 μg of purified protein were used to immunodetect DnaK contamination (with an anti-DnaK antibody from Stressgen at a 1:2,000 dilution).
We found that DnaK from BL21(DE3) cells contaminated all preparations of purified recombinant proteins, albeit to different extents (Fig. 3B, lanes 1, 3, 5, 7, and 9). As expected, dnaK inactivation prevented this contamination (Fig. 3B, lanes 2, 4, 6, 8, and 10). Notably, most of the proteins purified from BL21(DE3) were also contaminated by GroEL, although only to a minor extent. In EN2 cells, where GroEL expression is increased, we did not systematically observe greater GroEL contamination (Fig. 3C). Moreover, CpxP and GFP, the proteins most contaminated by DnaK, were not the most contaminated by GroEL, and GroEL did not copurify with PAN in the absence of DnaK. Thus, the absence of DnaK did not necessarily lead to a higher level of GroEL contamination.
Despite the absence of DnaK, all purified recombinant proteins remained soluble, even after concentration. Since some aggregates are soluble and solubility does not always guarantee a native, active conformation (15, 19), we examined the activities (when readily measurable) or native conformations of some of the purified proteins. ATP hydrolysis by 1 μg of purified PAN was measured at 55°C as described previously (2). PAN purified from BL21(DE3) and EN2 cells had comparable ATPase activities, with means ± standard errors of 762.33 ± 145.51 and 968.32 ± 198.85 nmol mg−1 h−1 (n = 3), respectively. The fluorescence emission spectrum (excitation at 400 nm) of GFP purified from EN2 cells was indistinguishable from that of GFP purified from BL21(DE3) cells (Fig. 4A), indicating that GFP folded correctly when produced in the absence of DnaK. CpxP, a component of the Cpx signal transduction pathway, was the protein that exhibited the greatest DnaK contamination (11). It self-associates into dimers (M. Miot and J.-M.
Betton, unpublished data). To test its correct assembly, 100 μl of purified CpxP at 1 mg/ml in 25 mM Tris, pH 7.5, 150 mM NaCl was loaded onto a size exclusion chromatography column (Superdex 200 HR10/30; GE Healthcare) and eluted with the same buffer at a flow rate of 0.5 ml/min. Recombinant CpxP purified from BL21(DE3) eluted at 14.98 ml (Fig. 4B), corresponding to an apparent molecular mass of 39.85 kDa (a CpxP dimer). Recombinant CpxP purified from EN2 eluted at a nearly identical volume of 14.97 ml (Fig. 4B). Thus, the absence of DnaK neither altered CpxP dimeric assembly nor produced soluble higher-molecular-mass aggregates.
FIG. 4. Folding and assembly of proteins in the absence of DnaK. (A) Fluorescence emission spectra of 8-μg/ml GFP preparations purified from BL21(DE3) and EN2 cells, recorded with an FP-6200 spectrofluorimeter (Jasco) at a scan rate of 250 nm min−1 with a 5-nm bandwidth for both excitation and emission beams. The spectra shown are representative of results from at least two independent experiments. (B) Size exclusion chromatograms of CpxP purified from BL21(DE3) and EN2 cells. Arrowheads indicate the elution volumes of the standards, whose masses are given in kilodaltons. The chromatograms shown are representative of results from at least three independent experiments.
Altogether, these findings indicate that high levels of correctly folded, assembled, and active soluble recombinant proteins can be produced in BL21(DE3) in the absence of the endogenous DnaK chaperone. Surprisingly, inactivating dnaK in BL21(DE3), which lacks Lon, did not increase the aggregation of recombinant proteins, in contrast to what was previously observed in E. coli K-12 (24). It seems that in BL21(DE3) cells, and in E.
coli B cells in general, factors other than DnaK and Lon may be fundamental in managing the accumulation of aggregated proteins. Through the detailed characterization of a BL21(DE3) ΔdnaK strain and production tests with proteins of different natures, origins, and sizes, including aggregation-prone proteins, our study demonstrates that the EN2 strain offers a broadly applicable strategy to avoid unwanted DnaK contamination. In addition, since DnaK has ATPase activity, the EN2 strain is particularly well suited to the production and purification of recombinant ATPases, because it eliminates contaminating ATPase activity that cannot be distinguished from that of the target protein. Given that GroEL, another major E. coli chaperone, has also been found to contaminate purified recombinant proteins (9), it would be of further interest to find conditions under which both dnaK and groEL could be eliminated from BL21(DE3) without impairing its survival or its remarkable capacity as a protein factory.
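The text does not state how the 14.98-ml elution volume of CpxP was converted into an apparent mass of 39.85 kDa. A common approach, assumed here rather than taken from the authors, is to calibrate the column with globular standards of known mass (the arrowheads in Fig. 4B) and interpolate on a semilogarithmic line. The void volume $V_0$, total bed volume $V_t$, and fit coefficients $a$ and $b$ below are illustrative symbols; none of their values is reported in the original:

```latex
K_{av} = \frac{V_e - V_0}{V_t - V_0}, \qquad
\log_{10} M_r = a + b\,K_{av} \quad (b < 0), \qquad
M_r^{\mathrm{app}} = 10^{\,a + b\,K_{av}}
```

Here $V_e$ is the elution volume of a standard or of the sample, and $a$ and $b$ would be obtained by least-squares regression over the standards. The result is only an apparent mass: the calibration assumes a globular shape, so an elongated or partially unfolded protein elutes earlier and appears larger than it is.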

20.
The localization of signaling molecules such as G protein-coupled receptors (GPCRs) to primary cilia is essential for correct signal transduction. Detailed studies over the past decade have begun to elucidate the diverse sequences and trafficking mechanisms that sort and transport GPCRs to the ciliary compartment. However, a systematic analysis of the pathways required for ciliary targeting of multiple GPCRs in different cell types in vivo has not been reported. Here we describe the sequences and proteins required to localize GPCRs to the cilia of the AWB and ASK sensory neuron types in Caenorhabditis elegans. We find that GPCRs expressed in AWB or ASK use conserved and novel sequences for ciliary localization, and that the requirement for a ciliary targeting sequence in a given GPCR differs between neuron types. Consistent with the presence of multiple ciliary targeting sequences, we identify diverse proteins required for ciliary localization of individual GPCRs in AWB and ASK. In particular, we show that the TUB-1 Tubby protein is required for ciliary localization of a subset of GPCRs, implying that defects in GPCR localization may contribute to the metabolic phenotypes of tub-1 mutants. Together, our results reveal a remarkable complexity of mechanisms that act in a protein- and cell-specific manner to localize GPCRs to cilia, and suggest that this diversity allows precise regulation of GPCR-mediated signaling as a function of external and internal context.

Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417