Similar articles
 Found 20 similar articles
1.
The New York Consortium on Membrane Protein Structure (NYCOMPS), a part of the Protein Structure Initiative (PSI) in the USA, has as its mission to establish a high-throughput pipeline for determination of novel integral membrane protein structures. Here we describe our current target selection protocol, which applies structural genomics approaches informed by the collective experience of our team of investigators. We first extract all annotated proteins from our reagent genomes, i.e. the 96 fully sequenced prokaryotic genomes from which we clone DNA. We filter this initial pool of sequences and obtain a list of valid targets. NYCOMPS defines valid targets as those that, among other features, have at least two predicted transmembrane helices, no predicted long disordered regions and, except for community nominated targets, no significant sequence similarity in the predicted transmembrane region to any known protein structure. Proteins that feed our experimental pipeline are selected by defining a protein seed and searching the set of all valid targets for proteins that are likely to have a transmembrane region structurally similar to that of the seed. We require sequence similarity aligning at least half of the predicted transmembrane region of seed and target. Seeds are selected according to their feasibility and/or biological interest, and they include both centrally selected targets and community nominated targets. As of December 2008, over 6,000 targets have been selected and are currently being processed by the experimental pipeline. We discuss how our target list may impact structural coverage of the membrane protein space.
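The validity criteria and the seed-to-target coverage rule described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Candidate` fields, the `MAX_DISORDER` cutoff (the abstract says only "long"), and both function names are hypothetical, and the abstract notes that valid targets must satisfy other features that are not modeled here.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the abstract does not say how long a "long" disordered region is.
MAX_DISORDER = 50


@dataclass
class Candidate:
    name: str
    tm_helices: int                    # number of predicted transmembrane helices
    longest_disorder: int              # longest predicted disordered region (residues)
    similar_to_known_structure: bool   # significant TM-region similarity to a known structure
    community_nominated: bool = False


def is_valid_target(c: Candidate) -> bool:
    """Apply the three criteria named in the abstract (other criteria are omitted)."""
    if c.tm_helices < 2:
        return False
    if c.longest_disorder > MAX_DISORDER:
        return False
    # Similarity to a known structure is tolerated only for community nominated targets.
    if c.similar_to_known_structure and not c.community_nominated:
        return False
    return True


def covers_half_tm(aligned_seed: int, tm_len_seed: int,
                   aligned_target: int, tm_len_target: int) -> bool:
    """True if the alignment spans at least half of the predicted TM region of both proteins."""
    return 2 * aligned_seed >= tm_len_seed and 2 * aligned_target >= tm_len_target
```

A seed would then be expanded by keeping every valid target for which `covers_half_tm` holds against that seed.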

2.
Eukaryotic membrane proteins play many vital roles in the cell and are important drug targets. Approximately 25% of all genes identified in the genome are known to encode membrane proteins, but the vast majority have no assigned function. Although the generation of structures of soluble proteins has entered the high-throughput stage, for eukaryotic membrane proteins only a dozen high-resolution structures have been obtained so far. One major bottleneck for the functional and structural characterisation of membrane proteins is the overproduction of biologically active material. Recent advances in the development of the Lactococcus lactis expression system have opened the way for the high-throughput functional expression of eukaryotic membrane proteins.

3.
Recent years have seen the establishment of structural genomics centers that explicitly target integral membrane proteins. Here, we review the advances in targeting these extremely high-hanging fruits of structural biology in high-throughput mode. We observe that the experimental determination of high-resolution structures of integral membrane proteins is increasingly successful both in terms of getting structures and of covering important protein families, for example, from Pfam. Structural genomics has begun to contribute significantly toward this progress. An important component of this contribution is the setup of robotic pipelines that generate a wealth of experimental data for membrane proteins. We argue that prediction methods for the identification of membrane regions and for the comparison of membrane proteins largely suffice to meet the challenges of target selection for structural genomics of membrane proteins. In contrast, we need better methods to prioritize the most promising members in a family of closely related proteins and to annotate protein function from sequence and structure in the absence of homology.

4.
The wealth of genomic data available for many organisms has set the stage for the next phase of structure-function analysis. High-throughput structural genomics is currently the method of choice for rapid analysis of protein structure-function relationships on a proteome-wide basis. The Joint Center for Structural Genomics (JCSG), established in 2000 under the NIH/NIGMS Protein Structure Initiative, has developed and implemented an integrated high-throughput structure pipeline and applied it in a 2-tiered approach to mining the proteome of the thermophilic bacterium Thermotoga maritima. In the first tier, the successful application of this integrated pipeline has resulted in the cloning and expression of 73% of the T. maritima proteome (1376 out of 1877 predicted genes), and has identified 465 proteins which produced crystal hits. These 465 proteins were compared with existing structural information and a subset of 269 targets were selected to process towards structure determination in a second tier effort. To date, the JCSG pipeline applied to the Thermotoga maritima proteome has resulted in 55 new structures and has identified 6 novel folds and continues to identify structures with novel features.

5.
Persistent hurdles impede the successful determination of high-resolution crystal structures of eukaryotic integral membrane proteins (IMP). We designed a high-throughput structural genomics oriented pipeline that seeks to minimize effort in uncovering high-quality, responsive non-redundant targets for crystallization. This “discovery-oriented” pipeline sidesteps two significant bottlenecks in the IMP structure determination pipeline: expression and membrane extraction with detergent. In addition, proteins that enter the pipeline are then rapidly vetted by their presence in the included volume on a size-exclusion column, a hallmark of well-behaved IMP targets. A screen of 384 rationally selected eukaryotic IMPs in baker’s yeast Saccharomyces cerevisiae is outlined to demonstrate the results expected when applying this discovery-oriented pipeline to whole-organism membrane proteomes. Electronic supplementary material: the online version of this article (doi:) contains supplementary material, which is available to authorized users. Franklin A. Hays and Zygy Roe-Zurz have contributed equally to this work.

6.
A high-throughput crystallization-to-structure pipeline for structural genomics was recently developed at the Advanced Protein Crystallography Research Group of the RIKEN SPring-8 Center in Japan. The structure determination pipeline includes three newly developed technologies for automating X-ray protein crystallography: the automated crystallization and observation robot system "TERA", the SPring-8 Precise Automatic Cryosample Exchanger "SPACE" for automated data collection, and the Package of Expert Researcher's Operation Network "PERON" for automated crystallographic computation from phasing to model checking. During the 5 years following April 2002, this pipeline was used by seven researchers to determine 138 independent crystal structures (resulting from 437 purified proteins, 234 cryoloop-mountable crystals, and 175 diffraction data sets). The protocols used in the high-throughput pipeline are described in this paper.
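The four counts reported in this abstract can be read as a success funnel. The sketch below computes the fraction retained between consecutive stages; treating the counts as strictly successive stages is an assumption made here for illustration, and `stage_yields` is a hypothetical helper, not part of the RIKEN software.

```python
def stage_yields(stages):
    """Return (from_stage, to_stage, fraction_retained) for each consecutive pair of stages."""
    return [(a, b, nb / na) for (a, na), (b, nb) in zip(stages, stages[1:])]


# Counts taken from the abstract; the ordering as a funnel is an assumption.
STAGES = [
    ("purified proteins", 437),
    ("cryoloop-mountable crystals", 234),
    ("diffraction data sets", 175),
    ("independent crystal structures", 138),
]

for src, dst, frac in stage_yields(STAGES):
    print(f"{src} -> {dst}: {frac:.1%}")

# Overall fraction of purified proteins that ended as a structure.
overall = STAGES[-1][1] / STAGES[0][1]
print(f"overall: {overall:.1%}")
```

Under this reading, roughly a third of the purified proteins yielded a structure, with the largest loss at the crystallization step.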

7.
MOTIVATION: Experimental techniques alone cannot keep up with the production rate of protein sequences, while computational techniques for protein structure prediction have matured to a level where they can provide reliable structural characterization of proteins at large scale. Integration of multiple computational tools for protein structure prediction can complement experimental techniques. RESULTS: We present an automated pipeline for protein structure prediction. The centerpiece of the pipeline is our threading-based protein structure prediction system PROSPECT. The pipeline consists of a dozen tools for identification of protein domains and signal peptide, protein triage to determine the protein type (membrane or globular), protein fold recognition, generation of atomic structural models, prediction result validation, etc. Different processing and prediction branches are determined automatically by a prediction pipeline manager based on identified characteristics of the protein. The pipeline has been implemented to run in a heterogeneous computational environment as a client/server system with a web interface. Genome-scale applications on Caenorhabditis elegans, Pyrococcus furiosus and three cyanobacterial genomes are presented. AVAILABILITY: The pipeline is available at http://compbio.ornl.gov/proteinpipeline/

8.
Mirkovic N, Li Z, Parnassa A, Murray D. Proteins 2007, 66(4):766-777
The technological breakthroughs in structural genomics were designed to facilitate the solution of a sufficient number of structures, so that as many protein sequences as possible can be structurally characterized with the aid of comparative modeling. The leverage of a solved structure is the number and quality of the models that can be produced using the structure as a template for modeling and may be viewed as the "currency" with which the success of a structural genomics endeavor can be measured. Moreover, the models obtained in this way should be valuable to all biologists. To this end, at the Northeast Structural Genomics Consortium (NESG), a modular computational pipeline for automated high-throughput leverage analysis was devised and used to assess the leverage of the 186 unique NESG structures solved during the first phase of the Protein Structure Initiative (January 2000 to July 2005). Here, the results of this analysis are presented. The number of sequences in the nonredundant protein sequence database covered by quality models produced by the pipeline is approximately 39,000, so that the average leverage is approximately 210 models per structure. Interestingly, only 7900 of these models fulfill the stringent modeling criterion of being at least 30% sequence-identical to the corresponding NESG structures. This study shows how high-throughput modeling increases the efficiency of structure determination efforts by providing enhanced coverage of protein structure space. In addition, the approach is useful in refining the boundaries of structural domains within larger protein sequences, subclassifying sequence diverse protein families, and defining structure-based strategies specific to a particular family.
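The average leverage quoted in this abstract follows directly from the two reported counts. The short sketch below reproduces that arithmetic and also derives the share of models meeting the 30% identity criterion; the function names are hypothetical, and the inputs are the approximate figures given in the abstract.

```python
def average_leverage(models_covered: int, structures_solved: int) -> float:
    """Average number of quality models produced per solved structure."""
    return models_covered / structures_solved


def stringent_fraction(stringent_models: int, models_covered: int) -> float:
    """Share of models that meet the stricter sequence-identity criterion."""
    return stringent_models / models_covered


# Figures as reported in the abstract (the model counts are approximate).
STRUCTURES = 186
MODELS = 39_000
STRINGENT_MODELS = 7_900

leverage = average_leverage(MODELS, STRUCTURES)
share = stringent_fraction(STRINGENT_MODELS, MODELS)
print(f"average leverage: about {leverage:.0f} models per structure")
print(f"models at >= 30% identity: about {share:.0%}")
```

The second figure makes the abstract's point concrete: only about a fifth of the covered sequences are modeled at the stringent identity level.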

9.
The Center for Eukaryotic Structural Genomics (CESG), as part of the Protein Structure Initiative (PSI), has established a high-throughput structure determination pipeline focused on eukaryotic proteins. NMR spectroscopy is an integral part of this pipeline, both as a method for structure determinations and as a means for screening proteins for stable structure. Because computational approaches have estimated that many eukaryotic proteins are highly disordered, about 1 year into the project, CESG began to use an algorithm (the Predictor of Naturally Disordered Regions, PONDR) to avoid proteins that were likely to be disordered. We report a retrospective analysis of the effect of this filtering on the yield of viable structure determination candidates. In addition, we have used our current database of results on 70 protein targets from Arabidopsis thaliana and 1 from Caenorhabditis elegans, which were labeled uniformly with nitrogen-15 and screened for disorder by NMR spectroscopy, to compare the original algorithm with 13 other approaches for predicting disorder from sequence. Our study indicates that the efficiency of structural proteomics of eukaryotes can be improved significantly by removing targets predicted to be disordered by an algorithm chosen to provide optimal performance.

10.
The Gram-positive bacterium Streptococcus mutans is the primary causative agent of human dental caries. To better understand this pathogen at the atomic structure level and to establish potential drug and vaccine targets, we have carried out structural genomics research since 2005. To achieve this goal, we have developed various in-house automation systems, including novel high-throughput crystallization equipment and methods, on the basis of which a large-scale, high-efficiency and low-cost platform has been established in our laboratory. From a total of 1,963 annotated open reading frames, 1,391 non-membrane targets were selected, prioritized by protein sequence similarities to unknown structures, and clustered by restriction sites to allow for cost-effective high-throughput conventional cloning. Selected proteins were over-expressed in different strains of Escherichia coli. Clones expressing soluble proteins were selected and expanded, and the expressed proteins were purified and subjected to crystallization trials. Finally, protein crystals were subjected to X-ray analysis and structures were determined by crystallographic methods. Using the previously established procedures, we have so far obtained more than 200 kinds of protein crystals and 100 kinds of crystal structures of proteins involved in different biological pathways. In this paper we demonstrate and review the possibility of performing structural genomics studies at a moderate laboratory scale. Furthermore, the techniques and methods developed in our study can be widely applied to conventional structural biology research practice.

11.
Membrane proteins serve as cellular gatekeepers, regulators, and sensors. Prior studies have explored the functional breadth and evolution of proteins and families of particular interest, such as the diversity of transport-associated membrane protein families in prokaryotes and eukaryotes, the composition of integral membrane proteins, and family classification of all human G-protein coupled receptors. However, a comprehensive analysis of the content and evolutionary associations between membrane proteins and families in a diverse set of genomes is lacking. Here, a membrane protein annotation pipeline was developed to define the integral membrane genome and associations between 21,379 proteins from 34 genomes; most, but not all, of these proteins belong to 598 defined families. The pipeline was used to provide target input for a structural genomics project that successfully cloned, expressed, and purified 61 of our first 96 selected targets in yeast. Furthermore, the methodology was applied (1) to explore the evolutionary history of the substrate-binding transmembrane domains of the human ABC transporter superfamily, (2) to identify the multidrug resistance-associated membrane proteins in whole genomes, and (3) to identify putative new membrane protein families.

12.
For over 2 decades, continuous efforts to organize the jungle of available protein structures have been underway. Although a number of discrepancies between different classification approaches for soluble proteins have been reported, the classification of membrane proteins has so far not been comparatively studied because of the limited amount of available structural data. Here, we present an analysis of α‐helical membrane protein classification in the SCOP and CATH databases. In the current set of 63 α‐helical membrane protein chains having between 1 and 13 transmembrane helices, we observed a number of differently classified proteins both regarding their domain and fold assignment. The majority of all discrepancies affect single transmembrane helix, two helix hairpin, and four helix bundle domains, while domains with more than five helices are mostly classified consistently between SCOP and CATH. It thus appears that the structural constraints imposed by the lipid bilayer complicate the classification of membrane proteins with only a few membrane‐spanning regions. This problem seems to be specific for membrane proteins as soluble four helix bundles, not restrained by the membrane, are more consistently classified by SCOP and CATH. Our findings indicate that the structural space of small membrane helix bundles is highly continuous such that even minor differences in individual classification procedures may lead to a significantly different classification. Membrane proteins with few helices and limited structural diversity only seem to be reasonably classifiable if the definition of a fold is adapted to include more fine‐grained structural features such as helix–helix interactions and reentrant regions. Proteins 2010. © 2010 Wiley‐Liss, Inc.

13.
As structural genomics and proteomics research has become popular, the importance of cell-free protein synthesis systems has been realized for high-throughput expression. Our group has established a high-throughput pipeline for protein sample preparation for structural genomics and proteomics by using cell-free protein synthesis. Among the many procedures for cell-free protein synthesis, the preparation of the cell extract is a crucial step to establish a highly efficient and reproducible workflow. In this article, we describe a detailed protocol for E. coli cell extract preparation for cell-free protein synthesis, which we have developed and routinely use. The cell extract prepared according to this protocol is used for many of our cell-free synthesis applications, including high-throughput protein expression using PCR-amplified templates and large-scale protein production for structure determinations.

14.

Background

Protein structures are critical for understanding the mechanisms of biological systems and, subsequently, for drug and vaccine design. Unfortunately, protein sequence data exceed structural data by a factor of more than 200 to 1. This gap can be partially filled by using computational protein structure prediction. While structure prediction Web servers are a notable option, they often restrict the number of sequence queries and/or provide a limited set of prediction methodologies. Therefore, we present a standalone protein structure prediction software package suitable for high-throughput structural genomic applications that performs all three classes of prediction methodologies: comparative modeling, fold recognition, and ab initio. This software can be deployed on a user's own high-performance computing cluster.

Methodology/Principal Findings

The pipeline consists of a Perl core that integrates more than 20 individual software packages and databases, most of which are freely available from other research laboratories. The query protein sequences are first divided into domains either by domain boundary recognition or Bayesian statistics. The structures of the individual domains are then predicted using template-based modeling or ab initio modeling. The predicted models are scored with a statistical potential and an all-atom force field. The top-scoring ab initio models are annotated by structural comparison against the Structural Classification of Proteins (SCOP) fold database. Furthermore, secondary structure, solvent accessibility, transmembrane helices, and structural disorder are predicted. The results are generated in text, tab-delimited, and hypertext markup language (HTML) formats. So far, the pipeline has been used to study viral and bacterial proteomes.

Conclusions

The standalone pipeline that we introduce here, unlike protein structure prediction Web servers, allows users to devote their own computing assets to process a potentially unlimited number of queries as well as perform resource-intensive ab initio structure prediction.

15.
We have applied high-throughput methods for cloning and expression of more than 850 genes from the Bacillus subtilis genome. The process uses 96-well plates and is automated from the level of primer design to the detection of soluble protein by a tag detection screen. This process was applied to a set of cytoplasmic targets from Bacillus subtilis to produce clones expressing soluble protein for incorporation into the structure determination pipeline of the Midwest Center for Structural Genomics. We also evaluated the feasibility of these plate-based methods for domain-based cloning and expression of secretory proteins and putative soluble domains of membrane proteins. This approach shows promise for implementation in a high-throughput format and could provide additional target resources for structure determination. The continued development of new technologies that can be implemented in an automated format will be essential for continued success in the structural genomic programs.

16.
Knots in proteins are increasingly being recognized as an important structural concept, and the folding of these peculiar structures still poses considerable challenges. From a functional point of view, most protein knots discovered so far are either enzymes or DNA-binding proteins. Our comprehensive topological analysis of the Protein Data Bank reveals several novel structures including knotted mitochondrial proteins and the most deeply embedded protein knot discovered so far. For the latter, we propose a novel folding pathway based on the idea that a loose knot forms at a terminus and slides to its native position. For the mitochondrial proteins, we discuss the folding problem from the perspective of transport and suggest that they fold inside the mitochondria. We also discuss the evolutionary origin of a novel class of knotted membrane proteins and argue that a novel knotted DNA-binding protein constitutes a new fold. Finally, we have also discovered a knot in an artificially designed protein structure.

17.
Structural genomics projects represent major undertakings that will change our understanding of proteins. They generate unique datasets that, for the first time, present a standardized view of proteins in terms of their physical and chemical properties. By analyzing these datasets here, we are able to discover correlations between a protein's characteristics and its progress through each stage of the structural genomics pipeline, from cloning, expression, purification, and ultimately to structural determination. First, we use tree-based analyses (decision trees and random forest algorithms) to discover the most significant protein features that influence a protein's amenability to high-throughput experimentation. Based on this, we identify potential bottlenecks in various stages of the structural genomics process through specialized "pipeline schematics". We find that the properties of a protein that are most significant are: (i) whether it is conserved across many organisms; (ii) the percentage composition of charged residues; (iii) the occurrence of hydrophobic patches; (iv) the number of binding partners it has; and (v) its length. Conversely, a number of other properties that might have been thought to be important, such as nuclear localization signals, are not significant. Thus, using our tree-based analyses, we are able to identify combinations of features that best differentiate the small group of proteins for which a structure has been determined from all the currently selected targets. This information may prove useful in optimizing high-throughput experimentation. Further information is available from http://mining.nesg.org/.
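Tree-based methods rank features by how well a split on each feature separates the outcome classes. The sketch below shows one common split criterion, entropy-based information gain, for a binary feature. It is an illustration only: the abstract does not state which criterion the authors used, and the four-protein toy data set and feature names are invented here, not taken from the study.

```python
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of 0/1 outcome labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def information_gain(feature, labels):
    """Entropy reduction obtained by splitting the labels on a boolean feature."""
    yes = [y for x, y in zip(feature, labels) if x]
    no = [y for x, y in zip(feature, labels) if not x]
    n = len(labels)
    weighted = len(yes) / n * entropy(yes) + len(no) / n * entropy(no)
    return entropy(labels) - weighted


# Invented toy data: 1 = structure determined, 0 = not determined.
solved = [1, 1, 0, 0]
conserved = [True, True, False, False]   # hypothetical informative feature
has_nls = [True, False, True, False]     # hypothetical uninformative feature

print("conserved:", information_gain(conserved, solved))
print("has_nls:  ", information_gain(has_nls, solved))
```

In this toy case the first feature separates the classes completely while the second carries no information, which mirrors the kind of contrast the abstract draws between conservation and nuclear localization signals.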

18.
Membrane proteins are involved in numerous vital biological processes. To understand membrane protein functionality, accurate structural information is required. Usually, structure determination and dynamics of membrane proteins are studied in micelles using either solution state NMR or X‐ray crystallography. Even though invaluable information has been obtained by this approach, micelles are known to be far from ideal mimics of biological membranes often causing the loss or decrease of membrane protein activity. Recently, nanodiscs, which are composed of a lipid bilayer surrounded by apolipoproteins, have been introduced as a more physiological alternative than micelles for NMR investigations on membrane proteins. Here, we show that membrane protein bond orientations in nanodiscs can be obtained by measuring residual dipolar couplings (RDCs) with the outer membrane protein OmpX embedded in nanodiscs using Pf1 phage as an alignment medium. The presented collection of membrane protein RDCs in nanodiscs represents an important step toward more comprehensive structural and dynamical NMR‐based investigations of membrane proteins in a natural bilayer environment.

19.
The function of DNA‐ and RNA‐binding proteins can be inferred from the characterization and accurate prediction of their binding interfaces. However, the main pitfall of various structure‐based methods for predicting nucleic acid binding function is that they are all limited to a relatively small number of proteins for which high‐resolution three‐dimensional structures are available. In this study, we developed a pipeline for extracting functional electrostatic patches from surfaces of protein structural models, obtained using the I‐TASSER protein structure predictor. The largest positive patches are extracted from the protein surface using the patchfinder algorithm. We show that functional electrostatic patches extracted from an ensemble of structural models highly overlap the patches extracted from high‐resolution structures. Furthermore, by testing our pipeline on a set of 55 known nucleic acid binding proteins for which I‐TASSER produces high‐quality models, we show that the method accurately identifies the nucleic acid binding interface on structural models of proteins. Employing a combined patch approach, we show that patches extracted from an ensemble of models better predict the real nucleic acid binding interfaces compared with patches extracted from independent models. Overall, these results suggest that combining information from a collection of low‐resolution structural models could be a valuable approach for functional annotation. We suggest that our method will be further applicable for predicting other functional surfaces of proteins with unknown structure. Proteins 2012. © 2011 Wiley Periodicals, Inc.

20.
High-throughput protein production systems have become an important issue, because protein production is one of the bottleneck steps in large-scale structural and functional analyses of proteins. We have developed a dialysis reactor and a fully automated system for protein production using the dialysis cell-free synthesis method, which we previously established to produce protein samples on a milligram scale in a high-throughput manner. The dialysis reactor was designed to be suitable for an automated system and has six dialysis cups attached to a flat dialysis membrane. The automated system is based on a Tecan Freedom EVO 200 workstation in a three-arm configuration, and is equipped with shaking incubators, a vacuum module, a robotic centrifuge, a plate heat sealer, and a custom-made tilting carrier for collection of reaction solutions from the flat-bottom cups with dialysis membranes. The consecutive process, from the dialysis cell-free protein synthesis to the partial purification by immobilized metal affinity chromatography on a 96-well filtration plate, was performed within ca. 14 h, including 8 h of cell-free protein synthesis. The proteins were eluted stepwise in a high concentration using EDTA by centrifugation, while the resin in the filtration plate was washed on the vacuum manifold. The system was validated to be able to simultaneously and automatically produce up to 96 proteins in yields of several milligrams with high well-to-well reliability, sufficient for structural and functional analyses of proteins. The protein samples produced by the automated system have been utilized for NMR screening to judge the protein foldedness and for structure determinations using heteronuclear multi-dimensional NMR spectroscopy. 
The automated high-throughput protein production system represents an important breakthrough in the structural and functional studies of proteins and has already contributed a massive amount of results in the structural genomics project at the RIKEN Structural Genomics/Proteomics Initiative (RSGI).
