Similar articles
 20 similar articles found (search time: 31 ms)
1.
Liu J, Hegyi H, Acton TB, Montelione GT, Rost B. Proteins 2004, 56(2):188-200
A central goal of structural genomics is to experimentally determine representative structures for all protein families. At least 14 structural genomics pilot projects are currently investigating the feasibility of high-throughput structure determination; the National Institutes of Health funded nine of these in the United States. Initiatives differ in the particular subset of "all families" on which they focus. At the NorthEast Structural Genomics consortium (NESG), we target eukaryotic protein domain families. The automatic target selection procedure has three aims: 1) identify all protein domain families from the five entirely sequenced eukaryotic organisms currently targeted, based on their sequence homology, 2) discard those families that can be modeled on the basis of structural information already present in the PDB, and 3) target representatives of the remaining families for structure determination. To guarantee that all members of one family share a common fold-like region, we had to begin by dissecting proteins into structural domain-like regions before clustering. Our hierarchical approach, CHOP, utilizing homology to PrISM, Pfam-A, and SWISS-PROT, chopped the 103,796 eukaryotic proteins/ORFs into 247,222 fragments. Of these fragments, 122,999 appeared to be suitable targets and were grouped into >27,000 singletons and >18,000 multifragment clusters. Thus, our results suggested that it might be necessary to determine >40,000 structures to minimally cover the subset of five eukaryotic proteomes.
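The grouping of fragments into singletons and multifragment clusters described above can be illustrated with a small sketch. The abstract does not say which clustering algorithm was used, so the single-linkage grouping via union-find shown here is an assumption, and the function name `cluster_fragments` and its inputs (a list of fragment identifiers and a list of pairs judged homologous) are hypothetical.

```python
from collections import Counter


def cluster_fragments(fragments, homologous_pairs):
    """Group fragments by single linkage and count cluster types.

    Assumption: two fragments belong to the same family if they are
    connected by any chain of homologous pairs. Every identifier used in
    homologous_pairs must also appear in fragments.
    Returns (number of singletons, number of multifragment clusters).
    """
    parent = {f: f for f in fragments}

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in homologous_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Cluster size per root fragment.
    sizes = Counter(find(f) for f in fragments)
    singletons = sum(1 for size in sizes.values() if size == 1)
    multifragment = sum(1 for size in sizes.values() if size > 1)
    return singletons, multifragment
```

In this toy model, one representative structure per cluster would be needed, so the number of structures to determine is the sum of the two counts returned.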

2.
The New York Consortium on Membrane Protein Structure (NYCOMPS), a part of the Protein Structure Initiative (PSI) in the USA, has as its mission to establish a high-throughput pipeline for determination of novel integral membrane protein structures. Here we describe our current target selection protocol, which applies structural genomics approaches informed by the collective experience of our team of investigators. We first extract all annotated proteins from our reagent genomes, i.e. the 96 fully sequenced prokaryotic genomes from which we clone DNA. We filter this initial pool of sequences and obtain a list of valid targets. NYCOMPS defines valid targets as those that, among other features, have at least two predicted transmembrane helices, no predicted long disordered regions and, except for community-nominated targets, no significant sequence similarity in the predicted transmembrane region to any known protein structure. Proteins that feed our experimental pipeline are selected by defining a protein seed and searching the set of all valid targets for proteins that are likely to have a transmembrane region structurally similar to that of the seed. We require sequence similarity aligning at least half of the predicted transmembrane region of seed and target. Seeds are selected according to their feasibility and/or biological interest, and they include both centrally selected targets and community-nominated targets. As of December 2008, over 6,000 targets have been selected and are currently being processed by the experimental pipeline. We discuss how our target list may impact structural coverage of the membrane protein space.
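The validity criteria listed in this abstract can be sketched as a simple filter. This is an illustration only, not the NYCOMPS implementation: the `Candidate` record and its field names are hypothetical, the abstract mentions further unspecified features ("among other features") that are not modeled, and because it gives no thresholds for "long" disorder or "significant" similarity, both are represented here as precomputed booleans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one annotated protein (field names are assumptions)."""
    name: str
    predicted_tm_helices: int
    has_long_disordered_region: bool
    similar_to_known_structure: bool  # similarity within the predicted TM region
    community_nominated: bool = False


def is_valid_target(c: Candidate) -> bool:
    """Apply the three criteria named in the abstract."""
    if c.predicted_tm_helices < 2:
        return False  # at least two predicted transmembrane helices
    if c.has_long_disordered_region:
        return False  # no predicted long disordered regions
    if c.similar_to_known_structure and not c.community_nominated:
        return False  # similarity is tolerated only for community nominations
    return True


def select_valid_targets(candidates):
    """Return the candidates that pass every criterion, in input order."""
    return [c for c in candidates if is_valid_target(c)]
```

Keeping each criterion as its own early return makes it easy to add the remaining, unlisted features later without touching the existing checks.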

3.
The New York Consortium on Membrane Protein Structure (NYCOMPS) was formed to accelerate the acquisition of structural information on membrane proteins by applying a structural genomics approach. NYCOMPS comprises a bioinformatics group, a centralized facility operating a high-throughput cloning and screening pipeline, a set of associated wet labs that perform high-level protein production and structure determination by X-ray crystallography and NMR, and a set of investigators focused on methods development. In the first three years of operation, the NYCOMPS pipeline has so far produced and screened 7,250 expression constructs for 8,045 target proteins. Approximately 600 of these verified targets were scaled up to levels required for structural studies, so far yielding 24 membrane protein crystals. Here we describe the overall structure of NYCOMPS and provide details on the high-throughput pipeline.

4.
The arrival of genomic sequences to the database has provided a seemingly unlimited supply of targets for protein structure determination and the possibility of solving the structure of an entire proteome. Based on our experience with the proteomes of Pyrobaculum aerophilum and Mycobacterium tuberculosis, we have developed a simple strategy for the production of proteins for structural studies by X-ray crystallography. Our scheme demonstrates a strong protein target commitment and includes the expression of genes from these organisms in Escherichia coli. These proteins are expressed with affinity tags and purified for characterization and crystallization. We have identified protein solubility and crystallization as the two major bottlenecks in the process toward the determination of protein structures by X-ray diffraction. Strategies to overcome these bottlenecks are discussed.

5.
Intrinsically disordered proteins (IDPs) and proteins with long disordered regions are highly abundant in various proteomes. Despite their lack of well-defined ordered structure, these proteins and regions are frequently involved in crucial biological processes. Although in recent years these proteins have attracted the attention of many researchers, IDPs represent a significant challenge for structural characterization since these proteins can impact many of the processes in the structure determination pipeline. Here we investigate the effects of IDPs on the structure determination process and the utility of disorder prediction in selecting and improving proteins for structural characterization. Examination of the extent of intrinsic disorder in existing crystal structures found that relatively few protein crystal structures contain extensive regions of intrinsic disorder. Although intrinsic disorder is not the only cause of crystallization failures and many structured proteins cannot be crystallized, filtering out highly disordered proteins from structure-determination target lists is still likely to be cost effective. Therefore it is desirable to exclude highly disordered proteins from structure-determination target lists, and we show that disorder prediction can be applied effectively to enrich structure determination pipelines with proteins more likely to yield crystal structures. For structural investigation of specific proteins, disorder prediction can be used to improve targets for structure determination. Finally, a framework for considering intrinsic disorder in the structure determination pipeline is proposed.

6.
Membrane proteins comprise up to one-third of prokaryotic and eukaryotic genomes, but only a very small number of membrane protein structures are known. Membrane proteins are challenging targets for structural biology, primarily due to the difficulty in producing and purifying milligram quantities of these proteins. We are evaluating different methods to produce and purify large numbers of prokaryotic membrane proteins for subsequent structural and functional analysis. Here, we present the comparative expression data for 37 target proteins, all of them secondary transporters, from the mesophilic organism Salmonella typhimurium and the two hyperthermophilic organisms Aquifex aeolicus and Pyrococcus furiosus in three different Escherichia coli expression vectors. In addition, we study the use of Lactococcus lactis as a host for integral membrane protein expression. Overall, 78% of the targets were successfully produced under at least one set of conditions. Analysis of these results allows us to assess the role of different variables in increasing "expression space" coverage for our set of targets. This analysis implies that to maximize the number of nonhomologous targets that are expressed, orthologous targets should be chosen and tested in two vectors with different types of promoters, using C-terminal tags. In addition, E. coli is shown to be a robust host for the expression of prokaryotic transporters, and is superior to L. lactis. These results therefore suggest appropriate strategies for high-throughput heterologous overproduction of membrane proteins.

7.
The process of experimental determination of protein structure is marred by a high rate of failures at many stages. With the availability of large quantities of data from high-throughput structure determination in structural genomics centers, we can now learn to recognize protein features correlated with failures; thus, we can recognize proteins more likely to succeed and eventually learn how to modify those that are less likely to succeed. Here, we identify several protein features that correlate strongly with successful protein production and crystallization and combine them into a single score that assesses "crystallization feasibility." The formula derived here was tested with a jackknife procedure and validated on independent benchmark sets. The "crystallization feasibility" score described here is being applied to target selection in the Joint Center for Structural Genomics, and is now contributing to increasing the success rate, lowering the costs, and shortening the time for protein structure determination. Analyses of PDB depositions suggest that very similar features also play a role in non-high-throughput structure determination, suggesting that this crystallization feasibility score would also be of significant interest to structural biology, as well as to molecular and biochemistry laboratories.

8.
The Center for Eukaryotic Structural Genomics (CESG), as part of the Protein Structure Initiative (PSI), has established a high-throughput structure determination pipeline focused on eukaryotic proteins. NMR spectroscopy is an integral part of this pipeline, both as a method for structure determinations and as a means for screening proteins for stable structure. Because computational approaches have estimated that many eukaryotic proteins are highly disordered, about 1 year into the project, CESG began to use an algorithm (the Predictor of Naturally Disordered Regions, PONDR) to avoid proteins that were likely to be disordered. We report a retrospective analysis of the effect of this filtering on the yield of viable structure determination candidates. In addition, we have used our current database of results on 70 protein targets from Arabidopsis thaliana and 1 from Caenorhabditis elegans, which were labeled uniformly with nitrogen-15 and screened for disorder by NMR spectroscopy, to compare the original algorithm with 13 other approaches for predicting disorder from sequence. Our study indicates that the efficiency of structural proteomics of eukaryotes can be improved significantly by removing targets predicted to be disordered by an algorithm chosen to provide optimal performance.

9.
The cry gene family, produced during the late exponential phase of growth in Bacillus thuringiensis, is a large, still-growing family of homologous genes, in which each gene encodes a protein with strong specific activity against only one or a few insect species. Extensive studies have mostly focused on the structural and functional relationships of Cry proteins, and have revealed several residues or domains that are important for target recognition and receptor attachment. In this study, we have employed a maximum likelihood method to detect evidence of adaptive evolution in Cry proteins, and have identified 24 positively selected residues, which are all located in Domain Ⅱ or Ⅲ. Combined with known data from mutagenesis studies, the majority of these residues, at the molecular level, contribute much to the determination of insect specificity. We postulate that the potential pressures driving the diversification of Cry proteins may be an attempt to adapt to the "arms race" between δ-endotoxins and the targeted insects, or to enlarge their target spectra, hence resulting in functional divergence. The sites identified to be under positive selection would provide targets for further structural and functional analyses of Cry proteins.

10.
Chandonia JM, Kim SH, Brenner SE. Proteins 2006, 62(2):356-370
At the Berkeley Structural Genomics Center (BSGC), our goal is to obtain a near-complete structural complement of proteins in the minimal organisms Mycoplasma genitalium and M. pneumoniae, two closely related pathogens. Current targets for structure determination have been selected in six major stages, starting with those predicted to be most tractable to high-throughput study and likely to yield new structural information. We report on the process used to select these proteins, as well as our target deselection procedure. Target deselection reduces experimental effort by eliminating targets similar to those recently solved by the structural biology community or other centers. We measure the impact of the 69 structures solved at the BSGC as of July 2004 on structure prediction coverage of the M. pneumoniae and M. genitalium proteomes. The number of Mycoplasma proteins for which the fold could first be reliably assigned based on structures solved at the BSGC (24 M. pneumoniae and 21 M. genitalium) is approximately 25% of the total resulting from work at all structural genomics centers and the worldwide structural biology community (94 M. pneumoniae and 86 M. genitalium) during the same period. As the number of structures contributed by the BSGC during that period is less than 1% of the total worldwide output, the benefits of a focused target selection strategy are apparent. If the structures of all current targets were solved, the percentage of M. pneumoniae proteins for which folds could be reliably assigned would increase from approximately 57% (391 of 687) at present to around 80% (550 of 687), and the percentage of the proteome that could be accurately modeled would increase from around 37% (254 of 687) to about 64% (438 of 687). In M. genitalium, the percentage of the proteome that could be structurally annotated based on structures of our remaining targets would rise from 72% (348 of 486) to around 76% (371 of 486), and the percentage of accurately modeled proteins would rise from 50% (243 of 486) to 58% (283 of 486). Sequences and data on experimental progress on our targets are available in the public databases TargetDB and PEPCdb.
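The coverage percentages quoted in this abstract follow directly from the protein counts it gives, and a short script can recompute them as a check. The counts below are copied from the abstract; the `percent` helper, the `ROWS` table, and the row labels are illustrative names introduced here, not part of the original work.

```python
def percent(part: int, total: int) -> float:
    """Return part/total as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


# (organism, measure, count at present, projected count, proteome size)
ROWS = [
    ("M. pneumoniae", "fold reliably assigned", 391, 550, 687),
    ("M. pneumoniae", "accurately modeled", 254, 438, 687),
    ("M. genitalium", "structurally annotated", 348, 371, 486),
    ("M. genitalium", "accurately modeled", 243, 283, 486),
]


def report() -> None:
    """Print present and projected coverage for each row."""
    for organism, measure, now, projected, total in ROWS:
        print(
            f"{organism}, {measure}: "
            f"{percent(now, total):.1f}% -> {percent(projected, total):.1f}%"
        )


if __name__ == "__main__":
    report()
```

Rounded to whole percentages, the recomputed values agree with the figures stated in the abstract (57% to 80% and 37% to 64% for M. pneumoniae; 72% to 76% and 50% to 58% for M. genitalium).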

11.
Structural genomics (or proteomics) activities are critically dependent on the availability of high-throughput structure determination methodology. Development of such methodology has been a particular challenge for NMR-based structure determination because of the demands for isotopic labeling of proteins and the requirements for very long data acquisition times. We present here a methodology that gains efficiency from a focus on determination of backbone structures of proteins as opposed to full structures with all sidechains in place. This focus is appropriate given the presumption that many protein structures in the future will be built using computational methods that start from representative fold family structures and replace as many as 70% of the sidechains in the course of structure determination. The methodology we present is based primarily on residual dipolar couplings (RDCs), readily accessible NMR observables that constrain the orientation of backbone fragments irrespective of separation in space. A new software tool is described for the assembly of backbone fragments under RDC constraints and an application to a structural genomics target is presented. The target is an 8.7 kDa protein from Pyrococcus furiosus, PF1061, that was previously not well annotated, and had a nearest structurally characterized neighbor with only 33% sequence identity. The structure produced shows structural similarity to this sequence homologue, but also shows similarity to other proteins, which suggests a functional role in sulfur transfer. Given the backbone structure and a possible functional link, this should be an ideal target for development of modeling methods.

12.
Recent years have seen the establishment of structural genomics centers that explicitly target integral membrane proteins. Here, we review the advances in targeting these extremely high-hanging fruits of structural biology in high-throughput mode. We observe that the experimental determination of high-resolution structures of integral membrane proteins is increasingly successful both in terms of getting structures and of covering important protein families, for example, from Pfam. Structural genomics has begun to contribute significantly toward this progress. An important component of this contribution is the setup of robotic pipelines that generate a wealth of experimental data for membrane proteins. We argue that prediction methods for the identification of membrane regions and for the comparison of membrane proteins largely suffice to meet the challenges of target selection for structural genomics of membrane proteins. In contrast, we need better methods to prioritize the most promising members in a family of closely related proteins and to annotate protein function from sequence and structure in the absence of homology.

13.
The dramatically increasing number of new protein sequences arising from genomics and proteomics creates a need for methods to rapidly and reliably infer the molecular and cellular functions of these proteins. One such approach, structural genomics, aims to delineate the total repertoire of protein folds in nature, thereby providing three-dimensional folding patterns for all proteins, and to infer molecular functions of the proteins based on the combined information of structures and sequences. The goal of obtaining protein structures on a genomic scale has motivated the development of high-throughput technologies and protocols for macromolecular structure determination that have begun to produce structures at a greater rate than previously possible. These new structures have revealed many unexpected functional inferences and evolutionary relationships that were hidden at the sequence level. Here, we present samples of structures determined at the Berkeley Structural Genomics Center and collaborators' laboratories to illustrate how structural information provides and complements sequence information to deduce the functional inferences of proteins with unknown molecular functions. Two of the major premises of structural genomics are to discover a complete repertoire of protein folds in nature and to find molecular functions of the proteins whose functions are not predicted from sequence comparison alone. To achieve these objectives on a genomic scale, new methods, protocols, and technologies need to be developed by multi-institutional collaborations worldwide. As part of this effort, the Protein Structure Initiative has been launched in the United States (PSI; www.nigms.nih.gov/funding/psi.html).
Although infrastructure building and technology development are still the main focus of structural genomics programs [1–6], a considerable number of protein structures have already been produced, some of them coming directly out of semi-automated structure determination pipelines [6–10]. The Berkeley Structural Genomics Center (BSGC) has focused on the proteins of Mycoplasma or their homologues from other organisms as its structural genomics targets because of the minimal genome size of the Mycoplasmas as well as their relevance to human and animal pathogenicity (http://www.strgen.org). Here we present several protein examples encompassing a spectrum of functional inferences obtainable from their three-dimensional structures in five situations, where the inferences are new and testable, and are not predictable from protein sequence information alone.

14.
Structural genomics projects represent major undertakings that will change our understanding of proteins. They generate unique datasets that, for the first time, present a standardized view of proteins in terms of their physical and chemical properties. By analyzing these datasets here, we are able to discover correlations between a protein's characteristics and its progress through each stage of the structural genomics pipeline, from cloning, expression, and purification to, ultimately, structural determination. First, we use tree-based analyses (decision trees and random forest algorithms) to discover the most significant protein features that influence a protein's amenability to high-throughput experimentation. Based on this, we identify potential bottlenecks in various stages of the structural genomics process through specialized "pipeline schematics". We find that the properties of a protein that are most significant are: (i) whether it is conserved across many organisms; (ii) the percentage composition of charged residues; (iii) the occurrence of hydrophobic patches; (iv) the number of binding partners it has; and (v) its length. Conversely, a number of other properties that might have been thought to be important, such as nuclear localization signals, are not significant. Thus, using our tree-based analyses, we are able to identify combinations of features that best differentiate the small group of proteins for which a structure has been determined from all the currently selected targets. This information may prove useful in optimizing high-throughput experimentation. Further information is available from http://mining.nesg.org/.

15.
Structural genomics (or proteomics) activities are critically dependent on the availability of high-throughput structure determination methodology. Development of such methodology has been a particular challenge for NMR-based structure determination because of the demands for isotopic labeling of proteins and the requirements for very long data acquisition times. We present here a methodology that gains efficiency from a focus on determination of backbone structures of proteins as opposed to full structures with all sidechains in place. This focus is appropriate given the presumption that many protein structures in the future will be built using computational methods that start from representative fold family structures and replace as many as 70% of the sidechains in the course of structure determination. The methodology we present is based primarily on residual dipolar couplings (RDCs), readily accessible NMR observables that constrain the orientation of backbone fragments irrespective of separation in space. A new software tool is described for the assembly of backbone fragments under RDC constraints and an application to a structural genomics target is presented. The target is an 8.7 kDa protein from Pyrococcus furiosus, PF1061, that was previously not well annotated, and had a nearest structurally characterized neighbor with only 33% sequence identity. The structure produced shows structural similarity to this sequence homologue, but also shows similarity to other proteins, which suggests a functional role in sulfur transfer. Given the backbone structure and a possible functional link, this should be an ideal target for development of modeling methods. This revised version was published online in March 2005 with corrections to the references.

16.
Structural genomics efforts at the Chinese Academy of Sciences and Peking University are reported in this article. The major targets for the structural genomics project are proteins expressed in human hematopoietic stem/progenitor cells, proteins related to blood diseases and other human proteins. Up to now 328 target genes have been constructed in expression vectors. Among them, more than 50% of the genes have been expressed in Escherichia coli, approximately 25% of the resulting proteins are soluble, and 35 proteins have been purified. Crystallization, data collection and structure determination are continuing. Experiences accumulated during this initial stage are useful for designing and applying high-throughput approaches in structural genomics. Abbreviations: NSFC, National Natural Science Foundation of China; MOST, Ministry of Science and Technology of China; CAS, Chinese Academy of Sciences; NSRL, National Synchrotron Radiation Laboratory in Hefei; BSRF, Beijing Synchrotron Radiation Facility; HSPC, hematopoietic stem/progenitor cells; APL, acute promyelocytic leukemia; ATRA, all-trans retinoic acid; COG, Cluster of Orthologous Groups of proteins.

17.
The Gram-positive bacterium Streptococcus mutans is the primary causative agent of human dental caries. To better understand this pathogen at the atomic structure level and to establish potential drug and vaccine targets, we have carried out structural genomics research since 2005. To achieve this goal, we have developed various in-house automation systems, including novel high-throughput crystallization equipment and methods, based on which a large-scale, high-efficiency and low-cost platform has been established in our laboratory. From a total of 1,963 annotated open reading frames, 1,391 non-membrane targets were selected, prioritized by protein sequence similarities to unknown structures, and clustered by restriction sites to allow for cost-effective high-throughput conventional cloning. Selected proteins were over-expressed in different strains of Escherichia coli. Clones expressing soluble proteins were selected and expanded, and the expressed proteins were purified and subjected to crystallization trials. Finally, protein crystals were subjected to X-ray analysis and structures were determined by crystallographic methods. Using the previously established procedures, we have so far obtained more than 200 kinds of protein crystals and 100 kinds of crystal structures involved in different biological pathways. In this paper we demonstrate and review the possibility of performing structural genomics studies at a moderate laboratory scale. Furthermore, the techniques and methods developed in our study can be widely applied to conventional structural biology research practice.

18.

Background  

The availability of suitable recombinant protein is still a major bottleneck in protein structure analysis. The Protein Structure Factory, part of the international structural genomics initiative, targets human proteins for structure determination. It has implemented high-throughput procedures for all steps from cloning to structure calculation. This article describes the selection of human target proteins for structure analysis, our high-throughput cloning strategy, and the expression of human proteins in Escherichia coli host cells.

19.
The study of protein structure has been driven largely by the careful inspection of experimental data by human experts. However, the rapid determination of protein structures from structural-genomics projects will make it increasingly difficult to analyse (and determine the principles responsible for) the distribution of proteins in fold space by inspection alone. Here, we demonstrate a machine-learning strategy that automatically determines the structural principles describing 45 folds. The rules learnt were shown to be both statistically significant and meaningful to protein experts. With the increasing emphasis on high-throughput experimental initiatives, machine-learning and other automated methods of analysis will become increasingly important for many biological problems.

20.

Background

The genus Burkholderia includes pathogenic gram-negative bacteria that cause melioidosis, glanders, and pulmonary infections of patients with cancer and cystic fibrosis. Drug resistance has made development of new antimicrobials critical. Many approaches to discovering new antimicrobials, such as structure-based drug design and whole cell phenotypic screens followed by lead refinement, require high-resolution structures of proteins essential to the parasite.

Methodology/Principal Findings

We experimentally identified 406 putative essential genes in B. thailandensis, a low-virulence species phylogenetically similar to B. pseudomallei, the causative agent of melioidosis, using saturation-level transposon mutagenesis and next-generation sequencing (Tn-seq). We selected 315 protein products of these genes based on structure-determination criteria, such as excluding very large and/or integral membrane proteins, and entered them into the Seattle Structural Genomics Center for Infectious Disease (SSGCID) structure determination pipeline. To maximize structural coverage of these targets, we applied an “ortholog rescue” strategy for those producing insoluble or difficult-to-crystallize proteins, resulting in the addition of 387 orthologs (or paralogs) from seven other Burkholderia species into the SSGCID pipeline. This structural genomics approach yielded structures from 31 putative essential targets from B. thailandensis, and 25 orthologs from other Burkholderia species, yielding an overall structural coverage for 49 of the 406 essential gene families, with a total of 88 depositions into the Protein Data Bank. Of these, 25 proteins have properties of a potential antimicrobial drug target, i.e., no close human homolog, part of an essential metabolic pathway, and a deep binding pocket. We describe the structures of several potential drug targets in detail.

Conclusions/Significance

This collection of structures, solubility and experimental essentiality data provides a resource for development of drugs against infections and diseases caused by Burkholderia. All expression clones and proteins created in this study are freely available by request.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司). ICP license: 京ICP备09084417号