20 similar documents found (search time: 15 ms)
1.
Kim SH Shin DH Choi IG Schulze-Gahmen U Chen S Kim R 《Journal of structural and functional genomics》2003,4(2-3):129-135
The dramatically increasing number of new protein sequences arising from genomics and proteomics creates a need for methods to rapidly and reliably infer the molecular and cellular functions of these proteins. One such approach, structural genomics, aims to delineate the total repertoire of protein folds in nature, thereby providing three-dimensional folding patterns for all proteins, and to infer the molecular functions of proteins from the combined information of structures and sequences. The goal of obtaining protein structures on a genomic scale has motivated the development of high-throughput technologies and protocols for macromolecular structure determination that have begun to produce structures at a greater rate than previously possible. These new structures have revealed many unexpected functional inferences and evolutionary relationships that were hidden at the sequence level. Here, we present samples of structures determined at the Berkeley Structural Genomics Center and collaborators' laboratories to illustrate how structural information complements sequence information in deducing the functions of proteins whose molecular functions are unknown. Two of the major premises of structural genomics are to discover the complete repertoire of protein folds in nature and to find the molecular functions of proteins whose functions cannot be predicted from sequence comparison alone. To achieve these objectives on a genomic scale, new methods, protocols, and technologies need to be developed by multi-institutional collaborations worldwide. As part of this effort, the Protein Structure Initiative has been launched in the United States (PSI; www.nigms.nih.gov/funding/psi.html). 
Although infrastructure building and technology development are still the main focus of structural genomics programs [1–6], a considerable number of protein structures have already been produced, some of them coming directly out of semi-automated structure determination pipelines [6–10]. The Berkeley Structural Genomics Center (BSGC) has focused on the proteins of Mycoplasma or their homologues from other organisms as its structural genomics targets because of the minimal genome size of the Mycoplasmas as well as their relevance to human and animal pathogenicity (http://www.strgen.org). Here we present several protein examples encompassing a spectrum of functional inferences obtainable from their three-dimensional structures in five situations, where the inferences are new and testable, and are not predictable from protein sequence information alone.
2.
A major goal of structural genomics is the provision of a structural template for a large fraction of protein domains. The magnitude of this task depends on the number and nature of protein sequence families. With a large number of bacterial genomes now fully sequenced, it is possible to obtain improved estimates of the number and diversity of families in that kingdom. We have used an automated clustering procedure to group all sequences in a set of genomes into protein families. Benchmarking shows that the clustering method is sensitive at detecting remote family members and has a low level of false positives. This comprehensive protein family set has been used to address the following questions. (1) What is the structure coverage for currently known families? (2) How will the number of known apparent families grow as more genomes are sequenced? (3) What is a practical strategy for maximizing structure coverage in future? Our study indicates that approximately 20% of known families with three or more members currently have a representative structure. The study also indicates that the number of apparent protein families will be considerably larger than previously thought: we estimate that, by the criteria of this work, there will be about 250,000 protein families when 1000 microbial genomes have been sequenced. However, the vast majority of these families will be small, and it will be possible to obtain structural templates for 70–80% of protein domains with an achievable number of representative structures, by systematically sampling the larger families.
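The abstract does not say which clustering algorithm was used. The sketch below is a hypothetical stand-in that uses single-linkage clustering with union-find. It joins two sequences whenever a pairwise hit passes an E-value cutoff, then computes the "structure coverage" statistic the abstract reports, namely the fraction of families with three or more members that contain a solved structure. The hit list, the cutoff value, and all function names are illustrative assumptions.

```python
from collections import defaultdict


def cluster_families(sequence_ids, hits, evalue_cutoff=1e-3):
    """Group sequences into families by single linkage: any two sequences
    connected by a hit at or below evalue_cutoff share a family."""
    parent = {s: s for s in sequence_ids}

    def find(x):
        # Walk to the root, halving the path as we go to keep trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, evalue in hits:
        if evalue <= evalue_cutoff:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b  # merge the two families

    families = defaultdict(list)
    for s in sequence_ids:
        families[find(s)].append(s)
    return list(families.values())


def structure_coverage(families, solved, min_size=3):
    """Fraction of families with at least min_size members that contain
    at least one sequence whose structure is already known."""
    eligible = [f for f in families if len(f) >= min_size]
    if not eligible:
        return 0.0
    covered = sum(1 for f in eligible if any(s in solved for s in f))
    return covered / len(eligible)


if __name__ == "__main__":
    seqs = ["a", "b", "c", "d", "e", "f", "g"]
    hits = [("a", "b", 1e-10), ("b", "c", 1e-5), ("d", "e", 1e-20),
            ("e", "f", 1e-8), ("f", "g", 0.5)]  # f-g is too weak to link
    fams = cluster_families(seqs, hits)
    print(sorted(sorted(f) for f in fams))
    print(structure_coverage(fams, solved={"c"}))
```

Single linkage is chosen here only because it is the simplest way to let remote members join a family through intermediate sequences. It is also prone to chaining false positives, which a real procedure would need to control.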
3.
Structural similarity to link sequence space: new potential superfamilies and implications for structural genomics
Aloy P Oliva B Querol E Aviles FX Russell RB 《Protein science : a publication of the Protein Society》2002,11(5):1101-1116
The current pace of structural biology now means that protein three-dimensional structure can be known before protein function, making methods for assigning homology via structure comparison of growing importance. Previous research has suggested that sequence similarity after structure-based alignment is one of the best discriminators of homology and often functional similarity. Here, we exploit this observation, together with a merger of protein structure and sequence databases, to predict distant homologous relationships. We use the Structural Classification of Proteins (SCOP) database to link sequence alignments from the SMART and Pfam databases. We thus provide new alignments that could not be constructed easily in the absence of known three-dimensional structures. We then extend the method of Murzin (1993b) to assign statistical significance to sequence identities found after structural alignment and thus suggest the best link between diverse sequence families. We find that several distantly related protein sequence families can be linked with confidence, showing the approach to be a means for inferring homologous relationships and thus possible functions when proteins are of known structure but of unknown function. The analysis also finds several new potential superfamilies, where inspection of the associated alignments and superimpositions reveals conservation of unusual structural features or co-location of conserved amino acids and bound substrates. We discuss implications for Structural Genomics initiatives and for improvements to sequence comparison methods.
4.
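In the post-genomic era, with complete genome sequences now available for a large number of species, structural biologists face new opportunities and challenges in structural genomics. Unlike traditional structural biology, structural genomics focuses mainly on proteins of unknown structure and function that have little similarity to previously studied proteins. More precisely, structural genomics aims to characterize the structures of all protein families through high-throughput protein expression and structure determination, so that function can be predicted from structure. The Joint Center for Structural Genomics in California has developed highly automated pipelines for protein production, crystallization, and structure determination. However, because some proteins cannot be crystallized, covering all protein domains remains very difficult. Wüthrich's group addressed this problem with high-throughput screening of target proteins and NMR structure determination. NMR is complementary to X-ray crystallography because it determines solution structures that are closer to physiological conditions. By providing information on protein stability, dynamics, and interactions in solution, as demonstrated in studies of the prion protein and SARS-related proteins, NMR plays an exciting role in areas ranging from expanding the database of known protein structures and discovering new protein functions to chemical biology research.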
5.
Lundstrom K 《Molecular biotechnology》2006,34(2):205-212
Structural genomics can be defined as structural biology on a large number of target proteins in parallel. This approach plays an important role in modern structure-based drug design. Although a number of structural genomics initiatives have been launched, relatively few are associated with integral membrane proteins. This reflects the difficulties in expressing, purifying, and crystallizing membrane proteins, which is also evident from the fact that only about 100 high-resolution membrane protein structures exist among the more than 30,000 entries in public databases. Paradoxically, membrane proteins represent 60–70% of current drug targets, and structural knowledge could both improve and speed up the drug discovery process. To improve the success rate of membrane protein structure determination, structural genomics networks have been established.
6.
Almo SC Bonanno JB Sauder JM Emtage S Dilorenzo TP Malashkevich V Wasserman SR Swaminathan S Eswaramoorthy S Agarwal R Kumaran D Madegowda M Ragumani S Patskovsky Y Alvarado J Ramagopal UA Faber-Barata J Chance MR Sali A Fiser A Zhang ZY Lawrence DS Burley SK 《Journal of structural and functional genomics》2007,8(2-3):121-140
7.
8.
As the number of complete genomes that have been sequenced keeps growing, unknown areas of the protein space are revealed and new horizons open up. Most of this information will be fully appreciated only when the structural information about the encoded proteins becomes available. The goal of structural genomics is to direct large-scale efforts of protein structure determination, so as to increase the impact of these efforts. This review focuses on current approaches in structural genomics aimed at selecting representative proteins as targets for structure determination. We will discuss the concept of representative structures/folds, the current methodologies for identifying those proteins, and computational techniques for identifying proteins which are expected to adopt new structural folds.
9.
We have developed and tested a simple and efficient protein purification method for biophysical screening of proteins and protein fragments by nuclear magnetic resonance (NMR) and optical methods, such as circular dichroism spectroscopy. The method constitutes an extension of previously described protocols for gene expression and protein solubility screening [M. Hammarström et al., (2002), Protein Science 11, 313]. Using the present purification scheme it is possible to take several target proteins, produced as fusion proteins, from cell pellet to NMR spectrum and obtain a judgment on the suitability for further structural or biophysical studies in less than 1 day. The method is independent of individual protein properties as long as the target protein can be produced in soluble form with a fusion partner. Identical procedures for cell culturing, lysis, affinity chromatography, protease cleavage, and NMR sample preparation then initially require only optimization for different fusion partner and protease combinations. The purification method can be automated, scaled up or down, and extended to a traditional purification scheme. We have tested the method on several small human proteins produced in Escherichia coli and find that the method allows for detection of structured proteins and unfolded or molten globule-like proteins.
10.
Baker EN 《Journal of structural and functional genomics》2007,8(2-3):57-65
Tuberculosis (TB) is a devastating disease of worldwide importance. The availability of the genome sequence of Mycobacterium tuberculosis (Mtb), the causative agent, has stimulated a large variety of genome-scale initiatives. These include international structural genomics efforts, which have the dual aim of characterising potential new drug targets and addressing key aspects of the biology of Mtb. This review highlights the various ways in which structural analysis has illuminated the biological activities of Mtb gene products whose functions were previously unknown or uncertain. Key information comes from the protein fold; from bound ligands, solvent molecules, ions, etc.; or from unexpectedly modified amino acid residues. Most importantly, the three-dimensional structure of a protein permits the integration of data from many sources, both bioinformatic and experimental, to develop testable functional hypotheses. This has led to many new insights into TB biology.
11.
Phillips GN Fox BG Markley JL Volkman BF Bae E Bitto E Bingman CA Frederick RO McCoy JG Lytle BL Pierce BS Song J Twigger SN 《Journal of structural and functional genomics》2007,8(2-3):73-84
The Center for Eukaryotic Structural Genomics (CESG) produces and solves the structures of proteins from eukaryotes. We have developed and operate a pipeline both to solve structures and to test new methodologies. Both NMR and X-ray crystallography are used for structure determination. CESG chooses targets based on sequence dissimilarity to known structures, medical relevance, and nominations from members of the scientific community. Proteins often qualify in more than one of these categories. Here we review some of the structures that have connections to human health and disease.
12.
Target selection and ranking is fundamental to structural genomics. We present a Z-score scale, the "OB-Score", to rank potential targets by their predicted propensity to produce diffraction-quality crystals. The OB-Score is derived from a matrix of predicted isoelectric point and hydrophobicity values for nonredundant PDB entries solved to […] ≥1 member with a high OB-Score, presenting favourable candidates for structural studies.
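The abstract states only that the OB-Score is a Z-score derived from a matrix of predicted isoelectric point and hydrophobicity values. The exact derivation is not given here, so the following is a hypothetical illustration of the general idea rather than the published OB-Score. Reference entries are binned on a (pI, hydrophobicity) grid. Each candidate is scored by the count in its cell, and the scores are converted to Z-scores across the candidate set for ranking. The bin widths, the reference data, and the function names are all assumptions.

```python
import math
import statistics
from collections import Counter


def cell(pi, hydrophobicity, pi_step=1.0, hydro_step=0.5):
    """Map a (predicted pI, hydrophobicity) pair to a grid cell."""
    return (math.floor(pi / pi_step), math.floor(hydrophobicity / hydro_step))


def build_matrix(reference_entries):
    """Count reference entries per grid cell (hypothetical training set)."""
    return Counter(cell(pi, h) for pi, h in reference_entries)


def rank_targets(targets, matrix):
    """Score each target by its cell count (Counter returns 0 for empty
    cells), convert the scores to Z-scores across the candidate set, and
    sort the targets best-first."""
    names = list(targets)
    raw = [matrix[cell(*targets[n])] for n in names]
    mean = statistics.fmean(raw)
    sd = statistics.pstdev(raw)
    z = [0.0 if sd == 0 else (r - mean) / sd for r in raw]
    return sorted(zip(names, z), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    matrix = build_matrix([(5.2, -0.3), (5.8, -0.1), (6.1, 0.2), (9.5, 0.9)])
    targets = {"T1": (5.5, -0.4), "T2": (6.5, 0.1), "T3": (11.0, 1.5)}
    for name, z in rank_targets(targets, matrix):
        print(f"{name}: {z:+.2f}")
```

A Z-score puts every candidate on the same centred scale, so a cutoff such as "high score" can be stated independently of how many reference entries went into the matrix.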
13.
14.
This 'Perspective' bears on the present state of protein structure determination by NMR in solution. The focus is on a comparison of the infrastructure available for NMR structure determination with that available for protein crystal structure determination by X-ray diffraction. The main conclusion is that the unique potential of NMR to generate high-resolution data also on dynamics, interactions and conformational equilibria has contributed to a lack of standard procedures for structure determination which would be readily amenable to improved efficiency by automation. To spark renewed discussion on the topic of NMR structure determination of proteins, procedural steps with high potential for improvement are identified.
15.
Gaetano T. Montelione Cheryl Arrowsmith Mark E. Girvin Michael A. Kennedy John L. Markley Robert Powers James H. Prestegard Thomas Szyperski 《Journal of structural and functional genomics》2009,10(2):101-106
This Perspective, arising from a workshop held in July 2008 in Buffalo, NY, provides an overview of the role NMR has played in the United States Protein Structure Initiative (PSI) and a vision of how NMR will contribute to the forthcoming PSI-Biology program. NMR has contributed in key ways to structure production by the PSI, and new methods have been developed that are impacting the broader protein NMR community.
16.
Challenges at the frontiers of structural biology (total citations: 2; self-citations: 0; citations by others: 2)
Knowledge of the three-dimensional structures of proteins is the key to unlocking the full potential of genomic information. There are two distinct directions along which cutting-edge research in structural biology is currently moving towards this goal. On the one hand, tightly focused long-term research in individual laboratories is leading to the determination of the structures of macromolecular assemblies of ever-increasing size and complexity. On the other hand, large consortia of structural biologists, inspired by the pace of genome sequencing, are developing strategies to determine new protein structures rapidly, so that it will soon be possible to predict reasonably accurate structures for most protein domains. We anticipate that a small number of complex systems, studied in depth, will provide insights across the field of biology with the aid of genome-based comparative structural analysis.
17.
Ursula Pieper Ranyee Chiang Jennifer J. Seffernick Shoshana D. Brown Margaret E. Glasner Libusha Kelly Narayanan Eswar J. Michael Sauder Jeffrey B. Bonanno Subramanyam Swaminathan Stephen K. Burley Xiaojing Zheng Mark R. Chance Steven C. Almo John A. Gerlt Frank M. Raushel Matthew P. Jacobson Patricia C. Babbitt Andrej Sali 《Journal of structural and functional genomics》2009,10(2):107-125
To study the substrate specificity of enzymes, we use the amidohydrolase and enolase superfamilies as model systems; members of these superfamilies share a common TIM barrel fold and catalyze a wide range of chemical reactions. Here, we describe a collaboration between the Enzyme Specificity Consortium (ENSPEC) and the New York SGX Research Center for Structural Genomics (NYSGXRC) that aims to maximize the structural coverage of the amidohydrolase and enolase superfamilies. Using sequence- and structure-based protein comparisons, we first selected 535 target proteins from a variety of genomes for high-throughput structure determination by X-ray crystallography; 63 of these targets were not previously annotated as superfamily members. To date, 20 unique amidohydrolase and 41 unique enolase structures have been determined, increasing the fraction of sequences in the two superfamilies that can be modeled based on at least 30% sequence identity from 45% to 73%. We present case studies of proteins related to uronate isomerase (an amidohydrolase superfamily member) and mandelate racemase (an enolase superfamily member), to illustrate how this structure-focused approach can be used to generate hypotheses about sequence–structure–function relationships.
Corresponding author: Andrej Sali (http://salilab.org)
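The coverage figure in the abstract above (from 45% to 73%) is the fraction of superfamily sequences that can be modeled on a solved template sharing at least 30% sequence identity. The sketch below shows only that bookkeeping, under stated assumptions. Per-sequence identities to candidate templates are taken as already computed, since the alignment step is not shown. The toy data and the function name `modelable_fraction` are hypothetical, and this is not the consortium's pipeline.

```python
def modelable_fraction(template_identity, solved, threshold=0.30):
    """template_identity maps each sequence to {structure_id: identity},
    where identity is a fraction in [0, 1]. A sequence counts as modelable
    if any structure in `solved` reaches the identity threshold."""
    if not template_identity:
        return 0.0
    modelable = sum(
        1
        for hits in template_identity.values()
        if any(identity >= threshold
               for structure_id, identity in hits.items()
               if structure_id in solved)
    )
    return modelable / len(template_identity)


if __name__ == "__main__":
    identities = {
        "s1": {"A": 0.55},
        "s2": {"A": 0.25, "B": 0.40},
        "s3": {"B": 0.31},
        "s4": {"C": 0.12},
    }
    before = modelable_fraction(identities, solved={"A"})
    after = modelable_fraction(identities, solved={"A", "B"})
    print(f"coverage: {before:.0%} -> {after:.0%}")
```

In this toy example, solving one new structure ("B") makes two further sequences modelable. This is the effect that target selection for coverage tries to maximize.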
18.
Lundstrom K Wagner R Reinhart C Desmyter A Cherouati N Magnin T Zeder-Lutz G Courtot M Prual C André N Hassaine G Michel H Cambillau C Pattus F 《Journal of structural and functional genomics》2006,7(2):77-91
Production of recombinant receptors has been one of the major bottlenecks in structural biology of G protein-coupled receptors (GPCRs). The MePNet (Membrane Protein Network) was established to overexpress a large number of GPCRs in three major expression systems, based on Escherichia coli, Pichia pastoris and Semliki Forest virus (SFV) vectors. Evaluation by immunodetection demonstrated that, of a total of 103 GPCRs, 50% were expressed in bacterial inclusion bodies, 94% in yeast cell membranes and 95% in SFV-infected mammalian cells. Expression levels varied from low to high, and the various GPCR families and subtypes were analyzed for their expressability in each expression system. More than 60% of the GPCRs were expressed at milligram levels or higher in one or more systems, which is compatible with structural biology applications. Functional activity was determined by binding assays in yeast and mammalian cells, and the correlation between immunodetection and binding activity was analyzed.
19.
The advent of the complete genome sequences of various organisms in the mid-1990s raised the issue of how one could determine the function of hypothetical proteins. While insight might be obtained from a 3D structure, the chances of being able to predict such a structure are limited for the deduced amino acid sequence of any uncharacterized gene. A template for modeling is required, but there was only a low probability of finding a closely related protein with an available structure. Thus, in the late 1990s, an international effort known as structural genomics (SG) was initiated, its primary goal being to “fill sequence-structure space” by determining the 3D structures of representatives of all known protein families. This was to be achieved mainly by X-ray crystallography, and it was estimated that at least 5,000 new structures would be required. While the proteins (genes) for SG have subsequently been derived from hundreds of different organisms, extremophiles, and particularly thermophiles, have been specifically targeted because their proteins are more stable and easier to handle than those from mesophiles. This review summarizes the significant impact that extremophiles and proteins derived from them have had on SG projects worldwide. The extent to which SG has influenced the field of extremophile research is also discussed.
20.
The explosion in gene sequence data and technological breakthroughs in protein structure determination inspired the launch of structural genomics (SG) initiatives. An often-stated goal of structural genomics is the high-throughput structural characterisation of all protein sequence families, with the long-term hope of significantly impacting on the life sciences, biotechnology and drug discovery. Here, we present a comprehensive analysis of solved SG targets to assess progress of these initiatives. Eleven consortia have contributed 316 non-redundant entries and 323 protein chains to the Protein Data Bank (PDB), and 459 and 393 domains to the CATH and SCOP structure classifications, respectively. The quality and size of these proteins are comparable to those solved in traditional structural biology and, despite huge scope for duplicated efforts, only 14% of targets have a close homologue (≥30% sequence identity) solved by another consortium. Analysis of CATH and SCOP revealed the significant contribution that structural genomics is making to the coverage of superfamilies and folds. A total of 67% of SG domains in CATH are unique, lacking an already characterised close homologue in the PDB, whereas only 21% of non-SG domains are unique. For 29% of domains, structure determination revealed a remote evolutionary relationship not apparent from sequence, and 19% and 11% contributed new superfamilies and new folds, respectively. The secondary structure class, fold and superfamily distributions of this dataset reflect those of the genomes. The domains fall into 172 different folds and 259 superfamilies in CATH, but the distribution is highly skewed. The most populous of these are those that recur most frequently in the genomes. Whilst 11% of superfamilies are bacteria-specific, most are common to all three superkingdoms of life, and together the 316 PDB entries have provided new and reliable homology models for 9287 non-redundant gene sequences in 206 completely sequenced genomes. 
From the perspective of this analysis, it appears that structural genomics is on track to be a success, and it is hoped that this work will inform future directions of the field.