首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
A number of structural genomics/proteomics initiatives are focused on bacterial or viral pathogens. In this article, we will review the progress of structural proteomics initiatives targeting the SARS coronavirus (SARS-CoV), the etiological agent of the 2003 worldwide epidemic that culminated in approximately 8,000 cases and 800 deaths. The SARS-CoV genome encodes 28 proteins in three distinct classes, many of them with unknown function and sharing low similarity to other proteins. The structures of 16 SARS-CoV proteins or functional domains have been determined to date. Remarkably, eight of these 16 proteins or functional domains have novel folds, indicating the uniqueness of the coronavirus proteins. The results of SARS-CoV structural proteomics initiatives will have several profound biological impacts, including elucidation of the structure-function relationships of coronavirus proteins; identification of targets for the design of anti-viral compounds against SARS-CoV and other coronaviruses; and addition of new protein folds to the fold space, with further understanding of the structure-function relationships for several new protein families. We discuss the use of structural proteomics in response to emerging infectious diseases such as SARS-CoV and to increase preparedness against future emerging coronaviruses.  相似文献   

2.
The explosion in gene sequence data and technological breakthroughs in protein structure determination inspired the launch of structural genomics (SG) initiatives. An often stated goal of structural genomics is the high-throughput structural characterisation of all protein sequence families, with the long-term hope of significantly impacting on the life sciences, biotechnology and drug discovery. Here, we present a comprehensive analysis of solved SG targets to assess progress of these initiatives. Eleven consortia have contributed 316 non-redundant entries and 323 protein chains to the Protein Data Bank (PDB), and 459 and 393 domains to the CATH and SCOP structure classifications, respectively. The quality and size of these proteins are comparable to those solved in traditional structural biology and, despite huge scope for duplicated efforts, only 14% of targets have a close homologue (>/=30% sequence identity) solved by another consortium. Analysis of CATH and SCOP revealed the significant contribution that structural genomics is making to the coverage of superfamilies and folds. A total of 67% of SG domains in CATH are unique, lacking an already characterised close homologue in the PDB, whereas only 21% of non-SG domains are unique. For 29% of domains, structure determination revealed a remote evolutionary relationship not apparent from sequence, and 19% and 11% contributed new superfamilies and folds. The secondary structure class, fold and superfamily distributions of this dataset reflect those of the genomes. The domains fall into 172 different folds and 259 superfamilies in CATH but the distribution is highly skewed. The most populous of these are those that recur most frequently in the genomes. Whilst 11% of superfamilies are bacteria-specific, most are common to all three superkingdoms of life and together the 316 PDB entries have provided new and reliable homology models for 9287 non-redundant gene sequences in 206 completely sequenced genomes. From the perspective of this analysis, it appears that structural genomics is on track to be a success, and it is hoped that this work will inform future directions of the field.  相似文献   

3.

Background  

Structural genomics initiatives were established with the aim of solving protein structures on a large-scale. For many initiatives, such as the Protein Structure Initiative (PSI), the primary aim of target selection is focussed towards structurally characterising protein families which, so far, lack a structural representative. It is therefore of considerable interest to gain insights into the number and distribution of these families, and what efforts may be required to achieve a comprehensive structural coverage across all protein families.  相似文献   

4.
The structural genomics initiatives have begun with the aim to create a so-called "basic set library" of protein folds that will be used to improve protein prediction methods. Such a library is thought to require the determination of up to 10,000 new structures, including representative structures of several sequence variants from each protein fold. To meet this goal in a reasonable time frame and cost, automated systems must be utilized to clone and to identify the soluble recombinant proteins contained in multiple genomes. This paper presents such a system, developed using the genome of Caenorhabditis elegans (19,099 genes) as a model eukaryotic organism for structural genomics. This system successfully automates nearly all aspects of recombinant protein expression analysis including subcloning, bacterial growth, recombinant protein expression, protein purification, and scoring protein solubility.  相似文献   

5.
One hundred years ago, we knew very little about biological macromolecules and had no tools available to study their structure. Structural biology is now a mature science. New structures are being solved at an ever-increasing rate and there are important new initiatives to determine all the protein folds that are used by biological systems (structural genomics). This article traces some of the key developments in the field.  相似文献   

6.
Exploring the structure and function paradigm   总被引:3,自引:3,他引:0  
Advances in protein structure determination, led by the structural genomics initiatives have increased the proportion of novel folds deposited in the Protein Data Bank. However, these structures are often not accompanied by functional annotations with experimental confirmation. In this review, we reassess the meaning of structural novelty and examine its relevance to the complexity of the structure-function paradigm. Recent advances in the prediction of protein function from structure are discussed, as well as new sequence-based methods for partitioning large, diverse superfamilies into biologically meaningful clusters. Obtaining structural data for these functionally coherent groups of proteins will allow us to better understand the relationship between structure and function.  相似文献   

7.
From protein structure to function.   总被引:6,自引:0,他引:6  
Several databases of protein structural families now exist-organised according to both evolutionary relationships and common folding arrangements. Although these lag behind sequence databases in size, the prospect of structural genomics initiatives means that they may soon include representatives of many of the sequence families. To some extent, functional information can be derived from structural similarity. For some structural families, their function is highly conserved, whereas, for others, it can only be inherited or derived on the basis of additional information (e.g. sequence patterns, common residue clusters and characteristic surface properties).  相似文献   

8.
Assigning function to structures is an important aspect of structural genomics projects, since they frequently provide structures for uncharacterized proteins. Similarities uncovered by structure alignment can suggest a similar function, even in the absence of sequence similarity. For proteins adopting novel folds or those with many functions, this strategy can fail, but functional clues can still come from comparison of local functional sites involving a few key residues. Here we assess the general applicability of functional site comparison through the study of 157 proteins solved by structural genomics initiatives. For 17, the method bolsters confidence in predictions made based on overall fold similarity. For another 12 with new folds, it suggests functions, including a putative phosphotyrosine binding site in the Archaeal protein Mth1187 and an active site for a ribose isomerase. The approach is applied weekly to all new structures, providing a resource for those interested in using structure to infer function.  相似文献   

9.
Garma L  Mukherjee S  Mitra P  Zhang Y 《PloS one》2012,7(6):e38913
"Protein quaternary structure universe" refers to the ensemble of all protein-protein complexes across all organisms in nature. The number of quaternary folds thus corresponds to the number of ways proteins physically interact with other proteins. This study focuses on answering two basic questions: Whether the number of protein-protein interactions is limited and, if yes, how many different quaternary folds exist in nature. By all-to-all sequence and structure comparisons, we grouped the protein complexes in the protein data bank (PDB) into 3,629 families and 1,761 folds. A statistical model was introduced to obtain the quantitative relation between the numbers of quaternary families and quaternary folds in nature. The total number of possible protein-protein interactions was estimated around 4,000, which indicates that the current protein repository contains only 42% of quaternary folds in nature and a full coverage needs approximately a quarter century of experimental effort. The results have important implications to the protein complex structural modeling and the structure genomics of protein-protein interactions.  相似文献   

10.
Structural genomics initiatives aim to create a library of all existing protein folds. We take a look at the progress that has been made and what more needs to be done.  相似文献   

11.
Practical lessons from protein structure prediction   总被引:9,自引:0,他引:9       下载免费PDF全文
Despite recent efforts to develop automated protein structure determination protocols, structural genomics projects are slow in generating fold assignments for complete proteomes, and spatial structures remain unknown for many protein families. Alternative cheap and fast methods to assign folds using prediction algorithms continue to provide valuable structural information for many proteins. The development of high-quality prediction methods has been boosted in the last years by objective community-wide assessment experiments. This paper gives an overview of the currently available practical approaches to protein structure prediction capable of generating accurate fold assignment. Recent advances in assessment of the prediction quality are also discussed.  相似文献   

12.
13.
Protein translations of over 100 complete genomes are now available. About half of these sequences can be provided with structural annotation, thereby enabling some profound insights into protein and pathway evolution. Whereas the major domain structure families are common to all kingdoms of life, these are combined in different ways in multidomain proteins to give various domain architectures that are specific to kingdoms or individual genomes, and contribute to the diverse phenotypes observed. These data argue for more targets in structural genomics initiatives and particularly for the selection of different domain architectures to gain better insights into protein functions.  相似文献   

14.
As the number of complete genomes that have been sequenced keeps growing, unknown areas of the protein space are revealed and new horizons open up. Most of this information will be fully appreciated only when the structural information about the encoded proteins becomes available. The goal of structural genomics is to direct large-scale efforts of protein structure determination, so as to increase the impact of these efforts. This review focuses on current approaches in structural genomics aimed at selecting representative proteins as targets for structure determination. We will discuss the concept of representative structures/folds, the current methodologies for identifying those proteins, and computational techniques for identifying proteins which are expected to adopt new structural folds.  相似文献   

15.
The major challenge for post-genomic research is to functionally assign and validate a large number of novel target genes and their corresponding proteins. Functional genomics approaches have, therefore, gained considerable attention in the quest to convert this massive data set into useful information. One of the crucial components for the functional understanding of unassigned proteins is the analysis of their experimental or modeled 3D structures. Structural proteomics initiatives are generating protein structures at an unprecedented rate but our current knowledge of 3D-structural space is still limited. Estimates on the completeness of the 3D-structural coverage of proteins vary but it is generally accepted that only a minority of the structural proteome has a template structure from which reliable conclusions can be drawn. Thus, structural proteomics has set out to build a map of protein structures that will represent all protein folds included in the 'global proteome'.  相似文献   

16.
Following the complete genome sequencing of an increasing number of organisms, structural biology is engaging in a systematic approach of high-throughput structure determination called structural genomics to create a complete inventory of protein folds/structures that will help predict functions for all proteins. First results show that structural genomics will be highly effective in finding functional annotations for proteins of unknown function.  相似文献   

17.
Over the past five years, genomics has had a major impact on Mycobacterium tuberculosis research. With the publication of the sequences of two virulent strains (H37Rv and CDC1551) and three closely related sequences, M. tuberculosis is becoming a model system for proteomics and structural genomics initiatives. Together with the promise of structures of proteins with novel folds, high-resolution structures of drug targets are providing the basis for rational inhibitor design, with the goal of the development of novel anti-tuberculars. In addition, this work is aiding scientists in the quest for an effective vaccine against this persistent pathogen.  相似文献   

18.
Availability of the human genome data has enabled the exploration of a huge amount of biological information encoded in it. There are extensive ongoing experimental efforts to understand the biological functions of the gene products encoded in the human genome. However, computational analysis can aid immensely in the interpretation of biological function by associating known functional/structural domains to the human proteins. In this article we have discussed the implications of such associations. The association of structural domains to human proteins could help in prioritizing the targets for structure determination in the structural genomics initiatives. The protein kinase family is one of the most frequently occurring protein domain families in the human proteome while P-loop hydrolase, which comprises many GTPases and ATPases, is a highly represented superfamily. Using the superfamily relationships between families of unknown and known structures we could increase structural information content of the human genome by about 5%. We could also make new associations of domain families to 33 human proteins that are potentially linked to genetically inherited diseases.  相似文献   

19.
Recent progress in structure determination techniques has led to a significant growth in the number of known membrane protein structures, and the first structural genomics projects focusing on membrane proteins have been initiated, warranting an investigation of appropriate bioinformatics strategies for optimal structural target selection for these molecules. What determines a membrane protein fold? How many membrane structures need to be solved to provide sufficient structural coverage of the membrane protein sequence space? We present the CAMPS database (Computational Analysis of the Membrane Protein Space) containing almost 45,000 proteins with three or more predicted transmembrane helices (TMH) from 120 bacterial species. This large set of membrane proteins was subjected to single‐linkage clustering using only sequence alignments covering at least 40% of the TMH present in a given family. This process yielded 266 sequence clusters with at least 15 members, roughly corresponding to membrane structural folds, sufficiently structurally homogeneous in terms of the variation of TMH number between individual sequences. These clusters were further subdivided into functionally homogeneous subclusters according to the COG (Clusters of Orthologous Groups) system as well as more stringently defined families sharing at least 30% identity. The CAMPS sequence clusters are thus designed to reflect three main levels of interest for structural genomics: fold, function, and modeling distance. We present a library of Hidden Markov Models (HMM) derived from sequence alignments of TMH at these three levels of sequence similarity. Given that 24 out of 266 clusters corresponding to membrane folds already have associated known structures, we estimate that 242 additional new structures, one for each remaining cluster, would provide structural coverage at the fold level of roughly 70% of prokaryotic membrane proteins belonging to the currently most populated families. Proteins 2006. © 2006 Wiley‐Liss, Inc.  相似文献   

20.
Members of a superfamily of proteins could result from divergent evolution of homologues with insignificant similarity in the amino acid sequences. A superfamily relationship is detected commonly after the three-dimensional structures of the proteins are determined using X-ray analysis or NMR. The SUPFAM database described here relates two homologous protein families in a multiple sequence alignment database of either known or unknown structure. The present release (1.1), which is the first version of the SUPFAM database, has been derived by analysing Pfam, which is one of the commonly used databases of multiple sequence alignments of homologous proteins. The first step in establishing SUPFAM is to relate Pfam families with the families in PALI, which is an alignment database of homologous proteins of known structure that is derived largely from SCOP. The second step involves relating Pfam families which could not be associated reliably with a protein superfamily of known structure. The profile matching procedure, IMPALA, has been used in these steps. The first step resulted in identification of 1280 Pfam families (out of 2697, i.e. 47%) which are related, either by close homologous connection to a SCOP family or by distant relationship to a SCOP family, potentially forming new superfamily connections. Using the profiles of 1417 Pfam families with apparently no structural information, an all-against-all comparison involving a sequence-profile match using IMPALA resulted in clustering of 67 homologous protein families of Pfam into 28 potential new superfamilies. Expansion of groups of related proteins of yet unknown structural information, as proposed in SUPFAM, should help in identifying ‘priority proteins’ for structure determination in structural genomics initiatives to expand the coverage of structural information in the protein sequence space. For example, we could assign 858 distinct Pfam domains in 2203 of the gene products in the genome of Mycobacterium tubercolosis. Fifty-one of these Pfam families of unknown structure could be clustered into 17 potentially new superfamilies forming good targets for structural genomics. SUPFAM database can be accessed at http://pauling.mbu.iisc.ernet.in/~supfam.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号