Similar articles
Found 20 similar articles (search took 31 ms)
1.
Structural genomics projects require strategies for rapidly recognizing protein sequences appropriate for routine structure determination. For large proteins, this strategy includes the dissection of proteins into structural domains that form stable native structures. However, protein dissection remains an essentially empirical and often tedious process. Here, we describe a simple strategy for rapidly identifying structural domains and assessing their structures. This approach combines the computational prediction of sequence regions corresponding to putative domains with an experimental assessment of their structures and stabilities by NMR and biochemical methods. We tested this approach with nine putative domains predicted from a set of 108 Thermus thermophilus HB8 sequences using PASS, a domain prediction program we previously reported. To facilitate the experimental assessment of the domain structures, we developed a generic 6-hour His-tag-based purification protocol, which enables the sample quality evaluation of a putative structural domain in a single day. As a result, we observed that half of the predicted structural domains were indeed natively folded, as judged by their HSQC spectra. Furthermore, two of the natively folded domains were novel, without related sequences classified in the Pfam and SMART databases, which is a significant result with regard to the ability of structural genomics projects to uniformly cover the protein fold space.

2.
McGuffin LJ, Jones DT. Proteins 2002, 48(1):44-52
The ultimate goal of structural genomics is to obtain the structure of each protein coded by each gene within a genome to determine gene function. Because of cost and time limitations, it remains impractical to solve the structure for every gene product experimentally. Up to a point, reasonably accurate three‐dimensional structures can be deduced for proteins with homologous sequences by using comparative modeling. Beyond this, fold recognition or threading methods can be used for proteins showing little homology to any known fold, although this is relatively time‐consuming and limited by the library of template folds currently available. Therefore, it is appropriate to develop methods that can increase our knowledge base, expanding our fold libraries by earmarking potentially “novel” folds for experimental structure determination. How can we sift through proteomic data rapidly and yet reliably identify novel folds as targets for structural genomics? We have analyzed a number of simple methods that discriminate between “novel” and “known” folds. We propose that simple alignments of secondary structure elements using predicted secondary structure could potentially be a more selective method than both a simple fold recognition method (GenTHREADER) and standard sequence alignment at finding novel folds when sequences show no detectable homology to proteins with known structures. Proteins 2002;48:44–52. © 2002 Wiley‐Liss, Inc.

3.
Mycobacterium tuberculosis, which belongs to the genus Mycobacterium, is the causative agent of most tuberculosis (TB) cases. As TB remains one of the most rampant infectious diseases, causing morbidity and death with the emergence of multi-drug-resistant and extensively drug-resistant forms, it is urgent to identify new drugs with novel targets to ensure future therapeutic success. In this regard, the structural genomics of M. tuberculosis provides important information to identify potential targets, perform biochemical assays, determine crystal structures in complex with potential inhibitor(s), reveal the key sites/residues for biological activity, and thus validate drug targets and discover novel drugs. In this review, we discuss recent progress on novel targets for structure-based anti-M. tuberculosis drug discovery.

4.
The rapid increase in genomic sequences provides new opportunities for comparative genomics. In this report, we describe a novel family of repeat sequences that is present in Bacteria and Archaea but not in Eukarya. The repeat loci typically consist of repetitive stretches of nucleotides with a length of 25 to 37 bp, alternating with nonrepetitive DNA spacers of approximately the same size as the repeats. The nucleotide sequences and the size of the repeats are highly conserved within a species, but between species the sequences show no similarity. Due to their characteristic structure, we have designated this family of repeat loci as SPacers Interspersed Direct Repeats (SPIDR). The SPIDR loci were identified in more than forty different prokaryotic species. Individual species such as Mycobacterium tuberculosis contain one SPIDR locus, while other species such as Methanococcus jannaschii contain up to 20 different loci. The number of repeats in a locus varies greatly, from two repeats to several dozen. The SPIDR loci are flanked by a common 300-500-bp leader sequence, which appears to be conserved within a species but not between species. The SPIDR locus of M. tuberculosis is extensively used for strain typing. The finding of SPIDR loci in other prokaryotes, including the pathogens Salmonella, Campylobacter, and Pasteurella, may extend this surveillance to other species.

5.
Evolutionary genomics of pathogenic bacteria (cited 15 times: 0 self-citations, 15 citations by others)
Complete genome sequences are now available for multiple strains of several bacterial pathogens, and comparative analysis of these sequences is providing important insights into the evolution of bacterial virulence. Recently, DNA microarray analysis of many strains of several pathogenic species has contributed to our understanding of bacterial diversity, evolution and pathogenesis. Comparative genomics has shown that pathogens such as Escherichia coli, Helicobacter pylori and Staphylococcus aureus contain extensive variation in gene content, whereas Mycobacterium tuberculosis nucleotide divergence is very limited. Overall, these approaches are proving to be a powerful means of exploring bacterial diversity, and are providing an important framework for the analysis of the evolution of pathogenesis and the development of novel antimicrobial agents.

6.
The Protein Structure Initiative (PSI) at the US National Institutes of Health (NIH) is funding four large-scale centers for structural genomics (SG). These centers systematically target many large families without structural coverage, as well as very large families with inadequate structural coverage. Here, we report a few simple metrics that demonstrate how successfully these efforts optimize structural coverage: while the PSI-2 (2005-now) contributed more than 8% of all structures deposited into the PDB, it contributed over 20% of all novel structures (i.e. structures for protein sequences with no structural representative in the PDB on the date of deposition). The structural coverage of the protein universe represented by today’s UniProt (v12.8) has increased linearly from 1992 to 2008; structural genomics has contributed significantly to the maintenance of this growth rate. Success in increasing novel leverage (defined in Liu et al. in Nat Biotechnol 25:849–851, 2007) has resulted from systematic targeting of large families. PSI’s per-structure contribution to novel leverage was over 4-fold higher than that for non-PSI structural biology efforts during the past 8 years. If the success of the PSI continues, it may take just another ~15 years to cover most sequences in the current UniProt database.

7.

Background  

Genome sequencing and post-genomics projects such as structural genomics are extending the frontier of the study of sequence-structure-function relationship of genes and their products. Although many sequence/structure-based methods have been devised with the aim of deciphering this delicate relationship, there still remain large gaps in this fundamental problem, which continuously drives researchers to develop novel methods to extract relevant information from sequences and structures and to infer the functions of newly identified genes by genomics technology.

8.
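In the post-genomic era, as whole-genome sequences have become available for a large number of species, structural biologists face the new opportunities and challenges of structural genomics. Unlike traditional structural biology, structural genomics focuses mainly on proteins whose structure and function are unknown and which share little similarity with previously studied proteins. More precisely, structural genomics aims to structurally characterize all protein families through high-throughput protein expression and structure determination, so that function can be predicted from structure. The Joint Center for Structural Genomics in California has developed a highly automated pipeline for protein production, crystallization, and structure determination. However, because some proteins cannot be crystallized, covering all protein structural domains remains very difficult. Wuthrich's research group has addressed this problem with high-throughput methods for target protein screening and NMR structure determination. Compared with X-ray crystallography, NMR is complementary because it can determine solution structures that are closer to the physiological state. By providing information on protein stability, dynamics, and interactions in solution, as demonstrated in studies of prion proteins and SARS-related proteins, NMR plays an exciting role in areas ranging from expanding the database of known protein structures and discovering new protein functions to chemical biology research.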

9.
The field of computational biology has been revolutionized by recent advances in genomics. The completion of a number of genome projects, including that of the human genome, has paved the way toward a variety of challenges and opportunities in bioinformatics and biological systems engineering. One of the first challenges has been the determination of the structures of proteins encoded by the individual genes. This problem, which represents the progression from sequence to structure (genomics to structural genomics), has been widely known as the structure-prediction-in-protein-folding problem. We present the development and application of ASTRO-FOLD, a novel and complete approach for the ab initio prediction of protein structures given only the amino acid sequences of the proteins. The approach exhibits many novel components and the merits of its application are examined for a suite of protein systems, including a number of targets from several critical-assessment-of-structure-prediction experiments.

10.
11.

Background  

Data mining in large DNA sequences is a major challenge in microbial genomics and bioinformatics. Oligonucleotide usage (OU) patterns provide a wealth of information for large scale sequence analysis and visualization. The purpose of this research was to make OU statistical analysis available as a novel web-based tool for functional genomics and annotation. The tool is also available as a downloadable package.
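
As a rough illustration of what an oligonucleotide usage pattern is, the sketch below counts overlapping words of length k in a DNA sequence and converts the counts to relative frequencies. It is not the tool described in this abstract: the function name `oligo_usage` and the choice to skip words containing ambiguous bases are assumptions made for this example, and the statistics the actual tool computes are not stated in the abstract.

```python
from collections import Counter


def oligo_usage(sequence: str, k: int) -> dict:
    """Return relative frequencies of overlapping k-mers in a DNA sequence.

    Hypothetical helper for illustration only; words containing
    characters other than A, C, G, T are skipped (an assumption).
    """
    seq = sequence.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    )
    total = sum(counts.values())
    # An empty dict is returned when the sequence is shorter than k.
    return {word: n / total for word, n in counts.items()} if total else {}


# Toy input: the dinucleotides of "ACGTACGT" are AC, CG, GT, TA, AC, CG, GT.
usage = oligo_usage("ACGTACGT", 2)
print(usage)
```

Comparing such frequency tables between genomic regions, or against the genome-wide average, is one simple way an OU pattern could be turned into a statistic.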

12.
Tsoka S, Ouzounis CA. FEBS Letters 2000, 480(1):42-48
Computational genomics is a subfield of computational biology that deals with the analysis of entire genome sequences. Transcending the boundaries of classical sequence analysis, computational genomics exploits the inherent properties of entire genomes by modelling them as systems. We review recent developments in the field, discuss in some detail a number of novel approaches that take into account the genomic context, and argue that progress will be made by novel knowledge representation and simulation technologies.

13.
Worldwide structural genomics projects are increasing structure coverage of sequence space but have not significantly expanded the protein structure space itself (i.e., the number of unique structural folds) since 2007. Discovering new structural folds experimentally by directed evolution and random recombination of secondary-structure blocks has also proved rarely successful. Meanwhile, previous computational efforts for large-scale mapping of protein structure space have been limited to simple model proteins and have led to an inconclusive answer on the completeness of the existing observed protein structure space. Here, we build novel protein structures by extending naturally occurring circular (single-loop) permutation to multiple loop permutations (MLPs). These structures are clustered by a structural similarity measure called TM-score. The computational technique allows us to produce different structural clusters on the same naturally occurring, packed, stable core but with alternatively connected secondary-structure segments. A large-scale MLP of 2936 domains from structural classification of protein domains reproduces those existing structural clusters (63%) mostly as hubs for many nonredundant sequences and illustrates newly discovered novel clusters as islands adopted by a few sequences only. Results further show that there exist a significant number of novel, potentially stable clusters for medium-size or large-size single-domain proteins, in particular those of >100 amino acid residues, that are either not yet adopted by nature or adopted only by a few sequences. This study suggests that MLP provides a simple yet highly effective tool for engineering and design of novel protein structures (including naturally knotted proteins). The implication of recovering new-fold targets from critical assessment of structure prediction techniques (CASP) by MLP on template-based structure prediction is also discussed.
Our MLP structures are available for download at the publication page of the Web site http://sparks.informatics.iupui.edu.

14.
Mining bacterial genomes for antimicrobial targets (cited 2 times: 0 self-citations, 2 citations by others)
The elucidation of whole-genome sequences is expected to have a revolutionary impact on the discovery of novel medicines. With the availability of complete genome sequences of more than 30 different species, the field of antimicrobial drug discovery has the opportunity to access a remarkable diversity of genomic information. In this review, I summarize how microbial genomics has changed strategies of drug discovery by applying bioinformatics, novel genetic approaches and genomics-based technologies, including analysis of gene expression using DNA microarrays.

15.
Mirkovic N, Li Z, Parnassa A, Murray D. Proteins 2007, 66(4):766-777
The technological breakthroughs in structural genomics were designed to facilitate the solution of a sufficient number of structures, so that as many protein sequences as possible can be structurally characterized with the aid of comparative modeling. The leverage of a solved structure is the number and quality of the models that can be produced using the structure as a template for modeling and may be viewed as the "currency" with which the success of a structural genomics endeavor can be measured. Moreover, the models obtained in this way should be valuable to all biologists. To this end, at the Northeast Structural Genomics Consortium (NESG), a modular computational pipeline for automated high-throughput leverage analysis was devised and used to assess the leverage of the 186 unique NESG structures solved during the first phase of the Protein Structure Initiative (January 2000 to July 2005). Here, the results of this analysis are presented. The number of sequences in the nonredundant protein sequence database covered by quality models produced by the pipeline is approximately 39,000, so that the average leverage is approximately 210 models per structure. Interestingly, only 7900 of these models fulfill the stringent modeling criterion of being at least 30% sequence-identical to the corresponding NESG structures. This study shows how high-throughput modeling increases the efficiency of structure determination efforts by providing enhanced coverage of protein structure space. In addition, the approach is useful in refining the boundaries of structural domains within larger protein sequences, subclassifying sequence diverse protein families, and defining structure-based strategies specific to a particular family.
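
The leverage figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the three numbers given above (186 structures, about 39,000 modeled sequences, 7900 models at 30% identity or more); the variable names are chosen for this example, and the share of high-identity models is a derived value that the abstract itself does not state.

```python
# Figures taken from the abstract above (the 39,000 is approximate there).
nesg_structures = 186
modeled_sequences = 39_000
high_identity_models = 7_900  # models >= 30% sequence-identical to the template

# Average leverage: quality models produced per solved structure.
average_leverage = modeled_sequences / nesg_structures

# Derived here, not stated in the abstract: share of models meeting
# the stringent 30% identity criterion.
high_identity_fraction = high_identity_models / modeled_sequences

print(f"average leverage: {average_leverage:.0f} models per structure")
print(f"high-identity share: {high_identity_fraction:.1%}")
```

Under these figures, roughly four in five of the quality models rely on templates below the 30% identity threshold.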

16.
The growing list of fully sequenced genomes, combined with innovations in the fields of structural biology and bioinformatics, provides a synergy for the discovery of new drug targets. With this background, the TB Structural Genomics Consortium has been formed. This international consortium comprises laboratories from 31 universities and institutes in 13 countries. The goal of the consortium is to determine the structures of over 400 potential drug targets from the genome of Mycobacterium tuberculosis and analyze their structures in the context of functional information. We summarize the efforts of the UCLA consortium members. Potential drug targets were selected using a variety of bioinformatics methods and screened for certain physical and species-specific properties to yield a starting group of protein targets for structure determination. Target determination methods include protein phylogenetic profiles and Rosetta Stone methods, and the use of related biochemical pathways to select genes linked to essential prokaryotic genes. Criteria imposed on target selection included potential protein solubility, protein or domain size, and targets that lack homologs in eukaryotic organisms. In addition, some protein targets were chosen that are specific to M. tuberculosis, such as PE and PPE domains. Thus far, the UCLA group has cloned 263 targets, expressed 171 proteins and purified 40 proteins, which are currently in crystallization trials. Our efforts have yielded 13 crystals and eight structures. Seven structures are summarized here. Four of the structures are secreted proteins: antigen 85B; MPT 63, which is one of the three major secreted proteins of M. tuberculosis; a thioredoxin derivative Rv2878c; and potentially secreted glutamate synthetase. We also report the structures of three proteins that are potentially essential to the survival of M. tuberculosis: a protein involved in the folate biosynthetic pathway (Rv3607c); a protein involved in the biosynthesis of vitamin B5 (Rv3602c); and a pyrophosphatase, Rv2697c. Our approach to the M. tuberculosis structural genomics project will yield information for drug design and vaccine production against tuberculosis. In addition, this study will provide further insights into the mechanisms of mycobacterial pathogenesis.

17.
De novo design provides an in silico toolkit for the design of novel small molecular structures to a set of specified structural constraints. With the avalanche of bioinformatics data, de novo design is ideally suited for exploring molecules that could be useful for chemical genomics. The design process involves manipulation of the input, modification of structural constraints, and further processing of the de novo generated molecules using various modular toolkits. The development of a theoretical framework for each of these stages will provide novel practical solutions to the problem of creating compounds with maximal chemical diversity. This short review describes the fundamental problems encountered in the application of novel chemical design technologies to chemical genomics by means of a formal representation. This notation helps to outline and clarify ideas and hypotheses that can then be explored using mathematical algorithms. It is only by developing this rigorous foundation that in silico design can progress in a rational way.

18.
Single and multiple resistance to antibacterial drugs currently in use is spreading, since they act against only a very small number of molecular targets; finding novel targets for anti-infectives is therefore of great importance. All protein sequences from three pathogens (Staphylococcus aureus, Mycobacterium tuberculosis and Escherichia coli O157:H7 EDL993) were assessed via comparative genomics methods for their suitability as antibacterial targets according to a number of criteria, including the essentiality of the protein, its level of sequence conservation, and its distribution in pathogens, bacteria and eukaryotes (especially humans). Each protein was scored and ranked based on weighted variants of these criteria in order to prioritize proteins as potential novel broad-spectrum targets for antibacterial drugs. A number of proteins proved to score highly in all three species and were robust to variations in the scoring system used. Sensitivity analysis indicated the quantitative contribution of each metric to the overall score. After further analysis of these targets, tRNA methyltransferase (trmD) and translation initiation factor IF-1 (infA) emerged as potential and novel antimicrobial targets very worthy of further investigation. The scoring strategy used might be of value in other areas of post-genomic drug discovery.
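
The weighted scoring and ranking described in this abstract can be pictured with a minimal sketch. Everything concrete below is an assumption made for illustration: the function names, the three metric names, the weights, and the placeholder proteins with their scores are hypothetical and are not taken from the study, which does not give its scoring formula in the abstract. A normalized weighted sum is used here simply as one plausible form.

```python
def weighted_score(metrics: dict, weights: dict) -> float:
    """Weighted average of per-criterion scores, each assumed to lie in [0, 1]."""
    total_weight = sum(weights.values())
    return sum(weights[name] * metrics[name] for name in weights) / total_weight


def rank_targets(candidates: dict, weights: dict) -> list:
    """Return (name, score) pairs sorted from highest to lowest score."""
    scored = [(name, weighted_score(m, weights)) for name, m in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical weights and placeholder proteins; illustrative values only.
weights = {"essentiality": 2.0, "conservation": 1.0, "absent_in_human": 1.0}
candidates = {
    "protein_A": {"essentiality": 1.0, "conservation": 0.8, "absent_in_human": 1.0},
    "protein_B": {"essentiality": 0.0, "conservation": 0.9, "absent_in_human": 1.0},
    "protein_C": {"essentiality": 1.0, "conservation": 0.9, "absent_in_human": 0.0},
}

ranking = rank_targets(candidates, weights)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Re-running the ranking with different weights and checking which candidates keep their positions is a simple analogue of the robustness check the abstract mentions.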

19.
The use of next‐generation sequencers and advanced genotyping technologies has propelled the field of plant genomics in model crops and plants and enhanced the discovery of hidden bridges between genotypes and phenotypes. The newly generated reference sequences of unstudied minor plants can be annotated using knowledge from model plants via translational genomics approaches. Here, we review the strategies of translational genomics and suggest perspectives on the current databases of genomic resources and the database structures of translated information on the new genome. As a draft picture of phenotypic annotation, translational genomics on newly sequenced plants will provide valuable assistance for breeders and researchers who are interested in genetic studies.

20.
The targets of the Structural GenomiX (SGX) bacterial genomics project were proteins conserved in multiple prokaryotic organisms with no obvious sequence homolog in the Protein Data Bank of known structures. The outcome of this work was 80 structures, covering 60 unique sequences and 49 different genes. Experimental phase determination from proteins incorporating Se-Met was carried out for 45 structures with most of the remainder solved by molecular replacement using members of the experimentally phased set as search models. An automated tool was developed to deposit these structures in the Protein Data Bank, along with the associated X-ray diffraction data (including refined experimental phases) and experimentally confirmed sequences. BLAST comparisons of the SGX structures with structures that had appeared in the Protein Data Bank over the intervening 3.5 years since the SGX target list had been compiled identified homologs for 49 of the 60 unique sequences represented by the SGX structures. This result indicates that, for bacterial structures that are relatively easy to express, purify, and crystallize, the structural coverage of gene space is proceeding rapidly. More distant sequence-structure relationships between the SGX and PDB structures were investigated using PDB-BLAST and Combinatorial Extension (CE). Only one structure, SufD, has a truly unique topology compared to all folds in the PDB.
