Similar articles
 20 similar articles found (search time: 15 ms)
1.
Fold assignments for proteins from the Escherichia coli genome are carried out using BASIC, a profile-profile alignment algorithm recently tested on fold recognition benchmarks and on the Mycoplasma genitalium genome, and PSI-BLAST, the newest generation of the de facto standard in homology search algorithms. The fold assignments are followed by automated modeling, and the resulting three-dimensional models are analyzed for possible function prediction. Close to 30% of the proteins encoded in the E. coli genome can be recognized as homologous to a protein family with known structure. Most of these homologies (23% of the entire genome) can be recognized by both the PSI-BLAST and BASIC algorithms, but the latter recognizes an additional 260 homologies. Previous estimates suggested that only 10-15% of E. coli proteins can be characterized this way. This dramatic increase in the number of recognized homologies between E. coli proteins and structurally characterized protein families is partly due to the rapid increase of the database of known protein structures, but mostly it is due to the significant improvement in prediction algorithms. Knowing protein structure adds a new dimension to our understanding of its function, and the predictions presented here can be used to predict function for uncharacterized proteins. Several examples, analyzed in more detail in this paper, include the DPS protein protecting DNA from oxidative damage (predicted to be homologous to ferritin with iron ion acting as a reducing agent) and the ahpC/tsa family of proteins, which provides resistance to various oxidizing agents (predicted to be homologous to glutathione peroxidase).

2.
The rapid growth in protein structural data and the emergence of structural genomics projects have increased the need for automatic structure analysis and tools for function prediction. Small molecule recognition is critical to the function of many proteins; therefore, determination of ligand binding site similarity is important for understanding ligand interactions and may allow their functional classification. Here, we present a binding sites database (SitesBase) that, given a known protein-ligand binding site, allows rapid retrieval of other binding sites with similar structure independent of overall sequence or fold similarity. However, each match is also annotated with sequence similarity and fold information to aid interpretation of structure and functional similarity. Similarity in ligand binding sites can indicate common binding modes and recognition of similar molecules, allowing potential inference of function for an uncharacterised protein or providing additional evidence of common function where sequence or fold similarity is already known. Alternatively, the resource can provide valuable information for detailed studies of molecular recognition, including structure-based ligand design, and in understanding ligand cross-reactivity. Here, we show examples of atomic similarity between superfamily or more distant fold relatives as well as between seemingly unrelated proteins. Assignment of unclassified proteins to structural superfamilies is also undertaken and in most cases substantiates assignments made using sequence similarity. Correct assignment is also possible where sequence similarity fails to find significant matches, illustrating the potential use of binding site comparisons for newly determined proteins.

3.
Colonization of the gastric mucosa with the spiral-shaped Gram-negative proteobacterium Helicobacter pylori is probably the most common chronic infection in humans. The genomes of H. pylori strains J99 and 26695 have been completely sequenced. Functional and three-dimensional structural information is available for less than one third of all open reading frames. We investigated the function and three-dimensional structure of a member from a family of cysteine-rich hypothetical proteins that are unique to H. pylori and Campylobacter jejuni. The structure of H. pylori cysteine-rich protein (Hcp) B possesses a modular architecture consisting of four alpha/alpha-motifs that are cross-linked by disulfide bridges. The Hcp repeat is similar to the tetratricopeptide repeat, which is frequently found in protein/protein interactions. In contrast to the tetratricopeptide repeat, the Hcp repeat is 36 amino acids long. HcpB is capable of binding and hydrolyzing 6-amino penicillinic acid and 7-amino cephalosporanic acid derivatives. The HcpB fold is distinct from the fold of any known penicillin-binding protein, indicating that the Hcp proteins comprise a new family of penicillin-binding proteins. The putative penicillin binding site is located in an amphipathic groove on the concave side of the molecule.

4.
5.
Fold assignments for newly sequenced genomes belong to the most important and interesting applications of the booming field of protein structure prediction. We present a brief survey and a discussion of such assignments completed to date, using as an example several fold assignment projects for proteins from the Escherichia coli genome. This review focuses on steps that are necessary to go beyond the simple assignment projects and into the development of tools extending our understanding of functions of proteins in newly sequenced genomes. This paper also discusses several problems seldom addressed in the literature, such as the problem of domain prediction and complementary predictions (e.g., transmembrane regions and flexible regions) and cross-correlation of predictions from different servers. The influence of sequence and structure database growth on prediction success is also addressed. Finally, we discuss the perspectives of the field in the context of massive sequence and structure determination projects, as well as the development of novel prediction methods.

6.
A computational method for NMR-constrained protein threading.   Cited by: 2 (self-citations: 0, citations by others: 2)
Protein threading provides an effective method for fold recognition and backbone structure prediction. But its application is currently limited due to its level of prediction accuracy and scope of applicability. One way to significantly improve its usefulness is through the incorporation of underconstrained (or partial) NMR data. It is well known that the NMR method for protein structure determination applies only to small proteins and that its effectiveness decreases rapidly as the protein mass increases beyond about 30 kD. We present, in this paper, a computational framework for applying underconstrained NMR data (that alone are insufficient for structure determination) as constraints in protein threading and also in all-atom model construction. In this study, we consider both secondary structure assignments from chemical shifts and NOE distance restraints. Our results have shown that both secondary structure assignments and a small number of long-range NOEs can significantly improve the threading quality in both fold recognition and threading-alignment accuracy, and can possibly extend threading's scope of applicability from homologs to analogs. An accurate backbone structure generated by NMR-constrained threading can then provide a great amount of structural information, equivalent to that provided by many NMR data, and hence can help reduce the number of NMR data typically required for an accurate structure determination. This new technique can potentially accelerate current NMR structure determination processes and possibly expand NMR's capability to larger proteins.

7.
De Ungria MC  Kolesnikow T  Cox PT  Lee A 《Plasmid》1999,41(2):97-109
The 5846-bp circular plasmid pHPS1 of Helicobacter pylori Sydney strain, SS1, was cloned, sequenced, and structurally characterized. The SS1 strain is widely used in animal studies of H. pylori infection. The sequence of pHPS1 revealed three open reading frames (ORFs), all of which are transcribed. Two ORFs encode putative plasmid replication proteins, RepA and RepB, similar to replicases resident on theta plasmids. In contrast, the function of ORF2 remains cryptic due to the absence of sequence similarity with any known protein in sequence databases. In addition, species specificity of these three coding regions was shown using DNA dot blot hybridization in 57 diverse clinical H. pylori isolates and 32 Helicobacter and Campylobacter strains. RepA appears to be the predominant plasmid replication protein of H. pylori and the deduced amino acid sequence was highly conserved (76-96%) in 8 H. pylori isolates, including SS1. RepB was detected in 3 H. pylori isolates examined in this study, 2 of which possess only the repB gene. Analysis of the protein sequences of these two replicases, together with previously characterized H. pylori plasmid replication proteins, supports the formation of a distinct class of H. pylori plasmid proteins. Moreover, comprehensive analysis of the whole genome sequence of H. pylori strain 26695, pHPS1, and other H. pylori plasmid sequences that are available revealed interesting insights as to the occurrence of plasmid-mediated recombination within H. pylori. Common regions between plasmids and chromosome sequences of H. pylori were identified in this study which could only have arisen by genetic recombination, thus providing the first line of evidence, albeit indirectly, of the contribution of H. pylori plasmids in generating an extensive genetic heterogeneity characteristic of this important gastroduodenal pathogen.

8.
To maximise the assignment of function of the proteins encoded by a genome and to aid the search for novel drug targets, there is an emerging need for sensitive methods of predicting protein function on a genome-wide basis. GeneAtlas is an automated, high-throughput pipeline for the prediction of protein structure and function using sequence similarity detection, homology modelling and fold recognition methods. GeneAtlas is described in detail here. To test GeneAtlas, a 'virtual' genome was used, a subset of PDB structures from the SCOP database, in which the functional relationships are known. GeneAtlas detects additional relationships by building 3D models in comparison with the sequence searching method PSI-BLAST. Functionally related proteins with sequence identity below the twilight zone can be recognised correctly.

9.
Of the membrane proteins of known structure, we found that a remarkable 67% of the water soluble domains are structurally similar to water soluble proteins of known structure. Moreover, 41% of known water soluble protein structures share a domain with an already known membrane protein structure. We also found that functional residues are frequently conserved between extramembrane domains of membrane and soluble proteins that share structural similarity. These results suggest membrane and soluble proteins readily exchange domains and their attendant functionalities. The exchanges between membrane and soluble proteins are particularly frequent in eukaryotes, indicating that this is an important mechanism for increasing functional complexity. The high level of structural overlap between the two classes of proteins provides an opportunity to employ the extensive information on soluble proteins to illuminate membrane protein structure and function, for which much less is known. To this end, we employed structure guided sequence alignment to elucidate the functions of membrane proteins in the human genome. Our results bridge the gap of fold space between membrane and water soluble proteins and provide a resource for the prediction of membrane protein function. A database of predicted structural and functional relationships for proteins in the human genome is provided at sbi.postech.ac.kr/emdmp.

10.
The genome sciences face the challenge of characterizing the structure and function of a vast number of novel genes. Sequence search techniques are used to infer functional and structural information from similarities to experimentally characterized genes or proteins. The persistent goal is to refine these techniques and to develop alternative and complementary methods to increase the range of reliable inference. Here, we focus on the structural and functional assignments that can be inferred from the known three-dimensional structures of proteins. The study uses all structures in the Protein Data Bank that were known by the end of 1997. The protein structures released in 1998 were then characterized in terms of functional and structural similarity to the previously known structures, yielding an estimate of the maximum amount of information on novel protein sequences that can be obtained from inference techniques. The 147 globular proteins corresponding to 196 domains released in 1998 have no clear sequence similarity to previously known structures. However, 75% of the domains have extensive structure similarity to previously known folds, and most importantly, in two out of three cases similarity in structure coincides with related function. In view of this analysis, full utilization of existing structure databases would provide information for many new targets even if the relationship is not accessible from sequence information alone. Currently, the most sophisticated techniques detect on the order of one-third of these relationships.

11.
Of the ~4000 ORFs identified through the genome sequence of Mycobacterium tuberculosis (TB) H37Rv, experimentally determined structures are available for 312. Since knowledge of protein structures is essential to obtain a high-resolution understanding of the underlying biology, we seek to obtain a structural annotation for the genome, using computational methods. Structural models were obtained and validated for ~2877 ORFs, covering ~70% of the genome. Functional annotation of each protein was based on fold-based functional assignments and a novel binding site based ligand association. New algorithms for binding site detection and genome scale binding site comparison at the structural level, recently reported from the laboratory, were utilized. Besides these, the annotation covers detection of various sequence and sub-structural motifs and quaternary structure predictions based on the corresponding templates. The study provides an opportunity to obtain a global perspective of the fold distribution in the genome. The annotation indicates that cellular metabolism can be achieved with only 219 folds. New insights about the folds that predominate in the genome, as well as the fold-combinations that make up multi-domain proteins are also obtained. 1728 binding pockets have been associated with ligands through binding site identification and sub-structure similarity analyses. The resource (http://proline.physics.iisc.ernet.in/Tbstructuralannotation), being one of the first to be based on structure-derived functional annotations at a genome scale, is expected to be useful for better understanding of TB and for application in drug discovery. The reported annotation pipeline is fairly generic and can be applied to other genomes as well.

12.
Practical lessons from protein structure prediction   Cited by: 9 (self-citations: 0, citations by others: 9)
Despite recent efforts to develop automated protein structure determination protocols, structural genomics projects are slow in generating fold assignments for complete proteomes, and spatial structures remain unknown for many protein families. Alternative cheap and fast methods to assign folds using prediction algorithms continue to provide valuable structural information for many proteins. The development of high-quality prediction methods has been boosted in recent years by objective community-wide assessment experiments. This paper gives an overview of the currently available practical approaches to protein structure prediction capable of generating accurate fold assignments. Recent advances in assessment of the prediction quality are also discussed.

13.
Protean     
The archaeal, bacterial, and eukaryotic genome projects have overwhelmed our ability to experimentally elucidate the function of each novel gene and gene product. To a certain extent, protein functional assignments can be derived via sequence similarity measures and direct primary sequence analysis using methods to predict hydropathy, secondary structure, amphiphilicity, and antigenicity. Function can also be inferred on the basis of sequence motifs, such as phosphorylation and lipid binding signatures. These methods, provided in DNASTAR’s PROTEAN module, can be used to putatively assign roles for novel proteins from the genome explosion as well as clarify function for better known proteins.

14.
Over the next few years, various genome projects will sequence many new genes and yield many new gene products. Many of these products will have no known function and little, if any, sequence homology to existing proteins. There is reason to believe that a rapid determination of a protein fold, even at low resolution, can aid in the identification of function and expedite the determination of structure at higher resolution. Recently devised NMR methods of measuring residual dipolar couplings provide one route to the determination of a fold. They do this by allowing the alignment of previously identified secondary structural elements with respect to each other. When combined with constraints involving loops connecting elements or other short-range experimental distance information, a fold is produced. We illustrate this approach to protein fold determination on 15N-labeled Escherichia coli acyl carrier protein using a limited set of 15N-1H and 1H-1H dipolar couplings. We also illustrate an approach using a more extended set of heteronuclear couplings on a related protein, 13C,15N-labeled NodF protein from Rhizobium leguminosarum.

15.
Prediction of protein function from protein sequence and structure   Cited by: 1 (self-citations: 0, citations by others: 1)
The sequence of a genome contains the plans of the possible life of an organism, but implementation of genetic information depends on the functions of the proteins and nucleic acids that it encodes. Many individual proteins of known sequence and structure present challenges to the understanding of their function. In particular, a number of genes responsible for diseases have been identified but their specific functions are unknown. Whole-genome sequencing projects are a major source of proteins of unknown function. Annotation of a genome involves assignment of functions to gene products, in most cases on the basis of amino-acid sequence alone. 3D structure can aid the assignment of function, motivating the challenge of structural genomics projects to make structural information available for novel uncharacterized proteins. Structure-based identification of homologues often succeeds where sequence-alone-based methods fail, because in many cases evolution retains the folding pattern long after sequence similarity becomes undetectable. Nevertheless, prediction of protein function from sequence and structure is a difficult problem, because homologous proteins often have different functions. Many methods of function prediction rely on identifying similarity in sequence and/or structure between a protein of unknown function and one or more well-understood proteins. Alternative methods include inferring conservation patterns in members of a functionally uncharacterized family for which many sequences and structures are known. However, these inferences are tenuous. Such methods provide reasonable guesses at function, but are far from foolproof. It is therefore fortunate that the development of whole-organism approaches and comparative genomics permits other approaches to function prediction when the data are available. These include the use of protein-protein interaction patterns, and correlations between occurrences of related proteins in different organisms, as indicators of functional properties. Even if it is possible to ascribe a particular function to a gene product, the protein may have multiple functions. A fundamental problem is that function is in many cases an ill-defined concept. In this article we review the state of the art in function prediction and describe some of the underlying difficulties and successes.

16.
The SPOUT family of methyltransferase proteins is noted for containing a deep trefoil knot in their defining backbone fold. This unique fold is of high interest for furthering the understanding of knots in proteins. Here, we report the 1H, 13C, 15N assignments for MTT Tm, a canonical member of the SPOUT family. This protein is unique, as it is one of the smallest members of the family, making it an ideal system for probing the unique properties of the knot. Our present work represents the foundation for further studies into the topology of MTT Tm, and understanding how its structure affects both its folding and function.

17.
MOTIVATION: Protein secondary structure prediction is an important step towards understanding how proteins fold in three dimensions. Recent analysis by information theory indicates that the correlations between neighboring secondary structures are much stronger than those between neighboring amino acids. In this article, we focus on the combination problem for sequences, i.e. combining the scores or assignments from single or multiple prediction systems under the constraint of a whole sequence, as a target for improvement in protein secondary structure prediction. RESULTS: We apply several graphical chain models to solve the combination problem and show that they are consistently more effective than the traditional window-based methods. In particular, conditional random fields (CRFs) moderately improve the predictions for helices and, more importantly, for beta sheets, which are the major bottleneck for protein secondary structure prediction.

18.
Despite the increasing number of recently solved membrane protein structures, coverage of membrane protein fold space remains relatively sparse. This necessitates the use of computational strategies to investigate membrane protein structure, allowing us to further our understanding of how membrane proteins carry out their diverse range of functions, while aiding the development of novel predictive tools with which to probe uncharacterised folds. Analysis of known structures, the application of machine learning techniques, molecular dynamics simulations and protein structure prediction have enabled significant advances to be made in the field of membrane protein research. In this communication, the key bioinformatic methods that allow the characterisation of membrane proteins are reviewed, the tools available for the structural analysis of membrane proteins are presented and the contribution these tools have made to expanding our understanding of membrane protein structure, function and stability is discussed.

19.
We describe our global optimization method called Stochastic Perturbation with Soft Constraints (SPSC), which uses information from known proteins to predict secondary structure, but not for the tertiary structure predictions or for generating the terms of the physics-based energy function. Our approach is also characterized by the use of an all-atom energy function that includes a novel hydrophobic solvation function derived from experiments that shows promising ability for energy discrimination against misfolded structures. We present the results obtained using our SPSC method and energy function for blind prediction in the 4th Critical Assessment of Techniques for Protein Structure Prediction competition, and show that our approach is more effective on targets for which less information from known proteins is available. In fact, our SPSC method produced the best prediction for one of the most difficult targets of the competition, a new fold protein of 240 amino acids.

20.
McGuffin LJ  Jones DT 《Proteins》2002,48(1):44-52
The ultimate goal of structural genomics is to obtain the structure of each protein coded by each gene within a genome to determine gene function. Because of cost and time limitations, it remains impractical to solve the structure for every gene product experimentally. Up to a point, reasonably accurate three‐dimensional structures can be deduced for proteins with homologous sequences by using comparative modeling. Beyond this, fold recognition or threading methods can be used for proteins showing little homology to any known fold, although this is relatively time‐consuming and limited by the library of template folds currently available. Therefore, it is appropriate to develop methods that can increase our knowledge base, expanding our fold libraries by earmarking potentially “novel” folds for experimental structure determination. How can we sift through proteomic data rapidly and yet reliably identify novel folds as targets for structural genomics? We have analyzed a number of simple methods that discriminate between “novel” and “known” folds. We propose that simple alignments of secondary structure elements using predicted secondary structure could potentially be a more selective method than both a simple fold recognition method (GenTHREADER) and standard sequence alignment at finding novel folds when sequences show no detectable homology to proteins with known structures. Proteins 2002;48:44–52. © 2002 Wiley‐Liss, Inc.
