Similar literature
 20 similar articles found (search time: 15 ms)
1.
    
Protein function elucidation often relies heavily on amino acid sequence analysis and other bioinformatics approaches. The reliance is extended to structure homology modeling for ligand docking and protein–protein interaction mapping. However, sequence analysis of RPA3313 exposes a large, unannotated class of hypothetical proteins mostly from the Rhizobiales order. In the absence of sequence and structure information, further functional elucidation of this class of proteins has been significantly hindered. A high quality NMR structure of RPA3313 reveals that the protein forms a novel split ββαβ fold with a conserved ligand binding pocket between the first β‐strand and the N‐terminus of the α‐helix. Conserved residue analysis and protein–protein interaction prediction analyses reveal multiple protein binding sites and conserved functional residues. Results of a mass spectrometry proteomic analysis strongly point toward interaction with the ribosome and its subunits. The combined structural and proteomic analyses suggest that RPA3313 by itself or in a larger complex may assist in the transportation of substrates to or from the ribosome for further processing. Proteins 2016; 85:93–102. © 2016 Wiley Periodicals, Inc.

2.
The TT1485 gene from Thermus thermophilus HB8 encodes a hypothetical protein of unknown function with about 20 sequence homologs of bacterial or archaeal origin. Together they form a family of uncharacterized proteins, the cluster of orthologous group COG3253. Using a combination of amino acid sequence analysis, three-dimensional structural studies and biochemical assays, we identified TT1485 as a novel heme-binding protein. The crystal structure reveals that this protein is a pentamer and each monomer exhibits a β-barrel fold. TT1485 is structurally similar to muconolactone isomerase, but this provided no functional clues. Amino acid sequence analysis revealed remote homology to a heme enzyme, chlorite dismutase. Strikingly, amino acid residues that are highly conserved in the homologous hypothetical proteins and chlorite dismutase cluster around a deep cavity on the surface of each monomer. Molecular modeling shows that the cavity can accommodate a heme group with a strictly conserved His as a heme ligand. TT1485 reconstituted with iron protoporphyrin IX chloride gave a low chlorite dismutase activity, indicating that TT1485 catalyzes a reaction other than chlorite degradation. The presence of a possible Fe–His–Asp triad in the heme proximal site suggests that TT1485 functions as a novel heme peroxidase to detoxify hydrogen peroxide within the cell.

3.
A detailed knowledge of a protein's functional site is an absolute prerequisite for understanding its mode of action at the molecular level. However, the rapid pace at which sequence and structural information is being accumulated for proteins greatly exceeds our ability to determine their biochemical roles experimentally. As a result, computational methods are required which allow for the efficient processing of the evolutionary information contained in this wealth of data, in particular that related to the nature and location of functionally important sites and residues. The method presented here, referred to as conserved functional group (CFG) analysis, relies on a simplified representation of the chemical groups found in amino acid side-chains to identify functional sites from a single protein structure and a number of its sequence homologues. We show that CFG analysis can fully or partially predict the location of functional sites in approximately 96% of the 470 cases tested and that, unlike other methods available, it is able to tolerate wide variations in sequence identity. In addition, we discuss its potential in a structural genomics context, where automation, scalability and efficiency are critical, and an increasing number of protein structures are determined with no prior knowledge of function. This is exemplified by our analysis of the hypothetical protein Ydde_Ecoli, whose structure was recently solved by members of the North East Structural Genomics consortium. Although the proposed active site for this protein needs to be validated experimentally, this example illustrates the scope of CFG analysis as a general tool for the identification of residues likely to play an important role in a protein's biochemical function. Thus, our method offers a convenient solution to rapidly and automatically process the vast amounts of data that are beginning to emerge from structural genomics projects.

4.
    
Cai XH, Jaroszewski L, Wooley J, Godzik A. Proteins 2011, 79(8): 2389-2402
The protein universe can be organized in families that group proteins sharing common ancestry. Such families display variable levels of structural and functional divergence, from homogeneous families, where all members have the same function and very similar structure, to very divergent families, where large variations in function and structure are observed. For practical purposes of structure and function prediction, it would be beneficial to identify sub-groups of proteins with highly similar structures (iso-structural) and/or functions (iso-functional) within divergent protein families. We compared three algorithms in their ability to cluster large protein families and discuss whether any of these methods could reliably identify such iso-structural or iso-functional groups. We show that clustering using profile-sequence and profile-profile comparison methods closely reproduces clusters based on similarities between 3D structures or clusters of proteins with similar biological functions. In contrast, the still commonly used sequence-based methods with fixed thresholds result in vast overestimates of structural and functional diversity in protein families. As a result, these methods also overestimate the number of protein structures that have to be determined to fully characterize the structural space of such families. The fact that one can build reliable models based on apparently distantly related templates is crucial for extracting the maximal amount of information from new sequencing projects.

5.
Here we describe various methods currently under development aimed at identifying a protein's function from its three-dimensional structure. We are combining a number of these methods to create a pipeline of applications, called ProFunc, which will take a given 3D structure, run all the applications on it and compile and summarise the results obtained. The aim is to provide a best guess as to the protein's function from the evidence provided by the different methods. Here we present three examples, using structures solved by the Midwest Center for Structural Genomics consortium, illustrating the strengths and weaknesses of current approaches.

6.
    
Xanthomonas campestris pv. campestris strain 17 is a Gram‐negative yellow‐pigmented pathogenic bacterium that causes black rot, one of the major worldwide diseases of cruciferous crops. Its genome contains approximately 4500 genes, one third of which have no known structure and/or function yet are highly conserved among several different bacterial genera. One of these gene products is the XC1692 protein, which contains 141 amino acids. It was overexpressed in Escherichia coli, purified and crystallized in a variety of forms using the hanging‐drop vapour‐diffusion method. The crystals diffract to at least 1.45 Å resolution. They are hexagonal and belong to space group P63, with unit‐cell parameters a = b = 56.9, c = 71.0 Å. They contain one molecule per asymmetric unit.

7.
    
Yan Y, Moult J. Proteins 2006, 64(3): 615-628
Operons are clusters of genes that are transcribed as a single message, and regulated by the same gene expression machinery. They are found primarily in prokaryotic genomes. Because genes in the same operon are likely to have related functions, identification of the operon structure is potentially useful for assigning gene function. We report the development and benchmarking of two different methods for detecting operons, based on an analysis of 42 fully sequenced prokaryotic organisms. The Gene Neighbor Method (GNM) utilizes the relatively high conservation of gene order in operons, compared with genes in general. The Gene Gap Method (GGM) makes use of the relatively short gap between genes in operons compared with that otherwise found between adjacent genes. The methods have been benchmarked using KEGG pathway data and RegulonDB Escherichia coli operon data. With optimum parameters, the specificity of the GNM is 93% and the sensitivity is 70%. For the GGM, the specificity is 95% and the sensitivity is 68%. Together, the two methods have a sensitivity of 87.2%, while joint predictions have a sensitivity of 50% and a specificity of 98%. The methods are used to infer possible functions for some hypothetical genes in prokaryotic genomes. The methods have proven a useful addition to structure information in deriving protein function in a structural genomics project.
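The gap-based idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Gene` record, the 50 bp `max_gap` threshold, the same-strand requirement and the standard definitions of sensitivity and specificity used here are all assumptions made for illustration, and the paper may define or tune them differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; coordinates are 1-based and inclusive."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def predict_operon_pairs(genes, max_gap=50):
    """Return adjacent gene pairs predicted to share an operon.

    A pair is predicted when both genes lie on the same strand and the
    intergenic gap is at most max_gap bases (an assumed threshold).
    """
    ordered = sorted(genes, key=lambda g: g.start)
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        gap = right.start - left.end - 1  # bases strictly between the genes
        if left.strand == right.strand and gap <= max_gap:
            pairs.append((left.name, right.name))
    return pairs


def sensitivity_specificity(predicted, true_pairs, all_pairs):
    """Compute sensitivity and specificity over a set of candidate pairs."""
    predicted, true_pairs, all_pairs = set(predicted), set(true_pairs), set(all_pairs)
    tp = len(predicted & true_pairs)
    fn = len(true_pairs - predicted)
    fp = len(predicted - true_pairs)
    tn = len(all_pairs - predicted - true_pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Toy genome with made-up coordinates.
genes = [
    Gene("a", 1, 100, "+"),
    Gene("b", 120, 300, "+"),
    Gene("c", 600, 900, "+"),
    Gene("d", 910, 1200, "-"),
]
predicted = predict_operon_pairs(genes)
```

In this toy genome only the pair (a, b) is predicted: b and c are too far apart, and c and d sit on opposite strands. Raising `max_gap` trades specificity for sensitivity, which is the kind of trade-off the benchmark figures in the abstract describe.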

8.
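Structural genomics research and nuclear magnetic resonance (cited by 4: 0 self-citations, 4 by others)
The completion of genome DNA sequencing projects for various organisms has brought structural biology into the era of structural genomics. Structural genomics is the systematic determination of the structures of all genome products. It uses high-throughput selection, expression, purification, structure determination and computational analysis to provide an experimentally determined structure or a good theoretical model for every protein product of a genome, which will accelerate research in every area of the life sciences. Advances in bioinformatics, genetic engineering and structure determination techniques have made structural genomics research feasible. Recent methodological advances in nuclear magnetic resonance have made it a key method for high-throughput structure analysis in structural genomics.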

9.
    
Xanthomonas campestris pv. campestris is a Gram‐negative yellow‐pigmented pathogenic bacterium that causes black rot, one of the major worldwide diseases of cruciferous crops. Its genome contains approximately 4500 genes, roughly one third of which have no known structure and/or function. However, some genes of unknown function are highly conserved among several different bacterial genera. XC6422 is one such conserved hypothetical protein and has been overexpressed in Escherichia coli, purified and crystallized in a variety of forms using the hanging‐drop vapour‐diffusion method. Crystals grew to approximately 2 × 1.5 × 0.4 mm in size after one week and diffracted to at least 1.6 Å resolution. They belong to the monoclinic space group C2, with one molecule per asymmetric unit and unit‐cell parameters a = 75.8, b = 79.3, c = 38.2 Å, β = 109.4°. Determination of this structure may provide insights into the protein's function.

10.
Protein function prediction using local 3D templates (cited by 8: 0 self-citations, 8 by others)
The prediction of a protein's function from its 3D structure is becoming more and more important as the worldwide structural genomics initiatives gather pace and continue to solve 3D structures, many of which are of proteins of unknown function. Here, we present a methodology for predicting function from structure that shows great promise. It is based on 3D templates that are defined as specific 3D conformations of small numbers of residues. We use four types of template, covering enzyme active sites, ligand-binding residues, DNA-binding residues and reverse templates. The latter are templates generated from the target structure itself and scanned against a representative subset of all known protein structures. Together, the templates provide a fairly thorough coverage of the known structures and ensure that if there is a match to a known structure it is unlikely to be missed. A new scoring scheme provides a highly sensitive means of discriminating between true positive and false positive template matches. In all, the methodology provides a powerful new tool for function prediction to complement those already in use.

11.
    
Babor M, Gerzon S, Raveh B, Sobolev V, Edelman M. Proteins 2008, 70(1): 208-217
Metal ions are crucial for protein function. They participate in enzyme catalysis, play regulatory roles, and help maintain protein structure. Current tools for predicting metal-protein interactions are based on proteins crystallized with their metal ions present (holo forms). However, a majority of resolved structures are free of metal ions (apo forms). Moreover, metal binding is a dynamic process, often involving conformational rearrangement of the binding pocket. Thus, effective predictions need to be based on the structure of the apo state. Here, we report an approach that identifies transition metal-binding sites in apo forms with a resulting selectivity >95%. Applying the approach to apo forms in the Protein Data Bank and structural genomics initiative identifies a large number of previously unknown, putative metal-binding sites, and their amino acid residues, in some cases providing a first clue to the function of the protein.

12.
The ability to predict protein function from structure is becoming increasingly important as the number of structures resolved is growing more rapidly than our capacity to study function. Current methods for predicting protein function are mostly reliant on identifying a similar protein of known function. For proteins that are highly dissimilar or are only similar to proteins also lacking functional annotations, these methods fail. Here, we show that protein function can be predicted as enzymatic or not without resorting to alignments. We describe 1178 high-resolution proteins in a structurally non-redundant subset of the Protein Data Bank using simple features such as secondary-structure content, amino acid propensities, surface properties and ligands. The subset is split into two functional groupings, enzymes and non-enzymes. We use the support vector machine-learning algorithm to develop models that are capable of assigning the protein class. Validation of the method shows that the function can be predicted to an accuracy of 77% using 52 features to describe each protein. An adaptive search of possible subsets of features produces a simplified model based on 36 features that predicts at an accuracy of 80%. We compare the method to sequence-based methods that also avoid calculating alignments and predict a recently released set of unrelated proteins. The most useful features for distinguishing enzymes from non-enzymes are secondary-structure content, amino acid frequencies, number of disulphide bonds and size of the largest cleft. This method is applicable to any structure as it does not require the identification of sequence or structural similarity to a protein of known function.
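Amino acid frequencies are one of the alignment-free features the abstract names. The sketch below shows one plausible way to turn a sequence into such a feature vector. The function name, the fixed 20-letter ordering and the choice to ignore non-standard residues are assumptions for illustration. It covers only this one feature family, not the 52-feature description or the support vector machine used in the paper.

```python
from collections import Counter

# The 20 standard amino acids in a fixed (alphabetical) one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_features(sequence):
    """Return a 20-element list of amino acid frequencies for a sequence.

    Characters outside the 20 standard residues (for example X or gaps)
    are ignored, so the frequencies always sum to 1.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Toy example: three residues, two alanines and one cysteine.
features = composition_features("AAC")
```

Because every protein maps to a vector of the same length regardless of its sequence, vectors like this can be fed to a classifier without computing any alignment, which is the property the abstract emphasises.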

13.
The crystal structure of a conserved hypothetical protein from Escherichia coli has been determined using X-ray crystallography. The protein belongs to the Cluster of Orthologous Group COG1553 (National Center for Biotechnology Information database, NLM, NIH), for which there was no structural information available until now. A structural homology search with the DALI algorithm indicated that this protein has a new fold with no obvious similarity to those of other proteins with known three-dimensional structures. The protein quaternary structure consists of a dimer of trimers, which forms a characteristic cylinder shape. There is a large closed cavity with approximate dimensions of 16 Å × 16 Å × 20 Å in the center of the hexameric structure. Six putative active sites are positioned along the equatorial surface of the hexamer. There are several highly conserved residues including two possible functional cysteines in the putative active site. The possible molecular function of the protein is discussed.

14.
    
Xanthomonas campestris pv. campestris is a Gram‐negative yellow‐pigmented pathogenic bacterium that causes black rot, one of the major worldwide diseases of cruciferous crops. Its genome contains approximately 4500 genes, roughly one third of which have no known structure and/or function. However, some of these unknown genes are highly conserved among several different bacterial genera. XC229 is one such protein containing 134 amino acids. It was overexpressed in Escherichia coli, purified and crystallized using the hanging‐drop vapour‐diffusion method. The crystal diffracted to a resolution of at least 1.80 Å. It is cubic and belongs to space group I2x3, with unit‐cell parameters a = b = c = 106.8 Å. It contains one or two molecules per asymmetric unit.

15.
    
The coronavirus responsible for the severe acute respiratory syndrome (SARS-CoV) contains a small envelope protein, E, with putative involvement in host cell apoptosis and virus morphogenesis. It has been suggested that E protein can form a membrane destabilizing transmembrane (TM) hairpin, or homooligomerize to form a regular TM alpha-helical bundle. We have shown previously that the topology of the alpha-helical putative TM domain of E protein (ETM), flanked by two lysine residues at C and N termini to improve solubility, is consistent with a regular TM alpha-helix, with orientational parameters in lipid bilayers that are consistent with a homopentameric model. Herein, we show that this peptide, reconstituted in lipid bilayers, shows sodium conductance. Channel activity is inhibited by the anti-influenza drug amantadine, which was found to bind our preparation with moderate affinity. Results obtained from single or double mutants indicate that the organization of the transmembrane pore is consistent with our previously reported pentameric alpha-helical bundle model.

16.
Methods for predicting protein function from structure are becoming more important as the rate at which structures are solved increases more rapidly than experimental knowledge. As a result, protein structures now frequently lack functional annotations. The majority of methods for predicting protein function are reliant upon identifying a similar protein and transferring its annotations to the query protein. This method fails when a similar protein cannot be identified, or when any similar proteins identified also lack reliable annotations. Here, we describe a method that can assign function from structure without the use of algorithms reliant upon alignments. Using simple attributes that can be calculated from any crystal structure, such as secondary structure content, amino acid propensities, surface properties and ligands, we describe each enzyme in a non-redundant set. The set is split according to Enzyme Classification (EC) number. We combine the predictions of one-class versus one-class support vector machine models to make overall assignments of EC number to an accuracy of 35% with the top-ranked prediction, rising to 60% accuracy with the top two ranks. In doing so we demonstrate the utility of simple structural attributes in protein function prediction and shed light on the link between structure and function. We apply our methods to predict the function of every currently unclassified protein in the Protein Data Bank.
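The "top-ranked" and "top two ranks" accuracies quoted in this abstract can be read as top-k accuracy over a ranked list of candidate classes. The sketch below illustrates that reading. Combining pairwise (one-versus-one) classifiers by simple vote counting is an assumption here, since the abstract does not say how the pairwise predictions are combined. The function names and class labels are hypothetical.

```python
from collections import Counter


def rank_by_votes(pairwise_winners, classes):
    """Rank classes by how many pairwise contests each one won.

    pairwise_winners holds the winning class label from each one-versus-one
    model. Ties are broken by label so the ranking is deterministic.
    """
    votes = Counter(pairwise_winners)
    return sorted(classes, key=lambda c: (-votes[c], c))


def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of cases whose true label is among the first k ranked classes."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    if not true_labels:
        raise ValueError("at least one case is required")
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Toy example with four hypothetical top-level classes, hence six pairwise models.
classes = ["1", "2", "3", "4"]
ranking = rank_by_votes(["1", "1", "2", "3", "2", "1"], classes)

# Three toy proteins with ranked predictions and known classes.
ranked = [["1", "2", "3"], ["2", "1", "3"], ["3", "2", "1"]]
truth = ["1", "1", "2"]
top1 = top_k_accuracy(ranked, truth, k=1)
top2 = top_k_accuracy(ranked, truth, k=2)
```

Top-k accuracy can only stay the same or rise as k grows, which is consistent with the reported increase from the top-ranked prediction to the top two ranks.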

17.
    
Structural genomics offers a potential route to the discovery of protein function. As part of a structural genomics project focused on the hyperthermophilic crenarchaeon Pyrobaculum aerophilum, a conserved hypothetical protein, PAE2754, has been expressed in Escherichia coli, purified and crystallized. Because of the difficulties of preparing interpretable heavy‐atom derivatives with limited resolution and 8–12 molecules in the asymmetric unit, two leucine residues were selected for mutation to methionine. The double mutant L65M/L80M was created, expressed incorporating SeMet and crystallized. The crystals are monoclinic, space group P21, with unit‐cell parameters a = 56.4, b = 193.3, c = 60.5 Å, β = 94.6° and eight molecules (two tetramers) in the asymmetric unit. The crystals diffract to 2.75 Å resolution and are suitable for MAD phasing.

18.
19.
    
The crystal structure of a conserved hypothetical protein, TTHA0849 from Thermus thermophilus HB8, has been determined at 2.4 Å resolution as a part of a structural and functional genomics project on T. thermophilus HB8. The main‐chain folding shows a compact α+β motif, forming a hydrophobic cavity in the molecule. A structural similarity search reveals that it resembles those steroidogenic acute regulatory proteins that contain the lipid‐transfer (START) domain, even though TTHA0849 shows comparatively weak sequence identity to polyketide cyclases. However, the size of the ligand‐binding cavity is distinctly smaller than other START domain‐containing proteins, suggesting that it catalyses the transfer of smaller ligand molecules.

20.
    
A substantial fraction of protein sequences derived from genomic analyses is currently classified as representing 'hypothetical proteins of unknown function'. In part, this reflects the limitations of methods for comparison of sequences with very low identity. We evaluated the effectiveness of a Psi-BLAST search strategy to identify proteins of similar fold at low sequence identity. Psi-BLAST searches for structurally characterized low-sequence-identity matches were carried out on a set of over 300 proteins of known structure. Searches were conducted in NCBI's non-redundant database and were limited to three rounds. Some 614 potential homologs with 25% or lower sequence identity to 166 members of the search set were obtained. Disregarding the expect value, level of sequence identity and span of alignment, correspondence of fold between the target and potential homolog was found in more than 95% of the Psi-BLAST matches. Restrictions on expect value or span of alignment improved the false positive rate at the expense of eliminating many true homologs. Approximately three-quarters of the putative homologs obtained by three rounds of Psi-BLAST revealed no significant sequence similarity to the target protein upon direct sequence comparison by BLAST, and therefore could not be found by a conventional search. Although three rounds of Psi-BLAST identified many more homologs than a standard BLAST search, most homologs were undetected. It appears that more than 80% of all homologs to a target protein may be characterized by a lack of significant sequence similarity. We suggest that conservative use of Psi-BLAST has the potential to propose experimentally testable functions for the majority of proteins currently annotated as 'hypothetical proteins of unknown function'.
