Similar literature
 20 similar articles found (search time: 593 ms)
1.
Izrailev S  Farnum MA 《Proteins》2004,57(4):711-724
The problem of assigning a biochemical function to newly discovered proteins has been traditionally approached by expert enzymological analysis, sequence analysis, and structural modeling. In recent years, the appearance of databases containing protein-ligand interaction data for large numbers of protein classes and chemical compounds has provided new ways of investigating proteins for which the biochemical function is not completely understood. In this work, we introduce a method that utilizes ligand-binding data for functional classification of enzymes. The method makes use of the existing Enzyme Commission (EC) classification scheme and the data on interactions of small molecules with enzymes from the BRENDA database. A set of ligands that binds to an enzyme with unknown biochemical function serves as a query to search a protein-ligand interaction database for enzyme classes that are known to interact with a similar set of ligands. These classes provide hypotheses of the query enzyme's function and complement other computational annotations that take advantage of sequence and structural information. Similarity between sets of ligands is computed using point set similarity measures based upon similarity between individual compounds. We present the statistics of classification of the enzymes in the database by a cross-validation procedure and illustrate the application of the method on several examples.
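The idea of scoring two ligand sets from pairwise compound similarities can be sketched as follows. This is only an illustrative sketch: the abstract does not say which point set measure or compound descriptor was used, so the Tanimoto coefficient on hypothetical feature-index fingerprints and the symmetric mean best-match score are assumptions, as are all function names and the toy EC labels.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Assumption: a compound is represented by the set of its "on" feature indices.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_best_match(query: Sequence[Fingerprint], ref: Sequence[Fingerprint]) -> float:
    """Average, over query ligands, of the best similarity to any reference ligand."""
    if not query or not ref:
        return 0.0
    return sum(max(tanimoto(q, r) for r in ref) for q in query) / len(query)


def ligand_set_similarity(a: Sequence[Fingerprint], b: Sequence[Fingerprint]) -> float:
    """Symmetric set similarity: mean of the two directed best-match scores."""
    return 0.5 * (mean_best_match(a, b) + mean_best_match(b, a))


def rank_classes(
    query: Sequence[Fingerprint],
    classes: Dict[str, Sequence[Fingerprint]],
) -> List[Tuple[str, float]]:
    """Rank enzyme classes by similarity of their known ligand sets to the query set."""
    scores = {ec: ligand_set_similarity(query, ligs) for ec, ligs in classes.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy data only; the class labels and fingerprints are made up for illustration.
    classes = {
        "class A": [frozenset({1, 2, 3}), frozenset({4, 5})],
        "class B": [frozenset({7, 8}), frozenset({9})],
    }
    query = [frozenset({1, 2, 3}), frozenset({4, 5, 6})]
    for ec, score in rank_classes(query, classes):
        print(ec, round(score, 3))
```

The top-ranked classes would then serve as functional hypotheses for the query enzyme, in the spirit of the abstract; other set measures (for example a Hausdorff-style minimum of best matches) could be swapped in for `ligand_set_similarity`.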

2.
3.
MOTIVATION: Given the explosive growth of biomedical data as well as the literature describing results and findings, it is getting increasingly difficult to keep up to date with new information. Keeping databases synchronized with current knowledge is a time-consuming and expensive task, one that can be alleviated by automatically gathering findings from the literature using linguistic approaches. We describe a method to automatically annotate enzyme classes with disease-related information extracted from the biomedical literature for inclusion in such a database. RESULTS: Enzyme names for the 3901 enzyme classes in the BRENDA database, a repository for quantitative and qualitative enzyme information, were identified in more than 100,000 abstracts retrieved from the PubMed literature database. Phrases in the abstracts were assigned to concepts from the Unified Medical Language System (UMLS) utilizing the MetaMap program, allowing for the identification of disease-related concepts by their semantic fields in the UMLS ontology. Assignments between enzyme classes and diseases were created based on their co-occurrence within a single sentence. False positives could be removed by a variety of filters including minimum number of co-occurrences, removal of sentences containing a negation and the classification of sentences based on their semantic fields by a Support Vector Machine. Verification of the assignments with a manually annotated set of 1500 sentences yielded favorable results of 92% precision at 50% recall, sufficient for inclusion in a high-quality database. AVAILABILITY: Source code is available from the author upon request. SUPPLEMENTARY INFORMATION: ftp.uni-koeln.de/institute/biochemie/pub/brenda/info/diseaseSupp.pdf.
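The sentence-level co-occurrence step with two of the named filters (negation removal and a minimum co-occurrence count) might look roughly like the sketch below. It is a simplified stand-in, not the authors' pipeline: plain case-insensitive substring matching replaces MetaMap/UMLS concept mapping, a keyword regex replaces real negation detection, the Support Vector Machine filter is omitted, and every name and the toy input are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

# Assumption: a crude keyword list stands in for proper negation detection.
NEGATION = re.compile(r"\b(no|not|never|without|neither|nor)\b", re.IGNORECASE)


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cooccurrences(
    abstracts: Iterable[str],
    enzymes: Dict[str, str],   # enzyme name -> class identifier
    diseases: Set[str],        # disease terms
    min_count: int = 2,
) -> Dict[Tuple[str, str], int]:
    """Count enzyme-disease pairs that share a sentence, after filtering."""
    counts: Counter = Counter()
    for abstract in abstracts:
        for sentence in split_sentences(abstract):
            if NEGATION.search(sentence):
                continue  # filter: drop sentences containing a negation
            lower = sentence.lower()
            found_enzymes = {ec for name, ec in enzymes.items() if name.lower() in lower}
            found_diseases = {d for d in diseases if d.lower() in lower}
            for ec in found_enzymes:
                for disease in found_diseases:
                    counts[(ec, disease)] += 1
    # filter: keep only pairs seen at least min_count times
    return {pair: n for pair, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    # Toy input for illustration only.
    abstracts = [
        "Phosphofructokinase deficiency causes Tarui disease. "
        "Phosphofructokinase is not linked to asthma.",
        "Tarui disease involves phosphofructokinase.",
    ]
    enzymes = {"phosphofructokinase": "2.7.1.11"}
    diseases = {"Tarui disease", "asthma"}
    print(cooccurrences(abstracts, enzymes, diseases))
```

Raising `min_count` trades recall for precision, which is the kind of trade-off behind the reported 92% precision at 50% recall.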

4.
The annotation of protein function at genomic scale is essential for day-to-day work in biology and for any systematic approach to the modeling of biological systems. Currently, functional annotation is essentially based on the expansion of the relatively small number of experimentally determined functions to large collections of proteins. The task of systematic annotation faces formidable practical problems related to the accuracy of the input experimental information, the reliability of current systems for transferring information between related sequences, and the reproducibility of the links between database information and the original experiments reported in publications. These technical difficulties merely lie on the surface of the deeper problem of the evolution of protein function in the context of protein sequences and structures. Given the mixture of technical and scientific challenges, it is not surprising that errors are introduced, and expanded, in database annotations. In this situation, a more realistic option is the development of a reliability index for database annotations, instead of depending exclusively on efforts to correct databases. Several groups have attempted to compare the database annotations of similar proteins, which constitutes the first steps toward the calibration of the relationship between sequence and annotation space.

5.
Public sequence databases contain information on the sequence, structure and function of proteins. Genome sequencing projects have led to a rapid increase in protein sequence information, but reliable, experimentally verified, information on protein function lags a long way behind. To address this deficit, functional annotation in protein databases is often inferred by sequence similarity to homologous, annotated proteins, with the attendant possibility of error. Now, the functional annotation in these homologous proteins may itself have been acquired through sequence similarity to yet other proteins, and it is generally not possible to determine how the functional annotation of any given protein has been acquired. Thus the possibility of chains of misannotation arises, a process we term 'error percolation'. With some simple assumptions, we develop a dynamical probabilistic model for these misannotation chains. By exploring the consequences of the model for annotation quality it is evident that this iterative approach leads to a systematic deterioration of database quality.
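A toy calculation can illustrate why chained annotation transfer degrades quality. The sketch below is not the model from the paper, whose assumptions are not given in the abstract; it assumes instead that the database starts with `n_seed` correct, experimentally verified entries, that each new entry copies its annotation from a uniformly chosen existing entry, and that each transfer independently goes wrong with probability `eps`. All names and parameters are hypothetical.

```python
from typing import List


def expected_correct_fraction(n_seed: int, n_added: int, eps: float) -> List[float]:
    """Expected fraction of correct annotations after each added entry.

    A new entry is correct only if its source is correct (probability
    correct / total) and the transfer itself succeeds (probability 1 - eps).
    """
    if n_seed <= 0:
        raise ValueError("n_seed must be positive")
    correct = float(n_seed)  # expected number of correct entries
    total = n_seed
    fractions = [1.0]
    for _ in range(n_added):
        p_source_correct = correct / total
        correct += (1.0 - eps) * p_source_correct
        total += 1
        fractions.append(correct / total)
    return fractions


if __name__ == "__main__":
    fractions = expected_correct_fraction(n_seed=100, n_added=10_000, eps=0.05)
    for step in (0, 100, 1_000, 10_000):
        print(step, round(fractions[step], 4))
```

Under these assumptions the fraction f obeys f(t+1) = f(t) · (N + 1 − eps) / (N + 1), where N is the current database size, so it falls at every step whenever eps > 0: errors made early are inherited by later entries, which is the qualitative behaviour the abstract calls error percolation.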

6.
Using a previously developed automated method for enzyme annotation, we report the re-annotation of the ENZYME database and the analysis of local error rates per class. In control experiments, we demonstrate that the method is able to correctly re-annotate 91% of all Enzyme Classification (EC) classes with high coverage (755 out of 827). Only 44 enzyme classes are found to contain false positives, while the remaining 28 enzyme classes are not represented. We also show cases where the re-annotation procedure results in partial overlaps for those few enzyme classes where a certain inconsistency might appear between homologous proteins, mostly due to function specificity. Our results allow the interactive exploration of the EC hierarchy for known enzyme families as well as putative enzyme sequences that may need to be classified within the EC hierarchy. These aspects of our framework have been incorporated into a web-server, called CORRIE, which stands for Correspondence Indicator Estimation and allows the interactive prediction of a functional class for putative enzymes from sequence alone, supported by probabilistic measures in the context of the pre-calculated Correspondence Indicators of known enzymes with the functional classes of the EC hierarchy. The CORRIE server is available at: http://www.genomes.org/services/corrie/.

7.
Despite its importance in plant metabolism, no sequences of higher plant ATP-dependent phosphofructokinase (EC 2.7.1.11) are annotated in the databases. We have purified the enzyme from spinach leaves 309-fold to electrophoretic homogeneity. The purified enzyme was a homotetramer of approximately 52 kDa subunits with a specific activity of 600 mU mg⁻¹ and a Km value for ATP of 81 µM. The purified enzyme was not activated by phosphate, but slightly inhibited instead, suggesting that it was the chloroplast isoform. The inclusion of adenosine 5'-(β,γ-imido)triphosphate was conducive to enzyme activity during the purification protocol. The sequences of eight tryptic peptides from the final protein preparation, which did not utilize pyrophosphate as a phosphoryl donor, were determined and an exactly corresponding cDNA was cloned. The sequence of enzymatically active spinach ATP-dependent phosphofructokinase suggests that a large family of genomics-derived higher plant sequences currently annotated in the databases as putative pyrophosphate-dependent phosphofructokinases according to sequence similarity is misannotated with respect to the cosubstrate.

8.
Practical limits of function prediction   (total citations: 15; self-citations: 0; citations by others: 15)
Devos D  Valencia A 《Proteins》2000,41(1):98-107

9.
For a very long time, Type II restriction enzymes (REases) have been a paradigm of ORFans: proteins with no detectable similarity to each other and to any other protein in the database, despite common cellular and biochemical function. Crystallographic analyses published until January 2008 provided high-resolution structures for only 28 of 1637 Type II REase sequences available in the Restriction Enzyme database (REBASE). Among these structures, all but two possess catalytic domains with the common PD-(D/E)XK nuclease fold. Two structures are unrelated to the others: R.BfiI exhibits the phospholipase D (PLD) fold, while R.PabI has a new fold termed 'half-pipe'. Thus far, bioinformatic studies supported by site-directed mutagenesis have extended the number of tentatively assigned REase folds to five (now including also GIY-YIG and HNH folds identified earlier in homing endonucleases) and provided structural predictions for dozens of REase sequences without experimentally solved structures. Here, we present a comprehensive study of all Type II REase sequences available in REBASE together with their homologs detectable in the nonredundant and environmental samples databases at the NCBI. We present the summary and critical evaluation of structural assignments and predictions reported earlier, new classification of all REase sequences into families, domain architecture analysis and new predictions of three-dimensional folds. Among 289 experimentally characterized (not putative) Type II REases, whose apparently full-length sequences are available in REBASE, we assign 199 (69%) to contain the PD-(D/E)XK domain. The HNH domain is the second most common, with 24 (8%) members. When putative REases are taken into account, the fraction of PD-(D/E)XK and HNH folds changes to 48% and 30%, respectively. Fifty-six characterized (and 521 predicted) REases remain unassigned to any of the five REase folds identified so far, and may exhibit new architectures. 
These enzymes are proposed as the most interesting targets for structure determination by high-resolution experimental methods. Our analysis provides the first comprehensive map of sequence-structure relationships among Type II REases and will help to focus the efforts of structural and functional genomics of this large and biotechnologically important class of enzymes.

10.

Background

Despite several recent advances in the automated generation of draft metabolic reconstructions, the manual curation of these networks to produce high quality genome-scale metabolic models remains a labour-intensive and challenging task.

Results

We present PathwayBooster, an open-source software tool to support the manual comparison and curation of metabolic models. It combines gene annotations from GenBank files and other sources with information retrieved from the metabolic databases BRENDA and KEGG to produce a set of pathway diagrams and reports summarising the evidence for the presence of a reaction in a given organism’s metabolic network. By comparing multiple sources of evidence within a common framework, PathwayBooster assists the curator in the identification of likely false positive (misannotated enzyme) and false negative (pathway hole) reactions. Reaction evidence may be taken from alternative annotations of the same genome and/or a set of closely related organisms.

Conclusions

By integrating and visualising evidence from multiple sources, PathwayBooster reduces the manual effort required in the curation of a metabolic model. The software is available online at http://www.theosysbio.bio.ic.ac.uk/resources/pathwaybooster/.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0447-2) contains supplementary material, which is available to authorized users.

11.
BRENDA (BRaunschweig ENzyme DAtabase), founded in 1987 by Dietmar Schomburg, is a comprehensive protein function database, containing enzymatic and metabolic information extracted from the primary literature. Presently, the database holds data on more than 40 000 enzymes and 4460 different organisms, and includes information about enzyme-ligand relationships with numerous chemical compounds. The collection of molecular and biochemical information in BRENDA provides a fundamental resource for research in biotechnology, pharmacology, medicinal diagnostics, enzyme mechanics, and metabolism. BRENDA is accessible free of charge to the academic community at http://www.brenda.uni-koeln.de/; commercial users need a license available from http://www.science-factory.com/

12.
MOTIVATION: Sequence annotations, functional and structural data on snake venom neurotoxins (svNTXs) are scattered across multiple databases and literature sources. Sequence annotations and structural data are available in the public molecular databases, while functional data are almost exclusively available in the published articles. There is a need for a specialized svNTX database that contains NTX entries that are organized, well annotated and classified in a systematic manner. RESULTS: We have systematically analyzed svNTXs and classified them using structure-function groups based on their structural, functional and phylogenetic properties. Using conserved motifs in each phylogenetic group, we built an intelligent module for the prediction of structural and functional properties of unknown NTXs. We also developed an annotation tool to aid the functional prediction of newly identified NTXs as an additional resource for the venom research community. AVAILABILITY: We created a searchable online database of NTX protein sequences (http://research.i2r.a-star.edu.sg/Templar/DB/snake_neurotoxin). This database can also be found under the Swiss-Prot Toxin Annotation Project website (http://www.expasy.org/sprot/).

13.
High throughput genome (HTG) and expressed sequence tag (EST) sequences are currently the most abundant nucleotide sequence classes in the public database. The large volume, high degree of fragmentation and lack of gene structure annotations prevent efficient and effective searches of HTG and EST data for protein sequence homologies by standard search methods. Here, we briefly describe three newly developed resources that should make discovery of interesting genes in these sequence classes easier in the future, especially to biologists not having access to a powerful local bioinformatics environment. trEST and trGEN are regularly regenerated databases of hypothetical protein sequences predicted from EST and HTG sequences, respectively. Hits is a web-based data retrieval and analysis system providing access to precomputed matches between protein sequences (including sequences from trEST and trGEN) and patterns and profiles from Prosite and Pfam. The three resources can be accessed via the Hits home page (http://hits.isb-sib.ch).

14.
Functional classification of proteins from sequences alone has become a critical bottleneck in understanding the myriad of protein sequences that accumulate in our databases. The great diversity of homologous sequences hides, in many cases, a variety of functional activities that cannot be anticipated. Their identification appears critical for a fundamental understanding of the evolution of living organisms and for biotechnological applications. ProfileView is a sequence-based computational method, designed to functionally classify sets of homologous sequences. It relies on two main ideas: the use of multiple profile models whose construction explores evolutionary information in available databases, and a novel definition of a representation space in which to analyze sequences with multiple profile models combined together. ProfileView classifies protein families by enriching known functional groups with new sequences and discovering new groups and subgroups. We validate ProfileView on seven classes of widespread proteins involved in the interaction with nucleic acids, amino acids and small molecules, and in a large variety of functions and enzymatic reactions. ProfileView agrees with the large set of functional data collected for these proteins from the literature regarding the organization into functional subgroups and residues that characterize the functions. In addition, ProfileView resolves undefined functional classifications and extracts the molecular determinants underlying protein functional diversity, showing its potential to select sequences towards accurate experimental design and discovery of novel biological functions. On protein families with complex domain architecture, ProfileView functional classification reconciles domain combinations, unlike phylogenetic reconstruction. 
ProfileView outperforms the functional classification approach PANTHER, the two k-mer-based methods CUPP and eCAMI and a neural network approach based on Restricted Boltzmann Machines. It overcomes time complexity limitations of the latter.

15.
NetAffx (http://www.affymetrix.com) details and annotates probesets on Affymetrix GeneChip microarrays. These annotations include (i) static information specific to the probeset composition; (ii) sequence annotations extracted from public databases; and (iii) protein sequence-level annotations derived from public domain programs, as well as libraries of hidden Markov models (HMMs) developed at Affymetrix. For each probeset, NetAffx lists the probe sequences, and the consensus sequence interrogated by the probes; for the larger chip sets, interactive maps display this sequence data in genomic context. Sequence annotations include Gene Ontology (GO) terms and depiction of GO graph relationships; predicted protein domains and motifs; orthologous sequences; links to relevant pathways; and links to public databases including UniGene, LocusLink, SWISS-PROT and OMIM.

16.
Ger MF  Rendon G  Tilson JL  Jakobsson E 《PloS one》2010,5(10):e12827
Voltage-gated and ligand-gated ion channels are used in eukaryotic organisms for the purpose of electrochemical signaling. There are prokaryotic homologues to major eukaryotic channels of these sorts, including voltage-gated sodium, potassium, and calcium channels, Ach-receptor and glutamate-receptor channels. The prokaryotic homologues have been less well characterized functionally than their eukaryotic counterparts. In this study we identify likely prokaryotic functional counterparts of eukaryotic glutamate receptor channels by comprehensive analysis of the prokaryotic sequences in the context of known functional domains present in the eukaryotic members of this family. In particular, we searched the nonredundant protein database for all proteins containing the following motif: the two sections of the extracellular glutamate binding domain flanking two transmembrane helices. We discovered 100 prokaryotic sequences containing this motif, with a wide variety of functional annotations. Two groups within this family have the same topology as eukaryotic glutamate receptor channels. Group 1 has a potassium-like selectivity filter. Group 2 is most closely related to eukaryotic glutamate receptor channels. We present analysis of the functional domain architecture for the group of 100, a putative phylogenetic tree, comparison of the protein phylogeny with the corresponding species phylogeny, consideration of the distribution of these proteins among classes of prokaryotes, and orthologous relationships between prokaryotic and human glutamate receptor channels. We introduce a construct called the Evolutionary Domain Network, which represents a putative pathway of domain rearrangements underlying the domain composition of present channels. We believe that scientists interested in ion channels in general, and ligand-gated ion channels in particular, will be interested in this work. 
The work should also be of interest to bioinformatics researchers who are interested in the use of functional domain-based analysis in evolutionary and functional discovery.

17.
Most fungal glutathione transferases (GSTs) do not fit easily into any of the previously characterised classes by immunological, sequence or catalytic criteria. In contrast to the paucity of studies on GSTs cloned or isolated from fungal sources, a screen of databases revealed 67 GST-like sequences from 21 fungal species. Comparison by multiple sequence alignment generated a dendrogram revealing five clusters of GST-like proteins designated clusters 1, 2, EFIBgamma, Ure2p and MAK16, the last three of which have previously been related to the GST superfamily. Surprisingly, a relatively small number of fungal GSTs belong to mainstream classes and the previously-described fungal Gamma class is not widespread in the 21 species studied. Representative crystal structures are available for the EFIBgamma and Ure2p classes and the domain structures of representative sequences are compared with these. In addition, there are some "orphan" sequences that do not fit into any previously-described class, but show similarity to genes implicated in fungal biosynthetic gene clusters. We suggest that GST-like sequences are widespread in fungi, participating in a wide range of functions. They probably evolved by a process similar to domain "shuffling".

18.
Nonsynonymous single nucleotide polymorphisms (nsSNPs) in coding regions can lead to amino acid changes that might alter the protein's function and account for susceptibility to disease and altered drug/xenobiotic response. Many nsSNPs have been found in genes encoding human phase II metabolizing enzymes; however, little is known about the relationship between the genotype and phenotype of nsSNPs in these enzymes. We have identified 923 validated nsSNPs in 104 human phase II enzyme genes from the Ensembl genome database and the NCBI SNP database. Using PolyPhen, Panther, and SNAP algorithms, 44%–59% of nsSNPs in phase II enzyme genes were predicted to affect protein function. Predictions largely agree with the available experimental annotations. 68% of deleterious nsSNPs were correctly predicted as damaging. This study also identified many amino acids that are likely to be functionally critical, but have not yet been studied experimentally. There was significant concordance between the predicted results of Panther and PolyPhen, and between SNAP non-neutral predictions and PolyPhen scores. Evolutionarily non-neutral (destabilizing) amino acid substitutions are thought to be the pathogenetic basis for the alteration of phase II enzyme activity and to be associated with disease susceptibility and drug/xenobiotic toxicity. Furthermore, the molecular evolutionary patterns of phase II enzymes were characterized with regard to the predicted deleterious nsSNPs.

19.
Advances in membrane cell biology are hampered by the relatively high proportion of proteins with no known function. Such proteins are largely or entirely devoid of structurally significant domain annotations. Structural bioinformaticians have developed profile-profile tools such as HHsearch (online version called HHpred), which can detect remote homologies that are missed by tools used to annotate databases. Here we have applied HHsearch to study a single structural fold in a single model organism as proof of principle. In the entire clan of protein domains sharing the pleckstrin homology domain fold in yeast, systematic application of HHsearch accurately identified known PH-like domains. It also predicted 16 new domains in 13 yeast proteins, many of which are implicated in intracellular traffic. One of these was Vps13p, where we confirmed the functional importance of the predicted PH-like domain. Even though such predictions require considerable work to be corroborated, they are useful first steps. HHsearch should be applied more widely, particularly across entire proteomes of model organisms, to significantly improve database annotations.

20.