Similar literature
 20 similar articles found (search time: 15 ms)
1.
The UniProt/Swiss-Prot Knowledgebase records about 30,500 variants in 5,664 proteins (Release 52.2). Most of these variants are manually curated single amino acid polymorphisms (SAPs) with references to the literature. In order to keep the list of published documents related to SAPs up to date, an automatic information retrieval method is developed to recover texts mentioning SAPs. The method is based on the use of regular expressions (patterns) and rules for the detection and validation of mutations. When evaluated using a corpus of 9,820 PubMed references, the precision of the retrieval was determined to be 89.5% over all variants. It was also found that the use of nonstandard mutation nomenclature and sequence positional correction is necessary to retrieve a significant number of relevant articles. The method was applied to the 5,664 proteins with variants. This was performed by first submitting a PubMed query to retrieve articles using gene or protein names and a list of mutation-related keywords; the SAP detection procedure was then used to recover relevant documents. The method was found to be efficient in retrieving new references on known polymorphisms. New references on known SAPs will be rendered accessible to the public via the Swiss-Prot variant pages.
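
The pattern-plus-validation idea in the abstract above can be sketched in Python roughly as follows. This is only an illustration: the two regular expressions, the function names `find_saps` and `validate`, and the `offset` parameter (standing in for the "sequence positional correction" the abstract mentions) are assumptions made for this sketch, not the authors' actual patterns or rules.

```python
import re

# Three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
AA3 = "|".join(THREE_TO_ONE)
AA1 = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical patterns: "Arg117His", "Arg-117-His", "Arg117->His" and "R117H".
PATTERN_3 = re.compile(rf"\b({AA3})-?(\d+)(?:-|->)?({AA3})\b")
PATTERN_1 = re.compile(rf"\b([{AA1}])(\d+)([{AA1}])\b")


def find_saps(text):
    """Return (wild_type, position, variant) tuples in one-letter code."""
    hits = []
    for m in PATTERN_3.finditer(text):
        hits.append((THREE_TO_ONE[m.group(1)], int(m.group(2)), THREE_TO_ONE[m.group(3)]))
    for m in PATTERN_1.finditer(text):
        hits.append((m.group(1), int(m.group(2)), m.group(3)))
    return hits


def validate(hit, sequence, offset=0):
    """Accept a hit only if the wild-type residue sits at position + offset (1-based)."""
    wild_type, position, _ = hit
    index = position + offset - 1
    return 0 <= index < len(sequence) and sequence[index] == wild_type
```

Validation against the protein sequence is what filters out look-alike strings (for example a reagent name that happens to match the one-letter pattern); a non-zero `offset` models numbering that differs from the database sequence, for instance by the length of a signal peptide.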

2.
3.
Computational methods seeking to automatically determine the properties (functional, structural, physicochemical, etc.) of a protein directly from the sequence have long been the focus of numerous research groups. With the advent of advanced sequencing methods and systems, the number of amino acid sequences that are being deposited in the public databases has been increasing steadily. This has in turn generated a renewed demand for automated approaches that can annotate individual sequences and complete genomes quickly, exhaustively and objectively. In this paper, we present one such approach that is centered around and exploits the Bio-Dictionary, a collection of amino acid patterns that completely covers the natural sequence space and can capture functional and structural signals that have been reused during evolution, within and across protein families. Our annotation approach also makes use of a weighted, position-specific scoring scheme that is unaffected by the over-representation of well-conserved proteins and protein fragments in the databases used. For a given query sequence, the method permits one to determine, in a single pass, the following: local and global similarities between the query and any protein already present in a public database; the likeness of the query to all available archaeal/bacterial/eukaryotic/viral sequences in the database as a function of amino acid position within the query; the character of secondary structure of the query as a function of amino acid position within the query; the cytoplasmic, transmembrane or extracellular behavior of the query; the nature and position of binding domains, active sites, post-translationally modified sites, signal peptides, etc. In terms of performance, the proposed method is exhaustive, objective and allows for the rapid annotation of individual sequences and full genomes. Annotation examples are presented and discussed in Results, including individual queries and complete genomes that were released publicly after we built the Bio-Dictionary that is used in our experiments. Finally, we have computed the annotations of more than 70 complete genomes and made them available on the World Wide Web at http://cbcsrv.watson.ibm.com/Annotations/.

4.
Salt stress negatively affects plant growth and crop productivity. As an ideal model with known genetic background, Arabidopsis thaliana has been widely applied to disclose the pathway of salt tolerance in glycophytes. To better understand the molecular mechanisms of the salt response in glycophytes, 466 of 15,768 Arabidopsis thaliana proteins with the GO biological process term ‘response to salt stress’ were retrieved from UniProt and analyzed with the bioinformatics tools PANTHER, DAVID, KEGG, Cytoscape and STRING. Our results not only indicated the involvement of salt-responsive proteins in various pathways and interaction networks, but also demonstrated more complicated cross-tolerances to both abiotic stresses (osmosis, water deprivation, abscisic acid, cold, heat, light and wounding) and biotic stresses (bacterium and fungus), as well as multiple subcellular locations of these salt-responsive proteins. Furthermore, the activities of superoxide dismutase (SOD) and peroxidase (POD) in Arabidopsis thaliana were determined under salt, cold and osmotic stresses, which validated the hypothesis of cross-tolerance to multiple stresses. Our work will greatly improve the current knowledge of the salt tolerance mechanism in glycophytes and provide potential salt-responsive candidates for promoting plant growth and increasing crop output.

5.

Background  

Due to the complex and distributed nature of biological research, our current biological knowledge is spread over many redundant annotation databases maintained by many independent groups. Analysts usually need to visit many of these bioinformatics databases in order to integrate comprehensive annotation information for their genes, which becomes one of the bottlenecks, particularly for the analytic task associated with a large gene list. Thus, a highly centralized and ready-to-use gene-annotation knowledgebase is in demand for high throughput gene functional analysis.

6.
7.
A wide range of web-based prediction and annotation tools are frequently used for determining protein function from sequence. However, parallel processing of sequences for annotation through web tools is not possible due to several constraints in functional programming for multiple queries. Here, we propose the development of APAF as an automated protein annotation filter to overcome some of these difficulties through an integrated approach.

8.
WILMA-automated annotation of protein sequences   (Total citations: 1; self-citations: 0; citations by others: 1)

9.
The annotation of protein function at genomic scale is essential for day-to-day work in biology and for any systematic approach to the modeling of biological systems. Currently, functional annotation is essentially based on the expansion of the relatively small number of experimentally determined functions to large collections of proteins. The task of systematic annotation faces formidable practical problems related to the accuracy of the input experimental information, the reliability of current systems for transferring information between related sequences, and the reproducibility of the links between database information and the original experiments reported in publications. These technical difficulties merely lie on the surface of the deeper problem of the evolution of protein function in the context of protein sequences and structures. Given the mixture of technical and scientific challenges, it is not surprising that errors are introduced, and expanded, in database annotations. In this situation, a more realistic option is the development of a reliability index for database annotations, instead of depending exclusively on efforts to correct databases. Several groups have attempted to compare the database annotations of similar proteins, which constitutes the first steps toward the calibration of the relationship between sequence and annotation space.

10.
MOTIVATION: Assignment of putative protein functional annotation by comparative analysis using pre-defined experimental annotations is performed routinely by molecular biologists. The number and statistical significance of these assignments remain a challenge in this era of high-throughput proteomics. A combined statistical method that enables robust, automated protein annotation by reliably expanding existing annotation sets is described. An existing clustering scheme, based on relevant experimental information (e.g. sequence identity, keywords or gene expression data), is required. The method assigns new proteins to these clusters with a measure of reliability. It can also provide human reviewers with a reliability score for both new and previously classified proteins. RESULTS: A dataset of 27 000 annotated Protein Data Bank (PDB) polypeptide chains (of 36 000 chains currently in the PDB) was generated from 23 000 chains classified a priori. AVAILABILITY: PDB annotations and sample software implementation are freely accessible on the Web at http://pmr.sdsc.edu/go

11.
In this study, we present two freely available and complementary Distributed Annotation System (DAS) resources: a DAS reference server that provides up-to-date sequence and annotation from UniProt, with additional feature links and database cross-references from InterPro, and a DAS client implemented using Java and Macromedia Flash that is optimized for the display of protein features.

12.
The effectiveness of any proteomics database search depends on the theoretical candidate information contained in the protein database. Unfortunately, candidate entries from protein databases such as UniProt rarely contain all the post-translational modifications (PTMs), disulfide bonds, or endogenous cleavages of interest to researchers. These omissions can limit discovery of novel and biologically important proteoforms. Conversely, searching for a specific proteoform becomes a computationally difficult task for heavily modified proteins. Both situations require updates to the database through user-annotated entries. Unfortunately, manually creating properly formatted UniProt Extensible Markup Language (XML) files is tedious and prone to errors. ProSight Annotator solves these issues by providing a graphical interface for adding user-defined features to UniProt-formatted XML files for better informed proteoform searches. It can be downloaded from http://prosightannotator.northwestern.edu.

13.
Whole-genome sequencing projects are a major source of proteins of unknown function. However, as predicting protein function from sequence remains a difficult task, research groups have recently started to use 3D protein structures and structural models to bypass it. MED-SuMo compares protein surfaces by analyzing the composition and spatial distribution of specific chemical groups (hydrogen bond donor, acceptor, positive, negative, aromatic, hydrophobic, guanidinium, hydroxyl, acyl and glycine). It is able to recognize proteins that have similar binding sites and thus may perform similar functions. We present here an example that illustrates the value of the MED-SuMo approach for functional structural annotation.

14.
In this article, we provide a comprehensive study of the content of the Universal Protein Resource (UniProt) protein data sets for human and mouse. The tryptic search spaces of the UniProtKB (UniProt knowledgebase) complete proteome sets were compared with other data sets from UniProtKB and with the corresponding International Protein Index, reference sequence, Ensembl, and UniRef100 (where UniRef is UniProt reference clusters) organism-specific data sets. All protein forms annotated in UniProtKB (both the canonical sequences and isoforms) were evaluated in this study. In addition, natural and disease-associated amino acid variants annotated in UniProtKB were included in the evaluation. The peptide unicity was also evaluated for each data set. Furthermore, the peptide information in the UniProtKB data sets was also compared against the available peptide-level identifications in the main MS-based proteomics repositories. Identifying the peptides observed in these repositories is an important resource of information for protein databases as they provide supporting evidence for the existence of otherwise predicted proteins. Likewise, the repositories could use the information available in UniProtKB to direct reprocessing efforts on specific sets of peptides/proteins of interest. In summary, we provide comprehensive information about the different organism-specific sequence data sets available from UniProt, together with the pros and cons for each, in terms of search space for MS-based bottom-up proteomics workflows. The aim of the analysis is to provide a clear view of the tryptic search space of UniProt and other protein databases to enable scientists to select those most appropriate for their purposes.
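
To make "tryptic search space" and "peptide unicity" in the abstract above concrete, here is a small Python sketch. It assumes the commonly used cleavage convention (cut after K or R unless the next residue is P), no missed cleavages, and arbitrary length limits; none of these settings, nor the function names, come from the study itself.

```python
import re


def tryptic_peptides(sequence, min_length=6, max_length=30):
    """In-silico digest: cleave after K or R unless followed by P (assumed rule)."""
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    # The length filter also drops the empty string left by a C-terminal K/R.
    return [p for p in pieces if min_length <= len(p) <= max_length]


def search_space(proteins, **limits):
    """Map each peptide to the set of protein identifiers that produce it."""
    space = {}
    for protein_id, sequence in proteins.items():
        for peptide in tryptic_peptides(sequence, **limits):
            space.setdefault(peptide, set()).add(protein_id)
    return space


def unique_peptides(space):
    """Peptides produced by exactly one protein in the data set."""
    return {peptide for peptide, owners in space.items() if len(owners) == 1}


if __name__ == "__main__":
    # Toy sequences invented for the example, not real database entries.
    toy = {"P1": "MKTAYIAKQRPEPTIDEK", "P2": "AKTAYIAKLLR"}
    space = search_space(toy, min_length=1)
    print(sorted(space))
    print(sorted(unique_peptides(space)))
```

Comparing two databases then reduces to comparing the key sets of their `search_space` dictionaries, and unicity to the fraction of keys owned by a single entry.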

15.
The use of evolutionary patterns in protein annotation   (Total citations: 1; self-citations: 0; citations by others: 1)
With genomic data skyrocketing, their biological interpretation remains a serious challenge. Diverse computational methods address this problem by pointing to the existence of recurrent patterns among sequence, structure, and function. These patterns emerge naturally from evolutionary variation, natural selection, and divergence (the defining features of biological systems), and they identify molecular events and shapes that underlie specificity of function and allosteric communication. Here we review these methods, and the patterns they identify in case studies and in proteome-wide applications, to infer and rationally redesign function.

16.
Activity-based protein profiling (ABPP), the use of active site-directed chemical probes to monitor enzyme function in complex biological systems, is emerging as a powerful post-genomic technology. ABPP probes have been developed for several enzyme classes and have been used to inventory enzyme activities en masse for a range of (patho)physiological processes. By presenting specific examples, we show here that ABPP provides researchers with a distinctive set of chemical tools to embark on the assignment of functions to many of the uncharacterized enzymes that populate eukaryotic and prokaryotic proteomes.

17.
MOTIVATION: Protein annotation is a task that describes protein X in terms of topic Y. Usually, this is constructed using information from the biomedical literature. Until now, most literature-based protein annotation work has been done manually by human annotators. However, as the number of biomedical papers grows ever more rapidly, manual annotation becomes more difficult, and there is an increasing need to automate the process. Recently, information extraction (IE) has been used to address this problem. Typically, IE requires pre-defined relations and hand-crafted IE rules or annotated corpora, and these requirements are difficult to satisfy in real-world scenarios such as the biomedical domain. In this article, we describe an IE system that requires only sentences labelled by domain experts as relevant or irrelevant to a given topic. RESULTS: We applied our system to meet the annotation needs of a well-known protein family database; the results show that our IE system can annotate proteins with a set of extracted relations by learning relations and IE rules for disease, function and structure from only relevant and irrelevant sentences.

18.
Functional annotation from predicted protein interaction networks   (Total citations: 1; self-citations: 0; citations by others: 1)
MOTIVATION: Progress in large-scale experimental determination of protein-protein interaction networks for several organisms has resulted in innovative methods of functional inference based on network connectivity. However, the amount of effort and resources required for the elucidation of experimental protein interaction networks is prohibitive. Previously we, and others, have developed techniques to predict protein interactions for novel genomes using computational methods and data generated from other genomes. RESULTS: We evaluated the performance of a network-based functional annotation method that makes use of our predicted protein interaction networks. We show that this approach performs equally well on experimentally derived and predicted interaction networks, for both manually and computationally assigned annotations. We applied the method to predicted protein interaction networks for over 50 organisms from all domains of life, providing annotations for many previously unannotated proteins and verifying existing low-confidence annotations. AVAILABILITY: Functional predictions for over 50 organisms are available at http://bioverse.compbio.washington.edu and datasets used for analysis at http://data.compbio.washington.edu/misc/downloads/nannotation_data/. SUPPLEMENTARY INFORMATION: A supplemental appendix gives additional details not in the main text. (http://data.compbio.washington.edu/misc/downloads/nannotation_data/supplement.pdf).
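
The abstract above does not spell out its inference rule, so the following Python sketch shows only the generic idea of functional inference from network connectivity: a simple neighbour majority vote ("guilt by association"). It is a stand-in chosen for illustration, not the authors' method, and the function name, the tie-breaking rule and the toy data are all assumptions.

```python
from collections import Counter


def predict_function(protein, edges, annotations):
    """Guess a function for `protein` by majority vote among its annotated neighbours.

    edges       -- iterable of (protein_a, protein_b) undirected interactions
    annotations -- dict mapping protein id to a list of known function labels
    Returns (label, fraction_of_neighbours_supporting_it), or None if no
    neighbour carries an annotation.
    """
    neighbours = {b if a == protein else a for a, b in edges if protein in (a, b)}
    neighbours.discard(protein)  # ignore self-interactions
    votes = Counter(label for n in neighbours for label in annotations.get(n, ()))
    if not votes:
        return None
    top = max(votes.values())
    # Ties are broken alphabetically, purely so that the output is deterministic.
    label = sorted(l for l, count in votes.items() if count == top)[0]
    return label, top / len(neighbours)


if __name__ == "__main__":
    # Hypothetical toy network and labels.
    edges = [("A", "B"), ("A", "C"), ("D", "A"), ("B", "C")]
    annotations = {"B": ["kinase"], "C": ["kinase", "transport"], "D": ["binding"]}
    print(predict_function("A", edges, annotations))
```

Because the function only needs an edge list, the same code runs unchanged on an experimentally derived network or on a predicted one, which is the kind of comparison the abstract reports.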

19.
Applications of InterPro in protein annotation and genome analysis   (Total citations: 2; self-citations: 0; citations by others: 2)
The applications of InterPro span a range of biologically important areas that includes automatic annotation of protein sequences and genome analysis. In automatic annotation of protein sequences InterPro has been utilised to provide reliable characterisation of sequences, identifying them as candidates for functional annotation. Rules based on the InterPro characterisation are stored and operated through a database called RuleBase. RuleBase is used as the main tool in the sequence database group at the EBI to apply automatic annotation to unknown sequences. The annotated sequences are stored and distributed in the TrEMBL protein sequence database. InterPro also provides a means to carry out statistical and comparative analyses of whole genomes. In the Proteome Analysis Database, InterPro analyses have been combined with other analyses based on CluSTr, the Gene Ontology (GO) and structural information on the proteins.

20.