Celsius: a community resource for Affymetrix microarray data   Total citations: 1; self-citations: 1; citations by others: 0

Helen Berman is the recipient of the Protein Society 2012 Carl Branden Award. In addition to being one of the early pioneers in protein crystallography, Carl Brändén made significant contributions to science education with his elegant and beautifully illustrated book Introduction to Protein Structure (Brändén and Tooze, New York: Garland, 1991). It is truly an honor to receive this award in their names. This award and the 40th anniversary of the Protein Data Bank (PDB; Berman et al., Structure 2012;20:391–396) have given me an opportunity to reflect on the various components that have contributed to building a resource for protein science and to try to quantify the impact of having PDB data openly available.

The Protein Information Resource (PIR) is an integrated public resource of protein informatics that supports genomic and proteomic research and scientific discovery. PIR maintains the Protein Sequence Database (PSD), an annotated protein database containing over 283 000 sequences covering the entire taxonomic range. Family classification is used for sensitive identification, consistent annotation, and detection of annotation errors. The superfamily curation defines signature domain architecture and categorizes memberships to improve automated classification. To increase the amount of experimental annotation, the PIR has developed a bibliography system for literature searching, mapping, and user submission, and has conducted retrospective attribution of citations for experimental features. PIR also maintains NREF, a non-redundant reference database, and iProClass, an integrated database of protein family, function, and structure information. PIR-NREF provides a timely and comprehensive collection of protein sequences, currently consisting of more than 1 000 000 entries from PIR-PSD, SWISS-PROT, TrEMBL, RefSeq, GenPept, and PDB. The PIR web site (http://pir.georgetown.edu) connects data analysis tools to underlying databases for information retrieval and knowledge discovery, with functionalities for interactive queries, combinations of sequence and text searches, and sorting and visual exploration of search results. The FTP site provides free download for PSD and NREF biweekly releases and auxiliary databases and files.

CAMERA: a community resource for metagenomics   Total citations: 6; self-citations: 4; citations by others: 2

MOTIVATION: Annotation databases are widely used as public repositories of biological knowledge. However, most of these resources have been developed by independent groups which used different designs and different identifiers for the same biological entities. As we show in this article, incoherent name spaces between various databases represent a serious impediment to using the existing annotations at their full potential. Navigating between various such name spaces by mapping IDs from one database to another is a very important issue which is not properly addressed at the moment. RESULTS: We have developed a web-based resource, Onto-Translate (OT), which effectively addresses this problem. OT is able to map onto each other different types of biological entities from the following annotation databases: Swiss-Prot, TrEMBL, NREF, PIR, Gene Ontology, KEGG, Entrez Gene, GenBank, GenPept, IMAGE, RefSeq, UniGene, OMIM, PDB, Eukaryotic Promoter Database, HUGO Gene Nomenclature Committee and NetAffx. Currently, OT is able to perform 462 types of mappings between 29 different types of IDs from 17 databases concerning 53 organisms. Among these, over 300 types of translations and 15 types of IDs are not currently supported by any other tool or resource. On average, OT is able to correctly map between 96 and 99% of the biological entities provided as input. In terms of speed, sets of approximately 20 000 IDs can be translated in <30 s, in most cases. AVAILABILITY: OT is a part of Onto-Tools, which is freely available at http://vortex.cs.wayne.edu/Projects.html
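The kind of batch ID translation described above can be sketched as a lookup against a precomputed one-to-many mapping table, with the fraction of successfully mapped inputs reported alongside the results. This is a minimal illustration only: the function names `build_mapping` and `translate_ids` and the identifiers `SRC_1`, `TGT_A`, etc. are hypothetical, and the abstract does not describe OT's actual implementation or interface.

```python
from collections import defaultdict


def build_mapping(pairs):
    """Build a one-to-many lookup table from (source_id, target_id) pairs."""
    mapping = defaultdict(set)
    for source_id, target_id in pairs:
        mapping[source_id].add(target_id)
    return mapping


def translate_ids(ids, mapping):
    """Translate source IDs into target IDs (hypothetical helper).

    Returns (translated, mapped_fraction), where `translated` maps each
    mappable input ID to a sorted list of target IDs and `mapped_fraction`
    is the share of distinct input IDs that could be mapped.
    """
    unique_ids = list(dict.fromkeys(ids))  # drop duplicates, keep input order
    translated = {}
    for source_id in unique_ids:
        targets = mapping.get(source_id)  # .get() does not insert into a defaultdict
        if targets:
            translated[source_id] = sorted(targets)
    mapped_fraction = len(translated) / len(unique_ids) if unique_ids else 0.0
    return translated, mapped_fraction


if __name__ == "__main__":
    table = build_mapping([("SRC_1", "TGT_A"), ("SRC_1", "TGT_B"), ("SRC_2", "TGT_C")])
    result, fraction = translate_ids(["SRC_1", "SRC_2", "SRC_3"], table)
    print(result, round(fraction, 3))
```

Returning the mapped fraction mirrors the kind of coverage figure the abstract reports (96 to 99% of inputs mapped); unmapped IDs are simply absent from the result.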

Giving access to sequence and annotation data for genome assemblies is important because, while facilitating research, it places both assembly and annotation quality under scrutiny, resulting in improvements to both. Therefore we announce Avianbase, a resource for bird genomics, which provides access to data released by the Avian Phylogenomics Consortium.

Access to complete genome sequences provides the first step towards understanding the biology of organisms. It is the template that underpins the phenotypic characteristics of individuals and ultimately separates species due to the accumulation and fixation of mutations over evolutionary timescales. In terms of the available genomic datasets for species, birds, as our more distant relatives, have been historically underrepresented. The high cost of sequencing and annotation in the past led to a bias towards accumulating data for species that are either established model organisms or economically significant (that is, chicken, turkey and duck, representing two sister orders within the Galloanseriformes clade from the large and diverse phylogeny of birds). The recent release of genome assemblies and initial predictions of protein-coding genes [1-4] for 44 bird species, including representatives from all major branches of the bird phylogeny, is, therefore, highly significant.

One of the major challenges with the release of this number of newly sequenced genomes and the many more to come [5] is how to make these available to the various research communities in a way that supports basic research. Providing access to the sequences and initial annotations as plain text files will limit the potential use of the data, because accessing and mining such files requires significant resources, including bioinformatics personnel and computing infrastructure: for example, to search for genes belonging to certain protein families or for orthologous genes. These overheads pose a serious bottleneck that can hinder research and require concerted action by the relevant research communities.

Once genomes are submitted to public databases, genome-wide annotations are frequently generated and released either via the Ensembl project [6] or by the National Center for Biotechnology Information [7], and sequence and annotation are then made visually available online in integrated views via the Ensembl or the University of California Santa Cruz (UCSC) genome browsers [8]. These systems provide search facilities, sequence alignment tools such as BLAT/BLAST, and various analysis tools to facilitate subsetting and computational retrieval of the data, including UCSC’s Table Browser and Ensembl’s Perl and REST APIs and BioMart system.

While these systems have become almost indispensable for research, not all sequenced genomes are annotated and displayed in genome browsers. Full genome annotation remains time consuming and resource intensive: a full evidence-based Ensembl genebuild takes approximately 4 months. Thus, the list of species represented is currently limited and depends on various factors, including the completeness of the assembled genome sequence and the overall demand in the scientific community for the resources, including whether the species is a model organism (for example, human or mouse), economically important (for example, farmed animals) or of specific phylogenetic interest. Many of the recently sequenced bird genomes do not obviously fall within these categories.

SUMMARY: GeneCruiser is a web service allowing users to annotate their genomic data by mapping microarray feature identifiers to gene identifiers from databases, such as UniGene, while providing links to web resources, such as the UCSC Genome Browser. It relies on a regularly updated database that retrieves and indexes the mappings between microarray probes and genomic databases. Genes are identified using the Life Sciences Identifier standard. AVAILABILITY: GeneCruiser is freely available in the following forms: as a Web service and Web application (http://www.genecruiser.org), and through GenePattern, our microarray analysis platform, into which GeneCruiser access has been integrated (http://www.genepattern.org).

MineBlast is a web service for literature search and presentation based on data-mining results received from UniProt. Users can submit a simple list of protein sequences via a web-based interface. MineBlast performs a BLASTP search in UniProt to identify names and synonyms based on homologous proteins and subsequently queries PubMed, using combined search terms in order to find and present relevant literature.
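The step of querying PubMed with combined search terms can be illustrated by OR-ing the names and synonyms collected from homologous UniProt entries into a single Boolean query string. A minimal sketch, assuming plain PubMed Boolean syntax (`OR`/`AND`, double-quoted phrases for multi-word names); `build_pubmed_query` is a hypothetical helper, not part of MineBlast, and the abstract does not give the exact query form MineBlast uses.

```python
def build_pubmed_query(names, extra_terms=()):
    """Combine protein names/synonyms into one Boolean query (hypothetical helper).

    Names are OR-ed together inside parentheses; optional extra terms are
    AND-ed onto the result. Multi-word terms are wrapped in double quotes.
    """
    def quote(term):
        return f'"{term}"' if " " in term else term

    # Strip whitespace, drop empty strings and duplicates while keeping order.
    unique_names = list(dict.fromkeys(t.strip() for t in names if t.strip()))
    if not unique_names:
        raise ValueError("at least one protein name is required")

    query = "(" + " OR ".join(quote(t) for t in unique_names) + ")"
    for term in extra_terms:
        query += " AND " + quote(term.strip())
    return query


if __name__ == "__main__":
    print(build_pubmed_query(["insulin receptor", "INSR", "INSR"], ["kinase"]))
```

De-duplicating synonyms before joining keeps the query short when several homologous entries share the same names.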

Calling on a million minds for community annotation in WikiProteins   Total citations: 1; self-citations: 0; citations by others: 1
WikiProteins enables community annotation in a Wiki-based system. Extracts of major data sources have been fused into an editable environment that links out to the original sources. Data from community edits create automatic copies of the original data. Semantic technology captures concepts co-occurring in one sentence and thus potential factual statements. In addition, indirect associations between concepts have been calculated. We call on a 'million minds' to annotate a 'million concepts' and to collect facts from the literature with the reward of collaborative knowledge discovery. The system is available for beta testing at http://www.wikiprofessional.org.

Assays capable of determining the properties of thousands of genes in parallel present challenges with regard to accurate data processing and functional annotation. Collections of microarray expression data are applied here to assess the quality of different high-throughput protein interaction data sets. Significant differences are found. Confidence in 973 out of 5342 putative two-hybrid interactions from S. cerevisiae is increased. Besides verification, integration of expression and interaction data is employed to provide functional annotation for over 300 previously uncharacterized genes. The robustness of these approaches is demonstrated by experiments that test the in silico predictions made. This study shows how integration improves the utility of different types of functional genomic data and how well this contributes to functional annotation.
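One way expression data can be used to assess putative interactions is to score each interacting pair by how well the two genes' expression profiles correlate across conditions, so that interactions whose partners are co-expressed gain support. The sketch below assumes Pearson correlation of expression profiles as the scoring signal and a hypothetical cut-off of 0.5; the gene names are placeholders, and this is not necessarily the exact procedure used in the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / math.sqrt(var_x * var_y)


def supported_interactions(interactions, expression, threshold=0.5):
    """Keep interacting pairs whose profiles correlate above `threshold`.

    `interactions` is a list of (gene_a, gene_b) pairs; `expression` maps a
    gene name to its expression values across the same ordered conditions.
    Pairs with a gene lacking expression data are skipped.
    """
    supported = []
    for gene_a, gene_b in interactions:
        if gene_a in expression and gene_b in expression:
            r = pearson(expression[gene_a], expression[gene_b])
            if r > threshold:
                supported.append((gene_a, gene_b, r))
    return supported


if __name__ == "__main__":
    profiles = {"G1": [1, 2, 3, 4], "G2": [2, 4, 6, 8], "G3": [4, 3, 2, 1]}
    print(supported_interactions([("G1", "G2"), ("G1", "G3")], profiles))
```

Pairs that fail the threshold are not declared false; they simply receive no additional support from the expression data.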



The functional annotation of proteins relies on published information concerning their close and remote homologues in sequence databases. Evidence for remote sequence similarity can be further strengthened by a similar biological background of the query sequence and identified database sequences. However, few tools exist so far that provide a means to include functional information in sequence database searches.

Lower eukaryotes of the kingdom Fungi include a variety of biotechnologically important yeast species that have been the focus of genome research for more than a decade. Owing to rapid progress in ultra-fast sequencing technologies, the amount of available yeast genome data is increasing steadily. Thus, an efficient bioinformatics platform is required that covers genome assembly, eukaryotic gene prediction, genome annotation, comparative yeast genomics, and metabolic pathway reconstruction. Here, we present a bioinformatics platform for yeast genomics named RAPYD addressing the key requirements of extensive yeast sequence data analysis. The first step is a comprehensive regional and functional annotation of a yeast genome. A region prediction pipeline was implemented to obtain reliable and high-quality predictions of coding sequences and further genome features. Functions of coding sequences are automatically determined using a configurable prediction pipeline. Based on the resulting functional annotations, a metabolic pathway reconstruction module can be utilized to rapidly generate an overview of organism-specific features and metabolic blueprints. In a final analysis step, shared and divergent features of closely related yeast strains can be explored using the comparative genomics module. An in-depth application example of the yeast Meyerozyma guilliermondii illustrates the functionality of RAPYD. A user-friendly web interface is available at https://rapyd.cebitec.uni-bielefeld.de.

A wide range of web-based prediction and annotation tools are frequently used for determining protein function from sequence. However, parallel processing of sequences for annotation through web tools is not possible due to several constraints in functional programming for multiple queries. Here, we propose the development of APAF as an automated protein annotation filter to overcome some of these difficulties through an integrated approach.

Post-translational modifications (PTMs) occurring at protein lysine residues, or protein lysine modifications (PLMs), play critical roles in regulating biological processes. Owing to the explosive growth in the number of PLM substrates and the discovery of novel PLM types, we have substantially updated our previous work and present a much more integrative resource, the protein lysine modification database (PLMD). In PLMD, we collected and integrated a total of 284,780 modification events in 53,501 proteins across 176 eukaryotes and prokaryotes for up to 20 types of PLMs, including ubiquitination, acetylation, sumoylation, methylation, succinylation, malonylation, glutarylation, glycation, formylation, hydroxylation, butyrylation, propionylation, crotonylation, pupylation, neddylation, 2-hydroxyisobutyrylation, phosphoglycerylation, carboxylation, lipoylation and biotinylation. Using this data set, a motif-based analysis was performed for each PLM type, and the results demonstrated that different PLM types preferentially recognize distinct sequence motifs for the modifications. Moreover, various PLMs synergistically orchestrate specific cellular biological processes through crosstalk with each other, and we found a total of 65,297 PLM events involved in 90 types of PLM co-occurrences on the same lysine residues. Finally, various options are provided for accessing the data, and original references and other annotations are also presented for each PLM substrate. Taken together, we anticipate that PLMD will serve as a useful resource for further research on PLMs. PLMD 3.0 was implemented in PHP + MySQL and is freely available at http://plmd.biocuckoo.org.
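Counting PLM co-occurrences, that is, different modification types reported on the same lysine residue, amounts to grouping events by (protein, lysine position) and tallying each combination of types found at a site. A minimal sketch, assuming each event is a `(protein_id, lysine_position, plm_type)` tuple; the protein IDs are hypothetical, and PLMD's actual counting pipeline may differ (for example, it may count pairwise combinations rather than full per-site combinations).

```python
from collections import Counter, defaultdict


def count_cooccurrences(events):
    """Tally combinations of PLM types that share the same lysine residue.

    `events` is an iterable of (protein_id, lysine_position, plm_type) tuples.
    Returns (combos, events_involved): `combos` is a Counter keyed by a sorted
    tuple of >= 2 distinct PLM types, and `events_involved` is the number of
    events located at sites carrying more than one PLM type.
    """
    sites = defaultdict(list)
    for protein_id, position, plm_type in events:
        sites[(protein_id, position)].append(plm_type)

    combos = Counter()
    events_involved = 0
    for types in sites.values():
        distinct = set(types)
        if len(distinct) >= 2:  # only sites modified by more than one PLM type
            combos[tuple(sorted(distinct))] += 1
            events_involved += len(types)
    return combos, events_involved


if __name__ == "__main__":
    demo = [
        ("PROT_A", 10, "acetylation"),
        ("PROT_A", 10, "ubiquitination"),
        ("PROT_A", 25, "acetylation"),
        ("PROT_B", 7, "acetylation"),
        ("PROT_B", 7, "ubiquitination"),
        ("PROT_B", 7, "succinylation"),
    ]
    print(count_cooccurrences(demo))
```

The number of distinct keys in `combos` corresponds to the "types of PLM co-occurrences" figure, and `events_involved` to the count of events participating in them.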

We developed a resource, the Arabidopsis PeptideAtlas (www.peptideatlas.org/builds/arabidopsis/), to address central questions about the Arabidopsis thaliana proteome, such as the significance of protein splice forms and post-translational modifications (PTMs), or simply to obtain reliable information about specific proteins. PeptideAtlas is based on published mass spectrometry (MS) data collected through ProteomeXchange and reanalyzed through a uniform processing and metadata annotation pipeline. All matched MS-derived peptide data are linked to spectral, technical, and biological metadata. Nearly 40 million out of ∼143 million MS/MS (tandem MS) spectra were matched to the reference genome Araport11, identifying ∼0.5 million unique peptides and 17,858 uniquely identified proteins (only one isoform per gene) at the highest confidence level (false discovery rate 0.0004; 2 non-nested peptides of ≥9 amino acids each), which were assigned as canonical proteins, as well as 3,543 lower-confidence proteins. Physicochemical protein properties were evaluated for targeted identification of unobserved proteins. Additional proteins and isoforms not currently in Araport11 were identified that were generated from pseudogenes, alternative starts, stops, and/or splice variants, and small open reading frames; these features should be considered when updating the Arabidopsis genome. Phosphorylation can be inspected through a sophisticated PTM viewer. PeptideAtlas is integrated with community resources including TAIR, tracks in JBrowse, PPDB, and UniProtKB. Subsequent PeptideAtlas builds will incorporate millions more MS/MS spectra.

A web resource providing the global community with mass spectrometry-based Arabidopsis proteome information and its spectral, technical, and biological metadata integrated with TAIR and JBrowse.
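The highest-confidence criterion quoted in the PeptideAtlas abstract requires 2 non-nested peptides of at least 9 amino acids each. Reading "non-nested" as "neither peptide sequence is contained within the other" (our assumption), the peptide part of that check can be sketched as follows; `has_two_non_nested_peptides` is a hypothetical helper, and the false-discovery-rate part of the criterion is not modeled here.

```python
from itertools import combinations


def has_two_non_nested_peptides(peptides, min_length=9):
    """Return True if at least two peptides of length >= min_length are non-nested.

    Two peptides are treated as non-nested when neither sequence is a
    substring of the other (an assumed reading of the criterion).
    Duplicate sequences are collapsed before comparison.
    """
    long_peptides = {p for p in peptides if len(p) >= min_length}
    for a, b in combinations(sorted(long_peptides), 2):
        if a not in b and b not in a:
            return True
    return False


if __name__ == "__main__":
    print(has_two_non_nested_peptides(["PEPTIDEKLR", "PEPTIDEKLRAG"]))  # nested pair
    print(has_two_non_nested_peptides(["PEPTIDEKLR", "ACDEFGHIKL"]))    # distinct pair
```

The length filter is applied first so that short peptides, which are more likely to match ambiguously, never count toward the two required peptides.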
