Similar articles
 Found 20 similar articles (search time: 62 ms)
1.
Babnigg G, Giometti CS. Proteomics, 2006, 6(16):4514-4522
In proteome studies, identification of proteins requires searching protein sequence databases. The public protein sequence databases (e.g., NCBInr, UniProt) each contain millions of entries, and private databases add thousands more. Although much of the sequence information in these databases is redundant, each database uses distinct identifiers for the identical protein sequence and often contains unique annotation information. Users of one database obtain a database-specific sequence identifier that is often difficult to reconcile with the identifiers from a different database. When multiple databases are used for searches or the databases being searched are updated frequently, interpreting the protein identifications and associated annotations can be problematic. We have developed a database of unique protein sequence identifiers called Sequence Globally Unique Identifiers (SEGUID) derived from primary protein sequences. These identifiers serve as a common link between multiple sequence databases and are resilient to annotation changes in either public or private databases throughout the lifetime of a given protein sequence. The SEGUID Database can be downloaded (http://bioinformatics.anl.gov/SEGUID/) or easily generated at any site with access to primary protein sequence databases. Since SEGUIDs are stable, predictions based on the primary sequence information (e.g., pI, Mr) can be calculated just once; we have generated approximately 500 different calculations for more than 2.5 million sequences. SEGUIDs are used to integrate MS and 2-DE data with bioinformatics information and provide the opportunity to search multiple protein sequence databases, thereby providing a higher probability of finding the most valid protein identifications.
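As a rough illustration of an identifier derived purely from the primary sequence, the sketch below assumes the SHA-1 plus Base64 construction commonly associated with SEGUIDs; the abstract itself does not spell out the hash. The function name `seguid` and the accessions and sequences in `records` are made up for the example.

```python
import base64
import hashlib


def seguid(sequence: str) -> str:
    """Return a sequence-derived identifier (assumed SEGUID-style construction)."""
    # Normalise case so the same protein always hashes to the same value.
    normalised = sequence.upper()
    digest = hashlib.sha1(normalised.encode("ascii")).digest()
    # Base64 of a 20-byte digest is 28 characters with one '=' pad; drop the pad.
    return base64.b64encode(digest).decode("ascii").rstrip("=")


# Hypothetical records: two databases, different accessions, same sequence.
records = {"db_a|X1": "MKTAYIAKQR", "db_b|Y9": "mktayiakqr"}
ids = {accession: seguid(seq) for accession, seq in records.items()}
```

Because the identifier depends only on the residues, both hypothetical accessions map to the same key, which is what would let annotations from separate databases be joined on it.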

2.
Homology Gene List (HOMGL) is a web-based tool for comparing gene lists with different accession numbers and identifiers and between different organisms. UniGene, LocusLink, HomoloGene and Ensembl databases are utilized to map between these lists and to retrieve upstream or transcribed sequences for genes in these lists. We illustrate the use of HOMGL with respect to microarray studies and promoter analysis. AVAILABILITY: http://homgl.biologie.hu-berlin.de/

3.
In this paper, we present RuleMiner, a knowledge system that facilitates seamless integration of multi-sequence analysis tools and defines profile-based rules to support high-throughput protein function annotation. The system consists of three essential components: Protein Function Groups (PFGs), PFG profiles and rules. The PFGs, established from an integrated analysis of current knowledge of protein functions from the Swiss-Prot database and protein family-based sequence classifications, cover all possible cellular functions available in the database. The PFG profiles illustrate detailed protein features in the PFGs, such as sequence conservation, the occurrence of sequence-based motifs and domains, and species distributions. The rules, extracted from the PFG profiles, describe the clear relationships between these PFGs and all possible features. As a result, RuleMiner provides an enhanced capability for protein function analysis: for example, results from the integrated sequence analysis tools for given proteins can be compared directly, thanks to the clear feature-PFG relationships. Much-needed guidance is also readily available for such analysis. If the rules describe one-to-one (unique) relationships between the protein features and the PFGs, then these features can be used as unique functional identifiers, and the cellular functions of unknown proteins can be reliably determined. Otherwise, additional information has to be provided.

4.
Predicting subcellular localization of human proteins is a challenging problem, particularly when query proteins may have a multiplex character, i.e., simultaneously exist at, or move between, two or more different subcellular location sites. In a previous study, we developed a predictor called "Hum-mPLoc" to deal with the multiplex problem for the human protein system. However, Hum-mPLoc has the following shortcomings. (1) The accession number of a query protein must be supplied in order to obtain a higher expected success rate through the higher-level prediction pathway; but many proteins, such as synthetic and hypothetical proteins as well as newly discovered proteins not yet deposited in databanks, do not have accession numbers. (2) Neither functional domain nor sequential evolution information was taken into account in Hum-mPLoc, and hence its power may be reduced accordingly. In view of this, a top-down strategy to address these shortcomings has been implemented. The new predictor thus obtained is called Hum-mPLoc 2.0, where the accession number is no longer needed as input. Moreover, both the functional domain information and the sequential evolution information have been fused into the predictor by an ensemble classifier. As a consequence, the prediction power has been significantly enhanced. The web server of Hum-mPLoc 2.0 is freely accessible at http://www.csbio.sjtu.edu.cn/bioinf/hum-multi-2/.

5.
When analyzing proteins in complex samples using tandem mass spectrometry of peptides generated by proteolysis, the inference of proteins can be ambiguous, even with well-validated peptides. Unresolved questions include whether to show all possible proteins vs a minimal list, what to do when proteins are inferred ambiguously, and how to quantify peptides that bridge multiple proteins, each with distinguishing evidence. Here we describe IsoformResolver, a peptide-centric protein inference algorithm that clusters proteins in two ways, one based on peptides experimentally identified from MS/MS spectra, and the other based on peptides derived from an in silico digest of the protein database. MS/MS-derived protein groups report minimal list proteins in the context of all possible proteins, without redundantly listing peptides. In silico-derived protein groups pull together functionally related proteins, providing stable identifiers. The peptide-centric grouping strategy used by IsoformResolver allows proteins to be displayed together when they share peptides in common, providing a comprehensive yet concise way to organize protein profiles. It also summarizes information on spectral counts and is especially useful for comparing results from multiple LC-MS/MS experiments. Finally, we examine the relatedness of proteins within IsoformResolver groups and compare its performance to other protein inference software.
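The idea of displaying proteins together when they share peptides can be sketched as a connected-components problem. The code below is not IsoformResolver's actual algorithm, which the abstract does not detail; it is a minimal union-find grouping under the assumption that any shared peptide links two proteins. The function name and the peptide and protein labels are hypothetical.

```python
from collections import defaultdict


def group_proteins_by_shared_peptides(peptide_to_proteins):
    """Cluster proteins that are linked by at least one shared peptide."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for proteins in peptide_to_proteins.values():
        proteins = list(proteins)
        for protein in proteins:
            find(protein)  # register proteins that have only unique peptides
        for other in proteins[1:]:
            union(proteins[0], other)

    groups = defaultdict(set)
    for protein in list(parent):
        groups[find(protein)].add(protein)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical evidence: peptide -> proteins it could have come from.
evidence = {"PEPA": ["P1", "P2"], "PEPB": ["P2", "P3"], "PEPC": ["P4"]}
groups = group_proteins_by_shared_peptides(evidence)
```

Here `P1`, `P2` and `P3` end up in one group because `P2` bridges the two shared peptides, while `P4` stays on its own.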

6.
Wang C, Marshall A, Zhang D, Wilson ZA. Plant Physiology, 2012, 158(4):1523-1533
Protein interactions are fundamental to the molecular processes occurring within an organism and can be utilized in network biology to help organize, simplify, and understand biological complexity. Currently, there are more than 10 publicly available Arabidopsis (Arabidopsis thaliana) protein interaction databases. However, there are limitations with these databases, including different types of interaction evidence, a lack of defined standards for protein identifiers, differing levels of information, and, critically, a lack of integration between them. In this paper, we present an interactive bioinformatics Web tool, ANAP (Arabidopsis Network Analysis Pipeline), which serves to effectively integrate the different data sets and maximize access to available data. ANAP has been developed for Arabidopsis protein interaction integration and network-based study to facilitate functional protein network analysis. ANAP integrates 11 Arabidopsis protein interaction databases, comprising 201,699 unique protein interaction pairs, 15,208 identifiers (including 11,931 The Arabidopsis Information Resource Arabidopsis Genome Initiative codes), 89 interaction detection methods, 73 species that interact with Arabidopsis, and 6,161 references. ANAP can be used as a knowledge base for constructing protein interaction networks based on user input and supports both direct and indirect interaction analysis. It has an intuitive graphical interface allowing easy network visualization and provides extensive detailed evidence for each interaction. In addition, ANAP displays the gene and protein annotation in the generated interactive network with links to The Arabidopsis Information Resource, the AtGenExpress Visualization Tool, the Arabidopsis 1,001 Genomes GBrowse, the Protein Knowledgebase, the Kyoto Encyclopedia of Genes and Genomes, and the Ensembl Genome Browser to significantly aid functional network analysis. 
The tool is available open access at http://gmdd.shgmo.org/Computational-Biology/ANAP.

7.

Background  

Each major protein database uses its own conventions when assigning protein identifiers. Resolving the various, potentially unstable, identifiers that refer to identical proteins is a major challenge. This is a common problem when attempting to unify datasets that have been annotated with proteins from multiple data sources or querying data providers with one flavour of protein identifiers when the source database uses another. Partial solutions for protein identifier mapping exist but they are limited to specific species or techniques and to a very small number of databases. As a result, we have not found a solution that is generic enough and broad enough in mapping scope to suit our needs.

8.
Given the growing wealth of downstream information, the integration of molecular and non-molecular data on a given organism has become a major challenge. For micro-organisms, this information now includes a growing collection of sequenced genes and complete genomes, and for communities of organisms it includes metagenomes. Integration of the data is facilitated by the existence of authoritative, community-recognized, consensus identifiers that may form the heart of so-called information knuckles. The Genomic Standards Consortium (GSC) is building a mapping of identifiers across a group of federated databases with the aim of improving navigation across these resources and enabling the integration of their information in the near future. In particular, this is possible because of the existence of INSDC Genome Project Identifiers (GPIDs) and accession numbers, and the ability of the community to define new consensus identifiers such as the culture identifiers used in the StrainInfo.net bioportal. Here we (1) outline the general design of the Genomic Rosetta Stone project, (2) introduce example linkages between key databases (that cover information about genomes, 16S rRNA gene sequences, and microbial biological resource centers), and (3) make an open call for participation in this project, providing a vision for its future use.

9.
We have isolated a gene encoding an olfactory sensory neuron (OSN)-specific protein in an invertebrate, the land snail Eobania vermiculata (GenBank accession number AY147909). Using in situ hybridization, we detected expression of its mRNA in the dendrite, cell body and axon of OSNs. By neural tracing, using the lipophilic tracer DiI and in situ hybridization, we have revealed the organization of OSNs and their connections with olfactory glomeruli in the land snail. The similarity in sequence and expression pattern between the land snail protein and olfactory marker protein (OMP) from vertebrates suggests that the land snail protein is an OMP-like protein. This protein could represent a plesiomorphic character in the evolution of olfactory proteins.

10.
A major challenge facing biodiversity informatics is integrating data stored in widely distributed databases. Initial efforts have relied on taxonomic names as the shared identifier linking records in different databases. However, taxonomic names have limitations as identifiers, being neither stable nor globally unique, and the pace of molecular taxonomic and phylogenetic research means that a lot of information in public sequence databases is not linked to formal taxonomic names. This review explores the use of other identifiers, such as specimen codes and GenBank accession numbers, to link otherwise disconnected facts in different databases. The structure of these links can also be exploited using the PageRank algorithm to rank the results of searches on biodiversity databases. The key to rich integration is a commitment to deploy and reuse globally unique, shared identifiers [such as Digital Object Identifiers (DOIs) and Life Science Identifiers (LSIDs)], and the implementation of services that link those identifiers.

11.
Our web-based tool simplifies the often laborious procedure of retrieving a set of biosequences listed in a publication or webpage. As a front-end to the Bioperl toolkit, it accepts a list of identifiers as input. They are specified in an ASCII table (copy-pasted from the publication's PDF or HTML page) and give rise to queries in multiple databases for the protein/nucleic acid data specified. Currently, GenBank, PIR (Protein Information Resource) and Swiss-Prot are supported. For any sequence accession code listed, the database can be specified and, if retrieval fails, automatic lookup for the same code in other databases can be requested. Sequence length information (if specified) and heuristic rules are used to drive the lookup if multiple protein coding sequences (CDS) are part of a single accession. Warnings are issued in cases of ambiguities and inconsistencies. An advanced option lets the user choose the output format.

12.
13.
Mayer U. Proteomics, 2008, 8(1):42-44
Proteomic studies often produce sets of hundreds of proteins. Bioinformatic information for these large protein sets must be collected from multiple online resources. Protein Information Crawler (PIC) automatically bulk-collects such data from multiple databases and prediction servers, based on National Center for Biotechnology Information (NCBI) gi numbers or accession numbers, and summarizes them in a Microsoft Excel spreadsheet and/or HTML table. PIC greatly accelerates information procurement, helps to build customized protein information databases and drastically reduces manual database investigation in extensive proteomic studies. Availability: http://www.zoo.uni-heidelberg.de/mfa/PIC.

14.
We present a proximity ligation-based multiplexed protein detection procedure in which several selected proteins can be detected via unique nucleic-acid identifiers and subsequently quantified by real-time PCR. The assay requires a 1-µl sample, has low-femtomolar sensitivity as well as five-log linear range and allows for modular multiplexing without cross-reactivity. The procedure can use a single polyclonal antibody batch for each target protein, simplifying affinity-reagent creation for new biomarker candidates.

15.
We present a novel approach that relies on the affinity capture of protein interaction partners from a complex mixture, followed by their covalent fixation via UV-induced activation of incorporated diazirine photoreactive amino acids (photo-methionine and photo-leucine). The captured protein complexes are enzymatically digested and interacting proteins are identified and quantified by label-free LC/MS analysis. Using HeLa cell lysates with photo-methionine and photo-leucine-labeled proteins, we were able to capture and preserve protein interactions that are otherwise elusive in conventional pull-down experiments. Our approach is exemplified for mapping the protein interaction network of protein kinase D2, but has the potential to be applied to any protein system. Data are available via ProteomeXchange with identifiers PXD005346 (photo amino acid incorporation) and PXD005349 (enrichment experiments).

16.
One of the main goals in proteomics is to answer biological and molecular questions about a set of identified proteins. To achieve this goal, one has to extract and collect the existing biological data from public repositories for every protein and afterward analyze and organize the collected data. Due to the complexity of this task and the huge amount of data available, it is not possible to gather this information by hand, making it necessary to find automatic methods of data collection. Within a proteomic context, we have developed Protein Information and Knowledge Extractor (PIKE), which solves this problem by automatically accessing several public information systems and databases across the Internet. The PIKE bioinformatics tool starts with a set of identified proteins, listed by the accession codes of the most common protein databases, and retrieves all relevant and updated information from the most relevant databases. Once the search is complete, PIKE summarizes the information for every single protein using several file formats that share and exchange the information with other software tools. It is our opinion that PIKE represents a great step forward for information procurement and drastically reduces manual database validation for large proteomic studies. It is available at http://proteo.cnb.csic.es/pike.

17.
Using known substructures in protein model building and crystallography.   Total citations: 56 (self-citations: 11; citations by others: 45)
Retinol binding protein can be constructed from a small number of large substructures taken from three unrelated proteins. The known structures are treated as a knowledge base from which one extracts information to be used in molecular modelling when lacking true atomic resolution. This includes the interpretation of electron density maps and modelling homologous proteins. Models can be built into maps more accurately and more quickly. This requires the use of a skeleton representation for the electron density which improves the determination of the initial chain tracing. Fragment-matching can be used to bridge gaps for inserted residues when modelling homologous proteins.

18.
DisProt: a database of protein disorder   Total citations: 1 (self-citations: 0; citations by others: 1)
The Database of Protein Disorder (DisProt) is a curated database that provides structure and function information about proteins that lack a fixed three-dimensional (3D) structure under putatively native conditions, either in their entirety or in part. Starting from the central premise that intrinsic disorder is an important structural class of protein and in order to meet the increasing interest thereof, DisProt is aimed at becoming a central repository of disorder-related information. For each disordered protein, the database includes the name of the protein, various aliases, accession codes, amino acid sequence, location of the disordered region(s), and methods used for structural (disorder) characterization. If applicable, most entries also list the biological function(s) of each disordered region and how each region of disorder is used for function, and provide links to PubMed abstracts and major protein databases. AVAILABILITY: www.disprot.org

19.
Proteins located in appropriate cellular compartments are of paramount importance to exert their biological functions. Prediction of protein subcellular localization by computational methods is required in the post-genomic era. Recent studies have been focusing on predicting not only single-location proteins but also multi-location proteins. However, most of the existing predictors are far from effective for tackling the challenges of multi-label proteins. This article proposes an efficient multi-label predictor, namely mPLR-Loc, based on penalized logistic regression and adaptive decisions for predicting both single- and multi-location proteins. Specifically, for each query protein, mPLR-Loc exploits the information from the Gene Ontology (GO) database by using its accession number (AC) or the ACs of its homologs obtained via BLAST. The frequencies of GO occurrences are used to construct feature vectors, which are then classified by an adaptive decision-based multi-label penalized logistic regression classifier. Experimental results based on two recent stringent benchmark datasets (virus and plant) show that mPLR-Loc remarkably outperforms existing state-of-the-art multi-label predictors. In addition to being able to rapidly and accurately predict subcellular localization of single- and multi-label proteins, mPLR-Loc can also provide probabilistic confidence scores for the prediction decisions. For readers' convenience, the mPLR-Loc server is available online (http://bioinfo.eie.polyu.edu.hk/mPLRLocServer).
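The two steps named in the abstract, building a feature vector from GO-term frequencies and then making a multi-label decision, might look roughly like the sketch below. The penalized logistic regression itself is omitted, and the thresholding rule in `adaptive_labels` (keep every location scoring within a fraction of the best) is an assumption made for illustration, not the published decision scheme. Function names, the `fraction` parameter and the example scores are hypothetical.

```python
from collections import Counter


def go_frequency_vector(go_hits, vocabulary):
    """Count how often each GO term in `vocabulary` occurs among a protein's hits."""
    counts = Counter(go_hits)
    return [counts[term] for term in vocabulary]


def adaptive_labels(scores, fraction=0.8):
    """Assumed rule: keep every location whose score is within `fraction` of the best."""
    best = max(scores.values())
    return sorted(loc for loc, score in scores.items() if score >= fraction * best)


# GO terms retrieved for a query protein (directly or via its homologs).
vocabulary = ["GO:0005634", "GO:0005737", "GO:0005886"]
hits = ["GO:0005634", "GO:0005737", "GO:0005634"]
vector = go_frequency_vector(hits, vocabulary)

# Made-up per-location classifier scores for the same protein.
scores = {"nucleus": 0.9, "cytoplasm": 0.8, "membrane": 0.2}
labels = adaptive_labels(scores)
```

A relative threshold of this kind lets a protein receive one label when a single score dominates and several when two or more scores are close, which is the behaviour a multi-location predictor needs.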

20.

Background  

Comparison of large protein datasets has become a standard task in bioinformatics. Typically researchers wish to know whether one group of proteins is enriched in certain annotation attributes or sequence properties compared to another group, and whether this enrichment is statistically significant. To conduct such comparisons, it is often necessary to integrate molecular sequence data and experimental information from disparate, incompatible sources. While many specialized programs exist for comparisons of this kind in individual problem domains, such as expression data analysis, no generic software solution capable of addressing a wide spectrum of routine tasks in comparative proteomics is currently available.
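The abstract does not name a statistical test. One common way to ask whether an annotation is over-represented in a protein group is a one-sided hypergeometric tail probability, sketched below under that assumption; the function and parameter names are illustrative.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """P(X >= hits) when `sample` proteins are drawn without replacement from
    `population` proteins, of which `annotated` carry the attribute."""
    total = comb(population, sample)
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers: 10 proteins overall, 4 annotated; a group of 3 contains 2 of them.
p_value = hypergeom_enrichment_p(population=10, annotated=4, sample=3, hits=2)
```

For the toy numbers, 40 of the 120 possible groups of three contain at least two annotated proteins, so the tail probability is one third and the apparent enrichment is not significant.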
