Similar Articles
1.
Babnigg G, Giometti CS. Proteomics, 2006, 6(16): 4514-4522
In proteome studies, identification of proteins requires searching protein sequence databases. The public protein sequence databases (e.g., NCBInr, UniProt) each contain millions of entries, and private databases add thousands more. Although much of the sequence information in these databases is redundant, each database uses distinct identifiers for the identical protein sequence and often contains unique annotation information. Users of one database obtain a database-specific sequence identifier that is often difficult to reconcile with the identifiers from a different database. When multiple databases are used for searches or the databases being searched are updated frequently, interpreting the protein identifications and associated annotations can be problematic. We have developed a database of unique protein sequence identifiers called Sequence Globally Unique Identifiers (SEGUID) derived from primary protein sequences. These identifiers serve as a common link between multiple sequence databases and are resilient to annotation changes in either public or private databases throughout the lifetime of a given protein sequence. The SEGUID Database can be downloaded (http://bioinformatics.anl.gov/SEGUID/) or easily generated at any site with access to primary protein sequence databases. Since SEGUIDs are stable, predictions based on the primary sequence information (e.g., pI, Mr) can be calculated just once; we have generated approximately 500 different calculations for more than 2.5 million sequences. SEGUIDs are used to integrate MS and 2-DE data with bioinformatics information and provide the opportunity to search multiple protein sequence databases, thereby providing a higher probability of finding the most valid protein identifications.
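The abstract says SEGUIDs are derived from the primary sequence but does not spell out the rule. The definition commonly cited for SEGUID (and used, for example, by Biopython's `Bio.SeqUtils.CheckSum.seguid`) is the SHA-1 digest of the upper-cased sequence, base64-encoded, with the trailing `=` padding removed. The sketch below assumes that definition and shows how records from different databases collapse onto one identifier; the accession strings are hypothetical.

```python
import base64
import hashlib


def seguid(sequence: str) -> str:
    """SEGUID of a sequence, assuming the SHA-1 / base64 / strip-'=' definition."""
    digest = hashlib.sha1(sequence.upper().encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


# Hypothetical records: the same sequence under two different accessions,
# plus one sequence that differs by a single residue.
records = {
    "db_a|P00001": "MKTAYIAKQR",
    "db_b|XP_123": "mktayiakqr",
    "db_c|Q99999": "MKTAYIAKQK",
}

# Group accessions by SEGUID so identical sequences share one key.
by_seguid: dict[str, list[str]] = {}
for accession, seq in records.items():
    by_seguid.setdefault(seguid(seq), []).append(accession)
```

Because the key depends only on the sequence, anything computed from the sequence alone (pI, Mr) can be cached once per SEGUID and reused across databases and database releases.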

2.
Protein sequence databases
A variety of protein sequence databases exist, ranging from simple sequence repositories, which store data with little or no manual intervention in the creation of the records, to expertly curated universal databases that cover all species and in which the original sequence data are enhanced by the manual addition of further information in each sequence record. As the focus of researchers moves from the genome to the proteins encoded by it, these databases will play an even more important role as central comprehensive resources of protein information. Several of the leading protein sequence databases are discussed here, with special emphasis on the databases now provided by the Universal Protein Knowledgebase (UniProt) consortium.

3.
With the explosive growth of biological data, new means of data storage were needed. More and more often, biological information is no longer published in the conventional way via a publication in a scientific journal, but only deposited into a database. In the last two decades these databases have become essential tools for researchers in the biological sciences. Biological databases can be classified according to the type of information they contain. There are basically three types of sequence-related databases (nucleic acid sequences, protein sequences and protein tertiary structures) as well as various specialized data collections. It is important to provide the users of biomolecular databases with a degree of integration between these databases, as by nature all of these databases are connected in a scientific sense and each one of them is an important piece of the puzzle of biological complexity. In this review we will highlight our effort in connecting biological information as demonstrated in the SWISS-PROT protein database.

4.
MetaFam is a comprehensive relational database of protein family information. This web-accessible resource integrates data from several primary sequence and secondary protein family databases. By pooling together the information from these disparate sources, MetaFam is able to provide the most complete protein family sets available. Users are able to explore the interrelationships among these primary and secondary databases using a powerful graphical visualization tool, MetaFamView. Additionally, users can identify corresponding sequence entries among the sequence databases, obtain a quick summary of corresponding families (and their sequence members) among the family databases, and even attempt to classify their own unassigned sequences. Hypertext links to the appropriate source databases are provided at every level of navigation. Global family database statistics and information are also provided. Public access to the data is available at http://metafam.ahc.umn.edu/.

5.
The bioinformatics software Geneious provides a useful platform for researchers to retrieve and analyse genomic and functional genomics information. However, the main databases that the software is able to access are hosted by NCBI (National Center for Biotechnology Information). The databases of EuPathDB (Eukaryotic Pathogen Database Resources), such as PlasmoDB and PiroplasmaDB, collect more specific and detailed information about eukaryotic pathogens than the NCBI databases do. Two plugins for Geneious, one for PlasmoDB and one for PiroplasmaDB, were developed. Once they are installed, users can use search facilities to find and import gene and protein sequences from the EuPathDB databases. Users can then use the functions of Geneious to process the sequence information. When information unique to PlasmoDB and PiroplasmaDB is required, the user can access results linked with the gene/protein sequence via the default web browser. The plugins are freely available from the Victorian Bioinformatics Consortium website. The plugins can be modified to access any of the databases of EuPathDB.

6.
Proteomics and the study of protein–protein interactions are becoming increasingly important in our effort to understand human diseases on a system-wide level. Thanks to the development and curation of protein-interaction databases, up-to-date information on these interaction networks is accessible and publicly available to the scientific community. As our knowledge of protein–protein interactions increases, it is important to give thought to the different ways that these resources can impact biomedical research. In this article, we highlight the importance of protein–protein interactions in human genetics and genetic epidemiology. Since protein–protein interactions demonstrate one of the strongest functional relationships between genes, combining genomic data with available proteomic data may provide us with a more in-depth understanding of common human diseases. In this review, we will discuss some of the fundamentals of protein interactions, the databases that are publicly available and how information from these databases can be used to facilitate genome-wide genetic studies.

7.
Comprehensive, computerized databases of cellular protein information derived from the analysis of two-dimensional gels, together with recently developed techniques to microsequence proteins, offer a new dimension to the study of genome organization and function. In particular, human protein databases provide an ideal framework in which to focus the human genome sequencing effort.

8.
Thermodynamic data regarding proteins and their interactions are important for understanding the mechanisms of protein folding, protein stability, and molecular recognition. Although there are several structural databases available for proteins and their complexes with other molecules, databases for experimental thermodynamic data on protein stability and interactions are rather scarce. Thus, we have developed two electronically accessible thermodynamic databases. ProTherm, Thermodynamic Database for Proteins and Mutants, contains numerical data of several thermodynamic parameters of protein stability, experimental methods and conditions, along with structural, functional, and literature information. ProNIT, Thermodynamic Database for Protein-Nucleic Acid Interactions, contains thermodynamic data for protein-nucleic acid binding, experimental conditions, structural information of proteins, nucleic acids and the complex, and literature information. These data have been incorporated into 3DinSight, an integrated database for structure, function, and properties of biomolecules. A WWW interface allows users to search for data based on various conditions, with different display and sorting options, and to visualize molecular structures and their interactions. These thermodynamic databases, together with structural databases, help researchers gain insight into the relationship among structure, function, and thermodynamics of proteins and their interactions, and will become useful resources for studying proteins in the postgenomic era.
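Binding data of the kind ProNIT collects are reported either as dissociation constants or as free energies. The standard relation linking them is ΔG° = RT ln(Kd / c°) with c° = 1 M. The sketch below applies that textbook relation; it is not a description of how ProNIT itself derives or stores its values.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy (kJ/mol) from a dissociation constant in molar.

    Uses dG = RT ln(Kd / c0) with the standard concentration c0 = 1 M,
    so a smaller Kd (tighter binding) gives a more negative dG.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R * temperature_k * math.log(kd_molar) / 1000.0
```

For example, a 1 nM complex at 25 °C corresponds to roughly -51.4 kJ/mol (about -12.3 kcal/mol).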

9.
10.
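Studying the structure and affinity of protein-ligand interactions not only helps us understand protein function but is also of great significance for drug development and for research on the mechanisms of drug action. Through manual and semi-automatic retrieval from the literature and from the Protein Data Bank (PDB), a large amount of protein-ligand affinity information and biologically relevant ligand information has been obtained, and many databases of protein-ligand interaction information have been built. This article introduces 3 protein-ligand affinity databases and 6 databases on the biological relevance of ligands in protein crystal structures, briefly describes their main applications, and aims to support efficient and accurate drug screening and design.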

11.
Nature's strategies for evolving catalytic functions can be deciphered from the information contained in the rapidly expanding protein sequence databases. However, the functions of many proteins in the protein sequence and structure databases are either uncertain (too divergent to assign function based on homology) or unknown (no homologs), thereby limiting the utility of the databases. The mechanistically diverse enolase superfamily is a paradigm for understanding the structural bases for evolution of enzymatic function. We describe strategies for assigning functions to members of the enolase superfamily that should be applicable to other superfamilies.

12.
Integration of pathway and protein-protein interaction (PPI) data can provide more information that could lead to new biological insights. PPIs are usually represented by a simple binary model, whereas pathways are represented by more complicated models. We developed a series of rules for transforming protein interactions from the pathway model to the binary model, and the protein interactions from seven pathway databases, including PID, BioCarta, Reactome, NetPath, INOH, SPIKE and KEGG, were transformed based on these rules. These pathway-derived binary protein interactions were integrated with PPIs from five other PPI databases, HPRD, IntAct, BioGRID, MINT and DIP, to develop an integrated dataset (named PathPPI). More detailed information on interaction types and modifications can be preserved in PathPPI than in other existing datasets. Comparative analysis indicates that most of the interaction overlap values (OAB) among these pathway databases were less than 5%, so these databases must be used in conjunction. The PathPPI data are available at http://proteomeview.hupo.org.cn/PathPPI/PathPPI.html.
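The abstract does not list the transformation rules or the formula for OAB. As a rough illustration only, the sketch below assumes two things: a complex is expanded into binary interactions by pairing every member with every other member (the so-called matrix model), and overlap is measured as |A ∩ B| / |A ∪ B|. Interactions are stored as unordered pairs so that A-B and B-A count as the same interaction. The protein names are hypothetical.

```python
from itertools import combinations


def complex_to_binary(members: list[str]) -> set[frozenset[str]]:
    """Expand a complex into undirected binary interactions (assumed matrix model).

    Duplicate members are collapsed, so homodimer self-pairs are not produced.
    """
    return {frozenset(pair) for pair in combinations(sorted(set(members)), 2)}


def overlap(a: set[frozenset[str]], b: set[frozenset[str]]) -> float:
    """Assumed overlap measure |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical interaction sets from a pathway source and a PPI database.
pathway_ppis = complex_to_binary(["A", "B", "C"])  # A-B, A-C, B-C
ppi_db = {frozenset(("B", "A")), frozenset(("C", "D"))}
print(overlap(pathway_ppis, ppi_db))  # 1 shared pair out of 4 distinct pairs -> 0.25
```

Using `frozenset` pairs makes the comparison independent of the order in which each source lists the two partners, which is the main normalization step needed before sets from different databases can be compared at all.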

13.
The delineation of domain boundaries of a given sequence in the absence of known 3D structures or detectable sequence homology to known domains benefits many areas in protein science, such as protein engineering, protein 3D structure determination and protein structure prediction. With the exponential growth of newly determined sequences, our ability to predict domain boundaries rapidly and accurately from sequence information alone is both essential and critical from the viewpoint of gene function annotation. Anyone attempting to predict domain boundaries for a single protein sequence is invariably confronted with a plethora of databases that contain boundary information available from the internet and a variety of methods for domain boundary prediction. How are these derived and how well do they work? What definition of 'domain' do they use? We will first clarify the different definitions of protein domains, and then describe the available public databases with domain boundary information. Finally, we will review existing domain boundary prediction methods and discuss their strengths and weaknesses.
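To ask "how well do they work?", predicted boundaries are usually compared with reference boundaries while allowing some positional slack. The sketch below is one simple way to score that, under stated assumptions: boundaries are single residue positions, each reference boundary may be matched by at most one prediction (chosen greedily, not by optimal assignment), and the ±20-residue tolerance is an illustrative default rather than a value taken from the review.

```python
def boundary_hits(predicted: list[int], true: list[int],
                  tolerance: int = 20) -> tuple[float, float]:
    """Precision and recall of predicted domain boundaries.

    A reference boundary counts as found if the closest still-unused
    prediction lies within +/- tolerance residues (greedy one-to-one matching).
    """
    unused = sorted(predicted)
    matched = 0
    for t in sorted(true):
        # Closest prediction not yet assigned to another reference boundary.
        best = min(unused, key=lambda p: abs(p - t), default=None)
        if best is not None and abs(best - t) <= tolerance:
            unused.remove(best)
            matched += 1
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(true) if true else 0.0
    return precision, recall
```

For instance, predictions at residues 95, 210 and 330 against reference boundaries at 100 and 250 give precision 1/3 and recall 1/2: only the boundary at 100 is recovered within the window.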

14.
Mayer U. Proteomics, 2008, 8(1): 42-44
Proteomic studies often produce sets of hundreds of proteins. Bioinformatic information for these large protein sets must be collected from multiple online resources. Protein Information Crawler (PIC) automatically bulk-collects such data from multiple databases and prediction servers, based on National Center for Biotechnology Information (NCBI) gi numbers or accession numbers, and summarizes them in a Microsoft Excel spreadsheet and/or HTML table. PIC greatly accelerates information procurement, helps to build customized protein information databases and drastically reduces manual database investigation in extensive proteomic studies. Availability: http://www.zoo.uni-heidelberg.de/mfa/PIC.
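The summarizing step described above (one row per protein, one column per source) can be sketched without any network access. In the sketch below, the per-source dictionaries stand in for whatever each database or prediction server would return; the source names, accession strings and values are all hypothetical placeholders, and CSV is used as a spreadsheet-readable stand-in for PIC's Excel output.

```python
import csv
import io

# Hypothetical per-source results keyed by accession. A real crawler would
# fill each dict by querying the corresponding database or server.
sources = {
    "description": {"ACC001": "example protein 1", "ACC002": "example protein 2"},
    "length": {"ACC001": 142, "ACC002": 147},
    "signal_peptide": {"ACC001": "no"},
}


def summarize(accessions: list[str], by_source: dict[str, dict]) -> str:
    """Return CSV text with one row per accession and one column per source."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["accession", *by_source])
    for acc in accessions:
        # Missing values are left blank so gaps between sources stay visible.
        writer.writerow([acc, *(by_source[name].get(acc, "") for name in by_source)])
    return buffer.getvalue()


table = summarize(["ACC001", "ACC002"], sources)
```

Keeping the input list of accessions separate from the per-source lookups means a protein that one source does not know about still gets a row, which is usually what one wants when auditing large result sets.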

15.
Analysis of cellular protein patterns by computer-aided 2-dimensional gel electrophoresis together with recent advances in protein sequence analysis have made possible the establishment of comprehensive 2-dimensional gel protein databases that may link protein and DNA information and that offer a global approach to the study of the cell. Using the integrated approach offered by 2-dimensional gel protein databases it is now possible to reveal phenotype specific protein (or proteins), to microsequence them, to search for homology with previously identified proteins, to clone the cDNAs, to assign partial protein sequence to genes for which the full DNA sequence and the chromosome location is known, and to study the regulatory properties and function of groups of proteins that are coordinately expressed in a given biological process. Human 2-dimensional gel protein databases are becoming increasingly important in view of the concerted effort to map and sequence the entire genome.

16.
UniProt archive     
UniProt Archive (UniParc) is the most comprehensive, non-redundant protein sequence database available. Its protein sequences are retrieved from predominant, publicly accessible resources. All new and updated protein sequences are collected and loaded daily into UniParc for full coverage. To avoid redundancy, each unique sequence is stored only once with a stable protein identifier, which can be used later in UniParc to identify the same protein in all source databases. When proteins are loaded into the database, database cross-references are created to link them to the origins of the sequences. As a result, performing a sequence search against UniParc is equivalent to performing the same search against all databases cross-referenced by UniParc. UniParc contains only protein sequences and database cross-references; all other information must be retrieved from the source databases.
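The storage model described above (each unique sequence stored once, with a stable identifier and a growing list of cross-references) can be illustrated with a small in-memory sketch. Assumptions: sequences are keyed by a SHA-1 digest of the upper-cased sequence, and identifiers are "UPI" plus a zero-padded hexadecimal counter, chosen only to resemble the look of UniParc IDs; the real UniParc assignment and checksum scheme are not described in the abstract. Database names and accessions are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ArchiveEntry:
    identifier: str
    sequence: str
    xrefs: list[tuple[str, str]] = field(default_factory=list)  # (database, accession)


class SequenceArchive:
    """Toy non-redundant archive: each unique sequence is stored exactly once."""

    def __init__(self) -> None:
        self._by_digest: dict[str, ArchiveEntry] = {}

    @staticmethod
    def _digest(sequence: str) -> str:
        return hashlib.sha1(sequence.upper().encode("ascii")).hexdigest()

    def load(self, database: str, accession: str, sequence: str) -> str:
        """Store a sequence (if new), record its source, and return its stable ID."""
        digest = self._digest(sequence)
        entry = self._by_digest.get(digest)
        if entry is None:
            # First time this exact sequence is seen: assign the next identifier.
            entry = ArchiveEntry(f"UPI{len(self._by_digest) + 1:010X}", sequence.upper())
            self._by_digest[digest] = entry
        if (database, accession) not in entry.xrefs:
            entry.xrefs.append((database, accession))
        return entry.identifier

    def sources_of(self, sequence: str) -> list[tuple[str, str]]:
        """All (database, accession) pairs that contain this exact sequence."""
        entry = self._by_digest.get(self._digest(sequence))
        return list(entry.xrefs) if entry else []


archive = SequenceArchive()
archive.load("DB1", "X1", "MKTAYIAKQR")
archive.load("DB2", "Y7", "MKTAYIAKQR")  # same sequence: no new entry, new xref
archive.load("DB1", "X2", "MKTAYIAKQK")
```

This is why a single search against the archive can stand in for searches against every source: a hit on one stored sequence resolves, through `sources_of`, to every database record carrying that sequence.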

17.

Background  

Frequently, several alternative names are in use for biological objects such as genes and proteins. Applications like manual literature search, automated text-mining, named entity identification, gene/protein annotation, and linking of knowledge from different information sources require the knowledge of all used names referring to a given gene or protein. Various organism-specific or general public databases aim at organizing knowledge about genes and proteins. These databases can be used for deriving gene and protein name dictionaries. So far, little is known about the differences between databases in terms of size, ambiguities and overlap.
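The three properties the abstract names (size, ambiguity and overlap) are straightforward to compute once each database has been turned into a name dictionary. The sketch below assumes a very simple normalization (trim and lower-case) and treats a name as ambiguous when it maps to more than one identifier; the records are hypothetical, and real dictionaries would need more careful normalization (punctuation, Greek letters, spacing).

```python
from collections import defaultdict


def build_dictionary(records: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each normalized name to the set of gene/protein identifiers using it."""
    names: dict[str, set[str]] = defaultdict(set)
    for identifier, synonyms in records.items():
        for name in synonyms:
            names[name.strip().lower()].add(identifier)
    return dict(names)


def ambiguous_names(dictionary: dict[str, set[str]]) -> list[str]:
    """Names that refer to more than one identifier, sorted alphabetically."""
    return sorted(name for name, ids in dictionary.items() if len(ids) > 1)


def name_overlap(d1: dict[str, set[str]], d2: dict[str, set[str]]) -> int:
    """Number of normalized names present in both dictionaries."""
    return len(d1.keys() & d2.keys())


# Hypothetical records from two databases.
db_a = build_dictionary({"id1": ["ABC1", "alpha protein"],
                         "id2": ["ABC1", "beta protein"],
                         "id3": ["XYZ"]})
db_b = build_dictionary({"b1": ["abc1"], "b2": ["gamma protein"]})
```

Here `db_a` has four distinct names, one of which ("abc1") is ambiguous, and the two dictionaries share exactly one name.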

18.
Databases containing proteomic information have become indispensable for virology studies. As the gap between the amount of sequence information and functional characterization widens, increasing efforts are being directed to the development of databases. For virologists, it is therefore desirable to have a single data collection point that integrates research-related data from different domains. CHPVDB is our effort to provide virologists with such a one-step information center. We describe herein the creation of CHPVDB, a new database that integrates information on different proteins into a single resource. For basic curation of protein information, the database relies on features from other selected databases, servers and published reports. This database facilitates the study of relationships among molecular analyses, cleavage sites, and the possible protein functional families assigned to different proteins of Chandipura virus (CHPV) by SVMProt and related tools.

19.
One of the main goals in proteomics is to solve biological and molecular questions regarding a set of identified proteins. In order to achieve this goal, one has to extract and collect the existing biological data from public repositories for every protein and afterward analyze and organize the collected data. Due to the complexity of this task and the huge amount of data available, it is not possible to gather this information by hand, making it necessary to find automatic methods of data collection. Within a proteomic context, we have developed Protein Information and Knowledge Extractor (PIKE), which solves this problem by automatically accessing several public information systems and databases across the Internet. The PIKE bioinformatics tool starts with a set of identified proteins, listed by their accession codes in the most common protein databases, and retrieves all relevant and up-to-date information from the most relevant databases. Once the search is complete, PIKE summarizes the information for every single protein in several file formats that can be shared and exchanged with other software tools. It is our opinion that PIKE represents a great step forward for information procurement and drastically reduces manual database validation for large proteomic studies. It is available at http://proteo.cnb.csic.es/pike.

20.
Public sequence databases contain information on the sequence, structure and function of proteins. Genome sequencing projects have led to a rapid increase in protein sequence information, but reliable, experimentally verified, information on protein function lags a long way behind. To address this deficit, functional annotation in protein databases is often inferred by sequence similarity to homologous, annotated proteins, with the attendant possibility of error. Now, the functional annotation in these homologous proteins may itself have been acquired through sequence similarity to yet other proteins, and it is generally not possible to determine how the functional annotation of any given protein has been acquired. Thus the possibility of chains of misannotation arises, a process we term 'error percolation'. With some simple assumptions, we develop a dynamical probabilistic model for these misannotation chains. By exploring the consequences of the model for annotation quality it is evident that this iterative approach leads to a systematic deterioration of database quality.
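The abstract does not give the authors' model, so the sketch below is a deliberately simple toy model of our own, not theirs, that still shows why copying annotations between entries drives quality down. Assumptions: the database starts with only verified (correct) entries; at each step new entries copy the annotation of a uniformly random existing entry; copying a correct annotation introduces an error with probability `error_rate`; and a wrong annotation, once copied, stays wrong. The function tracks the expected fraction of correct annotations.

```python
def annotation_accuracy(n_initial: int, added_per_step: int,
                        error_rate: float, steps: int) -> list[float]:
    """Expected fraction of correct annotations after each step of a toy model.

    Each new entry copies the annotation of a random existing entry. It is
    correct only if the source was correct and no error occurred in transfer.
    """
    if n_initial <= 0:
        raise ValueError("the database must start with at least one entry")
    total = float(n_initial)
    correct = float(n_initial)
    history = [correct / total]
    for _ in range(steps):
        fraction = correct / total
        correct += added_per_step * fraction * (1.0 - error_rate)
        total += added_per_step
        history.append(correct / total)
    return history


trajectory = annotation_accuracy(n_initial=100, added_per_step=10,
                                 error_rate=0.05, steps=50)
```

Writing f for the correct fraction, T for the database size and m for the number of entries added per step, each step gives f' = f (T + m(1 - e)) / (T + m). For any error rate e > 0 this factor is below 1, so in this toy setting accuracy can only decline as similarity-based annotations accumulate.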
