
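A brief introduction to the UniProt protein database
Cite this article: Luo Jingchu. A brief introduction to the UniProt protein database [J]. 生物信息学, 2019, 17(3): 131-144.
Author: Luo Jingchu
Affiliation: College of Life Sciences, Peking University, Beijing 100871
Abstract: UniProt (https://www.uniprot.org/) is an internationally known protein database made up of three parts: the UniProtKB knowledgebase, the UniParc archive, and the UniRef reference sequence clusters. UniProtKB is the core of UniProt and holds extensive annotation in addition to protein sequence data. It is divided into two sub-databases, Swiss-Prot and TrEMBL. The more than 500,000 sequences in Swiss-Prot are all manually reviewed and annotated, whereas the more than 140 million sequences in TrEMBL are translated from protein-coding sequences in the EMBL nucleotide sequence database and annotated computationally according to defined rules. UniParc merges records of the same protein held in different databases into a single record to avoid redundancy and assigns each sequence a unique identifier. UniRef groups the sequences in UniProtKB and UniParc by degree of similarity into three datasets: UniRef100, UniRef90, and UniRef50. The UniProt website offers an efficient, practical advanced search system and extensive help documentation. UniProt publishes a new release every four weeks together with a statistics report, from which users can learn basic information such as data volume and updates, data categories, and species distribution, and can view statistics on general annotation, sequence feature annotation, and database cross-references. UniProt is currently the non-redundant protein sequence database with the most complete sequence data and the richest annotation internationally, and it has been a valuable resource for the life sciences since its creation at the beginning of this century.

Keywords: Database; Protein sequence; Protein function; Database annotation; Database cross-reference; Advanced database search
Received: 2019/3/19
Revised: 2019/4/25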

A brief introduction to UniProt
LUO Jingchu. A brief introduction to UniProt[J]. China Journal of Bioinformation, 2019, 17(3): 131-144.
Authors: LUO Jingchu
Institution: College of Life Sciences, Peking University, Beijing 100871, China
Abstract: The Universal Protein Resource (UniProt, https://www.uniprot.org/) is a well-known protein database consisting of the UniProt Knowledgebase (UniProtKB), the UniProt Archive of unique protein identifiers (UniParc), and the UniProt Reference Clusters (UniRef). UniProtKB is the core of the resource and, in addition to protein sequence data, provides comprehensive annotation. UniProtKB/Swiss-Prot is the manually reviewed and annotated subset of UniProtKB and holds more than 500,000 entries, whereas UniProtKB/TrEMBL contains more than 140 million unreviewed sequences that are translated from coding sequences in the EMBL nucleotide database and annotated computationally according to defined rules. UniParc merges identical sequences stored in UniProtKB and other available protein sequence databases into a single record to avoid redundancy and gives each record a permanent, unique identifier. UniRef clusters UniProtKB sequences and selected UniParc sequences into three sets, UniRef100, UniRef90, and UniRef50, according to their sequence identity. The UniProt website provides an easy-to-use, efficient interface for advanced search, together with extensive help documentation. With each database release, every four weeks, UniProt publishes online statistics that list the number of newly added and updated entries, the sequence types and their taxonomic sources, and counts of general annotations, sequence features, and database cross-references. Since it was established at the beginning of this century, UniProt has served the life science community as the most comprehensive, well-annotated, non-redundant, and freely accessible resource for protein sequence and function.
Keywords: Database; Protein sequence; Protein function; Database annotation; Database cross-reference; Database query
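
The abstract describes UniParc as merging identical sequences held in different databases into a single record with a unique identifier. The idea can be illustrated with a minimal Python sketch. It is not UniParc's actual implementation: the function name `build_archive`, the `ARCH000001`-style identifiers, the use of SHA-256 as the sequence checksum, and the toy database names and accessions are all assumptions made for illustration.

```python
import hashlib


def build_archive(records):
    """Merge identical sequences from several source databases into one record each.

    `records` is an iterable of (source_db, accession, sequence) tuples.
    Returns a dict mapping a hypothetical archive ID to the merged record.
    """
    by_checksum = {}  # checksum of the normalised sequence -> merged record
    for source_db, accession, sequence in records:
        # Normalise so that case and whitespace differences do not split a record.
        normalized = "".join(sequence.split()).upper()
        # Assumption: SHA-256 stands in for whatever checksum a real archive uses.
        checksum = hashlib.sha256(normalized.encode("ascii")).hexdigest()
        if checksum not in by_checksum:
            by_checksum[checksum] = {
                # Hypothetical ID scheme, assigned in order of first appearance.
                "archive_id": f"ARCH{len(by_checksum) + 1:06d}",
                "sequence": normalized,
                "sources": [],
            }
        # Every source that holds this sequence is kept as a cross-reference.
        by_checksum[checksum]["sources"].append((source_db, accession))
    return {rec["archive_id"]: rec for rec in by_checksum.values()}


if __name__ == "__main__":
    # Toy data: database names, accessions, and sequences are invented.
    toy_records = [
        ("DB_A", "A0001", "MKTAYIAKQR"),
        ("DB_B", "B0042", "mktayiakqr"),  # same sequence, different source
        ("DB_A", "A0002", "MKTAYIAKQL"),  # differs in the last residue
    ]
    for archive_id, rec in build_archive(toy_records).items():
        print(archive_id, rec["sequence"], rec["sources"])
```

Keying on a checksum of the full sequence means that only exact matches are merged; grouping sequences that are merely similar, as UniRef90 and UniRef50 do, would need a sequence-identity clustering step that this sketch does not attempt.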
This article is indexed in CNKI and other databases.