Analysis of the tryptic search space in UniProt databases
Authors: Emanuele Alpi, Johannes Griss, Alan Wilter Sousa da Silva, Benoit Bely, Ricardo Antunes, Hermann Zellner, Daniel Ríos, Claire O'Donovan, Juan Antonio Vizcaíno, Maria J. Martin
Affiliation: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
Abstract: In this article, we provide a comprehensive study of the content of the Universal Protein Resource (UniProt) protein data sets for human and mouse. The tryptic search spaces of the UniProtKB (UniProt Knowledgebase) complete proteome sets were compared with other data sets from UniProtKB and with the corresponding organism-specific data sets from the International Protein Index (IPI), Reference Sequence (RefSeq), Ensembl, and UniRef100 (UniRef: UniProt Reference Clusters). All protein forms annotated in UniProtKB, both canonical sequences and isoforms, were evaluated in this study. In addition, natural and disease-associated amino acid variants annotated in UniProtKB were included in the evaluation. Peptide uniqueness was also evaluated for each data set. Furthermore, the peptide information in the UniProtKB data sets was compared against the peptide-level identifications available in the main MS-based proteomics repositories. Peptides observed in these repositories are an important source of information for protein databases, because they provide supporting evidence for the existence of proteins that would otherwise only be predicted. Likewise, the repositories could use the information available in UniProtKB to direct reprocessing efforts towards specific sets of peptides/proteins of interest. In summary, we provide comprehensive information about the different organism-specific sequence data sets available from UniProt, together with the pros and cons of each in terms of search space for MS-based bottom-up proteomics workflows. The aim of the analysis is to provide a clear view of the tryptic search space of UniProt and other protein databases, so that scientists can select the databases most appropriate for their purposes.
Keywords: Bioinformatics, Protein isoforms, Sequence redundancy, Trypsin digestion, Variation
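To make the notions of a tryptic search space and peptide uniqueness concrete, here is a minimal Python sketch; it is not the authors' pipeline. It assumes a simplified trypsin rule (cleave C-terminal to K or R, except when the next residue is P), a hypothetical minimum peptide length of 6 residues and, by default, no missed cleavages; the actual digestion settings used in the study are not stated in this abstract. The function names and the toy entries `protA`/`protB` are hypothetical. Each protein form (canonical sequence, isoform or variant sequence) is treated as a separate entry, so a peptide counts as unique only if exactly one entry yields it.

```python
import re
from collections import defaultdict

# Simplified trypsin rule (assumption): cut after K or R unless the next residue is P.
# Splitting on this zero-width pattern requires Python 3.7 or later.
CLEAVAGE_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_peptides(sequence, missed_cleavages=0, min_length=6):
    """Return the set of in-silico tryptic peptides of one protein sequence."""
    # Drop the empty string produced when the sequence ends in K or R.
    fragments = [f for f in CLEAVAGE_SITE.split(sequence) if f]
    peptides = set()
    for start in range(len(fragments)):
        # Join up to `missed_cleavages` extra fragments to model missed cleavages.
        stop = min(start + missed_cleavages + 1, len(fragments))
        for end in range(start + 1, stop + 1):
            peptide = "".join(fragments[start:end])
            if len(peptide) >= min_length:
                peptides.add(peptide)
    return peptides


def peptide_uniqueness(proteins, **digest_options):
    """Map each peptide to the entries producing it; also return the unique peptides."""
    owners = defaultdict(set)
    for entry_id, sequence in proteins.items():
        for peptide in tryptic_peptides(sequence, **digest_options):
            owners[peptide].add(entry_id)
    unique = {p for p, ids in owners.items() if len(ids) == 1}
    return owners, unique


def search_space(proteins, **digest_options):
    """The tryptic search space of a data set: every distinct peptide it yields."""
    owners, _ = peptide_uniqueness(proteins, **digest_options)
    return set(owners)


# Hypothetical toy entries, not real UniProtKB sequences.
PROTEINS = {
    "protA": "AAAAAAKGGGGGGRPLLLLLLK",  # the R before P is not cleaved
    "protB": "AAAAAAKVVVVVVR",
}

if __name__ == "__main__":
    owners, unique = peptide_uniqueness(PROTEINS)
    print(sorted(unique))  # ['GGGGGGRPLLLLLLK', 'VVVVVVR']
    print(sorted(owners["AAAAAAK"]))  # ['protA', 'protB']
```

Under these assumptions, two data sets can be compared with ordinary set operations, for example `search_space(set_a) & search_space(set_b)` for the peptides they share; adding isoforms or variant sequences as extra entries enlarges the search space and can turn previously unique peptides into shared ones.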