Basic Strategies and Challenges in Metaproteomic Informatics Analysis
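Cite this article: XU Hong-Kai, YAN Ke-Qiang, HE Yan-Bin, WEN Bo, YANG Huan-Ming, LIU Si-Qi. Basic strategies and challenges in metaproteomic informatics analysis[J]. Progress in Biochemistry and Biophysics, 2018, 45(1): 23-35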
Authors: XU Hong-Kai, YAN Ke-Qiang, HE Yan-Bin, WEN Bo, YANG Huan-Ming, LIU Si-Qi
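Affiliations: XU Hong-Kai and YAN Ke-Qiang: BGI-Shenzhen, Shenzhen 518083; Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101. HE Yan-Bin and WEN Bo: BGI-Shenzhen, Shenzhen 518083. YANG Huan-Ming: BGI-Shenzhen, Shenzhen 518083; James D. Watson Institute of Genome Sciences, Hangzhou 310008. LIU Si-Qi: BGI-Shenzhen, Shenzhen 518083; Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101; James D. Watson Institute of Genome Sciences, Hangzhou 310008.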
Funding: Supported by the National Basic Research Program of China (973 Program) (2014CBA02002, 2014CBA02005)
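Abstract: Metaproteomics is an emerging science that uses mass spectrometry to collect protein information at scale from natural microbial communities and, combined with other omics data, studies the genetic characteristics and biological functions of those communities. Informatics analysis in metaproteomics differs substantially from conventional proteomics and calls for new analytical approaches. Because the objects of study are microbial samples of extremely high complexity, the search database must cover, as completely as possible, the genomic information of the microorganisms present in the sample. With such a large database, both the computational resources consumed by the analysis and the quality-control criteria for identifications must be taken into account, so parameters such as database size, search strategy, and false-positive control need to be carefully optimized. Because complex homologous protein sequences are widespread in metaproteomic data, proteins can be assigned to species effectively only by matching against the taxonomic information in the NCBI databases and filtering the results with the lowest common ancestor (LCA) algorithm. Focusing on metaproteomic informatics, this review covers database construction, protein grouping, and the mining of biological meaning, and discusses the current state of the field, the challenges it faces, and future research directions.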
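The LCA step mentioned above can be illustrated with a minimal sketch. It assumes the taxonomy is available as a child-to-parent map, in the style of NCBI's nodes.dmp, where the root (taxid 1) is its own parent; the toy taxonomy, its taxids, and the function names `lineage` and `lowest_common_ancestor` are hypothetical and are not taken from the article. A peptide shared by proteins from several taxa is assigned to the deepest taxon common to all of their lineages.

```python
from typing import Dict, Iterable, List

# Hypothetical toy taxonomy: child taxid -> parent taxid. The root maps to
# itself, as in NCBI's nodes.dmp; all other IDs and labels are illustrative.
TOY_PARENT: Dict[int, int] = {
    1: 1,      # root
    2: 1,      # Bacteria
    10: 2,     # phylum A
    20: 10,    # genus X
    21: 10,    # genus Y
    30: 2,     # phylum B
    100: 20,   # species X1
    101: 20,   # species X2
    102: 21,   # species Y1
    200: 30,   # species B1
}


def lineage(taxid: int, parent: Dict[int, int]) -> List[int]:
    """Return the path from the root down to `taxid` (inclusive)."""
    path = [taxid]
    while parent[taxid] != taxid:  # stop once we reach the self-parented root
        taxid = parent[taxid]
        path.append(taxid)
    return path[::-1]


def lowest_common_ancestor(taxids: Iterable[int], parent: Dict[int, int]) -> int:
    """Deepest taxon shared by the lineages of all given taxids."""
    paths = [lineage(t, parent) for t in taxids]
    if not paths:
        raise ValueError("need at least one taxid")
    lca = paths[0][0]  # every lineage starts at the root
    # Walk down the lineages level by level while they still agree.
    for level in zip(*paths):
        if all(node == level[0] for node in level):
            lca = level[0]
        else:
            break
    return lca


if __name__ == "__main__":
    # A peptide found in species X1 and X2 is assigned to genus X (20);
    # one found in X1 and Y1 can only be assigned to phylum A (10).
    print(lowest_common_ancestor([100, 101], TOY_PARENT))
    print(lowest_common_ancestor([100, 102], TOY_PARENT))
```

The design choice here is the conservative one: the more widely a peptide is shared across homologous proteins, the higher (less specific) the taxon it is assigned to, which is why homology-rich metaproteomic data often yields assignments above the species level.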

Keywords: metaproteomics, database, data analysis, protein grouping, species analysis
Received: 2017-05-22
Revised: 2017-10-16

The Strategies and Challenges in Metaproteomics Bioinformatics
XU Hong-Kai, YAN Ke-Qiang, HE Yan-Bin, WEN Bo, YANG Huan-Ming and LIU Si-Qi. The Strategies and Challenges in Metaproteomics Bioinformatics[J]. Progress in Biochemistry and Biophysics, 2018, 45(1): 23-35
Authors: XU Hong-Kai, YAN Ke-Qiang, HE Yan-Bin, WEN Bo, YANG Huan-Ming, LIU Si-Qi
Affiliation: XU Hong-Kai and YAN Ke-Qiang: BGI-Shenzhen, Shenzhen 518083, China; Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China. HE Yan-Bin and WEN Bo: BGI-Shenzhen, Shenzhen 518083, China. YANG Huan-Ming: BGI-Shenzhen, Shenzhen 518083, China; James D. Watson Institute of Genome Sciences, Hangzhou 310008, China. LIU Si-Qi: BGI-Shenzhen, Shenzhen 518083, China; Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; James D. Watson Institute of Genome Sciences, Hangzhou 310008, China.
Abstract: Metaproteomics is a new frontier of microbiology that uses mass spectrometry to collect proteomic data from microbes in their natural environments and explores the underlying genetic and biochemical mechanisms through systematic bioinformatics. In contrast to conventional proteomics, metaproteomic informatics requires new strategies for algorithms, databases, and searching. Because metaproteomic samples generally contain highly complex protein mixtures, identifying peptides from mass spectrometry signals requires searching a large database that covers all potentially present microbial genomes, and such searches are very time-consuming. Factors such as database size, search strategy, and false-positive control therefore have to be evaluated carefully to achieve protein identification with acceptable accuracy and efficiency. Moreover, beyond the protein inference (sequence merging) common to all proteomic informatics, metaproteomics must deal with extensive sequence homology and with grouping proteins by species. Solving these problems relies on making effective use of the public taxonomic information from NCBI for species classification, and on filtering sequence-to-species assignments with the lowest common ancestor (LCA) algorithm. Here we briefly introduce the field: the basic informatics strategies of metaproteomics, the major challenges in metaproteomic informatics, and how these technical difficulties may be addressed in the future.
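The abstract names false-positive control as a key parameter but does not specify the method. A widely used approach in database searching is target-decoy estimation of the false discovery rate (FDR); the sketch below shows that technique under stated assumptions and is not presented as the authors' procedure. It assumes each peptide-spectrum match (PSM) carries a single score (higher is better) and a flag saying whether it hit a decoy (for example reversed) sequence; the `PSM` class and `filter_at_fdr` function are hypothetical names, and tied scores are not treated specially.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PSM:
    score: float    # search-engine score; higher means a better match (assumed)
    is_decoy: bool  # True if the match is to a decoy sequence


def filter_at_fdr(psms: List[PSM], max_fdr: float = 0.01) -> List[PSM]:
    """Return target PSMs whose q-value is <= max_fdr.

    At each score threshold the FDR is estimated as
    (#decoys above threshold) / (#targets above threshold); each PSM's
    q-value is the minimum estimated FDR at which it would be accepted.
    """
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    fdrs: List[float] = []
    targets = decoys = 0
    for p in ranked:
        if p.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # Convert FDR estimates to q-values: running minimum from the bottom up.
    qvals = [0.0] * len(fdrs)
    running = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running = min(running, fdrs[i])
        qvals[i] = running
    return [p for p, q in zip(ranked, qvals) if not p.is_decoy and q <= max_fdr]


if __name__ == "__main__":
    demo = [PSM(10, False), PSM(9, False), PSM(8, False), PSM(7, False),
            PSM(6, True), PSM(5, False), PSM(4, True), PSM(3, False)]
    print([p.score for p in filter_at_fdr(demo, 0.01)])
```

With a very large metaproteomic database, more random matches reach high scores, so the same FDR cutoff tends to admit fewer true identifications; this is one reason database size and false-positive control have to be tuned together.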
Keywords: metaproteomics, database, data analysis, protein inference, species
This article is indexed in CNKI and other databases.