The Strategies and Challenges in Metaproteomics Bioinformatics
Authors: Xu Hongkai, Yan Keqiang, He Yanbin, Wen Bo, Yang Huanming, Liu Siqi
Affiliations:

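  • BGI Research; Beijing Institute of Genomics, Chinese Academy of Sciences
  • BGI Research; Beijing Institute of Genomics, Chinese Academy of Sciences
  • BGI Research
  • BGI Research
  • BGI Research; James D. Watson Institute of Genome Sciences
  • BGI Research; Beijing Institute of Genomics, Chinese Academy of Sciences; James D. Watson Institute of Genome Sciences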

Fund project:

Supported by the National Basic Research Program of China (973 Program) (2014CBA02002, 2014CBA02005)



    Abstract (Chinese version):

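    Metaproteomics is an emerging science that uses mass spectrometry to collect protein information at scale from natural microbial communities and, combined with other omics data, studies the genetic characteristics and biological functions of those communities. Bioinformatic analysis in metaproteomics differs considerably from that in traditional proteomics, and new analytical approaches are urgently needed. Because metaproteomics studies microbial samples of extremely high complexity, it requires a species database that covers, as far as possible, the genomes of all microbes present in the sample. With such a large database, both the computing resources consumed during analysis and the quality-control criteria for identifications must be considered, so parameters such as database size, search strategy and false-positive control need to be highly optimized. Because complex homologous protein sequences are widespread in metaproteomic data, proteins can be grouped to species effectively only by making full use of the taxonomic information in the NCBI database for matching and by filtering the results with the LCA (lowest common ancestor) algorithm. Focusing on metaproteomic bioinformatics, this review covers database construction, protein grouping and the mining of biological meaning, and discusses the current state of the field, the challenges it faces and future research directions.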
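The LCA filtering step described above can be illustrated with a minimal Python sketch. It assumes a hypothetical toy taxonomy stored as a child-to-parent map (a simplified stand-in for the NCBI Taxonomy tree, with intermediate ranks omitted) and hypothetical peptide-to-taxon hits. A peptide found in proteins from several taxa is assigned to their lowest common ancestor rather than to any single species. This shows the general technique only, not the authors' pipeline.

```python
from typing import Dict, Iterable, List

# Hypothetical toy taxonomy (child -> parent), a simplified stand-in for the
# NCBI Taxonomy tree; intermediate ranks are omitted. The root is its own parent.
PARENT: Dict[str, str] = {
    "root": "root",
    "Bacteria": "root",
    "Enterobacteriaceae": "Bacteria",
    "Escherichia": "Enterobacteriaceae",
    "Escherichia coli": "Escherichia",
    "Salmonella": "Enterobacteriaceae",
    "Salmonella enterica": "Salmonella",
    "Bacteroides": "Bacteria",
    "Bacteroides fragilis": "Bacteroides",
}


def lineage(taxon: str, parent: Dict[str, str]) -> List[str]:
    """Path from `taxon` up to the root, inclusive (most specific first)."""
    path = [taxon]
    while parent[taxon] != taxon:
        taxon = parent[taxon]
        path.append(taxon)
    return path


def lowest_common_ancestor(taxa: Iterable[str], parent: Dict[str, str]) -> str:
    """Deepest taxon that lies on the lineage of every input taxon."""
    taxa = list(taxa)
    if not taxa:
        raise ValueError("at least one taxon is required")
    first = lineage(taxa[0], parent)
    shared = set(first)
    for taxon in taxa[1:]:
        shared &= set(lineage(taxon, parent))
    # Walking up from taxa[0], the first node shared by all lineages is the LCA.
    for node in first:
        if node in shared:
            return node
    raise RuntimeError("taxa do not share a root")


def assign_peptides(hits: Dict[str, List[str]], parent: Dict[str, str]) -> Dict[str, str]:
    """Assign each peptide to the LCA of all taxa whose proteins contain it."""
    return {pep: lowest_common_ancestor(taxa, parent) for pep, taxa in hits.items()}


if __name__ == "__main__":
    # Hypothetical search result: peptide -> taxa of the proteins it matched.
    hits = {
        "PEPTIDEA": ["Escherichia coli"],
        "PEPTIDEB": ["Escherichia coli", "Salmonella enterica"],
        "PEPTIDEC": ["Escherichia coli", "Bacteroides fragilis"],
    }
    for peptide, taxon in assign_peptides(hits, PARENT).items():
        print(peptide, "->", taxon)
```

Here only `PEPTIDEA` is assigned at species level; the shared peptides fall back to a family (`Enterobacteriaceae`) or a domain (`Bacteria`). Peptides like these carry no species-specific evidence, which is why such filtering matters before proteins are attributed to species.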

    Abstract:

    Metaproteomics is a new frontier of microbiology that uses mass spectrometry to collect proteomic data from microbes in nature and explores the corresponding genetic and biochemical mechanisms with systematic bioinformatics. In contrast to traditional proteomics, metaproteomic informatics requires new strategies for algorithms, databases and database searching. Because metaproteomic samples generally contain highly complex protein mixtures, identifying peptides from mass spectra usually requires a large database covering all potentially present microbial genomes, and searching such a database is very time-consuming. Database size, search strategy and false-positive control therefore have to be evaluated carefully to identify proteins with acceptable accuracy and efficiency. In addition to the protein grouping common to all proteomic informatics, metaproteomics has to deal with extensive sequence homology and with assigning proteins to species. Solving these problems relies on effective use of the taxonomic information in NCBI for species classification, and on filtering sequence-to-species assignments with the LCA (lowest common ancestor) algorithm. Herein, we briefly introduce the field: the basic informatics strategies of metaproteomics, the main challenges in metaproteomic informatics, and how these technical difficulties may be solved in the future.
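False-positive control in database searching is commonly done with a target-decoy strategy. The abstract does not say which method the authors use, so the sketch below simply assumes that approach. Each peptide-spectrum match (PSM) is a hypothetical `(score, is_decoy)` pair, and the FDR at a score cutoff is estimated as decoys divided by targets at or above the cutoff, which is one common estimator among several.

```python
from typing import List, Tuple

# A peptide-spectrum match (PSM): (search score, matched a decoy sequence?).
PSM = Tuple[float, bool]


def fdr_score_cutoff(psms: List[PSM], max_fdr: float = 0.01) -> float:
    """Lowest score cutoff whose estimated FDR (decoys / targets at or above
    the cutoff) is <= max_fdr. Returns inf if no cutoff qualifies."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    cutoff = float("inf")
    for i, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Evaluate only after the last PSM of a run of tied scores, so the
        # counts really cover "all PSMs with score >= this score".
        end_of_ties = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if end_of_ties and targets > 0 and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


def accepted_targets(psms: List[PSM], cutoff: float) -> List[float]:
    """Scores of the target PSMs kept at the given cutoff."""
    return [score for score, is_decoy in psms if not is_decoy and score >= cutoff]


if __name__ == "__main__":
    # Hypothetical scores from a concatenated target-decoy search.
    psms = [(95, False), (90, False), (88, False), (85, True),
            (80, False), (70, True), (60, False), (55, True)]
    for fdr in (0.01, 0.25):
        cut = fdr_score_cutoff(psms, fdr)
        print(f"FDR <= {fdr}: cutoff {cut}, {len(accepted_targets(psms, cut))} targets")
```

A stricter FDR raises the cutoff and fewer identifications survive (here 3 targets at 0.01 versus 4 at 0.25). When the database grows to cover every microbe plausibly present in a sample, high-scoring random matches tend to become more frequent, which is one reason database size and false-positive control have to be tuned together.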

Cite this article:

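Xu Hongkai, Yan Keqiang, He Yanbin, Wen Bo, Yang Huanming, Liu Siqi. The Strategies and Challenges in Metaproteomics Bioinformatics[J]. Progress in Biochemistry and Biophysics, 2018, 45(1): 23-35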

History
  • Received: 2017-05-22
  • Last revised: 2017-10-16
  • Accepted: 2017-10-20
  • Published online: 2018-01-16
  • Published: 2018-01-20