Operon prediction based on an iterative self-learning algorithm

Fund Project:

This work was supported by grants from the National Natural Science Foundation of China (30970667, 30770499, 10721403), the MOST Project of China ("Major New Drug Creation", 2009ZX09501-002), the Excellent Doctoral Dissertation Supervisor Project of Beijing (YB20101000102), and the National Basic Research Program of China (973 Program) (2011CB707500).

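    Abstract:

    Accurate annotation of operon structures in prokaryotes is important for studying gene function and gene regulatory networks, and computational prediction with bioinformatic methods is currently the main source of genome-wide operon annotation. Most current prediction algorithms require experimentally confirmed operons as a training set, but the scarcity of experimentally confirmed operon data has long been a bottleneck for algorithm development. Building on current knowledge of operon structure, we constructed a probabilistic model of the complex structure of operons from features including intergenic distance, regulatory signals related to transcription and translation, and COG functional annotation. We then proposed an iterative self-learning algorithm that does not depend on operon data from a specific species as a training set. Tests against experimentally validated operon datasets show that the algorithm is highly effective at predicting operon structure. Without relying on any known operon information, the algorithm outperforms the best current operon prediction methods in overall prediction accuracy, and this self-learning algorithm is superior to algorithms trained on a specific species. These properties make the algorithm applicable to newly sequenced species, which distinguishes it from commonly used operon prediction methods. A large-scale comparative analysis of bacterial and archaeal genomes further improves our understanding of both the general features and the species-specific features of genomic operon structure.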
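
The probabilistic model combines three kinds of features: intergenic distance, transcription- or translation-related regulatory signals, and COG functional annotation. The minimal Python sketch below is not the authors' model, but it shows how such features can be combined for one pair of adjacent genes on the same strand. The pair gets a log-likelihood ratio between "same operon" (`op`) and "operon boundary" (`bd`), with the features assumed to be conditionally independent given the class (a naive Bayes simplification). The bin edges, the probability tables, and the names `log_likelihood_ratio` and `posterior_same_operon` are hypothetical placeholders. They are not values or functions from the paper.

```python
import math

# Hypothetical class-conditional tables (illustrative numbers, not from the paper).
# "op": the two adjacent genes lie in the same operon; "bd": an operon boundary.
DIST_BINS = [(-math.inf, 0), (0, 50), (50, 150), (150, math.inf)]  # bp, (lo, hi]
P_DIST = {"op": [0.30, 0.45, 0.20, 0.05], "bd": [0.05, 0.15, 0.30, 0.50]}
P_TERMINATOR = {"op": 0.05, "bd": 0.40}  # P(terminator-like signal in the gap | class)
P_SHARED_COG = {"op": 0.60, "bd": 0.15}  # P(both genes share a COG category | class)


def dist_bin(distance):
    """Return the index of the intergenic-distance bin containing `distance` (bp)."""
    for i, (lo, hi) in enumerate(DIST_BINS):
        if lo < distance <= hi:
            return i
    raise ValueError(f"distance out of range: {distance}")


def bernoulli(p, observed):
    """Probability of a binary feature value under success probability p."""
    return p if observed else 1.0 - p


def log_likelihood_ratio(distance, has_terminator, shares_cog):
    """log P(features | op) - log P(features | bd), assuming the features are
    conditionally independent given the class (naive Bayes assumption)."""
    b = dist_bin(distance)
    score = math.log(P_DIST["op"][b] / P_DIST["bd"][b])
    score += math.log(bernoulli(P_TERMINATOR["op"], has_terminator)
                      / bernoulli(P_TERMINATOR["bd"], has_terminator))
    score += math.log(bernoulli(P_SHARED_COG["op"], shares_cog)
                      / bernoulli(P_SHARED_COG["bd"], shares_cog))
    return score


def posterior_same_operon(llr, prior_op=0.5):
    """Convert a log-likelihood ratio into P(op | features) given prior P(op)."""
    log_odds = llr + math.log(prior_op / (1.0 - prior_op))
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    # Overlapping pair, no terminator, shared COG category: high posterior for "op".
    print(posterior_same_operon(log_likelihood_ratio(-4, False, True)))
    # Distant pair with a terminator and no shared COG category: low posterior.
    print(posterior_same_operon(log_likelihood_ratio(300, True, False)))
```

Adding log-ratios keeps each feature's contribution separately visible. A richer model could replace any single term without touching the others.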

    Abstract:

    As a specific functional organization of genes in prokaryotic genomes, an operon consists of a set of adjacent genes controlled by a common set of regulatory signals and transcribed as a single unit. Genes in the same operon tend to have related functions or to belong to the same cellular pathway, so the study of operon structure is important for understanding gene function and regulatory networks in prokaryotes. However, experimental approaches such as prokaryotic transcriptomics still yield only limited operon data. Computational annotation of operons in newly sequenced genomes has therefore been the major source of operon data so far and will remain an important task. Over the past decade, a number of computational approaches to operon prediction have been proposed, but most rely on experimentally verified operons as training sets, and the scarcity of such data has been the bottleneck of operon prediction. The authors propose an iterative self-learning algorithm that does not require a training set of known operons. The algorithm is built on a probabilistic model that uses features including intergenic distance, regulatory signals of gene expression, and functional annotation such as COG. Tests against experimentally verified operon data indicate that, without any training set, the algorithm achieves higher overall accuracy than the best existing methods. Moreover, this self-learning algorithm outperforms algorithms trained on the known operons of any particular species. Accordingly, the algorithm can be applied to any newly sequenced genome. Finally, a comparative analysis of bacterial and archaeal genomes improves understanding of both the universal and the genome-specific features of operons.
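
A generic self-training loop can illustrate the idea of iterative self-learning without a training set of known operons. The sketch below is an assumption-laden illustration, not the published algorithm. It uses only intergenic distance, hypothetical bin edges (0, 50 and 150 bp) and a hypothetical seed cutoff of 40 bp to produce initial labels. It then alternates between two steps:

1. Re-estimate the per-class distance distributions from the current labels.
2. Relabel each gene pair by the larger posterior.

It stops when the labels no longer change. The names `to_bin`, `estimate` and `self_learn` are illustrative only, and the input distances are toy data.

```python
from collections import Counter

# Hypothetical bin edges (bp) for intergenic distance; they define 4 bins:
# <=0 (overlap), (0, 50], (50, 150], >150.
BIN_EDGES = [0, 50, 150]
N_BINS = len(BIN_EDGES) + 1


def to_bin(distance):
    """Map an intergenic distance in bp to a bin index."""
    for i, edge in enumerate(BIN_EDGES):
        if distance <= edge:
            return i
    return len(BIN_EDGES)


def estimate(bins, labels, pseudo=1.0):
    """Estimate per-class bin probabilities (Laplace-smoothed) and class priors
    from the current labels (True = same operon, False = boundary)."""
    probs, priors = {}, {}
    for cls in (True, False):
        counts = Counter(b for b, lab in zip(bins, labels) if lab == cls)
        total = sum(counts.values())
        probs[cls] = [(counts[i] + pseudo) / (total + pseudo * N_BINS)
                      for i in range(N_BINS)]
        priors[cls] = (total + pseudo) / (len(labels) + 2 * pseudo)
    return probs, priors


def self_learn(distances, seed_cutoff=40, max_iter=20):
    """Self-training on one genome's gene pairs; no known operons are used."""
    bins = [to_bin(d) for d in distances]
    # Seed guess: short gaps are assumed to lie inside operons (hypothetical cutoff).
    labels = [d <= seed_cutoff for d in distances]
    for _ in range(max_iter):
        probs, priors = estimate(bins, labels)
        # Relabel each pair by the class with the larger prior * likelihood.
        new_labels = [priors[True] * probs[True][b] > priors[False] * probs[False][b]
                      for b in bins]
        if new_labels == labels:  # converged: labels no longer change
            break
        labels = new_labels
    return labels


if __name__ == "__main__":
    gaps = [-4, -1, 10, 20, 30, 45, 60, 120, 200, 250, 300, 400]  # toy data
    print(self_learn(gaps))
```

The class-conditional distributions are re-estimated from the genome's own gene pairs, so the seed cutoff only needs to be roughly right. In this toy run, the 45 bp pair starts on the "boundary" side of the cutoff. After the first re-estimation it moves into the "same operon" class.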

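Cite this article:

WU Wenqi, ZHENG Xiaobin, LIU Yongchu, TANG Kai, ZHU Huaiqiu. Operon prediction based on an iterative self-learning algorithm[J]. Progress in Biochemistry and Biophysics, 2011, 38(7): 642-651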

History
  • Received: 2010-12-28
  • Last revised: 2011-01-31
  • Published online: 2011-04-29
  • Published: 2011-07-20