Uncovering Lung Cancer Risk Pathogenic Genes With Expanded Initial Node and Weighted Fusion Strategy
Author:
Affiliation:

School of Automation, Northwestern Polytechnical University

Fund Project:

This work was supported by grants from the National Natural Science Foundation of China (91430111, 61473232, 61170134), the National Natural Science Foundation of China Youth Fund Project (61502396), and the Sichuan Provincial Collaborative Innovation Center for Internet Financial Innovation and Supervision.

    Abstract:

    Predicting risk pathogenic genes for lung cancer helps to clarify disease pathogenesis and improve clinical treatment. Existing prediction algorithms built on the random walk with restart (RWR) framework commonly suffer from three problems: few initial nodes, identical node transition probabilities, and a single information source. To address these problems, we propose a risk pathogenic gene prediction algorithm based on expanded initial nodes and a weighted fusion strategy, named AFMFSC, and validate it on lung cancer. First, following the idea of augmented fuzzy measure similarity, AFMFSC computes augmented functional similarity scores between genes of similar disease phenotypes and selects the important genes, which together with the known pathogenic genes serve as the expanded initial nodes. Second, it runs RWR on the global protein-protein interaction (PPI) network twice, once guided by a node similarity transition matrix built from PPI network topological similarity and once guided by a correlation transition matrix built from gene expression differences, and ranks all genes in the network by a weighted fusion of the two results. Finally, enrichment analysis of the top-ranked genes yields the significant risk pathogenic genes. AFMFSC predicts 73 risk pathogenic genes for lung cancer, all closely linked to the occurrence and development of the disease. Compared with other ranking algorithms, AFMFSC achieves larger Top 1%, Top 5% and AUC values, a smaller average rank, and less sensitivity to the bias introduced by topological properties such as the degree distribution. In addition, the fusion strategy ranks better than a walk guided by either single transition matrix or by the ordinary adjacency matrix. AFMFSC not only predicts lung cancer risk pathogenic genes accurately and effectively, but can also be extended to risk pathogenic genes of other diseases, offering new perspectives and evidence for exploring the pathogenesis of cancer.
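The walk-and-fuse step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the restart probability of 0.7, the fusion weight `alpha = 0.5`, and the two toy 5-node weight matrices are all assumptions made for illustration. The sketch also does not reproduce how the paper builds its topological-similarity and expression-correlation transition matrices, nor how it expands the seed set with augmented fuzzy measure similarity; here the seed set is simply given.

```python
import numpy as np

def column_normalize(weights):
    """Scale each column to sum to 1 so the matrix can act as a transition matrix."""
    weights = np.asarray(weights, dtype=float)
    col_sums = weights.sum(axis=0, keepdims=True)
    col_sums[col_sums == 0.0] = 1.0  # isolated nodes keep an all-zero column
    return weights / col_sums

def random_walk_with_restart(transition, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until the L1 change falls below tol."""
    p0 = np.zeros(transition.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)  # equal restart mass on every seed node
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (transition @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

def fused_ranking(w_topology, w_expression, seeds, alpha=0.5, restart=0.7):
    """Run one walk per weight matrix, then rank non-seed nodes by the weighted sum."""
    p_topo = random_walk_with_restart(column_normalize(w_topology), seeds, restart)
    p_expr = random_walk_with_restart(column_normalize(w_expression), seeds, restart)
    fused = alpha * p_topo + (1.0 - alpha) * p_expr
    seed_set = set(seeds)
    candidates = [int(i) for i in np.argsort(-fused) if int(i) not in seed_set]
    return fused, candidates

# Toy network (assumed): a 5-node path 0-1-2-3-4 standing in for a PPI network.
w_topology = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

# Same edges, reweighted by made-up expression-correlation values.
w_expression = np.array([
    [0.0, 0.9, 0.0, 0.0, 0.0],
    [0.9, 0.0, 0.4, 0.0, 0.0],
    [0.0, 0.4, 0.0, 0.8, 0.0],
    [0.0, 0.0, 0.8, 0.0, 0.3],
    [0.0, 0.0, 0.0, 0.3, 0.0],
])

# Node 0 plays a known pathogenic gene, node 1 an added (expanded) seed.
seeds = [0, 1]
fused, candidates = fused_ranking(w_topology, w_expression, seeds)
print(candidates)
```

Because both matrices are column-normalized and the restart vector sums to 1, each walk yields a probability distribution over nodes, so the fused scores are directly comparable across the two information sources. Changing `alpha` shifts the ranking toward one source or the other.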

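Cite this article

Wang Yibin (王一斌), Cheng Yongmei (程咏梅), Zhang Shaowu (张绍武). Uncovering Lung Cancer Risk Pathogenic Genes With Expanded Initial Node and Weighted Fusion Strategy [J]. Progress in Biochemistry and Biophysics (生物化学与生物物理进展), 2016, 43(2): 176-186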

History
  • Received: 2015-12-07
  • Last revised: 2015-12-23
  • Accepted: 2016-01-14
  • Published online: 2016-02-19
  • Publication date: 2016-02-20