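Predicting O-glycosylation Sites by Combining Three Different Types of Features
Authors: Xiang Yan, Chen Yuan, Tan Siqiao, Yuan Zheming
Affiliation:

Hunan Agricultural University, Hunan Agricultural University, Hunan Agricultural University, Hunan Agricultural University

Fund Project:

This work was supported by grants from the Specialized Research Fund for the Doctoral Program of Higher Education (20124320110002), the Natural Science Foundation of Hunan Province, China (14JJ2082) and the Science and Technology Planning Projects of Changsha, China (K1406018-21)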


Abstract:

Glycosylation is a major post-translational modification of proteins. Because no fixed pattern governing O-glycosylation is known, identifying O-glycosylation sites with high accuracy is a challenging problem for machine learning. Using the Steentoft dataset, the largest human O-glycosylation site dataset to date, this paper proposes for the first time a position-based chi-square score difference table feature (χ2-pos) and combines it with the evolutionary information of the pseudo position-specific scoring matrix (PsePSSM) and the undirected composition of k-spaced amino acid pairs (Undirected-CKSAAP) to represent protein sequences. Five support vector machine (SVM) classifiers, each trained on balanced positive and negative samples, were built and combined by weighted voting. On the independent test, the accuracy, Matthews correlation coefficient (MCC) and area under the ROC curve (AUC) reached 89.62%, 0.79 and 0.96, respectively, clearly better than previously reported results. Fusing the three feature types χ2-pos, PsePSSM and Undirected-CKSAAP has broad application prospects in predicting protein modification sites such as glycosylation and phosphorylation sites.
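To make the Undirected-CKSAAP encoding mentioned in the abstract more concrete, here is a minimal Python sketch. It assumes the usual CKSAAP definition (the composition of residue pairs separated by k intervening residues, normalized by the number of pairs at that gap) and assumes that "undirected" means the pairs XY and YX share one bin. The function name `undirected_cksaap`, the default gap range `k_max=3` and the choice to skip non-standard residues are hypothetical illustration choices, not the authors' settings.

```python
from itertools import combinations_with_replacement

# 20 standard amino acids in alphabetical order (needed so sorted pairs match the keys below).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Unordered residue pairs: 20 homo-pairs + 190 hetero-pairs = 210 bins per gap value.
UNDIRECTED_PAIRS = ["".join(p) for p in combinations_with_replacement(AMINO_ACIDS, 2)]
PAIR_INDEX = {p: i for i, p in enumerate(UNDIRECTED_PAIRS)}


def undirected_cksaap(window: str, k_max: int = 3) -> list[float]:
    """Hypothetical Undirected-CKSAAP encoding of one sequence window.

    For each gap k in 0..k_max, count residue pairs (window[i], window[i + k + 1]),
    merging orientations XY and YX into one bin, then divide each count by the
    number of pairs examined at that gap. Pairs containing a non-standard residue
    (e.g. 'X' padding) are skipped but still count toward the denominator.
    """
    features: list[float] = []
    for k in range(k_max + 1):
        counts = [0] * len(UNDIRECTED_PAIRS)
        n_pairs = len(window) - k - 1
        for i in range(max(n_pairs, 0)):
            a, b = window[i], window[i + k + 1]
            if a in AMINO_ACIDS and b in AMINO_ACIDS:
                # Sorting the two residues makes the pair orientation-independent.
                counts[PAIR_INDEX["".join(sorted((a, b)))]] += 1
        features.extend(c / n_pairs if n_pairs > 0 else 0.0 for c in counts)
    return features
```

For example, `undirected_cksaap("PSTAPVSTTPA")` returns 4 × 210 = 840 values, one block of 210 per gap value. Merging the two orientations halves the hetero-pair bins relative to the directed 400-bin CKSAAP, which is one plausible motivation for the undirected variant when training data are limited.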

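Cite this article

Xiang Yan, Chen Yuan, Tan Siqiao, Yuan Zheming. Predicting O-glycosylation sites by combining three different types of features[J]. Progress in Biochemistry and Biophysics, 2016, 43(7): 691-698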

History
  • Received: 2016-01-05
  • Last revised: 2016-05-09
  • Accepted: 2016-05-16
  • Published online: 2016-07-18
  • Published: 2016-07-20