Hunan Agricultural University (all four authors)
This work was supported by the Specialized Research Fund for the Doctoral Program of Higher Education (20124320110002), the Natural Science Foundation of Hunan Province, China (14JJ2082), and the Science and Technology Planning Project of Changsha, China (K1406018-21).
Glycosylation is a major post-translational modification of proteins. Because no fixed sequence pattern for O-linked glycosylation is known, predicting O-glycosylation sites with high accuracy remains a challenging machine-learning problem. Using the Steentoft dataset, the largest collection of human O-glycosylation sites to date, this paper proposes for the first time a position-based chi-square difference table feature (χ2-pos) and fuses it with the pseudo position-specific scoring matrix (PsePSSM), which carries sequence evolutionary information, and the undirected composition of k-spaced amino acid pairs (Undirected-CKSAAP) to represent protein sequences. Five support vector machine classifiers were built, each trained on balanced positive and negative samples, and their outputs were combined by weighted voting. On an independent test, the accuracy, Matthews correlation coefficient, and area under the ROC curve reached 89.62%, 0.79, and 0.96, respectively, clearly better than previously reported results. The fusion of χ2-pos, PsePSSM, and Undirected-CKSAAP has broad application prospects for predicting protein modification sites such as glycosylation and phosphorylation sites.
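The abstract does not give the exact feature definitions or the voting weights, so the following is only a minimal sketch under stated assumptions. It assumes that "undirected" means the pairs AB and BA are counted as one, giving 210 unordered pairs per gap k instead of 400, and that weighted voting means a weight-normalized average of binary votes with a 0.5 threshold. The function names, the `k_max` parameter, and the weights are hypothetical and are not taken from the paper.

```python
from itertools import combinations_with_replacement

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, alphabetical
# 210 unordered pairs: AA, AC, ..., YY (assumed meaning of "undirected")
PAIRS = ["".join(p) for p in combinations_with_replacement(AMINO_ACIDS, 2)]
PAIR_INDEX = {pair: i for i, pair in enumerate(PAIRS)}


def undirected_cksaap(window, k_max=2):
    """Return pair frequencies for gaps k = 0..k_max, treating AB and BA alike."""
    features = []
    for k in range(k_max + 1):
        counts = [0] * len(PAIRS)
        total = 0
        for i in range(len(window) - k - 1):
            # Sort the two residues so that AB and BA share one key.
            key = "".join(sorted((window[i], window[i + k + 1])))
            if key in PAIR_INDEX:  # skip padding or non-standard residues
                counts[PAIR_INDEX[key]] += 1
                total += 1
        features.extend(c / total if total else 0.0 for c in counts)
    return features


def weighted_vote(votes, weights, threshold=0.5):
    """Combine binary votes (0 or 1) from several classifiers.

    Assumption: each weight reflects how much a classifier is trusted,
    for example its validation accuracy.
    """
    score = sum(v * w for v, w in zip(votes, weights)) / sum(weights)
    return 1 if score >= threshold else 0
```

With `k_max=2` a sequence window yields 3 × 210 = 630 values. In a full pipeline these would be concatenated with the χ2-pos and PsePSSM features before being passed to each of the five classifiers.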
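Xiang Yan (向妍), Chen Yuan (陈渊), Tan Siqiao (谭泗桥), Yuan Zheming (袁哲明). Prediction of O-glycosylation sites based on the fusion of three types of features [J]. Progress in Biochemistry and Biophysics (生物化学与生物物理进展), 2016, 43(7): 691-698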