Prediction of RNA m6A Methylation Sites in Multiple Tissues Based on Dual-branch Residual Network
Affiliation:

1) School of Mathematics and Computer Science, Yunnan Minzu University, Kunming 650504, China; 2) Yunnan Key Laboratory of Statistical Modeling and Data Analysis, Yunnan University, Kunming 650500, China

Fund Project:

This work was supported by grants from the National Natural Science Foundation of China (12361104), Yunnan Fundamental Research Projects (202301AT070016, 202401AT070036), the Youth Talent Program of Xingdian Talent Support Plan (XDYC-QNRC-2022-0514), the Yunnan Province International Joint Laboratory for Intelligent Integration and Application of Ethnic Multilingualism (202403AP140014), and the Open Research Fund of Yunnan Key Laboratory of Statistical Modeling and Data Analysis, Yunnan University (SMDAYB2023004).

Abstract:

Objective N6-methyladenosine (m6A), the most prevalent epigenetic modification in eukaryotic RNA, plays a pivotal role in regulating cellular differentiation and developmental processes, and its dysregulation is implicated in diverse pathological conditions. Accurate prediction of m6A sites is critical for elucidating their regulatory mechanisms and informing drug development. However, traditional experimental methods are time-consuming and costly. Although various computational approaches have been proposed, challenges remain in feature learning, predictive accuracy, and generalization. Here, we present m6A-PSRA, a dual-branch residual-network-based predictor designed to make full use of RNA sequence information and thereby improve prediction accuracy and model generalization.

Methods m6A-PSRA adopts a parallel dual-branch network architecture that extracts RNA sequence features via two independent pathways. The first branch applies one-hot encoding to transform the RNA sequence into a numerical matrix while strictly preserving positional information and sequence continuity, so that the biological context conveyed by nucleotide order is retained. A bidirectional long short-term memory network (BiLSTM) then processes the encoded matrix, capturing both forward and backward dependencies between bases to resolve contextual correlations. The second branch employs a k-mer tokenization strategy (k=3), decomposing the sequence into overlapping 3-mer subsequences to capture local sequence patterns. A pre-trained Doc2vec model maps these subsequences into fixed-dimensional vectors, reducing feature dimensionality while extracting latent global semantic information via context learning.
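
The two input encodings described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names `one_hot_encode` and `kmer_tokens`, the A/C/G/U column order, and the all-zero row for unknown bases are assumptions made here. Only the use of one-hot encoding and of overlapping k-mers with k=3 comes from the abstract, and the BiLSTM and Doc2vec stages are left out.

```python
# Illustrative sketch only; names and conventions are assumptions,
# not taken from the m6A-PSRA source code.

BASES = "ACGU"  # assumed column order for the one-hot matrix


def one_hot_encode(seq):
    """Return an L x 4 matrix (list of rows) that preserves base order."""
    rows = []
    for base in seq.upper().replace("T", "U"):
        # Unknown symbols such as N map to an all-zero row (assumption).
        rows.append([1 if base == b else 0 for b in BASES])
    return rows


def kmer_tokens(seq, k=3):
    """Split a sequence into overlapping k-mers with stride 1."""
    seq = seq.upper().replace("T", "U")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    example = "GGACU"  # hypothetical 5-nt fragment, for illustration only
    print(one_hot_encode(example))
    print(kmer_tokens(example))
```

For a sequence of length L this gives an L x 4 matrix for the first branch and L - 2 overlapping 3-mers for the second; the 3-mer list is the kind of token sequence a document-embedding model such as Doc2vec would take as input.
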
Both branches integrate residual networks (ResNet) and a self-attention mechanism: ResNet mitigates vanishing gradients through skip connections, preserving feature integrity, while self-attention adaptively assigns weights to focus on the sequence regions most relevant to methylation prediction. Together, they strengthen both feature learning and generalization.

Results Across 11 tissues from humans, mice, and rats, m6A-PSRA consistently outperformed existing methods in accuracy (ACC) and area under the curve (AUC), with ACC above 90% and AUC above 95% in every tissue tested, indicating strong cross-species and cross-tissue adaptability. Validation on independent datasets, comprising three human cell lines (MOLM1, HEK293, A549) and a long-sequence dataset (m6A_IND, 1,001 nt), confirmed stable performance across varied biological contexts and sequence lengths. Ablation studies demonstrated that the dual-branch architecture, residual network, and self-attention mechanism each contribute critically to performance, with their combination reducing interference between pathways. Motif analysis revealed guanine (G) and cytosine (C) enrichment at m6A sites, consistent with known regulatory patterns, supporting the model’s biological plausibility.

Conclusion m6A-PSRA effectively captures RNA sequence features, achieving high prediction accuracy and robust generalization across tissues and species, and provides an efficient computational tool for m6A methylation site prediction.
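
The two metrics reported in the Results, ACC and AUC, can be computed along the lines of the sketch below. It is a generic illustration rather than the authors' evaluation code: the function names and the 0.5 decision threshold are assumptions, since the abstract does not state them, and the labels and scores in the demo are made up and are not results from the paper.

```python
# Illustrative sketch; function names and the 0.5 threshold are assumptions.


def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the 0/1 label."""
    correct = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return correct / len(labels)


def auc(labels, scores):
    """AUC as the probability that a positive outscores a negative.

    Ties count as one half, as in the Mann-Whitney U formulation.
    This pairwise version is O(P * N) and meant for small examples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up labels (1 = m6A site) and predicted probabilities.
    demo_labels = [1, 1, 0, 0]
    demo_scores = [0.9, 0.4, 0.6, 0.1]
    print(accuracy(demo_labels, demo_scores))
    print(auc(demo_labels, demo_scores))
```

ACC depends on the chosen threshold, whereas AUC depends only on how the scores rank positives against negatives, which is one reason the two are commonly reported together.
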

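Cite this article

Guo Xiaotian, Gao Wei, Chen Dan, Li Huimin, Tan Xuewen (郭晓甜, 高伟, 陈丹, 李慧敏, 谭学文). Prediction of RNA m6A Methylation Sites in Multiple Tissues Based on Dual-branch Residual Network[J]. Progress in Biochemistry and Biophysics, 2025, 52(11): 2900-2915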

History
  • Received: 2025-04-17
  • Last revised: 2025-09-27
  • Accepted: 2025-08-11
  • Published online: 2025-08-12
  • Publication date: 2025-11-28