1. Cancer Research Institute, Central South University; 2. Institutes of Brain Science, Fudan University, Shanghai; 3. Department of Otolaryngology-Head and Neck Surgery, Xiangya Hospital, Central South University
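Abstract (Chinese version) Objective Transfer RNA-derived fragments (tRFs) are a novel class of non-coding RNAs 13 to 50 nt in length. They have recently been found to bind target mRNAs, promoting mRNA degradation and inhibiting protein translation, and thereby to perform specific biological functions in cells. However, bioinformatic methods for predicting their target genes remain limited. To study the role of these molecules in gene regulatory networks, it is important to develop a more comprehensive algorithm for predicting tRF targets. Methods We used a multilayer perceptron (MLP) neural network, a deep learning algorithm, trained on 38,687 confirmed tRF:mRNA binding pairs. The model learned the known features of tRF-target binding, and we expanded the range of tRFs and targets covered. Results We named this neural-network-based tRF target prediction algorithm tRF Prospect. It achieved an AUC of 0.934 (93.4%) and showed good predictive performance when compared with transcriptome sequencing data from cells overexpressing tRFs, confirming its high accuracy. Conclusion tRF Prospect predicts the target mRNAs through which tRFs may exert their biological functions. It complements existing tRF target prediction platforms in the types of tRFs covered, the range of targets, and the prediction model, providing a basis for studying new roles of tRFs in living organisms.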
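The abstract describes an MLP that scores each tRF:mRNA pair from the seven binding features listed in the Methods. The sketch below shows only the inference step of such a network, assuming one ReLU hidden layer and a sigmoid output unit; the layer sizes, the feature order in `FEATURE_NAMES`, the names `mlp_score`, `W1`, `B1`, `W2`, `B2`, and all weight values are hypothetical placeholders, not tRF Prospect's architecture or trained parameters.

```python
import math
from typing import List, Sequence

# Hypothetical feature order, following the seven features listed in the Methods.
FEATURE_NAMES = [
    "au_content", "site_pairing", "region_location", "n_sites",
    "longest_stretch", "binding_length", "seed_complementarity",
]


def relu(x: float) -> float:
    return max(0.0, x)


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dense(inputs: Sequence[float], weights: Sequence[Sequence[float]],
          biases: Sequence[float]) -> List[float]:
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def mlp_score(features: Sequence[float], w1, b1, w2, b2) -> float:
    """Score one tRF:mRNA pair with a one-hidden-layer MLP (assumed shape).

    The hidden layer uses ReLU; the single output unit uses a sigmoid, so the
    result lies in [0, 1] and can be read as an interaction probability.
    """
    if len(features) != len(FEATURE_NAMES):
        raise ValueError(f"expected {len(FEATURE_NAMES)} features")
    hidden = [relu(h) for h in dense(features, w1, b1)]
    logit = dense(hidden, w2, b2)[0]
    return sigmoid(logit)


# Placeholder parameters (7 inputs -> 3 hidden units -> 1 output); a real
# model would learn these from labelled tRF:mRNA pairs.
W1 = [[0.2] * 7, [-0.1] * 7, [0.05 * i for i in range(7)]]
B1 = [0.0, 0.1, -0.2]
W2 = [[1.0, -0.5, 0.8]]
B2 = [-0.3]

if __name__ == "__main__":
    x = [0.6, 1.0, 1.0, 2.0, 0.7, 0.8, 1.0]  # one pre-scaled feature vector
    print(f"interaction score: {mlp_score(x, W1, B1, W2, B2):.3f}")
```

Training (for example by backpropagation on the labelled pairs) is omitted here; only the forward pass that turns a feature vector into a score is shown.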
Abstract Objective Transfer RNA-derived fragments (tRFs) are a recently characterized and rapidly expanding class of small non-coding RNAs, typically 13 to 50 nucleotides long. They are generated from mature or precursor tRNAs by specific cleavage events and have been implicated in a wide range of cellular processes. Increasing evidence indicates that tRFs regulate gene expression, primarily by binding target messenger RNAs (mRNAs) and inducing their degradation, in a manner partially analogous to microRNAs (miRNAs). Despite their emerging biological relevance and potential roles in disease, few computational tools can systematically predict the interactions between tRFs and their target mRNAs. Existing databases often rely on a limited set of interaction features and cannot accommodate novel or user-defined tRF sequences. The goal of this study was to develop a machine-learning-based prediction algorithm for high-throughput, accurate identification of tRF:mRNA binding events, thereby facilitating functional analysis of tRF regulatory networks. Methods We first assembled a manually curated dataset of 38,687 experimentally verified tRF:mRNA interaction pairs and extracted seven biologically informed features for each pair: (1) AU content of the binding site, (2) site pairing status, (3) location of the binding region, (4) number of binding sites per mRNA, (5) length of the longest consecutive complementary stretch, (6) total length of the binding region, and (7) complementarity of the seed sequence. Using this dataset and feature set, we trained four machine learning classifiers (logistic regression, random forest, decision tree, and a multilayer perceptron, MLP) and compared their ability to discriminate true interactions from non-interactions.
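Three of the seven features (AU content, longest consecutive complementary stretch, and seed complementarity) can be computed directly from the tRF and binding-site sequences. The sketch below assumes an ungapped, antiparallel alignment, Watson-Crick pairing with optional G:U wobble, and a miRNA-style seed at tRF positions 2 to 8; none of these conventions is stated in the abstract, and the function names and toy sequences are hypothetical.

```python
from typing import List

# Canonical Watson-Crick pairs; G:U wobble pairs are optional (an assumption).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def to_rna(seq: str) -> str:
    """Upper-case a sequence and convert DNA T to RNA U."""
    return seq.upper().replace("T", "U")


def pairing_mask(trf: str, site: str, allow_wobble: bool = False) -> List[bool]:
    """Per-position pairing for an ungapped, antiparallel alignment.

    tRF position i (5'->3') is paired with site position -1-i, i.e. the
    tRF 5' end faces the 3' end of the mRNA site (both given 5'->3').
    """
    trf, site = to_rna(trf), to_rna(site)
    allowed = (WATSON_CRICK | WOBBLE) if allow_wobble else WATSON_CRICK
    n = min(len(trf), len(site))
    return [(trf[i], site[-1 - i]) in allowed for i in range(n)]


def au_content(site: str) -> float:
    """Fraction of A and U nucleotides in the binding site (feature 1)."""
    site = to_rna(site)
    return (site.count("A") + site.count("U")) / len(site) if site else 0.0


def longest_complementary_stretch(mask: List[bool]) -> int:
    """Length of the longest run of consecutive paired positions (feature 5)."""
    best = run = 0
    for paired in mask:
        run = run + 1 if paired else 0
        best = max(best, run)
    return best


def seed_match(mask: List[bool], start: int = 2, end: int = 8) -> bool:
    """True if 1-based tRF positions start..end all pair (feature 7).

    The 2-8 window follows the miRNA seed convention; the window used by
    tRF Prospect is not stated, so this default is an assumption.
    """
    if len(mask) < end:
        return False
    return all(mask[start - 1:end])


if __name__ == "__main__":
    trf = "UCCCUGGUGGUCUAGUGG"    # toy 18-nt tRF, 5'->3'
    site = "CCACUAGACCACCAGGGA"   # its perfect reverse complement, 5'->3'
    mask = pairing_mask(trf, site)
    print(au_content(site), longest_complementary_stretch(mask), seed_match(mask))
```

Keeping the pairing mask separate from the feature functions lets several features reuse one alignment; a gapped aligner (bulges, internal loops) would replace `pairing_mask` without changing the rest.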
Each model was evaluated by overall accuracy, receiver operating characteristic (ROC) curves, and the area under the ROC curve (AUC). The MLP consistently achieved the highest AUC of the four and was therefore selected as the backbone of our prediction framework, which we named tRF Prospect. For biological validation, we retrieved three high-throughput RNA-seq datasets from the Gene Expression Omnibus (GEO) in which individual tRFs were overexpressed: AS-tDR-007333 (GSE184690), tRF-3004b (GSE197091), and tRF-20-S998LO9D (GSE208381). Differential expression analysis of each dataset identified genes downregulated upon tRF overexpression, which we designated as putative targets. We then compared tRF Prospect with three established tools (tRFTar, tRForest, and tRFTarget) by counting the targets each tool predicted for each tRF and assessing their concordance with the experimentally derived gene sets. Results tRF Prospect achieved an AUC of 0.934. In RNA-seq data from cells overexpressing specific tRFs, its predictions agreed with the mRNAs that were downregulated, supporting its ability to identify biologically relevant targets. Compared with tRFTar, tRForest, and tRFTarget, tRF Prospect consistently showed higher precision and sensitivity and identified more true-positive interactions. Moreover, unlike static databases limited to precomputed results, tRF Prospect supports real-time prediction for any user-defined tRF sequence, which makes it suitable for exploratory and hypothesis-driven research. Conclusion This study introduces tRF Prospect as a powerful and flexible computational tool for investigating tRF:mRNA interactions.
By combining the predictive strength of deep learning with a broad set of interaction-relevant features, it addresses key limitations of existing platforms. Specifically, tRF Prospect (1) expands the range of tRF and target types it can analyze; (2) improves prediction accuracy through a multilayer perceptron model; and (3) supports dynamic, user-driven analysis beyond the constraints of precomputed databases. Although the current version focuses on miRNA-like repression mechanisms and does not yet accurately capture binding events in the 5′ UTR, it provides a foundation for future studies of the complex roles of tRFs in gene regulation, cellular function, and disease pathogenesis.
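The evaluation described in the Methods and Results rests on two calculations: the AUC of a classifier's scores, and the agreement between predicted targets and genes downregulated after tRF overexpression. The sketch below computes AUC through its rank (Mann-Whitney) interpretation and summarizes overlap as shared-gene count, precision, and recall. The abstract does not say exactly how concordance was quantified, so the precision/recall summary, the function names, and the toy gene symbols are assumptions for illustration.

```python
from typing import Iterable, Sequence, Set, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen positive pair scores
    higher than a randomly chosen negative pair; ties count as one half.
    Uses an O(P*N) double loop for clarity rather than speed.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def concordance(predicted: Iterable[str],
                downregulated: Iterable[str]) -> Tuple[int, float, float]:
    """Overlap between predicted targets and genes downregulated after tRF
    overexpression, treating the downregulated set as putative ground truth.

    Returns (shared gene count, precision, recall).
    """
    pred: Set[str] = set(predicted)
    down: Set[str] = set(downregulated)
    hits = len(pred & down)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(down) if down else 0.0
    return hits, precision, recall


if __name__ == "__main__":
    # Toy labels/scores and hypothetical gene symbols, for illustration only.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
    print(concordance(["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
                      ["GENE_B", "GENE_D", "GENE_E"]))
```

The rank formulation gives the same AUC as integrating the ROC curve without choosing thresholds explicitly; for tens of thousands of pairs, a sort-based O(n log n) version would be preferable to the double loop.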
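REN Daixi, YI Jianyong, MO Yongzhen, YANG Mei, XIONG Wei. tRF Prospect: a neural-network-based algorithm for predicting tRNA-derived fragment targets[J]. Progress in Biochemistry and Biophysics, , ():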
Copyright © 2025 Progress in Biochemistry and Biophysics. All rights reserved.