College of Automation, Xi'an University of Posts & Telecommunications, Xi'an 710100, China
This work was supported by a grant from the Natural Science Foundation of Shaanxi Province (2025JC-YBMS-697).
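Objective With the development of bioinformatics, many deep-learning-based methods for protein subcellular localization have emerged. Methods such as GapNet-PL and ImPLoc identify protein distribution patterns at the cell-population level fairly accurately, but they remain limited at the single-cell level and in complex microenvironments. Current protein microscopy images lack single-cell annotations, and population-level annotations alone cannot resolve localization heterogeneity at the single-cell scale. In addition, most existing protein subcellular localization models are CNN-based and ignore the functional correlations between subcellular structures, which leads to poor single-cell localization accuracy. This paper therefore proposes a single-cell protein localization method based on a class perception graph convolutional network (CP-GCN). Methods First, a class perception module (CPM) is built to fully extract the semantic features of the different subcellular categories. Then the CP-GCN network is designed to mine the global features of protein subcellular structures across multiple cells and to capture the topology of the label graph, learning the correlations between multi-label protein categories. Next, K-means clustering is used to separate intra-class multi-scale features and generate multi-cell class activation maps (CAMs); single-cell images are pseudo-labeled according to the regions predicted by the CAMs, so that heterogeneous cells within a population can be distinguished effectively. Finally, the pseudo-labels are used to train a single-cell protein classification model, achieving precise single-cell protein localization. Results On the single-cell protein prediction task of the Kaggle 2021 dataset, the mAP of the proposed method exceeds that of existing protein subcellular localization methods. Visual analysis of the generated CAMs shows that the model successfully localizes protein subcellular locations within single cells. Conclusion Combining the CP-GCN network with the pseudo-label assignment strategy effectively captures the features of heterogeneous cells in protein images and precisely localizes proteins within single cells.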
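The pseudo-labeling step can be illustrated with a minimal sketch. The abstract does not state how CAM regions are matched to cells, so the rule below is an assumption: a cell receives a class if the mean CAM activation inside its segmentation mask reaches a cutoff and the class is among the image-level labels. The function name `assign_pseudo_labels`, the `threshold` parameter, and the integer mask encoding are all hypothetical.

```python
import numpy as np

def assign_pseudo_labels(cams, cell_mask, image_labels, threshold=0.5):
    """Assign per-cell pseudo-labels from multi-cell class activation maps.

    cams:         float array (C, H, W), one activation map per class in [0, 1]
    cell_mask:    int array (H, W); 0 is background, k > 0 marks pixels of cell k
    image_labels: binary array (C,), the population-level labels of the image
    threshold:    hypothetical cutoff on the mean activation inside a cell
    Returns a dict {cell_id: binary array (C,)}.
    """
    cams = np.asarray(cams, dtype=float)
    image_labels = np.asarray(image_labels).astype(bool)
    pseudo = {}
    for cell_id in np.unique(cell_mask):
        if cell_id == 0:
            continue  # skip background pixels
        inside = cell_mask == cell_id
        # Mean activation of every class over the pixels of this cell: shape (C,)
        mean_act = cams[:, inside].mean(axis=1)
        # Keep a class only if it is active in the cell AND present in the image
        pseudo[int(cell_id)] = ((mean_act >= threshold) & image_labels).astype(int)
    return pseudo

# Toy image: two cells side by side, each dominated by a different class.
cams = np.zeros((2, 4, 4))
cams[0, :, :2] = 0.9   # class 0 is active on the left half
cams[1, :, 2:] = 0.8   # class 1 is active on the right half
cell_mask = np.zeros((4, 4), dtype=int)
cell_mask[:, :2] = 1   # cell 1 occupies the left half
cell_mask[:, 2:] = 2   # cell 2 occupies the right half
labels = assign_pseudo_labels(cams, cell_mask, image_labels=[1, 1])
```

In this toy case both classes are image-level labels, yet each cell inherits only the class whose activation falls inside it, which is the kind of within-population heterogeneity the pseudo-labels are meant to expose.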
Objective This study proposes a novel single-cell protein localization method based on a class perception graph convolutional network (CP-GCN) to overcome several critical challenges in protein microscopic image analysis, including the scarcity of cell-level annotations, inadequate feature extraction, and the difficulty in achieving precise protein localization within individual cells. The methodology involves multiple innovative components designed to enhance both feature extraction and localization accuracy. Methods First, a class perception module (CPM) is developed to effectively capture and distinguish semantic features across different subcellular categories, enabling more discriminative feature representation. Building upon this, the CP-GCN network is designed to explore global features of subcellular proteins in multicellular environments. This network incorporates a category feature-aware module to extract protein semantic features aligned with label dimensions and establishes a subcellular relationship mining module to model correlations between different subcellular structures. By doing so, it generates co-occurrence embedding features that encode spatial and contextual relationships among subcellular locations, thereby improving feature representation. To further refine localization, a multi-scale feature analysis approach is employed using the K-means clustering algorithm, which classifies multi-scale features within each subcellular category and generates multi-cell class activation maps (CAMs). These CAMs highlight discriminative regions associated with specific subcellular locations, facilitating more accurate protein localization. Additionally, a pseudo-label generation strategy is introduced to address the lack of annotated single-cell data. This strategy segments multicellular images into single-cell images and assigns reliable pseudo-labels based on the CAM-predicted regions, ensuring high-quality training data for single-cell analysis. Under a transfer learning framework, the model is trained to achieve precise single-cell-level protein localization, leveraging both the extracted features and pseudo-labels for robust performance. Results Experimental validation on multiple single-cell test datasets demonstrates that the proposed method significantly outperforms existing approaches in terms of robustness and localization accuracy. Specifically, on the Kaggle 2021 dataset, the method achieves superior mean average precision (mAP) metrics across 18 subcellular categories, highlighting its effectiveness in diverse protein localization tasks. Visualization of the generated CAM results further confirms the model's capability to accurately localize subcellular proteins within individual cells, even in complex multicellular environments. Conclusion The integration of the CP-GCN network with a pseudo-labeling strategy enables the proposed method to effectively capture heterogeneous cellular features in protein images and achieve precise single-cell protein localization. This advancement not only addresses key limitations in current protein image analysis but also provides a scalable and accurate solution for subcellular protein studies, with potential applications in biomedical research and diagnostic imaging. The success of this method underscores the importance of combining advanced deep learning architectures with innovative training strategies to overcome data scarcity and improve localization performance in biological image analysis. Future work could explore the extension of this framework to other types of microscopic imaging and its application in large-scale protein interaction studies.
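The idea of modeling correlations between subcellular categories with a graph convolution can be sketched as follows. The abstract does not specify how the label graph is constructed, so this sketch assumes a common construction: edges are drawn where the conditional co-occurrence probability of two labels in the training annotations reaches a cutoff, and one graph convolution then mixes per-class features along those edges. The names `cooccurrence_adjacency` and `gcn_layer` and the cutoff `tau` are hypothetical, and the code is not the authors' CP-GCN.

```python
import numpy as np

def cooccurrence_adjacency(Y, tau=0.4):
    """Build a normalized label graph from multi-label annotations.

    Y:   binary array (N, C), one row of image-level labels per image
    tau: hypothetical cutoff on the conditional probability P(label j | label i)
    Returns the normalized adjacency D^{-1/2} A D^{-1/2}, shape (C, C).
    """
    Y = np.asarray(Y, dtype=float)
    counts = Y.T @ Y                       # counts[i, j]: images with labels i and j
    occurrences = np.diag(counts).copy()   # occurrences[i]: images with label i
    cond = counts / np.maximum(occurrences[:, None], 1.0)  # P(j | i)
    A = (cond >= tau).astype(float)        # binarize to suppress rare co-occurrences
    np.fill_diagonal(A, 1.0)               # self-loops keep each class's own feature
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def gcn_layer(A_hat, H, W):
    """One graph convolution over class nodes: ReLU(A_hat @ H @ W)."""
    return np.maximum(A_hat @ H @ W, 0.0)

# Toy annotations: labels 0 and 1 often co-occur, label 2 appears alone.
Y = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]
A_hat = cooccurrence_adjacency(Y)
H = np.eye(3)            # stand-in per-class features (one-hot for illustration)
W = np.ones((3, 2))      # stand-in weight matrix; learned in a real model
out = gcn_layer(A_hat, H, W)
```

Here labels 0 and 1 end up connected and share features after the convolution, while label 2 stays isolated. In a trained model `H` would hold learned class embeddings and `W` would be optimized jointly with the image backbone.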
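Tang Haoyang, Yao Xinyue, Wang Mengmeng, Yang Sicong. Single-cell protein localization method based on class perception graph convolutional network[J]. Progress in Biochemistry and Biophysics,,():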