1) Institute of Mitochondrial Biomedicine, School of Life Sciences and Technology, Xi’an Jiaotong University, Xi’an 710049, China; 2) National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Applications, Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an 710049, China
This work was supported by a grant from the National Natural Science Foundation of China (32271281).
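The application of artificial intelligence in biology has advanced rapidly in recent years, most notably in protein structure prediction and design, work that was awarded the 2024 Nobel Prize in Chemistry. Accurate prediction of the various physical and chemical properties of proteins can be expected to be an important direction for the next stage of protein prediction. Protein thermodynamic stability is important for understanding the mechanisms of life processes, for drug development, and for disease diagnosis and treatment, as well as for enzyme production, biosensor development, and protein drug manufacturing in the biotechnology industry. Accurate AI-based prediction of protein thermodynamic stability would greatly improve both the capacity of protein-related research and the efficiency of industrial development. This article reviews the development of protein thermodynamic stability prediction, tracing it from experimental measurement methods and traditional energy-function calculations to modern machine learning predictors. It focuses on machine learning models, especially the advances brought by deep neural networks, graph neural networks, and attention mechanisms. It also discusses the core challenges of mutation stability prediction, such as imbalance in dataset quality and quantity, model overfitting, and the modeling of protein dynamics. The aim is to provide researchers with a comprehensive reference framework and to support progress in predicting the thermodynamic stability of mutant proteins.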
In recent years, the application of artificial intelligence (AI) in biology has advanced rapidly. The most notable achievements are in protein structure prediction and design, with AlphaFold and related work recognized by the 2024 Nobel Prize in Chemistry. These breakthroughs have transformed our ability to understand protein folding and molecular interactions and mark a milestone in computational biology. Looking ahead, accurate prediction of the physicochemical properties of proteins, beyond static structure, is likely to become the next critical frontier in the field. One of the most important of these properties is thermodynamic stability, a protein’s ability to maintain its native conformation under physiological or stress conditions. Accurate prediction of protein stability, especially of the changes caused by single-point mutations, plays a vital role in many scientific and industrial domains, including understanding the molecular basis of disease, rational drug design, development of therapeutic proteins, design of more robust industrial enzymes, and engineering of biosensors. Reliable prediction of mutation-induced stability changes therefore has broad implications across biomedical and biotechnological applications. Historically, protein stability was assessed with experimental methods such as differential scanning calorimetry (DSC) and circular dichroism (CD), which are precise but time-consuming and resource-intensive. This prompted the development of computational approaches, including empirical energy functions and physics-based simulations. However, these traditional models often fail to capture the complex, high-dimensional nature of protein conformational landscapes and mutational effects.
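The abstract does not state a formula for stability or for its change upon mutation. As a minimal sketch, assuming the common two-state folding model and the convention that stability is the unfolding free energy in kcal/mol (so a negative change means destabilization; some datasets use the opposite sign), the quantities a predictor targets can be written as follows. The function names and the default temperature are illustrative choices, not taken from the article.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def fraction_folded(dg_unfold, temp_k=298.15):
    """Fraction of molecules in the folded state under a two-state model.

    dg_unfold is the unfolding free energy in kcal/mol (assumed convention:
    larger positive values mean a more stable protein).
    """
    k_unfold = math.exp(-dg_unfold / (R_KCAL * temp_k))  # K = [unfolded]/[folded]
    return 1.0 / (1.0 + k_unfold)


def ddg(dg_unfold_mutant, dg_unfold_wild_type):
    """Stability change upon mutation; negative means destabilizing here."""
    return dg_unfold_mutant - dg_unfold_wild_type
```

For instance, with hypothetical values of 5.0 kcal/mol for a wild type and 3.5 kcal/mol for a mutant, `ddg(3.5, 5.0)` is -1.5 kcal/mol, and `fraction_folded` is slightly lower for the mutant than for the wild type.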
Recent advances in machine learning (ML) have significantly improved predictive performance in this area. Early ML models used handcrafted features derived from sequence and structure, whereas modern deep learning models leverage massive datasets and learn representations directly from data. Deep neural networks (DNNs), graph neural networks (GNNs), and attention-based architectures such as transformers have shown particular promise. GNNs excel at modeling spatial and topological relationships in molecular structures, which makes them well suited to protein modeling tasks. Attention mechanisms let models dynamically weigh the contribution of specific residues or regions, capturing long-range interactions and allosteric effects. Nevertheless, several key challenges remain. High-quality experimental datasets are scarce and imbalanced, particularly for rare or functionally significant mutations, which can lead to biased or overfitted models. In addition, the inherently dynamic nature of proteins (their conformational flexibility and context-dependent behavior) is difficult to encode in static structural representations. Current models often rely on a single structure or an average conformation, which may overlook important aspects of stability modulation. Efforts are ongoing to incorporate multi-conformational ensembles, molecular dynamics simulations, and physics-informed learning frameworks into predictive models. This paper presents a comprehensive review of the evolution of protein thermodynamic stability prediction techniques, with emphasis on the recent progress enabled by machine learning. It highlights representative datasets, modeling strategies, evaluation benchmarks, and the integration of structural and biochemical features.
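To illustrate what modeling spatial and topological relationships can mean in practice, the sketch below builds a residue contact graph from coordinates and applies one round of neighbor averaging, the simplest form of message passing. It is not the architecture of any model covered by the review: the 8 Å cutoff, the use of one point per residue, the mean aggregation rule, and the function names are all assumptions made for illustration, and a real GNN would add learned weights and nonlinearities.

```python
import math


def contact_graph(coords, cutoff=8.0):
    """Connect residues whose representative atoms lie within `cutoff` angstroms."""
    n = len(coords)
    neighbors = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


def message_pass(features, neighbors):
    """One round of mean aggregation over each residue and its graph neighbors."""
    updated = []
    for i, feat in enumerate(features):
        group = [feat] + [features[j] for j in neighbors[i]]
        # Average each feature dimension across the residue and its neighbors.
        updated.append([sum(col) / len(group) for col in zip(*group)])
    return updated


# Hypothetical toy input: three residues close together and one far away.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
features = [[1.0], [0.0], [0.0], [1.0]]
graph = contact_graph(coords)
smoothed = message_pass(features, graph)
```

In this toy input the isolated fourth residue keeps its feature unchanged, while the three residues in contact share information. This is how a mutation at one site can influence the representation of its spatial neighbors even when they are distant in sequence.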
The aim is to provide researchers with a structured and up-to-date reference, guiding the development of more robust, generalizable, and interpretable models for predicting protein stability changes upon mutation. As the field moves forward, the synergy between data-driven AI methods and domain-specific biological knowledge will be key to unlocking deeper understanding and broader applications of protein engineering.
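Tao Linjie, Xu Fanding, Guo Yu, Long Jiangang, Lu Zhuoyang. Artificial intelligence-based prediction of protein thermodynamic stability [J]. Progress in Biochemistry and Biophysics, 2025, 52(8): 1972-1985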