1) School of Computer Science and Technology, Shandong University of Technology, Zibo 255000, China; 2) School of Engineering Medicine & School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China; 3) School of Life Sciences, Tsinghua University, Beijing 100084, China
This work was supported by grants from the Support Program for Outstanding Youth Innovation Teams in Higher Educational Institutions of Shandong Province (2019KJN048) and the National Natural Science Foundation of China (31500669).
Data-independent acquisition (DIA) is a mass spectrometry acquisition technique that has developed rapidly in proteomics in recent years. By fragmenting all precursor ions within an isolation window without bias and collecting the resulting tandem mass spectra, DIA can in theory achieve deep coverage of protein samples, and it offers high throughput, high reproducibility and high sensitivity. Current DIA acquisition methods fall into three main categories: full-window fragmentation, isolation window sequential fragmentation and four-dimensional DIA (4D-DIA). The most widely used are SWATH (including variable-window SWATH) and DIA-PASEF. Tandem mass spectra collected by full-window fragmentation contain precursor ions from the entire m/z range, which makes spectral interpretation complex. Isolation window sequential fragmentation uses a variety of acquisition strategies to narrow the isolation window and thereby reduce the number of precursor ions in each tandem mass spectrum, effectively lowering the complexity of spectral interpretation. As mass spectrometers improve, DIA isolation windows may approach the width used in data-dependent acquisition (DDA), allowing DIA and DDA workflows to be integrated. 4D-DIA obtains the correspondence between precursor and fragment ions through additional data dimensions, which improves precursor selectivity and greatly reduces the complexity of spectral analysis; it is also an important advance for future DIA data acquisition. Data analysis methods designed around the characteristics of DIA data fall into four main categories, as shown in the figure above: spectral library search, direct protein database search, pseudo-MS/MS spectrum identification and de novo sequencing.
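The abstract names variable-window SWATH as one of the most widely used acquisition schemes. The Python sketch below illustrates the idea behind variable windows under a stated assumption: window edges are placed at quantiles of a precursor m/z distribution (for example, one taken from a prior DDA run or a spectral library), so that dense m/z regions get narrower windows and each window isolates roughly the same number of precursors. The function name `variable_windows` and the default 1 m/z overlap are hypothetical choices for illustration, not the scheme of any particular instrument vendor or of the methods reviewed here.

```python
from typing import List, Tuple


def variable_windows(precursor_mz: List[float], n_windows: int,
                     overlap: float = 1.0) -> List[Tuple[float, float]]:
    """Hypothetical helper: split the precursor m/z range into n_windows
    isolation windows that each contain roughly the same number of precursors.

    Assumption: edges sit at empirical quantiles of the precursor m/z values.
    """
    if n_windows < 1:
        raise ValueError("n_windows must be >= 1")
    mz = sorted(precursor_mz)
    n = len(mz)
    if n < n_windows:
        raise ValueError("need at least one precursor per window")

    # Window i covers precursor ranks [i*n//k, (i+1)*n//k); each inner edge is
    # the midpoint between the last precursor of one window and the first of
    # the next, so dense regions produce narrow windows.
    edges = [mz[0]]
    for i in range(1, n_windows):
        cut = i * n // n_windows
        edges.append((mz[cut - 1] + mz[cut]) / 2.0)
    edges.append(mz[-1])

    # Widen every window by half the overlap on each side so precursors lying
    # on a boundary are isolated by both neighbouring windows.
    half = overlap / 2.0
    return [(edges[i] - half, edges[i + 1] + half) for i in range(n_windows)]
```

Calling `variable_windows([400, 410, 420, 430, 600, 700, 800, 900], 4, overlap=0.0)` returns windows 15, 100, 235 and 150 m/z wide, each containing two precursors. Narrowing the windows where precursors are dense is what lowers the number of co-fragmented precursors per spectrum.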
Spectral library search uses library information to extract data and achieves high sensitivity in peptide identification, but it places demands on the quality and size of the spectral library. Direct protein database search needs neither preprocessing of tandem mass spectra nor a spectral library; it matches the theoretical tandem mass spectra of peptides directly against experimental ones, but its time complexity is high. Pseudo-MS/MS spectrum identification uses spectrum-splitting algorithms to divide each tandem mass spectrum into multiple pseudo-MS/MS spectra, each containing the fragment ions of a single peptide, and then searches these pseudo-MS/MS spectra with conventional DDA software. De novo sequencing models pseudo-MS/MS spectra directly with deep learning to predict peptides; it can identify sequences from novel species, but it is difficult to guarantee the number and reliability of its identifications. Reliability assessment of peptide-spectrum matches (PSMs) mainly comprises re-ranking with machine learning and estimating the false discovery rate (FDR) of the reported result set. Although DIA has developed rapidly in recent years and outperforms DDA in depth of coverage, it still has shortcomings, with room for improvement in three aspects: in-depth analysis, accurate identification and accurate quantification. Once these shortcomings are resolved, continued optimization of mass spectrometry acquisition and advances in data analysis will allow DIA to further support high-throughput, full-coverage proteomic analysis, especially of large cohorts. This will help obtain complete protein maps, explain the biological principles underlying them, and advance the field of proteomics.
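The abstract lists FDR estimation of the reported results as the second quality-control step but does not state the estimator. A widely used choice in proteomics is the target-decoy approach, sketched below in Python as an assumption rather than as the method of any specific tool covered in the review: PSMs against decoy (for example, reversed) sequences are scored alongside target PSMs, the FDR at a score threshold is estimated as the number of decoys divided by the number of targets at or above it, and each PSM's q-value is the lowest FDR at which it would still be accepted. The function `target_decoy_qvalues` is hypothetical.

```python
from typing import List, Tuple


def target_decoy_qvalues(psms: List[Tuple[float, bool]]) -> List[float]:
    """Hypothetical helper: q-value for each PSM given as (score, is_decoy).

    Higher scores are better. The FDR at a threshold is estimated as
    #decoys / #targets among PSMs ranked at or above it (target-decoy
    assumption); the q-value is the minimum FDR over all thresholds that
    still accept the PSM.
    """
    # Rank PSMs from best to worst score; ties keep input order.
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)

    fdrs = []
    n_target = n_decoy = 0
    for i in order:
        if psms[i][1]:
            n_decoy += 1
        else:
            n_target += 1
        fdrs.append(n_decoy / max(n_target, 1))

    # A running minimum taken from the worst-scoring end makes q-values
    # monotone: a better-scoring PSM never gets a higher q-value.
    qvals = [0.0] * len(psms)
    running = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running = min(running, fdrs[rank])
        qvals[order[rank]] = running
    return qvals
```

With six PSMs scored 10 down to 5, where those scoring 8 and 5 are decoys, the q-values are `[0.0, 0.0, 0.25, 0.25, 0.25, 0.5]`, so only the top two target PSMs pass a 1% threshold. Breaking tied scores by input order is a simplification of this sketch.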
This paper reviews DIA data acquisition methods, data analysis methods and software, and methods for assessing the reliability of identification results, and discusses directions for future development.
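侯鑫行, 周丕宇, 宫鹏云, 付嘉乐, 刘超, 王海鹏. Research progress on analysis methods for proteomic mass spectrometry data based on data-independent acquisition [J]. Progress in Biochemistry and Biophysics, 2022, 49(12): 2364-2386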