Progress in Data Analysis Methods for Proteome Mass Spectrometry Based on Data-independent Acquisition
Affiliation:

1) School of Computer Science and Technology, Shandong University of Technology, Zibo 255000, China; 2) School of Engineering Medicine & School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China; 3) School of Life Sciences, Tsinghua University, Beijing 100084, China

Fund Project:

This work was supported by grants from the Support Program for Outstanding Youth Innovation Teams in Higher Educational Institutions of Shandong Province (2019KJN048) and the National Natural Science Foundation of China (31500669).

    Abstract:

    Data-independent acquisition (DIA) is a proteomics technique that has developed rapidly in recent years. By collecting tandem mass spectra through unbiased co-fragmentation of all precursors in the isolation window, it can in theory achieve deep coverage of protein samples, and it offers high throughput, high reproducibility and high sensitivity. Current DIA data acquisition methods mainly comprise the full-window fragmentation method, the isolation window sequential fragmentation method and the four-dimensional DIA (4D-DIA) data acquisition method. The most commonly used acquisition methods are SWATH, variable-window SWATH and DIA-PASEF. Tandem mass spectra collected by the full-window fragmentation method contain precursor ions from the full m/z range, so spectral analysis is complex. The isolation window sequential fragmentation method uses a variety of acquisition strategies to reduce both the number of precursor ions in each tandem mass spectrum and the size of the isolation window, which effectively reduces the complexity of spectral interpretation. As mass spectrometry instruments develop, the isolation window size of tandem mass spectra acquired by DIA may approach that of data-dependent acquisition (DDA), enabling the integration of DIA and DDA processes. The 4D-DIA method uses additional data dimensions to establish the correspondence between precursor and fragment ions, which improves precursor selectivity and greatly reduces the complexity of spectral analysis. It is also an important advance for future DIA data collection. Data analysis methods have been designed according to the characteristics of DIA data; they mainly include the spectral library search method, the protein database direct search method, the pseudo-MS/MS spectra identification method and the de novo sequencing method, as shown in the figure above.
The spectral library search method uses spectral library information for data extraction; it has high peptide identification sensitivity but places certain requirements on the quality and size of the spectral library. The protein database direct search method requires neither preprocessing of tandem mass spectra nor construction of a spectral library and directly matches the theoretical tandem mass spectra of peptides against experimental tandem mass spectra, but its time complexity is high. The pseudo-MS/MS spectra identification method uses a spectrum-splitting algorithm to split each tandem mass spectrum into multiple pseudo-MS/MS spectra, each containing the fragment ions of a single peptide, and then searches the pseudo-MS/MS spectra with traditional DDA software. The de novo sequencing method models the pseudo-MS/MS spectrum directly through deep learning to predict peptides; it has the advantage of identifying sequences from new species, but the number and reliability of its identifications are difficult to guarantee. Reliability evaluation of peptide-spectrum matches mainly includes re-ranking by machine learning and false discovery rate estimation of the reported results. Although the DIA method has developed rapidly in recent years and outperforms DDA in depth of coverage, there is still room for improvement in three aspects: in-depth analysis, accurate identification and accurate quantification. With the optimization of mass spectrometry acquisition and the development of data analysis, and once these shortcomings have been addressed, DIA acquisition technology can provide further support for high-throughput, full-coverage proteomics analysis, especially in large-cohort data analysis. Together, these advances should help to obtain complete protein maps and explain the underlying laws of life, promoting the development of the field of proteomics.
This paper organizes and reviews DIA data acquisition methods, data analysis methods, software and methods for assessing the reliability of identification results, and discusses future development directions.
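
The false discovery rate estimation mentioned above is often performed with a target-decoy strategy. The abstract does not state which estimator the reviewed tools use, so the following Python sketch is only an illustration under that assumption. The function name `q_values` and the `(score, is_decoy)` input layout are hypothetical, and higher scores are assumed to mean better matches.

```python
from typing import List, Tuple


def q_values(psms: List[Tuple[float, bool]]) -> List[Tuple[float, bool, float]]:
    """Assign a q-value to each peptide-spectrum match (PSM).

    psms: list of (score, is_decoy) pairs; a higher score is assumed
    to indicate a better match (an assumption of this sketch).
    Returns (score, is_decoy, q_value) sorted by descending score.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # Walk down the ranked list and estimate the FDR at each score
    # threshold as (#decoys accepted) / (#targets accepted).
    targets = 0
    decoys = 0
    fdrs = []
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # The q-value of a PSM is the smallest FDR among all thresholds
    # that still accept it, i.e. a running minimum taken from the
    # bottom of the ranked list upwards.
    qvals = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvals.append(running_min)
    qvals.reverse()

    return [(s, d, q) for (s, d), q in zip(ranked, qvals)]


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    example = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, True)]
    for score, is_decoy, q in q_values(example):
        label = "decoy" if is_decoy else "target"
        print(f"{score:5.1f}  {label:6s}  q={q:.4f}")
```

In practice a threshold such as q ≤ 0.01 is then applied to the target matches; the re-ranking step described above would change the scores fed into this calculation, not the calculation itself.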

Get Citation

HOU Xin-Hang, ZHOU Pi-Yu, GONG Peng-Yun, FU Jia-Le, LIU Chao, WANG Hai-Peng. Progress in Data Analysis Methods for Proteome Mass Spectrometry Based on Data-independent Acquisition[J]. Progress in Biochemistry and Biophysics, 2022, 49(12): 2364-2386

History
  • Received: November 16, 2021
  • Revised: November 18, 2022
  • Accepted: March 21, 2022
  • Online: December 20, 2022
  • Published: December 20, 2022