Advances in High-throughput Protein Structural Bioinformatics
Author: ZHU Yun-Chi, LU Zu-Hong
Affiliation:

State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing 211189, China

Fund Project:

This work was supported by a grant from the National Key Research and Development Program of China (2016YFA0501600).

    Abstract:

    This review summarizes recent advances in high-throughput protein structural bioinformatics, a field transformed by deep learning-based protein structure prediction systems such as AlphaFold2. These systems have markedly increased the accuracy, speed, and scale of protein structure prediction, driving exponential growth in the number of protein structures available for analysis. Notably, the AlphaFold Protein Structure Database (AFDB) has amassed over 214 million protein structures, exceeding the PDB’s 50-year cumulative data by over 1,000-fold within several months. This big data is driving a comprehensive upgrade of protein structural bioinformatics. The review focuses on three main areas: structure data management, tool development, and structure data mining. In structure data management, it highlights optimization strategies for AlphaFold-like systems, which significantly reduce the resource requirements of protein folding, enabling more researchers to make custom structure predictions and further enlarging the data scale. The resulting “data explosion” has increased pressure on storage and bandwidth, prompting the development of tools such as Foldcomp, PDC, and ProteStAr for compressing PDB files. The review also underscores the role of public repositories such as ModelArchive and PDB-Dev in archiving and sharing third-party AlphaFold models, and the use of independent services such as MineProt and 3D-Beacons to create more interactive and accessible data portals. In tool development, the review covers recent breakthroughs in structure alignment algorithms, represented by Foldseek, which enable ultra-fast searching of large protein structure databases.
It also covers tools for structure-based functional annotation of proteins, including AlphaFill for ligand annotation, DeepFRI for Gene Ontology (GO) annotation, and TT3D for protein-protein interaction (PPI) prediction, among others. It is proposed that 3Di sequences, introduced together with Foldseek, can enhance many sequence-based deep learning models developed in the pre-AlphaFold era, enabling them to be applied to structure-based function prediction. The challenges facing traditional molecular docking methods in the high-throughput era are also noted, to draw researchers' attention to them. Finally, the review explores the burgeoning field of structure data mining. Whole-proteome structure prediction has become feasible in recent years, and researchers now analyze large structure datasets from an omics perspective, continually identifying new analyzable elements, optimizing methodologies, and applying newly developed tools to push the boundaries. Notable examples include the identification of new protein families, the development of protein structure clustering, and the integration of AlphaFold with conventional experimental techniques to solve large structures. These advances are paving the way for a deeper understanding of protein structure and function and have the potential to unlock new discoveries in the life sciences. However, the review also acknowledges persistent challenges and limitations, including the lack of diversity in high-throughput software for protein structural bioinformatics and the bottleneck in rapidly predicting protein complex structures. Overall, structural bioinformatics is expected to play an even more crucial role in the life sciences as high-throughput methodology develops.

Get Citation

ZHU Yun-Chi, LU Zu-Hong. Advances in High-throughput Protein Structural Bioinformatics[J]. Progress in Biochemistry and Biophysics, 2024, 51(9): 1989-1999

History
  • Received: March 04, 2024
  • Revised: July 09, 2024
  • Accepted: April 18, 2024
  • Online: September 19, 2024
  • Published: September 20, 2024