Improving Deep Learning Protein Monomer and Complex Structure Prediction using DeepMSA2 with Huge Metagenomics Data (Nature Methods, Oct 2023)

Wei Zheng 1, Qiqige Wuyun 2, Yang Li 1 3, Chengxin Zhang 1, P Lydia Freddolino 4 5, Yang Zhang 6 7 8 9 10

Affiliations

  • 1 Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
  • 2 Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA.
  • 3 Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore.
  • 4 Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA. lydsf@umich.edu.
  • 5 Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA. lydsf@umich.edu.
  • 6 Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA. zhang@zhanggroup.org.
  • 7 Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore. zhang@zhanggroup.org.
  • 8 Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA. zhang@zhanggroup.org.
  • 9 Department of Computer Science, School of Computing, National University of Singapore, Singapore, Singapore. zhang@zhanggroup.org.
  • 10 Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore. zhang@zhanggroup.org.

Abstract

Leveraging iterative alignment search through genomic and metagenome sequence databases, we report the DeepMSA2 pipeline for uniform protein single- and multichain multiple-sequence alignment (MSA) construction. Large-scale benchmarks show that DeepMSA2 MSAs can remarkably increase the accuracy of protein tertiary and quaternary structure predictions compared with current state-of-the-art methods. An integrated pipeline with DeepMSA2 participated in the most recent CASP15 experiment and created complex structural models with considerably higher quality than the AlphaFold2-Multimer server (v.2.2.0). Detailed data analyses show that the major advantage of DeepMSA2 lies in its balanced alignment search and effective model selection, and in the power of integrating huge metagenomics databases. These results demonstrate a new avenue to improve deep learning protein structure prediction through advanced MSA construction and provide additional evidence that optimization of input information to deep learning-based structure prediction methods must be considered with as much care as the design of the predictor itself.

PMID: 38167654
DOI: 10.1038/s41592-023-02130-4