De Novo Protein Fold Design Through Sequence-independent Fragment Assembly Simulations (Proc. Natl. Acad. Sci. U S A, Jan 23)

Robin Pearce 1, Xiaoqiang Huang 1, Gilbert S Omenn 1 2 3 4, Yang Zhang 1 5 6 7

Affiliations

  • 1 Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109.
  • 2 Department of Internal Medicine, University of Michigan, Ann Arbor, MI 48109.
  • 3 Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109.
  • 4 School of Public Health, University of Michigan, Ann Arbor, MI 48109.
  • 5 Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109.
  • 6 Department of Computer Science, School of Computing, National University of Singapore 117417, Singapore.
  • 7 Cancer Science Institute of Singapore, National University of Singapore 117599, Singapore.

Abstract

De novo protein design generally consists of two steps: structure design and sequence design. Many protein design studies have focused on sequence design using scaffolds adapted from native structures in the PDB, which leaves novel areas of protein structure and function space unexplored. We developed FoldDesign to create novel protein folds from specific secondary structure (SS) assignments through sequence-independent replica-exchange Monte Carlo (REMC) simulations. The method was tested on 354 non-redundant topologies, for which FoldDesign consistently created stable structural folds while recapitulating, on average, 87.7% of the SS elements. The FoldDesign scaffolds also had well-formed structures, with buried residues and solvent-exposed areas closely matching those of their native counterparts. Despite this high fidelity to the input SS restraints and to the local structural characteristics of native proteins, a large portion of the designed scaffolds possessed global folds completely different from those of natural proteins in the PDB, highlighting the ability of FoldDesign to explore novel areas of protein fold space. Detailed data analyses revealed that the major contributions to successful structure design came from the optimal energy force field, which contains a balanced set of SS packing terms, and from the REMC simulations, which were coupled with multiple auxiliary movements to efficiently search the conformational space. Additionally, the ability to recognize and assemble uncommon super-SS geometries, rather than the unique arrangement of common SS motifs, was the key to generating novel folds. These results demonstrate a strong potential to use computational design simulations to explore structural and functional spaces that natural proteins have not reached through evolution.
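
The abstract names replica-exchange Monte Carlo (REMC) as the search engine but does not spell out the procedure. As a rough illustration only, the generic REMC scheme can be sketched as below: several replicas sample the same energy landscape at different temperatures, and neighbouring replicas periodically attempt to swap states using the standard exchange criterion. Everything here is an assumption made for illustration: the one-dimensional `toy_energy` function, the temperature ladder, the step size, and all function names are hypothetical and are not FoldDesign's force field, move set, or parameters.

```python
import math
import random


def toy_energy(x):
    # Hypothetical rugged 1D landscape standing in for a real force field.
    return 0.1 * x * x + math.sin(3.0 * x)


def metropolis_accept(delta_e, temperature, rng):
    # Standard Metropolis criterion: always accept downhill moves,
    # accept uphill moves with probability exp(-dE / T).
    return delta_e <= 0 or rng.random() < math.exp(-delta_e / temperature)


def swap_probability(e_i, e_j, t_i, t_j):
    # Standard replica-exchange acceptance:
    # min(1, exp((1/T_i - 1/T_j) * (E_i - E_j))).
    exponent = (1.0 / t_i - 1.0 / t_j) * (e_i - e_j)
    return 1.0 if exponent >= 0 else math.exp(exponent)


def remc(temperatures, n_sweeps=2000, step=0.5, seed=0):
    rng = random.Random(seed)
    # One conformation (here just a number) per temperature.
    states = [rng.uniform(-5.0, 5.0) for _ in temperatures]
    energies = [toy_energy(x) for x in states]
    best_e = min(energies)
    best_x = states[energies.index(best_e)]

    for _ in range(n_sweeps):
        # Local Monte Carlo move in every replica.
        for k, temp in enumerate(temperatures):
            trial = states[k] + rng.uniform(-step, step)
            e_trial = toy_energy(trial)
            if metropolis_accept(e_trial - energies[k], temp, rng):
                states[k], energies[k] = trial, e_trial
                if e_trial < best_e:
                    best_x, best_e = trial, e_trial

        # Attempt to exchange states between neighbouring temperatures.
        for k in range(len(temperatures) - 1):
            p = swap_probability(
                energies[k], energies[k + 1],
                temperatures[k], temperatures[k + 1],
            )
            if rng.random() < p:
                states[k], states[k + 1] = states[k + 1], states[k]
                energies[k], energies[k + 1] = energies[k + 1], energies[k]

    return best_x, best_e


if __name__ == "__main__":
    x, e = remc([0.1, 0.3, 1.0, 2.0])
    print(f"lowest energy found: {e:.3f} at x = {x:.3f}")
```

In this sketch the high-temperature replicas cross barriers easily while the low-temperature replicas refine local minima, and the swaps let a well-placed conformation migrate down the temperature ladder. In a real fold-design setting the state would be a full backbone conformation and the local move would be replaced by fragment substitution and other conformational movements.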

PMID: 36656852  DOI: 10.1073/pnas.2208275120