After AlphaFold2, What’s Next? 3 Scenarios of Focus Shift in Protein Science.

Johnny Tam
5 min read · Dec 4, 2020

### 20210721 update ###
An easy-to-use version of AlphaFold2 is now available to the public.
Please refer to this Google Colab notebook! Cheers!!!

https://colab.research.google.com/drive/1LVPSOf4L502F21RWBmYJJYYLDlOU2NTL

AlphaFold2 by Google DeepMind has made another major leap in protein structure modeling. Several news articles even declared this holy-grail challenge a solved problem. Imagine we have accurate 3D models for ALL available protein sequences known to fold. What’s next? How do we unleash the true potential of such a huge number of readily available protein structures? How will protein scientists shift their focus in the future? I would like to project 3 possible shifts of focus in protein science at this historic turning point. For researchers in related fields, this article aims to stimulate your imagination about what to pursue in the future. For investors in the biotech sector, I hope it hints at your next investment target. I will throw in keywords for your further research.

SARS-CoV-2 (COVID-19) M protein structure predicted by AlphaFold (an artistic retouch by Prisma).

Scenario (1): Focusing on Molecular Dynamics (MD) Simulation

To put it in an analogy, AlphaFold2 is a cooking machine that knows the perfect dish to make from given ingredients. We feed in the ingredients (i.e. the protein sequence), it has the best idea of what dish to make (i.e. how the sequence should fold), and the product is the perfect dish (i.e. a folded protein with state-of-the-art accuracy). But what are the steps of cooking the perfect dish? How exactly does a protein fold? AlphaFold2 solved the protein folding problem by answering the “WHAT” question: “what structure will a sequence fold into?”. Foreseeably, we will shift focus to answering the “HOW” question: “how does a protein fold?”, which is another long-standing holy grail in structural biology. To answer this question, molecular dynamics (MD) simulation is the go-to approach, and I foresee an unprecedented demand for MD simulation to study protein folding and function, fed by the flood of protein structural models from AlphaFold2.

A 600 ns MD simulation of the spike protein of SARS-CoV-2 (COVID-19) by the Max Planck Institute of Biophysics.

Molecular dynamics, in a nutshell, puts a protein structure into a virtual box of water and ions (i.e. the buffer) and lets the protein float around, to see how it interacts externally (e.g. with small molecules, a drug for instance) and how it moves internally (e.g. side-chain rotation, loop flexibility, domain movements). Since AlphaFold2 only gives an answer without telling “how”, researchers will dive deeper into what AlphaFold2 does not answer: the dynamics of proteins, which account not only for folding but also for their movements while functioning. A huge number of readily available yet accurate protein structures will give us a wealth of starting material to simulate more, and to simulate more complex systems (imagine simulating a cell with all its proteins). For researchers who spend most of their time dealing with protein aggregates, start learning simulation techniques now to stay competitive. For biotech investors, keep an eye on upcoming GPU development and manufacturing.
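To give a feel for what an MD engine does at every time step, here is a minimal sketch in Python. It is a toy, not a protein simulation: two particles on a line joined by a harmonic "bond", moved forward in time with the velocity Verlet scheme that is commonly used in MD. All names and parameters here (harmonic_force, run_md, the spring constant, the time step, the arbitrary units) are my own assumptions for illustration. Real packages add water, ions, full force fields, and thermostats on top of this core loop.

```python
def harmonic_force(x1, x2, k=100.0, r0=1.0):
    """Forces on two bonded particles on a line (toy units).

    Potential: U = 0.5 * k * (|x2 - x1| - r0)**2
    """
    r = x2 - x1
    sign = 1.0 if r > 0 else -1.0
    f1 = k * (abs(r) - r0) * sign  # stretched bond pulls particle 1 toward 2
    return f1, -f1                 # Newton's third law


def potential_energy(x1, x2, k=100.0, r0=1.0):
    return 0.5 * k * (abs(x2 - x1) - r0) ** 2


def run_md(steps=1000, dt=0.001, mass=1.0):
    """Integrate the two-particle system with velocity Verlet."""
    x = [0.0, 1.2]   # start with the bond stretched by 0.2
    v = [0.0, 0.0]   # start at rest
    f = list(harmonic_force(x[0], x[1]))
    energies = []
    for _ in range(steps):
        for i in range(2):
            v[i] += 0.5 * dt * f[i] / mass  # half-step velocity update
            x[i] += dt * v[i]               # full-step position update
        f = list(harmonic_force(x[0], x[1]))  # forces at the new positions
        for i in range(2):
            v[i] += 0.5 * dt * f[i] / mass  # second half-step velocity update
        kinetic = sum(0.5 * mass * vi * vi for vi in v)
        energies.append(kinetic + potential_energy(x[0], x[1]))
    return x, v, energies


if __name__ == "__main__":
    x, v, energies = run_md()
    print("final positions:", x)
    print("energy drift:", max(energies) - min(energies))
```

The total energy should stay nearly constant over the run, which is the usual sanity check that the time step is small enough. Scaling this loop from two particles to millions of atoms is exactly where the GPU demand comes from.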

Microfluidics, the next big thing for ultra-high-throughput assays, demonstrated by MIT Media Lab.

Scenario (2): Focusing on Ultra-high-throughput Assays

Accurate protein structures help us better predict possible protein functions. Unfortunately, functions are many while folds are limited: knowing how a protein folds does not always map accurately to its function. Whether the protein is an antibody, an enzyme, or a DNA-binding protein, we need assays under specific reaction conditions to verify its function. Any accurate structure from AlphaFold2 is still largely useless if we are not sure how to use the protein (i.e. its function). Most of the time, only once the function has been confirmed by experiments does the structure start to become valuable, as the physical map that allows biologists to observe, explain, and manipulate that function.

Figure 1. How AlphaFold2 complemented with scenario (1) MD simulation and scenario (2) ultra-high-throughput assay could lead to the complete understanding of the structure-function relationship of proteins.

However, the reaction conditions for a particular assay are very specific: what assay? What substrates? What ions? For reaction conditions alone, there are practically infinite possibilities for researchers to try. Not to mention that we have numerous protein sequences and their mutants (again, a practically infinite number) to assay. This scenario is particularly likely now that molecular biology is entering an era of sequence writing. Biologists are writing new protein sequences to perform new functions, so at the same time we have to assay the universe of protein sequences we synthesize. No kidding, assay matrixing at such a scale is way beyond what manual pipetting can reach. I foresee that, to unleash the true value of the protein structures overflowing from AlphaFold2, there will be a boom in ultra-high-throughput assays (yes, “high” is too low, it has to be “ultra-high”). For researchers, grasp every chance to learn automated liquid-handling systems, especially how to instruct automation to do the pipetting for you. For investors, stay tuned to applications of 3D printing and microfluidics for making assay chips.
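To see how fast the assay matrix explodes, and what "instructing automation" can look like at its simplest, here is a small Python sketch. It enumerates every combination of protein variant, substrate, and ion, then assigns each combination to a well on 384-well plates. The function names, the dictionary layout, and the example numbers are all hypothetical; a real liquid handler expects its own vendor-specific worklist format.

```python
from itertools import product
import string


def plate_wells(rows=16, cols=24):
    """Well names for a 384-well plate, row by row: A1 ... A24, B1 ... P24."""
    return [
        f"{string.ascii_uppercase[r]}{c + 1}"
        for r in range(rows)
        for c in range(cols)
    ]


def build_worklist(variants, substrates, ions):
    """Assign every variant x substrate x ion combination to a plate and well."""
    wells = plate_wells()
    worklist = []
    combos = product(variants, substrates, ions)
    for idx, (variant, substrate, ion) in enumerate(combos):
        plate, pos = divmod(idx, len(wells))  # fill one plate, then the next
        worklist.append(
            {
                "plate": plate + 1,
                "well": wells[pos],
                "variant": variant,
                "substrate": substrate,
                "ion": ion,
            }
        )
    return worklist


if __name__ == "__main__":
    # Hypothetical experiment: 50 mutants, 8 substrates, 4 ions.
    variants = [f"mutant_{i:02d}" for i in range(50)]
    substrates = [f"substrate_{i}" for i in range(8)]
    ions = ["Mg2+", "Ca2+", "Zn2+", "Mn2+"]
    worklist = build_worklist(variants, substrates, ions)
    print(len(worklist), "reactions on", worklist[-1]["plate"], "plates")
```

Even this modest example gives 50 x 8 x 4 = 1600 reactions, which already fills five 384-well plates. Add pH, temperature, and concentration series, and hand pipetting is out of the question.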

D. radiodurans, one of the most radiation-resistant organisms known, according to Wikipedia. Are we going to create an extremophilic version of algae that helps us generate oxygen on other planets with the advancement of synthetic biology?

Scenario (3): The Renaissance of Synthetic Biology

Rather than just another scenario in parallel, I see the renaissance of synthetic biology as a result of scenarios (1) and (2). The concept of synthetic biology was proposed a long time ago. Its core idea is to build life forms, like building with Lego, to perform new functions. Remember how in Avengers: Endgame you need to collect enough stones to end the world? Similarly, biologists are collecting several stones to drive the exponential growth of synthetic biology (hopefully we won’t end the world). The first stone is to read DNA cheaply to provide sequences to engineer, which has been largely accomplished by next-generation sequencing. The second stone is to edit and write DNA, an ability we are gaining through improvements in e.g. CRISPR genome editing and enzymatic DNA synthesis. Here I treat AlphaFold2, plus upscaled computational simulation and assay throughput, as the remaining stones. After collecting all the stones, the renaissance of synthetic biology is inevitable.

In practice, as illustrated in Figure 1, we get the structure of any protein from AlphaFold2, determine its function with assays, understand the structure-function relationship by MD simulation, then edit or write new sequences to generate new mutants with improved function. The whole process is iterated multiple times to get the best variant. By doing this, we are not going to create new life forms like Pokémon right away. But we are going to design antibodies that target any life-threatening bug. We are going to design enzymes that synthesize many difficult-to-synthesize organic compounds, some of which may become new drugs in the future. We can also create thermostable algae and send them to other planets to release oxygen and make those planets habitable. These are some of the foreseeable advancements from the renaissance of synthetic biology. For researchers, find your career niche in the loop (if it fits). For investors, the latest keywords are enzymatic DNA synthesis, computational protein design (antibodies, enzymes, etc.), and genome writing.
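The iterate-to-the-best-variant loop can be sketched in a few lines of Python. This is only a cartoon under heavy assumptions: toy_assay is a made-up stand-in for a real experimental readout (it simply counts matches to a hidden target sequence), and each round "writes" a small library of single-point mutants and keeps the best one. The function names, library size, and number of rounds are all hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_assay(sequence, target):
    """Made-up assay: fraction of positions that match a hidden target."""
    return sum(a == b for a, b in zip(sequence, target)) / len(target)


def mutate(sequence, rng):
    """Return a copy of the sequence with one random position rewritten."""
    pos = rng.randrange(len(sequence))
    return sequence[:pos] + rng.choice(AMINO_ACIDS) + sequence[pos + 1:]


def design_loop(start, score_fn, rounds=200, library_size=20, seed=0):
    """Iterate: write a mutant library, assay it, keep the best variant."""
    rng = random.Random(seed)
    best, best_score = start, score_fn(start)
    history = [best_score]
    for _ in range(rounds):
        library = [mutate(best, rng) for _ in range(library_size)]
        candidate = max(library, key=score_fn)  # "assay" the whole library
        candidate_score = score_fn(candidate)
        if candidate_score > best_score:        # keep only improvements
            best, best_score = candidate, candidate_score
        history.append(best_score)
    return best, best_score, history


if __name__ == "__main__":
    target = "MKTAYIAKQR"  # arbitrary sequence, for illustration only
    best, score, history = design_loop("A" * len(target),
                                       lambda s: toy_assay(s, target))
    print(best, score)
```

In the real loop, the scoring step is where the structure from AlphaFold2, the MD simulation, and the ultra-high-throughput assay come together, and each round costs days or weeks rather than microseconds.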

Johnny Tam

Bioinformatician Specialized in NGS Technology and Protein Engineering. Finishing his PhD studies at UTokyo + RIKEN, Japan. Feel free to reach out! Cheers~ :)