The Fifty-Year Problem — The AI Files

Key Facts

DeepMind's AlphaFold 2 solved the protein folding problem at CASP14, with results announced in November 2020.
Predictions matched experimental methods with a median GDT score of 92.4.
DeepMind released predicted structures for all 200 million known proteins in July 2022.
Over three million researchers in 190 countries have used AlphaFold's predictions, according to DeepMind.
Demis Hassabis and John Jumper won the Nobel Prize in Chemistry, October 2024.

01 — The Problem: Fifty Years of Failure

In the half-century between 1972 and 2022, experimental scientists — using X-ray crystallography, cryo-electron microscopy, and nuclear magnetic resonance — determined the three-dimensional structures of approximately 194,000 proteins. Each structure could take years of work and cost hundreds of thousands of dollars. In July 2022, a single upload to a database added 200 million more.

A protein's function is determined by its three-dimensional shape, and that shape is determined by its amino acid sequence. In 1972, Christian Anfinsen won the Nobel Prize in Chemistry for proving this principle. The question that followed consumed structural biology for the next fifty years: if the sequence determines the shape, can we predict the shape from the sequence alone?

The scale of the problem was staggering. In 1969, the molecular biologist Cyrus Levinthal calculated that a typical protein could assume approximately 10³⁰⁰ possible configurations. If a protein sampled one configuration per picosecond, it would take longer than the age of the universe to try them all. Yet real proteins fold into their correct shape in milliseconds. Nature had solved the problem. Science had not.
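Levinthal's argument can be checked with a few lines of integer arithmetic. The sketch below uses the two figures quoted above (10³⁰⁰ configurations, one sample per picosecond) and one assumption that is not from the text: an age of the universe of roughly 13.8 billion years. The helper `order_of_magnitude` is an illustrative name.

```python
# Figures quoted in the text
CONFIGURATIONS = 10 ** 300        # Levinthal's estimate of possible configurations
SAMPLES_PER_SECOND = 10 ** 12     # one configuration per picosecond

# Assumption (not from the text): the universe is about 13.8 billion years old
UNIVERSE_AGE_YEARS = 13_800_000_000
SECONDS_PER_YEAR = 31_557_600     # 365.25 days


def order_of_magnitude(n: int) -> int:
    """Return the power of ten of a positive integer (number of digits minus one)."""
    return len(str(n)) - 1


# Time needed to visit every configuration once, in seconds
search_seconds = CONFIGURATIONS // SAMPLES_PER_SECOND

# The same time expressed in multiples of the age of the universe
universe_age_seconds = UNIVERSE_AGE_YEARS * SECONDS_PER_YEAR
universe_ages_needed = search_seconds // universe_age_seconds

print(f"Exhaustive search: about 10^{order_of_magnitude(search_seconds)} seconds")
print(f"Roughly 10^{order_of_magnitude(universe_ages_needed)} times the age of the universe")
```

Under these assumptions the exhaustive search takes about 10^288 seconds, some 270 orders of magnitude longer than the universe has existed, which is why a protein cannot be finding its shape by trying every option.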

In 1994, a biennial competition called CASP — the Critical Assessment of protein Structure Prediction — was established to measure progress. Research groups would submit predicted structures for proteins whose actual structures had been determined experimentally but not yet published. For decades, progress was incremental. A few points per competition cycle. The protein folding problem was acknowledged, studied, and unsolved.

10³⁰⁰ possible configurations for a typical protein
50 years the folding problem resisted solution
194K structures determined experimentally over five decades

The Fold — An amino acid chain drifts loosely, searching through configuration space. Then it contracts — rapidly, inevitably — into a compact structure. Cross-bonds appear. The protein holds its shape. Then it unfolds and begins again.

02 — The Bet: From Games to Genes

Demis Hassabis was a chess prodigy who reached master standard at thirteen. He designed video games at Bullfrog Productions alongside Peter Molyneux as a teenager, founded his own game studio, then pivoted to neuroscience — earning a PhD at University College London studying how the brain handles memory and imagination. His doctoral work on the link between episodic memory and imagining the future was named one of Science magazine's top ten breakthroughs of 2007.

In 2010, Hassabis co-founded DeepMind in London with Shane Legg and Mustafa Suleyman. The company's thesis was that techniques from neuroscience could unlock general-purpose AI. Google acquired DeepMind in 2014. In March 2016, DeepMind's AlphaGo defeated Lee Sedol, one of the world's top Go players, 4–1.

After AlphaGo, Hassabis turned the same deep-learning approach toward science. The first target was protein folding.

John Jumper arrived at DeepMind in 2017. He had studied physics and mathematics at Vanderbilt University, then earned a PhD at the University of Chicago applying machine learning to protein dynamics. His background bridged exactly the gap the project required: deep knowledge of both proteins and the neural network architectures that might predict them.

In December 2018, AlphaFold entered CASP13 — DeepMind's first attempt at the competition. It placed first in the Free Modeling category, scoring 68.3 in summed z-scores against 48.2 for the next closest group. The problem was not solved. But the signal was clear: a machine learning lab with no prior structural biology publication had outperformed every academic group in the field on its first try.

03 — The Solve: Ninety-Two Point Four

Two years later, AlphaFold 2 entered CASP14. The results, announced in November 2020, were not close.

AlphaFold 2 achieved a median GDT (Global Distance Test) score of 92.4, on a scale of 0 to 100, across 97 target proteins. A score above 90 is generally considered competitive with experimental methods. For approximately two-thirds of the targets, AlphaFold 2 scored above 90. The median error in atomic positions was less than one angstrom, roughly the width of a single atom.
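For readers who want a feel for what a GDT score measures, here is a minimal sketch of the common GDT_TS variant: the average, over distance cutoffs of 1, 2, 4, and 8 angstroms, of the percentage of residues whose predicted alpha-carbon lies within that cutoff of its experimental position. It makes a simplifying assumption: the two structures are already superimposed, whereas the real measure searches for the superposition that maximizes each count. The function name `gdt_ts` and the four-residue coordinates are made up for illustration and are not CASP's implementation.

```python
import math

CUTOFFS_ANGSTROM = (1.0, 2.0, 4.0, 8.0)  # distance thresholds used by GDT_TS


def gdt_ts(predicted, experimental):
    """Simplified GDT_TS (0-100) for two equal-length lists of (x, y, z) alpha-carbon positions.

    Assumes the structures are already superimposed; no alignment search is done.
    """
    if not predicted or len(predicted) != len(experimental):
        raise ValueError("need two non-empty coordinate lists of equal length")

    # Per-residue distance between predicted and experimental positions
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]

    # Fraction of residues within each cutoff, then the mean of those fractions
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in CUTOFFS_ANGSTROM
    ]
    return 100.0 * sum(fractions) / len(fractions)


# Toy example with invented coordinates: per-residue errors of 0.5, 1.5, 3.0, and 9.0 angstroms
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.5, 0.0, 0.0), (3.8, 1.5, 0.0), (7.6, 0.0, 3.0), (11.4, 9.0, 0.0)]

print(gdt_ts(predicted, experimental))     # 56.25
print(gdt_ts(experimental, experimental))  # 100.0
```

In this toy case one residue in four falls within 1 angstrom, two within 2, and three within both 4 and 8, giving 56.25. A score of 92.4 means nearly every residue sits within the tightest cutoffs.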

244.0 for AlphaFold 2 in the CASP14 assessor ranking
90.8 for the next best group in the CASP14 assessor ranking

The gap was not incremental. It was a discontinuity.

Andrei Lupas, Director at the Max Planck Institute for Developmental Biology and a CASP assessor, put it directly: "It's a game changer. This will change medicine. It will change research. It will change bioengineering. It will change everything."

"A stunning advance on the protein folding problem." — Venki Ramakrishnan, Nobel laureate and structural biologist

John Moult, who had founded CASP in 1994 to measure exactly this kind of progress, said the problem was "in large part, solved."

For the other teams at CASP14 — groups that had spent years or decades on their own prediction methods — it was an extinction-level event for their subfield.

04 — The Release: Two Hundred Million Structures

The breakthrough would have been significant regardless. What made it matter beyond the lab was what DeepMind did next.

On July 15, 2021, AlphaFold 2's full methodology was published in Nature, in a paper by Jumper, Hassabis, and colleagues that would accumulate nearly 43,000 citations by November 2025. On the same day, DeepMind open-sourced AlphaFold's code. One week later, in partnership with EMBL-EBI, they launched the AlphaFold Protein Structure Database — initially containing predicted structures for approximately 365,000 proteins.

In July 2022, one year later, DeepMind uploaded predicted structures for approximately 200 million proteins from one million species. Effectively, every protein known to science.

194K (Protein Data Bank, 50 years of experimental work) → 200M (AlphaFold Database, one upload, July 2022)
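A back-of-envelope calculation puts the two figures on the same scale. The sketch below uses only the approximate counts quoted in this article (about 194,000 experimental structures over 50 years, about 200 million predictions) and compares raw counts. It does not imply that a predicted structure is equivalent to an experimentally determined one.

```python
# Approximate figures quoted in the article
EXPERIMENTAL_STRUCTURES = 194_000
EXPERIMENTAL_YEARS = 50
PREDICTED_STRUCTURES = 200_000_000

# Average experimental output per year over the half-century
structures_per_year = EXPERIMENTAL_STRUCTURES / EXPERIMENTAL_YEARS

# How long the 2022 upload would take at that average pace
years_at_experimental_pace = PREDICTED_STRUCTURES / structures_per_year

# How many times larger the upload is than the experimental total
scale_factor = PREDICTED_STRUCTURES / EXPERIMENTAL_STRUCTURES

print(f"{structures_per_year:,.0f} structures per year on average")
print(f"{years_at_experimental_pace:,.0f} years to reach 200 million at that pace")
print(f"about {scale_factor:,.0f} times the experimental total")
```

At the historical average of roughly 3,880 structures a year, matching the July 2022 upload would take more than 51,000 years.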

Science magazine named AlphaFold its 2021 Breakthrough of the Year. By late 2025, according to DeepMind, over three million researchers across 190 countries had used the database, with more than one million users in low- and middle-income countries. The open-source decision had turned a research result into infrastructure.

05 — The Prize: The Nobel

On October 9, 2024, the Royal Swedish Academy of Sciences awarded the Nobel Prize in Chemistry to three people. Half the prize went to Demis Hassabis and John Jumper "for protein structure prediction." The other half went to David Baker, a biochemist at the University of Washington, "for computational protein design" — work that uses related techniques to design entirely new proteins that do not exist in nature.

The pairing was deliberate. Hassabis and Jumper had taught a machine to read the language of proteins. Baker had taught a machine to write in it.

1972: Anfinsen wins the Nobel Prize in Chemistry for establishing that sequence determines structure
1994: CASP competition founded to measure prediction progress
2018: AlphaFold enters CASP13 and wins Free Modeling on its first attempt
Nov 2020: AlphaFold 2 wins CASP14 with a median GDT of 92.4; problem solved
Jul 2021: Nature paper published, code open-sourced, database launched
Jul 2022: 200 million protein structures uploaded, covering nearly every known protein
Oct 2024: Hassabis, Jumper, and Baker awarded the Nobel Prize in Chemistry

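It was among the first Nobel Prizes in a natural science awarded for work driven by artificial intelligence, announced one day after the Physics prize went to John Hopfield and Geoffrey Hinton for the foundations of machine learning.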

06 — Signal: The Other Side of the Fold

The applications have been concrete. Researchers at the University of Oxford used AlphaFold to identify a critical protein in malaria vaccine development, accelerating the path from basic research to clinical trials. Scientists have used it to design enzymes that break down plastic waste more efficiently. Drug discovery pipelines across the pharmaceutical industry have integrated AlphaFold predictions into their early-stage target identification.

But the same capabilities carry a second set of implications.

A 2024 paper in EMBO Reports — "Security challenges by AI-assisted protein design" — laid out the dual-use risks. AI-powered protein prediction and design reduce the time, resources, and expertise required for biological engineering. The paper concluded that these tools could "shorten the risk chain for biological weapon development" by lowering the barrier for non-experts.

This is not hypothetical. The same tools that allow a researcher to design a better enzyme or a novel therapeutic allow anyone with access to design a protein that folds into a shape optimized for harm. The database is open. The code is open. The barrier that once existed — years of training, millions of dollars of equipment, institutional access — has been compressed into a laptop and a download link.

The protein folding problem is solved. The protein design problem is just beginning. The fold goes both ways.

What If?

The protein folding problem asked: given a sequence, what shape does it make? The protein design problem asks the inverse: given a desired shape, what sequence produces it? AlphaFold answered the first question. David Baker's work — the other half of the same Nobel Prize — is answering the second. Within a decade, designing a novel protein to a precise specification will be as routine as compiling code. The same capability that lets a researcher design a protein to neutralize a toxin lets someone design a protein that mimics one. The same database that accelerates a malaria vaccine accelerates a synthetic pathogen optimized for immune evasion. The barrier was never intent. It was capability — and capability just became a free download. No export control regime, no biosafety protocol, no institutional review board was designed for a world where the tools of molecular engineering are open-source, run on consumer hardware, and improve every eighteen months. The question is not whether someone will use protein design tools to cause harm. The question is what detection infrastructure exists when they do — and right now, the answer is: almost none.
