01 — The Longest Listen
In Andy Weir's Project Hail Mary, the alien Rocky communicates through musical chords -- tonal clusters produced by five air bladders, carrying meaning in rhythm and pitch and timing. The protagonist, Ryland Grace, has no Rosetta Stone. He builds a translator anyway, note by note, pattern by pattern, until two species separated by biology and light-years are arguing about fuel ratios. In 2024, a team at MIT fed 8,719 recorded sperm whale codas into machine learning models and heard something that would have made Grace drop his whiteboard marker: clicks arranged in combinatorial patterns with rhythm, tempo, rubato, and ornamentation -- a phonetic system hiding in the ocean, its structure imperceptible through sixty years of human listening.
This is not fiction. The alphabet is real. The dictionary is not.
Humans have known that whales sing since 1967, when Roger Payne and Scott McVay discovered the long, structured songs of humpback whales. Payne released the 1970 LP Songs of the Humpback Whale, which sold over 100,000 copies and helped catalyze the Save the Whales movement. For decades, whale researchers could hear pattern but could not parse structure. In 2005, Shane Gero, a Canadian whale biologist now at Carleton University, began recording sperm whale families off the western coast of Dominica. He documented over twenty social units across generations -- a longitudinal dataset stretching nearly two decades that would become the foundation for everything that followed.
In 2020, David Gruber, a marine biologist and National Geographic Explorer, founded Project CETI (Cetacean Translation Initiative) with $33 million in TED Audacious Prize funding. The team grew to over seventy members across thirteen institutions -- MIT CSAIL, UC Berkeley, Harvard, Google Research, Oxford, and others. The mission was direct: use AI to decode sperm whale communication. A 2022 roadmap paper in iScience laid out the approach: massive dataset collection, communication unit detection, playback validation. Roger Payne, the man who had first heard whale song, was among the advisors.
Payne died on June 10, 2023, at age 88. He is listed posthumously as co-author on the 2024 Nature Communications paper that proposed the first sperm whale phonetic alphabet. The man who heard the song never lived to see the alphabet decoded from it.
02 — The Alphabet in the Click
The 2024 Nature Communications paper, authored by Pratyusha Sharma, Gero, Payne, Gruber, Daniela Rus, Antonio Torralba, and Jacob Andreas, analyzed 8,719 codas from the EC-1 clan of sperm whales -- recordings collected between 2005 and 2018, covering at least sixty individuals across eleven social units. What the machine learning models found was not a simple repertoire of call types. It was a combinatorial system.
The researchers identified four structural features operating independently within each coda. Rhythm: the pattern of inter-click intervals, with eighteen distinct types. Tempo: the overall duration of the coda, falling into five discrete categories. Rubato: a gradual drift in coda duration across consecutive calls -- approximately 0.05 seconds between adjacent codas, compared to 0.08 seconds for random same-type pairs. And ornamentation: an extra click appended to the end of a coda, appearing in 4% of exchanges, significantly more common at the beginning and end of call sequences.
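The paper's detection pipeline is statistical and runs over thousands of codas, but the per-coda bookkeeping is compact. Here is a minimal sketch, assuming each coda arrives as a list of click timestamps in seconds; the ornament threshold, field names, and helper functions are illustrative, not CETI's:

```python
from dataclasses import dataclass

@dataclass
class CodaFeatures:
    rhythm: tuple[float, ...]   # inter-click intervals normalized by duration
    tempo: float                # total coda duration, in seconds
    ornamented: bool            # extra click appended to the coda's end

def extract_features(clicks: list[float], ornament_ratio: float = 2.0) -> CodaFeatures:
    """Per-coda features from click timestamps. Rubato, a between-coda
    property, is computed separately below. A final interval much longer
    than the coda's median interval is treated as an appended ornament;
    the 2.0 ratio is an illustrative threshold, not the paper's."""
    icis = [b - a for a, b in zip(clicks, clicks[1:])]
    median_ici = sorted(icis)[len(icis) // 2]
    ornamented = len(icis) > 2 and icis[-1] > ornament_ratio * median_ici
    body = icis[:-1] if ornamented else icis
    duration = sum(body)
    # Rhythm is the shape of the interval pattern independent of speed,
    # so normalize each interval by the coda's total duration.
    rhythm = tuple(round(ici / duration, 3) for ici in body)
    return CodaFeatures(rhythm=rhythm, tempo=duration, ornamented=ornamented)

def rubato(prev: CodaFeatures, curr: CodaFeatures) -> float:
    """Duration drift between consecutive codas: ~0.05 s for adjacent codas
    in the data versus ~0.08 s for random same-type pairs."""
    return abs(curr.tempo - prev.tempo)
```

Grouping codas by (rhythm, tempo, ornamented) and tracking rubato across consecutive calls reproduces, in miniature, the bookkeeping behind the expanded inventory.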
Combined, these features yield at least 143 distinct coda types -- up from approximately 21 in the previous inventory. By the researchers' accounting, the information capacity roughly doubled, from about 5 bits per coda to approximately 10. The researchers describe this as "duality of patterning": individually meaningless elements combining into larger meaningful units, a property previously documented only in human language. Linguists at Penn's Language Log disputed that framing, noting that no linguists appear on the author list and that "alphabet" is, strictly speaking, a category error -- alphabets are writing systems, not speech. The structural finding itself, however, stands on peer-reviewed ground.
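The capacity arithmetic is easy to sanity-check: an inventory of N equally likely coda types carries at most log2(N) bits per coda. A back-of-envelope sketch, using the raw inventory counts only -- the 5- and 10-bit figures quoted above come from the team's fuller accounting, so treat these as rough lower bounds:

```python
import math

old_inventory = 21    # coda types in the earlier Caribbean catalogue
new_inventory = 143   # combinatorial coda types in the 2024 paper

# An inventory of N equally likely types carries log2(N) bits per coda.
for label, n in [("old", old_inventory), ("new", new_inventory)]:
    print(f"{label}: {math.log2(n):.1f} bits")   # ~4.4 vs ~7.2 bits

# Counting feature combinations directly (18 rhythms x 5 tempos x 2
# ornament states) gives a larger space even before rubato is added:
print(f"feature product: {math.log2(18 * 5 * 2):.1f} bits")   # ~7.5 bits
```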
No human analyst could have parsed tempo gradations or rubato drift across thousands of calls. The machine learning models detected what biologists could not. "The expressivity of sperm whale calls is much larger than previously thought," said lead author Pratyusha Sharma. Jacob Andreas, an MIT CSAIL co-author, described the system as having "some of the same structural features as the most sophisticated communication systems in the animal kingdom."
The Coda — Scattered teal click-pulses drift through warm stillness. They slow and self-organize into rhythmic clusters connected by faint arcs. The pattern holds in a moment of almost-recognition. Then the connections dissolve and the clicks scatter back into solitary pulses. The alphabet appeared. The words did not.
03 — What the Machine Heard
Gasper Begus, a linguistics professor at UC Berkeley and the linguistics lead at Project CETI, trained a generative adversarial network to imitate sperm whale codas. The GAN was a hypothesis engine: it learned on its own to treat certain spectral cues as meaningful, narrowing the search space for human analysts. When researchers sped up the audio -- removing the long silences between clicks -- vowel-like patterns emerged. Two recurring spectral shapes corresponded closely to the human vowels "a" and "i." Several diphthong-like sweeps appeared across individual codas.
"In the past, researchers thought of whale communication as a kind of morse code," Begus said. "However, this paper shows that their calls are more like very, very slow vowels."
A separate line of evidence arrived in February 2025, when a team including Jenny Allen, Ellen Garland, Inbal Arnon, and Simon Kirby reported in Science that humpback whale songs follow Zipf's law -- the same rank-frequency distribution found universally across human languages. The authors presented this as the first demonstration of Zipf's law in another species' vocal system. The songs contained "statistically coherent parts" whose internal transitions were more predictable than transitions across their boundaries -- the same structural signature found in human speech.
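The Zipf check itself is simple once a song has been segmented into units: rank unit types by frequency and test for a slope near -1 on log-log axes. A self-contained sketch -- the segmentation step, the hard part of the Science paper, is assumed done, and the unit labels are hypothetical:

```python
from collections import Counter
import math

def zipf_slope(units: list[str]) -> float:
    """Least-squares slope of log(frequency) vs. log(rank).
    Values near -1.0 indicate a Zipfian distribution."""
    freqs = sorted(Counter(units).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

# Toy usage with hypothetical song-unit labels whose frequencies
# fall off as roughly 1/rank:
song = ["A"] * 500 + ["B"] * 250 + ["C"] * 166 + ["D"] * 125 + ["E"] * 100
print(f"slope: {zipf_slope(song):.2f}")   # close to -1 for Zipf-like data
```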
In a preprint posted to bioRxiv in October 2024, Sharma and colleagues introduced WhaleLM, a neural sequence model trained on sperm whale coda sequences. It predicted future diving behavior with 86.4% accuracy -- far above the 50% random-chance baseline. This has not been peer-reviewed. But if it holds, it represents the first evidence that codas may coordinate behavior -- that they carry operational, not just social, information.
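WhaleLM is a neural model and the preprint's task setup is richer than this, but the evaluation logic can be sketched with a frequency-table predictor: if coda history beats the majority-class baseline at predicting upcoming dives, the codas carry behavioral information. Everything below, including the label names, is illustrative:

```python
from collections import Counter, defaultdict

def majority_baseline(labels: list[str]) -> float:
    """Chance-level reference: always guess the most common behavior."""
    return Counter(labels).most_common(1)[0][1] / len(labels)

def history_accuracy(codas: list[str], labels: list[str]) -> float:
    """Predict each behavior label from the preceding coda type using a
    conditional-frequency table. (A real evaluation would hold out a
    test split; this in-sample version only sketches the comparison.)"""
    table: defaultdict[str, Counter] = defaultdict(Counter)
    for coda, label in zip(codas, labels):
        table[coda][label] += 1
    hits = sum(
        label == table[coda].most_common(1)[0][0]
        for coda, label in zip(codas, labels)
    )
    return hits / len(labels)

# Toy usage: two Caribbean coda types preceding dive / no-dive labels.
codas = ["1+1+3", "5R1", "1+1+3", "5R1", "1+1+3", "5R1"]
dives = ["dive", "stay", "dive", "stay", "dive", "stay"]
print(majority_baseline(dives))        # 0.5 -- the chance baseline
print(history_accuracy(codas, dives))  # 1.0 -- history is informative
```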
The broader field is accelerating. Google Research published a bioacoustics model in September 2024 that classifies whale species from spectrograms, identifying eight species across over 200,000 hours of underwater recordings. Earth Species Project, co-founded by Aza Raskin and Britt Selvitelle, secured $17 million in new funding (self-reported) and released AVES, a self-supervised foundation model for animal vocalizations. These are distinct research programs. Google classifies who is speaking. CETI decodes what they might be saying. But the underlying capability -- AI models trained on massive bioacoustic datasets -- is converging from multiple directions.
04 — Structure Is Not Meaning
The most important sentence in the 2024 Nature Communications paper is not about combinatorics or information theory. It is: "We do not know yet what they are saying."
The researchers found structure. They did not find meaning. That gap is where every honest interpretation of this work must begin. Steven Pinker, the Harvard linguist, argued that sperm whale codas are likely "signature calls whose semantics is pretty much restricted to who they are, perhaps together with emotional calls." His logic is functional: "If whales could communicate complex messages, we should observe them collaborating on sophisticated tasks as humans do."
The CETI team's restraint is their strongest credential. Diana Reiss, a cognitive psychologist who studies dolphin cognition, offered the counterpoint to Pinker's dismissal: "I think we can safely say we're in a state of ignorance at this point." The honest position is that we do not know enough either to dismiss or to confirm. Combinatorial structure is not language. But it is not nothing. The question is what lies between.
05 — The Question Before the Answer
In March 2026, Project CETI published two studies documenting the first comprehensive recording of a sperm whale birth. On July 8, 2023, off the coast of Dominica, all eleven members of Unit A participated as the whale known as Rounder gave birth. Non-kin whales helped lift the newborn to the surface, and three generations were present: Lady Oracle, her daughter Rounder, and Rounder's daughter Accra. The cooperation was not restricted to relatives -- the finding, published in Science, is the first quantitative evidence of non-kin cooperative birth assistance outside primates.
The legal thread is nascent but moving. In March 2024, Indigenous leaders from six Polynesian nations signed He Whakaputanga Moana, the first Indigenous-led declaration recognizing whales as legal persons, according to NPR. In February 2026, New Zealand Green Party MP Teanau Tuiono introduced the Tohora Oranga Bill in Parliament, seeking to enshrine whale personhood in statute; the bill sits in the member's ballot and its passage is uncertain. Whether AI-decoded communication evidence will ever figure in such legal arguments remains to be seen. These are early signals, not established law.
What has changed is the question itself. For fifty-seven years -- from Roger Payne's first recordings of humpback song to the 2024 Nature Communications paper -- humans knew whales made patterned sounds. Now we know those sounds contain combinatorial, phonetic, and statistical structure that mirrors properties of human language. AI did not decode the meaning. It revealed that there is something to decode. In Project Hail Mary, Grace builds a translator and talks to Rocky. In the Caribbean, no translator exists yet. The researchers who uncovered the structure said it themselves: "We do not know yet what they are saying." The question is no longer whether whales have something to say. The question is whether we will understand it when they do.
The alphabet has been found. The vowels are disputed. The first word has not been read. But the technology to synthesize sperm whale codas already exists -- Begus's GAN can generate click sequences statistically indistinguishable from real ones, and CETI's roadmap has always included playback as a validation step. When enough structure is decoded to attempt a response, someone will broadcast synthesized codas into the Caribbean and wait for a whale to answer.

The ethical architecture for that moment does not exist. Every framework for first contact -- from SETI's post-detection protocols to the consultation clauses of the Outer Space Treaty -- assumes the other party is unreachable until it chooses to speak. Sperm whales have been speaking for fifteen million years. They did not invite this conversation. A research team will choose which coda patterns to transmit, based on statistical models trained on one clan's recordings over thirteen years, and aim them at animals whose social structure, decision-making, and capacity for distress we do not understand.

If a whale responds, we will not know what it said. We will not know what we said. We will not know whether our synthesized coda was a greeting, a threat, a claim of territory, or nonsense that causes a mother to move her calf away from the sound. The combinatorial system means the wrong combination of rhythm, tempo, rubato, and ornamentation could carry a meaning we never intended -- and the whales have no way to tell us we got it wrong. Consent requires understanding on both sides. What happens when one side builds a megaphone before building a dictionary -- and the other side has been listening the entire time?