In 2018, during her chemistry Nobel Prize lecture, Frances Arnold noted that scientists had arrived at a point where they could read, write, and edit any sequence of DNA. But composing whole genes or even whole genomes from scratch — that was something only evolution could do.
A few years later, not long after helping to launch the Arc Institute, a nonprofit research center in the Bay Area, molecular engineer Patrick Hsu wondered if it was possible to imitate the forces of evolution that Arnold had been referring to. DNA is a language, after all, and with all the advances in generative AI — chatbots that could hold eerily lifelike conversations if trained on enough text — maybe recreating all the cellular complexity contained in a genome wasn’t that far behind.
Working with Brian Hie, a computational biologist at Stanford University and a fellow Arc Institute member, Hsu, who is also an assistant professor at the University of California, Berkeley, began assembling a team of scientists to train an AI model on vast troves of biological data — 300 billion DNA letters, including long sequences from 80,000 genomes of bacteria and archaea.