DNA could revolutionize how we store our data
The Age of AI will rely on massive volumes of data that can be easily stored and retrieved—and bioscience may have an ingenious solution.

Shakespeare’s entire catalog of sonnets and eight of his tragedies, all of Wikipedia’s English-language pages, and one of the first movies ever made: scientists have been able to fit the contents of all these works in a space smaller than a tiny test tube. They didn’t somehow miniaturize them, though. Instead, they used DNA—the building block of all life—to encode the information in these creative works and store it at a microscopic scale.
As humans adopt advanced tools like artificial intelligence, tomorrow’s currency will be data. Already, tech giants like Microsoft are raising billions of dollars to construct data centers for AI. And there’s a very real “Storage Wars” scramble underway right now to figure out how to preserve and safeguard exponentially increasing amounts of data. Football field-size data centers that draw gigawatts of power are one option. DNA storage could be an energy-efficient, compact alternative.
Step 1: Computer storage
We typically think of DNA as a blueprint or instruction booklet—its sequences of As, Ts, Cs, and Gs tell molecular machines how to build the fabric of our very beings. DNA storage flips this paradigm on its head. Computer data make up the inputs, and DNA is the end product.
A handful of start-ups are working to perfect the conversion of binary computer code into physical DNA strands, and in doing so, take a shot at disrupting the multibillion-dollar storage industry. Here’s how they plan to move the industry away from microfilm, microfiche, disks, and servers.
Traditional data storage relies on constant migration to prevent old data from degrading or the technology it’s stored in from becoming obsolete. Varun Mehta, CEO of Atlas Data Storage, compares long-term data storage to painting the Golden Gate bridge—by the time you’ve gone from one end to the other, the first end is rusting and you have to start all over again.
“The same thing happens with long-term data storage,” he says. “You’re always moving from your old tape to your new tape.” He predicts that “people who want to get off that treadmill will be the first to move to DNA.”
Step 2: Encoding digital data in DNA
In practice, DNA storage involves several steps: deciding on a code, making the DNA using a process called synthesis, and storing the resulting DNA strands. DNA storage methods also include ways to categorize the stored strands and convert nucleotide sequences back into information that may be compatible with computers or accessible in some other way. Though industry members formed the DNA Data Storage Alliance in 2020 in part to set standards, companies in the DNA storage space still approach each of these steps in slightly different ways.
First, to store information as DNA, scientists have to determine how the data will be translated. DNA is a base-4 system; in contrast, computers store and process information in binary. Rather than assigning a single “1” or “0” to each DNA nucleotide (an A, C, T, or G), you could assign a particular pair of binary digits to each base, so an A might stand in for “00,” C for “01,” T for “10,” and G for “11.” Theoretically, this means every DNA nucleotide can encode up to 2 bits. In practice, the system isn’t as efficient as that: certain combinations of DNA nucleotides are less stable or otherwise undesirable, and different chemistry protocols exist for turning bits into DNA bases.
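To make the two-bits-per-base idea concrete, here is a minimal sketch in Python. It uses only the example mapping given above (A for 00, C for 01, T for 10, G for 11); the function names are made up for illustration, and the sketch ignores the stability constraints and error correction that a real DNA storage pipeline would need.

```python
# Example mapping from the text: A=00, C=01, T=10, G=11.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "T", "11": "G"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a nucleotide string, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a nucleotide string back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"rose")
print(strand)          # CGATCTGGCGAGCTCC
print(decode(strand))  # b'rose'
```

Each 8-bit byte becomes four bases, so the four-letter word “rose” takes 16 nucleotides at this theoretical ceiling.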
Catalog, one DNA storage company, announced in 2022 that it had encoded eight of William Shakespeare’s tragedies into a single test tube. To do this, scientists had to translate about 207,000 words into strings of nucleotides using a class of enzymes called recombinases. They claimed their DNA-building machine, Shannon, encoded the plays into millions of nucleotides in a matter of minutes.
“To each of those words, you associate a random bit vector. A bit vector is just a sequence of ones and zeroes of a fixed length,” explains Catalog’s head of DNA Computing, Swapnil Bhatia, in a video for the company. The word “rose” might have a random bit vector stretching 1,000 digits long, and different companies will have different ciphers for translating words into 1s, 0s, and nucleotides.
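As a rough illustration of what “a random bit vector per word” could look like, here is a short Python sketch. It is not Catalog’s actual scheme, which the company has not detailed here; the 16-digit vector length, the fixed seed, and all function names are assumptions chosen to keep the example small.

```python
import random

VECTOR_LENGTH = 16  # hypothetical; far shorter than the ~1,000 digits the text mentions


def build_codebook(words, length=VECTOR_LENGTH, seed=0):
    """Assign each distinct word its own random, fixed-length bit vector."""
    rng = random.Random(seed)  # seeded so the same codebook can be rebuilt later
    codebook = {}
    used = set()
    for word in sorted(set(words)):
        vector = "".join(rng.choice("01") for _ in range(length))
        while vector in used:  # redraw on the rare collision
            vector = "".join(rng.choice("01") for _ in range(length))
        used.add(vector)
        codebook[word] = vector
    return codebook


def words_to_bits(words, codebook):
    """Concatenate the bit vectors of the words, in order."""
    return "".join(codebook[word] for word in words)


def bits_to_words(bits, codebook, length=VECTOR_LENGTH):
    """Split the bit string into fixed-length chunks and look each one up."""
    reverse = {vector: word for word, vector in codebook.items()}
    return [reverse[bits[i:i + length]] for i in range(0, len(bits), length)]


line = "a rose by any other name".split()
codebook = build_codebook(line)
bits = words_to_bits(line, codebook)
print(bits_to_words(bits, codebook) == line)
```

Because every vector has the same length, the decoder can recover the words simply by cutting the bit string into equal chunks. The resulting ones and zeroes could then be mapped to nucleotides by whatever cipher a given company uses.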
Step 3: Synthesis
DNA synthesis—the step of actually creating custom DNA strands—is another place where companies diverge in their methods. Catalog uses the principles of inkjet printing to exude tiny droplets containing premade DNA fragments. In each droplet, hundreds of thousands of chemical reactions take place per second to elongate the DNA strands. Atlas Data Storage, meanwhile, relies on semiconductor chips and silicon wafers as the environment for assembling strands of synthetic DNA.
“Once those strands are assembled, we harvest them from our chip,” Mehta says. “These DNA strands really are like corn stalks growing in a field on this chip and once they've gotten to the height that we want—to the number of bases—then we harvest them.”
Step 4: Storing the DNA
Storing and preserving these synthetic strands presents another set of hurdles. Catalog and Atlas store DNA samples inside metal capsules, where the strands are shielded from the elements and from degradation. To convert DNA back into bit form, one can sequence it, using the same technology that powers genetic testing like 23andMe. Sequencing the same sample can’t be done indefinitely; eventually, the sample will need to be copied again to restore it. To create longer-lasting, accessible storage, some groups are working on fluorescent tags. Shining a light on the samples can tell researchers information about a given sample at a glance, the same way metadata can help us organize computer files without having to open them.
If companies are able to surmount these challenges, a DNA storage system would take up a fraction of the space of traditional storage methods.
“The theoretical limit is astounding,” Mehta says. “You could fill 50 petabytes worth of data in a Tylenol-sized capsule,” or roughly 50,000 times as much data as an iPhone can store.
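The comparison can be checked with quick arithmetic. The Python sketch below assumes decimal units (1 petabyte = 10^15 bytes, 1 terabyte = 10^12 bytes) and the theoretical ceiling of 2 bits per nucleotide; the variable names are illustrative only.

```python
PETABYTE = 10**15  # bytes, decimal units (an assumption)
TERABYTE = 10**12  # bytes, decimal units (an assumption)

capsule_bytes = 50 * PETABYTE

# Storage per phone implied by the "50,000 times an iPhone" comparison.
per_phone_bytes = capsule_bytes // 50_000
print(per_phone_bytes // TERABYTE, "TB per phone")

# Fewest nucleotides needed at the theoretical 2 bits per base.
BITS_PER_NUCLEOTIDE = 2
min_nucleotides = capsule_bytes * 8 // BITS_PER_NUCLEOTIDE
print(f"{min_nucleotides:.1e} nucleotides at the theoretical limit")
```

In other words, the comparison implies a phone with about 1 terabyte of storage, and 50 petabytes would need at least 200 quadrillion nucleotides even with perfectly efficient encoding.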
Step 5: Retrieving the data
Storing information in such a small physical package raises philosophical questions about the purpose of storage. Could a storage device itself serve a purpose? Scientists have theorized and created proofs-of-concept of fabrics and everyday items like glasses that contain DNA-stored information. The company Catalog has a branch dedicated to “DNA computing” to search and analyze synthetic DNA without first converting the information encoded in it back into bits. There could be some advantages to working with data in DNA form: rather than moving through the data from one end to the other, as a computer processor does, operations on DNA can happen in many places at once.
DNA’s status as the basic building block of life may someday make it one of our most durable technologies, Mehta says, because it means it isn’t going anywhere.
“One thousand years from now, there probably will not be any DVD players. In fact, it's hard to find a VHS tape player anymore. But that's never going to happen with DNA, because we need it for our own health,” he says. “We'll always have that technology available.”