Researchers at Microsoft and the University of Washington have reached an early but important milestone in DNA storage by storing a record 200 megabytes of data on the molecular strands.
The impressive part is not just how much data they were able to encode onto synthetic DNA and then decode. It’s also the space they were able to store it in.
Once encoded, the data occupied a spot in a test tube “much smaller than the tip of a pencil,” said Douglas Carmean, the partner architect at Microsoft overseeing the project.
Think of the amount of data in a big data center compressed into a few sugar cubes. Or all the publicly accessible data on the Internet slipped into a shoebox. That is the promise of DNA storage – once scientists are able to scale the technology and overcome a series of technical hurdles.
The Microsoft-UW team stored several items on DNA strands: digital versions of works of art (including a high-definition video by the band OK Go), the Universal Declaration of Human Rights in more than 100 languages, the top 100 books of Project Gutenberg and the nonprofit Crop Trust’s seed database.
Demand for data storage is growing exponentially, and the capacity of existing storage media is not keeping pace. That’s making it hard for organizations that need to store a lot of data – such as hospitals with vast databases of patient data or companies with lots of video footage – to keep up. It also means information is being lost, a problem that will only worsen without a new solution.
DNA could be the answer.
It has several advantages as a storage medium. It’s compact. It’s durable, capable of lasting for a very long time if kept in good conditions (DNA from woolly mammoths was recovered several thousand years after they went extinct, for instance). And it will always be current, the researchers believe.
“As long as there is DNA-based life on the planet, we’ll be interested in reading it,” said Karin Strauss, the principal Microsoft researcher on the project. “So it’s eternally relevant.”
This explains why the Microsoft-UW team is just one of a number of research groups around the globe pursuing the potential of DNA as a vast digital attic.
The researchers acknowledge they have a long way to go.
Luis Henrique Ceze, a UW associate professor of computer science and engineering and the university’s principal researcher on the project, said the biotechnology industry has made big advances in both “synthesizing” (encoding) and “sequencing” (decoding) data in recent years. Even so, he said, the team still has a long way to go to make DNA storage viable as an archival technology.
But the researchers are upbeat.
They note that their diverse team of computer scientists, computer architects and molecular biologists has already increased storage capacity a thousandfold in the last year. And they believe they can make big advances in speed by applying computer science principles like error correction to the process.
Carmean, who was involved in the development of Intel’s microprocessor architecture beginning in 1989, puts it this way:
“It’s one of those serendipitous partnerships where a strong understanding of processors and computation married with molecular biology experts has the potential of producing major breakthroughs.”
To get an idea of how the Microsoft-UW team does its work, flash back to high school biology and recall that DNA – or deoxyribonucleic acid – is a molecule that contains the biological instructions used in the growth, development, functioning and reproduction of all known living organisms.
“DNA is an amazing information storage molecule that encodes data about how a living system works. We’re repurposing that capacity to store digital data — pictures, videos, documents,” said Ceze, who is conducting research in the team’s Molecular Information Systems Lab (MISL), which is housed in a basement on the University of Washington campus. “This is one important example of the potential of borrowing from nature to build better computer systems.”
Storing digital data on DNA works like this:
First, the data is translated from 1s and 0s into the “letters” of the four nucleotide bases of a DNA strand: (A)denine, (C)ytosine, (G)uanine and (T)hymine.
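To make that translation step concrete, here is a minimal sketch in Python. It assumes the simplest possible mapping, two bits per base, which is an illustration only: the article does not describe the team's actual encoding, and the names `BITS_TO_BASE`, `encode_bytes` and `decode_bases` are hypothetical. Practical schemes typically add addressing and redundancy that this sketch leaves out.

```python
# Hypothetical 2-bits-per-base mapping (an assumption, not the team's scheme).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode_bytes(data: bytes) -> str:
    """Turn each byte into four bases, most significant bit pair first."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode_bases(strand: str) -> bytes:
    """Invert encode_bytes: every four bases become one byte again."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode_bytes(b"OK")
    print(strand)                # the "letters" that would be sent for synthesis
    print(decode_bases(strand))  # round-trips back to the original bytes
```

Under this mapping every byte costs exactly four bases, so the strand is always four times as long as the input in characters.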
Then the team has its vendor, Twist Bioscience, “translate those letters, which are still in electronic form, into the molecules themselves, and send them back,” Strauss said. “It’s essentially a test tube and you can barely see what’s in it. It looks like a little bit of salt was dried in the bottom.”
Reading the data uses a biotech tweak to random access memory (RAM), another concept borrowed from computer science. The team uses polymerase chain reaction (PCR), a technique that molecular biologists use routinely to manipulate DNA, to multiply or “amplify” the strands it wants to recover. Once they’ve sharply increased the concentration of the desired snippets, they take a sample, sequence (or decode) the DNA and then run error correction computations.
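The read path can be pictured with a toy model in Python. This is a loose software analogy, not the lab procedure: it assumes each stored strand begins with a short primer sequence that acts as its address, treats PCR selection as a simple prefix filter, and stands in for error correction with a per-position majority vote across noisy reads. The article does not say which error correction method the team uses, and the names `select_by_primer` and `consensus` are hypothetical.

```python
from collections import Counter


def select_by_primer(pool: list[str], primer: str) -> list[str]:
    """Stand-in for PCR: keep reads that start with the primer, then strip it."""
    return [read[len(primer):] for read in pool if read.startswith(primer)]


def consensus(reads: list[str]) -> str:
    """Majority vote per position; assumes equal-length reads with substitution errors only."""
    if not reads:
        raise ValueError("no reads matched the primer")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


if __name__ == "__main__":
    pool = [
        "ACGT" + "CATTCAGT",  # clean copy of the wanted strand
        "ACGT" + "CATTCAGT",  # second clean copy
        "ACGT" + "CATTCGGT",  # copy with one misread base
        "TTGA" + "GGGGAAAA",  # unrelated strand stored under another primer
    ]
    wanted = select_by_primer(pool, "ACGT")
    print(consensus(wanted))
```

The filter returns only the three reads tagged with the requested primer, and the vote outweighs the single misread base because two of the three copies agree at that position.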
The lab tour complete, one question needed asking: Why an OK Go video?
“We like that a lot because there are many parallels with the work,” Strauss said with a laugh. “They’re very innovative and are bringing different things from different areas into their field and we feel we are doing something very similar.”