
DNA Storage in a Yottabyte Era

By KIM BELLARD

Did you know we are living in the Zettabyte Era? Honestly, did you even know what a zettabyte is? Kilobytes, gigabytes, maybe even terabytes, sure, but zettabytes? Well, if you ran data centers you’d know, and you’d care, because demand for data storage is skyrocketing (all those TikTok videos and Netflix shows add up). Believe it or not, pretty much all of that data is still stored on magnetic tapes, which have served us well for the past sixty-some years. At some point, though, there won’t be enough tapes, or enough places to store them, to keep up with our data storage needs.

That’s why people are so keen on DNA storage – including me.

A zettabyte, for the record, is one sextillion bytes. A kilobyte is 1000 bytes; a zettabyte is 1000⁷ bytes. Between gigabytes and zettabytes, by powers of 1000, come terabytes, petabytes, and exabytes; after zettabytes come yottabytes. Back in 2016, Cisco announced we were in the Zettabyte Era, with global internet traffic reaching 1.2 zettabytes. We’ll be in the Yottabyte Era before the decade is out.
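For anyone who wants to check the arithmetic, here is a minimal Python sketch of that ladder of prefixes. It assumes the decimal (powers-of-1000) definitions used above; the names `PREFIXES` and `bytes_in` are made up for illustration.

```python
# Decimal byte prefixes, in order; each step up is another factor of 1000.
# (PREFIXES and bytes_in are illustrative names, not from any library.)
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]


def bytes_in(prefix: str) -> int:
    """Return the number of bytes in one <prefix>byte, using powers of 1000."""
    exponent = PREFIXES.index(prefix) + 1  # kilo -> 1, ..., zetta -> 7, yotta -> 8
    return 1000 ** exponent


if __name__ == "__main__":
    for power, name in enumerate(PREFIXES, start=1):
        print(f"1 {name}byte = 1000^{power} = {bytes_in(name):,} bytes")
```

Run it and the zettabyte line shows a 1 followed by 21 zeros, which is the "one sextillion" above; a yottabyte is a thousand of those.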

People have been working on DNA storage for many years; I first wrote about it in 2016, when I speculated it might mean we could literally be our own medical record. We’re not at the stage of practical DNA storage yet, and we probably won’t be for many more years, but it’s hard to believe we’re not going to be there eventually. Unlike every other form of recording we’ve come up with, DNA can persist almost indefinitely, and, as long as there are intelligent species based on DNA, they’ll want to read it.

Most importantly, DNA can store a lot of data. As MIT professor Mark Bathe, Ph.D. told NPR: “All the data in the world could fit in the coffee cup that you’re drinking in the morning if it were stored in DNA.”

Mind. Blown.

What prompted me to write about this now was an announcement from Microsoft. Working with researchers from the Molecular Information Laboratory at the University of Washington, its team published a paper demonstrating a “proof of concept” molecular controller that allowed them to write to DNA “three orders of magnitude” – that’s 1000x – more densely. As the announcement said: “Ultimately, we were able to use the system to encode a message onto four strands of synthetic DNA, proof that nanoscale DNA writing is possible at dimensions necessary for practical DNA data storage.”

I’ll spare readers the detail of what they did – I don’t pretend to understand it – but the paper concludes:

we project that the technology will scale further to billions of features per square centimeter, enabling synthesis throughput to reach megabytes-per-second levels in a single write module, competitive with the write throughput of other storage devices…We foresee these assemblers being used in other areas like material science, synthetic biology, diagnostics, and closed-loop massive molecular biology experimental assays.

Similarly, the announcement concludes: “we foresee the technology reaching arrays containing billions of electrodes capable of storing megabytes per second of data in DNA. This will bring DNA data storage performance and cost significantly closer to tape.”

You can bet Microsoft is taking this seriously.

———

Lest anyone think only Microsoft is working on this, there have been several other promising developments in recent weeks. Interesting Engineering highlighted a few of them:

  • Georgia Tech Research Institute researchers have developed a microchip that allows faster writing to DNA, and expect it to be 100x faster than current technologies. Lead researcher Nicolas Guise told BBC that, since DNA can survive so long, “the cost of ownership drops to almost zero.”
  • Northwestern University scientists have demonstrated a new “enzymatic system” that encodes three bits of data per hour. The NU announcement explains: “Our method is much cheaper to write information because the enzyme that synthesizes the DNA can be directly manipulated.” The researchers believe the technique could be used to install “molecular recorders” inside cells to act as biosensors; the possibilities are astounding.
  • A team at China’s Southeast University used a new process to split content into sequences, rather than one long chain, while “downsizing” the instruments used. TechRadar speculates this could lead to the first mass market DNA storage device. Professor Liu Hong told Global Times: “Now we are aiming at the combination of electronic information technology and biology, which might be used in various aspects including data storage and nucleic test for virus.”

Interesting Engineering may have missed the most interesting use yet: Business Insider India reports that Roddenberry Entertainment has created an NFT (non-fungible token) of Gene Roddenberry’s signature on the first Star Trek contract and is storing it on DNA implanted in a bacterium – “the first-ever living ecological non-fungible token (NFT).” The bacterium is currently dormant, but if revived it will duplicate the NFT as it reproduces (which sort of goes against what I thought NFTs were).

Somehow I don’t think that’s what the Microsoft researchers were intending DNA storage to accomplish, but, hey, anything for Star Trek.

As Professor Bathe told NPR, if cost/efficacy issues are solved – and they are well on their way – “Then, you know, the sky’s the limit in terms of just storing everything that we ever wanted to and ever will need to.”

———

It’s possible that DNA storage will never get fast enough or cheap enough to replace existing storage methods. It’s possible that some other new technique will emerge that will be even better than DNA storage (e.g., holographic storage?). But we are DNA-based creatures and the possibility of using the technique that nature builds us with to store and manipulate the data we generate is irresistible.

There already are DNA-based “robots” and DNA-based computers so, honestly, DNA storage doesn’t surprise me at all. We should be expecting molecular DNA recorders…and trying to anticipate what we do and don’t want them used for.

In the 21st century, biology is computing, and vice-versa. DNA isn’t just our genetic history and future, but information that we can read and write in. We call it “synthetic biology” now but as the field grows and grows we’re going to forget the “synthetic” part, like “digital health” just becomes “health” (or “cryptocurrency” just becomes “currency”).

Life in the Yottabyte Era is going to be very interesting.

Kim is a former emarketing exec at a major Blues plan, editor of the late & lamented Tincture.io, and now regular THCB contributor.
