Affordable storage in DNA format

09.11.2020

Light-directed microarray synthesis of DNA libraries using express reaction protocols is shown to be a cost-effective approach to writing data in DNA format. The study, published in Nature Communications, is the product of a collaboration between the nucleic acid chemistry group at the Institute of Inorganic Chemistry (Mark Somoza and Jory Lietard) and the groups of Robert Grass at ETH Zürich and Reinhard Heckel at the Technische Universität München, a team effort in synthesis, sequencing and error correction.

DNA stores genetic information in the form of a code composed of only four nucleotides: A, C, G and T. While nature has chosen a four-letter molecular code to store information potentially ever since the dawn of life, scientists have only recently started to consider it as a format to store digital information too.

Because DNA is immensely stable under dry and cool conditions, storing digital information on it seems to be an ideal way to safeguard data and knowledge for thousands of years, beyond what other physical formats can achieve. The physical density of information written in the genetic alphabet is enormous: all knowledge and data ever created in the history of humankind would fit in a pickup truck if stored on DNA.

Writing data on DNA

The basis of this alternative use of the genetic code is that for millennia to come, our society will continue to sequence DNA, and sequencing is the decoding process for DNA-based digital data. While reading/decoding is already massively parallel thanks to high-throughput sequencing, the crux of the matter is writing/encoding, which essentially corresponds to oligonucleotide synthesis.

High-density in situ microarray fabrication is the only synthetic method that can address the throughput issue in writing data on DNA. But even if DNA synthesis has become very cheap at a small scale, high quality oligonucleotide libraries remain an expensive commodity, commanding at least $3000/Mb.

Approach to lowering costs

In our recently published article in Nature Communications, we partnered with the research groups of Robert Grass (ETH Zürich) and Reinhard Heckel (TU München) to prepare DNA libraries using our in situ photolithographic approach to DNA array synthesis, in an attempt to considerably lower the costs of "writing" DNA.

We used our method of express light-directed array synthesis to prepare a DNA library composed of >16000 unique sequences, all 60 nucleotides long, in <5 h. While our express route is a high error-rate regime, the encoding scheme allows for missing information to be retrieved thanks to highly efficient Reed-Solomon error-correcting algorithms.
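To give a feel for what "writing" digital data as a library of short sequences involves, here is a minimal sketch in Python. It uses a naive mapping of two bits per nucleotide and cuts the result into 60-nucleotide strands. This mapping and the function names are assumptions made for illustration only; they are not the encoding scheme of the study, and the sketch omits the indexing and Reed-Solomon redundancy that make recovery possible in a high error-rate regime.

```python
# Illustrative only: a naive 2-bits-per-nucleotide mapping.
# This is NOT the encoding used in the study; it has no index and no error correction.
BASES = "ACGT"  # assumed mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T


def bytes_to_dna(data: bytes) -> str:
    """Map every byte to four nucleotides, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(seq: str) -> bytes:
    """Invert bytes_to_dna; expects a length that is a multiple of four."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def split_into_strands(seq: str, length: int = 60) -> list[str]:
    """Cut one long sequence into strands of at most `length` nucleotides."""
    return [seq[i:i + length] for i in range(0, len(seq), length)]


payload = b"Mozart sheet music, 30 bytes.."  # hypothetical 30-byte payload
strands = split_into_strands(bytes_to_dna(payload))
restored = dna_to_bytes("".join(strands))
```

Under this naive mapping, 16000 strands of 60 nucleotides would hold at most 240 kB, so a 100 kB payload leaves considerable room for the redundancy that error correction needs.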

The DNA library stores 100 kB of perfectly recovered data (Mozart sheet music) and is extremely affordable: when scaled up to the Mb range, it approaches the ~$500/Mb territory, a four-fold reduction in synthesis costs, with plenty of room for improvement. Ongoing work focuses on increasing data capacity and synthesis efficiency.


Publication in "Nature Communications"

Philipp L. Antkowiak, Jory Lietard, Mohammad Zalbagi Darestani, Mark M. Somoza, Wendelin J. Stark, Reinhard Heckel & Robert N. Grass: Low cost DNA data storage using photolithographic synthesis and advanced information reconstruction and error correction, published online 22/10/2020, https://www.nature.com/articles/s41467-020-19148-3, doi: 10.1038/s41467-020-19148-3

Contact

Dr. Jory Lietard

Institut für Anorganische Chemie

Fakultät für Chemie

Althanstraße 14 (UZA II)

1090 Wien

+43-1-4277-52643

jory.lietard@univie.ac.at

The Maskless Array Synthesis setup for photolithography used in this study. Patterned UV light reaches the reaction chamber for the controlled synthesis of oligonucleotide libraries. (Copyright: Jory Lietard)