DNA stores genetic information in the form of a code composed of only four nucleotides: A, C, G and T. While nature has chosen a four-letter molecular code to store information potentially ever since the dawn of life, scientists have only recently started to consider it as a format to store digital information too.
Because DNA is immensely stable under dry and cool conditions, storing digital information on it seems to be an ideal way to safeguard data and knowledge for thousands of years, beyond what conventional physical formats can achieve. The physical density of information written in the genetic alphabet is enormous: all the knowledge and data ever created in the history of humankind would fit in a pickup truck if stored on DNA.
Writing data on DNA
The basis of this alternative use of the genetic code is that for millennia to come, our society will continue to sequence DNA, and sequencing represents the decoding process of DNA-based digital data. While reading/decoding is already massively parallel thanks to high-throughput sequencing, the crux of the matter is writing/encoding, which essentially corresponds to oligonucleotide synthesis.
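To make the idea of writing and reading digital data in a four-letter alphabet concrete, here is a minimal sketch in Python. It assumes the simplest possible mapping of two bits per nucleotide; the table `BITS_TO_BASE` and the functions `encode` and `decode` are hypothetical names chosen for illustration and are not the encoding scheme used in the published work, which additionally relies on error correction.

```python
# Illustrative mapping only (an assumption): two bits per nucleotide.
# This is NOT the encoding scheme of the published work.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a nucleotide string, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(sequence: str) -> bytes:
    """Turn a nucleotide string back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Mozart")
    print(strand)          # nucleotide string, 24 bases long
    print(decode(strand))  # the original bytes again
```

In this picture, synthesis corresponds to producing the strand returned by `encode`, and sequencing followed by `decode` corresponds to reading the data back.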
High-density in situ microarray fabrication is the only synthetic method that can address the throughput issue in writing data on DNA. But even though DNA synthesis has become very cheap at a small scale, high-quality oligonucleotide libraries remain an expensive commodity, commanding at least $3000/Mb.
Approach to lowering costs
In our recently published article in Nature Communications, we partnered with the research groups of Robert Grass (ETH Zürich) and Reinhard Heckel (TU Munich) to prepare DNA libraries using our in situ photolithographic approach to DNA array synthesis, in an attempt to considerably lower the costs of "writing" DNA.
We used our method of express light-directed array synthesis to prepare a DNA library composed of >16000 unique sequences, all 60 nucleotides long, in <5 h. While our express route is a high error-rate regime, the encoding scheme allows for missing information to be retrieved thanks to highly efficient Reed-Solomon error-correcting algorithms.
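A rough back-of-the-envelope calculation shows how much room such a library leaves for redundancy. The sketch below assumes a theoretical maximum of two bits per nucleotide, takes 16000 sequences as a lower bound, and uses the 100 kB payload reported for this library with 1 kB taken as 1000 bytes. The constant and function names are hypothetical, and the actual split between payload, indexing and Reed-Solomon redundancy is not stated here.

```python
# Back-of-the-envelope capacity estimate under stated assumptions.
NUM_SEQUENCES = 16_000    # lower bound from the text (">16000")
SEQ_LENGTH_NT = 60        # nucleotides per sequence
BITS_PER_NT = 2           # assumption: theoretical maximum for a 4-letter code
PAYLOAD_BYTES = 100_000   # 100 kB of stored data, assuming 1 kB = 1000 bytes


def raw_capacity_bytes(num_sequences: int, length_nt: int, bits_per_nt: int) -> int:
    """Upper bound on the bytes a library could hold with no overhead at all."""
    return num_sequences * length_nt * bits_per_nt // 8


raw_bytes = raw_capacity_bytes(NUM_SEQUENCES, SEQ_LENGTH_NT, BITS_PER_NT)
payload_fraction = PAYLOAD_BYTES / raw_bytes

if __name__ == "__main__":
    print(f"raw capacity: {raw_bytes} bytes")
    print(f"payload share of raw capacity: {payload_fraction:.0%}")
```

Under these assumptions, well under half of the raw capacity carries the payload, which is consistent with a scheme that spends a large share of each strand on indexing and error-correcting redundancy in order to tolerate a high synthesis error rate.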
The DNA library stores 100 kB of data (Mozart sheet music), which was recovered perfectly, and is extremely affordable. When scaled up to the Mb range, it approaches the ~$500/Mb territory, a four-fold reduction in synthesis costs, with plenty of room for improvement. Ongoing work focuses on increasing data capacity and synthesis efficiency.
Publication in "Nature Communications"
Philipp L. Antkowiak, Jory Lietard, Mohammad Zalbagi Darestani, Mark M. Somoza, Wendelin J. Stark, Reinhard Heckel & Robert N. Grass: Low cost DNA data storage using photolithographic synthesis and advanced information reconstruction and error correction, published online 22/10/2020, https://www.nature.com/articles/s41467-020-19148-3, doi: 10.1038/s41467-020-19148-3
Contact
Dr. Jory Lietard
Institut für Anorganische Chemie
Fakultät für Chemie
Althanstraße 14 (UZA II)
1090 Wien
+43-1-4277-52643
jory.lietard@univie.ac.at