Storage media: When digital research goes biotech

National security justifications aside, data storage on a previously unimagined scale is here to stay. The reasons vary, but most are best characterized as archival and entirely independent of eventual use. Moral, ethical and legal reflections on the subject may be important, but they will not stop disruptive new technologies for data collection, storage or processing any more than religious beliefs have ever stalled science for significant periods.

Revolutions in storage capacity are coming in more ways than one. Alongside data storage, and probably with considerable practical cross-fertilization and synergies, geometric progress is being made in energy storage and battery technology. Within a reasonably short horizon, that progress is also likely to remove the last practical and cost advantages of the combustion engine based on fossil fuels.

In the December 2014 issue of this blog I discussed computational simulation experiments in silico and the example of test cases in “executable biology” under the title “When biotech research goes digital.” But there is also the reverse phenomenon: as digital technology reaches the limits of inorganic storage media, some promising lines of research have “gone biotech” and unearthed surprising potential in the use of DNA. If expectations hold, DNA would also represent a major leap in sustaining Moore’s Law in the evolution of storage media. The initial media hoopla about this concept dates back to 2013 and has not been followed by significant updates since. Some more modest but more immediately practical solutions were advertised with proof of concept even somewhat earlier, but they are also still pending. I write about it today with some benefit of hindsight and reflection, but also in anticipation of the practical solutions essential to commercialization, which remain rather far ahead. In truly visionary ideas it is the concept, not its realization, that represents the fundamental breakthrough. Warp drives, as we remember, emerged in science fiction but did inspire real physics as well as R&D.

DNA is an extremely durable form of information storage. It can be extracted after tens of thousands of years from the bones of mammals found in permafrost, not to mention from human mummies. It requires no electricity and only the barest minimum of physical storage space.

Gene sequencing has sponsored the entry of the life sciences into the club of the world’s largest data generators. The total volume of digital data currently existing on this planet is estimated by some at 3 zettabytes (ZB), where 1 ZB = 10^21 bytes or 1 trillion gigabytes. Of course, different authors’ quantitative assumptions and growth estimates vary widely, but the numerical details are immaterial: they all point to the same ballpark, and it is beyond debate that the issue belongs to the domain of (very) Big Data. Preserving such quantities of data faces challenges in two central categories: cost and durability. Any form of hard drive requires electricity, while media that do not, such as magnetic tapes or magneto-optical drives or disks, are not durable beyond a few years. The Swiss Federal Institute of Technology in Zurich (ETH) explores preserving data encoded on glass-encased DNA stored at sub-zero temperatures, for literally millions of years. Zettabytes fit in a single spoon of liquid. Of course, by now the frontier has become the yottabyte (1 YB = 10^24 bytes or 1 trillion terabytes). For comparison, and to understand what these research projects aim to accomplish: cost estimates in 2010 for a “one yottabyte hard drive” based on commercially available storage technologies were in the range of one hundred trillion dollars.
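For readers who want to check the orders of magnitude, here is a minimal Python sketch of the unit arithmetic. It assumes decimal (SI) units and reads the quoted 2010 estimate as exactly 100 × 10^12 US dollars; the variable names are mine and purely illustrative.

```python
# Decimal (SI) storage units, expressed in bytes.
GB = 10**9
TB = 10**12
ZB = 10**21
YB = 10**24

gb_per_zb = ZB // GB  # gigabytes in one zettabyte
tb_per_yb = YB // TB  # terabytes in one yottabyte
zb_per_yb = YB // ZB  # zettabytes in one yottabyte

# Assumption: "one hundred trillion dollars" is taken as exactly 100 * 10**12 USD.
yb_drive_cost_usd = 100 * 10**12

# Implied unit prices if the whole yottabyte were bought at that rate.
usd_per_tb = yb_drive_cost_usd * TB / YB
usd_per_gb = yb_drive_cost_usd * GB / YB

print(f"1 ZB = {gb_per_zb:,} GB")
print(f"1 YB = {tb_per_yb:,} TB = {zb_per_yb:,} ZB")
print(f"Implied price: {usd_per_tb:.2f} USD per TB, {usd_per_gb:.2f} USD per GB")
```

Taken at face value, the 2010 estimate works out to roughly 100 dollars per terabyte, which gives a sense of the price level against which any DNA-based medium would have to compete.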

While we are capable of speed-reading DNA, writing to DNA involves at least two major problems: first, to date only short fragments of DNA can be created, and second, reading mistakes occur when identical letters are repeated. At least the latter issue appears to have been resolved by writing overlapping rows of letters to conventional short DNA fragments in order to rule out reading mistakes.
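The two ideas in the paragraph above, avoiding repeated letters and spreading the text over overlapping short fragments, can be sketched in a few lines of Python. This is a toy illustration under my own assumptions, not the researchers’ actual encoding: the six-digit base-3 representation of each byte, the virtual starting letter "A", the fragment length and step, and all function names are hypothetical choices made for the example.

```python
BASES = "ACGT"

def bytes_to_trits(data):
    """Write each byte as six base-3 digits (3**6 = 729 covers 0..255)."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(6):
            byte, remainder = divmod(byte, 3)
            digits.append(remainder)
        trits.extend(reversed(digits))
    return trits

def trits_to_dna(trits, start="A"):
    """Each trit selects one of the three bases that differ from the
    previous base, so no letter is ever repeated back to back."""
    previous, out = start, []
    for trit in trits:
        choices = [b for b in BASES if b != previous]
        previous = choices[trit]
        out.append(previous)
    return "".join(out)

def dna_to_trits(dna, start="A"):
    """Invert trits_to_dna by replaying the same choice lists."""
    previous, trits = start, []
    for base in dna:
        choices = [b for b in BASES if b != previous]
        trits.append(choices.index(base))
        previous = base
    return trits

def trits_to_bytes(trits):
    """Invert bytes_to_trits: six base-3 digits back into one byte."""
    out = bytearray()
    for i in range(0, len(trits), 6):
        value = 0
        for trit in trits[i:i + 6]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)

def overlapping_fragments(dna, length=20, step=5):
    """Cut the strand into short fragments that overlap their neighbours.
    A final fragment is added if needed so the tail is always covered."""
    last_start = max(len(dna) - length, 0)
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)
    return [(s, dna[s:s + length]) for s in starts]

message = b"Shall I compare thee"
strand = trits_to_dna(bytes_to_trits(message))
fragments = overlapping_fragments(strand)
print(strand)
print(len(fragments), "fragments of up to 20 letters")
print(trits_to_bytes(dna_to_trits(strand)))
```

Because every stretch of the strand away from its two ends appears in several fragments, a damaged or misread fragment can in principle be cross-checked against its neighbours; the sketch only produces the fragments and does not implement that error correction.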

This method was first developed by the European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI). The researchers tested it by sending a sample including Shakespeare’s 154 sonnets and a speech by Martin Luther King to Agilent Technologies, a California vendor of artificial DNA. A few hundred thousand DNA particles (‘some dust’) were returned in a vial and could indeed be read flawlessly by EMBL-EBI researchers. By their estimate, 100 million hours of HD video stored on DNA would fit in a cup of tea, provided cost can be brought down for commercial use over an estimated period of ten years.

Still, as always on the road to commercialization, the devil resides in the details: nuances of cost, access modalities and readiness for market decide the viability and acceptance of entire conceptual lines of research, and indeed a technology’s odds of prevailing. So it is much too early to comment meaningfully on the potential, however exorbitant, of a proof of concept such as DNA storage before answers to these practical criteria are available and fully assessable. Estimates suggest that this may require a time span in the vicinity of a decade. Considering the exponential acceleration of computational power it would provide, and how much it would expedite knowledge generation in other fields, even that would, if anything, probably still be too early to make seamless use of its potential for practical applications.
