Justifications with national security interests notwithstanding, a heretofore unimagined extent of data storage is here to stay. The reasons for this are perhaps best characterized as archival and entirely independent of eventual use. Moral, ethical, and legal reflections on the subject may be important, but they will not stop disruptive new technologies for data collection, data storage, or data processing any more than religious beliefs have ever stalled science for significant periods.
Revolutions in storage capacity are coming in more ways than one. Besides data storage, and likely with considerable practical cross-fertilization and synergy, geometric progress is being made in energy storage and battery technology, which is also likely to eliminate, within a reasonably short horizon, the last comparative practical and cost advantages of the combustion engine based on fossil fuels.
In the December 2014 issue of this blog I discussed computational simulation experiments in silico and the example of test cases in “executable biology” under the title “When biotech research goes digital.” But there is also the reverse phenomenon: as digital technology reaches the limits of inorganic storage media, some promising lines of research have “gone biotech” and unearthed surprising potential in the use of DNA. If expectations hold, DNA would also represent a major quantum leap in ensuring the continuation of Moore’s Law in the evolution of storage media. While the initial media hoopla about this concept dates back to 2013 and has not been followed by significant updates since, some more modest but more immediately practical solutions were advertised with proof of concept even somewhat earlier, and are likewise still pending. I write about it today with some benefit of hindsight and reflection, but also in anticipation of practical solutions essential to commercialization that remain rather far ahead. In truly visionary ideas it is the concept, not its realization, that represents the fundamental breakthrough. Warp drives, as we remember, emerged in science fiction but inspired real physics as well as R&D.
DNA presents an extremely durable form of information storage. It can be extracted after tens of thousands of years from the bones of mammals found in permafrost, not to mention from human mummies. It requires no electricity and only the barest minimum of physical storage space.
Gene sequencing has propelled the life sciences into the club of the world’s largest data generators. The total global digital data volume currently existing on this planet is estimated by some at 3 zettabytes (ZB); 1 ZB = 10²¹ bytes, or 1 trillion gigabytes. Of course, different authors’ quantitative assumptions and growth estimates vary widely, but these numerical details are immaterial: they still indicate the ballpark, and it is beyond debate that the issue belongs to the domain of (very) Big Data. Preserving such quantities of data faces challenges in two central categories: cost and
durability. Any form of hard drive requires electricity, while media that do not, such as magnetic tapes or magneto-optical drives and disks, are not durable beyond a few years. The Swiss Federal Institute of Technology in Zurich (ETH) is exploring the preservation of data encoded on glass-encased DNA stored at sub-zero temperatures, for literally millions of years. Zettabytes of DNA-encoded data would fit in a single spoonful of liquid. Of course, by now the frontier has become the yottabyte (1 YB = 10²⁴ bytes, or 1 trillion terabytes). For comparison, and to understand what these research projects aim to accomplish: cost estimates in 2010 for a “one yottabyte hard drive” based on commercially available storage technologies were in the range of one hundred trillion dollars.
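That hundred-trillion-dollar figure can be sanity-checked with a back-of-the-envelope calculation. The per-gigabyte price used below is my assumption (roughly in line with 2010 consumer hard-drive prices), not a number from the original estimate:

```python
# Back-of-the-envelope check on the "one yottabyte hard drive" figure.
PRICE_PER_GB = 0.10          # assumed ~2010 hard-drive price in USD; not from the source
GB_PER_YB = 10**24 / 10**9   # 10**15 gigabytes in a yottabyte

cost = PRICE_PER_GB * GB_PER_YB
print(f"${cost:,.0f}")       # on the order of one hundred trillion dollars
```

Under that assumption the arithmetic lands exactly on 10¹⁴ dollars, i.e. one hundred trillion, which matches the ballpark of the 2010 estimate.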
While we are capable of speed-reading DNA, writing to DNA involves at least two major problems: first, to date it is only possible to synthesize short fragments of DNA, and second, reading mistakes occur when identical letters are repeated. At least the latter issue appears to have been resolved by writing overlapping rows of letters to conventional short DNA fragments in order to rule out reading mistakes.
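The two tricks just described, avoiding repeated letters and splitting the message into overlapping short fragments, can be sketched roughly as follows. This is a simplified illustration of the idea only, not the actual EMBL-EBI encoding, whose Huffman code, fragment indexing, and error-checking details are omitted; the fragment length and overlap step below are arbitrary choices for illustration:

```python
def encode_no_repeats(data: bytes) -> str:
    """Map bytes to a DNA string with no two identical letters in a row.

    Each byte is expanded into base-3 digits (trits); each trit then
    selects one of the three bases that differ from the previous base,
    so runs of identical letters cannot occur by construction.
    """
    trits = []
    for byte in data:
        for _ in range(6):            # 3**6 = 729 >= 256, so 6 trits per byte
            trits.append(byte % 3)
            byte //= 3
    dna, prev = [], "A"
    for t in trits:
        choices = [b for b in "ACGT" if b != prev]  # 3 bases unlike the last one
        prev = choices[t]
        dna.append(prev)
    return "".join(dna)

def overlapping_fragments(dna: str, length: int = 100, step: int = 25) -> list[str]:
    """Split a long DNA string into short overlapping fragments.

    With step < length, each position is covered by several fragments,
    so a reading mistake in one fragment can be outvoted by the others.
    """
    return [dna[i:i + length] for i in range(0, max(1, len(dna) - length + 1), step)]

encoded = encode_no_repeats(b"To be or not to be")
assert all(a != b for a, b in zip(encoded, encoded[1:]))  # no repeated letters
fragments = overlapping_fragments(encoded)
```

Because each trit deterministically picks one of the three bases different from its predecessor, the mapping is reversible, and the fourfold-style overlap means no single synthesis or sequencing error can corrupt a position unrecoverably.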
This method was first developed at the European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI). The researchers tested it by sending a sample including Shakespeare’s 154 sonnets and a speech by Martin Luther King to Agilent Technologies, a California vendor of artificial DNA. A few hundred thousand DNA particles (‘some dust’) were returned in a vial and could indeed be read flawlessly by EMBL-EBI researchers. By their estimate, 100 million hours of HD video stored on DNA would fit in a cup of tea, provided cost can be brought down for commercial use over an estimated period of ten years.
Still, as always on the road to commercialization, the devil resides in the details and nuances: cost factors, access modalities, and readiness for market decide the viability and acceptance of entire conceptual lines of research, and indeed the odds that a technology will prevail. So it is much too early to comment meaningfully on the potential, however exorbitant, that a proof of concept such as DNA storage may present before answers to these practical criteria are available and fully assessable. Estimates suggest that this may require a time span in the vicinity of a decade. Considering the exponential acceleration of computational power it would provide, and how much it would expedite knowledge generation in other fields, even that would, if anything, probably still be too early to make seamless use of its potential for practical applications.