Near-universal
consensus has it that, sometime around 9/11, the world passed from the Age of Aquarius,
through some vernal equinox noticed by few, straight into the Age of Big Data. That passage
brought about a seismic epistemological shift. To be sure, any links to the
events surrounding 9/11 are coincidental: the real reason for this transition was
the coming of age of enabling technology. Whatever one may want to think of 9/11 conspiracy theorists conjecturing that the tragic events were brought on, or at least aided and abetted, by someone or something other than al Qaeda, the acts and omissions after 9/11 point to the tragedy's utility for the advancement of surveillance, for which political and civic tolerance could otherwise not have been expected.
Very much the same goes for the speed with which authorizing legislation was whipped through the formalities of democratic rule-making processes, purportedly under the influence of those events. But a pounce on an opportunity of this magnitude no doubt had to have been incubated for quite some time, in
lockstep with deep insights into the progress of technology and entirely
independent of whatever statistically unpredictable Black Swan event would one
day trigger its sudden political viability. It did not matter which event or who
or what would cause it. That, in all likelihood, was indeed not known, and it did
not need to be known. It was, in Donald Rumsfeld’s
immortal dictum, one of the “known unknowns.”
The extent of surveillance capabilities that became available as a result to the U.S. government and to the other “Five Eyes” – Canada, the UK, Australia and New Zealand, which do not spy on each other (at least in theory and at least for now) and otherwise cooperate to secure the endurance of occidental civilization – would have been every totalitarian regime’s wet dream. Perhaps one day, cloning technology may enable resurrection from Feliks
Dzierżyński’s or Lawrentii
Beria’s DNA, or Heinrich Himmler’s,
Erich
Mielke’s or Klemens
Metternich’s, not to mention Joseph
Fouché’s or Philipp
II’s or Kang Sheng’s
or Pol Pot’s – and I predict complete unanimity among all these distinguished oppressors of the unrestrained human mind: no government can ever be secure in its power without surveillance. So, does it really matter whether the
chicken or the egg existed first, whether surveillance technology eviscerates pre-existing
democratic structures and aspirations (those uncontrollable by the powers that be)
or whether it is created by a totalitarian ambition already thus entrenched? The
bottom line remains crisp and clear: information is power.
Another
heretical lesson from history is that power
corrupts, and absolute power corrupts absolutely. This trite aphorism is not a criticism of the fact that an open democratic society develops technology potentially capable of abuse. Every technology is. But serious concern arises out of the near-total absence of intelligent systems design aimed at establishing
transparent and accountable checks and balances and procedures to safeguard
against the patently obvious risk of creeping abolition of civil rights and
liberties for the purported preservation of “security” – a term that is defined neither qualitatively nor quantitatively, let alone statistically, and that therefore lacks any significant measure of transparency and accountability.
After
all, creeping abolition of previously existing expectations has considerable
tradition in the way we treat information. In many ways, technology’s
relationship with power and its preservation has been ambivalent and
multifaceted throughout history. Before the creation of commercial postal
services in Northern Italy circa 1200
by the enterprising bergamaschi
family of Tasso – later the princely house of Thurn and Taxis – organized
conveyance of information was a purely private matter, often enough done ad hoc and characteristically affordable
only to sovereigns and military commanders who could maintain relay stations
for couriers and for their horses. But literacy, and thus education and presumably the leveraged use of information, were a matter of substantial privilege in the
medieval world in the first place. The printing press vastly increased access
to knowledge but with it came, perhaps inevitably, censorship of its use. As civilization
progressed, the cost of transfer of information came down somewhat, but not
dramatically while it was still largely based on, and limited by, the economics
of the horse. Not until trade and thereby competition intensified dramatically,
leading to the exchange of multiple scheduled couriers simultaneously on a
given route, did cost decline. Another limiting factor was the physical
condition of the medium – the weight of books and paper and the means and
durability of storage. Disembodiment of information came only in the 19th century: first by rendering the horse obsolete – initially through pneumatic delivery in letter chutes, then through sea-, land- and air-based applications of the steam and combustion engine – and soon thereafter by reducing information itself to electric signals and electromagnetic waves, analog at first and later digital.
Breaches of confidentiality of information in transit by interception of
couriers and involuntary extraction of information had been a well-known risk
since antiquity. Cicero
already complained about the trials and tribulations of finding trustworthy
carriers for his letters. The same risk gave rise to encryption, reported, inter alia, by Plutarch.
Encoding of information, too, was at first limited in its use by the same two
closely related factors: quality and cost. Encryption simply told the
unintended reader to mind his own business. At the dawn of encryption, surveillance faced a multitude of challenges, almost none of them merely nominal: first, and pretty much until the end of WWI, the fact of a transmission itself became known to interceptors only under highly serendipitous circumstances – through sheer luck, treason, or incompetence. Only
once data transmission occurred through electromagnetic signals carried outside
dedicated cables did encryption become the headache it is known to be today. Wars
were lost due to timely code breaking: the Red Army’s operational designs in post-WWI
Poland[1]
and Hitler’s communications passing through the Enigma system did not remain as
secret as intended.[2]
Since the days of the Enigma machine, secrecy has correlated near-perfectly with the maintenance of crypto-technological superiority, and only for as long as that superiority lasts. To the extent that as yet undecipherable communications are being recorded and stored, they remain
available for future use following additional advances in cryptology as well as
quantum leaps of computational resources for ‘brute force’ attempts. Despite
the likelihood of obsolescence for its primary purpose, any recorded signal may
still serve as legal or at least historical evidence. As for encryption itself,
brute force attacks
are put in perspective by Edward Snowden’s remark: “Assume
that your adversary is capable of a trillion guesses per second.”
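To make that figure tangible, here is a back-of-the-envelope sketch in Python; the key and password sizes are illustrative assumptions, not numbers taken from the quoted sources:

```python
# Back-of-the-envelope: how long an exhaustive key search takes at
# Snowden's assumed rate of one trillion (1e12) guesses per second.
GUESSES_PER_SECOND = 1e12
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def exhaustive_search_years(keyspace_size: int) -> float:
    """Worst-case time, in years, to try every key in a keyspace of the given size."""
    return keyspace_size / GUESSES_PER_SECOND / SECONDS_PER_YEAR

# Illustrative keyspaces (assumptions for this sketch)
examples = {
    "8-character lowercase password (26^8)": 26 ** 8,
    "56-bit DES key (2^56)": 2 ** 56,
    "128-bit AES key (2^128)": 2 ** 128,
}

for label, size in examples.items():
    print(f"{label}: about {exhaustive_search_years(size):.3g} years")
```

At that rate, human-chosen passwords fall in a fraction of a second and a 1970s-era 56-bit key within a day, while a well-implemented 128-bit key remains out of reach by brute force alone – which is precisely why recorded but undeciphered traffic is stored against future cryptanalytic and computational advances.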
In public and scholarly discourse, Big Data is for the most part viewed from the perspective of privacy. While an important concern, privacy must not be the litmus test for pushing technology further along. By far the greatest value of Big Data applications lies not in surveillance
of individualized
data but in the superstatistical
analysis of large anonymized data pools. Notions like the ‘Internet of
things’ and ‘networked factories’ cannot be realized without Big Data. The
concept also has significant implications for life sciences and for public
health.
While life expectancy rises around the world, people live more years with more diseases. The output of new drugs has dropped measurably, in part because the cost of regulation in a wider sense has multiplied: if commercializing a viable drug used to require a budget of $1 billion not too long ago, the benchmark figure has now reached about $5 billion, necessitating change. For this and other reasons, the growth rate of health care costs has surpassed the growth of GDP in virtually all developed markets. It appears that not only the understanding and increasingly individualized treatment of systemic disorders such as cell malignancies require vastly increased computational tools and resources; the macroeconomic analysis of public health phenomena, too, will need to rely on complex analysis of very large data pools if it is to tailor meaningful approaches to prevention and avoid misallocation of increasingly limited resources.
For
example, different malignancies have very different causation and are thus not
susceptible to similar approaches: while brain tumors
appear to have purely genetic causes, skin cancer
or Hodgkin’s
lymphoma are predominantly caused by environmental factors. Still, 85–95% of the incidence of disease within a decade can now be predicted with considerable exactitude by statistical means. This permits successful prevention, or at least a delay of onset, once certain known precursor constellations manifest themselves. With this in mind, individual privacy issues are losing ground
– if all I need to know about a person is essentially public information such
as age, gender, zip code and (maybe) profession to arrive at often shockingly
accurate guesses of ‘privileged data,’ then the value of protecting specific
details by means of privilege
is certainly diminished. The alternative of perfect privacy, thought through to
its logical conclusion, would mean No Data as it would engender a presumption
of non-disclosure or non-use of information that is in plain view. Due process
issues will remain subjects of intense debate,
even with regard to ‘mere’ metadata
and the form
of their production.
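Why such guesses can be shockingly accurate is easy to demonstrate. The following is a minimal sketch with purely synthetic data (population size, age range and number of zip codes are assumptions for illustration, not figures from any study): it counts how often age, gender and zip code alone single out exactly one person.

```python
# Minimal sketch (synthetic data): how often do the quasi-identifiers
# age, gender and zip code make a record unique, so that any "privileged"
# attribute attached to it is effectively exposed?
import random
from collections import Counter

random.seed(0)
POPULATION = 100_000
records = [
    (random.randint(0, 99),             # age
     random.choice("MF"),               # gender
     random.randint(10_000, 10_499))    # one of 500 zip codes
    for _ in range(POPULATION)
]

group_sizes = Counter(records)  # how many people share each (age, gender, zip) combination
unique = sum(1 for size in group_sizes.values() if size == 1)
print(f"{unique / POPULATION:.1%} of records are unique on (age, gender, zip code)")
```

In this toy population of 100,000 spread over 100,000 possible combinations, more than a third of all records turn out to be unique – and once a record is unique on such quasi-identifiers, whatever sensitive detail is attached to it no longer enjoys meaningful protection.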
Very similar conclusions are imaginable in the social sciences. Large data pools permit reasonably accurate forecasts of societal developments arising out of imbalances and trends, but also significantly improved structural planning of health care, energy and transportation infrastructures based on substantially reliable quantitative models. In this context, it matters little that a significant part of the data is inaccurate, as errors and other data flaws tend to cancel each other out due to their (overall) random nature. Yes, there is a percentage of unreliable information, but it merely accounts for background
static. That
does not mean that attempts to improve data quality can or should be forgone,
but in a lot of situations, partially flawed results still enable valuable
conclusions. As Big Data advances
as a field, our ability to filter, smooth and otherwise enhance input data
or improve the quality of their analysis, sometimes by complexity management
tools such as the laws
of entropy, will continue to correlate strongly with the power of computational
resources available, and with their growth rate.
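A minimal illustration of this ‘background static’ argument, with entirely synthetic numbers (the noise level and the five-percent share of grossly flawed records are assumptions of the sketch, not empirical figures):

```python
# Estimate a population mean from measurements corrupted by random noise and,
# for a small share of records, by gross errors. Because the flaws are random
# in both directions, the aggregate estimate still converges as the pool grows.
import random
import statistics

random.seed(1)
TRUE_MEAN = 50.0

def noisy_measurement() -> float:
    value = random.gauss(TRUE_MEAN, 10.0)   # ordinary measurement noise
    if random.random() < 0.05:              # 5% of records are badly flawed
        value += random.uniform(-30.0, 30.0)
    return value

for n in (100, 10_000, 1_000_000):
    sample = [noisy_measurement() for _ in range(n)]
    print(f"n={n:>9,}: estimated mean = {statistics.fmean(sample):.2f} (true = {TRUE_MEAN})")
```

The caveat, of course, is that only randomly distributed errors wash out in the aggregate; a systematic bias in the data would survive any amount of averaging, which is one reason attempts to improve data quality should not be forgone.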
Under
the umbrella of Science,
Technology and Society (STS), a field of research loosely termed Critical Data Studies has emerged. It deals with questions of data
assemblages. Data assemblages assume a view of data as already constituted
or composed within data structures that interact with society, its forms and
methods of organization, and with the daily lives of individuals.[3]
Critical Data Studies are based on the demonstrably accurate assumption that data
are almost never simply objective, neutral, or transparent entities of
information. CDS raise questions that reinterpret well-established concepts in light of data-specific analysis:
- Causality: how should we find causes in the era of ‘data-driven science?’ Do we need a new conception of causality to fit with new practices?
- Quality: how should we ensure that data are of good enough quality for the purposes for which we use them? What should we make of the open access movement; what kind of new technologies might be needed?
- Security: how can we adequately secure data, while making it accessible to those who need it? How do we protect databases?
- Uncertainty: can Big Data help with uncertainty, or does it generate new uncertainties? What technologies are essential to reduce uncertainty elements in data-driven sciences?
More about complexity management tools (which do not necessarily, or even predominantly, lead to ‘complexity reduction’ in a Big Data context) and paraconsistent logic will be forthcoming in a future iteration of this blog. Big Data brings into sharper focus both the relative value of detail and, depending on the purpose of the application, its relative insignificance (and that of any shielding of it). It is time we started talking about data and data structures
on many levels and from multiple perspectives.
[1] Richard
Woytak, Colonel Kowalewski and the
Origins of Polish Code Breaking and Communication Interception, 21 East
European Quarterly no. 4, Jan. 1988, at 497–500.
[2] Robert J. Hanyok, Appendix
B: Before Enigma: Jan Kowalewski and the Early Days of the Polish Cipher Bureau
(1919–22), in Enigma: How the Poles Broke the Nazi Code
(2004).
[3] See Andrew Iliadis, Comparative
Review: Big Data, 46 Communication
Booknotes Quarterly no. 2, May 21, 2015, at 54–57.