Star Trek: The Next Generation fans will wonder whether
the two words Big Data are descriptors for a new sentient android of epic
proportion, a supersized upgrade of Lieutenant Commander Data. As an addicted trekkie, sadly I must quickly
disabuse you. Well, maybe I’m
wrong. There are people who consider that
Big Data is every bit (or byte?) as exciting as the USS Enterprise’s second officer.
Big Data refers to immense data sets that
are collected in fields as diverse as astronomy and genomics. As Wikipedia tells it, “as of 2012, every day
2.5 quintillion (2.5×10^18) bytes of data were created”, so there is
a lot of data about. The driving idea of Big
Data is to search for relationships among these data, teasing out
correlations that may not be obvious from the constituent data sets
alone. Our technical capacity to search immense data repositories means
that correlations can be found in a way never before possible.
In their new book Big Data: A Revolution That Will Transform How We Live,
Work, and Think, Viktor Mayer-Schönberger, an Internet governance academic
from Oxford, and Kenneth Cukier, the data editor of The Economist, recount an interesting example of how Big Data,
collected by Google from the three billion search requests it receives each day,
was used to track influenza in the US.
Google took the 50 million ‘most common
search terms used by Americans and compared the list with Centers for Disease
Control (CDC) data on the spread of seasonal flu between 2003 and 2008’. After
stupendous computer activity, they settled on 45 search terms that were
strongly correlated with official figures.
These included many obvious terms, such as flu, cough and medications for
cough, but also others that were not so obviously linked. ‘Unlike CDC, they could tell it in near real
time, not a week or two after the fact.’
Although not without its critics and errors, Google Flu Trends is now
available for many countries: http://google.about.com/od/experimentalgoogletools/qt/GoogleFluTrends.htm
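For readers curious about what "strongly correlated" means in practice, here is a minimal sketch in Python. It is not Google's actual method, which the book describes only in outline. It simply scores each search term by how closely its weekly counts track official case counts and keeps the best matches. The function names (pearson, rank_terms) and all the figures are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no signal; treat as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def rank_terms(term_counts, official_counts, top_k):
    """Return the top_k (term, r) pairs, most strongly correlated first."""
    scored = [
        (term, pearson(series, official_counts))
        for term, series in term_counts.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up weekly figures, purely illustrative (not real CDC or Google data).
official_flu_cases = [10, 20, 40, 80, 40, 20]
weekly_searches = {
    "flu": [12, 22, 41, 79, 43, 18],
    "cough medicine": [30, 35, 50, 70, 52, 33],
    "football scores": [50, 48, 51, 49, 50, 52],
}

top_terms = rank_terms(weekly_searches, official_flu_cases, top_k=2)
for term, r in top_terms:
    print(f"{term}: r = {r:.2f}")
```

With these invented numbers the two flu-related terms rank above the unrelated one. The same caution applies here as in the rest of this piece: a high score shows that two series move together, not why they do.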
Mayer-Schönberger and Cukier accept that there
is no universally accepted definition of Big Data, but rather see the term as referring
to ‘things one can do at a large scale that cannot be done at a smaller one, to
extract new insights or create new forms of value, in ways that change markets,
organisations, the relationship between citizens and government, and more.’
Our capacity to collect, link and analyse
data electronically is growing exponentially.
Mayer-Schönberger and Cukier draw a parallel between the present and the
era that followed the invention of the Gutenberg printing press around
1439. They quote an estimate that, in the half century starting in 1453,
eight million books were printed, ‘more than all
the scribes of Europe had produced since the founding of Constantinople 1,200
years earlier.’ In 2003, following a decade of effort, the human genome was
sequenced. ‘Now… a single facility can
sequence that much DNA in a day.’ And
because Big Data includes all the data available, population samples will no
longer be needed in the way they are today and the work of statisticians will
be redefined.
There are many features of Big Data to ponder
for medicine. How will we practise with
more information about correlation and less about causation? If Big Data shows
that people who take regular exercise have better cancer survival, what will we
advise our patients? Is the correlation
sufficient to advise them to exercise, even though the causal pathway is not
known? This will increase our need, and
that of our patients, to live with uncertainty. What meaning do privacy and even
confidentiality have in this new age? We
should surely be thinking about and discussing these things now.
(Potential conflict of interest: SL’s son
Nick leads Google France.)