Jim Gray, the database software researcher who disappeared at sea in 2007, predicted the paradigm shift in science that would arise from the massive amounts of data that can now be collected, and must be analyzed, in every aspect of the material, social, and cultural world. Gray's colleagues at Microsoft celebrate his work in distributed computing systems in a new collection dedicated to his memory, The Fourth Paradigm: Data-Intensive Scientific Discovery, edited by Tony Hey, Stewart Tansley and Kristin Tolle.
New scientific tools that are part sensor and part computer take in unfathomable amounts of information. These include the Australian Square Kilometre Array of radio telescopes, CERN's Large Hadron Collider, and the Pan-STARRS telescopes. Together they generate several petabytes of information every day. So how big is a petabyte of information? According to John Markoff in a NY Times article today, "A Deluge of Data Shapes a New Era in Computing," a petabyte of data is the rough equivalent of 799 million copies of Moby Dick. Put that on your reading list, fellow scholars!
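To get a feel for Markoff's comparison, it helps to run the arithmetic backwards. The short Python sketch below assumes a decimal petabyte (10^15 bytes; a binary "pebibyte" would be about 12.6% larger) and asks how big each "copy" of Moby Dick has to be for 799 million of them to fill one petabyte. The constant and function names are my own, for illustration only.

```python
# Assumption: "petabyte" means the decimal SI unit, 10**15 bytes.
PETABYTE_BYTES = 10**15

# Figure quoted in Markoff's NY Times article.
COPIES_PER_PETABYTE = 799_000_000


def implied_bytes_per_copy(total_bytes: int, copies: int) -> float:
    """Return the size of one copy if total_bytes is split evenly across copies."""
    return total_bytes / copies


per_copy = implied_bytes_per_copy(PETABYTE_BYTES, COPIES_PER_PETABYTE)
print(f"{per_copy:,.0f} bytes per copy (about {per_copy / 1e6:.2f} MB)")
# prints "1,251,564 bytes per copy (about 1.25 MB)"
```

About 1.25 MB per copy is roughly what a long novel takes up as plain text, so the comparison hangs together: one petabyte really is hundreds of millions of whole books.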
Of course, sorting through the daily petabytes requires new tools. And they aren't all expensive. Cheap clusters of computers can manage and process data at all sorts of speeds. Until recently, for example, you could run Linux on your PlayStation and, for less than $500, you could basically build a cluster farm that could process data like a supercomputer. Other tools help you search, sort, and analyze.
And data isn't just for scientists anymore. Indeed, the plethora of data changes knowledge on every level, and it changes everyday life. As I and many others in HASTAC have been saying since the beginning of our run in this world, if humanists keep dismissing "data" and "evidence" as mere "positivism," we miss one of the great opportunities of our era. In the end, there really is no such thing as "data crunching." Data isn't just "crunched" (what does that even mean?); it has to be interpreted, understood, put into context, analyzed alongside other data, and in many other ways put through all the paces that humanists are expert at. The divide between "theory" and "practice," or between "the theoretical" and "the empirical," has long since been shown to be bankrupt.
How much data comes from 100 million Facebook users updating their status every day? Who are those users? How do they see this Web of a world that they are co-creating but that is also constantly available for exploitation? How do identitarian matters of race or gender or sexual orientation play out in virtual spaces such as Facebook? How does performativity work online? What about the concept of a "self" when it is clear that we can enter and leave the Web in many guises, constantly overlapping and yet distinct?
And what is the relationship of data to communication? Before I exit this blog, I will use one of the widgets to share it with my Twitter network and my Facebook friends. What does that widgety activity mean? How does it change the cycles of authorship, production, consumption, publishing, and distribution? What does it do to privacy? How does it shape the public sphere? Information isn't static. It is Webby, as Ruby reminds me: each thing I find is linked in a web of information to each thing you find, and together we can mash them up and mix them. Data is social.
So much data, so little time. The Humanities in a Digital Age are vital, urgent. Better get busy. 799 million Moby Dicks await us.