Blog Post

Data Mining, Collaboration, and Institutional Infrastructure for Transforming Research


Data Mining, Collaboration, and Institutional Infrastructure for Transforming Research and Teaching in the Human Sciences and Beyond
Cathy N. Davidson, Duke University


The first generation of the digital humanities was all about data. The excitement and impetus of digital humanities, throughout much of the 1990s and continuing to the present, was that massive databases could be digitized, searched, and combined with other databases for interoperable searches that yielded more complex and complete results in a shorter amount of time than the human mind had ever imagined possible.[1] In this way, revolutions in digital humanities were similar to those in other fields. In biological science, sequencing the genome could never have happened without dramatic increases in computational power. In natural science, we know more than ever about global warming thanks to such projects as the Millennium Ecosystem Assessment (2005), which evaluates the global changes to 24 separate life support systems (including biodiversity, ecosystems, and the atmosphere).[2] In social sciences, human complex systems theory combines results based on social network theory, demography, migratory patterns, social regulation, and laws to analyze movements of persons and goods globally. And in the human sciences or humanities, myriad projects digitize the texts and artifacts of world culture, from the beginning to the present, in order to create new understandings of the history of ideas.

Second-generation digital humanities are the scholarly equivalent of what Tim O'Reilly has dubbed "Web 2.0." If Web 1.0 was the World Wide Web's collection of websites and databases (what human scientists would call "archives"), Web 2.0 is a fully developed platform that serves a variety of applications to its end users.[3] However, there is also an important difference between the business and humanistic histories of cyberinfrastructure. O'Reilly's term "Web 2.0" was coined to differentiate what was coming next from what had failed when the dot-com bubble burst. By creating a "before" and "after," the concept of Web 2.0 was designed to encourage a new generation of investors in internet technologies. But there is no equivalent "bad before" in digital humanities. Rather, the current generation of digital humanities extends and builds upon the foundation of Humanities 1.0.

The transformation of archives into interoperable and professionally-constructed digital databases has changed the research and pedagogical questions of our age, by providing the individual researcher almost instantaneous access to far more data than any one person could gather in a lifetime and by allowing more people access to these materials than ever before. Let me give an example of how transformative this has been for teaching and education in the human sciences. Back in the 1980s and 1990s, when I taught courses on mass education, reading, and writing during the highly contentious political period following the American Revolution, I used to have graduate students do archival research in early American newspapers and magazines, some of which were available on microfilm or microfiche, unindexed.[4] A student might have spent a hundred hours rolling the films in the dizzying light of those unwieldy machines (I had one in my office and used to call it, without affection, The Green Monster). If the student found one good example, it was a successful project. Two examples constituted a triumph. In many cases, the search was so frustrating that the student might well have applied for a scholarship to travel to an archive in New England, such as the American Antiquarian Society, where the resources were far richer.

If I teach that course now, my students can go to searchable databases of early American imprints, of eighteenth-century European imprints, of South American and (growing) African archives, and of archives in Asia as well. A contemporary student could, in far less time, not only use digitized and indexed archives to search U.S. databases but also make comparisons across and among popular political movements worldwide, and possibly make arguments about the spread of dissent along with commodities such as tea, sugar, or rice. The barbarism and ubiquity of the slave trade as part of the spread of global systems of capital also made for an exchange of ideas about personhood, statehood, individual rights, and human rights.

Thus, in terms of next-generation cyberinfrastructure, we need to start at the most foundational level and envision and implement a globalized semantic web. The linguistic choices embedded in semantics-based searches must incorporate a humanistic and culturally motivated understanding that terms themselves embody cultural ideologies and that concepts formulated in slightly different ways in different languages encode different epistemologies, ontologies, taxonomies, and histories. Moving from indexical to semantic searches has to be undertaken with a cultural awareness of what is or is not included in "semantics."

That brings me to another point, which may appear tangential but which is at the heart of the matter. New ways of thinking need support. If, at present, academic rewards go to the author of a monograph, especially one that posits a different analytical or interpretive hypothesis, for Human Sciences 2.0 we need to think of ways to reward teams of scholars working cross-culturally on collaborative projects. Collaborative work should count, and here humanists can use models that scientists have developed for determining credit in co-authored projects with multiple investigators.

Bibliographic work, translation, and indexical scholarship should also have a place in the reward system of the humanities, as they did in the nineteenth century. The split between "interpretation" or "theoretical" or "analytical" work on the one hand and, on the other, "archival work" or "editing" falls apart when we consider the theoretical, interpretive choices that go into decisions about what will be digitized and how. Do we go with taxonomy (formal categorizing systems as evolved by trained archivists)? Or folksonomy (categories arrived at by users, many of which offer less precise organization than professional indexes but often more interesting ones that point out ambiguities and variabilities of usage and application)?

We also need to rethink paper as the gold standard of the humanities. If scholarship is better presented in an interactive 3-D database, why does the scholar need to translate that work to a printed page in order for it to "count" towards tenure and promotion? It makes no sense at all if our academic infrastructures are so rigid that they require a "dumbing down" of our research in order for it to be visible enough for tenure and promotion committees.

As colleagues in the sciences and engineering will acknowledge, these are not simply humanistic issues by any means. Which brings me to a final point. Once we have changed what we value as scholarship, we need to think through the departmental and disciplinary systems within our universities. Unless we find ways to "link" the different kinds of knowledge and analysis offered by different disciplines, we will be generating data but not really understanding the implications and import of that data. This is exactly why HASTAC ("haystack") was created. A voluntary network of cross-disciplinary scholars realized that we had to form a "virtual university" across disciplines where scholars could think together, without institutional boundaries, about what cyberinfrastructure is needed. We needed to conceive better collaborative models of participation, implementation, and interpretation.[11]

We are in an oddly contradictory age where revelations in the computational, natural, and biological sciences evoke the deepest issues about what it means to be human. And yet the present-day academy seems determined to undervalue exactly those disciplines (the humanities, arts, and interpretive social sciences) that offer the most sustained and rigorous methods and insights into the category of the "human." In different areas across the human sciences, we have addressed the deeply contested definitions and applications of the "human" in ways that can challenge (and thus make better) and also support new scientific work. More and more of our nationally funded grants are requiring a social and ethical component in studies, precisely because so much work in science is moving into areas with implications that are profound (in hopeful or disturbing ways) for the future of humanity. Yet, within our universities, humanists are often not at the table when major scientific projects with humanistic implications are proposed. And when they are, the work they do in tandem with scientists often does not count towards tenure and promotion within their humanistic departments. This, too, is an academic infrastructure issue that can only impede the development of cyberinfrastructure. We must attend to these social, institutional, and infrastructural arrangements and make them as flexible, as interoperable, as other aspects of cyberinfrastructure.

1 "Our Cultural Commonwealth: The Final Report of the American Council of Learned Societies Commission on Cyberinfrastructure for the Humanities and Social Sciences," December 13, 2006.
2 Millennium Ecosystem Assessment.
3 O'Reilly, T. "What Is Web 2.0: Design Patterns and Business Models for the Next Generation of Software," September 30, 2005.
4 Davidson, C. N. Revolution and the Word: The Rise of the Novel in America (1986; New York and Oxford: Oxford University Press; Expanded Edition, 2004); and Davidson, ed., Reading in America: Literature and Social History (Baltimore: Johns Hopkins University Press, 1989; Second Edition, 1992).
5 International Dunhuang Project.
6 The Law in Slavery and Freedom.
7 Sloan Digital Sky Survey.
8 USC Shoah Foundation Institute for Visual History and Education.
9 The Museum of Television and Radio (now part of The Paley Center).
10 Lenoir, T. "Emerging from the Digital Dark Ages: Challenges and Opportunities for the History of Science and Technology in the Information Age," in Roland Ris, ed., Technikforschung: Zwischen Reflexion und Dokumentation, Bern: Swiss Academy of Humanities and Social Sciences, 2004, pp. 11-26; and "Making Studies in New Media Critical," in Oliver Grau, ed., MediaArtHistories, Cambridge, Mass.: MIT Press, 2007, pp. 355-380.
11 HASTAC (Humanities, Arts, Science, and Technology Advanced Collaboratory).
