There has been a lot of discussion recently about whether, how, and why the digital humanities needs to engage more with Theory. Digital Humanities Now has a stellar review of the debate to catch you up if you've missed it. In reaction to the refrain repeated by many digital humanities scholars, “more hack, less yack,” many in this discussion make the point that the tools we’re using are themselves driven by implicit theoretical assumptions. I agree with this point, and with our need to interrogate those assumptions; our recent Critical Code Studies forum makes it as well. But the picture is more complex still. The tools themselves may be put to multiple uses, each of which derives its focus from the scholar’s theoretical and critical assumptions. To me, theory is also a tool. How often have we seen scholarship that dogmatically applies the hip theory of the moment and thus runs roughshod over the particulars of a text? That the old saying “when all you have is a hammer, everything looks like a nail” applies to such naïve uses of theory exemplifies the parallel between tools and theory.
A common mode of investigation in textual analysis is the study of large corpora, often organized around a genre. While this practice seems explicitly quantitative, it too evinces theoretical assumptions. In my research, I examine how genres are hermeneutic categories fundamental to interpretation. As Cathy Davidson points out in Now You See It, how we categorize determines what we do (and do not) pay attention to. By extension, how we define a genre sets the boundaries for what we will find, whether through traditional humanistic inquiry or through digital tools that analyze large bodies of texts. Whenever I read accounts of digital humanities scholars doing this sort of quantitative work on genres, I become immediately cautious. Most often, the investigators present genre as a self-evident construct, needing no explanation or interrogation. But if we unpack just a few assumptions, we see how this basic move of quantitative analysis crucially influences findings, even in a seemingly objective project like automated textual analysis. The first question we must ask is, “How did they decide which texts to include and which to leave out?” To create a corpus is to make numerous selections based on pre-existing criteria, and these criteria are themselves driven by the scholar’s intuitive sense of what does or does not belong in the given genre. Even though Derrida convincingly demonstrated (to my mind, at least) that the model positing genre as a strict boundary is logically untenable, many scholars—particularly digital humanists—continue to reify genre in this way. The act of defining a corpus re-enacts the very law of genre that Derrida ridiculed. Nothing, in fact, could be more explicit: either a text is in the corpus or it isn’t. There’s little room for fuzzy boundaries.
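The binary nature of corpus membership can be made concrete in a few lines of code: a conventional corpus is a set, in or out, while a fuzzier model might assign each text a graded membership score. A minimal sketch (the titles, scores, and threshold values here are all invented for illustration):

```python
# A conventional corpus: membership is strictly binary.
corpus = {"Text A", "Text B", "Text C"}

def in_corpus(title):
    """Either a text is in the corpus or it isn't."""
    return title in corpus

# A fuzzier alternative: each text gets a graded membership score
# (0.0 = clearly outside the genre, 1.0 = a core exemplar).
graded_corpus = {
    "Text A": 1.0,   # core exemplar
    "Text B": 0.9,
    "Text C": 0.6,   # edge case
    "Text D": 0.3,   # closer to a related genre
}

def corpus_at_threshold(scores, threshold):
    """Recover a binary corpus by choosing a cut-off.

    Different thresholds yield different corpora -- and the choice
    of threshold is itself a theoretical commitment.
    """
    return {title for title, score in scores.items() if score >= threshold}

print(corpus_at_threshold(graded_corpus, 0.5))  # a strict reading of the genre
print(corpus_at_threshold(graded_corpus, 0.2))  # an inclusive reading
```

The point of the sketch is that even the graded model forces a decision eventually; the threshold just makes the decision visible instead of hiding it in the act of corpus construction.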
This example, then, is but one of the ways in which digital humanities work clearly needs to engage with theory. To fail to do so creates a hermeneutic circle in which the objects under study are selected because they match under-examined criteria and a model of literature (and human thought more generally) that, I argue, is overly simplistic, does not accurately reflect artistic development and change, and fails to account for basic cognitive functions at play in literary interpretation. What, then, might be an implication for textual analysis? I propose that we need to analyze multiple times, through multiple filters and lenses, and on multiple, shifting corpora. For example, scholarship on Old French fabliaux often tries to define the genre as a whole in order to decide which of the 140 or so extant texts “really” count and which don’t. Instead of expending this misdirected energy, why not accept all the plausible boundaries and then analyze the results of each permutation? That is, analyze the “pure” genre, then a corpus with all the edge cases included, then yet another with edge cases plus related genres like beast fables. But I don't want to get too far into the specifics of one particular application, as interesting as I find the topic. My point, instead, is that even seemingly simple choices like which texts to include in a corpus derive from theoretical assumptions that need questioning.
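The shifting-corpora approach above can be expressed quite directly in code. A minimal sketch in Python, using simple word frequencies as a stand-in for whatever analysis one actually runs; the tiny stand-in “texts” and the three corpus groupings are hypothetical placeholders, not real fabliaux:

```python
from collections import Counter

def word_frequencies(texts):
    """Count word frequencies across a list of raw text strings."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts

# Hypothetical stand-ins for the three corpus groupings discussed
# above: a "pure" core, the edge cases, and a related genre.
core = ["the miller and the clerk", "the priest and the lady"]
edge_cases = ["a doubtful tale of the knight"]
beast_fables = ["the fox and the crow", "the wolf and the lamb"]

# Rather than committing to one definition of the genre, run the
# same analysis over each plausible drawing of the boundary.
permutations = {
    "pure genre": core,
    "with edge cases": core + edge_cases,
    "with related genres": core + edge_cases + beast_fables,
}

for label, texts in permutations.items():
    freqs = word_frequencies(texts)
    print(label, freqs.most_common(3))
```

Comparing the results across permutations then becomes part of the analysis itself: findings stable under every boundary are robust, while findings that appear only under one boundary are artifacts of that particular theory of the genre.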
If using a tool implicates theory, then building tools—the implied work of the “more hack” motto—instantiates theories in even more fundamental ways. For example, in my project Bibliopedia, we made several design choices early on that reflect our attitudes toward data, the scholarly community, and what constitutes a valuable addition to the digital humanities. We want to aggregate scholarly citations from many different data silos of varying degrees of openness, such as JSTOR, Project MUSE, and others. Take Google Scholar as an example. There is currently no way to access its data programmatically, as Google provides no API; individual humans can browse the information only via a web browser. This design choice means that anyone wishing to re-use or re-mix Google’s data is out of luck. Much like Apple’s insistence on control and closed boxes in all its products, Google does not want to share the bibliographic information it has amassed for Google Scholar. Some traditional repositories like JSTOR and other databases humanities scholars use in everyday research are more open to sharing, but they still make their money by providing access to libraries that pay a subscription cost.
In contrast, we have designed Bibliopedia from the start not only to join as many disparate silos of data as we can and to be easily extended to incorporate new ones as they become available, but also to share this data in a transformed, standardized, and reusable way via linked data. For those who don’t know, linked data is most likely the next Big Thing for the Internet. It allows machines to understand the structure of and relationships among data, which in turn permits reasoning, interconnection, and sharing. If you’re curious, you can find out more in our HASTAC Semantic Web group. Our decision in Bibliopedia to join silos of data and then transform what we find into linked data so that this information can then become part of the growing semantic web derives from our theories about how data should be created, used, and shared. We’re assuming the scholarly community should run on the model of a conversation open to everyone, not one taking place behind the doors of a selective private club. In other words, we have a theory about information and scholarship to which very big players (Google, Apple, various publishing houses) do not subscribe. We’re creating a tool that, when fully realized, will change how scholarship is used, distributed, and—most importantly—understood. If theory exists to change the world and how we perceive it, then we begin to see just how little space differentiates theory and digital tools.
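At its simplest, linked data expresses information as subject–predicate–object triples whose parts are URIs, which is what lets machines interconnect statements from different sources. A minimal sketch of how one scholarly citation might be serialized this way, with invented example URIs (the predicate shown, `dcterms:references` from the Dublin Core vocabulary, is real, but this is an illustration, not Bibliopedia's actual data model):

```python
def triple(subject, predicate, obj):
    """Serialize one statement in N-Triples syntax."""
    return f"<{subject}> <{predicate}> <{obj}> ."

# Hypothetical article URIs for illustration only.
article = "http://example.org/articles/smith-2010"
cited = "http://example.org/articles/jones-2005"

# A real Dublin Core term meaning "references/cites".
predicate = "http://purl.org/dc/terms/references"

statement = triple(article, predicate, cited)
print(statement)
```

Because every element is a URI, another system can follow those links, merge this statement with its own, and reason over the combined graph: that is the sharing and interconnection the semantic web promises.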
Recall (if you can) the differences between 2011 and 1991. Reflect on how many new digital tools we have and just how different our lives are. Cell phones make us available all the time. The Internet, Wikipedia, and Google put massive amounts of information at our instant command. Duke’s iPod experiment showed one way digital technology can revolutionize the way students learn. Tools change the world; tools are implicitly theoretical. Their affordances shape our encounters with the object of inquiry and with one another. The hammer wants us to hit things. Word frequencies want us to count things. Lacan wants us to psychoanalyze things. They’re all tools. When we fail to realize that fact, we risk pounding everything in sight as if it were a nail.