
Mining (Con)texts

John Unsworth gave a talk at Harvard tonight teasingly titled "How Not to Read a Million Books: Text Mining, and Reading the Unreadable." He spoke mostly about the MONK project, a Mellon-funded collaboration that's familiar, I'm sure, to many HASTACers. MONK applies text mining techniques and visualizations to discover new dimensions to literary and historical texts.

Unsworth described the work of several scholars already using the MONK toolkit in their work. For instance, Tanya Clement, a PhD candidate in English Literature and Digital Studies, has successfully applied MONK to her research on Gertrude Stein's The Making of Americans, as she described in a recent article for Literary and Linguistic Computing:

"The particular reading difficulties engendered by the complicated patterns of repetition in The Making of Americans mirror those a reader might face attempting to read a large collection of like texts at once without getting lost -- likewise, it is almost impossible to read this text in a traditional, linear manner. However,

Sara Steger, a PhD candidate in English at the University of Georgia, is similarly using MONK in her study of sentimentalism in nineteenth-century novels. Not only could she train the program to recognize sentimental scenes, she then was able to mine a collection of texts for over-represented words in, for instance, Victorian deathbed scenes:

Her results invite new research into the absence of formal expressions of mourning ("holy," "country," "lord"), and the presence of physical and emotional closeness ("pillow," "cheek," "breath") in deathbed scenes.

I want to underscore that I think these tools



Hi, Whitney, I think you are right that, as with any other tool, one needs to supply context, critical thinking and critical skills, to any data one mines. That's why HASTAC was created around three conjoined areas: creative design and innovation of tools for research and teaching and making art; critical thinking about those tools and about technology in society in general; participatory learning (using whatever tools are available to us now to be able to think through complex interdisciplinary problems together). If one keeps all three of those things in mind, then data mining is great because it offers a macro-survey of something that, of course, then is susceptible to serious, sustained, critical thinking, debate, interpretation, and theorizing. I find something similar with other forms of data gathering, such as genomics. I just read a terrific essay by novelist Richard Powers in GQ about being one of only 9 people to have his entire genome sequenced. What was interesting is that after the whole gruelling process he was left with scads of data, all of which demands far more (not less) interpretation.


That's the thing about data: it OPENS, rather than CLOSES, the scope of what we, as interpretive humanists, need to do. The more data, the more we need to use our training to understand its complexity, its implications, and its nuances, whether in text mining or genome expression.



I think part of the problem is that the tools and corpora we currently have in the humanities provide a very shallow view of the data. For example, a corpus of unstructured text


Thanks for the great comments, I think you're both spot-on. Travis, you've got me thinking about the idea of filters again. I think this might be a more direct way to state the concerns I was thinking about in the post: I often hear text-mining discussed as if we're (to bring this verb back!) cleaning the text, transforming it from this tangible, messy thing stored on bound leaves of paper into clean, manipulable data; but in reality, we've just replaced one filter with another. Sometimes changing the filter is great, and yields interesting new information -- but it's still a filter, a manmade system built out of certain assumptions about our language, etcetcetc. I've been thinking a lot this last month about how it often seems that we don't question how new tools mediate our relationships with texts with as much vigor as we question, for instance, how textbook anthologies shape our relationship with a visual poet like William Blake. In fact, new tools are still often posited as the "solution," since they offer the opportunity to publish facsimiles of Blake's work. Well, okay, that solves one problem (how to show the poems in their illustrated context); but it introduces new ones.


It's really interesting what people will do with the ordinary in order to transform it into the extraordinary. As for that book by Goldsmith, I always wondered if he had to get permission from the New York Times to do that or if it was covered by some type of artistic license.