The presence at the MLA of several papers applying computational techniques to questions in the humanities signals an emerging paradigm, one that offers new and interesting inroads for the humanities scholar, but it also demonstrates the rough spots still left in the task of integrating two disparate disciplines. There was one panel, and one paper in another panel, on the subject. The panel, Online Discourse and Linguistic Innovation, consisted of two papers presented by linguist Kendra Douglas about her work with computer scientist James Hearne tracking contemporary language change, and it was a fairly traditional sort of NLP work. The paper, Critical Text Mining; or, Reading Differently, was presented by literature student Matthew Wilkens and was more of a general concept piece about how the humanities should be thinking about using computational techniques.
I'm not a linguist, but to the best of my understanding the focus of the first study was on the ways that English borrows meaningful prefixes and suffixes that can then be used to imbue any word with new meaning. The example used was the borrowing of "-ista" from foreign terms like "Sandinista" and its use to create "Obamanista" or "Palinista." The term they used was "productive bound morphemes" (as I understand it, a bound morpheme is a meaningful unit that cannot stand alone as a word, and it is productive when speakers freely use it to coin new words; corrections welcome). I thought the project was interesting, but it had a serious methodological flaw that is very typical of NLP projects that try to study language, and which I think provides the best insight into how and why the humanities need to engage with this kind of work.
The basic methodology of the project was to search Google over time for all sorts of possible newly coined English words fitting the above criterion. The claim, as I understood it, was essentially that the Google results were representative of the use of these morphemes in written English. What I felt this approach lacked was a more critical engagement with the cultural nuances of language. In the discussion after the presentation, the conversation about the "-ista" suffix centered on what the suffix meant in English: it was negative, from "Obamanista"; it suggested a certain amount of work-specific knowledge, from "barista"; it was feminine, from "fashionista." The problem with this discussion, I felt, was that it fetishized the suffix as the vessel of meaning in ways that missed important realities of its linguistic life. More specifically, it seemed to me that the suffix, as used in different cultural and linguistic circles, was borrowed through different entry points from different foreign languages and carried different semantic implications (which I would then expect to influence one another through co-occurrence in English-language contexts). "Obamanista" invoked the othered revolutionary character of the Nicaraguan "Sandinista" to the almost complete exclusion of the ostensibly Italian "barista." What would seem relevant here are the different nuances in how "-ista" is applied, the respective sources of each meaning, an analysis of how the coexistence of different meanings shaped one another, the nature of the borrowing between foreign languages before each word was absorbed into English, and the historical life of text behind all of these questions.
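To make the methodology concrete: the sort of query the study seems to have run against Google results can be sketched as a simple token count over a corpus. This is a toy illustration, not the researchers' actual code, and the sample "corpus" string is invented.

```python
import re
from collections import Counter

def count_ista_tokens(text):
    """Count tokens ending in the '-ista' suffix.

    A toy stand-in for the study's apparent method of searching Google
    over time; here we just scan a local string.
    """
    tokens = re.findall(r"\b[A-Za-z]+ista\b", text)
    return Counter(token.lower() for token in tokens)

# Invented sample "corpus" for illustration:
sample = ("The Obamanista crowd heckled a fashionista; "
          "a barista and a Palinista looked on. "
          "The barista shrugged.")

counts = count_ista_tokens(sample)
# e.g. counts["barista"] == 2
```

Note that a count like this treats every "-ista" token identically, which is exactly the flattening I describe above: the tally cannot distinguish the Nicaraguan borrowing from the Italian one.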
Wilkens' paper, by contrast, represented the other side of the coin. A literature student, Wilkens spoke from the perspective of someone in the humanities trying to appropriate the techniques of NLP. Much of what he said resonated with me: techniques designed to deal with more text than a human being could ever read can enhance one's ability to make analytical observations. The paper took the literary form of allegory as an example and speculated generally on how one might approach, computationally, the task of identifying an allegorical work. Wilkens seemed cognizant of many of the potential pitfalls, such as reifying allegory as an impermeable category. The one problem with the paper, however, was that Wilkens thought only in terms of established NLP techniques, which thus far have not yielded much success on the sorts of tasks he imagined. If we are to use NLP techniques for textual analysis, we must adapt not only our approach to analyzing the results of those techniques, which was Wilkens' point, but also, I would argue, the techniques themselves. Wilkens pointed out that it is unclear whether an allegory can be recognized as such by drawing on the text alone. My thought at the time was that to read any allegory, one must recognize some historical, or at least archetypal, situation; it is hard to conceive of allegory without familiarity with historical context. When I asked about incorporating that context in the form of non-allegorical texts about history, Wilkens responded that this would be an even harder problem to solve. The difficulty I am trying to think around, however, is that the existing techniques have not yet yielded the kind of results that suggest they could achieve what Wilkens envisions, so it is going to require more thought.
That thought, I would suggest, must come from those who have some understanding of how texts work. Thus, to incorporate these computational techniques, we need a paradigm of computational humanities.
I liked both of these talks and was inspired by their presence at the MLA, because I do think these computational techniques represent an important and useful paradigm. What both make evident, however, is just how new and unstable that paradigm still is. There is work to be done.