Yesterday's New York Times profile of James Pennebaker's work is just the latest evidence of a revival of interest in computational stylistics, and I'd be curious to hear other HASTAC Scholars' thoughts on the topic.

I'd be the first to admit that I have something of a counting fetish, and I love to see this kind of thing done well (Mark Liberman's "breakfast experiments" on Language Log come to mind), but I can't help thinking that arguments like the following have a touch of the phrenological about them:

Dr. Pennebaker has found that men tend to use more articles (a, the) and women tend to use more pronouns (he, she, they). The difference, he says, may suggest that men are more prone to concrete thinking and women are more likely to see things from other perspectives.

I understand that this is a journalistic summary, but it is repeated almost verbatim from a recent post ("The Meaning of Words: Obama versus McCain") on Pennebaker's blog. Pennebaker concludes this post with the disclaimer that "no one should take any text analysis expert's opinions too seriously," but this strikes me as just a bit disingenuous coming from a person who has the ear of the Department of Defense.
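
To make concrete what kind of measurement sits behind a claim about articles and pronouns, here is a minimal sketch in Python. It is my own illustration, not Pennebaker's method or LIWC's actual dictionaries: the word lists and the function name `function_word_rates` are hypothetical and deliberately tiny, and a real tool would use far larger categories and more careful tokenization.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists for illustration only;
# they are NOT the categories used by LIWC or by Pennebaker.
ARTICLES = {"a", "an", "the"}
PRONOUNS = {"he", "she", "they", "him", "her", "them", "i", "we", "you", "it"}

def function_word_rates(text):
    """Return the share of tokens that are articles and that are pronouns."""
    # Crude tokenization: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return {"articles": 0.0, "pronouns": 0.0}
    counts = Counter(tokens)
    articles = sum(counts[w] for w in ARTICLES)
    pronouns = sum(counts[w] for w in PRONOUNS)
    return {"articles": articles / total, "pronouns": pronouns / total}

sample = "The cat sat on the mat and she watched a bird."
rates = function_word_rates(sample)
print(rates)
```

The result is a pair of proportions and nothing more; any reading of those numbers as evidence about a writer's habits of mind is a separate step that the count itself does not supply.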

Stanley Fish's critique of 1970s literary stylistics (as practiced by scholars such as L. T. Milic) seems relevant here:

Here the procedure is not circular but arbitrary. The data are scrutinized and an interpretation is asserted for them, asserted rather than proven because there is nothing in the machinery Milic cranks up to authorize the leap (from the data to a specification of their value) he makes. What does authorize it is an unexamined and highly suspect assumption that one can read directly from the description of a text (however derived) to the shape or quality of its author's mind.

Am I missing something? Can someone who is more familiar with Pennebaker's work explain how it authorizes this kind of interpretive leap? Is Fish's critique even still valid thirty years later, now that we have access to vastly larger corpora than anyone could have imagined in 1972?

Finally, I'd love to hear about other people's experiences with text analysis and visualization tools (Pennebaker's LIWC, TextArc, and the work of Magnus Rembold and Jürgen Späth, to name just a few examples). Have you found these kinds of tools useful in your scholarly work? If so, what principles have you used in assigning interpretations to descriptions of data?


