Earlier this week, on Tuesday, October 15th, Dr. Daniel Look of St. Lawrence University gave a talk called "Who Wrote That? An Introduction to Stylometry" as part of a series called Science Cafe in Canton, New York. Science Cafe talks are meant to be open to the public, and thus not overly dense in scientific or mathematical concepts, but the information presented was no less engaging, incisive, and insightful.
Dr. Look opened with a discussion of Zipf Distributions and how they have shown up in city populations and music. Zipf Distributions, in effect, concern the ratios between ranked values; one of Dr. Look's examples was the populations of the most populous cities in the United States. The oddity is that, with surprising regularity, the most frequent (or largest) item will appear almost twice as often as the second, three times as often as the third, four times as often as the fourth, and so on. With this surprising discovery, Dr. Look moved the conversation to Stylometry and word counts, where a text's word frequencies are distributed in a similar manner.
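To make the rank-and-ratio idea concrete, here is a minimal sketch of my own (not something shown in the talk) that ranks the words in a text by frequency and compares each word's count to the top word's count. The function names `rank_frequency` and `zipf_ratios` are hypothetical, and the sample sentence is invented so the arithmetic is easy to follow. Under a Zipf-like pattern, the ratio at rank r should land somewhere near r.

```python
from collections import Counter
import re


def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words).most_common()
    return [(rank, word, count)
            for rank, (word, count) in enumerate(counts, start=1)]


def zipf_ratios(table):
    """Pair each rank with count(rank 1) / count(that rank).

    In a Zipf-like text the ratio at rank r is close to r.
    """
    top_count = table[0][2]
    return [(rank, top_count / count) for rank, _, count in table]


# Invented toy text, built so that the counts are 6, 3, and 2.
sample = "the the the the the the of of of and and"
table = rank_frequency(sample)
ratios = zipf_ratios(table)
```

A real text will only follow the pattern approximately, and a long one is needed before the ratios settle down; the toy sentence here is rigged to match exactly.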
From here, the talk got even more specific. Dr. Look recently published a chapter, "Statistics in the Hyborian Age: An Introduction to Stylometry via Conan the Barbarian," in the book Conan Meets the Academy: Multidisciplinary Essays on the Enduring Barbarian. In the context of Conan, Dr. Look discussed how Stylometry could use word counts to determine which authors used which words more or less often, and then applied this to the various authors of Conan stories. Even with words as simple as "upon," the counts showed a difference between the writing style of Robert E. Howard and that of other authors, and this allowed the stories to be sorted by who wrote them. Scholars had already worked out the authorship, but the mathematical studies corroborated their conclusions and showed how effective Stylometry can be.
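As a rough illustration of the kind of count involved, the sketch below measures how often a marker word such as "upon" appears per thousand words of a text. This is my own simplified version, not Dr. Look's method; the function name `rate_per_thousand` is hypothetical, and the two sample sentences are invented rather than taken from any Conan story.

```python
import re


def rate_per_thousand(text, marker="upon"):
    """Occurrences of `marker` per 1,000 words of `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return 1000 * words.count(marker) / len(words)


# Invented sample sentences, used only to exercise the function.
sample_a = "Upon the hill he stood upon a rock"
sample_b = "He stood on the hill and then on a rock"

rate_a = rate_per_thousand(sample_a)
rate_b = rate_per_thousand(sample_b)
```

Scaling by text length matters because stories differ in size; a raw count of "upon" would mostly measure how long a story is rather than how its author writes.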
Further, Dr. Look also discussed the Oz books and the controversy as to whether certain books were written by L. Frank Baum or Ruth Plumly Thompson. Using Principal Component Analysis, the works of Baum and Thompson were reduced to their principal components to form two-dimensional vectors. Generally speaking, works written by Baum ended up on one half of the y-axis, and those by Thompson on the other. Then, the principal components of the fifteenth book were measured and plotted, with the vectors ending up exclusively in the Thompson half. This visual was a great way to show how strongly Stylometry can provide evidence for authorship, and how distinctly something as abstract as style can be measured, even in rudimentary forms like word count.
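For readers curious what Principal Component Analysis does mechanically, here is a small sketch under heavy assumptions: each book is represented by the rates of three unnamed marker words, every number is invented for illustration, and the components are found by simple power iteration rather than by whatever software was used for the Oz study. All function names are hypothetical. In this toy setup the two authors happen to separate along the first component; which axis separates real data depends on the data.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def column_means(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def covariance(rows, means):
    """Sample covariance matrix of the rows after centering."""
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    d = len(means)
    return [[sum(r[i] * r[j] for r in centered) / (len(rows) - 1)
             for j in range(d)] for i in range(d)]


def leading_eigenvector(matrix, iterations=500):
    """Power iteration: approximate the dominant eigenvector and eigenvalue."""
    d = len(matrix)
    v = [float(i + 1) for i in range(d)]  # arbitrary non-uniform start
    start_norm = math.sqrt(dot(v, v))
    v = [x / start_norm for x in v]
    for _ in range(iterations):
        w = [dot(row, v) for row in matrix]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:  # matrix has (numerically) no variance left
            break
        v = [x / norm for x in w]
    eigenvalue = dot(v, [dot(row, v) for row in matrix])
    return v, eigenvalue


def fit_pca_2d(rows):
    """Return the column means and the first two principal components."""
    means = column_means(rows)
    cov = covariance(rows, means)
    pc1, lam1 = leading_eigenvector(cov)
    d = len(means)
    # Deflation: remove the first component, then find the next one.
    deflated = [[cov[i][j] - lam1 * pc1[i] * pc1[j] for j in range(d)]
                for i in range(d)]
    pc2, _ = leading_eigenvector(deflated)
    return means, [pc1, pc2]


def project(row, means, components):
    """Map one book's word rates to a two-dimensional point."""
    centered = [x - m for x, m in zip(row, means)]
    return [dot(centered, c) for c in components]


# Invented word rates: three books per author, three marker words each.
author_a = [[9.0, 2.0, 5.0], [8.5, 2.5, 5.2], [9.2, 1.8, 4.9]]
author_b = [[3.0, 7.0, 5.1], [2.5, 7.5, 4.8], [3.2, 6.8, 5.0]]
disputed = [2.9, 7.1, 5.0]

means, components = fit_pca_2d(author_a + author_b)
points_a = [project(row, means, components) for row in author_a]
points_b = [project(row, means, components) for row in author_b]
point_disputed = project(disputed, means, components)
```

The sign of each component is arbitrary, so what matters is which known group the disputed point lands beside, not whether its coordinates are positive or negative. Fitting on the books of known authorship and only then projecting the disputed one keeps the disputed text from shaping the axes it is judged against.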
Overall, the talk was a fantastic intro to the field of Stylometry, one that presented both the possibilities and the limitations of the field in an accessible form. The use of mathematical formulas was limited so as to appeal to a general audience, but the information provided still felt cohesive and informative. I highly recommend reading Dr. Look's work.