Since my final project proposal is to topic model recent Dartmouth commencement speeches, I decided to compare the 2007 and 2008 addresses to Dartmouth graduates given by Henry M. Paulson Jr. ’68, Secretary of the Treasury, and Ellen Johnson-Sirleaf, President of Liberia, respectively. The Cirrus features immediately informed me that Paulson’s speech was about 640 words longer, at 1928 words versus Johnson-Sirleaf’s 1287. Another interesting feature was the vocabulary density measure, on which Sirleaf scored .416 compared to Paulson’s .359, suggesting that she used a somewhat more varied vocabulary relative to the length of her speech.
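To make the density figures concrete, here is a minimal sketch of how such a score could be computed, assuming vocabulary density means the number of unique words divided by the total number of words. The tokenizer and the sample sentence are my own simplifications, not the tool's actual implementation, so the numbers it produces on the real speeches may differ slightly from those reported above.

```python
import re

def tokenize(text):
    # Assumption: lowercase the text and treat runs of letters/apostrophes as words.
    return re.findall(r"[a-z']+", text.lower())

def vocabulary_density(text):
    # Ratio of unique words (types) to total words (tokens).
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# Placeholder sentence, not taken from either speech.
sample = "the world will change and the world will wait"
print(round(vocabulary_density(sample), 3))
```

Because longer texts tend to repeat words more, a longer speech will often score lower on this measure, which is worth keeping in mind when comparing speeches of different lengths.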
The most frequent words in the corpus (the aggregate of both speeches) were world (16), time (15), make (13), job (12), and change (10). These are all words that I would expect to find in a commencement speech. But the most frequent words in Johnson-Sirleaf’s speech on its own were world (9), change (8), know (6), difference (5), and growth (5), while the most frequent words in Paulson’s were time (14), job (12), service (9), make (8), and like (7). The word count at the corpus level can therefore be a bit misleading: Paulson’s speech contains the word “job” 12 times, while Johnson-Sirleaf does not use the word even once. The word “world,” on the other hand, is nearly evenly split, with Sirleaf using it 9 times compared to Paulson’s 7.
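The gap between corpus-level and per-speech counts can be illustrated with a short sketch. The two snippets and the stopword list below are placeholders I made up for illustration, not the actual speeches or the tool's stopword list.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, far shorter than a real one).
STOPWORDS = {"the", "a", "to", "and", "of", "in", "is", "you", "your"}

def word_counts(text):
    # Lowercase, split into words, and drop stopwords before counting.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

# Placeholder snippets standing in for the two speeches.
docs = {
    "speech_a": "find a job you love and the job will change you",
    "speech_b": "change the world and the world will change",
}

per_doc = {name: word_counts(text) for name, text in docs.items()}
corpus = sum(per_doc.values(), Counter())  # aggregate counts across documents

# Show each frequent corpus word alongside its per-document breakdown.
for word, total in corpus.most_common(3):
    breakdown = {name: counts[word] for name, counts in per_doc.items()}
    print(word, total, breakdown)
```

Printing the breakdown next to the total makes it obvious when a word that looks common in the corpus comes almost entirely from one document, as “job” does here.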
But the word “world” peaks towards the beginning of Sirleaf’s speech, while it peaks towards the end of Paulson’s. I believe this is because Paulson, as a Dartmouth grad himself, chooses to ground the beginning of his speech in the experience of a recent Dartmouth grad looking for a job. Paulson first uses the word “job” early in his speech, when he says, “Now, I hate to spoil your graduation by mentioning that four letter word—jobs.” Sirleaf, on the other hand, is the President of Liberia, and so would be expected to begin her speech with more serious comments on society and politics rather than on the experience of graduating itself. This hypothesis is supported by the fact, noted above, that she does not use the word “job” even once in her speech.
What is perhaps more telling than the most common words in the corpus as a whole are the “distinctive” words in each text. For instance, “English” is listed as a distinctive word in Paulson’s speech because he speaks about how he was an English major and how he found the skills he learned as an English major at Dartmouth valuable throughout his career. Sirleaf, on the other hand, uses the word “recall” 5 times. This word appears in a parallel structure in which she calls upon graduates to recall great thinkers and activists of the past. She says “I recall Mahatma Gandhi,” for instance, and in the following paragraphs goes on to recall Eleanor Roosevelt, Rosa Parks, Martin Luther King Jr., and Nelson Mandela. It is interesting that this text mining exercise helped reveal that, though both speakers are politicians, one took a more grounded and Dartmouth-specific approach to his speech while the other took a more worldly approach grounded in history.