Beginning Topic Modeling

As a new digital humanities scholar, I am working on a project that traces the development of media narratives about police brutality against African-Americans over the past fifty years. Daunted by the sheer number of sources that could make up my corpus, I am looking for a way to focus my search for source material. I could, of course, simply search for phrases in digitized newspapers and literature. "Police brutality" and "African-American" would turn up a few, if not countless, results. But what I am really looking for is a means of zeroing in on the narratives surrounding these two phrases.

Enter topic modeling. From what I've learned thus far, it appears to be my best starting point. Mapping Texts, a joint project of Stanford and the University of North Texas, is my working inspiration.

In embarking on this new scholarly journey, I've come across a few resources that I have found helpful; I thought I'd share them with any other budding topic modelers in the HASTAC community. Some of the topic modeling sources I discovered are overlaid with dense computer-science lingo, so I've tried to make this list user-friendly for researchers like myself who do not come from computer science. If any more experienced DH scholars can add to this list, I'd appreciate it!

Blog posts

• Megan Brett

• Scott Weingart

• Shawn Graham

Software tools (for those who eschew command lines)

• Paper Machines (works as an extension to Zotero; can also incorporate data from JSTOR Data for Research)

• Stanford Topic Modeling Toolbox (imports text from Excel and other spreadsheets and generates Excel-compatible output; allows you to track topics over time)

For more info

• Topic Models Mailing List (run by David M. Blei of Princeton, who also maintains a webpage on topic modeling)

• David Mimno's bibliography

• David Mimno's blog (how to catch phrases, no pun intended, in your topic modeling search)






I really enjoy works like these that are critical of topic modeling but also offer a solution to the problem. Ben Schmidt has a great article on this.


What a great source - thanks!  The exchange in the comments between Ben Schmidt and Mike Gavin on analogy finding is very interesting.  This aspect of word embeddings may prove to be quite useful for tracing conceptual changes in a narrative over time.


Hi Nicole, thanks for these informative articles! Topic modelling is an interesting DH tool. It is a little different from traditional analysis, which I think adds some breadth to DH work.

I would like to put in a good word for the command line, though! I think it is a great tool for anyone who uses a computer to be familiar with, and you can have a lot of fun with it (I wrote a blog post about command line toys). For any HASTAC-ers who are interested, Codecademy has a free command line course that seems pretty straightforward for beginners.


Thanks for the command line tip, Lisa. I will definitely be visiting Codecademy to expand my skill set (after I try out ASCII Star Wars, of course).



Thank you for your blog! I have also become interested in topic modeling for digital humanities. I wanted to add an intro article by David Blei, who has done a lot of work analyzing academic papers. He uses an R package that is very easy to use.


Thank you! The Blei article is a clear and concise introduction. Learning about R packages is next on my to-do list.