Blog Post

Exploring Text Analysis

Text analysis, which includes topic modeling, is when you take a collection of written texts and run them through a tool to create charts, graphs, and maps, which are then used to find trends across the whole body of work. For example, you could take all of Shakespeare’s plays, run them through a program, and see which words are most prominent.
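To make the "most prominent words" idea concrete, here is a minimal sketch in Python. It is only an illustration, not how any particular tool works: the function name `top_words`, the tiny stopword list, and the two sample lines are all assumptions made for this example.

```python
import re
from collections import Counter

# A tiny, made-up stopword list; real tools ship much longer ones.
STOPWORDS = frozenset({"the", "and", "of", "to", "a", "in", "is"})

def top_words(texts, n=3):
    """Count every non-stopword across all texts and return the n most common."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)

# Two short sample lines standing in for a full collection of texts.
sample_texts = [
    "Love is not love which alters when it alteration finds",
    "My love is as a fever, longing still",
]

result = top_words(sample_texts, n=1)
print(result)
```

With a real collection you would read each file into the `texts` list instead of typing the lines in by hand.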


Megan Brett explains topic modeling by comparing it to highlighting an article. She says, "As you read through the article, you use a different color for the key words of themes within the paper as you come across them. When you were done, you could copy out the words as grouped by the color you assigned them. That list of words is a topic, and each color represents a different topic."
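The highlighter comparison can be acted out in a few lines of Python. This is only a sketch of the analogy: the colors are assigned by hand in a made-up `highlighter` dictionary, while a real topic modeling tool works out the groupings with an algorithm. The function name `group_by_color` and the sample sentence are also assumptions for this example.

```python
import re
from collections import defaultdict

# Hypothetical hand-made "highlighter": each key word gets a color.
highlighter = {
    "ship": "blue", "sea": "blue", "sail": "blue",
    "king": "yellow", "crown": "yellow", "throne": "yellow",
}

def group_by_color(text, highlighter):
    """Copy out the highlighted words, grouped by the color assigned to them."""
    topics = defaultdict(list)
    for word in re.findall(r"[a-z]+", text.lower()):
        color = highlighter.get(word)
        # Keep each word once per color, in the order it first appears.
        if color and word not in topics[color]:
            topics[color].append(word)
    return dict(topics)

article = ("The king set sail across the sea. His crown and throne "
           "waited while the ship crossed the sea.")
topics = group_by_color(article, highlighter)
print(topics)
```

Each color's list of words plays the role of one topic in Brett's description.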


In his article, Ted Underwood lists many different ways that text analysis can be used. Some of them are:

- Categorize documents.

- Contrast the vocabulary of different corpora.

- Trace the history of particular features (words or phrases) over time.

- Cluster features that tend to be associated in a given corpus of documents (aka topic modeling).

- Entity extraction.

- Visualization.
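As one illustration of the list above, contrasting the vocabulary of two corpora can be sketched roughly like this in Python. The function names and the two tiny corpora are invented for the example, and this simple "appears in one but not the other" comparison is an assumption, not the method Underwood describes.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into plain words."""
    return re.findall(r"[a-z]+", text.lower())

def distinctive_words(corpus_a, corpus_b):
    """Return words (with counts) that occur in corpus_a but never in corpus_b."""
    counts_a = Counter(w for doc in corpus_a for w in tokenize(doc))
    vocab_b = {w for doc in corpus_b for w in tokenize(doc)}
    return {w: c for w, c in counts_a.items() if w not in vocab_b}

# Two made-up corpora, each a list of very short "documents".
poems = ["the rose and the thorn", "a rose in winter"]
letters = ["the invoice and the receipt", "a receipt in winter"]

only_in_poems = distinctive_words(poems, letters)
print(only_in_poems)
```

Swapping the two arguments shows the words that belong only to the second corpus.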


Text analysis is useful because it allows you to view trends from many different sources in one cohesive graph. Trends are easy to see when the graphs are laid out well. When Dr. Merchant talked to our class, she showed us very colorful and fun graphs that analyzed the field of demography. The color coordination in these graphs makes it easy to quickly spot trends. Even if you do not know how to code, there are many tools available for doing text analysis, such as Voyant, MONK, MALLET, TAPoR, and SEASR. However, these tools do not always work with large collections of texts and do not allow you to add your own innovations.


The challenges of text analysis come from its complexity and messiness. Complications can arise in the tools that are used. In her article, Brett mentions that there are many different tools for text analysis and that it is important to train the tools you are using in order to get results. She says that you need to be careful with how many topics you tell the program to return. It takes practice, fine-tuning, and patience to get results from text analysis. Another problem is that the visualization may not make sense to everyone, so you have to double-check that it turned out the way you wanted it to.


After reading about text analysis, I still have a couple of questions on how it works.

Does text analysis have to be done with a lot of documents, or could you do text analysis on just one book or paper?
How often is topic modeling used for analysis compared to the more traditional graph? 



1 comment

You can indeed do text analysis with a single document, such as just one book or paper! Topic modeling is a very specific kind of text analysis in which you turn a very large corpus, or collection of documents, into a "bag of words." It uses an algorithm to discover a list of topics or groups of similar words in that large collection of documents. Here is a great post by Ted Underwood on topic modelling. A graph is an example of an output or visualization that you can make to better understand the results of text analysis. But a graph is not, itself, a kind of text analysis. Does that make sense?
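If it helps, a "bag of words" can be pictured with a short Python sketch. This is only an illustration under simple assumptions (the function name `bag_of_words` and the two sentences are made up): the text is reduced to its words and how often each occurs, and the word order is thrown away.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Reduce a text to word counts, discarding the order of the words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

a = bag_of_words("The dog chased the cat")
b = bag_of_words("The cat chased the dog")

# The sentences mean different things, but their bags are identical.
same = (a == b)
print(a, same)
```

Because order is lost, the two sentences look the same to the program, which is why topic modeling relies on which words appear together rather than on sentence structure.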