In “Do Digital Humanists Need to Understand Algorithms?”, Benjamin Schmidt argues that digital humanists do not, in fact, need to understand algorithms, but they do need to understand the transformations that algorithms attempt to bring about. A transformation has a goal that one can understand even without knowing the algorithm used to achieve it. Nearly anyone with some education can sort a list of numbers, but a much smaller share of people could implement insertion sort, selection sort, or merge sort in any programming language.
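To make Schmidt’s distinction concrete, here is a minimal Python sketch (my own illustration, not taken from Schmidt): two quite different procedures, insertion sort and merge sort, that carry out the same transformation. Someone who knows only what a sorted list looks like can check either result without following the steps that produced it.

```python
def insertion_sort(values):
    """Return a new sorted list by inserting each item into a sorted prefix."""
    result = list(values)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift larger items one slot to the right to make room for `current`.
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result


def merge_sort(values):
    """Return a new sorted list by sorting each half and merging the halves."""
    if len(values) <= 1:
        return list(values)
    mid = len(values) // 2
    left, right = merge_sort(values[:mid]), merge_sort(values[mid:])
    merged = []
    i = j = 0
    # Repeatedly take the smaller front item from the two sorted halves.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One half is exhausted; the rest of the other is already in order.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


numbers = [42, 7, 19, 3, 25]
# Different procedures, same transformation.
print(insertion_sort(numbers))  # [3, 7, 19, 25, 42]
print(merge_sort(numbers))      # [3, 7, 19, 25, 42]
print(sorted(numbers))          # [3, 7, 19, 25, 42]
```

The built-in `sorted` is included only as a third route to the same output: from the outside, all three are indistinguishable.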
Schmidt develops his argument by discussing Jockers’s use of algorithms for text analysis. This is a very humanistic application of algorithms, however, and as a computer scientist I am interested in applying the same line of thought to other areas of my own field. In computer science, we usually evaluate an algorithm by its “correctness” and its complexity. From Schmidt’s perspective, though, an “incorrect” algorithm might be better described as one that performs an unintended transformation. We can measure the running time of many algorithms and express it in big-O notation, but how do we measure the complexity of a transformation?
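The sketch below illustrates both halves of that question under simple assumptions of my own. First, it counts comparisons (a rough stand-in for running time) while insertion sort and merge sort perform the identical transformation on the same input: the cost belongs to the procedure, not to the transformation. Second, it shows an “unintended transformation”: the `pages` list is a hypothetical example of numbers stored as text, which Python sorts character by character rather than numerically.

```python
def insertion_sort_counting(values):
    """Insertion sort that also reports how many comparisons it made."""
    result, comparisons = list(values), 0
    for i in range(1, len(result)):
        current, j = result[i], i - 1
        while j >= 0:
            comparisons += 1
            if result[j] <= current:
                break
            result[j + 1] = result[j]  # shift the larger item right
            j -= 1
        result[j + 1] = current
    return result, comparisons


def merge_sort_counting(values):
    """Merge sort that also reports how many comparisons it made."""
    if len(values) <= 1:
        return list(values), 0
    mid = len(values) // 2
    left, c_left = merge_sort_counting(values[:mid])
    right, c_right = merge_sort_counting(values[mid:])
    merged, comparisons, i, j = [], c_left + c_right, 0, 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:] + right[j:])
    return merged, comparisons


# 100 numbers in descending order: the worst case for insertion sort.
data = list(range(100, 0, -1))
a, cost_a = insertion_sort_counting(data)
b, cost_b = merge_sort_counting(data)
print(a == b)          # True: the same transformation
print(cost_a, cost_b)  # 4950 356: very different costs

# An "unintended transformation": numbers stored as text sort lexicographically.
pages = ["9", "10", "100", "2"]
print(sorted(pages))           # ['10', '100', '2', '9']
print(sorted(pages, key=int))  # ['2', '9', '10', '100']
```

Neither string sort is “incorrect” in a narrow sense; the first simply performs a different transformation (lexicographic ordering) from the one we probably intended.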
Returning to text analysis: what transformations do we hope to achieve, and what is lost in the process? If we transform a book into a graph of a “Normalized Sentiment Score”, what does the output actually tell us? I would argue that there is, in an important sense, no way to prove the “correctness” of a text-analysis algorithm like this one, because reading is subjective and no single analysis of a book is the right one.
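To see what such a transformation keeps and discards, here is a deliberately crude, hypothetical version of it. The tiny word list `LEXICON`, the function name `sentiment_arc`, and the scoring rule (sum of word values divided by segment length) are all my own assumptions for illustration, not the method used by Jockers or any published tool.

```python
import re

# A tiny, hypothetical sentiment lexicon: +1 for "positive" words, -1 for "negative".
LEXICON = {"love": 1, "happy": 1, "hope": 1, "joy": 1,
           "hate": -1, "sad": -1, "fear": -1, "death": -1}


def sentiment_arc(text, segments=4):
    """Split text into roughly equal segments and score each one.

    Each score is (sum of lexicon values) / (number of words in the segment),
    so segments of different lengths remain comparable.
    """
    words = re.findall(r"[a-z']+", text.lower())
    size = max(1, -(-len(words) // segments))  # ceiling division
    arc = []
    for start in range(0, len(words), size):
        chunk = words[start:start + size]
        raw = sum(LEXICON.get(w, 0) for w in chunk)
        arc.append(raw / len(chunk))
    return arc


story = ("I love this happy town. Then came fear and death. "
         "She was sad, so sad. But hope returned, and joy.")
print(sentiment_arc(story))  # [0.4, -0.4, -0.4, 0.4]
```

The output is a tidy “rise, fall, recovery” shape, but the sketch cannot see irony, negation (“not happy” would still score +1), or who is speaking, which is precisely the kind of loss the questions above point to.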
This is where I agree with Underwood in his article “Where to start with text mining”. He argues that “quantitative analysis starts to make things easier only when we start working on a scale where it’s impossible for a human reader to hold everything in memory”. When analyzing and comparing only one or two books, you gain much more by reading them closely and critically yourself than by asking an algorithm to extract information from them. With larger collections, however, one can begin to analyze trends across geography and time. Even then, the output of text-mining algorithms must itself be interpreted if one wishes to understand why a trend occurs.