I remember it well. It was a nice fall day. First semester was now at full throttle and everyone was busy at work. I had been working on my first textual analysis of selected writings from Henry David Thoreau and Susanna Moodie. I wanted to discover patterns between the two works with regard to the Canadian landscape. The result showed the beauty of metaphors interwoven among the texts, as well as the uncertainty of a new life in such a vast wilderness. Data mining was itself a landscape - one that I quickly endeavoured to explore.
My hopes were initially satisfied. One of my professors, Dr. Graham, had a new project that he asked me to assist with. We would take the massive and famous diaries of former Canadian Prime Minister William Lyon Mackenzie King and perform a textual analysis. Who knew what we would discover? Maybe a hidden transcript previously unknown to history. We might become famous overnight. History was in our hands.
I started immediately by going to the Library and Archives Canada website. I would input a massive wall of text into Voyant Tools and the computer would output a beautiful array of numbers and sequences, revealing a pattern that I would be the first to notice. Here we were. The making of history was literally keystrokes away. Yet, where were the diaries? I mean, yeah, there were images on the screen and I could read his diary, but that was of no help to me. Where was the plain text? I searched and searched, but to no avail. The history I had envisioned making was nowhere to be found. The project was quickly abandoned.
Hyperbole aside (and I hope you enjoyed this embellished story), my account contains some level of truth. Let me explain...
Dr. Graham and I really did attempt a textual analysis of the former PM's diaries but ultimately failed, though not without a lesson and hope for the future. See, text analysis is a wonderful tool on the digital historian's belt. You can read pages upon pages of text in a matter of moments. Our old ways of finding pattern, metaphor, and meaning in great literature, heroic as they are given the sheer volume of many works, are beginning to evolve in parallel with our technology. Textual analysis - a form of data mining - allows you to view words both in and out of context (how often each one appears, and which words surround it) and to digest it all with a speed once only dreamed of. Our hope was to find new patterns in the diaries, since textual analysis can reveal what traditional readings may have missed. Not that those readings of literature are invalid; rather, we are building upon previous methods.
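For the curious, here is a minimal sketch in Python of what viewing words "in and out of context" can look like: a word-frequency count (out of context) and a keyword-in-context listing (in context). This is not how Voyant Tools works internally, and the function names and the sample sentence are my own placeholders, not text from the diaries or from Thoreau or Moodie.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text, top_n=5):
    """Return the top_n most frequent words with their counts."""
    return Counter(tokenize(text)).most_common(top_n)


def keyword_in_context(text, keyword, window=3):
    """Return (left, keyword, right) tuples for each occurrence of keyword,
    with up to `window` words of context on either side."""
    tokens = tokenize(text)
    keyword = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


# Invented sample sentence, used only to illustrate the two functions.
sample = "The woods were silent. I walked into the woods and the woods received me."

print(word_frequencies(sample, top_n=2))
for left, word, right in keyword_in_context(sample, "woods", window=2):
    print(f"{left:>12} [{word}] {right}")
```

All of this assumes you already have plain text to feed in, which is exactly what we were missing.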
We did attempt to use free Optical Character Recognition (OCR) software to turn the images of the diaries into plain text, but this ultimately failed. The technology (at least the free versions) is not perfect yet. The program spat out a large amount of gibberish, and the alternative (typing out the entire diary) was unthinkable - no one (unless it was their job) would have time for this. We had to abandon the project.
But this does not worry me in the least. Our attempt must wait for a later epoch, though one that I can see on the horizon. This failed attempt is one of many case studies of accessibility in the digital humanities. My hope, of course, is in progress. Technology advances at such an incredible rate that projects such as ours, once out of reach, soon become possible. We cannot begin to comprehend the speed at which technology advances. (Take for example Moore's Law, the observation that the number of transistors on a chip doubles roughly every two years, which has meant exponential growth in processing power. Some even argue that the rate of that exponential growth is itself increasing.) Thus our project must wait a few years. I suspect it will become possible in the near future, when free OCR technology can transcribe the diaries with few or no errors, or when someone transcribes the documents by hand.
Who knows? We may one day be making history with HAL 9000. Let's just hope the practice is better than the theory for this one.