This past summer at Union College, I had the joy of helping to develop the Blake Browser, an online tool for exploring and analyzing the poems and paintings of William Blake. I collaborated with Professor Nick Webb of the Computer Science department, Professor Andrew Burkett of the English department, and his advisee, Sam Garson (English, Union ’13). Although the Blake Browser is not complete, you can check out the beta version here.
My background is in computer science, but this project opened my eyes to how widely computer science methods can be applied. During discussions with Andy and Sam, we realized that Blake’s writing style forces researchers to reread many of his poems. Because of this, we wanted the Blake Browser to offer search tools that let users (e.g., scholars and students) jump easily from poem to poem and plate to plate in the Blake corpus.
I developed two primary search methods in the Blake Browser. Clicking the “Search and Analysis” toolbar on the right-hand side of the screen generates a list of poems similar to the one currently being viewed. Users can also enter keywords and phrases in a traditional search box. Both search mechanisms are implemented using tf-idf weighting and cosine similarity scoring, which are well-known information retrieval techniques.
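To make the two search modes concrete, here is a minimal sketch of tf-idf weighting with cosine similarity in plain Python. It is not the Blake Browser's actual code: the function names, the smoothed idf formula, and the three toy documents are all assumptions chosen for illustration.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'()"


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]


def build_index(docs):
    """Return one sparse tf-idf vector (a dict) per document, plus the idf table."""
    tokenized = [tokenize(d) for d in docs]
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    # Assumed idf variant: log(N / df) + 1, so terms found in every document keep some weight.
    idf = {t: math.log(len(docs) / df) + 1.0 for t, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in counts.items()})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(index, vectors, top_n=3):
    """Rank every other document by similarity to the one at `index`."""
    scores = [(cosine(vectors[index], v), i) for i, v in enumerate(vectors) if i != index]
    return sorted(scores, reverse=True)[:top_n]


def search(query, vectors, idf, top_n=3):
    """Treat the query as a tiny document and rank the corpus against it."""
    counts = Counter(t for t in tokenize(query) if t in idf)
    if not counts:
        return []
    total = sum(counts.values())
    query_vec = {t: (c / total) * idf[t] for t, c in counts.items()}
    scores = [(cosine(query_vec, v), i) for i, v in enumerate(vectors)]
    return sorted((s for s in scores if s[0] > 0.0), reverse=True)[:top_n]


# Toy stand-ins for poems; a real index would be built from the full corpus.
docs = [
    "tyger burning bright in the forests of the night",
    "little lamb who made thee",
    "tyger fearful symmetry in the night",
]
vectors, idf = build_index(docs)
similar_to_first = most_similar(0, vectors)
lamb_hits = search("lamb", vectors, idf)
```

Both modes reduce to the same comparison: the "similar poems" list compares one poem's vector against all the others, while the search box turns the query into a vector first and then runs the identical ranking.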
I handled other natural language processing tasks with the Natural Language Toolkit (NLTK) modules in Python. To prepare the corpus for computation, I removed stopwords (very common words that carry little meaning on their own, like “and” or “the”) and part-of-speech tagged the poems using the NLTK. These tasks were computationally expensive, so I stored the results in an XML file. The tool reads the stored results instead of reprocessing the corpus, which would slow it down.
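The preprocess-once, cache-to-XML pattern can be sketched as follows. This is an illustration under stated assumptions, not the project's code: to stay self-contained it swaps NLTK's stopword list for a tiny hard-coded set and leaves out part-of-speech tagging, and the element names (`corpus`, `poem`, `token`), the file name, and the function names are all hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Tiny stand-in for NLTK's English stopword list (assumption, for illustration only).
STOPWORDS = {"a", "and", "in", "of", "the", "to"}


def preprocess(text):
    """Lowercase, strip punctuation, and drop stopwords."""
    tokens = (w.strip(".,;:!?").lower() for w in text.split())
    return [t for t in tokens if t and t not in STOPWORDS]


def save_cache(poems, path):
    """Run the expensive step once and write the results to an XML file."""
    root = ET.Element("corpus")
    for title, text in poems.items():
        poem = ET.SubElement(root, "poem", title=title)
        for token in preprocess(text):
            ET.SubElement(poem, "token").text = token
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


def load_cache(path):
    """Read the stored tokens back without touching the raw text."""
    root = ET.parse(path).getroot()
    return {
        poem.get("title"): [t.text for t in poem.findall("token")]
        for poem in root.findall("poem")
    }


def get_tokens(poems, path):
    """Reprocess only when no cache file exists yet."""
    if not os.path.exists(path):
        save_cache(poems, path)
    return load_cache(path)


# Toy input; the second call is served from the XML file, not recomputed.
poems = {"toy poem": "The tyger and the lamb"}
with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "corpus_cache.xml")
    first = get_tokens(poems, cache_path)
    second = get_tokens(poems, cache_path)
```

In a fuller version, each `token` element could also carry its part-of-speech tag as an attribute, so the tagger would only ever run once as well.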
All of the text and images on the Blake Browser were made available online by the William Blake Archive. Every image that appears in our online tool links back to the Archive’s homepage as well as to the source of the original image. Professor Joseph Viscomi at the University of North Carolina at Chapel Hill generously provided our team with a 500,000-word XML document of the complete poetry and prose of Blake, which was essential for building the browser.
As the Blake Browser is still in beta, there are some bugs to fix and some features to add, so stay tuned for updates. Again, you can check out the beta here.
This project was supported in part by a grant from the National Science Foundation, IIS CPATH Award #0722203.