
The Blake Browser: A Tool for Digital Humanities Research and Teaching


This past summer at Union College, I had the joy of helping to develop the Blake Browser, an online tool for exploring and analyzing the poems and paintings of William Blake. I collaborated with Professor Nick Webb of the Computer Science department and Professor Andrew Burkett of the English department, along with his advisee, Sam Garson (English, Union ’13). Although the Blake Browser is not yet complete, you can check out the beta version here.

My background is in computer science, but this project opened my eyes to the wider applications of computer science methods. During discussions with Andy and Sam, we realized that Blake’s writing style forces researchers to reread many of his poems. Because of this, we wanted the Blake Browser to provide search tools that let users (e.g., scholars and students) jump easily from poem to poem and from plate to plate in the Blake corpus.

I developed two primary search methods in the Blake Browser. Clicking the “Search and Analysis” toolbar on the right-hand side of the screen generates a list of poems similar to the one currently being viewed. Users can also search through the traditional search box by entering keywords and phrases. Both search mechanisms are implemented using tf-idf weighting and cosine similarity scoring, two well-known information retrieval techniques.
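To make the two search modes more concrete, here is a minimal sketch of tf-idf weighting with cosine similarity in plain Python. It is not the Blake Browser's actual code: the function names (`build_index`, `similar_docs`, `keyword_search`), the simple regex tokenizer, and the raw-count tf times log(N/df) weighting are all assumptions chosen for illustration. Roughly, `similar_docs` plays the role of the "similar poems" list and `keyword_search` the role of the search box.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """Compute a sparse tf-idf vector (dict: term -> weight) for each document.

    Assumed variant: raw term count times log(N / df).
    """
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = {}
    for name, tokens in tokenized.items():
        tf = Counter(tokens)
        vectors[name] = {term: freq * idf[term] for term, freq in tf.items()}
    return vectors, idf


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_docs(name, vectors, k=3):
    """Rank the other documents by cosine similarity to document `name`."""
    target = vectors[name]
    scores = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != name]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


def keyword_search(query, vectors, idf, k=3):
    """Treat the query as a tiny document and rank every document against it."""
    tf = Counter(tokenize(query))
    # Query terms that never occur in the corpus have no idf and are dropped.
    query_vec = {term: freq * idf[term] for term, freq in tf.items() if term in idf}
    scores = [(name, cosine(query_vec, vec)) for name, vec in vectors.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


# Toy documents invented for illustration; they are not Blake's text.
docs = {
    "A": "urizen builds his iron laws in the dark",
    "B": "urizen weeps over the laws he made",
    "C": "the lamb plays in the green meadow",
}
vectors, idf = build_index(docs)
print(similar_docs("A", vectors, k=2))
print(keyword_search("urizen laws", vectors, idf))
```

In this toy corpus, "B" (which shares "urizen" and "laws" with "A") ranks above "C" as most similar to "A". The word "the" appears in every document, so its idf is log(3/3) = 0 and it contributes nothing to any score.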

I handled other natural language processing tasks with the Natural Language Toolkit (NLTK) for Python. To prepare the corpus for computation, I used NLTK to remove stopwords (very common words that carry little meaning on their own, such as “and” or “the”) and to tag each word in the poems with its part of speech. These tasks were computationally expensive and took a long time to run, so I stored the results in an XML file. The tool reads the preprocessed data from that file instead of reprocessing the corpus, which would otherwise slow it down.
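Here is a minimal sketch of the "process once, cache to XML" idea, using only Python's standard library. In a real pipeline the (word, tag) pairs would come from something like `nltk.pos_tag(nltk.word_tokenize(text))` and the stopword list from `nltk.corpus.stopwords.words("english")`. Both of these require downloading NLTK data, so this sketch substitutes a tiny hand-written stopword set and hand-tagged input. The XML element and attribute names (`corpus`, `poem`, `w`, `pos`) and the function names are hypothetical, not the Blake Browser's actual schema. The sketch also filters stopwords after tagging so the tagger sees full sentences. The post does not say which order was actually used.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical stand-in for nltk.corpus.stopwords.words("english"),
# kept tiny so this sketch runs without downloading NLTK data.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "his", "he"}


def filter_stopwords(tagged_tokens):
    """Drop (word, tag) pairs whose word is a stopword (case-insensitive)."""
    return [(word, tag) for word, tag in tagged_tokens
            if word.lower() not in STOPWORDS]


def save_cache(poems, path):
    """Write each poem's filtered (word, tag) pairs to an XML file.

    `poems` maps a poem id to a list of (word, tag) pairs.
    """
    root = ET.Element("corpus")
    for poem_id, tagged in poems.items():
        poem_el = ET.SubElement(root, "poem", id=poem_id)
        for word, tag in filter_stopwords(tagged):
            ET.SubElement(poem_el, "w", pos=tag).text = word
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


def load_cache(path):
    """Read the cached (word, tag) pairs back without re-running the tagger."""
    root = ET.parse(path).getroot()
    return {
        poem_el.get("id"): [(w.text, w.get("pos")) for w in poem_el.findall("w")]
        for poem_el in root.findall("poem")
    }


# Hand-tagged toy input (Penn Treebank tags), invented for illustration.
tagged = {"toy-1": [("The", "DT"), ("lamb", "NN"), ("sleeps", "VBZ"),
                    ("in", "IN"), ("the", "DT"), ("meadow", "NN")]}
path = os.path.join(tempfile.mkdtemp(), "pos_cache.xml")
save_cache(tagged, path)
print(load_cache(path))
# {'toy-1': [('lamb', 'NN'), ('sleeps', 'VBZ'), ('meadow', 'NN')]}
```

The expensive step (tagging) runs once when the cache is built. Every later request only parses the XML file.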

All of the text and images in the Blake Browser were made available online by the William Blake Archive. Every image that appears in our online tool links back to the Archive’s homepage as well as to the source of the original image. Professor Joseph Viscomi of the University of North Carolina at Chapel Hill generously provided our team with a 500,000-word XML document of Blake’s complete poetry and prose, which was essential for building the browser.

As the Blake Browser is still in beta, there are some bugs to fix and some features to add, so stay tuned for updates.  Again, you can check out the beta here.

This project was supported in part by a grant from the National Science Foundation, IIS CPATH Award #0722203.




What a great project! I'd love to learn more about it. Can you tell me how the Search & Analysis toolbar decides what qualities of a poem make it "similar" to another one? The partners on your team come from a variety of disciplinary backgrounds. Do you also have a variety of perspectives on what makes one poem similar to another?

Thanks for sharing your progress with the HASTAC community! 





Hi Bridget,

That’s a great question.

The similarity is determined using a combination of tf-idf weighting and cosine similarity scoring. Tf-idf scoring weights each word by how often it occurs in a document, discounted by how many other documents in the corpus also contain it. With this information, we can see how relevant a given document is to a certain keyword. Once we compute these weights for every word in every document, each document becomes a vector of tf-idf scores, and cosine similarity compares these vectors to determine which documents are most similar. This is a very common document retrieval method in Natural Language Processing and information retrieval, and variants of it are used by many search engines.
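For readers who want the formulas, here is one common textbook formulation. The reply above does not specify which tf-idf variant the Blake Browser uses, so treat this as an assumption. Here $\mathrm{tf}(t,d)$ is the number of times term $t$ occurs in document $d$, $N$ is the number of documents in the corpus, and $\mathrm{df}(t)$ is the number of documents that contain $t$:

```latex
$$\mathrm{tfidf}(t,d) = \mathrm{tf}(t,d)\cdot\log\frac{N}{\mathrm{df}(t)}$$

$$\cos(d_1,d_2) = \frac{\sum_t w_{t,1}\,w_{t,2}}{\sqrt{\sum_t w_{t,1}^{2}}\;\sqrt{\sum_t w_{t,2}^{2}}},
\qquad w_{t,i} = \mathrm{tfidf}(t,d_i)$$
```

A word that appears in every document gets $\log(N/N) = 0$ and contributes nothing to the score. A word that appears in only a few documents gets a large weight. The cosine ranges from 0 (no weighted words in common) to 1 (proportional weight vectors), regardless of how long the two documents are.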

As for the Humanities approach, we could say that two documents are similar if they talk about the same characters or contain similar themes.  These literary concepts are much more difficult to detect using computational methods.  How can we tell which proper nouns are characters?  If the author uses pronouns, how can we tell which pronouns refer to which characters?  How do we automatically detect themes and when they are mentioned?

The difference between these two approaches is pretty striking: one is entirely computational and the other is entirely subjective. However, the computational approach is very effective at producing results similar to those of the Humanities approach. If you read a poem about Urizen (one of Blake’s main characters), the poems marked as similar also talk about Urizen.

I’m glad that you are interested in the project.  Just let me know if you have any more questions.

- Ben


Thanks for your reply, Ben! I am so fascinated that the computational and humanities approaches came to similar conclusions. Wow! I hope you'll keep us updated on any interesting exceptions to this rule as you continue working on the project. Thanks for your thoughtful answer!


This looks really cool, Ben! 

I'd love to keep in touch with you regarding this project. I'm co-writing a book called William Blake and the Digital Humanities, which is coming out on Christmas Eve. I've also long believed that Blake's illuminated texts are like games, and I'd love to construct a gaming interface for people to experience Blake's work. So the ability to jump from one text to another sounds really interesting as a way of turning Blake's work into a game.