Creators of semantic web projects tend to be big thinkers. Whether they are trying to bring order to the web, to an entire industry, or to an academic discipline, theorists and practitioners of the semantic web generally focus on datasets too big for individual researchers to gather or verify. In order for the semantic web to become a part of day-to-day humanities research, especially at smaller and less well-off research centers and universities, we urgently need a discussion about how individuals can use linked data.
In a series of three posts, I will suggest some avenues for individual researchers to use linked data resources. The focus will be on use value and on reducing the need for time-consuming verification.
I first learned about linked data while working on the Salons Project, part of Mapping the Republic of Letters, a multi-year, multi-disciplinary digital humanities project at Stanford. Mapping the Republic of Letters uses linked data resources like VIAF to connect datasets that are verified by individual researchers. The individual projects are all run by researchers who are responsible for the quality of their data. Much of the data comes from the Electronic Enlightenment project at Oxford and from other library and print archives of letters, and none of the historically important data comes from linked data sources. (Ultimately, individual project heads decide which datasets are most complete and accurate, but print sources are still the only source of primary data for almost all of the projects.)
Here are a few notes on getting started with linked data...
Sources of linked data for humanists
There are many, many sources of linked data out there. The four that I have used the most in my research are:
VIAF - The best resource for literary projects. VIAF assigns a number (unique identifier) to each writer, which allows libraries to link metadata. The VIAF entry for each writer contains a list of variations on the writer's name, birth and death year, publication information, and information about library holdings. It references some pretty obscure writers, including people who published nothing but wrote letters or kept a journal.
Getty ULAN - A fantastic resource for art historians. The ULAN is a unique identifier for artists. The Getty database stores data on the artist's biography, teacher / student relationships, individual works, and museum holdings.
Geonames - Geonames ties place names, including some historical or outdated place names, to geographical coordinates. Store these coordinates and the Geonames reference in order to produce maps for your project.
DBpedia - General information about people, places, and things. I have used DBpedia much less than the others, because it has a lot of gaps. DBpedia remains the resource to watch, however, since it has the most ambitious game plan.
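As a rough illustration of the Geonames advice above, here is a minimal Python sketch of storing a place's coordinates together with its Geonames reference and turning the record into a GeoJSON point that most mapping tools can read. The Place class, its field names, and the placeholder identifier are my own assumptions for illustration; they are not part of Geonames or of any project mentioned here.

```python
from dataclasses import dataclass
import json


@dataclass
class Place:
    """One place in a project dataset (hypothetical structure)."""
    label: str          # place name as it appears in your own source
    geonames_id: str    # identifier copied from the Geonames entry (placeholder below)
    latitude: float
    longitude: float


def to_geojson_feature(place: Place) -> dict:
    """Return a GeoJSON Point feature. GeoJSON lists longitude before latitude."""
    return {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            "coordinates": [place.longitude, place.latitude],
        },
        "properties": {
            "label": place.label,
            "geonames_id": place.geonames_id,
        },
    }


# The identifier is a placeholder, not a real Geonames number.
paris = Place(label="Paris", geonames_id="PLACEHOLDER-ID",
              latitude=48.8566, longitude=2.3522)
feature = to_geojson_feature(paris)
print(json.dumps(feature, indent=2))
```

Keeping the Geonames reference next to the coordinates means you can always return to the source entry if a location on the map looks wrong.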
When not to use linked data
One big caveat:
You are still responsible for the quality of your data!
You probably should not use the data from DBpedia or even academic sources like VIAF for the primary phenomenon that you are studying. For instance, if you are studying the religious beliefs of people with whom Newton had an academic connection, you would need to collect and verify the data for the individuals' religious affiliation and academic connections (likely from print sources and library archives). You might want to use linked data sources like VIAF as a first pass on contextual information such as birth year or publications.
N.B. DBpedia is not an academic resource, but it may be good enough to evaluate whether or not a particular line of investigation is worth pursuing.
Linked data need to be verified by humans
Data such as the dates, places, and names in correspondence data are only as good as the original source and the work done to verify that data. Where linked data really comes in handy is in pulling secondary bits of information into the database and in connecting smaller datasets (for example, the correspondents of Voltaire and Newton). Above all, linked data can help you see the broader context by rapidly comparing competing hypotheses.
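To make the idea of connecting smaller datasets more concrete, here is a small Python sketch that finds the people who appear in two separately compiled lists of correspondents by comparing shared identifiers rather than names. Everything in it is hypothetical: the identifiers are placeholders standing in for VIAF numbers, the names are invented, and the shared_people function is simply one way to write the comparison.

```python
# Each dataset maps an identifier (a placeholder standing in for a VIAF number)
# to the name exactly as that project recorded it. All values are invented.
voltaire_correspondents = {
    "viaf-placeholder-001": "Correspondent A",
    "viaf-placeholder-002": "Correspondent B",
    "viaf-placeholder-003": "Correspondent C",
}
newton_correspondents = {
    "viaf-placeholder-003": "C., Correspondent",  # same person, different spelling
    "viaf-placeholder-004": "Correspondent D",
}


def shared_people(first: dict, second: dict) -> list:
    """Return (identifier, name in first, name in second) for people in both datasets."""
    shared_ids = sorted(first.keys() & second.keys())
    return [(person_id, first[person_id], second[person_id])
            for person_id in shared_ids]


matches = shared_people(voltaire_correspondents, newton_correspondents)
for person_id, name_a, name_b in matches:
    print(person_id, "|", name_a, "|", name_b)
```

Because the match is made on the identifier, the two projects can keep their own spellings of a name and still be connected. The match is only as reliable as the work done to assign each identifier in the first place.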
For instance, DBpedia might help you to identify all of the women in a long list of names. (The entries for basic information like birth year and gender are much better than those for complex matters like medieval professions or 18th-century religious affiliations.)
Think of these linked data resources as supercharged versions of the Encyclopedia Britannica or the American Heritage Dictionary. You might use them for quick research on a subject that is tangential to your argument, but they are rarely substitutes for peer-reviewed sources.
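Here is a hedged sketch of what such a lookup could look like in Python. It only builds the text of a SPARQL query, which you could then paste into DBpedia's public query page; it does not contact DBpedia itself. It assumes that you have already matched each name to a DBpedia resource address by hand, and that DBpedia records gender under the foaf:gender property, which is not guaranteed for every person. The function name build_gender_query is my own.

```python
def build_gender_query(resource_uris):
    """Build a SPARQL query asking for the recorded gender of each listed resource.

    Assumption: gender is stored under foaf:gender. OPTIONAL keeps people
    with no gender entry in the results, so gaps stay visible.
    """
    values = " ".join(f"<{uri}>" for uri in resource_uris)
    return (
        "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
        "SELECT ?person ?gender WHERE {\n"
        f"  VALUES ?person {{ {values} }}\n"
        "  OPTIONAL { ?person foaf:gender ?gender . }\n"
        "}"
    )


# Resource addresses matched by hand to the names in your own list.
people = [
    "http://dbpedia.org/resource/Voltaire",
    "http://dbpedia.org/resource/Isaac_Newton",
]
query = build_gender_query(people)
print(query)
```

Treat whatever comes back as a first pass: an empty gender column tells you where DBpedia has a gap, not that the information does not exist.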
The next part of this series will feature an interview with Glauco Mantegari, a specialist in linked data who works with humanists in Mapping the Republic of Letters and the Humanities + Design lab on data and linked data projects.
The third part will highlight projects that use linked data (including those suggested by the HASTAC community).
Do you have a data-intensive project that might benefit from the use of linked data? Do you know of any smaller projects that make use of linked data? Any ideas for how to use linked data? Linked data resources that everyone should know about?