Last March, Angela Kinney and I co-hosted a HASTAC Scholars discussion titled “Digital Textuality and Tools.” One result of this vibrant forum was a description of the digital tools that we as humanities scholars would love to have but currently do not. From this and other conversations on HASTAC, along with ongoing talks I had been having with a software developer friend of mine, grew Bibliopedia, a project to develop just these sorts of digital tools. In addition to my coder friend and me, the team includes Ana Boa-Ventura, another HASTAC Scholar, who is doing web design. We received a grant from The University of Texas’s Liberal Arts Instructional Technology Services (LAITS) to begin work and are now putting together our application for an NEH Digital Humanities Start-Up Grant so that we can keep going. Since that deadline is rapidly approaching, I want to introduce the wonderful HASTAC community to what we envision for Bibliopedia and to solicit feedback so that we can further improve our vision.
There are, essentially, two components to the project. The first is the automated data-mining and text analysis performed by the software. The second is the crowdsourcing of data verification and elaboration. Bibliopedia will be an open research-enabling platform designed to unify the many disparate, closed silos of scholarly information that are available today but remain difficult and time-consuming to use. Too often, much of a researcher’s effort is spent simply bringing together all of the available information on a particular subject. What is more, a common complaint about Google Books, Google Scholar, WorldCat, and other digital research tools is that their automatic parsers introduce metadata errors that subject-matter experts are not allowed to repair.
Bibliopedia will combine the now-common automated data-mining approaches of Google et al. with tools that let subject-matter experts correct metadata and otherwise extend the information that automated data-mining provides. It will bring that information together in an environment that not only displays it efficiently but also actively encourages the crowdsourcing of metadata about books, articles, and other scholarly objects. By thus opening data up to revision by the scholarly community, Bibliopedia can build on the strong work of the existing, mature data silos, improve overall data quality, and provide the academic community at large with a continuously improving research tool.
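To make this idea a little more concrete, here is a minimal sketch in Python of how expert corrections might be layered on top of automatically mined metadata. It is only an illustration and does not reflect actual Bibliopedia code: the names `Correction`, `Record`, `correct`, and `resolved` are hypothetical, and the sample record and contributor are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Correction:
    """One expert-supplied fix to a single metadata field (hypothetical)."""
    field_name: str
    value: str
    contributor: str


@dataclass
class Record:
    """A bibliographic record: mined metadata plus a log of corrections."""
    mined: dict[str, str]
    corrections: list[Correction] = field(default_factory=list)

    def correct(self, field_name: str, value: str, contributor: str) -> None:
        # Append rather than overwrite, so the mined value and every
        # earlier correction stay available for review.
        self.corrections.append(Correction(field_name, value, contributor))

    def resolved(self) -> dict[str, str]:
        # Start from the mined values, then apply corrections in order,
        # so the most recent correction for a field wins.
        view = dict(self.mined)
        for c in self.corrections:
            view[c.field_name] = c.value
        return view


# Invented sample data: a parser has mis-read the publication year.
record = Record(mined={"title": "Moby Dick", "year": "1951"})
record.correct("year", "1851", contributor="example_scholar")
print(record.resolved())
```

Keeping corrections as a separate log, rather than editing the mined values in place, is one possible design choice: it would let the community see what the automated parser originally produced and who changed it.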
Via JSTOR, Google Scholar, and other full-text scholarly resources, Bibliopedia will provide advanced data-mining and cross-referencing of the scholarly articles and books that discuss a narrowly focused set of primary literary texts. By focusing on individual texts rather than broad swaths of scholarship, Bibliopedia will not only allow for a deeper examination of the relevant works, but also permit the creation of a collaborative community of researchers, students, and others interested in studying the primary texts. This community will further enhance the data, cross-references, and bibliographies gathered by the software itself by providing user-generated content, discussions, and evaluation. Bibliopedia will also offer advanced visualizations of the data to permit scholars to discover new connections between works and to understand more easily the contours of existing scholarship. Bibliopedia-powered portals will thus serve to aggregate available data that is often inaccessible or invisible to users, thereby providing not only a single location at which to begin research, but also a scholarly community that will collaborate to generate new knowledge. Bibliopedia will also seek to deploy as many existing open-access technologies as possible in order to make rapid progress and avoid reinventing the wheel, an all-too-common pitfall of technology projects.
Sustained, deep interaction within a collaborative user community led by subject-matter experts will join with advanced data-mining and cross-referencing software to generate new, innovative ways of viewing, discovering, and interacting with primary and secondary literature. The project’s data-mining software will provide a valuable, well-populated database of information and cross-references that uncovers neglected works and previously invisible connections. The social component will then extend this automatically generated information in ways that benefit researchers at all levels, through user-contributed wiki-style information, tags, relevance rankings, summaries, reviews, abstracts, custom bibliographies, and discussions (among the many forms such interactions will take).
The user interface and community-enabling aspects of the site, therefore, are of major import. While the data-mining and cross-referencing components are absolutely fundamental, they alone are not enough to create a vibrant, indispensable resource. As the recent explosion of social networking platforms and collaborative projects demonstrates, the creation of an engaged community of users from diverse realms substantially improves products. From the (at times controversial) success of Wikipedia in replacing traditional encyclopedias to the seeming ubiquity of Facebook to the prevalence of crowd-sourced knowledge creation, the importance of software that enables community has never been clearer.
While we do not currently have a site you can visit for a demonstration of the software (we are still working mostly on infrastructure and design), I hope this description is concrete and clear enough to give you an idea of what we are planning. What pitfalls do you see? What issues does this sort of work raise more generally? What features do you think are crucial to such work?