
Defining Terms: My First Step in Visualizing DH Crowdsourcing Models


According to Luis von Ahn, professor at Carnegie Mellon and co-creator of Duolingo, one million people could translate the entirety of English Wikipedia into Spanish in just 80 hours. He calls this "human computation," while noting that others call it crowdsourcing.

I went to a THATCamp AHA session facilitated by Ben Brumfield, where attendees questioned key players from two projects that used crowdsourcing in very different ways, both successfully. Galaxy Zoo, represented by Chris Linet, is an example of a crowdsourcing project that uses highly technical means to gather and verify data. The Civil War Diaries & Letters Transcription Project, represented by Jen Wolfe, uses technology mostly as a conduit, passing the input from volunteers' manual labor along to professional librarians for quality control. After hearing these two projects compared, I wondered how else we might draw out similarities and differences across digital humanities projects to see what is working for whom. Do we have enough examples of successful projects to build crowdsourcing models that could inform future initiatives?*

I really like this as a potential research question, but answering it requires groundwork: defining key terms to limit the scope, surveying the field for qualifying projects, and analyzing their crowdsourcing strategies for what is unique but also transferable. Defining terms is the important first step.

So let's start by defining some terms. Clarifying three main concepts will help narrow the scope of this inquiry. What do we mean by "crowdsourcing"? What do we include under the umbrella of "digital humanities"? And, extending that, what counts as a "digital humanities project"?

That last question may sound unnecessary without knowing what kinds of initiatives made my initial list of projects. Brainstorming, I listed every initiative I could think of, from the Galaxy Zoo and Civil War Diaries examples to the Google Books digitization efforts and university partnerships like HathiTrust. The Google Books digitization efforts have huge implications for what digital humanities investigations can accomplish, but the crowdsourcing that led to that mass digitization may stretch the term further than I want to.

So let's backtrack. What is "crowdsourcing"? A strategy showing up much more in business than in academia (especially in building databases for smartphone and tablet apps), crowdsourcing is a process in which the sum of the parts matters more than the size of any individual part. The Obama presidential campaign's successful 2008 fundraising told donors that contributing $5 was just as important as contributing $1,000. The result was both financial and personal investment in the campaign. Organizers created buy-in, and buy-in can also play a significant role in successful crowdsourcing efforts.

But buy-in isn't required. What is required is effort from a large number of people. Those efforts can be major, but they don't have to be; minor efforts count, too. So let's start by defining "crowdsourcing" as "the efforts of a number of people contributing to a significant outcome." One definition taken care of.

Now to define "digital humanities" before limiting the scope of what counts as a "digital humanities project." The most expansive definition I can think of is "using technology to teach us something new." Using digital tools for discovery fits this definition, as does parsing big data to (re)tell stories.

For the purposes of this research, I'm going to define projects as scholarly pursuits, in which the aim of filling a knowledge gap trumps the aim of making a profit. Universities increasingly partner with industry, in part to make up for budgets slashed by states. For me to consider something a digital humanities project here, it has to serve the betterment of humanity more than a business's bottom line. YouTube, Flickr, Facebook, and Twitter could all be seen as using crowdsourcing strategies. If the for-profit world contributes to the body of knowledge about our world, then it may have a place among these other initiatives, but I don't think we can justify using the term "digital humanities" unless the work involves discovery about ourselves as humans.

Two examples that help test these definitions are the Google Books digitization efforts mentioned above and the Library of Congress's archiving of public tweets. The former has created a massive repository and the latter a massive database. Contributions to the former come from university partners, presumably under some kind of contract or memorandum of understanding, while contributions to the latter come from members of the public, knowingly or unknowingly. Finally, the digitization initiative exists to digitally archive works that contribute to human knowledge, while the archived tweets represent moments of human expression.

I think those characteristics qualify these two initiatives as crowdsourcing efforts that can contribute to the field of digital humanities, but I can't quite call them projects yet, because they still lack scholars' interpretation aimed at discovering something new.

825 words later, here is where I begin. If I hope to visualize some models that can inform the crowdsourcing project I am working on, my next step is to start combing the web for projects and finding which characteristics can be grouped together. As a student in a Library and Information Science program, I find classification right up my alley. Bring on the grunt work!

*To be transparent, I have been pursuing this topic as a research inquiry since mid-January 2012.

