The Human Face of Crowdsourcing: A Citizen-led Crowdsourcing Case Study

In October 2013, Dr. Richard Marciano presented on CI-BER's behalf at the IEEE BigData 2013 Workshop on Big Humanities. We've linked to the paper and slides below, and have pasted the abstract here:

Abstract—The Cyber-Infrastructure for Billions of Electronic Records (CI-BER) project is a collaborative big data management project based on the integration of heterogeneous datasets and multi-source historical and digital collections, including a place-based citizen-led crowdsourcing case study of the Southside neighborhood in Asheville, North Carolina. The project is funded by the National Science Foundation (NSF) and the National Archives and Records Administration (NARA). A test-bed collection containing nearly 100 million files and 50TB of data was developed, with content representing electronic Federal Government records from 150 federal agencies. The CI-BER project advances the state of the art in generalizable, extensible, and highly scalable data management architectures, potentially enabling robust technical preservation of, and access to, electronic records and digital data in the context of emerging national-scale cyber-environments. A first-generation open source collaborative mapping environment prototype is currently being developed to support novel “citizen-led crowdsourcing” possibilities for archival material.


Keywords: Crowdsourcing, citizen-sourcing, Southside, Asheville, Housing Authority of the City of Asheville, Asheville NC, urban renewal, African-Americans


The paper Grant, Marciano, Ndiaye, Shawgo and Heard, The Human Face of Crowdsourcing: A Citizen-led Crowdsourcing Case Study (2013) is fascinating, and leads to a question from the MarineLives project team about mapping functionality.

We are working with archival material from the mid-C17th, some three hundred years earlier than the Asheville, North Carolina data mapped by citizens. We don't have terabytes of data, but we do have a lot of transcribed textual data, together with accompanying digital images.

We wish to explore our data more thoroughly using mapping tools, and specifically are interested in citizen-led collaborative mapping, using volunteers in many different locations and indeed different time zones.

We are interested in the approach described in the paper of georeferencing a contemporary map and adding it to the Big Board, with further map layers then added to provide contextual data. A georeferenced map of 1746 London already exists, developed by the Locating London's Past project, and preloaded data sets can be tailored and mapped onto it. However, it lacks the Big Board functionality you describe, which appears to allow citizen-mappers to create new polygons to which editable web pages can be appended. This functionality is much closer to the path we have been pioneering for collaborative transcription and annotation.

The MarineLives team would be very interested in testing this software and associated processes, and in developing a further use case for the CI-BER project. The specific use case we have in mind is to map the dock, wharf and stair infrastructure of London and the Thames banks from the 1650s (as detailed in Admiralty Court records), to layer it with other contemporary qualitative and quantitative data, and to link the mapped infrastructure to the many depositions of mariners and land-based tradespeople working in and around the 1650s London docks. Click here to read examples of dock-related data which appear in Admiralty Court transcriptions.
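As a sketch of the kind of data structure such a linked map layer might use, the fragment below models a citizen-drawn wharf polygon as a GeoJSON Feature whose properties point to an editable annotation page and to transcribed depositions. The property names, URL, coordinates, and deposition references are all illustrative assumptions, not part of CI-BER's Big Board, Locating London's Past, or the MarineLives data model:

```python
import json

# Hypothetical example: a citizen-drawn polygon for a 1650s Thames wharf,
# stored as a GeoJSON Feature. The "properties" block carries the links
# back to an editable annotation page and to deposition transcriptions.
wharf_feature = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        # Illustrative (lon, lat) coordinates near the Thames; the ring
        # is closed by repeating the first vertex at the end.
        "coordinates": [[
            [-0.0754, 51.5048],
            [-0.0748, 51.5050],
            [-0.0745, 51.5046],
            [-0.0752, 51.5044],
            [-0.0754, 51.5048],
        ]],
    },
    "properties": {
        "name": "Example Wharf",  # hypothetical name
        "period": "1650s",
        # Hypothetical URL of the editable web page appended to the polygon
        "annotation_page": "http://example.org/wiki/Example_Wharf",
        # Illustrative Admiralty Court deposition references
        "deposition_ids": ["HCA 13/71 f.123", "HCA 13/72 f.45"],
    },
}

# Serialise for storage or for loading into a web mapping layer
print(json.dumps(wharf_feature, indent=2))
```

Because GeoJSON is plain JSON, a feature like this could be drawn, stored, and re-edited by volunteers in any standard web mapping client, with the `annotation_page` property doing the work of the editable pages described above.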

A fascinating project. What volume of material are you looking to transcribe? We would love to know more about your volunteer recruitment strategy and its outcomes.




You asked about our approach to collaboration. The key thing to understand is that I am a former management consultant and a former R&D Strategy Director at GlaxoWellcome/GSK, and that I have borrowed project management and organisational techniques from these heavily matrixed, knowledge-intensive environments and applied them to virtual collaboration.

Indeed, the first collaborative project I ever ran supported by collaborative technology was in 1994, when, as a principal at Booz.Allen in the Engineering and Manufacturing Practice, I ran a project to look at the possibility of creating a new global business in hydrogen. We had associates on the project in Finland, San Francisco, Beijing, Hong Kong, Jakarta, Bangkok and London, and used Bulletin Board technology for file sharing, process updates, and project management.

MarineLives collaboration model

At MarineLives we have invented and successfully tested a very simple approach to collaborative transcription (which we now plan to test for different types of annotation and linkage). We train volunteer facilitators (think McKinsey Engagement Manager), and each volunteer team facilitator has a team of three to five volunteer associate transcribers with whom they work closely. We used a wiki-based Project Manual to train and support all facilitators and associates, with frequent use by the facilitators of 1:1 and small-group Skype video and voice calls with their associates to maximise practical support and to build the sense of involvement of all team members. The collaborative software we used was Scripto, from the Roy Rosenzweig Center at George Mason University, tailored to our purposes by a volunteer, Giovanni Colavizza (now a Research Assistant at the École Polytechnique Fédérale de Lausanne).

We came up with the model in September 2012 and tested it in a Proof of Concept, which ran from September to December 2012, with a smaller group of people continuing in a second tranche from January to March 2013. The process model was designed from the start to minimise dropout from the programme. Devices we used (and still use) to do so include: (1) being absolutely clear upfront about our time expectations from volunteers (ca. 1.5-2 hours per week); (2) having a clear, short, defined period for the project, announced in advance, so that individuals know their time commitment (12 weeks); (3) high levels of training and support; (4) face-to-face kickoff meetings for groups and individuals where possible (in a number of cases at the English National Archives in Kew). On our POC in 2012 we had five facilitators and circa twenty-four transcriber associates, covering an age range of 17 to 68 and drawn from a range of backgrounds.