Blog Post

Socializing Big Data: Collaborative Opportunities in Computer Science, the Social Sciences, and the Humanities

Today at the Franklin Humanities Institute at Duke University, Richard Marciano talked about Socializing Big Data: Collaborative Opportunities in Computer Science, the Social Sciences, and the Humanities. Richard is a professor in the School of Information and Library Science at the University of North Carolina at Chapel Hill, Director of the Sustainable Archives and Leveraging Technologies (SALT) lab, and co-director of the Digital Innovation Lab (DIL).

The following highlights the points in Richard's presentation that, for many of us, represent the best of what academia offers -- complex, collaborative, and innovative research that builds, applies, investigates, and distills knowledge across a diverse social landscape.

"Socializing big data" represents one of the most challenging and intriguing sociotechnical questions of the 21st century: We have the data. We have a lot of it. Now what? 

Richard started his talk describing the issues that keep him up at night: "It is not just the messiness of all this data, but the notion that big data can create big collaborations, which invites key questions: How can people get along and bring diverse points to the table? Big collaborations also lead to bigger ideas, so how can we guide research directions and develop innovative approaches that benefit from that kind of diversity?"

To illustrate big data and big collaborations, Richard highlighted the “Records in the Cloud” project led by the University of British Columbia iSchool in collaboration with the University of Washington iSchool and Mid Sweden University's Information Technology and Media program. The project examines what it means to delegate to cloud providers the responsibility for security, accessibility, disposition, and preservation. To quote Richard, “This is the nature of a lot of these projects -- which is to say, it is cross-disciplinary in nature, and digs deep and broad to make sure there are viewpoints and representation that go far beyond the technological aspects.”

Going beyond the technological aspects of big data matters now more than ever. 

The White House announced that “Big Data is a Big Deal” in March of 2012, a headline with teeth, evidently, since they backed it up with $200 million in funding across six federal departments and agencies. We need to be smart about what we do with this opportunity. Writer Ed Dumbill of Forbes magazine shared his own take in an article titled “Big Data, Big Hype, Big Deal,” an intelligent forecast of big data’s potential for “[S]ensing, algorithmic discovery and gaining deeper insight through data. Essentially, the emergence of a global digital nervous system.”

The question, then, is what does it mean to gain deeper insight through data? As Alistair Croll wrote, “Big data is our generation’s civil rights issue, and we don’t know it.” Generating deeper insights requires having diverse viewpoints at the table. As an example, Richard referred to the iRODS Primer: Integrated Rule-Oriented Data System Synthesis that he co-authored. His question to us seems technical, but it is fundamentally ethical:

  • How can you specify rules of engagement and rules of authenticity? How can you instrument content management systems with new forms of automation?
  • How could you instruct these systems with rules of ethics, or rules of social behavior?
  • How can you customize the behavior of the system so it will be more user-friendly and smart and adapt as a single system that functions for everyone?

No one is doing this yet, says Richard. What would a system like this mean? “Access and linkage to your big data collections would be governed by principles, and then the collection would try to enforce those things. It’s a return to the old days of artificial intelligence when we had expert systems, not just collections of content. A set of rules, triggers, policies would customize the entire behavior. Dealing with big data brings us back to this kind of space, this kind of thinking.”
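To make the idea of a rule-governed collection concrete, here is a minimal toy sketch in Python. It is loosely inspired by the iRODS model of rules, triggers, and policies, but it is not iRODS code -- every class and name below is a hypothetical illustration, not part of any real API.

```python
# Toy sketch of a rule-governed data collection, loosely inspired by
# iRODS-style policy enforcement. All names here are hypothetical.

class Rule:
    """A trigger event paired with a condition and an action (policy)."""
    def __init__(self, event, condition, action):
        self.event = event          # e.g. "access", "ingest"
        self.condition = condition  # predicate over (user, item)
        self.action = action        # runs when the rule fires

class Collection:
    """A collection that enforces its own governing rules on every operation."""
    def __init__(self, rules):
        self.rules = rules
        self.items = {}

    def ingest(self, name, record):
        self._fire("ingest", user=None, item=record)
        self.items[name] = record

    def access(self, user, name):
        record = self.items[name]
        self._fire("access", user=user, item=record)
        return record

    def _fire(self, event, user, item):
        # Evaluate every rule registered for this event.
        for rule in self.rules:
            if rule.event == event and rule.condition(user, item):
                rule.action(user, item)

# Example policy: restricted records may only be read by their stewards.
def deny(user, item):
    raise PermissionError(f"{user} may not access {item['id']}")

rules = [Rule("access",
              lambda user, item: item.get("restricted")
                                 and user not in item.get("stewards", []),
              deny)]

archive = Collection(rules)
archive.ingest("census-1940",
               {"id": "census-1940", "restricted": True,
                "stewards": ["archivist"]})
print(archive.access("archivist", "census-1940")["id"])  # permitted
```

The point of the sketch is the inversion Richard describes: the policy lives with the collection itself, so every access is mediated by declared principles rather than by ad hoc application code.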

When we start imagining intelligent systems that are governed by principles and ethics, Richard commented, “You really need different viewpoints at the table. You can’t afford these projects that solely deal with the role of cyberinfrastructure and that punt on these other topics.”

That brings us to another question raised during the talk: How does the advent of big data change the way we do social science, and what role will social scientists play? There are so many research issues that haven’t even been framed or formed yet. Understanding how to collaborate is just as important as understanding how to deal with big data. We are in the early stages of learning what it means to collaborate across big data projects.

On the CI-BER project that Richard and HASTAC are working on, the research involves not just big teams of scholarly collaborators, but big teams of neighborhood groups, public libraries, the chamber of commerce, county organizations, regional planning councils, and other stakeholders. “These projects are larger than life. When you bring them together, it supersedes the capability of any one individual to do them, so you have to really rethink the entire research process. We cannot simply automate, we cannot just rely on technology.”

Yes, to be technological, we are talking about “researching the cyberinfrastructure implications of supporting large scale content-based indexing of highly heterogeneous digital collections potentially embodying non-uniform or sparse metadata architectures.” But we are also talking about the nuts and bolts of how people work together. 

When asked by a member of the audience how he manages big collaboration, Richard responded, “Collaboration is essentially a translation and semantics issue. For some, it might make sense to hire a technical broker. Or it might make sense to bring on humanists and philosophers who can help bring people together, to help them position people and ideas. My experience is that this is very humbling and it takes a lot of time. It can take a year, a year and a half before relationships of trust are developed, and people begin to understand other people’s language.”

*** *** ***

Richard leads development of "big data" projects funded by Mellon, NSF, NARA, NHPRC, IMLS, DHS, NIEHS, and UNC. Recent 2012 grants include a JISC Digging into Data award with UC Berkeley and the U. of Liverpool, called "Integrating Data Mining and Data Management Technologies for Scholarly Inquiry"; a Mellon / UNC award called "Carolina Digital Humanities Initiative," which involves translating big data challenges into curricular opportunities; and an NSF award for CI-BER, a collaborative heterogeneous big data integration project between Duke, University of North Carolina-Chapel Hill, and HASTAC. 

He holds a B.S. in Avionics and Electrical Engineering, and an M.S. and Ph.D. in Computer Science, and has worked as a postdoc in Computational Geography. He conducted interdisciplinary research at the San Diego Supercomputer Center at UC San Diego, working with teams of scholars in the sciences, social sciences, and humanities.

Join HASTAC's Collaborative Data group to follow Richard's work with CI-BER and HASTAC's EAGER project, or to share your own data collaborations. 

(Image of a starling murmuration is courtesy of




The biggest obstacle has been, and remains, that many people lack an intuitive literacy of data as it exists in the digital medium. Design can help, but it will not solve the fundamental problem of giving people a real intuitive understanding of how to move from data -> information -> knowledge -> wisdom. Part of the problem here is gaining at least a surface understanding of the technology involved, and what possible outcomes can come from questioning the data. Another part of the problem is helping people understand how "good" data is produced, and how to tell the difference between "good" data and data that is questionable or inaccurate. 

And, there are understandings/literacies for how to do this as a participant in a group, too. 


Do you have any thoughts on how we might approach a literacy for understanding big data? Having an intuitive literacy of data would have to assume other literacies, would it not? Richard made the point that there almost needs to be a "translator" who can speak fluently in a variety of languages and across disciplines. But I'm curious if you think there is a base level literacy for "big data" that helps all stakeholders begin to speak the same language.