Big Data, Big Issues: Thoughts on How to Make Data Analysis More Meaningful

Here are my preliminary welcome remarks, which set out some of the big themes of our livestreamed HASTAC Workshop on May 28, "Can Analysis of Big (and Sometimes Messy) Data Facilitate Collaboration? Methods, Models, Tools, Best Practices, and Next Steps in Open, Multi-Institutional, Interdisciplinary Mentoring and Collaboration Online and Onsite."


“Thank you all for being here today and for participating in a workshop that we hope will pose new questions and propose new methods for data mining, analysis, visualization, assessment, and interpretation. Our premise for this workshop is that interdisciplinary collaboration and mentoring require us to design better metrics that are interoperable across fields. We also need metrics that account for so-called “soft” as well as “hard” skills. Through analysis of the kind of (messy) data central to the humanities and interpretive social sciences, we seek to develop more complete ways of understanding other forms of data key to individuals as well as to relationships within and among complex networks, consortia, collaboratories, institutions, associations, disciplines, interdisciplines, archives, and other formal or informal organizational structures. Complex human and institutional factors are key to successful mentoring and collaboration in all fields, including in STEM research networks and across STEM fields. By developing methods (“altmetrics”) for analyzing data that is ambiguous, incommensurate, asymmetrical, intangible, fragile, or incomplete, we enrich current practices of assessment (expanding credentialing beyond grading or test scores); contribute to more robust peer evaluation (beyond citation counts); and establish methods for determining credibility in open source peer-to-peer publications (outside the system of refereed journal publication). Finally, in light of several new studies of the ways race and gender influence evaluation, we need to understand the metrics of prejudice, even discrimination, that skew the input, output, and interpretation of seemingly objective data.”

--Cathy N. Davidson, Co-Director, PhD Lab at Duke University and Director, Futures Initiative, Graduate Center, CUNY;  HASTAC CoFounder, and PI, “Assessing the Impact of Technology-Aided Participation and Mentoring on Transformative Interdisciplinary Research: A Data-Based Study of the Incentives and Success of an Exemplar Academic Network” (NSF #1243622).


9:30-10:00 Arrival & Coffee

10:00 - 12:00 pm     

Session One, Moderator: Marco Bastos, EAGER Postdoctoral Fellow.

Short papers demonstrating a range of methods relevant to the analysis and visualization of collaborative data, including quantitative methods (network analysis, GIS, statistical modelling) and qualitative approaches to analyzing surveys, interviews, and ethnographies. Presentations are by five early-career scholars, moderated by EAGER Postdoctoral Fellow Marco Bastos (who will also briefly summarize his work on the HASTAC network this year). The presentations will be followed by discussion by four senior scholars and then Q&A open to the onsite and online audience.


10:00 - 10:15 - Operationalizing different scholarly roles in the humanities through author-level metrics

Cornelius Puschmann, Research Associate, Humboldt Institute for Internet and Society / Postdoctoral Fellow at Humboldt University of Berlin (remote participation)

In this talk I discuss altmetrics (Priem, Piwowar, & Hemminger, 2012) and their possible applications to knowledge dissemination in humanities scholarship. Based on a brief literature review, I will comment on a) possibilities of profiling different types of humanities researchers based on author level metrics, b) areas where humanities scholarship is likely to provide different granularity than natural science, and c) implications for notions of impact in the humanities.


10:15 - 10:30 - Alternative metrics to measure scholarly performance and their use across disciplines

Katrin Weller, Postdoctoral Researcher, Leibniz-Institute for the Social Sciences (remote participation)

The “altmetrics” community is currently discussing new resources that can help to measure the impact of scholarly publications – in addition to classical approaches like the number of scholarly citations. For example, altmetrics based on social media data are expected to reflect a broader public’s perception of science and to provide timely reactions to new scientific findings. In this talk I will sum up recent case studies that have provided first insights into how scholars from different disciplines have been evaluated with different alternative indicators (e.g. physicists or biomedicine scholars on Twitter, philosophy scholars on etc.).


10:30 - 10:45 - Down the Pipes: On Data Analysis for Digital Humanists

Will Shaw, Digital Humanities Technology Consultant, Duke University

Managing, transforming, and analyzing digital humanities data can be difficult, especially for new practitioners. In this brief talk, I argue that aspiring digital humanists can quickly learn the fundamentals of (and best practices related to) data analysis and management not by understanding specific tools but by internalizing basic parts of the "Unix philosophy" -- i.e., the principles of design and functionality that inform the Unix operating system and its descendants.


10:45 - 11:00 - Coauthorship and Email Networks as Proxies for Collaboration

Angela Zoss, Data Visualization Coordinator, Duke University

Collaboration in scholarly communities can be very difficult to quantify, but scholars in fields like Informetrics and Scientometrics often use proxies like coauthorship and interpersonal communication to attempt to analyze patterns in various disciplines and to correlate activities to desirable outcomes. In this presentation I will focus on the methods and results of two research projects that use publication metadata and email listserv logs to identify and characterize trends in academic collaboration. These projects include various techniques in network analysis, computer-mediated discourse analysis, and data visualization.


11:00 - 11:15 - The HASTAC Network, Modeled and Visualized

Marco Bastos, NSF EAGER Postdoctoral Fellow, HASTAC

In this talk I present the preliminary results of the first year of the EAGER project "Assessing the Impact of Technology-Aided Participation and Mentoring on Transformative Interdisciplinary Research: A Data-Based Study of the Incentives and Success of an Exemplar Academic Network." We show a number of visualizations of the HASTAC Network and report early results from the HASTAC Scholars survey carried out earlier this year.


11:15 - 11:30 - Discussion: Talking Points

Cathy N. Davidson, HASTAC, Duke University and the Graduate Center, CUNY

Kevin Franklin, University of Illinois at Urbana-Champaign

Richard Marciano, University of North Carolina

Alex Yahja, University of Illinois at Urbana-Champaign

Lynne Steuerle Schofield, Psychometrics and Statistics, Swarthmore College

Special Guest: Alan Blatecky, RTI International Visiting Fellow and former director of the Office of Cyberinfrastructure at the National Science Foundation


11:30 - 12:00   Onsite and online Q and A

12:15 - 01:30 pm    Lunch (provided)

01:30 - 04:00 pm       

Session Two: Roundtable session with the invited guests dedicated to discussing and brainstorming future collaborative, cross-institutional grants on onsite and online interdisciplinary collaboration. Onsite participants are welcome to attend. We will not be webcasting this session.

Brainstorming and roundtable discussion, invited participants:   

Cathy N. Davidson, HASTAC, Duke, and Graduate Center CUNY Futures Initiative

Kevin Franklin, University of Illinois

Matthew Gold (remote participant), Graduate Center CUNY

Richard Marciano, University of North Carolina

Alex Yahja, University of Illinois at Urbana-Champaign

Lynn Moore, Mozilla Foundation

Lynne Steuerle Schofield, Psychometrics and Statistics, Swarthmore College



These are the kinds of issues--the kind of data--to which we must attend even before we start thinking about the results.

