Big Data & Collaboration Conference MAY 28, 2014

Wednesday, May 28, 2014 - 10:00am to 4:00pm

The National Science Foundation awarded HASTAC an EAGER grant to allow for extensive data mining of HASTAC data. The website's database comprises over 200 MB of SQL tables containing individual and institutional information about scholars. HASTAC is also an academic social network site, and its data allows for various forms of visualization as well as text, spatial, and content analysis. We are now completing the first year of the grant and would like to share the preliminary results of the project "Assessing the Impact of Technology-Aided Participation and Mentoring on Transformative Interdisciplinary Research: A Data-Based Study of the Incentives and Success of an Exemplar Academic Network."

Together with researchers from the U.S. and abroad, we are holding a workshop to discuss the use of computational analysis, data extraction, and social network analysis to investigate the interplay between scholarly communication and academic networks. This workshop is sponsored by HASTAC, the NSF EAGER grant team, and the Duke University PhD Lab on Digital Knowledge. We invite researchers who are interested in the impact of scholarly networks on cross-disciplinary, multi-institutional research, and in discussing the analysis of big (and sometimes messy) data in academic, collaborative settings. If you are interested, save the date and make sure to register on Eventbrite.

Big (and messy) Data & Collaboration Workshop & Conference

A Workshop Sponsored by HASTAC, the NSF EAGER Grant team, and the Duke University PhD Lab on Digital Knowledge

MAY 28, 2014
10:00am - 4:00pm
PhD Lab, Garage, C107
Duke University

Interact and engage online


  • Scholarly social networks: Models and Methods

  • Altmetrics, bibliometrics, and psychometrics

  • Data analysis and visualization methods and practices

  • Online collaboration

  • Online mentoring

  • Quantitative and qualitative data

  • Using data to guide practices, using best practices to guide data analysis

  • Analyzing/visualizing site data across the different parts of online networks (public, private, and "hidden" groups)

  • Orchestrating open source online data analysis and research across institutions

  • Early career mentoring and field building via online networks

  • Data-based peer-assessment systems (e.g., badging) to facilitate early-career, online, and interdisciplinary mentoring

  • Using badges as a way to identify, recognize, and credential (in machine-readable data form) "soft skills" currently resistant to "big data" analysis

  • Charting the relationships between altmetrics, social network analysis, psychometrics, and qualitative and quantitative methods for gathering, assessing, understanding, and visualizing data to improve learning, teaching, and research methods

  • Public policy and data analysis: for adults, for corporations, and for youth and learning



10:00 am - 12:00 pm             Session One (morning session livestreamed and archived). Moderator: Marco Bastos, HASTAC

12:15 - 1:30 pm                 Lunch

1:30 - 4:00 pm                  Session Two (afternoon session archived as an internal resource). Moderator: Cathy N. Davidson, HASTAC


Session One (10:00 am - 12:00 pm): Short papers demonstrating a range of methods relevant to the analysis and visualization of collaborative data, including quantitative methods (network analysis, GIS, statistical modeling) and qualitative approaches to analyzing surveys, interviews, and ethnographies. Presentations are by five early-career scholars, moderated by EAGER Postdoctoral Fellow Marco Bastos (who will also give a brief summary of his work on the HASTAC network this year). The presentations will be followed by responses from four senior scholars and then a Q&A open to the onsite and online audience.


10:00 - 10:15 - Operationalizing different scholarly roles in the humanities through author-level metrics

Cornelius Puschmann, Research Associate, Humboldt Institute for Internet and Society / Postdoctoral fellow at Humboldt University of Berlin (remote participation)
In this talk I discuss altmetrics (Priem, Piwowar, & Hemminger, 2012) and their possible applications to knowledge dissemination in humanities scholarship. Based on a brief literature review, I will comment on a) the possibility of profiling different types of humanities researchers based on author-level metrics, b) areas where humanities scholarship is likely to provide a different granularity than the natural sciences, and c) implications for notions of impact in the humanities.
10:15 - 10:30 - Alternative metrics to measure scholarly performance and their use across disciplines
Katrin Weller, Postdoctoral researcher, Leibniz-Institute for the Social Sciences (remote participation)
The "altmetrics" community is currently discussing new resources that can help measure the impact of scholarly publications, in addition to classical approaches such as counting scholarly citations. For example, altmetrics based on social media data are expected to reflect a broader public's perception of science and to provide timely reactions to new scientific findings. In this talk I will sum up recent case studies that have provided first insights into how scholars from different disciplines have been evaluated with different alternative indicators (e.g., physicists or biomedicine scholars on Twitter, philosophy scholars on Academia.edu, etc.).
10:30 - 10:45 - Down the Pipes: On Data Analysis for Digital Humanists
Will Shaw, Digital Humanities Technology Consultant, Duke University
Managing, transforming, and analyzing digital humanities data can be difficult, especially for new practitioners. In this brief talk, I argue that aspiring digital humanists can quickly learn the fundamentals of (and best practices related to) data analysis and management not by understanding specific tools but by internalizing basic parts of the "Unix philosophy": the principles of design and functionality that inform the Unix operating system and its descendants.
10:45 - 11:00 - Coauthorship and email networks as proxies for collaboration
Angela Zoss, Data Visualization Coordinator, Duke University
Collaboration in scholarly communities can be very difficult to quantify, but scholars in fields like informetrics and scientometrics often use proxies such as coauthorship and interpersonal communication to analyze patterns in various disciplines and to correlate activities with desirable outcomes. In this presentation I will focus on the methods and results of two research projects that use publication metadata and email listserv logs to identify and characterize trends in academic collaboration. These projects draw on various techniques in network analysis, computer-mediated discourse analysis, and data visualization.
11:00 - 11:15 - The HASTAC Network, Modeled and Visualized
Marco Bastos, NSF EAGER Postdoctoral Fellow, HASTAC
In this talk I present the preliminary results of the first year of the EAGER project "Assessing the Impact of Technology-Aided Participation and Mentoring on Transformative Interdisciplinary Research: A Data-Based Study of the Incentives and Success of an Exemplar Academic Network." We show a number of visualizations of the HASTAC Network and report early results from the HASTAC Scholars survey carried out earlier this year.
11:15 - 11:30 - Responses
Cathy N. Davidson, HASTAC
Kevin Franklin, University of Illinois
Richard Marciano, University of North Carolina
Alex Yahja, University of Illinois at Urbana-Champaign
Lynne Steuerle Schofield, Psychometrics and Statistics, Swarthmore College
Special Guest: Alan Blatecky, RTI International Visiting Fellow and former director of the Office of Cyberinfrastructure at the National Science Foundation
11:30 - 12:00 - Onsite and online Q&A


12:15 - 1:30 pm Lunch


Session Two (1:30 - 4:00 pm): Roundtable session with the invited guests, dedicated to discussing and brainstorming future collaborative, cross-institutional grants on onsite and online interdisciplinary collaboration. Onsite participants are welcome to attend. We will not be webcasting this session.

The focus is on mentoring, using data to develop best practices and developing best practices for data analysis, and using virtual social networks for early-career mentoring in STEM research fields and beyond. Data-based, formative assessment (in particular, badging) is also a key part of the discussion, bringing together data analysis methods with early-career mentoring, creating professional collaborative research pathways, and fostering online mentoring via social networks.

  1. Next step for EAGER: designing a collaborative grant across institutions focusing on online, interdisciplinary, and early-career mentoring. How can data help us create more interdisciplinary collaboration? How can an interdisciplinary and multi-institutional team help model ideal practices and methods for the collection and analysis of big data, including for youth? Identifying, integrating, and analyzing massive data sources, and interacting and communicating with learners, all in compliance with site-specific Institutional Review Board regulations, pose challenges for future STEM, STEAM, social science, and digital humanities research. How can an interdisciplinary and multi-institutional team model solutions?
  2. How can social networks and mentoring help support junior scholars? How can we use big (and messy) data to facilitate collaboration and to help us design better connection, interaction, and collaboration across and beyond the STEM community? How can we find the best ways of assessing collaboration, from research productivity to the array of "soft skills" that are invaluable to contemporary online communication and the contemporary workplace but have, until now, not been amenable to current metrics?
  3. Additional: HASTAC's invitation to be involved in the National Data Service, an initiative spearheaded by the National Center for Supercomputing Applications (June 12-13, Boulder, CO): "The National Data Service is an emerging vision of how scientists and researchers across all disciplines can find, reuse, and publish data. It is an international federation of data providers, data aggregators, community-specific federations, publishers, and cyberinfrastructure providers. It builds on the data archiving and sharing efforts under way within specific communities and links them together with a common set of tools."

Brainstorming and roundtable discussion, invited participants:   

Cathy N. Davidson, HASTAC and Graduate Center CUNY Futures Initiative

Kevin Franklin, University of Illinois

Matthew Gold, Graduate Center CUNY (remote participation)

Richard Marciano, University of North Carolina

Alex Yahja, University of Illinois at Urbana-Champaign

Lynn Moore, Mozilla Foundation

Lynne Steuerle Schofield, Psychometrics and Statistics, Swarthmore College


Fiona Barnett, Director, HASTAC Scholars

Jade Davis, HASTAC, HASTAC/MacArthur Foundation Digital Media and Learning Competition, and University of North Carolina

Sheryl Grant, Duke and University of North Carolina, HASTAC/MacArthur Foundation Digital Media and Learning Competition


Graduate Fellows Joining from Graduate Center CUNY:  

Danica Savonick, Graduate Center, CUNY

Karl Westerling, Graduate Center, CUNY

Lisa Tagliaferri, Graduate Center, CUNY


Complete list of participants

EAGER Grant: "Assessing the Impact of Technology-Aided Participation and Mentoring on Transformative Interdisciplinary Research: A Data-Based Study of the Incentives and Success of an Exemplar Academic Network"


Leadership team: Cathy N. Davidson (Duke University and the Graduate Center, CUNY)

Marco Toledo Bastos (HASTAC EAGER NSF Postdoctoral Fellow)

Facilitation: Kaysi Holman, Interim Program Coordinator, HASTAC, and Liaison to the PhD Lab in Digital Knowledge, Duke University

Demos Orphanides, Webmaster & Online Community Strategist, HASTAC and the HASTAC/MacArthur Foundation Digital Media and Learning Competition



Boyd D and Crawford K. (2012) Critical Questions for Big Data. Information, Communication & Society 15: 662-679.

Data and Society Research Institute. (2014) The Social, Cultural & Ethical Dimensions of “Big Data”. New York: Data and Society Research Institute.

Gray J, Szalay AS, Thakar AR, et al. (2002) Online scientific data curation, publication, and archiving. Astronomical Telescopes and Instrumentation. International Society for Optics and Photonics, 103-107.

Haustein S, Bowman TD, Holmberg K, et al. (2014) Astrophysicists on Twitter: An in-depth analysis of tweeting and scientific publication behavior. Aslib Journal of Information Management 66: 4-4.

Haustein S, Peters I, Sugimoto CR, et al. (2013) Tweeting biomedicine: An analysis of tweets and citations in the biomedical literature. Journal of the Association for Information Science and Technology.

Koh A. (2012) More Hack, Less Yack?: Modularity, Theory and Habitus in the Digital Humanities. Weblog.

Lynch C. (2008) Big data: How do your data grow? Nature 455: 28-29.

Marcus G and Davis E. (2014) Eight (No, Nine!) Problems with Big Data. The New York Times. New York.

McPherson T. (2012) Why Are the Digital Humanities So White? or Thinking the Histories of Race and Computation. Debates in the Digital Humanities: 139-160.

Peters MA, Besley T and Araya D. (2013) The new development paradigm: education, knowledge economy and digital futures, London: Routledge.

Priem J, Piwowar HA and Hemminger BM. (2012) Altmetrics in the wild: Using social media to explore scholarly impact. arXiv preprint arXiv:1203.4745.

Ramsay S. (2012) Programming with Humanists: Reflections on Raising an Army of Hacker-Scholars in the Digital Humanities. In: Hirsch BD (ed) Digital Humanities Pedagogy: Practices, Principles, and Politics. Open Book Publishers.

Seelig T. (2011) Divergent Thinking. Stanford Technology Ventures Program.

Shirky C. (2005) Institutions vs. Collaboration. TED.

Sparks D and Bastos MT. (2012-14) EAGER Collection Blog Posts. EAGER. Durham: HASTAC.

Stevens H. (2014) Why Big Data Requires the Social Sciences. SmartData Collective.

Thelwall M, Haustein S, Larivière V, et al. (2013) Do altmetrics work? Twitter and ten other social web services. PLoS ONE 8: e64841.

Thelwall M and Kousha K. (2013) Academia.edu: Social Network or Academic Network? Journal of the Association for Information Science and Technology.

Tufekci Z. (2014) Big Questions for Social Media Big Data: Representativeness, Validity and Other Methodological Pitfalls. 8th International AAAI Conference on Weblogs and Social Media (ICWSM14).

White House Review of Big Data and Privacy. (2014) Big Data: Seizing Opportunities, Preserving Values. Washington, DC: White House.
