A networked HASTAC 2013

My first HASTAC conference was a very interesting one. I had the opportunity to present some of the work I have been blogging about, but more importantly for me, I got the chance to see how people were talking about data. In particular, it was illuminating to attend Session 30, on text mining and linked data.

The perspective of each of the various panelists was somewhat different from my own, as a data analyst and "social scientist." Namely, each paper was concerned with issues of access to data and well-thought-out database structures, and in each case made strong points about the problems with limited access to text and bibliographic information, and the difficulties of fitting the existing RDF model to atypical cases.

What stood out to me, both in Session 30 and throughout the conference, was the focus on specific cases, outliers, and unique observations. Much thought is being put into database structures that accommodate a heterogeneous set of cases, as well as into data visualization that permits close inspection of individual observations.

I would argue that this is different, in important ways, from the perspective with which I approach data: my interest is in understanding through summary measures, and in testing for the existence of relationships between variables (a concern with variance and covariance, respectively). As such, specific cases, and outliers in particular, do not typically merit my attention, and in fact sometimes make analysis difficult. Though I think that statistical inference and estimation, which typically rely on larger samples, are important, I am glad to see such a commitment among the HASTAC community to an approach that privileges the individual case.

 

I want to conclude by sharing a visualization of the #HASTAC2013 hashtag network on Twitter. Specifically, this visualization finds all Twitter accounts that used the #HASTAC2013 hashtag, and plots the Following relations between each of them. The NodeXL extension to Microsoft Excel makes it easy to gather this data, apply the Wakita-Tsurumi clustering algorithm, and lay out the entire graph according to cluster membership.

Not surprisingly, according to PageRank (or eigenvector centrality), the most central accounts are HASTAC, Cathy Davidson, and Fiona Barnett (HASTAC Scholars). These are followed by Bethany Nowviskie, Ryan Hunt, the HASTAC 2013 account, Lee Skallerup, Adeline Koh, Jentery Sayers, and Melonie Fullick. More interesting to me, however, are the various emergent clusters, which I have only just begun to explore. I would be interested in hearing from those of you who have been part of the community for longer than I have, to help in identifying the clusters of users within this Following network.
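For readers curious about what a PageRank ranking of a Following network involves, here is a minimal sketch in plain Python. It is not NodeXL's implementation, and it is not the data behind the graph above: the account names and Following relations are hypothetical, and the damping factor of 0.85 is simply the conventional default, assumed here for illustration.

```python
def pagerank(follows, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank on a directed Following graph.

    `follows` maps each account to the set of accounts it follows.
    An edge u -> v ("u follows v") passes some of u's rank to v.
    """
    # Collect every account that appears as a follower or as a followee.
    nodes = set(follows)
    for targets in follows.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Accounts that follow nobody spread their rank evenly over everyone.
        dangling = sum(rank[u] for u in nodes if not follows.get(u))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n
            for node in nodes
        }
        # Each account splits its rank equally among the accounts it follows.
        for u, targets in follows.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
        change = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical toy network (not the real #HASTAC2013 data).
follows = {
    "hub": {"scholar_a"},
    "scholar_a": {"hub"},
    "scholar_b": {"hub", "scholar_a"},
    "attendee_c": {"hub"},
    "attendee_d": {"hub", "scholar_b"},
}

ranks = pagerank(follows)
top = sorted(ranks, key=ranks.get, reverse=True)
for account in top:
    print(f"{account}: {ranks[account]:.3f}")
```

In this toy network the account that everyone follows comes out on top, which is the same pattern as the HASTAC account leading the real ranking. Accounts that nobody follows end up with only the baseline share of rank.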

 

This material is based upon work supported by the National Science Foundation under Grant Number 1243622. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

 

2 comments

Hi David,

This is an interesting post!  The graph, however, is admittedly difficult to navigate. Is there a way to simplify it or is that the nature of the visualization?

Also, thanks for sharing the NodeXL program with us. I'm looking forward to playing around with it.

Lori Beth 

Thanks! Unfortunately, I find it difficult to simplify network graphs like this, especially if you have an interest in making each node identifiable, as in the large version of the image.

NodeXL actually does allow one to simplify somewhat, though -- as I have done here, it is easy to cluster the graph into boxes based on any arbitrary partition. Also, a recent release of NodeXL introduces "motif simplification," which drastically simplifies complex graphs into a set of glyphs, although at the cost of individual node identifiability (see this video).
