A Snapshot of the HASTAC Network

I'm very excited to take part in the EAGER grant and work with the HASTAC data. David Sparks has done some impressive stuff in the last year and I hope I can keep up the good work. The HASTAC team has been incredibly helpful and it's a pleasure to be a part of this community. For my first post I wanted to take a look at the network of blog posts and comments. We graphed blog posts that have comments and labeled users according to their role in the HASTAC community, so nodes are tagged as DML Winner, Scholars, Staff, Steering Committee, Superusers, Users, etc. Overall we can say HASTAC users are pretty busy writing posts and providing feedback.

In the past seven years there were a total of 1,836 non-unique bloggers and 7,239 commenters in the 7,805 commented posts. This is the subset of commented posts, and the total number of posts is actually much larger at 10,526. This means that a blog post that received at least one comment gets about four comments on average (that’s some serious feedback). This graph shows the directed network of posts and comments, with nodes sized by outdegree (the number of comments a user posted on blog posts) and edges colored according to the node that posted the comment. Node colors represent groups of users with high levels of interaction. CPM clustering indicates that the network is divided into three main groups.
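As a quick sanity check on the "four comments per post" figure, here is a minimal Python sketch. It assumes the 7,239 figure counts comments and the 1,836 figure counts commented posts, which is our reading of the numbers above. The edge list and user names are made up for illustration; they are not HASTAC data.

```python
from collections import Counter

# Figures quoted in the post (assumption: 7,239 comments on 1,836 commented posts).
comments = 7239
commented_posts = 1836
avg_comments_per_post = comments / commented_posts
print(round(avg_comments_per_post, 2))

# Hypothetical toy edge list: one (commenter, post_author) pair per comment.
edges = [("ana", "bob"), ("ana", "cat"), ("bob", "cat"), ("cat", "cat"), ("ana", "bob")]

# Out-degree = number of comments a user posted; the graph sizes nodes by this value.
out_degree = Counter(commenter for commenter, _ in edges)
print(out_degree.most_common())
```

The average comes out just under four, which matches the rounded figure in the text.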

Two of these groups are very clearly defined in the graph. The third group is harder to visualize because it includes a number of broker nodes located in the center right of the plot. The actual location of nodes is not very important in this plot. We used the Force Atlas layout, so hubs tend to be pushed towards the periphery. However, the proximity between nodes does indicate a higher number of comments from user B to the blog post of user A. Round edges are self-edges or self-loops (i.e. an edge linking a node with itself), which represent users that posted comments on their own blog posts.
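To make the edge semantics concrete, here is a small sketch of how repeated comments and self-loops could be counted from a list of comment records. The records and user names are hypothetical, and this is not necessarily how the graph above was built.

```python
from collections import Counter

# Hypothetical comment records: (commenter, post_author), one pair per comment.
comment_log = [("b", "a"), ("b", "a"), ("b", "a"), ("c", "a"), ("a", "a"), ("c", "c")]

# Edge weight = number of comments from one user on another user's posts.
# A heavier edge is what pulls two nodes closer together in a force-directed layout.
edge_weights = Counter(comment_log)

# Self-loops: users who commented on their own blog posts.
self_loops = {pair: n for pair, n in edge_weights.items() if pair[0] == pair[1]}

print(edge_weights[("b", "a")])
print(self_loops)
```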

The nearly two thousand posts that included comments were authored by 531 unique users, and the seven thousand comments were posted by 1,401 different users. The 7,239 comments and the 7,805 users give a rough feeling for the size of the community. The 1,836 blog authors wrote a total of 1,845 posts, which is another measure of the network diversity. Our preliminary exploration suggests that the degree distribution of the network follows a power law, with a large group of new users commenting on the posts of a smaller, established group of highly active users that produce the content. The ratio is roughly one blog author for every three blog commenters.
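The author-to-commenter ratio and a first look at the degree distribution can be sketched as follows. The helper `degree_distribution` and the toy edges are hypothetical. Counting how many nodes share each degree is only a first step, usually followed by a log-log plot; it is not a statistical test for a power law.

```python
from collections import Counter

# Unique users quoted in the post.
unique_authors = 531
unique_commenters = 1401
commenters_per_author = unique_commenters / unique_authors  # roughly 2.6, i.e. "about three"

def degree_distribution(edges):
    """Return {degree: number of nodes with that total (in + out) degree}."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return Counter(degree.values())

# Hypothetical toy edges: one hub receiving comments from several users.
toy_edges = [("u1", "hub"), ("u2", "hub"), ("u3", "hub"), ("u4", "hub"), ("u1", "u2")]
distribution = degree_distribution(toy_edges)
print(sorted(distribution.items()))
```

In a heavy-tailed network most nodes land in the low-degree bins and only a handful of hubs land in the high-degree ones.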

It's difficult to grasp just how much diversity there is across blog posts and comments. We compared the length and word frequency of posts and comments and found that comments include on average 1,427 characters, while posts are more than four times this length (6,358 characters on average). The most common words in both blog posts and comments are (not surprisingly) "digital media," "learning," "media," "students," and "work." Highly frequent words in comments also include "people," "think," and "way," while highly frequent words in posts not found in comments are "university" and "research."
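A comparison like this can be sketched with nothing more than a tokenizer and a counter. The sample texts and the `word_counts` helper below are hypothetical, and the regular expression is a deliberately crude tokenizer; the original analysis may well have used different tooling.

```python
import re
from collections import Counter

def word_counts(texts):
    """Count lowercase word tokens across a list of texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

# Hypothetical sample texts standing in for real posts and comments.
posts = ["Digital media research at the university", "Students and learning research"]
comments = ["I think people learn this way", "Students like digital media"]

post_words = word_counts(posts)
comment_words = word_counts(comments)

# Words that appear in posts but never in comments.
only_in_posts = set(post_words) - set(comment_words)

# Length ratio from the averages quoted in the post.
length_ratio = 6358 / 1427
print(sorted(only_in_posts), round(length_ratio, 1))
```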

The network also shows how HASTAC users are grouped around interests and ideas that can roughly be split in two major periods. The data we analyzed include seven years of direct connections defined as comments to blog posts. Although HASTAC was founded in 2002, the website was redesigned to accept comments to blog posts only in 2006, so the graph includes posts and comments from August 2006 to August 2013. The following chart shows the distribution of blog posts and comments over the seven-year period. The density line on the histograms also seems to suggest a linear relationship between posts and comments.

In fact, 2006 is the first milestone of the network. Blog posts start to be consistently commented on, and in 2008, when the HASTAC Scholars Program took off, the network goes through a considerable boom, with nodes being attached to the network at a much faster rate. The second period starts in 2009, when the HASTAC Scholars Program has matured. The network becomes much denser, and a few users bridge most of the network. By 2010 the average node-to-node distance is just over 3.6 hops. That means a user can reach any other user in the network in fewer than four connections on average. This video shows the chronological evolution of the network.
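For readers curious how an average node-to-node distance is obtained, here is a minimal sketch using breadth-first search. It assumes the graph is treated as undirected and averages only over pairs that can reach each other. The function name and toy graph are hypothetical, and the 3.6 figure above was not necessarily computed this way.

```python
from collections import deque

def average_path_length(adjacency):
    """Mean shortest-path length over all reachable ordered pairs of distinct nodes."""
    total, pairs = 0, 0
    for source in adjacency:
        # Breadth-first search gives hop counts from `source` to every reachable node.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0

# Hypothetical toy graph: a simple chain a - b - c - d.
toy = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(average_path_length(toy))
```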

Overall the graph shows strong characteristics of a small-world network, with clearly defined groups and subgroups. These are groups of users that form a subnetwork in which almost any two nodes are connected to each other. Although such groups are a defining property of a highly clustered network, we found that the average clustering coefficient (the mean of the individual node coefficients) is not particularly high at 0.084. This is particularly interesting and we will continue to explore the clustering of the HASTAC network in future posts.
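The average clustering coefficient mentioned above can be sketched as follows, assuming an undirected graph without edge weights. For each node it takes the fraction of pairs of neighbors that are themselves connected, then averages over all nodes, counting nodes with fewer than two neighbors as zero. The function name and toy graph are hypothetical.

```python
def average_clustering(adjacency):
    """Mean local clustering coefficient of an undirected graph."""
    coefficients = []
    for node, neighbors in adjacency.items():
        nbrs = neighbors - {node}  # ignore self-loops
        k = len(nbrs)
        if k < 2:
            coefficients.append(0.0)
            continue
        # Count each link between two neighbors once (u < v avoids double counting).
        links = sum(1 for u in nbrs for v in adjacency[u] if v in nbrs and u < v)
        coefficients.append(2 * links / (k * (k - 1)))
    return sum(coefficients) / len(coefficients)

# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(average_clustering(toy))
```

Because low-degree nodes contribute zeros to the mean, a network with many peripheral users can have a low average even when its core groups are tightly knit.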


This material is based upon work supported by the National Science Foundation under Grant Number 1243622. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.




Very nice work, especially the video, which very clearly shows the punctuated equilibrium in the growth of the network. Were there any obvious substantive labels you could attach to the two or three major clusters in the graph?


The clusters seem to gravitate towards the main structures of HASTAC: the HASTAC core staff, the HASTAC Scholars, and the DML competition participants. The first two are highly clustered and the third is loosely coupled through a bunch of broker users. There's also a significant number of nodes not attached to the larger cliques, and that's something I'd like to take a closer look at.