Blog Post

Introducing SocialMediaMineR: a social media tool for R

Back in 2013 I analysed the differences between readership in social and legacy media and published the results in this paper. The research required me to write wrappers for the NY Times and the Guardian APIs, and together with Cornelius Puschmann I released the GuardianR package, which retrieves content from the Guardian Content API. It also required me to retrieve the number of hits each news article gathered across a number of social networking sites. I wrote a quick and dirty piece of code that met the requirements of the project and moved on.

But last May I attended the EAGER Conference on Big Data at the Duke University PhD Lab on Digital Knowledge and talked to Kevin Franklin about my work at HASTAC. He asked me whether I had checked how HASTAC.org content had fared on social networking sites over time, which brought me back to that very same piece of code. After some time debugging the original script, I ended up with code that runs a global search on various social media APIs and returns the number of hits for each URL on each social network.

I tested the code on over 15,000 HASTAC.org articles (thanks Demos!) and retrieved the number of hits per post and per tag/topic. During the original research on the differences between readership in social and legacy media I also queried over 15,000 news articles, and so far the code has proved robust enough to cope gracefully with all of this. Keep in mind that the process is time-consuming: querying a large database can take several hours. The package is called SocialMediaMineR and is now on CRAN. This is the package description:

SocialMediaMineR is a social media search and analytic tool that takes one or multiple URL(s) and returns the information about the popularity and reach of the URL(s) on social media. The function get_socialmedia retrieves the number of shares, likes, tweets, pins, and hits on Facebook, Twitter, Pinterest, StumbleUpon, LinkedIn, and Reddit. The package also includes dedicated functions for each social networking site and a function to decode shortened URLs.

The package was released under a GPL (>= 2) license and you can simply install it via install.packages(). If R is part of your workflow, you might want to install this package and use it whenever you need to retrieve the number of social media hits for a website or article available online. The core function of the package is get_socialmedia(), but you can use the dedicated functions to query specific social networking sites.
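Since a large query can run for hours, it may be worth sending the URLs in batches so that one failed request does not throw away everything retrieved so far. The following is only a minimal sketch of that idea: query_in_batches() and fake_fetch() are hypothetical helpers, not part of SocialMediaMineR, and the example.org URLs are placeholders. In real use you would presumably pass get_socialmedia as the fetch argument, assuming it returns one data frame per call.

```r
# Hypothetical helper (not part of SocialMediaMineR): query URLs in batches
# and skip any batch whose request fails instead of aborting the whole run.
query_in_batches <- function(urls, fetch, batch_size = 100) {
  # Assign each URL to a batch number: 1, 1, ..., 2, 2, ...
  batch_id <- ceiling(seq_along(urls) / batch_size)
  batches <- split(urls, batch_id)

  results <- lapply(batches, function(batch) {
    tryCatch(
      fetch(batch),
      error = function(e) {
        warning("Batch failed and was skipped: ", conditionMessage(e))
        NULL
      }
    )
  })

  # rbind() on data frames drops the NULL entries left by failed batches
  do.call(rbind, unname(results))
}

# Stand-in for a real query function, so the sketch runs offline.
fake_fetch <- function(urls) {
  data.frame(url = urls, hits = nchar(urls), stringsAsFactors = FALSE)
}

# Placeholder URLs for illustration only
urls <- sprintf("http://example.org/post/%d", 1:250)
hits <- query_in_batches(urls, fake_fetch, batch_size = 100)
nrow(hits)
```

A smaller batch size loses less work when a request fails, at the cost of more calls.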

 

> # install and load package
> install.packages("SocialMediaMineR")
trying URL 'http://cran.fhcrc.org/src/contrib/SocialMediaMineR_0.1.tar.gz'
Content type 'application/x-gzip' length 11170 bytes (10 Kb)
opened URL
==================================================
downloaded 10 Kb
* installing *source* package "SocialMediaMineR" ...
** package "SocialMediaMineR" successfully unpacked and MD5 sums checked
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (SocialMediaMineR)
The downloaded source packages are in
        "/tmp/Rtmpe9Qe0J/downloaded_packages"
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
> require(SocialMediaMineR)
Loading required package: SocialMediaMineR
> # get list of URLs
> links <- c(
> # retrieve number of hits on social media
> get_socialmedia(links) -> results
[1] "16.67 %"
[1] "33.33 %"
[1] "50 %"
[1] "66.67 %"
[1] "83.33 %"
[1] "100 %"
[1] "Query execution time: 0.21 minutes"
> str(results)
'data.frame':   12 obs. of  15 variables:
 $ fbk_shares    : num  0 170329 259171 192649 3322 ...
 $ fbk_likes     : num  0 80602 112619 86338 839 ...
 $ fbk_comments  : num  0 38455 88165 56882 419 ...
 $ fbk_total     : num  0 289386 459955 335869 4580 ...
 $ fbk_clicks    : num  0 0 0 5353 0 ...
 $ twt_tweets    : num  21564208 7346786 9319312 2345416 193790 ...
 $ rdt_score     : num  0 1 1 1 0 1 981 NA NA NA ...
 $ rdt_downs     : num  0 0 0 0 0 0 0 NA NA NA ...
 $ rdt_ups       : num  0 1 1 1 0 1 981 NA NA NA ...
 $ rdt_comments  : num  2 0 0 0 0 0 134 NA NA NA ...
 $ lkn_shares    : num  0 0 530 6049 11878 ...
 $ stu_views     : num  254963 71591 22614 285214 44616 ...
 $ pin_counts    : num  745385 40808 4860 12646 5163 ...
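Once the data frame is back, ranking the URLs by overall reach takes a few lines of base R. The sketch below rebuilds a small mock data frame from the first five values of three of the columns printed above, so it runs without the package. The url labels are placeholders, since the queried links are not shown, and adding up counts from different networks is only a crude proxy for popularity.

```r
# Mock data: first five values of three columns from the str() output above.
# The `url` labels are placeholders, not the links that were actually queried.
results <- data.frame(
  url        = paste0("url_", 1:5),
  fbk_total  = c(0, 289386, 459955, 335869, 4580),
  twt_tweets = c(21564208, 7346786, 9319312, 2345416, 193790),
  pin_counts = c(745385, 40808, 4860, 12646, 5163),
  stringsAsFactors = FALSE
)

# Sum the per-network counts into one figure per URL, treating NA as zero
count_cols <- c("fbk_total", "twt_tweets", "pin_counts")
results$all_hits <- rowSums(results[, count_cols], na.rm = TRUE)

# Rank URLs from most to least shared
ranked <- results[order(results$all_hits, decreasing = TRUE), ]
print(ranked[, c("url", "all_hits")])
```

The na.rm = TRUE argument matters with real output, because some columns (the Reddit ones above, for instance) contain NA for URLs with no record.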

 

2 comments

Nicely done, Marco! Slick implementation, great documentation, and definitely something that will get used.

You might find some use for that up there in Boston. ;)
