The buzz about analyzing "big data" means someone has to do the analysis. But who? I've started to see job listings for "data scientist" pop up, a search for someone with coding skills who can sift through mountains of large data sets and visualize cool patterns (like our own David Sparks).
HASTAC's CIBER and EAGER projects already have people on board doing collaborative data work, but I was curious how "the market" might talk about data scientists, since that's where the buzz is largely coming from. A Harvard Business Review article describes data scientists this way:
"...Data scientists make discoveries while swimming in data. It’s their preferred method of navigating the world around them. At ease in the digital realm, they are able to bring structure to large quantities of formless data and make analysis possible. They identify rich data sources, join them with other, potentially incomplete data sources, and clean the resulting set."
"The dominant trait among data scientists is an intense curiosity—a desire to go beneath the surface of a problem, find the questions at its heart. Often they are creative in displaying information visually and making the patterns they find clear and compelling. They communicate in language that all their stakeholders understand—and demonstrate the special skills involved in storytelling with data, whether verbally, visually, or—ideally—both."
The article goes on, mentioning the "associative skills" that make good scientists great: the ability to make creative connections between seemingly disparate things. Scanning further, the article claims data scientists "need close relationships," must have empathy and strong social skills, and will want to be "on the bridge," a reference to Star Trek's Captain Kirk and his reliance on Spock for good data analysis. Data scientists want to be in the "thick of a developing situation," "build things," and "make an impact."
This is no ordinary quant we're talking about.
Where would a data scientist go to learn these skills? Once a demand for job talent shows up, courses and programs usually follow. Several universities have launched data science degrees, like the Master of Science in Analytics programs at Northwestern University and North Carolina State University. (There, you can expect the school supply list to say: "Your laptop needs to withstand heavy daily use, both physically and computationally.")
Most of us won't need a 10-month course in data science to work on collaborative projects like CIBER and EAGER, but as Richard Marciano mentioned in his Socializing Big Data talk at Duke University a few weeks ago, "Collaboration is essentially a translation and semantics issue. It can take a year, a year and a half before you develop relationships of trust, and understand other people’s languages."
For that reason, the free, open online Introduction to Data Science class offered by the School of Information Studies at Syracuse University looks like just the right portion size for someone who wants to grok the semantics.
I looked at the course description, and there is the usual emphasis on "big data" as the next new thing, but the class will also focus on the full data lifecycle, beyond simple analytics. The course starts at the end of February, runs for four weeks, and is limited to 500 students. The course description reads:
"As the world's data grow exponentially, organizations across all sectors, including government and not-for-profit, need to understand, manage and use big, complex data sets—known as Big Data...the iSchool’s distinct perspective approaches data science with a view of the full data life cycle, going beyond what most discuss as data analytics.
"Using Dr. Jeffrey Stanton’s eBook, An Introduction to Data Science as a guide, participants will ramp up on the most popular open source data science tool, the R open source statistical analysis and visualization system."
Being involved in collaborative data means more than just mashing together a bunch of people with different backgrounds. It means learning a new discourse and shoring up street cred in other disciplines so collaborators can speak across languages, discover new ideas, and build things. If you're interested in working in collaborative data, this looks like a good (i.e., free and online) place to start digging deeper into the world of data science.
Flickr image courtesy of LaurenManning