Data Use at Khan Academy

Over the past couple of months I’ve been looking at big data from a number of different perspectives in my NC State University class, “Big Data and the Rhetoric of Information.” We’ve covered things like the quantified self, data art and visualizations, network analysis, predictive analytics, algorithms, etc. The class is unique in that my classmates and I are fairly “data illiterate.” We come from a variety of majors and, for the most part, have little technical experience. By approaching data from multiple angles, and by having a diverse group of students, our class conversation has been varied and interesting.

Our most recent approach to data has been looking at uses of data in non-profits and socially-minded organizations, which has been a refreshing change of conversation from the “company x using long terms and conditions to take user data and make money off advertising.” I’m not intending to pass any judgement on “company x” – that’s a separate conversation – but I have enjoyed the positive tone of looking at data applications for the public good.

So as an assignment for this course, I looked at Khan Academy and their use of data for the public good. In case you’re not familiar with Khan Academy, it’s an online non-profit organization that delivers thousands of lessons across dozens of subjects and courses. The now-well-established organization came from almost coincidental beginnings, and now boasts an ambitious mission: “to provide a free, world-class education for anyone, anywhere.” The founder, Salman Khan, gave a widely viewed TED talk that summarized his organization and showed how Khan Academy can have a positive impact on education around the world. The non-profit is of course benefitting the public, but going a step further, they are trying to make education a public good. In the US, a quality education is about as close to a truly public good as it is anywhere in the world. That being said, there is always room for improvement, and Khan Academy hopes to deliver a product that can be effective on its own or as a complement to classroom learning.

So what does Khan Academy have to do with data? Well, if you jump to minute 11 of that TED talk, you’ll see that Khan Academy provides users and teachers with extensive amounts of data that can be used to teach more effectively: to know when a concept is mastered, to know when a student needs more help, to know how students can teach one another, etc. Perhaps the best part of his TED talk is where Sal, a former hedge fund analyst, says that he wants “to really arm the teachers with as much data as possible, really data that, in almost any other field, is expected, if you’re in finance or marketing or manufacturing.”

This notion of giving teachers the ability to shape their interactions with students based on data is not the whole principle of Khan Academy in the classroom, or Khan Academy outside of the classroom, but it is foundational to their system of teaching. The organization has structured the data they feed back to users to resemble a game, and they even have a YouTube channel dedicated to helping teachers and coaches use the data at their disposal.

So if allowing students and educators to make more data-driven decisions is foundational to Khan Academy’s product, I assumed that data-driven decisions are also foundational to the organization’s internal proceedings. What data do they collect on their users? Why do they collect it? What do they do with that data? Of course, I started to answer these questions with the organization’s privacy policy.

The privacy policy is very open, and pretty much says that they don’t sell your data or provide access to any organization other than the third parties that help manage their data, e.g. Google Analytics. With regard to how they use the data, they don’t go into too many specifics. They clearly state that user data is in no way sold or used for advertising. They generally say that they analyze user data to improve their “Properties,” which is the term they use to reference all tutorial content. I took away four general areas of improvement:

  1. Contextualization – Targeting culture, location, language, etc.
  2. Quality – Improving the teaching methods of their lessons
  3. Targeting – Recommending content that meets user interests; improving the interface
  4. Feedback – Data for users, teachers, and coaches

Beyond these four general areas, I couldn’t take away anything more from my googling or the privacy policy, so I decided to contact a data scientist at Khan Academy directly, and much to my delight, he responded with useful information.

Here are the takeaways from our brief exchange:

  • Khan Academy can use behavioral data to infer lesson quality. If a high percentage of their users stop watching a video at a particular point, they will know, and can revisit that section of the video to see what might be causing users to stop. Similarly, they can see things such as how usage rates of Khan Academy vary with the number of questions answered correctly or incorrectly. This sort of behavioral feedback can lead to changes in how Khan Academy lessons explain and motivate.
  • With regard to user interface, graphics, and designs, the organization has a framework that allows them to test variations of their platforms and observe resulting changes in metrics such as time on the website, clicks, learning actions, etc.
  • Developing reflective learners is a key goal that goes alongside the development of quality content and quality learning platforms.
  • Knowing who is using Khan Academy, how they are using Khan Academy, and what impact Khan Academy is having is vital to the organization accurately driving towards their mission.
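
To make the first takeaway a little more concrete, here is a minimal sketch of how drop-off points in a video might be spotted. This is my own illustration, not Khan Academy’s code: the function name, the 10-second buckets, the 20% threshold, and the sample data are all assumptions.

```python
from collections import Counter

def dropoff_hotspots(stop_times, video_length, bucket_seconds=10, threshold=0.2):
    """Return (bucket_start, share_of_viewers) pairs for the parts of a video
    where at least `threshold` of all viewers stopped watching.

    stop_times: the second at which each viewer stopped (hypothetical data).
    """
    if not stop_times:
        return []
    counts = Counter()
    for t in stop_times:
        # Viewers who reached the end of the video did not drop off.
        if t >= video_length:
            continue
        bucket_start = int(t // bucket_seconds) * bucket_seconds
        counts[bucket_start] += 1
    total = len(stop_times)
    return sorted(
        (start, count / total)
        for start, count in counts.items()
        if count / total >= threshold
    )

# Made-up example: ten viewers of a 300-second video.
stops = [42, 45, 47, 44, 300, 300, 120, 41, 300, 48]
print(dropoff_hotspots(stops, video_length=300))
```

With this made-up data, six of the ten viewers stop between 40 and 50 seconds, so that stretch of the video would be the one worth revisiting.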

To bring this back together, Khan Academy likely takes in huge amounts of data per user. Just as with a number of other large online platforms, many of their users don’t know what data is being collected about them. The organization admittedly has a clear privacy policy, but I used Khan Academy for years without ever reading it. Furthermore, many of their users are kids (they clearly address privacy and safety measures regarding child users in their privacy policy). Yet despite Khan Academy taking large amounts of data from my use of their site, I am not at all offended. In fact, I would give them more data if I could.

So as we as a society discuss things like how user data should be collected and how user data can be commodified, should we also discuss what user data is being applied to? Why do I feel outrage when I find out that companies are making large profits off personal data, or when I find out that Angry Birds has been spying on my phone, yet I’m happy knowing that Sal Khan is watching me watch a video?

Although I think more of the world needs to have discussions about how we manage the millions of gigabytes of data that are out there, I am always happy when I see data making our world a better place. Khan Academy is one of the best examples of big data helping the public, and I hope they continue their push towards a world where a free, world-class education is truly available for anyone, anywhere.

