
Here's What 44 Million Scratch Scripts Look Like

In preparation for our celebration of the 2 million project mark, and as part of my research on remixing, I have been analyzing the use and reuse of components in the Scratch Online Community, a website where young people from around the world share and remix their own video games and animations. For example, I have looked into which images and programming blocks are most commonly used. Now I wanted to go one step further and find out which programming constructs, or scripts, young Scratch programmers create most often. So here it is: a word-cloud-like representation of the 100 most common scripts.

Looks Matter

By far, the most common scripts involve some kind of looks manipulation, such as hiding or showing a sprite and switching its costume. This is probably because controlling what is displayed on the screen is useful and necessary for most types of projects, from games to animations. Also, these scripts often come in pairs: for every "hide" I would expect a "show".

The most commonly used script (9.16% of the total) is a two-block script that hides a sprite when an event occurs. The event names vary widely, but to give you an idea of the kinds of events we're talking about, the most common events that trigger this script are "Game Over" (2.54%) and "start" (2.44%).


1st place, 9.16%
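Each percentage in these captions is a script's share of all scripts in the dataset. The rank is the script's position when scripts are sorted by frequency. Below is a minimal Python sketch of that computation. It assumes each script has already been reduced to an argument-free string. The function name `rank_scripts` and the toy data are hypothetical and are not the actual analysis code.

```python
from collections import Counter


def rank_scripts(skeletons):
    """Return (rank, skeleton, percent_of_total) tuples, most frequent first.

    `skeletons` is an iterable of argument-free script strings, one entry
    per script occurrence (a hypothetical input format).
    """
    counts = Counter(skeletons)
    total = sum(counts.values())
    ranking = []
    # most_common() sorts by count, highest first; ties keep first-seen order
    for rank, (skeleton, n) in enumerate(counts.most_common(), start=1):
        ranking.append((rank, skeleton, 100.0 * n / total))
    return ranking


# Toy data only, not real Scratch numbers
toy = ["when I receive []\nhide"] * 3 + ["when green flag clicked\nshow"]
for rank, skeleton, pct in rank_scripts(toy):
    print(rank, repr(skeleton), f"{pct:.2f}%")
```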

Below you will see a list of the most common scripts that have something to do with looks, as well as their position in the ranking and percentage of the total, both based on their frequency.

2nd place, 4.84%

4th place, 3.28%

6th place, 1.36%

8th place, 0.95%

9th place, 0.73%

11th place, 0.63%

14th place, 0.53%

15th place, 0.47%

17th place, 0.42%

18th place, 0.40%

21st place, 0.35%

As you can see from the small percentages, the frequency distribution of scripts appears to be a long-tail distribution. This is to be expected given the large number of possible block combinations. One might expect a similar distribution if we looked for the most popular phrases in the English language (probably with an even longer and flatter tail).
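One way to quantify a long tail is to ask what fraction of all script occurrences the k most common scripts cover. In a long-tail distribution, that fraction grows slowly as k increases. Here is a small sketch with made-up counts. `top_k_share` is a hypothetical helper, and the numbers are illustrative only.

```python
from collections import Counter


def top_k_share(counts: Counter, k: int) -> float:
    """Fraction of all script occurrences covered by the k most common scripts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for _, n in counts.most_common(k))
    return covered / total


# Illustrative counts only: three frequent scripts plus 20 that occur once each
toy_counts = Counter({"a": 50, "b": 20, "c": 10})
toy_counts.update({f"rare{i}": 1 for i in range(20)})

for k in (1, 3, 10, len(toy_counts)):
    print(k, f"{top_k_share(toy_counts, k):.0%}")
```

Applied to the full table, the same function would show how much of the 44 million scripts the 100 scripts in the cloud account for.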

Interacting via the Keyboard

It is nice to see that interactivity ranked highly as well. After all, interactivity is one of the features that distinguishes Scratch projects from, say, videos or pictures. Some of the scripts above already involve interactivity. For example, the 11th most common script is probably used in interactive stories that work like slideshows. I say this partly because I have seen it used that way quite often, and partly because the keys that most commonly trigger this script are "space" (36.55%) and "1" (6.66%). Using the space key to interact with projects has become a cultural norm that participants learn in the Scratch Online Community (possibly influenced by Microsoft PowerPoint as well).
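Argument statistics such as the 36.55% for "space" are shares within one script's occurrences, not within all scripts. Below is a sketch of that breakdown. It assumes a hypothetical record layout of `(skeleton, argument)` pairs in which each script has a single argument of interest, such as the key name. The script text used here is made up and is not the actual 11th script.

```python
from collections import Counter


def argument_shares(records, skeleton):
    """Percent of occurrences of `skeleton` that use each argument value.

    `records` is an iterable of (skeleton, argument) pairs, a hypothetical
    layout in which each script has one argument of interest.
    """
    args = Counter(arg for skel, arg in records if skel == skeleton)
    total = sum(args.values())
    if total == 0:
        return {}
    # Ordered from most to least common argument
    return {arg: 100.0 * n / total for arg, n in args.most_common()}


# Made-up records for a made-up key-triggered script
KEY_SCRIPT = "when [] key pressed\nnext costume"
records = (
    [(KEY_SCRIPT, "space")] * 3
    + [(KEY_SCRIPT, "1")]
    + [("when green flag clicked\nshow", None)] * 5  # other scripts are ignored
)
print(argument_shares(records, KEY_SCRIPT))
```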

16th place, 0.46%

While slideshows are interactive, the 16th most popular script shows even more complex interactivity. This script is often used in games to let a player control a character with the keyboard. The two most common sets of arguments are "right arrow" with "direction 90°" and "move 10 steps" (16.21%), and the equivalent "left arrow" with "direction -90°" (16.81%).

Update: I was asked about script #27. Here is what I found.
Despite not being obviously interactive, the 27th most common script represents a form of interactivity, because one of its arguments is a variable that changes when the arrow keys are pressed. As we can see in this project (the very first one to use this script), these blocks are typically used to control the horizontal position of background elements in a scrolling-background game.

27th place, 0.25% (with sample arguments)

Background Sound

I was a bit surprised to find a sound-related script ranked so highly. I guess both animations and games often have some sound playing continuously in the background. Looking more closely, I was even more surprised to find that the most frequently looped sounds are not music files imported into Scratch (e.g. commercial songs) but recordings made within Scratch using the microphone. The most common sound name played with this script is "recording1" (3.82%), followed by "one1" (1.08%).

13th place, 0.57%

Signs of Experimentation

You will notice that some of the scripts in the script cloud are single hat blocks. I debated whether to include them. Technically, I counted them as scripts even though they have no other blocks underneath, and I decided to include them because it is quite telling how often people drag out a hat block and leave it unused. Compared to other languages, Scratch is quite forgiving and lets people do this without any serious repercussions. I would like to think these unused hat blocks represent moments of tinkering and experimentation, something we value a lot in Scratch.

3rd place, 4.73%

5th place, 1.44%

12th place, 0.63%
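To spot lone hat blocks, you only need to check whether a script consists of exactly one block and whether that block is a hat. Here is a minimal sketch. It makes two assumptions about the human-readable format that this post does not confirm: there is one block per line, and every hat block's text starts with "when". `is_lone_hat` is a hypothetical name.

```python
# Assumption: every hat block's text begins with "when"
# (e.g. "when green flag clicked", "when I receive [start]").
HAT_PREFIXES = ("when ",)


def is_lone_hat(script: str) -> bool:
    """True if the script is a single hat block with nothing attached below it.

    Assumes one block per line in the human-readable text (hypothetical format).
    """
    blocks = [line.strip() for line in script.splitlines() if line.strip()]
    return len(blocks) == 1 and blocks[0].startswith(HAT_PREFIXES)


scripts = [
    "when green flag clicked",        # lone hat
    "when I receive [start]\nhide",   # hat with a block underneath
    "hide",                           # single block, but not a hat
]
print([is_lone_hat(s) for s in scripts])
```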

After talking to some people, I decided to leave out the script consisting of a comment block by itself. For several technical reasons, the analysis I ran identified it as a script, and it ranks 10th in the list (0.68%). The use of comment blocks was both surprising and encouraging, partly because the comment block was added recently, so many projects from 2007 and 2008 did not even have the option of using it.

10th place, 0.68%



For the past few years, I have been collecting a massive database with information about the components of each version of every project uploaded to the Scratch website. Among other things, this database holds the human-readable representation of the scripts for each sprite. Since a sprite can have zero or more scripts, I started by extracting each script from every sprite into a new database table. This table has one record per script, containing its human-readable version, the project id, the project's version number and the sprite id, among other fields. As I did this, I also added a column storing a version of the script without any arguments, so that scripts with different arguments but the same block sequence are counted as equal. This new table holds more than 44 million scripts from close to 1.7 million unique projects, out of a total of 1,883,872 projects (so about 90% of projects have scripts). If you are interested in looking at the data, please check out this spreadsheet and let us know if you find any other facts worth mentioning (or any mistakes!).
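The flattening step described above produces one row per script, with the argument-free version stored alongside it. It might look roughly like the sketch below, which rests on some assumptions. The input tuples, the `ScriptRow` fields and the `strip_arguments` parameter are hypothetical stand-ins for the real database schema. The last line checks the post's 90% figure from its own counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScriptRow:
    project_id: int
    version: int      # project version number
    sprite_id: int
    script: str       # human-readable script text
    skeleton: str     # the same script with its arguments removed


def build_script_rows(sprites, strip_arguments):
    """Flatten sprites into one row per script.

    `sprites` yields (project_id, version, sprite_id, scripts) tuples, where
    `scripts` is a possibly empty list of human-readable scripts.
    `strip_arguments` is a caller-supplied function that removes argument values.
    """
    rows = []
    for project_id, version, sprite_id, scripts in sprites:
        for script in scripts:  # a sprite can have zero or more scripts
            rows.append(
                ScriptRow(project_id, version, sprite_id, script, strip_arguments(script))
            )
    return rows


def share_of_projects_with_scripts(rows, total_projects):
    """Fraction of all projects that contribute at least one script row."""
    return len({row.project_id for row in rows}) / total_projects


# Toy input: project 1 has one scripted sprite and one empty sprite; project 2 has none
sprites = [
    (1, 1, 10, ["when green flag clicked\nshow"]),
    (1, 1, 11, []),
    (2, 1, 20, []),
]
rows = build_script_rows(sprites, strip_arguments=lambda s: s)  # identity stand-in
print(len(rows), share_of_projects_with_scripts(rows, total_projects=2))

# The post's own figures: about 1.7 million of 1,883,872 projects
print(f"{1_700_000 / 1_883_872:.1%}")  # prints 90.2%
```

Counting distinct project ids, rather than rows, is what makes the result a share of projects rather than a share of sprites or scripts.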

Equivalent scripts. Arguments are ignored.
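To treat scripts with different arguments as equal, each script is reduced to its block sequence. The post does not say how the argument-free column was produced. The sketch below assumes that arguments appear inside square brackets or parentheses, as in scratchblocks-style text. It empties each outermost slot, including any reporter blocks nested inside it. The name `strip_arguments` is hypothetical.

```python
OPENERS = "[("
CLOSERS = "])"


def strip_arguments(script: str) -> str:
    """Empty every outermost [...] or (...) argument slot in a script.

    Anything nested inside a slot (including reporter blocks) is dropped, so
    scripts that differ only in their arguments map to the same string.
    Assumes brackets are balanced; a stray closer at depth 0 is kept as text.
    """
    out = []
    depth = 0
    for ch in script:
        if ch in OPENERS:
            if depth == 0:
                out.append(ch)  # keep the slot's opening delimiter
            depth += 1
        elif ch in CLOSERS and depth > 0:
            depth -= 1
            if depth == 0:
                out.append(ch)  # keep the slot's closing delimiter
        elif depth == 0:
            out.append(ch)  # ordinary block text outside any slot
    return "".join(out)


a = strip_arguments("when I receive [Game Over]\nhide")
b = strip_arguments("when I receive [start]\nhide")
print(a == b, repr(a))
```

Because nested reporters are dropped along with literal values, `move (10) steps` and `move (x position) steps` collapse to the same skeleton. The post does not say whether the original analysis did the same.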

This analysis was possible thanks to the work of former MIT student Rita Chen and members of the Scratch community including MyRedNeptune and the active Scratch Wiki editors Jonathanpb, Scimonster and BWOG.

[Reposted from the Scratch Team blog]

