Cross-posted from my blog
Because I both use computational linguistics in my research and run an annual virtual Wikipedia edit-a-thon for Women’s History Month, I greeted the study “It's a Man’s Wikipedia? Assessing Gender Inequality in an Online Encyclopedia” with great enthusiasm.
The most encouraging aspect of this study is that it highlights how relatively small changes in Wikipedia can go a long way in addressing some gender disparities. However, there remain other areas that must be addressed as well. What this study does, and does well, is create a sub-corpus of Wikipedia entries on notable men and women and compare that corpus to three other corpora of notable persons. Based on these comparisons, notable women appear to be relatively well represented in Wikipedia, and content about them is likely to make it to the “first page” of Wikipedia. The limitation here lies in the corpora themselves, but that is always the case in computational linguistics. Perhaps most significantly, this corpus doesn't take into account other variables of identity for notable persons, such as ethnicity or regional location. None of this means that Wikipedia is rid of its woman problem.
FIGURE 1 Table 1 English Gender-Specific Likelihood Ratios, from Wagner et al
FIGURE 2 Figure 6 Lexical Bias from Wagner et al, It's a Man's Wikipedia?
First, the authors identified a gendered network within Wikipedia that isolates knowledge about women. Women’s pages are less likely to be linked to, and the links that do exist tend to lead to other articles about women. In addition, the authors ran what is called a keyness analysis, which compares the words that appear in entries about women to the words that appear in entries about men. The analysis reveals 150 dangerous words that heavily skew the depiction of women towards representations of sex (figure 1) and familial relationship status (figure 2).
These findings suggest some relatively low-barrier contributions to be made to Wikipedia. The good news is that editing content about familial relationships, removing gendered language, or inserting links to pages may be less daunting than authoring a new entry or engaging in contentious debate over extensive editing of entries.
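For readers unfamiliar with keyness, here is a minimal sketch in Python of one common keyness statistic, Dunning's log-likelihood (G2). It is an illustration only: I am not claiming this is the exact measure Wagner et al. used, and the two tiny "corpora" below are invented sentences, not data from the study. The function names are my own.

```python
import math
from collections import Counter


def log_likelihood(count_a, total_a, count_b, total_b):
    """Dunning's log-likelihood (G2) for one word across two corpora."""
    # Expected counts if the word were equally frequent in both corpora.
    combined_rate = (count_a + count_b) / (total_a + total_b)
    expected_a = total_a * combined_rate
    expected_b = total_b * combined_rate
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def keyness(tokens_a, tokens_b):
    """Rank words by signed G2: positive = overused in A, negative = overused in B."""
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    total_a, total_b = len(tokens_a), len(tokens_b)
    scores = {}
    for word in set(freq_a) | set(freq_b):
        g2 = log_likelihood(freq_a[word], total_a, freq_b[word], total_b)
        overused_in_a = freq_a[word] / total_a >= freq_b[word] / total_b
        scores[word] = g2 if overused_in_a else -g2
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented toy sentences standing in for the two sets of entries.
women_tokens = "she married her husband and raised her children".split()
men_tokens = "he played for the team and won the championship".split()

ranked = keyness(women_tokens, men_tokens)
for word, score in ranked:
    print(f"{word:>12}  {score:+.2f}")
```

Words at the top of the list are the ones most characteristic of the first corpus, and words at the bottom are most characteristic of the second. On real corpora you would also want to lowercase, strip punctuation, and set a minimum frequency before trusting any score.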
While this study goes a long way in highlighting nuanced aspects of gender in Wikipedia, some significant issues remain outside its scope, and they point to a continued need to create entries about women that contravene the above findings. For example, some women are notable primarily because of their familial relationships. Recently I edited the entry of Alberta Christine Williams King, Martin Luther King, Jr.’s mother. Without her famous son, it is unlikely that her entry would remain on Wikipedia. In addition, entries on groups that relied on maternalist rhetoric in their activism, as Another Mother for Peace did, should include information on maternal status. This points to an important aspect of computational linguistic analysis: close reading is as important as distant reading.
The role of gender in determining notability remains a problematic issue as well, especially if we are looking at a corpus composed of such entries. Are articles about women more likely, at a statistically significant rate, to be challenged or deleted on the basis of non-notability?
Finally, completely outside the scope of this study are some of the most contentious debates within Wikipedia, such as what category pages about women appear in (categorygate) or the gendered dynamics of the highly contentious editing process (gamergate).
Over at MARCH, I offer some more insights into how to apply these findings in small edits that may make big differences.