Measuring Edwin Drood: Experiments with Literary Data, Gephi

The Mystery of Edwin Drood (1870) is really one of the great mysteries of all time, because Charles Dickens died about halfway through writing it, leaving no notes as to how it was going to end. Since then, readers have created thousands of different adaptations, completions, and solutions, a body of work known as "Droodiana." Droodiana spans fiction, literary criticism, poetry, and drama, and even includes a few mock trials and a modern choose-your-own-ending musical.

Droodiana has traditionally been more the purview of fans than of serious scholars, but I'd argue that it gives us another way to answer one of the fundamental questions of literary studies: how can we recover what contemporary readers thought about their novels and other reading materials? Edwin Drood is a unique case study in the history of how people read. We know more about how different readers over the years responded to this novel than to almost any other single text. But how do we get at this reader response? Digital humanities methods can give us more quantitative insight into it. Rather than analyzing the top ten adaptations, I can look at hundreds or, ultimately, thousands of these texts at once.

I've written about the methodology here, but briefly: I've read about a hundred texts written between 1870 and 1914 (an arbitrary range to limit the possibilities at this early stage), and I coded each text's position on 24 questions that critics have been interested in over the years. Some of these are evaluative questions: would it have been a good novel? Would it have emphasized plot or character? Others are plot questions: did the missing Edwin Drood really die? Who is Datchery, the mysterious detective character? Which characters were going to get married?
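To make that coding scheme concrete, here is a minimal sketch of how one coded text could be stored and tallied by decade. This is an illustration, not my actual spreadsheet: the record layout, the question keys (edwin_fate, datchery), and the four placeholder texts are all invented for the example.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class CodedText:
    """One text, coded against the questions (hypothetical layout)."""
    title: str
    year: int
    genre: str  # "fiction" or "non-fiction"
    # question key -> answer; a missing key means the text takes no position
    answers: dict = field(default_factory=dict)


# Placeholder records invented for illustration, not real data.
texts = [
    CodedText("Text A", 1871, "fiction", {"edwin_fate": "lives"}),
    CodedText("Text B", 1878, "non-fiction", {"edwin_fate": "dies"}),
    CodedText("Text C", 1905, "non-fiction",
              {"edwin_fate": "dies", "datchery": "Bazzard"}),
    CodedText("Text D", 1912, "fiction", {}),
]


def tally_by_decade(texts, question):
    """Count how many texts in each decade give each answer to one question."""
    counts = defaultdict(Counter)
    for text in texts:
        decade = (text.year // 10) * 10
        answer = text.answers.get(question, "no position")
        counts[decade][answer] += 1
    return {decade: dict(c) for decade, c in sorted(counts.items())}


result = tally_by_decade(texts, "edwin_fate")
print(result)
```

A per-decade tally like this is the kind of table that can be pasted into Excel and charted as one line per answer.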

My results are still pretty preliminary, but I found some interesting things when I started playing around with the data. These initial graphs were made in Excel.

So, we’re never going to know the ending. It’s obvious, but it took people a long time to accept it. In the 1870s, right around the time of Dickens’s death, it was pretty widely believed that the ending of the novel was predictable (the blue line), but there was an equally strong belief that it should not be predicted, in honor of Dickens. That’s the green line, and you can see that it became less and less popular over time. (Everyone loves predicting the ending.)

What’s more interesting is the red line, showing the belief that the ending was completely unpredictable. We see that this was initially an unpopular belief. But as time went on and people couldn’t agree on an ending, gradually people started to believe that it was actually unpredictable.

Now, let’s look at the big question—did Edwin Drood live or die? What I found here was a radical difference in genre.

Non-fiction texts are far more likely to logically reason out that Edwin Drood must be dead. But fictional texts (novels, plays, stories) are far more likely to include a moment when Edwin dramatically reveals himself. You wouldn’t think genre would impact argument this radically. This hints at completely different value systems at work in literary criticism and fiction.

Now, let’s consider differences over time. Earlier writers were divided pretty evenly between arguments that Drood lived and that he died, while in the later decades and into the 20th century, the idea that he died became more popular:

This is particularly interesting to me as a literary scholar because there’s a critical story we tell that modernist writers wrote more open, ambiguous texts than Victorians. But these findings suggest that Victorian readers as a mass were actually very comfortable with ambiguous texts, hinting that we might need to reconsider some of our preconceptions of literary history.

This next visualization compares seven fictional or dramatic versions of Edwin Drood using Gephi and Gaze:

Thicker lines and shorter distances connect texts that make more choices in common. So, for example, two texts joined by a thicker edge might both have Edwin surviving. The larger circles represent texts that make more definite decisions about different points.
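For anyone who wants to try something similar, here is a rough sketch of how edge weights and node sizes like these could be computed before loading them into Gephi. It assumes the simplest possible rule (edge weight is the number of questions on which two texts give the same answer, node size is the number of questions a text answers at all), which may not match the exact weighting behind my chart. The text names, question keys, and answers are invented placeholders.

```python
import csv
import io
from itertools import combinations

# Placeholder codings invented for illustration: text -> {question: answer}.
codings = {
    "Text A": {"edwin_fate": "lives", "datchery": "Bazzard", "rosa_marries": "Tartar"},
    "Text B": {"edwin_fate": "lives", "datchery": "Bazzard"},
    "Text C": {"edwin_fate": "dies"},
}


def shared_choices(a, b):
    """Number of questions both texts answer, and answer the same way."""
    return sum(1 for q in a.keys() & b.keys() if a[q] == b[q])


def build_edges(codings):
    """One weighted edge per pair of texts with at least one shared choice."""
    edges = []
    for (name_a, a), (name_b, b) in combinations(codings.items(), 2):
        weight = shared_choices(a, b)
        if weight > 0:
            edges.append((name_a, name_b, weight))
    return edges


def node_sizes(codings):
    """Node size = how many questions the text takes a definite position on."""
    return {name: len(answers) for name, answers in codings.items()}


def edges_to_csv(edges):
    """Edge list as CSV text with Source, Target, Weight columns."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(edges)
    return buf.getvalue()


edges = build_edges(codings)
sizes = node_sizes(codings)
print(edges_to_csv(edges))
```

The CSV uses Source, Target, and Weight columns because that is the edge-list layout Gephi's spreadsheet import expects. A layout algorithm then pulls heavily connected texts closer together.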

Now, if I asked you what the most unconventional adaptation of Drood is, looking at this chart, you’d probably say the 1871 play by Walter Stephens.

But if you read the criticism of Drood, you’re far more likely to pick the Spirit Pen version. The “Spirit Pen” version was allegedly written by Dickens, after his death, through spiritual communication with another writer. The joke critics make is that dying had a very negative effect on Dickens’s grammar. So that’s a weird version, but we see it’s actually central in this chart, making fairly similar story choices as the other texts.

Another outlier version is The Cloven Foot, which was an American comic novel. It imitated and spoofed each chapter, but when Dickens died, the author wrapped it up really quickly, accidentally writing the first solution. It’s often dismissed as being particularly bad and crude, and yet this method allows us to see that the story choices it made were actually pretty widely imitated.

This next visualization is even more preliminary than the rest of this project. I used Don Richard Cox’s Annotated Bibliography to get some results for just a few of my questions just for fictional texts, all the way up to 1998. I used Voyant to test word frequency on these results. Bigger words mean these words appear more frequently in this data:
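Voyant does the counting for you, but the underlying idea is simple enough to sketch. The example below is an approximation under my own assumptions, not Voyant's actual pipeline: it lowercases the text, splits on letters, drops a tiny hand-picked stopword list, and counts what is left. The three summary strings are invented placeholders, not entries from the bibliography.

```python
import re
from collections import Counter

# A tiny stopword list chosen for this example only.
STOPWORDS = {"the", "a", "and", "is", "by", "of", "to", "in", "his"}


def word_frequencies(entries):
    """Count non-stopword tokens across a list of short text summaries."""
    counts = Counter()
    for entry in entries:
        words = re.findall(r"[a-z]+", entry.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts


# Placeholder summaries invented for illustration.
entries = [
    "Edwin lives and Jasper is hanged",
    "Edwin dies; Datchery is Bazzard",
    "Edwin lives; Datchery is Tartar",
]

freq = word_frequencies(entries)
for word, count in freq.most_common(3):
    print(word, count)
```

In a word cloud, each word's font size is scaled from counts like these, so a word that appears in most entries dominates the image.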

A few things stand out here. You’ll see a ring of large character names: Crisparkle, Neville, Bazzard, Edwin, Tartar. These are the top candidates for the mysterious detective figure Datchery, and for marriage with one of the novel’s two principal female characters. They’re pretty similar to the top choices before 1914.

The smaller words are mostly about what ultimately happens to Edwin's opium-using uncle John Jasper in these versions, and these really show how resilient the violence associated with his character has been, with words like hanged, leap, stab, throw.

The dates represent years when there were more adaptations. So you see 1914 is a big year, as is 1951. It’s an interesting question—why does this story revive at different periods in our history?

But what’s most striking is the giant words in the middle, which refer to Edwin living or dying. By the 1990s, you’d think we’d have come to some conclusion, but lives and dies are equal in size, and the debate rages on.

A study of Droodiana will never tell us anything about the real ending Dickens intended. But it will give us a window into literary history, provide insight into how readership changed over time, and even let us trace how works influenced and changed each other. I’m excited to keep exploring these topics as my research continues.

This is very much a work in progress, so let me know if you've got any questions or suggestions!


