
Technology: Juxta Commons

One tool that I have started to use for early research is Juxta Commons, which other HASTAC scholars have referenced but rarely discussed on its own. Juxta Commons is a web tool, although a software version is also available for download.


Simply put, Juxta Commons lets you upload text files from your computer or from a link and assign these sources to groups containing as many sources as you like. Then, with fairly minimal input from the user, the site compares the texts and marks the differences. Fortunately, a single minor change (such as an extra period) does not cause the rest of a source to register as different, even though every word after it is shifted by a character or two. You can specify which kinds of differences you want highlighted (e.g., capitalization, punctuation, or only word differences) and then simply scan the highlighted words to see where the documents diverge. Clicking a highlighted section shows you the corresponding reading in the other sources in the group. The site also reports, as a percentage, how much each source differs from the "main" source.
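To make the idea of word-level comparison a little more concrete, here is a minimal Python sketch using the standard library's difflib. This is not how Juxta Commons is implemented. The function names tokenize and compare, the normalization options, and the "percent difference" formula (one minus difflib's similarity ratio) are my own assumptions, chosen only to illustrate why comparing words rather than raw characters keeps a stray period from throwing off the rest of the text.

```python
import difflib
import re


def tokenize(text, ignore_case=True, ignore_punctuation=True):
    """Split text into word tokens, optionally ignoring case and punctuation.

    Hypothetical helper for illustration; not part of Juxta Commons.
    """
    if ignore_case:
        text = text.lower()
    if ignore_punctuation:
        # Drop anything that is neither a word character nor whitespace.
        text = re.sub(r"[^\w\s]", "", text)
    return text.split()


def compare(base, witness, **options):
    """Compare a witness against a base text, word by word.

    Returns (percent_difference, changes), where each change is a tuple of
    (kind, base_words, witness_words). The percentage is an assumed metric:
    100 * (1 - difflib similarity ratio), rounded to one decimal place.
    """
    a = tokenize(base, **options)
    b = tokenize(witness, **options)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for kind, i1, i2, j1, j2 in matcher.get_opcodes():
        if kind != "equal":
            # kind is "replace", "delete", or "insert".
            changes.append((kind, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    percent = round((1 - matcher.ratio()) * 100, 1)
    return percent, changes


if __name__ == "__main__":
    base = "He switched on the flashlight and stepped inside."
    witness = "He switched on the torchlight, and stepped inside."
    print(compare(base, witness))
```

With the default options, the example prints (12.5, [('replace', 'flashlight', 'torchlight')]): the extra comma is ignored and only the substituted word is flagged, while every other word still lines up. Passing ignore_punctuation=False would report "torchlight," instead, which is roughly what toggling punctuation highlighting is meant to control.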


This has been useful for me in comparing online databases of the same stories to see what kinds of differences arise depending on which publication was transcribed. For instance, the word "flashlight" might appear as "torchlight" in a different version. Because I am interested in exploring how "archaic" a particular author's work purportedly is, it helps to see which sources use modernized, Americanized, or otherwise altered words and phrases, so that I can identify the version closest to the author's original text. More generally, comparing texts this way could help catch errors in HTML transcriptions before they are used for stylometric readings.


The only pitfall with this program is that if the main source document lacks a large section present in the other sources, you will not be able to see that an entire section is missing. For instance, if I have a short story from three sources and an entire paragraph is absent from the main source, I will have no way to know that unless I change which document serves as the main source. Thus, it is worth trying different documents as the main source and staying aware of this potential oversight.


Overall, Juxta Commons is a useful tool, whether for preliminary analyses of a corpus or for closer examination of how different publications of the same text vary.

