
Five Computer Skills for the Aspiring Digital Humanist

I'm a second-year student in the University of Michigan's School of Information master's program (M.S.I.), and I've tailored my own specialization around the digital humanities. I'm interested in using the Web to promote self-motivated learning and the preservation of and access to humanities artifacts (especially literature). My main focus is on digital texts, and this term I'll be working on a master's thesis exploring student digital text use. I am also continuing to develop my online digital text of James Joyce's novel Ulysses. I'm sure I'll be writing about both my thesis and my digital text, but for my first post I'd like to discuss a broader topic: the skills that contribute to a good digital humanist.

As an IMLS Digital Humanities Intern at the Maryland Institute for Technology in the Humanities this summer, I talked to a variety of digital humanists about the technical abilities they felt students intending to work in the field should have. While the digital humanities (DH) encompasses so many possible jobs -- preservation, librarianship, teaching -- I focused on the skills required to develop DH sites and software. Because I am already skilled in some areas of web programming and design, some basic techniques (e.g. CSS) are not included on my list; really, the list is best geared toward people who are already interested in HCI and the web, and want to focus their skillset on the DH world.

The list follows -- please feel free to suggest additional abilities!

Five (of Many) Computer Skills for the Aspiring Digital Humanist
1. Know how to work with databases, specifically SQL. Since a non-web-designer programmer might be doing most of the heavy-duty database work at a DH center, you need to be able to understand and design databases rather than be able to do a lot of coding with them.

2. Work with TEI and XML. Reading the Text Encoding Initiative's TEI-Lite documentation is a great place to get started (http://www.tei-c.org/Guidelines/Customization/Lite/).

3. Learn JavaScript/AJAX, PHP, and other web customization scripts. Know the differences between programming languages as well as libraries like MooTools -- and know which are right for which job.

4. Python is repeatedly mentioned as a great tool for text analysis. If you don't know anything except HTML, Python is a great language with which to begin. Simple, clean, works with Google App Engine, and lets you quickly analyze a piece of text.

5. Content Management Systems: know how they work and how to customize a site built on one. Drupal and Joomla! are both frequently used; Drupal seems to have better documentation. WordPress kind of falls into this category as well.
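
To make item 4 a little more concrete, here is a minimal sketch of the kind of quick text analysis Python allows. The function name word_frequencies and the sample string are my own choices for illustration, and the tokenizing rule (lowercase letters and apostrophes only) is a simplifying assumption, not a recommended standard.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each word appears, ignoring case and punctuation."""
    # Assumption: a "word" is any run of letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# A short sample string; a real project would read a whole file instead.
sample = "Yes I said yes I will Yes."
freqs = word_frequencies(sample)

# The single most frequent word and its count.
top_word, top_count = freqs.most_common(1)[0]
print(top_word, top_count)
```

Swapping the sample string for the contents of a text file is all it takes to run the same count over an entire novel.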

For all these skills, the importance of having links to functioning examples to give potential employers can't be overemphasized.

141

24 comments

I'm curious, how do you see databases being used most often in Humanities work? On first thought, it might not be intuitive that databases have anything to do with Shakespeare, for instance.

84

Lev Manovich (in The Language of New Media) posited that every "text," in the widest sense of the word, is really made up of a database (the content) and an interface (the way you read or otherwise access the content). From that point of view, a requirement of database skills is totally intuitive.

That said, my question about this list is: where are the "humanities" in your digital humanities? Programming's great, but I could see collapsing all of this into two items: programming and database management. You can argue that these skills are necessary (though, in spite of my invocation of Manovich, I'm not sure I agree), but I would say that they're certainly not sufficient to create or define a "digital humanist."

I realize that this is a list of "computer skills," but I'd still like to let the question stand.

93

Point well taken! However, I bet a lot of information scientists would place the raw text of a book into the category of "unstructured information," along with, say, the words on the websites that the Google spider crawls all day. Databases are usually thought of as "structured." That is, in a book, there is no metadata that says the 19th word of the 7th page of the 2nd chapter has some kind of relationship to the 97th word of page 435. In a database, you create such relationships.
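
Here is a rough sketch of what "creating such a relationship" might look like, using Python's built-in sqlite3 module. The table names (word, relation), the column layout, the token "storm", the chapter number of the second word, and the relationship label "echoes" are all invented for illustration.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# One row per word, located by chapter, page, and position on the page.
conn.execute(
    "CREATE TABLE word ("
    "id INTEGER PRIMARY KEY, chapter INTEGER, page INTEGER, "
    "position INTEGER, token TEXT)"
)
# One row per relationship between two words.
conn.execute(
    "CREATE TABLE relation ("
    "from_word INTEGER REFERENCES word(id), "
    "to_word INTEGER REFERENCES word(id), kind TEXT)"
)

# Hypothetical data: the 19th word on page 7 and the 97th word on page 435.
conn.executemany(
    "INSERT INTO word VALUES (?, ?, ?, ?, ?)",
    [(1, 2, 7, 19, "storm"), (2, 18, 435, 97, "storm")],
)
conn.execute("INSERT INTO relation VALUES (1, 2, 'echoes')")

# Ask the database which word positions are related, and how.
rows = conn.execute(
    "SELECT a.page, a.position, b.page, b.position, r.kind "
    "FROM relation r "
    "JOIN word a ON a.id = r.from_word "
    "JOIN word b ON b.id = r.to_word"
).fetchall()
print(rows)
```

The book itself never records that link; the relation table is where a reader or editor adds it.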

137

Databases can be useful when you're dealing with digital texts encoded with TEI (more information on the use of TEI with databases is available on the TEI website: http://www.tei-c.org/About/Archive_new/ETE/Preview/rahtz.xml#body.1_div.4). I haven't worked with databases enough yet to give you a fully informed answer, but I would imagine a database contains relationships based on the tagging of different "types" of words in a text (e.g. adverbs, character names, etc.), and the ability to run queries on the DB aids text analysis. I believe the people at MITH were working with databases in their design of the Shakespeare Quartos Archive, for example (http://mith.info/quartos/about.php).
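
As a guess at how that might work in practice (this is my own sketch, not how MITH or the TEI documentation does it), the snippet below pulls tagged names out of a tiny TEI-style fragment and loads them into a database table that can then be queried. The fragment is made up and omits the TEI namespace and header for brevity; the table name tagged is also hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# A made-up, simplified fragment using TEI-style name tags.
fragment = """
<p>
  <persName>Buck Mulligan</persName> came from the stairhead.
  <persName>Stephen Dedalus</persName> looked toward
  <placeName>Dublin</placeName>;
  <persName>Buck Mulligan</persName> laughed.
</p>
""".strip()

root = ET.fromstring(fragment)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tagged (tag TEXT, content TEXT)")

# Store every tagged name along with the kind of tag it carried.
for elem in root.iter():
    if elem.tag in ("persName", "placeName"):
        conn.execute("INSERT INTO tagged VALUES (?, ?)", (elem.tag, elem.text))

# Query: how often is each person mentioned?
counts = conn.execute(
    "SELECT content, COUNT(*) FROM tagged "
    "WHERE tag = 'persName' "
    "GROUP BY content ORDER BY COUNT(*) DESC, content"
).fetchall()
print(counts)
```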

99

This list was not intended to define a model digital humanist, but to list technical study suggestions for individuals interested in working in the web programming part of the field (thus the title of "five web skills", not just skills in general). But I agree that any one item on the list isn't a "necessary" skill to being a good or standard digital humanist; rather, the list contains abilities that digital humanists have mentioned to me as either using themselves or liking in potential hires.

I come from a humanities background and have a strong understanding of what interests me in the disciplines I'd like to advance with technology, so I tend to focus more on the technical skills I need to develop. My mental image of a digital humanist is of a humanist who is using technology to advance his learning and teaching -- the subject matter as preexisting, the technical skills as added. However, I'm also aware that many come to the digital humanities from different backgrounds, sometimes even the exact opposite of how I expect -- tech savvy people getting interested in the humanities as content.

I expect that many of the "humanities" (not "digital") skills will not extend far beyond those required for traditional scholarship in general (e.g. writing coherent arguments, a spirit of scientific inquiry) -- but maybe I've been in the humanities too long to see this clearly? I'd be interested to see someone write about the "humanities" side of skills, especially from the perspective of someone who didn't always work with the humanities.

91

I'm always badgering humanists to learn more digital skills. But I don't know any humanists who know all five of these.

Come to think of it, I don't know any humans who do either.

112

If you had called it "research and publication skills" senor literaturegeek, perhaps readers would have caught on.

Perhaps not, since you are part of a rigorous evidence-based research trend across academia, particularly at places where good research is valued. (Ohh ouch.)

I attended a workshop called essential programming skills for scientists several years ago that presented Python as a time-saving and easy-to-acquire skillset. The department heads that attended were in general agreement that it was a very good suggestion. Using Python, you can cut twenty percent of your development time compared to C. The slower processing is more than made up for by the time saved. (See Software Carpentry website for free curriculum.)

Year before last, I attended another workshop (on virtual observatories) that featured the sociology data curator from Harvard, who spoke on the necessity of using high-throughput computing for vast datasets of video. He joked a bit about being a humanities guy in a roomful of astronomers and biologists, saying that humanities researchers use some pretty big datasets. His example of the video work was impressive.

We may all be using these tools for searches on a daily basis in the near future. Relying on the human brain's database recognition system may work for people like Superman, but most of us need help with a thousand terabytes.

The key is not tools, but getting the data into a usable form and building usable ontologies between data collections of anything you can think of. Try thinking about an odd match like a period of literature and climate records. That's pretty ordinary come to think of it.

83

I definitely agree with your assessment of the need for technological tools: our datasets have just become too large to work with manually. The Digital Humanities 2009 Conference had a bunch of excellent presentations on large literary corpora that couldn't have been done without computers -- for example, an evaluation of sentence length and wordiness over the course of Henry James' publications pointing to a possible move on the author's part from handwriting to dictation. Cool and useful stuff that we just can't do with index cards and (even mechanical) pencils.
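
I don't know the method those presenters actually used, but a very rough sketch of measuring sentence length might look like the following. The function name mean_sentence_length, the rule for splitting sentences, and both sample passages are my own inventions, not text from Henry James.

```python
import re


def mean_sentence_length(text):
    """Average number of words per sentence in the text."""
    # Assumption: sentences end at runs of '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    word_counts = [len(s.split()) for s in sentences]
    return sum(word_counts) / len(word_counts)


# Two invented passages standing in for an early and a late work.
samples = {
    "early (hypothetical)": "She came in. He stood up. They spoke briefly.",
    "late (hypothetical)": (
        "She came in, as she so often had, with the air of one "
        "who expected nothing. He stood."
    ),
}

lengths = {label: mean_sentence_length(text) for label, text in samples.items()}
for label, value in lengths.items():
    print(label, value)
```

Run over dated texts instead of invented ones, the same loop would give one number per publication to plot against time.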

86

I worked as a web developer / history research assistant for online projects at the Center for History and New Media (http://chnm.gmu.edu) for a few years and have some experience training people in these skills. This conversation about databases, programming, and humanities skills brings up several important issues and I think it's worth re-posting for further comments (I found this post somewhat obscurely since it's no longer on the front page of the site). I would also like to see what other people think are necessary skills for digital humanists. For example, I didn't see anything on this list about, say, learning how to install your own blog, wiki, or creating a youtube channel and uploading videos to it... but maybe those are trivial depending on the field or level of base knowledge assumed. In fact, are the skills on this list really necessary if you want to be a digital humanist?

I think the first thing worth discussing is the extent to which a digital humanist--or, as this list implies, someone who does both humanities and web development--needs to actually "know" all of these skills. I myself am a bit odd, having started in web development and moved into the humanities... but I've watched enough people come from the other side, both at work and at workshops held by CHNM (the latest installation of which is: http://thatcamp.org/). In my opinion, what's important for people to know is not *how* to program in PHP/MySQL, but how and in what ways databases can be and are structured. Every field in the humanities could benefit from knowing this -- and not just because you want to collect datasets but because a lot of web 2.0 is built on the backend of PHP/MySQL.

The second point I have is built on this first point. Web 2.0 has given us many, many tools to do a lot of what we humanities people need to do without actually knowing the backend of it (example: whole websites and communities built off of a single WordPress installation or a single MediaWiki--all somebody has to do is follow the instructions to install it in 5-10 minutes). And if people want to know how a database works, why not install phpMyAdmin for viewing their MySQL databases, or something like Navicat on their desktop, so they can look at data without actually knowing any complicated commands in MySQL? Teaching people how to set up some of this on their own would be time well spent, and it's a lot faster than trying to learn how to program while solving larger issues in the digital humanities.

I do advocate people learning the basics of how things work, but I also think most people's time is better spent on content development and/or using tools available in new ways, or adapting them to larger humanities issues. For example, personally, I'd like to see more efforts by historians in using the web to build communities of scholars and students who share research in innovative and creative ways, across continents and generations, rather than having these historians know how to program a database. As a digital humanist/historian myself, I would much rather spend my time learning how to actually apply these tools to solve a specific overall problem in my field (for example, how do different groups of people share the same "memory" or create different ones for the same subject? how can web 2.0 create networks and how should we set them up?). Knowing how to use the available tools and in what ways I could manipulate them would be more beneficial than trying to learn how to program a custom solution using PHP/MySQL or Python. [stepping down off soapbox now...]

 

101

I definitely agree that, for most DHers at least (though less the web workers this list was intended for), a theoretical knowledge of things like SQL is all that is necessary; DHers should be comfortable talking with the programmers supporting their work.

It sounds like there's a divide in the DH world: people wanting to focus on content but use technology to express that, and people with a love for a specific content area, but a desire to focus on the tools to advance that content. Perhaps I should amend the post to have different lists for each group, though I imagine the list for content-focused folks would need to allow for far more variance in the acceptable skill set.

>are the skills on this list really necessary if you want to be a digital humanist?

I can't think of any one skill I'd argue a digital humanist must have (well, outside assumed abilities of reasoning and written/oral argument), and there are certainly different skills a DHer might want to focus on (3D modeling for example, especially if you're interested in history or archaeology). This list is just an attempt to summarize the skills that someone looking to hire a web designer/programmer for DH work will probably want to see. I'd be happy to hear of any additional skills you think aspiring DH workers should try to learn.

172

> personally, I'd like to see more efforts by historians in using the web to build communities of scholars and students who share research in innovative and creative ways, across continents and generations, rather than having these historians know how to program a database.

Have you seen ThoughtMesh, a project I believe has been mentioned at HASTAC before? It auto-generates tags to relate your essay to others across the Web.

You may also be interested in AcaWiki, a service that offers summaries of scholarly journal articles, which just launched.

141

Thanks for the links. I have seen ThoughtMesh before, but just now the search keeps crashing on me. I don't remember the purpose of ThoughtMesh, but it's most likely I just didn't find it useful in my specific discipline (history).

AcaWiki looks quite promising, though limited in scope. But it does seem very appealing and now I just need to figure out how I could market this to the history (or other) humanities community... For example, this could be very useful for history graduate seminars and the inevitable "reading responses" we are always forced to write.

96

True, so far ThoughtMesh's contributors have been biased toward certain subjects (including anthropology and literature, for some reason). That said, I just learned that the organizers of a bi-annual international history conference are planning to mesh their papers soon.

If you are at UCSC you may want to check out the Art of Collaboration conference from 22-23 October. I'll be presenting there and would be happy to demo some of these tools for sharing creativity and scholarship.

(I assume it was the HASTAC search that crashed on you, as ThoughtMesh's seems to be working fine.)

131

I appreciate how 'literaturegeek' is focusing on web programming but doesn't list CSS as a skill in his set.  Even the skills to build an HTML page from the ground up are suspect as students tinker with WordPress templates and iWeb.  Indeed, in the introductory Internet multimedia class I teach, I begin not with HTML as the "fundamental" unit of exchange on the Web, but rather move one step up the ladder to XML.  Not just the parent of HTML, XML provides the base for many technologies including RSS feeds, KML (Google Earth), and MXML (used by Flex to create Flash applications).  I get excited at the possibilities of students geolocating their HTML in Google Earth, or, say, writing "RSS poetry."

That said, the list of five technologies that 'literaturegeek' provides is a clear match to the technologies that we use at the Vectors Journal, and that many Vectors fellows come away with from our collaborations.  We rely heavily on Content Management Systems (including our own back-end tools), and the project team members, regardless of technical skill, dabble in equal parts of MySQL, HTML, and Javascript. We also recently engaged in text mining and encoding during our recent summer institute.  Substitute Python with PHP and the list of five describes the technologies of our process well.

However, viewed through one lens, all of the above could be summed up as "database literacy," and the technologies to manage and manipulate the database.  As 'mikenutt' mentions, if we define a database as a set of 'structured data', then perhaps the main goal of multimedia education becomes structured data and producing with the networks they create.

'mikenutt' is also great to point out that a book isn't structured; there is typically no cross-referencing.  At Vectors we find that much of the beginning stages of a project focuses on the "structuring up" of the data, through annotation, work using database tools, etc.  It seems that a necessary first step to make any digital humanities project is to place content into a form best suited for digital production: a structured database.

Students can begrudgingly discuss the themes of Shakespeare or Animal Farm, and in the richer schools, dissect a Cubist painting.  However, when students begin to work with data networks, such as using Wikipedia to find new paths for a book report, text messaging their friends during class, or discovering Craigslist as an alternative to dollars-and-cents business transactions, we start to squirm.  But perhaps this is simply an expression of students' unease with the established literacies in the face of new media.

Students walk away from high school and college with an in-depth knowledge of MS-Word and PowerPoint, and sometimes MS-Excel.  Why not add one widely used application to the list: phpMyAdmin?  If Word instills linear editing skills, PowerPoint linear presentation skills, and Excel an intro to flat files, phpMyAdmin would provide the necessary literacy for relational data and by association an introduction to database design and management.  We might even begin to establish a literacy for text messaging, Wikipedia, and Facebook.
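
For anyone who hasn't seen the difference between a flat file and relational data, here is a small sketch using Python's built-in sqlite3 module (rather than phpMyAdmin and MySQL, purely so it runs anywhere). The table names author and work, and the handful of rows, are chosen only for illustration. In a spreadsheet, the author's name would be retyped on every row; here it is stored once and linked by an id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two linked tables instead of one flat sheet.
conn.executescript(
    """
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE work (
        id INTEGER PRIMARY KEY,
        title TEXT,
        year INTEGER,
        author_id INTEGER REFERENCES author(id)
    );
    INSERT INTO author VALUES (1, 'James Joyce');
    INSERT INTO author VALUES (2, 'George Orwell');
    INSERT INTO work VALUES (1, 'Ulysses', 1922, 1);
    INSERT INTO work VALUES (2, 'Dubliners', 1914, 1);
    INSERT INTO work VALUES (3, 'Animal Farm', 1945, 2);
    """
)

# A join follows the link from each work back to its author.
rows = conn.execute(
    "SELECT author.name, work.title FROM work "
    "JOIN author ON author.id = work.author_id "
    "WHERE author.name = 'James Joyce' ORDER BY work.year"
).fetchall()
print(rows)
```

Correcting an author's name now means changing one row, not hunting through every line of a spreadsheet.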

139

PHPMyAdmin for all is a wonderful idea. At DH centers or any DH project using a large body of literary or historical work, humanists need to have a basic degree of the "database literacy" you mention in order to understand how these projects are structured.

I like your emphasizing XML over HTML in your intro class -- I have recently begun to work with TEI, and I'm impressed with how much easier it is to move and share ideas organized that way. I imagine students who learn XML before HTML will be able to give precedence to the sharing of information over simple information display (hurray for web 2.0! and 3.0! and...).

99

Great point about phpMyAdmin. Despite the fact that I do know command line SQL, I much prefer using this tool for things like making copies of tables for backups and new database structures from random Excel files (where people have, for example, kept offline lists of things for years). It just simplifies everything so much.

I also think every DHer should be able to teach students how to use basic web tools like wikis and blogs, both in terms of coding interfaces and in terms of content. My case in point: last year I was the web TA for a class of 200+ students. I installed a clean copy of MediaWiki (looks like Wikipedia) for their final projects. The students placed themselves into groups via the wiki and were instructed to create their own project pages according to the instructions, although we gave them ample room for creativity. I discovered two things almost immediately:

1) A majority of the students had no clue as to how to actually edit or add content to a wiki or wiki discussion page. Sure, they knew how to use Wikipedia and copy and paste into their papers, but they didn't actually know how to modify that data, which leads me to the second thing --

2) Students still take wikis to be static content of facts and truth. In some ways, using a wiki was great to show them how much they could actually control what went on their project page! And even better, we had them do peer reviews by posting in each other's discussion tabs. When a project page left out a major point from a lecture or reading that would have changed the content of that page (or the opinion of it), the students noted this and sometimes even questioned it.

 

106

What a great topic...

At the risk of shocking some of you I would add two more skills -- very different from each other -- and *very* different from those you identify, Literaturegeek, whose importance I do not dispute.

The first is the one with which I most risk infuriating my readers...

Flash. Yes! :) Indeed, a technology that has caused many many headaches to those who value internationalization and accessibility. Still, it is an unbeatable tool when it comes to visual impact.

I guess what I am saying is that I see a digital humanist as an information designer, in the broader sense of the word 'design', as well as a data cruncher and analyst.

The second is arguably less susceptible to disagreement....

3D worlds and metaverses. I am referring more to OpenSim and less to Second Life.

I do believe that in the upcoming year, (1) the ability to design information with visual impact (whether with Flash or another piece of software...) and (2) embodiment will both be key to engaging scholars in areas for which this affordance is particularly pertinent, such as History or the Arts.

 

 

132

Over on Facebook, some of us have been having a lovefest for the amazing designer and visualization expert, Edward Tufte.  Your post about flash made me think of Tufte's really brilliant ways of using 2-D static surfaces to convey 3-d animated information.   This is really off the point, but, if you don't know Tufte, you'll enjoy this, Ana, so I thought I'd pass on the url:  http://www.edwardtufte.com/tufte/

89

Thank you for the link, Cathy -- Tufte's work is amazing. This reminds me that I should post about Jonathan Harris' wonderful work visualizing scrapes of public blogs (discussed on TED and viewable at wefeelfine.org). Beautiful stuff with strong content behind it is even more beautiful.

135

Dear Cathy, I have all his books. I love his work! Somehow I had not made that connection though. You're absolutely right.

By the way, one vote in for having Edward Tufte as a speaker at one of HASTAC's upcoming conferences!

I'll join the Facebook discussion then.:)

136

I totally agree with your vote for Flash (and would like to add Flex). Too many people have been annoyed by poor Flash use (that period when giant Flash splash pages were the norm...), and I don't think Flash gets enough credit for information visualization (as you note) as well as simple interactive learning activities (the latest version has some ready-made interactive quiz modules and games which are easy to customize to your knowledge domain).

Your post makes me think we need yet another division of digital humanists beyond those whose emphasis is on digital content versus digital carrier -- digital humanists who use these tools for their own analytic needs, versus those who need tools like Flash to share their findings.

3D worlds -- definitely a cool digital humanities area. Have you heard of MITH's work with the Preserving Virtual Worlds project? (http://mith.umd.edu/research/?id=20).

92

Hello again, literaturegeek 

I did not know of MITH's work in the preservation of virtual worlds but do now: thank you! 

While working on a different theme I came across Daniel Pink's 'symphonic thinking' and cannot resist leaving his definition here:

“Symphonic thinking is the signature ability of composers and conductors, whose jobs involve corralling a diverse group of notes, instruments, and performers producing a unified and pleasing sound. Entrepreneurs and inventors have long relied on this ability. But today, Symphony is becoming an essential aptitude for a much wider swath of the population.”

Daniel H. Pink, A Whole New Mind: Moving from the Information Age to the Conceptual Age

At Georgia Tech, Merrick Furst and Richard A. DeMillo found in this symphonic thinking the foundations for new programs and career design for CS graduates at GATech, with the aim of better preparing them for an increasingly competitive global environment. As I read the roles proposed for the new GATech CS graduate students to combine today -- communicator, entrepreneur, master practitioner and innovator -- I can't stop thinking that these are not far from the directions digital humanists are pursuing.

I particularly like the way Furst's and DeMillo's proposal stresses 'communicator' as one of the several roles students may choose and around which to focus their choice of courses and so on...

 

102

 

I liked the way you separated tools for analytic needs and tools needed to communicate those findings.

Ideally you would have a '2 - in -1' situation... and there are tools out there that promise to do this but very few hold to their promise.  

The way we present things- the visual impact of ideas transmitted - is often the bottleneck... the reason why things do not change, or do not change as fast as they could/should.

Remember Anne Balsamo's use of Prezi at HASTAC III? :) The way you present is very very important and often under-rated.

I feel this emerging community of digital humanists must give this 'other' aspect of humanities research -- the communication of results -- the attention it deserves. Maybe this is a space that the Arts can carve while becoming part of the (symphonic?) thinking behind the project from the get-go, rather than joining the research team when the project has matured and it is time to 'show results'... It is too late then.

 

131

I agree with the vote for basic wiki management instruction, and would like to add Google Docs to the list. There are probably other things like these which are really simple to use, but may intimidate those who have never tried (I wonder what the ratio of Wikipedia users to those who have ever actually edited a Wikipedia article is?).

125