Submitted by Michael Widner on Mar 08, 2009-10:13pm

Welcome to a HASTAC Scholars Discussion Forum on
Digital Textuality & Tools

featuring
Geraldine Heng (The University of Texas at Austin)

hosted by HASTAC Scholars
Michael Widner (The University of Texas at Austin)
and Angela Kinney (University of Illinois at Urbana-Champaign)

 

There is no question that modern academics have come to embrace, and even depend on, digital resources to facilitate research. But what about materials that defy easy conversion to digital media? Testimonies both ancient and modern jotted in ink, prayers and curses etched onto pot shards, graffiti spanning from vulgar to mundane to profound, doodles of the bored and brilliant alike - these are only a few of the relics that often fail to be adequately represented in printed or digital form. Unless scholars in all disciplines advocate for the time, effort, and funding necessary to digitize these materials in some way, inevitably they will be neglected by future scholars. This sad truth is already evident: Danuta Shanzer's 2004 survey of all NRC-ranked American classics departments reveals that only four departments (out of 29) require their graduate students to study palaeography; a mere six require study of textual criticism and editing. The easy accessibility of modern printed editions and digital transcriptions has resulted in the devaluation of a living textual history. It is time for us as scholars of all kinds of texts to advocate loudly that manuscripts and other neglected textual vehicles be made accessible to a digital world. To do otherwise is to cheat future students out of a rich history of texts.

While the idea for this forum developed from a concern for the digitization of pre-modern materials, scholars of all levels and disciplines must contemplate the future of text in the age of technology. Conscientious human efforts have preserved texts of all sorts - papyrus, parchment, paper - for centuries. What role should contemporary scholars and archivists play in the preservation of texts and their histories? How can technology enhance our study of all texts and textual vehicles? What tools and initiatives would facilitate textual research and pedagogy? All of these questions are relevant and animate the discussions of numerous projects.

We hope, then, that the following discussion will span the many themes associated with the topic of texts in a digital age and include comments from a broad spectrum of people. We would like to start this discussion with a few leading questions - some technical, some sociological, and some academic:

  • Has the role of digitized materials and digital tools in the research of texts been adequately defined or even discussed beyond the pragmatics?

  • Is it worth investigating the implications of digitization on a sociological level?

  • Will widespread digitization change the way we do research? How? What are the effects, both positive and negative?

  • What tools do we need (or can we imagine) to improve textual research? What tools & technologies haunt our wildest dreams?

  • How can we stimulate further collaboration between those with the necessary digital prowess and those with the necessary academic expertise? Or how can we produce scholars with both sets of abilities?

By no means is this list of questions exhaustive, so please feel free to pose additional questions and raise issues we may not have raised yet. And although this forum is concerned primarily with texts, please feel free to reply via vlog or artistic expression if desired. We in no way wish to overprivilege textual methods of communication!

Angela Kinney is a PhD student in the University of Illinois (Urbana-Champaign) Department of the Classics and the Program in Medieval Studies. She holds an MA in Classics from the University of Illinois. She is spending the academic year 2008-2009 at the University of Bristol (UK) to work with Professor Gillian Clark on Augustine's use of satirical techniques in his De Civitate Dei (City of God). Her current research projects include arguing for 6th-century authorship of the Vita Apollinaris Valentiniensis and a comparison of the physical description of the Greco-Roman goddess Fama (Rumor) with descriptions & iconography of angels in Judeo-Christian texts. Her digital interests include the digitization and accessibility of pre-modern manuscripts, as well as website/graphic design and online instruction. Her favorite ways of procrastinating include message boards, Scrabble, and Google Books.

Michael Widner is a PhD candidate in the Department of English at the University of Texas at Austin. He received his MA from Southern Methodist University. His dissertation focuses on the relationships between genre, identity, and bodies in medieval English and French literature. Though his research leads him to read about knights, saints, and hot pokers, he also closely follows technology news and current pedagogical practices and theories that attempt to deploy technology in relevant and effective ways. He currently teaches "The Rhetoric of Cartoons", a class in which he attempts to suck all the joy out of reading graphic novels like Alan Moore's Watchmen and Marjane Satrapi's Persepolis. Many years ago, he was a UNIX Systems Administrator for SBC; he doesn't regret quitting that career, but is grateful for the technological expertise with which it left him. He is currently struggling with Facebook addiction.

 

Comments

GMAP & Geraldine Heng
Posted on Mar 08, 2009-10:33pm by Michael Widner

I'd like, first, to welcome everyone to what I hope will be an engaging and energetic conversation on digitization efforts, manuscripts, and, more broadly, the issues confronting us all as we consider the types of tools and projects necessary to make the marriage of scholarship and technology a happy one.

Lest anyone think that the digitization of manuscripts is well on its way, I offer this article in The Chronicle Review by Eric Yager. He opens:

"When I tell people that the five years I spent researching and writing my last book included about a month and a half of work in the French national archives, they often look skeptical or even laugh, saying, 'Right, research in France. That sounds really tough.' Sometimes they pantomime the copious drinking of wine. Or they ask why anyone needs to go to the archives at all, since everything is now on the Internet.

Actually there's a lot that isn't on the Internet. And once you fly across the ocean in a cramped economy seat, arrive in Paris with your luggage and research notes, locate your rented apartment, renew your pass at the archives, secure a numbered spot in the crowded manuscripts room, find your documents in the catalogs, carefully write the shelf marks (call numbers) on the neat little forms provided for that purpose, and stand in line to hand your requests to the harried or indifferent clerk at the call desk - your work has only begun."

There are great masses of manuscripts that people haven't even examined yet, much less considered for digitization. One immediate concern that occurs to me is that of selection biases. Since the process is expensive and laborious, those manuscripts deemed most important are first in line. That group would likely not include, however, works like those Yager examines or that many, many other scholars focus on. Does this problem mean we should just go through the archives in order of shelf marks?

Next, I want to introduce Geraldine Heng. She is, along with Susan Noakes, a founder of the Global Middle Ages Project (GMAP). She has graciously agreed to provide us with an account of GMAP, its relation to this forum, and some of the issues such projects raise. I'm sure her post will provide much for us to discuss.

Geraldine Heng is Director of Medieval Studies, Associate Professor of English and Comparative Literature, and holder of the Perceval endowment in Medieval Romance, Historiography, and Culture at the University of Texas at Austin, an endowment created to honor her work.

She is the author of Empire of Magic: Medieval Romance and the Politics of Cultural Fantasy (Columbia, 2003, 2004, 521 pp) and is completing two books, The Invention of Race in the European Middle Ages, and Global England: A Literary Archeology of the Global Middle Ages, and co-editing Holy War Redux (also a data-mining project for the Digging-into-Data challenge, and including contemporary holy wars) with John Ganim of the University of California at Riverside.

Heng's research focuses on literary, cultural, and social encounters between worlds, and webs of exchange and negotiation between communities and cultures, especially as transacted through issues of gender, race, sexuality, and religion. She is particularly interested in medieval Europe's discoveries and rediscoveries of Asia and Africa. Her book, Empire of Magic, traces the development of romance - the foremost narrative genre of the European Middle Ages - and the King Arthur legend as cultural responses to the atrocities and traumas of the crusades, and Europe's myriad encounters with the East.

Heng is the founder and, with Susan Noakes of the University of Minnesota, co-director of the Global Middle Ages Project (G-MAP), the Mappamundi cyber initiative, and the Scholarly Community for the Globalization of the Middle Ages (SCGMA). Mappamundi clusters together a variety of digital initiatives for the study, research, and teaching of a global Middle Ages.

Observations from the Global Middle Ages
Posted on Mar 09, 2009-11:59am by Geraldine Heng

Digital work in premodern studies is a treasure house of riches. Projects on medieval manuscripts, artifacts and art, intricate maps, and cosmopolitan cities, thrive and proliferate today, presenting extraordinary results and, also, special challenges.

Some of these projects drive cutting-edge technological innovation: Peter Bajcsy's group at iCHASS, working on the manuscripts of Jean Froissart (chronicler of the Hundred Years' War between England and France), is developing widely-applicable tools for image analysis and pattern recognition. To keep track, Chris Baswell and Matthew Fisher are compiling a central database of all European manuscript digital projects. Beyond manuscripts, there is the visualization of cities, artifacts, monuments, and sites: among the highlights of the 2008 HASTAC conference were UCLA's virtual Ancient Rome and the collaboratively-produced virtual Temple Mount of Jerusalem: http://www.ust.ucla.edu/ustweb/Projects/israel.htm & http://www.archpark.org.il/

Excitingly, this work is no longer confined to premodern Europe. The Timbuktu manuscripts of medieval Africa are being digitized (http://www.sum.uio.no/timbuktu/index.html and http://www.aluka.org/page/about/news/nytimes20080521.jsp). International Dunhuang, mentioned by Cathy Davidson on HASTAC forums, is putting online the fabulous resources of a cave-complex at a famed cosmopolitan city, Dun Huang, on the old eastern Silk Road.

For the Global Middle Ages initiatives that aim at studying an interconnected world in deep time, the biggest challenge is how to weave together disparate existing projects and our own, new, efforts, in narrating a multi-layered, multi-dimensional past that privileges no single region in the world, but offers as many points of entry, and analysis, as there are subjects to study them.

We have many digital projects brewing, all clustered under the name of Mappamundi ('map of the world') and focusing on different locations in time and space.

'Discoveries' of America is a visualization platform that depicts geographic imagination around the world--in Islamic civilization, China, and Scandinavian Europe--as societies imagined, or traveled to, America before Columbus. It will show how maps flow and circulate in networks of cultural exchange, raising questions of timelines and knowledge. Along with navigational techniques, instruments, and cartography, Ayhan Aytes (the projected designer of 'Discoveries') anticipates grafting three layers onto the platform: a semantic layer with taxonomies that allow for varied tagging functionalities, a chronologic layer offering multiple timelines (episodic, linear, cyclical, discontinuous), and a spatial layer that intersects terrestrial with celestial navigation. 'Discoveries' is envisaged as a model for how Mappamundi could function: people will be able to contribute wiki-style modules to it so that there's ongoing collaborative knowledge-making.

Ana Boa-Ventura, who hosted the very successful HASTAC Scholars Forum on Metaverses, is proposing her project on farming social media like YouTube, Flickr, and Wikipedia for resources to support study of a global Middle Ages to the National Center for Supercomputing Applications this month, as part of its 1 million CPU hours initiative. She hopes to work with colleagues at the NCSA to mine social media for attitudes to and materials on the Middle Ages.

Susan Noakes is collaborating with researchers at St Andrews, Scotland, in a Byzantium-Constantinople-Istanbul project associated with the Byzantinist Paul Magdalino. The St Andrews team has digitized the Topkapi palace, and is digitizing the Hagia Sophia, as well as sunken ships and their contents excavated from the Byzantine harbor at Yenikapi, Istanbul. Thanks to the efforts of Kevin Franklin of iCHASS, our Africanist colleagues will soon launch a collaboration with the Center for High Performance Computing in South Africa, to begin digitizing medieval Africa. We are also planning collaborations with supercomputing centers in Cyprus and in Beijing.

All these projects bring many challenges: from questions of funding, of how to find time to do all this extra work, how to amass data, and devise tools and interfaces, to--most fundamentally--the human challenges of working across cultures, languages, races, genders, politics, disciplinary training, institutions, and a wild and woolly mix of personalities, mindsets, and habits. Our colleagues in high-performance computing are often flummoxed by the "fuzzy logic" of humanities scholars (fuzziness that we, of course, know as creativity and inspiration). Humanities scholars can find it hard to translate complex research issues and ideas into "data" that can be queried by applications devised for computers. If the Global Middle Ages Project, as Kevin Franklin puts it, has the potential to become as large as the human genome project (!), it must build many kinds of bridges, through many kinds of conversations (including discussions like this HASTAC forum), and across many divides. Collaboration requires patience, and a good deal of time.

 

The pictures accompanying this post are of Timbuktu, Mali, today, where the Timbuktu manuscripts are housed. These pictures were taken by Bob Gee, a University of Texas graduate student enrolled in "Global Interconnections: Imagining the World 500-1500 c.e." who traveled to Mali in 2004. Manuscript pages pictured below are taken from actual manuscripts found at Timbuktu.

Some of these divides are, at base, political. When considering whether Byzantium-Constantinople-Istanbul could be digitized by the center at Cyprus, we were advised by an Ottomanist that locating the project at Cyprus would alienate the government of Turkey and colleagues in Istanbul. In the politics of the neighbor, the wars of the Middle Ages, it seems, are not yet over. And an Africanist warned that we should delicately and carefully approach the subject of digitizing Africa through the location and technology of Cape Town, since South Africa is often seen by the rest of the continent to harbor hegemonist cultural tendencies. But thinking about how the politics of the present shape the production of the past--instead of how the past determines the present--can also be invaluable. Since my blog post in the HASTAC Scholars forum on the Future of the Digital Humanities, a group of us at the University of Texas has decided to take our project on Holy War into the slew of how contemporary politics has used the past. The reasoning goes like this. For a thousand years now, the countries of the West and the countries of Islam have been plagued by wars described--often without irony--as "holy." To some in Islamic countries, the holy wars known as the Crusades, beginning in 1096, are the foremost paradigm of an expansionist West, and the West's militarized past. To some in the West, the religion of Islam was birthed in the context of Holy War in the 7th century, with the Qur'an authorizing the conquest of the non-Muslim world for the spread of religious domination.

I teach a graduate seminar called Holy War Redux as a series of questions in the classroom and in scholarly research. What is a holy war, and what conditions are necessary before a holy war can be declared? Once declared, do holy wars ever end? How are holy wars distinguished from wars waged for political, economic, territorial, and other purposes? Must holy wars be justified ethically and morally as just or defensive wars? How many societies have instituted holy wars in the history of the world, and what is the role of nationalism, ethnicity and race, tribalism, regionalism, caste and class interests in the waging of holy wars? Who are understood as combatants in holy wars, and what is the place of innocent victims? How is society re-organized as a result of holy war? What social, economic, political, psychological, or demographic changes result? How is the rhetoric of holy war refreshed over time, and how do cultures of the neighbor shape such wars? What logistics and statistics underlie holy wars?

These questions are not only of historical interest. Two renowned scholars--one a medievalist, the other an international relations theorist--have argued that the West and the countries of Islam are locked in a primordial struggle that will continue to produce holy wars in our current millennium. Bernard Lewis, an advisor to the Bush administration who urged the invasion of Iraq, called this struggle "the clash of civilizations." Samuel Huntington, in a famous book whose title borrowed Lewis' phrase, gave the phrase contemporary currency by suggesting that the end of the Cold War left a vacuum that is now filled by the "clash of civilizations" between the West and the East (the East being constituted by Islamic countries, and "Sinic" societies that form a natural partner for the countries of Islam).

Humanities scholars from several disciplines have argued vociferously against the simplistic and dangerous reductionisms of the Lewis-Huntington theses, but their arguments have thus far been driven only by conceptual reasoning, not by data. We are considering whether the Digging-into-Data Challenge can offer us an opportunity to discover if large-scale data mining can furnish evidence of a different kind--persuasive in different ways--from all our earlier arguments against the dangerous, simplistic, and mischievous reductionisms of Lewis-Huntington.

My example, in this, is Robert Pape's use of large-scale data in his Dying to Win: The Strategic Logic of Suicide Terrorism, which amassed a database of suicide terrorist attacks around the world from 1980 to 2004, using sources in Arabic, Hebrew, Russian, Tamil, English and other languages. Mining the data, Pape found that "Islamic fundamentalism is not as closely associated with suicide terrorism as many people think." His data showed that "the world leader in suicide terrorism" are the Tamil Tigers of Sri Lanka, a Marxist secular group that invented the "suicide vest." The data also showed that the incidence of suicide terrorism rapidly diminishes when "modern democracies" withdraw military forces from the territory that the terrorists view as their homeland. The data drove Pape to conclude that "suicide terrorism is mainly a response to foreign occupation and not Islamic fundamentalism."

We are thinking that our project for the Digging-into-Data Challenge should be to mine several different databases to test the dual Lewis-Huntington hypotheses promoting the idea of a "clash of civilizations."

If the data show that the incidence of holy war is linked to specific historical events, like the discovery of fossil fuels in Saudi Arabia in the 1930s (leading to "oil crusades") or the establishment of the state of Israel in 1948 (note Osama bin Laden's repeated references to a "Crusader-Zionist" alliance), it will be possible to argue that specific events, not primordial relations of enmity, shape the relationship of the West to the Near East. Rather than an essential struggle to which no end can ever be envisaged, and for which all solutions are necessarily inadequate, specific solutions can be envisaged to answer to specific events.

If the data show that the rhetoric of holy war was invoked at historical junctures before or during the Cold War, the force of Huntington's assertion of a "clash of civilizations" in the wake of the Cold War's closure is also diminished by data. Our group in Texas is collaborating with colleagues at the University of Sheffield's Humanities Research Institute on the subject of the Cold War.

Our working hypothesis is that no "clash of civilizations," either as formulated by Lewis, or as elaborated by Huntington, exists. Data could thus supply powerful alternative ways to prove incorrect these twinned pernicious formulations that have widespread influence today. So widespread is that influence that Fareed Zakaria, in the latest issue of Newsweek, just reflexively reinvoked this non-existent "clash of civilizations" again, as if the term witnessed conditions of reality, rather than political ideology.

We are thinking that, in the first instance, Holy War Redux could thus be a hypothesis-driven project of some value to contemporary political culture and to work in the humanities. Beyond that, we could apply data mining to address the larger research and teaching questions related to the history, incidence, and conceptualization of holy wars around the world. If we do, indeed, work on this project, there will be many challenges to solve for the team at the Texas Advanced Computing Center, a supercomputing center that is attempting humanities projects for the very first time, along with our Sheffield counterparts at the HRI. But their solutions could have valuable and widespread applicability for humanities work on this campus and elsewhere.

This is not so far a stretch from digital work on manuscripts and images as one might think. This is simply another way for medievalists to work, as we remind people of the uses of the past--remember, for the Bush administration, Bernard Lewis' authority was as a medievalist par excellence--it's just an extension of the possibilities made available by data, digital textualities, and tools today.

Stunning!
Posted on Mar 09, 2009-10:36pm by Cathy Davidson
These images are absolutely stunning. The global middle ages are fascinating. Thank you for this inspiring introduction to your work.
Second Welcome
Posted on Mar 10, 2009-12:22am by Angela Kinney

As co-facilitator, I echo the welcome given by Michael and encourage all to contribute their thoughts to the discussion. One reason I suggested this forum topic was the great potential for interaction between scholars of all disciplines as well as between scholars & administrators. Additionally, we all have a shared intellectual history in texts: this can be a very personal matter if one realizes what is at stake. I also invite futuristic speculation on how the fate of premodern texts in a postmodern age might influence the fate of postmodern texts in the distant future, as well as thoughts on the ultimate limits of preservation & digital representation.

For anyone desiring to read Danuta Shanzer's study on the status of palaeography and editing in US classics departments, here's the full citation (I couldn't find a nice link for it):

"Editions and Editing in the Classroom: A Report from the Mines in America," in Vom Nutzen des Edierens: Akten des internationalen Kongresses zum 150-jährigen Bestehen des Instituts für Österreichische Geschichtsforschung, ed. B. Merta, A. Sommerlechner, & H. Weigl = Mitteilungen des Instituts für Österreichische Geschichtsforschung, Ergänzungsband 47 (2005): 355-368.

 

I wanted to comment briefly on a bit that Michael wrote:

"There are great masses of manuscripts that people haven't even examined yet, much less considered for digitization. One immediate concern that occurs to me is that of selection biases. Since the process is expensive and laborious, those manuscripts deemed most important are first in line. That group would likely not include, however, works like those Yager examines or that many, many other scholars focus on. Does this problem mean we should just go through the archives in order of shelf marks?"

Michael has hit upon a significant problem in digitization efforts, a problem that has pervasive effects throughout all disciplines. (I will use a classical/medieval example simply because it is familiar.) With regard to digitizing ancient & medieval texts, too often digitization projects focus on two main categories of texts:

1. Texts transmitted by the "prettiest" manuscripts, i.e. those with the greatest number of illuminations and elaborate initial capitals.

2. Texts written by those esteemed as great authors (or otherwise widely known texts).

While these two categories of texts are vital and absolutely need to be digitized, overprivileging them does little to benefit the scholarly community. The "prettiest" (in modern eyes) manuscripts often are late medieval Bibles or Books of Hours - made for display or for the wealthiest of patrons. The texts themselves are well established & most lack substantial marginalia (because the purpose of the books was not necessarily practical). These pretty books may be of use to art historians and students of the development of scripts - but beyond that there is nothing new or strange about them. When a library has a limited amount of funding available to digitize manuscripts, I humbly suggest that perhaps a manuscript sans illuminations or a text more obscure than the Bible might be a more substantial contribution than the most aesthetically pleasing codex.

At this point it is also somewhat unhelpful to overprivilege texts of authors already renowned. I am a bit more hesitant about this category - we are still missing texts from great thinkers such as Aristotle and Cicero (among many others); and the marginalia in school texts can also be a unique record of a medieval student's thought process. However, the academic community would benefit immensely from the digitization of more obscure works. Texts often go unstudied because no one has broken the pathway. Few people want to be the first to edit or translate a text - especially if there's no guaranteed payoff. (It's worth noting that editing is rather a thankless task in the US - numerous scholars have told me that one cannot count on an edition to bring tenure, a job, or even the Ph.D.) Texts go untranslated or unedited for numerous reasons, many of which relate to value judgments: the author is unknown or the famed Anonymous, the script is difficult, the grammar is seen as corrupted or "tainted" by vulgarisms, the subject isn't currently en vogue, the genre is generally seen as derivative or "mere compilation" (e.g. encyclopedias & chronicles), the location happens to be in an out-of-the-way monastery or library, etc. The point is that quick judgments made centuries ago by a frustrated cataloguer or bored scholar can banish a text to willful obscurity. These underprivileged texts are least likely to be selected by libraries beginning digitization projects, but they are the texts most in need of accessibility and study.

I'll stop here, but incunables often are underprivileged as well for similar reasons (or simply because they are printed works). So how can we avoid perpetuating arbitrary privileges (which ultimately derive from class/wealth) in choosing texts for digitization?

 

Thank you and congratulations on such a rich ...
Posted on Mar 21, 2009-09:41am by Cathy Davidson

This has been an amazing forum. Thank you for sharing and inspiring so much.

Congratulations! And thank you.

Public access, open access and the scholarly ...
Posted on Mar 11, 2009-03:49pm by Anaventura

The beautiful imagery in Geraldine Heng's post made me think of visual media. At the risk of proposing a Eurocentric tone here :), I would like to bring up a recent project based in Germany, at the Max Planck Institute for the History of Science (MPIWG).

This project is mentioned in the SCGMA blog, by the way - SCGMA standing for Scholarly Community for the Global Middle Ages - accessible through www.scgma.org.

The MPIWG co-initiated a call for Open Access to Digital Images - and has recently launched on its website a set of recommendations on the scholarly use of visual media. These recommendations are aimed specifically at the publication of historical digital images, which are core to the GMA project - hence this post.

The material is the result of careful consultations the Institute conducted with scholars and representatives of leading museums, libraries, image archives and publishers. It is well worth a look by historians and non-historians alike. Best practices are downloadable from the Institute's website. The document is addressed to curators - exhorting them to accommodate scholars' needs by providing access to high-resolution images for a low cost (or no cost) - and to scholars - exhorting them to recognize museums and libraries as the custodians of physical objects of cultural heritage. Furthermore, the document stresses the importance of the role of all stakeholders in the process as "guarantors of authenticity".

I was particularly drawn to a post by Gerard Meijssen urging the Open Access movement within the MPIWG to consider public sources. Meijssen specifically mentions Durova (one of the most popular restorers of images posted in Wikimedia Commons). Check some of Durova's excellent work here: http://durova.blogspot.com/

Specifically, Meijssen says, addressing the MPIWG Open Access movement: "In your appeal for Open Access to digital imagery, you acknowledge that much of the material is no longer copyrightable. This material can be legitimately used by anyone who gets hold of a copy. This is another reason why it is better to cooperate not only in the scientific world but also with the public world. With digital materials it is like with ideas. When I share my idea with you and you with me, we both have two ideas. When it is publicly acknowledged what material is maintained by what archive, this archive gains public visibility. For all of these reasons I would like you to consider that there is a public aspect to Open Access and hope that you will consider opportunities for cooperation."

I would be really curious to know some reactions of the HASTAC community on Durova's work. I am also intrigued at these new interactions between public and scholarly media. I am talking specifically about images but these interactions occur equally with text. Any text mining in Digital Humanities will need to include public sources? But that will be a topic for a later post. For now, I would be really curious about reactions to Meijssen's idea: should there be a public aspect to the Open Access movement in the scholarly use of visual media?

Open Access and Publicity
Posted on Mar 11, 2009-04:36pm by Michael Widner
Thanks for sharing this work with us, Ana. For those who like their links up-front, here are the best practices PDF and the comments from Gerard Meijssen. The apparent skepticism you note toward Open Access reminds me of the resistance some authors have received from their publishers when trying to provide copies of their works online. A number of authors, particularly Sci-Fi ones, now like to offer free copies of their books. Their reasoning is twofold. First, they want as many readers as possible. Second, they argue that offering free copies actually drives sales. Some of the bloggers on Boing Boing have covered this issue for years now, particularly Cory Doctorow (who is, himself, an SF author). Most recently, Doctorow posted a link to this blog post that discusses spikes in sales after authors offered free ebooks:

"The theory that free ebooks released online will boost print sales is not a new one. Information radicals like Cory Doctorow and Charles Stross have been releasing their books under Creative Commons licenses - which allow readers to freely pass around the texts without fear of copyright infringement - for years, but it's only recently that most major publishers have dipped their toes into the pool (though incidentally many of Doctorow's books have been published by Tor).

Authors who go this route believe that the ebooks act as a form of advertising, arguing that the negative effects on sales from people reading it for free are offset by the word-of-mouth campaigns those same people will initiate. These Creative Commons evangelists also tend to point out that most readers don't like long texts on a screen, a fact that may cause them to buy the print copy once they've sampled enough of the story online."

It sounds like the pieces you mentioned are working from similar assumptions about the value of lowering barriers to access in order to increase both publicity (for the work and the library, museum, et al.) and the utility of the texts themselves.

Since copyright should be far less of a concern for scholarly materials (we're all committed to the building of knowledge, after all, not the pursuit of wealth), it seems like it is in everyone's interests that as many texts as possible be freely and easily accessible. What concerns me, though, is that there is as yet no (relatively) central organization driving this sort of work across the disciplines. One thing I especially like about GMAP is that it provides an organization to support interdisciplinary medieval studies with a global perspective. It seems like we need something similar to reduce the fragmented approach to manuscript digitization and access, as well. It's ironic, though, that in an age of folksonomies, distributed computing, and other such impulses toward decentralization, that very fragmentation (it's funny how two words like "decentralized" and "fragmented" have similar meanings but vastly different connotations) impedes progress toward a common goal.


Digitizing and Cost
Posted on Mar 09, 2009-11:58pm by Marjorie Woods

Thanks for setting up this discussion forum and the very interesting questions posed in the first contributions, which I'm pondering. In the meantime, I'm posting a very practical concern. It's been my recent experience that, in Italy at least, the cost of purchasing digitized copies of manuscripts is extremely high, and it's pushing up the cost of microfilms, since the libraries would prefer to digitize now. For example, the cost of getting a digitized copy of a complete glossed manuscript of the Aeneid was about $600 (at least when the euro was really high), and the cost of microfilming it was $400--much more than I had been expecting. I'm assuming that, as was the case earlier with microfilms at many locations, once a digitization has been done on request for a scholar, any other scholar who wants a copy can get it more cheaply, though I'm not sure of this. I would be interested to know if those working in other locations have found costs to be roughly similar.

There are so many surviving manuscripts that the decisions about what to copy will probably be a combination of what has been popular and what individual scholars are willing to subsidize for their own research. It might be advantageous to work out overlapping areas of research so that we can share resources and expenses.

One of the advantages of having access to digitized versions of material online is, of course, having more comparative material to work with. But I would be very hesitant about broad conclusions based on the results of research done only on such material (as I am about research conducted on manuscripts from only one collection, no matter how large).

Unintended Consequences
Posted on Mar 10, 2009-09:56pm by Michael Widner

Jorie, the costs you relate stunned me. They also make me wonder what other unintended consequences result from the impulse to digitization that's so prevalent now. I honestly have no idea what many of them may be, but I hope others will enlighten me.

I particularly like your suggestion that scholars coordinate their work to share costs and resources. It fits well with the urge to collaboration and more openness in scholarship. I wonder if there's anything like a clearinghouse website where scholars can self-organize in this way. If not, I think such a site could be a concrete and useful result of this forum. I know the new HASTAC site will offer a number of Wikis. A similar model could work for manuscript work, as well.

Digitizing Clearinghouse
Posted on Mar 11, 2009-05:13pm by Angela Kinney

I also love the idea of a scholarly clearinghouse - I'm imagining something where scholars could have a profile not just with fields of research, but maybe some kind of "wish list": things they want/need to have digitized or otherwise made available. Since people often cite manuscripts & visual objects differently, perhaps a system of tagging wishlist items with keywords would help [is this fundamentally different from the system of social tagging known as "folksonomy"? I never know how to use that term properly].

It might be fun to brainstorm other features that could be used on such a site. I know there are numerous sites for sabbatical homes already, not to mention the amazing couchsurfing.com, but perhaps such a site could have a section devoted to housing swaps/rentals for sabbaticals, shared rides/rooms at conferences...book loans...help in planning research trips. For instance, when I was traveling alone on a research trip to Europe, I could have used the information & advice of a student or professor living in the cities I was visiting. Of course I could have boldly contacted people at random off university websites, but a social networking site for academics could remove some of the nervousness/fear from the act of contacting strangers for help.

This is probably idealistic, but I like to think of the academic community as a generally more trustworthy bunch than, oh, the rest of the Internet. (After reading that statement again, I want to lecture myself about delusions of safety.) Still, it stands: back in the US I hosted graduate students whom I had never met before (e.g. those visiting for conferences, interviews) all the time, but I have never hosted a stranger contacting me through the Internet.

An Academic Craigslist
Posted on Mar 12, 2009-10:53am by Michael Widner

It sounds like you're asking for an academic Craigslist. I have friends who look for rides between cities on Craigslist; while I don't think I would take a ride with a complete stranger, they've been safe doing so (so far).

I've been thinking about other features such a website could offer, too. The problem is the necessity of a critical mass before it becomes useful, especially when dealing with manuscript work, which is often solitary and narrowly focused. Adding features like housing arrangements, etc. could provide other services that would be more immediately useful.

I'm also taken with the idea of collaborative transcription and translation efforts based on high-res scans. I know there's some optical character recognition (OCR) work going on, but I'm not sure how widespread it is. Given that manuscripts are, by definition, written by hand, having a crowd of paleographers train a system to recognize a particular hand might give the computer a leg up on automatically processing the rest of a work and, in the future, recognizing scribal hands on its own.

Does anyone know what sorts of projects like this are already underway?

Re: Digitizing and Cost [+ Locating Materials...
Posted on Mar 11, 2009-05:30pm by Angela Kinney

The following links don't solve the problem of undigitized material & the cost of digitizing it. See my reply to Mike's post about the idea of an academic clearinghouse site - which is a brilliant idea. Thanks to both of you for coming up with that collaborative thought! I, too, am pained by the cost a scholar must shoulder (often without reimbursement from his/her university) for microfilming or digitizing a manuscript. I recently inquired about digitizing a couple rolls of microfilm - the cost is just as high as the price you cited!

At any rate, I wanted to list a couple links that help with finding material that has been digitized. As many of us know - once something is digitized and online, there is the entirely separate task of locating it.

Catalogue of Digital Manuscripts - The goal of this initiative (led by UCLA professor of English Matthew Fisher) is to index the many digitized manuscripts that are scattered around the Internet. There are links to the material; it can be searched via a variety of fields (language, location, etc.). Not everything is there, and I do wonder how they will handle links that change or cease to work, but this is a commendable project.

Digital Scriptorium - A collaborative archive of digital manuscript images

I'm sure there are others. I, frankly, got tired of looking through all the individual library collections, which would be silly to list here. This, I think, brings us to another issue: how do we find what we need? How can the process of locating items be made more efficient?

The issue with locating materials is that there is so little collaboration between universities and institutions at this point. Individual libraries digitize their own material and put it up in dribs & drabs using their own taxonomy (often mss. and papyri are not listed in general library catalogs, and when they are - let's just say the quality and completeness of entries varies). Libraries publish their own manuscript catalogues, but other than Worldcat, there exists no main database (to my knowledge - am I wrong?) for finding manuscripts of a particular work, digital or otherwise. Doing a project involving manuscripts (and incunables) involves a lot of scouring bibliographies and catalogs of individual libraries. After all of this time-consuming work, scholars still find out about hidden/unknown/obscure mss. after-the-fact. It's beyond frustrating.

In searching for an article on the UCLA digital mss. cataloguing project, I came across this article:

Time to Change our Thinking: Dismantling the Silo Model of Digital Scholarship (by Stephen Nichols)

To summarize Nichols' views would make this long post even longer - suffice it to say that he argues strongly for changing the way we think about digital humanities projects. Collaboration, he says, is key to digital projects on numerous levels. I think his point is sound & the world needs to hear it - he says a lot more as well - go check it out & see what you think.

I've included an excerpt below (in italics since blockquote isn't working):

It is not news in 2009 that digital humanities require a wholly new mind-set. That is what is meant by "the cogito" of digital humanities. We cannot continue to focus simply on digital projects while ignoring the intellectual and social context in which they take place. We must begin by accepting the very different social, intellectual, and institutional context fostered by data-driven research. Digital scholarship creates a potentially productive network at many levels, and it entails significant change at the level of the individual scholar, in terms of operational methods, and in the kind of intra- and extra-institutional partnerships required.

The typical digital project cannot be pursued, much less completed by the proverbial "solitary scholar" familiar to us from the analogue research model. Because of the way data is acquired and then scaled, digital research rests on a basis of collaboration at many levels:

  • First, as a partnership between scholars and IT professionals;
  • Second, as a dynamic interaction between scholars of the same and different disciplines, since the data is too large to be handled by a single scholar, and too varied to be encompassed by a single discipline;
  • Third, in concert with a team of IT professionals responsible for designing the site, developing functionality as requested by the scholars, posting the data, and, not least of all, assuring access to end-users around the world.


Organizing Body
Posted on Mar 12, 2009-10:57am by Michael Widner
I'm convinced we need a single, high-profile body to organize the various libraries and digital resources into a single site. The fragmentary nature of digital humanities initiatives (see the previous HASTAC forum) applies just as much to manuscript digitization efforts. I noticed, for instance, that the UCLA catalogue you linked doesn't have any of the University of Texas at Austin's manuscripts listed, though they're available via Digital Scriptorium. Even then, the selection is poor and fragmentary, mostly because the funding hasn't been there (at least, that's my understanding).
Unifying and Codifying
Posted on Mar 14, 2009-09:45pm by ajehanmorris

I think one of the problems with assembling such diverse sets of information is that no one necessarily describes (in the keyword sense) images or objects the same way. For a great example of how wonky this can get, see ARTSTOR. Unlike citation formats for texts, there is great diversity and lack of cohesion in how images are cited.

In other words, do you say the secondary source and page or plate number (Camille, Gothic Idol, pl. 9) or do you give just the shelf mark and folio number, neglecting the actual scanned image source (The Pierpont Morgan Library, New York, Ms M. 638, fol. 8r) or do you cite the webpage on which you found it and neglect all else? And what about sculpture - we've never yet come up with a universal pattern for identifying one capital in a series without a map of each site, as there is no set number of columns, etc. So you can see "Nave capital, Notre Dame, Paris" or "Capital, Notre Dame de Paris" or "Capital 13, left nave, Notre Dame" - or the more standard iconographic description, "Flight into Egypt, Capital, Notre Dame de Paris" - until someone is challenging the iconographic designation, and then it becomes "Balaam and his ass (Flight into Egypt?)" or no one knows what it is and so it is "Knight with dragon? Soldier? St. George? St. Theodore? Apocalypse" and don't even get me started on the issues with changes between spellings, or Biblical names (is it Noah or Noe), etc. etc. Art history is a semantically shifting field, and for the most part, before the Renaissance, titles of artworks are arbitrarily assigned handles that can appear in any number of ways, and are occasionally changed in accordance with current opinion (hence the "Venus of Willendorf" becomes the "Woman of Willendorf" although textbooks may call her either, at any time, or both).

University digital image collections (those generated by art history or other departments) are, quite naturally, skewed towards the teaching and research needs and interests of their faculty, and the organizational system is customized to the creators of each database. (Not to mention the huge personal sets of images that we personally hang on to in our computer files, which may or may not be accessible to others, and are in fact a result of our own scanning, collecting, and hoarding.) When SMU was working on designing its visual resources digital collection, endless hours were spent trying to identify and design a categorical organization system that not only built on the familiar system of the slide library but was also expansive into new directions that were not always immediately needed.

Colum Hourihane and his incredible team at the Index of Christian Art have taken the very carefully organized and detailed tagging information from the original paper system and have transferred it into a multi-field digital identification system, which is beautifully systematic, but occasionally awkward to the casual end user. The Index could provide a prototype for such a universal system, but as there are several varieties of database programs available to each university which is considering switching to digital, this formula would have to be transmitted to the software developers as well to be most readily adopted by all and sundry.

A lower-cost approach
Posted on Mar 15, 2009-05:53am by wdmartin

I have a suggestion for reducing the cost of digitizing manuscripts. Some of them, anyway.

Don't digitize the manuscripts themselves; whenever possible, digitize existing microfilm versions of them.

A substantial amount of the effort needed to produce a digitized manuscript consists of having somebody sit there scanning pages one at a time. But in many cases, our predecessors have already done that part. With an appropriate scanner, you can digitize microfilm basically as fast as the computer can accept data. A lot of the process can be automated, so that the scanning would proceed automatically, requiring human intervention only to load new rolls of microfilm. That means that one person could run multiple scanners simultaneously.

There are disadvantages, of course. For one thing, there wouldn't be any color. For another, the resulting digital images would only be as good as the original photography, which I imagine is of variable quality. And of course it only works for manuscripts that previous generations of scholars deemed important enough to microfilm, meaning that it suffers from the same selection bias that Mike discussed earlier.

But on the plus side, it can be done fast and relatively cheaply. I believe EEBO and ECCO have used this approach to build their impressive collections of digitized books from the English Renaissance and the 18th century.

And, as usual, the barriers to taking this approach will be more social than technical. Many of the libraries with the largest microfilm collections depend on them for income, and may very well be strongly reluctant to put their collections up on the web.

Still, it's worth investigating. We've been making microfilms of medieval manuscripts for nearly 70 years. Let's build on all that work.

A Hybrid Approach
Posted on Mar 15, 2009-11:46am by Michael Widner

Will, that's an interesting idea. You're right about the disadvantages, but we shouldn't let perfection be the enemy of the good. I think a piecemeal approach (or, a less negative word, "hybrid") to this work where we grab what we can from everywhere is a good way to generate and keep momentum.

It worries me that digitizing microfilm might start as a stop-gap measure, but then, because of inertia, those images never get replaced by scans of the actual manuscripts. I can imagine someone making a business decision that argues the scans of the microfilms are good enough and therefore the extra money required to rescan the manuscripts is not worth it.

The cost of digitisation; the open source pri...
Posted on Mar 11, 2009-07:58pm by GerardM

You can compare digitisation with Open Source. "Everything that is already there is free. You have to pay for what is not there". This means that it is reasonable to have you pay for the work that needs doing when a digital copy has to be created.

There are several issues with this.

  • You cannot predict in advance what is available and therefore the costs
  • People will restrict their work to what has already been digitised
  • Important material may not get digitised in this way
  • When an archive burns or collapses, everything that has not been digitised is lost

There are problems in the execution as well. When scans are stored in the JPG format, they are compressed; consequently, these images are not suitable for digital restoration. They are at best useful to show on websites.

A good modern archivist is aware of these issues and he can also tell you about big investments that had to be redone because of changes in technology.

Thanks, GerardM


Open Source
Posted on Mar 12, 2009-11:09am by Michael Widner

You make an interesting analogy, Gerard. Thanks! I'm a big advocate of open source software (OSS; I run Linux on all my computers, for instance), but I hadn't considered the parallels. Much as various OSS projects need an organizing body such as Canonical for Ubuntu or Novell for OpenSUSE to avoid irregular release cycles, lack of focus, and project abandonment, which are all issues that plague less well-supported OSS projects, manuscript digitization and tool creation demand similar infrastructure to support them.

As inimical to scholarship as it seems, I wonder if there's a for-profit model for this work that could be viable. I could see libraries being willing to pay a fee to a company that would digitize, archive, and cross-reference the library's collections. The corporation could probably offer lower costs because of scale, provide database housing, and the other technological infrastructure necessary. Surely there are companies doing this already, but a quick Google search suggests that most libraries have their own digitization department that keeps such things strictly in house. The result, of course, is that such efforts remain localized to individual universities and collections and therefore fragmented.

The cost of NOT sharing digitised works
Posted on Mar 13, 2009-07:27pm by GerardM

Digitising is expensive. Once the work is available in a digital format, distribution is essentially free. There is much that needs to be digitised and the catastrophe at the archive of Cologne demonstrates that it is of vital importance to digitise as soon as possible everything that is unique.

Many of the scholarly papers are not available in a digital format and consequently these papers are no longer realistically available. Modern research is done digitally. The whole of the surviving ancient Greek literature fits on one DVD and this obviates much of the need for the paper copies.

When libraries digitise, the results need to be shared for the same reasons why modern scholarly papers need to be available as Open Access. Sharing the digitised copies allows for modern scientists to stand on the shoulders of giants.

Science is expensive, science is worthwhile. The expense is justified when the best information is available. The value of a publication is in the network of connections to that work. Increasingly those connections are digital.

Thanks, Gerard

Availability vs. Obviation
Posted on Mar 14, 2009-07:54pm by ajehanmorris

There is a conceptual problem with the removal of physical items from circulation or from public availability just because a new technology is available. I know few rare book librarians who don't tear up a bit at the notion of how many manuscripts were destroyed and lost - including textual or image program variants - when the printing press arose. What manuscript studies tell us is that every single copy is a unique work, and not fully replaceable by any copy, no matter how beautifully done. Art historians always work on singularities, forever forced to remember that in terms of medieval art there are no exact duplicates of anything, and each work lost or damaged is a permanent loss, and so the preservation aspects of digitization are as exciting as the belief that the originals are then somehow obsolete, and can be discarded, is disturbing.

One issue with some digitization projects is that, speaking from an art historical perspective, the copy does not equal the original, and therefore the original still contains information that a digital copy does not. The same issues apply with printed facsimiles, which, as Professor Woods noted, are horrendously expensive both to produce and purchase. Gold leaf does not 'read' properly, nor does silver leaf, which has tarnished, read as 'leaf' instead of 'black' at all. The problem of colors has haunted all forms of image duplication, and in a digital format, where the individualized settings of one person's computer can override even the most cautiously balanced color scans, the problem is increased. This talk of colors may sound like art historical whimpering, but the difference between a lapis based pigment and another hue is most readily seen in terms of tone gradations. If lapis is used, we know certain things about the cost of the ms that may not be so for another pigment. Also, use marks, taking the form of dirt smudges, fingerprints, ruffled edges, etc. all carry invaluable information. Very good facsimiles, digital or print, preserve this information, while cropped duplications of certain illuminations, as seen in text books, de-locate these images in an unnerving way. For art historians, facsimiles of any quality still feel 'deadened', as parchment moves and wrinkles and breathes in a way that no heavy paper or computer screen can match. Some degree of work will always require that scholars have access to the original.

However, the accessibility such versions provide is of endless value, for teaching and for research. Some of the best digital facsimiles, even if they are of low-grade quality, preserve the sense of turning pages and moving through the book, as do print facsimiles. For teaching purposes, I have been able to introduce students to lovely manuscripts in facsimile form, while the originals stayed safely housed in Paris or London. To be able to present a student with a website collection, like Digital Scriptorium or even the British Library, is to grant them access to a much wider variety of images and manuscript types than any single text book could provide. Again, as Professor Woods noted, the access to comparative material for both students and scholars is also wonderful, and critically needed (although yes, the perfect ms will never show up until after the due date, in keeping with the academic's version of Murphy's law). Above all, perhaps, the readiness of this information will help protect the originals, as the damage of creating one digital facsimile, although potential, is less than the general wear and tear of allowing manuscripts to be handled more than is necessary. (Not that most libraries will easily allow access to their manuscripts, even to scholars - in these cases, digital facsimiles may be the only way that lower-rung scholars and graduate students may access some works). Plus, a dissertation on one manuscript family could require travel to Paris, New York, Moscow, Istanbul, and Madrid. Meanwhile, digital editions cut back the need for such travel, if the facsimiles are of high enough quality to be easily read, and maintain the structural integrity of the original.

In short, there are huge advantages, but also drawbacks that should be heeded and considered, even apart from cost, so that we are not following in the footsteps of those well-meaning but sadly destructive followers of Gutenberg who shredded manuscripts for loo paper.

Shredded Manuscripts
Posted on Mar 15, 2009-11:54am by Michael Widner

April, I share your horror of dismantled manuscripts. Of course, digitization projects should actually reduce the wear on manuscripts, particularly ones that many scholars want to examine, as you point out. Not everyone's work demands the physical object.

You also raise a good point about the information lost when a physical object is reproduced digitally. Some of the problems, like the quire structure or colors, should, in theory, be transferable to a digital format. Since we have the ability to balance colors with great precision, I wonder if we can mostly discount the problems of individual computer configurations. Surely art historians are, like yourself, aware of this problem and would take steps to eliminate it. Perhaps departments could maintain workstations where the colors will maintain fidelity to the original.

As for quires, the people doing the scans simply need to scan the binding, too. A manuscript expert could also provide a description of the quires to go along with the scans since, even with the manuscript in one's hands, it can be hard at times to determine structure.

Social Text?
Posted on Mar 10, 2009-09:57am by GeoffreyRockwell

Thank you to our convenors Michael and Angela for starting the conversation. I'm struck how we jumped from "digital textuality" right to manuscripts and the need to digitize them. I would argue we also want to take a detour and follow the question of Renear et al, "What is a text, really?" (See Refining our Notion of What Text Really Is: The Problem of Overlapping Hierarchies.) Digitizing, encoding, linking, annotating, and publishing online (with tools) involves decisions about what a text is (what is important, what others need, what is essential) such that we know what to record in digital form and what to discard. I suspect that every digital text project is not only an edition of its subject, but also an implicit theory of what a text is. We will probably be reinventing texts over and over as a way to discuss them through things. (But more on that.) For the moment let me put forward three theories of text in things:

1. Image of Text: The text is a visual 2D object best represented by a high-resolution archival quality digital image.

2. Text as OHCO: A text is an Ordered Hierarchy of Content Objects best interpreted by structured information following community developed guidelines like the TEI.

3. Text in Performance: Mutable text is played with by us editors and therefore we should represent our ongoing history of deformative play as a game like the Ivanhoe game.

The first two theories can be reconciled in multimedia. We can have both and link them in multimodal text and get all the gifts of both from text analysis to the presence of the image. The third resists the idea of fixing the text and for that matter it resists the idea that there is a text to be captured before it slips away. It is this third thing theory I would recommend to you. What if we worry less about fixing our cultural heritage in digital form for all time and imagined what it would take to open textuality to public performance?

What do I mean by this? Well, I mean that we can imagine a different "we" involved in textuality, a bigger tent where there is public engagement. The Suda Online project suggests how we can involve people in social editing. This might relieve some of the anxieties of funding and support. We will never get all that stuff digitized if we have to do it the craft project model with NEH grants. That is the way of archive fever. Perhaps if we give up the definition of what is worth gathering and how it should be gathered we can enable a more participatory digitization.

I'll end this with my thing theory of text and that is all the pedestrian text around us outside of books. What would be our theory of text if we took as paradigmatic the wild words on signs, boxtops, instructions, t-shirts, labels, tattoos, posters, graffiti, and so on that we approach and pass all day?

Geoffrey Rockwell, University of Alberta, theoreti.ca

Textuality, Genre, and Variant Manuscripts
Posted on Mar 10, 2009-09:48pm by Michael Widner
Thanks for your insightful comments, Geoffrey. I read the paper you linked that asks "what is a text?" and was quickly struck by how its authors try to negotiate the difficult problem of genre, though they rarely do so in specifically generic terms; instead, they talk about multimodal texts and different perspectives. Much of my own research is on genre theory and the problems caused when we try to assign texts to genres (much less when we try to conceptualize genre itself). The authors' pragmatism leads them near the realization of genre's arbitrary and fraught nature, though they remain wedded to the necessity of hierarchies. I understand why, of course, since that's something computers do well and that nearly all markup languages I know of implement (I'm thinking of XML, in particular). It also makes intuitive sense to organize textual data hierarchically, though it continues to trouble me, especially considering their reliance on perspectives, which I see as remarkably close to genre; both, for instance, provide a context within which to understand certain textual markers.

 

You also raise the difficulty of how to effectively (I won't say "accurately") represent texts in digital format if we cannot agree on exactly what constitutes a text. Current manuscript digitization projects, for the most part, do little more than provide high-resolution scans of each page. In some ways, this approach has much to recommend it since it leaves all decisions about the structure, nature of the content, genre, etc. to the scholars examining the texts. At the same time, however, it leaves little room for advanced cross-referencing, aggregation, or other techniques databases make possible. The manuscript becomes, rather than a text, pure image. Obviously, we must confront the theoretical questions about the structures of a text when designing new tools.

I also like your desire for a "broader tent". The problem with premodern manuscripts, though, is that most of them have yet even to be transcribed (itself eye-destroying labor), much less translated. Maybe the high-resolution scans could lead to more open transcription efforts, which would be great training in paleography, but I wonder how much broader than that the tent could be in practice. Once texts are transcribed, though, translation efforts could certainly come from a much larger coalition of interested people. I'm sure there would be some scholarly skepticism of the results, but so long as the entire process remains open and well-documented, anybody could go back and check the work when necessary.

One final point, before this post gets too long. An edition that offers variant readings and refuses to fix the text would reduce the need for editorial decisions about which works in the stemma to privilege. It would also be nice, as a reader, to have the variants easily at hand rather than hidden away in an appendix. Often, the variants make little to no difference to one's interpretation, but it would be nice to find that out (almost) immediately rather than flipping back and forth in a bulky tome.
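To make the idea of variants "easily at hand" concrete, here is a minimal sketch, assuming two witnesses have already been transcribed as plain text. It uses Python's standard difflib to list only the places where the witnesses disagree. The function name `variants` and the two sample lines are illustrative assumptions (the spellings are made up for the example, not real manuscript readings).

```python
from difflib import SequenceMatcher


def variants(base: str, witness: str) -> list[tuple[str, str]]:
    """Return (base_reading, witness_reading) pairs where two transcriptions differ."""
    a, b = base.split(), witness.split()
    # Compare word by word; autojunk is disabled so short texts are handled plainly.
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    out = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            out.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return out


# Hypothetical transcriptions of the same line from two witnesses.
ms_a = "whan that aprille with his shoures soote"
ms_b = "whan that aueryll with his shoures sote"

for base_reading, witness_reading in variants(ms_a, ms_b):
    print(f"{base_reading} ] {witness_reading}")
```

A reading interface could show these pairs inline beside the base text, so a reader sees at a glance whether a variant matters instead of turning to an appendix.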

social text as "an open door to new forms of ...
Posted on Mar 10, 2009-01:59pm by Anaventura

Geoffrey (and all)

I am glad you brought up social texts! By the way, what Wikimedia Commons members like Durova are part of is really a *social* restoration of images.

I agree with you that we cannot wait for funding agencies to support all the digitizing that is out there to be done. I do think they can stress the importance of social text in upcoming grants in DH. And I think they are on the right path when, for instance, in the announcement of the Digging into Data Challenge (closing this week), they say (this quote is from the NSF site; DID is JISC, SSHRC, NEH and NSF):

"[...] unprecedented quantities of diverse and relevant information are now available to scholars on the Internet and in other digital forms. [...] continuing advances, such as the collaborative creation of knowledge through such vehicles as Wikipedia, computationally mediated social networks established by resources such as Facebook, and creative uses of three-dimensional virtual worlds such as Second Life, open doors to new forms of 'cyberscholarship' for the social sciences and humanities."

I was really excited to see how social text is depicted as 'an open door to new forms of cyberscholarship'. Any thoughts on this? Other examples like the Suda Online Project (wonderful project!) that folks may want to share?

Social Scanning
Posted on Mar 11, 2009-11:46am by GeoffreyRockwell

Social projects are relatively new to the humanities, or old if you think of the symposium as a humanities tradition of social research with food. I doubt they map onto "traditional" projects as easily as Suda Online did. The two I'm involved in answer questions differently:

The Dictionary of Words in the Wild lets us gather examples of public textuality. It is a hypothesis that is still trying to figure out its theory.

The Day of Digital Humanities is an experiment that will happen next week where about 100 digital humanists will blog a day of activities to answer the question, "Just what do digital humanists do?" We will see how it works.

There is the Distributed Proofreading part of Project Gutenberg and a bunch of science and art ones.

My question would be: how can we imagine social research solving the problem Michael points out of scanning documents so that there is something for scholars to work on? If we take into account the political sensitivities that Heng points out, we have an interesting problem akin to the issues we face here in Canada when university linguists try to "save" First Nations languages. How can we help others to digitize cultural materials in their custody? Does it have to be to our standards? The linguistics department here has developed a successful summer school that might be a model, the Canadian Indigenous Languages and Literacy Development Institute.

Geoffrey Rockwell, University of Alberta, http://theoreti.ca

Wikiprofessional
Posted on Mar 11, 2009-08:10pm by GerardM

Have a look at Wikiprofessional. This project brings three things together: literature in PubMed, databases like UniProt, and a wiki with structured data. Combine this with a clever data miner that brings these things together and you will find information that was not available without one of those three parts.

Currently Wikiprofessional is bio-medical and English only. We are working on adding support for Portuguese and Spanish. Have a look at what it does on the Wikipedia article on malaria... you can try it with any other Wikipedia article as well. :)

Thanks, GerardM

"social restoration of images"
Posted on Mar 10, 2009-07:06pm by Cathy Davidson
Ana, that phrase of yours just makes my head spin. That is exactly right, the "social restoration of images." As someone trained in reception theory and history of the book (as it was once called), that concept of a social restoration of images is right on the mark. And "social" means, well, all the things social means.

best Software Tools for linguistic/structural...
Posted on Mar 11, 2009-11:29am by malangela

hello there,

What are the best software tools available in the field of linguistic/structural analysis? In the simplest form, we would like a tool that can be fed with text (e.g. papers, articles, etc.) and that can perform frequency analysis and provide statistical and relational representations of words and other linguistic structures.

In addition, these tools should be able to help with comparative analysis, e.g. between two or more textual objects. We are familiar with visualisation tools such as TextArc and Many Eyes but we know very little about anything else. Could you advise? Thanks!! M.a.f.

One simple tool along the
Posted on Mar 11, 2009-02:28pm by Duane S.

One simple tool along the lines of what you describe is probably not readily available; however, you can likely piece together what you need from some of the following resources ...

 

Check out the TAPoR software page:

http://tapor.ualberta.ca/Resources/TASoftware/

 

See also TextGrid:

http://www.textgrid.de/en.html

 

See also PhiloLine and MONK for text-comparison tools.

http://code.google.com/p/text-pair/

http://monkproject.org/

 

Apart from tools like these I know of several toolkit frameworks like T2K, UIMA, GATE, MorphAdorner, WordHoard etc ... that perform various types of text annotation and analytics including POS Tagging, tokenization, clustering, information extraction, classification, and advanced search and sort.

http://alg.ncsa.uiuc.edu/do/tools/t2k

http://gate.ac.uk/

http://incubator.apache.org/uima/

http://morphadorner.northwestern.edu/

http://wordhoard.northwestern.edu/userman/index.html
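For a sense of what the simplest case looks like without any of the toolkits above, here is a minimal sketch of word-frequency analysis and a two-text comparison using only the Python standard library. The function names `word_counts` and `compare` are illustrative assumptions, the tokenizer assumes plain English text (it keeps only the letters a-z and apostrophes), and it is no substitute for the tagging and statistics the tools listed above provide.

```python
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Lowercase the text, pull out runs of letters/apostrophes, and count each word."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def compare(text_a: str, text_b: str, top: int = 5):
    """List the words whose relative frequency differs most between two texts."""
    ca, cb = word_counts(text_a), word_counts(text_b)
    # Guard against empty texts so the division below never fails.
    na, nb = sum(ca.values()) or 1, sum(cb.values()) or 1
    diffs = {w: ca[w] / na - cb[w] / nb for w in set(ca) | set(cb)}
    return sorted(diffs.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top]


# Two tiny sample texts (made up for the example).
paper_a = "The archive preserves the manuscript and the archive grows."
paper_b = "The database indexes the manuscript."

print(word_counts(paper_a).most_common(3))
for word, diff in compare(paper_a, paper_b):
    print(f"{word:12s} {diff:+.3f}")
```

A positive difference means the word is relatively more frequent in the first text, a negative one that it is more frequent in the second.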

HTH,

-- Duane

 

Text Tools
Posted on Mar 13, 2009-09:18am by malangela

Duane,

this is brilliant,

thank you very much - I will keep you posted on progress if something good emerges!

 

Best

 

marie

Another Tool
Posted on Mar 13, 2009-04:25pm by GeoffreyRockwell

Another tool linguists are fond of is R. http://www.r-project.org/

Geoffrey Rockwell, University of Alberta, http://theoreti.ca

Hello
Posted on Mar 11, 2009-07:06pm by Durova

Hi, Gerard Meijssen gave a heads up that our Wikimedia Foundation work caught the attention of your forum. So I created an account and am saying hello.

The Wikimedia Foundation is best known as the parent nonprofit of Wikipedia, but it actually operates a wide array of educational websites. Two of the ones that would be of interest to this forum are the image hosting site commons.wikimedia.org and the text hosting site wikisource.org.

Gerard and I, along with others, have been working to persuade more museums and archives to digitize their collections in high resolution image formats. Although our current focus is mainly on visual material, we also recognize the value of digitizing rare books and manuscripts. I am an administrator at both Commons and Wikisource.

As an example of the potential this represents, a fellow Wikisource volunteer had been working on George Washington's first state of the union address. He asked if I could help do something to improve the image to accompany the text--it was a yellowed newspaper clipping over 200 years old. Instead of restoring the clipping we located Washington's handwritten notes. http://commons.wikimedia.org/w/index.php?title=File:Washington_-_State_o...

 

If more documents of this sort get digitized globally, the pace of serious research will really improve.

Best regards,

Lise Broer

(Durova)

Big fan here:)
Posted on Mar 11, 2009-10:55pm by Anaventura

Hi Lise a.k.a. Durova:)

Big fan of your work here! How wonderful that Gerard brought you into the discussion! Out of curiosity (and the question is for you and/or Gerard): for the MPWIG set of recommendations on open access to visual media for scholarly use, were Wikimedia members (and I mean specialists such as yourselves) involved? Just curious.

Also, for the folks out there interested in this Wiki-driven last portion of the debate :) check out Gerard Meijssen's OmegaWiki.

Thank you
Posted on Mar 12, 2009-01:38pm by Durova

Yes, Gerard and I often confer about these things.

One thing that's been very persuasive when talking to archives, museums, etc. is to explain the benefits of getting attention on Wikipedia's main page. English Wikipedia's main page receives an average of about 7 million page views per day. Each day the site highlights one featured article, one featured picture, and brief summaries of several more new articles. The circulation of The New York Times is 1 million copies per day, and their website traffic is negligible compared to Wikipedia's.

In these times of smaller budgets, nonprofit organizations are looking for innovative ways to get attention. I can offer a digital restoration on slightly damaged artwork--material they wouldn't otherwise sell as poster reproductions at the gift shop, but still quite encyclopedic. The archive receives free restoration services, perhaps also garnering main page attention at Wikipedia along with credit and source attribution in the form of links to the archive's website. The public benefits from good illustration, and academics such as yourselves get a foot in the door.

The key is to reach a critical mass of awareness among trustees and other management that an open approach works to their benefit. Many of them are frightened of letting a genie out of the bottle, and don't yet realize that the genie would help them.

Inventoriana: Collaborative Indexing, Annotat...
Posted on Mar 13, 2009-11:29am by Angela Kinney

As I was searching for the article I posted in a previous comment, I came across what looks to be an exciting development in collaborative image analysis and manuscript work. Inventoriana is a tool for adding in-image searchable commentary using an interface based on Google Maps.

The creator, Drew Massey, describes the application as "a powerful new way for all kinds of potential users - librarians, curators, scholars, students, scientists, and others - to interact with digitizations and to add value to them through meaningful posting, editing, and exchange of commentary within the images themselves."

Take a look at the general overview (PDF), which includes several samples of how Inventoriana might be used in collaborative editing and tagging projects. Examples in the overview include analysis of two states of a particular etching by Rembrandt, an elaborate medieval liturgical manuscript, and an edition of Charles Ives' Concord Sonata interpreted by John Kirkpatrick (the last example especially interested me as a template for how editions could be compared using this application).

A brief article about Inventoriana.

It looks like use of the product is free for individuals; there is a subscription-based service for institutions.

The most shocking part of this discovery? I went to high school with Drew. I knew he'd end up doing something great - but related to my field? Everyone should be so fortunate (and perhaps, given the scope of the project, everyone is). He possesses a harmony of talents - programming expertise, understanding of paleography/manuscripts, & experience with musicology. If anyone is interested in hearing more about Inventoriana and how it was developed, I can see if he'd be willing to answer questions.

premodern text and internet
Posted on Mar 13, 2009-07:44pm by Ishan

When we look at data on the internet, most of the Web 2.0 type websites encourage us to tag that data. Users generate multiple tags describing the same piece of data; this then allows other users to search for and re-organize data using those tags. Take a look at the music website http://www.last.fm/home for example. What if we were to organize our texts in this way? This could easily be done for all the masses of premodern texts already on the internet (public domain sites, private websites, specialist websites, etc. etc.). Once adequately tagged, any given paragraph of any given text would be mapped onto multiple organizational schemes by those tags, enabling all sorts of research.

Unlike previous ways of sorting data -- even previous digital ways to do so -- tagging is ridiculously easy to do, if we build websites that allow us to tag things. Given that there are many websites on the internet that store premodern texts, adding a tagging feature to all of them seems like the way to go.
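As a rough illustration of why tagging is so easy to build, here is a minimal sketch of the data structure such a site might keep behind the scenes: a two-way index between tags and passages. The class name `TagIndex`, its methods, and the passage identifiers are all hypothetical, chosen only for the example; a real site would also need user accounts, storage, and moderation.

```python
from collections import defaultdict


class TagIndex:
    """Map free-form user tags to passage identifiers, and back again."""

    def __init__(self):
        self.by_tag = defaultdict(set)      # tag -> passage ids
        self.by_passage = defaultdict(set)  # passage id -> tags

    def tag(self, passage_id: str, *tags: str) -> None:
        """Attach one or more tags to a passage."""
        for t in tags:
            t = t.strip().lower()  # normalize so "War" and "war" are one tag
            self.by_tag[t].add(passage_id)
            self.by_passage[passage_id].add(t)

    def find(self, *tags: str) -> set:
        """Return the passages that carry every one of the given tags."""
        sets = [self.by_tag.get(t.strip().lower(), set()) for t in tags]
        return set.intersection(*sets) if sets else set()


# Hypothetical passage ids of the form "text:paragraph".
index = TagIndex()
index.tag("text1:para3", "Pilgrimage", "spring")
index.tag("text1:para9", "pilgrimage", "satire")
index.tag("text2:para1", "satire")

print(sorted(index.find("pilgrimage")))
print(sorted(index.find("pilgrimage", "satire")))
```

Because every tag points back to passages and every passage to its tags, the same index supports both searching by tag and browsing outward from a passage to related ones.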

 

Another frontier of technology is language and translation. Machine translation of all the most commonly spoken languages has long been a reality, and keeps getting better. See Google Translate, Babel Fish, etc. We could easily do this for pre-modern languages (with some funding from someone!).

Now, thinking about the pre-modern past, anyone with experience in classical languages (Latin, Greek, Sanskrit, Arabic, etc.) knows that those languages (of course, by definition, only in writing!) always embody the most prescriptive of grammars, unlike the spoken languages of today. Sanskrit and Arabic even have grammatical traditions that already represent the grammar in abstract terms that resemble modern generative grammar. The grammarians of these two traditions, at least, already created pretty perfect generative grammars. Creating such a grammar is one of the hardest parts of making a good machine translation, and lots of linguists had to work together to make these things possible for our modern languages. What I'm getting at is that producing a translation program for the classical languages should be much easier than it was to produce such programs for modern languages.

It hasn't been done yet, but if we were to do this, it would literally change the way we did all our work. Working with immense bodies of text would be easy: you wouldn't have to read it all and could still easily find what you wanted. No machine translation is ever going to be perfect, so we'd still have to know the languages (so there's no need to get scared of technology eliminating translators and scholars!). But translation would be a lot easier, and much much faster. A very good friend of mine was once toying with a program to translate Sanskrit, so I hope he gets time to finish it someday. The problem seems to be funding and money: a lot of pre-modern studies is not heavily invested in technology; conversely, much of technology seems to not be invested in pre-modern studies.

Take a look at Perseus, a digital Greek/Roman classics library among other things: http://www.perseus.tufts.edu/hopper/ . If you've never seen it before, try the Greek collection (in my opinion it is their best). Perseus has stopped short of machine translation, but it links every single word in every single Greek text to its dictionary entry and even recognizes the grammatical form of the word. Where there are ambiguities, the users get to vote on the correct form. That was not the case in the old Perseus.

If we combined the two approaches -- and there's no reason to think that one day we won't be -- then we'd have an amazing tool. The same website could store texts, translate them, and allow users to tag them. Tags change the way we think about text: they locate specificity within large masses of text, allow for immensely comparative work to be done across all sorts of lines (one could tag by chronology, themes, character names, literary styles, structures, languages, metaphors, imagery, etc. etc.), and eventually, they create a map of the text itself.

Complete Agreement
Posted on Mar 13, 2009-11:28pm by Michael Widner

Ishan, thanks for raising these points. I agree completely and have been having similar discussions about tagging digitized texts with a coder friend of mine. It looks like the tool Angela found, Inventoriana, is pretty close to what we need already, too. There are, in any case, numerous implementations of this technology; it just needs application to manuscripts and journal articles.

I hadn't considered that ancient languages would be easier for the machines to handle, but it makes perfect sense.

By the way, I love last.fm. Here's my library.

Tags are great; some reservations about machi...
Posted on Mar 16, 2009-05:57pm by Angela Kinney

First of all - I think tagged texts would absolutely be useful for scholars and non-scholars alike. I would encourage tagging in the original language, though, if only because a word like the Latin vis has so many English equivalents. Sticking to the original tongue would reduce searching problems introduced by synonyms. Or perhaps two "layers" of tags - one in the original language for those who know it and one in a modern language for those who don't. (In this system, the option of hiding one "layer" of tags would be necessary, I think, to avoid clutter.)

 

I want to share a few thoughts regarding the usefulness of machine translation with regard to ancient languages. However, before I post, I want to be clear that I'm not against technologies like machine translation - just dubious about their ultimate yield for scholars as well as non-academics at this point in time. I have read over my thoughts & they sound so pessimistic! In actuality, that's not how I would describe my attitude toward the technology. My concerns are based on the fact that there is only so much money put toward textual research tools - and I'm not convinced that machine translation for ancient tongues will produce benefits equal to the costs of development.

My comments below are based on my own use of a variety of modern machine translation programs (free & shareware). Also, I can only speak about the specifics of Latin, Greek, & Old English, so perhaps things are different in Sanskrit or other ancient tongues. Preemptive thanks to Amy Oh & Daniel Abosso (also Ph.D. students at the University of Illinois) for exchanging emails with me on this subject. Many of their thoughts are also represented here.

1. Ancient "prescriptive" grammars: If you limit Latin/Greek grammar to the forms & style learned in the first year of study, then perhaps I would agree that the grammar is - at this point - prescriptive enough to be translated easily by machine.

However, Latin/Greek each survived for centuries, undergoing myriad diachronic changes along the way. The corpus of Latin/Greek literature is far from complete, and much of the untranslated material includes text written by people who themselves did not know or heed the rules of "classical" Latin grammar to perfection. They used idiosyncratic spelling (significant in an inflected language), anomalous tenses, creative structure. What would a translation program do with "bad" Latin such as that written by the marginally literate lower classes on curse tablets, or Latin corrupted by centuries of manuscript transmission, or the quasi-Latin intoned by certain Carolingian priests?

Compare the "creative" case usage in the formula rattled off by an unlearned priest: Baptizo te in nomine patria, et filia, et Spiritus Sancta with the correct version: Baptizo te in nomine patris et filii et spiritus sancti.

In my experience, small errors can completely break even a modern language translator (e.g. wrong gender of German adjective).

Anyway, the untranslated stuff out there tends to be imperfect Latin - even Latin mixed with early Romance borrowings/grammar. The language was constantly changing. I don't know as much about how Greek changed after the Koine period (though I imagine Byzantine Greek introduced many changes). Perhaps Greek after Koine enjoys a stability that Latin does not, since it (relatively) seamlessly segues into modern Greek, while colloquial Latin goes to seed, eventually producing the Romance languages. But I am just speculating here - a linguist would know more.

It's probably important to remember that untranslated texts are untranslated for a reason. There haven't been loads of new texts discovered recently. For whatever reason - unpopular genre, bad grammar, corrupted transmission, incomplete, difficult palaeography, etc. - they were neglected or spurned by some of the most productive scholars known in the field. This reason alone tends to exclude texts adhering to perfect Latin/Greek grammar from the list of literature needing translation. There are of course exceptions to what I've just said - but I think remembering the sociology of academic work does help when considering what texts one might wish to feed into a machine translation program.

2. Genre: Genre may make all the difference in whether a machine translation program can handle a text. Most poetic texts would be quite difficult to do well, I think, especially later poetry. The big epics have been translated, so I'm not sure what use machine translation would be there. Nuances of poetic texts (political allusions, underlying sexual connotations, etc.) would be completely lost. History/hagiography - possible, depending on the level of innovation and corruption. Same for some types of dialogues and sermons. Rhetorical prefaces? These are difficult even for scholars to translate into English - those of certain medieval & Renaissance authors are almost opaque because of their overcomplicated artistry.

And so on & so forth. Perhaps within a limited number of genres, machine translation could be useful, but not across all. Generic differences must be taken into account.

3. Reliability: It was pointed out to me that machine translation programs for modern languages exist to help people with tongues that could otherwise be learned through aural/oral means. This can't be done with ancient languages. There will always be some means of double checking the output of a translation program for a modern language like French. There exists no such way to reliably check the output for Latin/Greek - texts have been studied for centuries without consensus on interpretation/translation. So it is my humble opinion that using machine translation to decipher untranslated texts risks completely misconstruing the text if there is no paradigmatic translation existing. And if there is, then why use machine translation?

Surely, the risk of misconstruing the text exists even for machine translation of modern languages. And I'm sorry to say that, given the results I've seen for machine-translated German, French, Spanish, and Russian, I can't imagine the Latin/Greek results will be very intelligible at this point in time. Perhaps one day, but we're quite far off from that point.

Perhaps machine translation could work for inscriptions, since those are very limited in genre and scope, and tend to adhere to more basic rules of grammar. Generally, abbreviations used in inscriptions are fairly standard too, which is not the case in medieval palaeography. (I don't see automatic transcription of Latin manuscripts in the near future, unless such transcription were limited to a few consistent hands or to a specific scriptorium where abbreviations were standardized, because of the multiplicity of abbreviations for meanings and vice versa. There's just too much variation.)

4. Audience: Who would use machine translation programs for ancient languages? At the risk of sounding dogmatic, a classicist should never use such a program for the purposes of scholarship. For the reasons above, to do so would be irresponsible. I venture to say that machine translation would often give only a rough impression of, for lack of a better phrase, "what text X is about." So, the program is really left for particular kinds of historians & archaeologists (and other related fields where the ancient languages may be a one-year or so requirement). [Note: I don't mean to imply that all historians/archaeologists are trained without much concern for languages - but it must be acknowledged that some are.] And who would they rely on for accurate translations? Other people who have had one year of Latin?

Or is this being suggested for non-scholars? I'm not even sure a student reading Tacitus or Aristotle would be able to use a machine translation to check his/her own work...

 

Okay, I'm ready for the inevitable onslaught!

Tags and MT
Posted on Mar 17, 2009-11:37am by Michael Widner

Angela, the need for multi-lingual tags is an interesting idea. How do you see it working differently from, say, the ability to search for particular words in a text?

As for MT issues, you're right to raise the difficulties involved (I especially like the concern for genre), but my intuition is that the machines could do some of the heavy lifting of translation. Then humans would need to come in afterwards and fix things. Perhaps if we had a way of marking where the translation of given passages came from and a caveat to the users about reliability, the usefulness could outweigh the potential pitfalls. You're absolutely right that such translations should not be relied upon for scholarly purposes. Instead, I see it as providing a larger audience with at least some access to texts that may have been completely opaque before. I'm sure scholars would still work on the original text whenever possible. One of the goals of GMAP, though, is to bring a global Middle Ages not just to scholars, teachers, and students, but to the population at large. We have multiple audiences to serve.

South Asia, Holy War
Posted on Mar 13, 2009-08:21pm by Ishan

There's been a lot of recent discussion about Holy War and comparative studies of Holy War. There are classes about such topics, books already in print about this and more books to come. In principle, comparative studies of holy war is a great idea. We run into problems, however, when people work cross-culturally and end up labeling things in traditions with which they are less-familiar as holy war. I myself have seen the Bhagavad Gita labeled as an example of Holy War. Is it?

Well, given that it has gods advising humans, and a war going on, it might well be the case that it can fit into Holy War. But there's also a rather large history, the mainstream history, really, of the Gita being read as an ethical injunction that has nothing to do with Holy War in the sense of an organized violent action against an other (left just as that -- the soul, infidels that are really out there, the abstract idea of the infidel, etc.). At the same time, I'm sure that someone, somewhere, has used the Gita in exactly such a sense, as Holy War.

To say something categorical about the Gita, and any other really important text, seems impossible then. Whereas the Orientalists had no problem doing this, today we stop and say that 'well, there are a multiplicity of interpretations of the Gita,' but we rarely seem to have a sense of what this multiplicity is. We have no idea how the various interpretations are all connected, we have no idea how the multiplicity is assembled, and we can only deal with a few of the interpretations at a time.

Digital text makes assembling the multiplicities of interpretation possible and easy. In addition, working like this ensures that we are always working in the most context-sensitive of ways, and working from the ground up. How could we do this?

 

Start with the text, on a website. If, hypothetically, something like the aforementioned Perseus existed for Sanskrit, each word would already be a link to a dictionary entry and there would be multiple side-by-side translations and footnotes. The non-specialist would be able to work in just the same context-sensitive way as a specialist. Now, with tags, specialists could already have gone through and tagged areas of the text with various things.

The comparative researcher could go through and search to see where enough confluences of religious tags and martial tags occur to justify research on Holy War. Premodern exegetical commentaries of the text could be linked directly to the text itself. Remember that in the pre-modern period, they were linked just like this -- you never separated the text from its commentary, you learned the various commentaries of a text and the text together. The only difference is that back then, you had to memorize it all. And people did. Of course, no one does this anymore. The effect of this has been an artificial separation of exegesis and primary text over the last 200 years of Sanskrit research. Technology actually lets us relink the text to its exegesis. The non-specialist looking at such a text, then, would be able to see a map (by visualizing the network of tags) of the text, its interpretations and other connected texts immediately.
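The search for "confluences of religious tags and martial tags" described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `confluences`, the two tag vocabularies, and the passage identifiers are all hypothetical, and in practice specialists would supply the tags and decide which belong to each group.

```python
def confluences(tagged, group_a, group_b, minimum=1):
    """Return passage ids whose tags include at least `minimum` tags from each group."""
    hits = []
    for passage_id, tags in tagged.items():
        tags = set(tags)
        if len(tags & group_a) >= minimum and len(tags & group_b) >= minimum:
            hits.append(passage_id)
    return sorted(hits)


# Hypothetical tag vocabularies and specialist-tagged passages.
RELIGIOUS = {"deity", "dharma", "devotion"}
MARTIAL = {"battle", "weapon", "warrior"}
tagged = {
    "passage-1": {"deity", "battle", "warrior"},
    "passage-2": {"dharma", "renunciation"},
    "passage-3": {"weapon", "chariot"},
}

print(confluences(tagged, RELIGIOUS, MARTIAL))
```

Raising `minimum` demands a denser overlap before a passage is flagged, which is one crude way of asking whether there are "enough" confluences to justify a closer reading.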

The Gita and Holy War are indeed connected, but not in the ways that have been suggested so far by the non-specialists that I have seen label the Gita as such. Thinking technologically about this would allow such a close level of ground-up analysis that anyone could find the right connections.

Thanks for reading!

Mock Up
Posted on Mar 13, 2009-11:38pm by Michael Widner

I think it would be fairly straightforward to set up a proof of concept of this with some number of the many free transcriptions already available online. While the overlay of tags and commentaries on high-res images of manuscripts would have to wait, a purely textual version should be easy to do. Any suggestions for some of the first texts for the experiment?

That's what, in essence, this Canterbury Tales website attempts, except that it's not dynamic and collaborative. It's clearly been a labor-intensive work, but it seems fairly static and not as expansive as it could be. Librarius has 2007 as their copyright. I wonder if they'd be willing to open source the work they've already done.

I am going to research the necessary software to implement something like this, but with more open participation, and then host it on the web somewhere, either on my own domain or, if I can, on HASTAC or some place similar.

I am loving this discussion!
Posted on Mar 14, 2009-07:36am by Cathy Davidson
This is utterly fascinating. I don't have a lot to contribute except to say you should be in touch with Mark Olson, our Director of New Media, and I'm betting we will be able to host this on HASTAC, with great pride of place. If the details work, it would be an honor for us and we would love to honor you and all your terrific contributions (you individually and you collectively as our really fabulous first year of HASTAC Scholars). Someday (really: I promise!) we will be launching the new website that is far more interactive and handsome and functional and less boxy, with state-of-the-art Drupal programming and all that, and it will make hosting this easier and more appealing. We will, of course, also use our network to get out the word. Think of what a remarkable experiment it would be to have the curated, closed professional Canterbury Tales running on one site and then a collaborative, interactive version on HASTAC and then for us to be able to file continual reports, based on sound scholarly principles, on the two. That would be such a marvelous service to the profession and such a great test of Web 2.0 for humanities scholarship. I've written already about Galaxy Zoo where professional astronomers with the Sloan Digital Sky Survey crowdsource identification of celestial objects and distant galaxies. For Chaucer, we wouldn't have anything like those world numbers but we might surprise ourselves at the response. And we would learn whether we had many or few respondents. I hope we can work this out. Ishan, I hadn't checked out that feature of the Greek dictionary on Perseus and, conceptually, it seems to me like a brilliant compromise between machine-generated and human lexicography, between professional dictionaries and contributive ones. Some version of "mixed reality" scholarship may be the model we end up with.
Galaxy Zoo, by the way, is a version of that kind of mix since one's answers are considered by humans and the patterns that emerge from wrong answers turn out to be what changes paradigms. If many people identify something inaccurately, what feature are the professional astronomers not seeing and that might be relevant? Sometimes the answer is "nothing" or "irrelevant" and sometimes it is a paradigm shift. Thanks again for a very interesting morning's reading. I look forward to more!
Functions Needed and Invitation to Participate
Posted on Mar 14, 2009-11:05pm by Michael Widner

Cathy, thanks for letting me know who to contact. I need to do some more research into existing tools so as to avoid duplicating much work. I suspect we'll need a new interface, at the least, though. In the meantime, I'm thinking about what sorts of functionality such a site would need. I've come up with:

Tags

Definitions

Variant readings

Commentary

Other links

I think tagging and commenting need to be distinct to keep from overloading the tags with too much information. The commentary would allow people to add interpretations, explanations, etc. to selected passages.
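As a rough sketch of how those five functions might hang together, with tags kept separate from commentary, something like the following could serve as a starting data model. The class name, field names, and the sample commentary are all hypothetical; this is not the design of any existing tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class AnnotatedPassage:
    """One passage plus the five kinds of annotation listed above (hypothetical model)."""
    ref: str
    text: str
    tags: Set[str] = field(default_factory=set)                       # short labels only
    definitions: Dict[str, str] = field(default_factory=dict)         # word -> gloss
    variants: List[str] = field(default_factory=list)                 # variant readings
    commentary: List[Tuple[str, str]] = field(default_factory=list)   # (author, body)
    links: List[str] = field(default_factory=list)                    # other links

    def add_tag(self, tag: str) -> None:
        # Normalize so "Spring" and " spring " count as the same tag.
        self.tags.add(tag.strip().lower())

    def add_comment(self, author: str, body: str) -> None:
        # Longer interpretation goes here, never into the tags.
        self.commentary.append((author, body))


passage = AnnotatedPassage(
    ref="General Prologue, line 1",
    text="Whan that Aprill with his shoures soote",
)
passage.add_tag(" Spring ")
passage.add_tag("spring")
passage.definitions["soote"] = "sweet"
passage.add_comment("reader1", "Placeholder comment for illustration.")
print(passage.tags, passage.commentary)
```

Keeping tags as a normalized set and commentary as a list of authored entries is one way to stop the tags from filling up with free-form interpretation.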

What other functions does everyone think would be helpful? I'd also like to invite anyone who's interested to help me work on this project, whether by finding tools, thinking about interfaces and theoretical concerns, finding resources, networking with the appropriate people, or anything else that you can do.

I'm shocked that this hasn't been done. There are a number of sites that provide some of the functionality we've been discussing, like the Perseus Digital Library Ishan told us about, but I have yet to find something that combines the openness of wikis and tags with scholarly apparatus for texts. It's nice to see that the Perseus library provides its software as Open Source. That's probably a good foundation to build on.

Collaborative Rubaiyat
Posted on Mar 15, 2009-11:56am by travis

Hi Michael and Cathy,

You might take a look at the Collaborative Rubaiyat, an installation of our eComma annotation engine that we put together as part of an exhibition at the Harry Ransom Center here at UT Austin.

It presents multiple editions of FitzGerald's translation in a simple diff-like format, allows free tagging (and here, for another example), threaded commentary, word cloud searches, etc.

We're still working on adding functionality and simplifying the installation and setup process, but we're planning to release the software under the AGPL later this semester.

eComma
Posted on Mar 15, 2009-01:11pm by Michael Widner
Thanks, Travis. I just remembered your project earlier this morning. Would you mind giving us some details about the theoretical and usability discussions you've had while developing eComma and future directions you see for it? I know, for instance, from your blog that you're working on a redesign of the interface. Also, what are the technical details of the project? That is, what language did you write it in, what sort of backend does it have, and what, if any, existing tools were you able to adapt?
RE: eComma
Posted on Mar 16, 2009-02:02pm by Angela Kinney

I'd also like to hear more about eComma! I've signed up with an account & have been exploring...

One mundane question: do you foresee any problems with spam accounts or annotation spam? Now that spambots have little trouble registering accounts on forums, etc., I was just wondering what you were planning to combat abuse (or what you're already doing).

this discussion is amazing
Posted on Mar 15, 2009-11:22pm by Ishan

I just read through this whole thing, and I've got to say that the number of good ideas on this page is really quite something. Also, lots of good links and resources!

I'll add a few more to our list:

1.) Wikiversity: http://wikiversity.org/ Though many of the articles are still at a very formative stage, this is a great place for scholars and others to share information. I like that it isn't trying to present anything new, but rather just to collect what we've got, and serve as a good reference site with a focus on disciplinarity. Some classes I've been in recently have encouraged us to post on there and help out. I hope this trend continues. I often hear people say that wiki-type pages are not good sources, but user-defined content means that you can make it a good source if you don't think it is! The more people use these things, the better they get.

2.) Good online dictionaries (South Asian languages only): http://dsal.uchicago.edu/dictionaries/ . Once you learn how to use these dictionaries, you can do a lot more with a digital dictionary than you can with a paper dictionary. For example, I used Platts's Urdu dictionary with the search box "ending in" and selected -na as my ending. Why? All verbs in Urdu end in -na. Result of my search? A very convenient list of all the verbs in Urdu. When displayed as a list of headwords and first-lines of definitions only, I've got a very convenient and quick reference that beats using a paper dictionary any day.

3.) Last thing to say: digital work really enables a different type of research altogether; I'd venture to say that it changes the way we think. We can now think in much more modular ways than we did before, separate parts from wholes, etc. For a long time, in South Asian studies at least, huge texts were privileged as whole texts; yet in S. Asia these texts were never primary as wholes in the way they were for the academic. I work on medieval S. Asian hagiographies. Academics have treated this genre as a given whole, reading the whole texts and isolating each text. When you talk to a handful of devotees at a temple, however, you realize that no one is reading the whole text like this -- people are discussing thematic chunks of texts (for example: how do saints interact with animals?), and pulling together a story from one hagiography and a story from some other genre, based on some unity not given at all in the form of the hagiographies. They are users of the texts. Most premodern manuscripts in literary cultures I am familiar with work like this too - one uses the text. If you don't like a verse, you don't have to copy it. If you want to change a word, you can. Contemporary S. Asian studies would often reconstruct a "critical edition" in this case, but the users of the text are saying something else -- they aren't going 'behind' the text, but going 'forward' with it. Working digitally lets us be users of the text again.
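For anyone curious how the "ending in" search from point 2 works in principle, here is a minimal sketch. It assumes a tiny in-memory list of headwords in simplified ASCII transliteration with one-word glosses; the list and the function name are invented for illustration and have nothing to do with how the DSAL site is actually implemented.

```python
# Hypothetical mini-dictionary: (headword, short gloss).
# Simplified ASCII transliteration, for illustration only.
entries = [
    ("karna", "to do"),
    ("kitab", "book"),
    ("jana", "to go"),
    ("pani", "water"),
    ("likhna", "to write"),
]


def ending_in(dictionary, suffix):
    """Return (headword, gloss) pairs whose headword ends with the suffix, sorted."""
    return sorted((hw, gloss) for hw, gloss in dictionary if hw.endswith(suffix))


verbs = ending_in(entries, "na")
for headword, gloss in verbs:
    print(headword, "-", gloss)
```

Note that the match is purely on spelling, so any noun that happens to end in -na would show up in the list as well and would need to be weeded out by hand.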

Project HESTIA (uses tech mentioned by Gerald...
Posted on Mar 16, 2009-06:04pm by Angela Kinney

A new project caught my eye the other day: Project HESTIA: the Herodotus Encoded Space-Text-Imaging Archive. The initiative aims to examine spatial representation in Herodotus' History and "develop visual tools to capture 'deep' topological structures of the text, extending beyond the usual two-dimensional Cartesian maps of the ancient world." It's a brand-new project, so there's not much more than an outline on the (very nicely designed!*) website. However, I challenge anyone to read the aims stated on the homepage & the project description without getting excited.

I am so eager to see the results...treatments of spatial representation are inherently fascinating to me and this project sets an ambitious goal of mapping both topological and narrative space (in addition to the temporal space, I suppose, though that isn't mentioned explicitly). This type of approach to Herodotus - in which his work may represent "a decentred or multicentred understanding of the Mediterranean world based on relational flow and connectivity" - has not been attempted before, to my knowledge.

In its goals, the initiative reminds me very much of the Global Middle Ages Project & Mappamundi (described in the third post of this thread by Professor Geraldine Heng, leader of both projects). The difference is perhaps that Project HESTIA involves studying one (quasi-comprehensive) text on numerous levels & aiming to map the world represented by this text in all of its deep complexity.

*A well organized project website is rare enough to find these days - sites as beautifully structured & artistic as those of Project HESTIA and the Global Middle Ages Project are a dream come true. A thank you to their respective designers from the bottom of my (irascible) heart!

Mappamundi and Project HESTIA
Posted on Mar 17, 2009-11:46am by Michael Widner

Angela, I immediately thought of Mappamundi, too. I attended a lecture Prof. Heng gave last night, in fact, in which she discussed the vision for the future of Mappamundi. Project HESTIA sounds like a natural partner. Even though it's classical rather than medieval, the concerns for mapping and visualization are so close that it would seem a shame if there weren't, at the very least, some mutual awareness between the projects.

Further, the question of how to represent a different conception of space is a fascinating one. We can't just overlay a Google world map with medieval maps and routes if we want to attempt to capture not only the "real" geography, but also the medieval experience of that geography. Maps raise a new set of questions about how to represent them as digital texts, questions I think we've just barely touched on so far in this discussion.

Wikimedia Commons accepts TIFF files
Posted on Mar 16, 2009-08:13pm by Durova
Good news for open collaboration: Wikimedia Commons began accepting the TIFF file format today. This makes it much easier to collaborate on media files within a wiki environment because TIFF can store images without lossy compression, so files can be edited and re-saved without degrading. http://durova.blogspot.com/2009/03/three-cheers-for-brion-vibber-erik.html
The Princeton Charrette Project
Posted on Mar 18, 2009-03:48pm by Michael Widner
I found out about another digital manuscript project last night. At a dinner I attended, I met Sarah-Jane Murray of Baylor University who described her work on the Princeton Charrette Project. It provides a listing and transcription of the manuscripts of Chrétien de Troyes's Le Chevalier de la Charrette. There is XML for each manuscript, a transcription key with images from the manuscripts, and images of each folio with line numbers superimposed. You can search for words, find poetic devices, and do other such analysis of the manuscripts. So, here's another resource to add to the list that offers some other possibilities for how to deal with manuscripts.