Blog Post

Digital Databases

I first became aware of the possibilities of digital databases for historical research when using the Samuel J. May Anti-Slavery Collection to establish the reception and impact of a particular text, Emancipation in the West Indies by James A. Thome and J. Horace Kimball. A full-text word search instantly called up the minutes of the American Anti-Slavery Society, the public addresses and private correspondence of many prominent abolitionists like William Lloyd Garrison and Theodore Dwight Weld, and the letters of James G. Birney and other politicians. Although I also used non-digitized archival materials and hard copies of multivolume edited primary source collections, this research experience stood in stark contrast to the painstaking work of historians of an earlier generation, like Gilbert Hobbs Barnes, who traveled to dozens of archives in order to construct an argument about the impact of antislavery literature like Emancipation in the West Indies.[1]

This example, and countless others like it, points to some of the many changes that the digitization of archival materials and the introduction of optical character recognition (OCR) have wrought on the study of the past. Today, the largest and best-known digital databases are Google Books, the HathiTrust Digital Library, the Internet Archive, and Open Library. The largest collections of digitized materials to date are in English, though Google Books is digitizing materials in at least seven other languages, and other major projects are digitizing materials in Latin and French, for example.[2] Tools such as the Google Books Ngram Viewer and Bookworm have made it possible to analyze the frequency of particular words over time.

Accessibility has been and will remain a major concern with digital archives. While the above collections and digitization initiatives like the Library of Congress’ American Memory Project or the University of Michigan’s Making of America are open access, many other collections remain accessible only through paid subscription or institutional affiliation. Some of the largest early curated collections – like the Early American Imprints series, Early English Books Online, or Eighteenth-Century Collections Online – and some smaller, more tailored collections – like Alexander Street’s Black Thought and Culture or its North American Immigrant Letters, Diaries, and Oral Histories database – still require subscription access. Similarly, the Gale Slavery and Anti-Slavery transnational archive, one of the largest of its kind, is available only through paid subscription. Many universities are moving to make their collections open access, however. The Harvard University Library Open Collections Program, for example, provides such databases as Immigration to the United States, 1789-1930, and digital collections like Harvard’s Latin American Pamphlets are also openly accessible.[3] Similarly, Brown University’s Latin American Travelogues is completely free and open to all, and it is part of a broader collaboration with the Open Library project mentioned above.

In addition to collections of digitized archival material, there is an ever-increasing number of newspaper databases. The same issues of accessibility apply here. Collections such as America’s Historical Newspapers (including early newspapers, 1690-1922, African American Newspapers, 1827-1998, Ethnic American Newspapers, 1799-1971, and Hispanic American Newspapers, 1808-1980) and 19th Century Newspapers require paid access, while others, like NewsLibrary, provide free access. Many historians have also long been accustomed to using the powerful digitized archives of government and legal historical documents provided by companies like LexisNexis or ProQuest, but we have yet to see how much of this material will become freely available.

Although I have focused these introductory remarks on digital archives containing mostly primary source texts, we might also discuss the rapid proliferation of audio and visual archives. The online version of the Alan Lomax Collection of folk music, for example, premiered this year.[4] And it is not just primary sources but also scholarly commentary that is increasingly going digital. The University of Wisconsin-Madison’s Havens Center Audio Archive, for example, contains podcasts of talks by many renowned historians, including Thavolia Glymph’s most recently posted “Turned into the Streets: Black Women and Children Refugees in the Civil War.”

So, with that brief and necessarily selective introduction to some of the digital databases that are out there (apologies for also skewing the survey toward my own particular research interests in U.S. and Latin American history), we may want to discuss some of the new methodologies of historical practice these resources enable as well as the questions they inevitably raise.

One example is the way in which these databases open up possibilities for doing concept history, and the history of ideas in the field of intellectual history more generally. Diverse works of history might broadly be thought of as charting the history of particular words, ideas, or concepts. These include Raymond Williams’ Keywords: A Vocabulary of Culture and Society (1976), Daniel Rodgers’ Contested Truths: Keywords in American Politics Since Independence (1987), Eric Foner’s The Story of American Freedom (1998), Walter Mignolo’s The Idea of Latin America (2006), Andrew Sartori’s Bengal in Global Concept History: Culturalism in the Age of Capital (2008), and João Feres Júnior’s The Concept of Latin America in the United States (2010). Beyond such works, there is also a well-developed field of concept history. It takes its inspiration from the work of Reinhart Koselleck and others, originally begun in Germany, with similar national projects in France, Finland, the Netherlands, Denmark, Sweden, Italy, and Spain.[5] The journal Contributions to the History of Concepts has become the flagship publication for this research, and the most exciting projects currently underway seek to write a transnational history of concepts.[6] As David Armitage has recently pointed out, digital databases may help to bridge the gap between diachronic and synchronic approaches to intellectual history.[7] To me, the question of where to properly locate conceptual change remains open and should be productively contested. What is clear is that digital databases allow historians to analyze previously unimaginable amounts of material in previously impossible ways. (For those interested in reading more, here is an in-progress review essay I am working on about the field of concept history in relation to intellectual history more generally.)

Some potential questions for discussion:

-       What other fields of historical inquiry is the proliferation of digital databases rapidly transforming, and with what benefits and consequences?

-       How have other people been using digitized databases in their own work and what possibilities and problems has this raised?

-       In what ways do digitized databases transform our pedagogy as well as our research?


[1] See Gilbert Hobbs Barnes, The Anti-Slavery Impulse, 1830-1844 (1933; 1964). See also Barnes and Dwight L. Dumond, eds., Letters of Theodore Dwight Weld, Angelina Grimké Weld and Sarah Grimké (2 vols., 1934; 1965); and Dumond, ed., Letters of James Gillespie Birney, 1831-1857 (2 vols., 1938), both of which have since been digitized.

[3] For a selection of other web-accessible Harvard collections, see http://digitalcollections.harvard.edu/.

[4] Larry Rohter, “Folklorist’s Global Jukebox Goes Digital,” (New York Times, Jan. 30, 2012), accessed 11/18/12, http://www.nytimes.com/2012/01/31/arts/music/the-alan-lomax-collection-from-the-american-folklife-center.html?pagewanted=all; Michael Martin, “Major U.S. Folk Music Archive Makes Online Debut,” (National Public Radio, May 9, 2012), accessed 11/18/12, http://www.npr.org/2012/05/09/152341534/major-us-folk-music-archive-makes-online-debut.

[5] See Otto Brunner, Werner Conze, and Reinhart Koselleck, eds., Geschichtliche Grundbegriffe: Historisches Lexikon zur politisch-sozialen Sprache in Deutschland, 8 vols. (Stuttgart, 1972-1990).

[6] See Contributions to the History of Concepts, http://journals.berghahnbooks.com/choc/; and the Iberconceptos project, http://iberconceptos.net

[7] See David Armitage, “What’s the Big Idea? Intellectual History and the Longue Durée,” (History of European Ideas, 2012). See also Martin J. Burke, “Conceptual History in the United States: a Missing ‘National Project,’ ” (Contributions to the History of Concepts, 2005); and Hartmut Lehmann and Melvin Richter, eds., The Meaning of Historical Terms and Concepts: New Studies on Begriffsgeschichte (1996).

______________________________

Some suggestions of further reading:

Armitage, David. “What’s the Big Idea? Intellectual History and the Longue Durée,” (History of European Ideas, 2012).

Bamman, David, and David Smith. “Extracting Two Thousand Years of Latin from a Million Book Library,” (Journal on Computing and Cultural Heritage, 2012).

Burke, Martin J. “Conceptual History in the United States: a Missing ‘National Project,’ ” (Contributions to the History of Concepts, 2005).

Feres Júnior, João. The Concept of Latin America in the United States (2010).

Fernández Sebastián, Javier, and others. Diccionario Político y Social Iberoamericano: Conceptos políticos en la era de las Independencias, 1750-1850, Vol. 1 (Centro de Estudios Políticos y Constitucionales, 2008-).

Foner, Eric. The Story of American Freedom (New York: W.W. Norton, 1998).

Ifversen, Jan. “About Key Concepts and How to Study Them,” (Contributions to the History of Concepts, 2011).

Koselleck, Reinhart. The Practice of Conceptual History: Timing History, Spacing Concepts, trans. Todd Samuel Presner (Stanford University Press, 2002).

--------. Futures Past: On the Semantics of Historical Time, trans. Keith Tribe (Columbia University Press, 2004).

--------. “Introduction and Preface to the Geschichtliche Grundbegriffe,” (Contributions to the History of Concepts, 2011).

Michel, Jean-Baptiste, and others. “Quantitative Analysis of Culture Using Millions of Digitized Books,” (Science, 2011).

Mignolo, Walter. The Idea of Latin America (2006).

Rodgers, Daniel T. “Republicanism: The Career of a Concept,” Journal of American History 79, no. 1 (1992): 11–38.

--------. Contested Truths: Keywords in American Politics Since Independence (New York, 1987).

Sartori, Andrew. Bengal in Global Concept History: Culturalism in the Age of Capital (University of Chicago Press, 2008).

Williams, Raymond. Keywords: A Vocabulary of Culture and Society (1976).

18 comments

Another type of digital database I'd throw into the mix is collections of born-digital artifacts. I recently heard a talk by Dan Cohen in which he showed the Center for History and New Media's September 11 Archive, a collection of all kinds of born-digital remembrances and artifacts of 9/11/01. He mentioned that people who are not historians have been using the database for non-historical purposes, such as mapping cell phone usage and analyzing teen slang.

What that example, as well as your many excellent examples, shows us is that we should feel free to approach these huge databases with unorthodox questions, perhaps questions that tell us more about the culture than about our specific historical subject. 

Digital databases of texts also allow a breadth of inquiry that only digitization makes possible. I, for one, have used America's Historical Newspapers to look at public interest in the navy frigate Philadelphia, a task that would have been impossible without keyword searches. (I will say, though, that researching a ship named the Philadelphia is not all fun and games. A keyword search for the term Philadelphia is naturally problematic when you're looking for a ship and not a city. There was no uniform convention for referring to ships, not even the prefix USS, so I had to get creative with my search keywords. Because of that difficulty, I'm sure I didn't see every newspaper article that referenced the Philadelphia, since I skipped over ones that appeared to be only about the city. The problems of keyword searching, though, don't negate the time saved by not doing all the searches by hand.)

 

That is a great point about born-digital artifacts, Abby!

This also raises pressing questions about the possibilities and problems of archiving born-digital materials, like emails, produced by the kinds of institutions whose records historians have traditionally relied on (especially governmental and nongovernmental organizations, businesses, and corporations). This is already raising issues with the Freedom of Information Act and struggles over novel types of classification and declassification.

On Wednesday I heard a talk by Jo Guldi on “The Long Land War: A Global History of Land Reform, 1860-Present.” She explained how she uses Zotero and the plug-in “Paper Machines” to deal with the overwhelming amount of paper generated by bureaucracies like the UN’s Food and Agriculture Organization. She also mentioned a growing community of radical archivists who are working on the front lines of born-digital preservation, including on the digital archive being put together by the Occupy Movement. To add to the considerations of accessibility, then, it seems to me that issues of surveillance also come to the fore. The police, the FBI, the CIA, and so forth can readily tap these digital archives, which are being built for other purposes.

Kirsten Weld, who commented on Guldi’s paper, also raised questions about archival preservation of materials pertaining to crimes against humanity when the documentation is in danger of deteriorating before it becomes declassified, pointing to the struggles of activists and radical archivists to just get in there and preserve it in digital form. She has a forthcoming book called Paper Cadavers: The Archives of Dictatorship in Guatemala that folks should check out if they are interested (and a related article on the recently discovered police archives in Guatemala in the Radical History Review entitled “Dignifying the Guerrillero, Not the Assassin: Rewriting a History of Criminal Subversion in Postwar Guatemala.”).

You've made a lot of great points regarding the role of digital databases in transforming the study of history. 

It is also true that many people still prefer print and hard copies to e-resources. I believe this is because the physicality of a manuscript, archive, or book makes for a much more exciting research experience than a scanned copy online.

I spoke to a few professors in various fields of history, particularly East Asian history. Some told me that they love reading original manuscripts and their annotations, mainly for what they reveal about the writer's thought process: how he or she revised ideas over time.

Since digital copies can't really convey the density of the ink on screen, the originals can better reveal what was first conceived, what was revised, and the ideas underpinning the texts.

In any case, I do see a trend of visitors changing their research methods to include online databases. Of course, many of them (particularly students) always seem to start with Google and treat it as the ultimate "database."

There are major benefits for historians in having access to digital databases in today's world. However, as you point out, running a website and storing digital information involves a lot of costs. Many institutions that run digital databases pass this cost on to the reader by making access subscription-based. This limits accessibility, as some people cannot afford it or are simply put off by it. One way some databases have overcome this is by having advertisements cover the cost instead. Although this does seem to affect the perception of credibility, it does allow for open access.

In my experience, the most successful databases are the ones that are completely open access and carry no advertisements. Their running costs are usually covered by institutions such as universities and government ministries. One example is a website called Papers Past, which has been useful to me in previous studies.

http://paperspast.natlib.govt.nz/cgi-bin/paperspast

 

I think it is a great thing that historical documents and artefacts are being put online, not just for historians but for other people who may be interested in them, for example for family history. But, as you have pointed out, the problem is that many databases charge for access.

I tried using a few databases when I was doing a family tree and wanted to know more about my ancestors. This was made quite difficult, though, because many of the ones that had the information required a subscription. I didn't want to pay monthly just to look at a few documents that may or may not relate to my ancestors.

Still, as you have pointed out, it does cost money to digitise these things; it's not cheap to put them online. Then there are the databases that are free and open access, though I have found that some of these can be questionable and you need to check the information. You should also be suspicious of some of the ones that charge, as there are a lot of scams online.

The best ones I have found are those created by governments, libraries, and universities. Some of them charge and some don't; it depends on the database and its funding.

I do agree that it can be annoying to pay for access to a website. But given the cost and the need for constant revision, don’t you think that this is fair? Most online subscriptions are not overly dear, and they allow convenient 24/7 access. I see a modest subscription payment as a fair trade-off compared with paying a lot more money to access the records in person in the country of origin. Take genealogy sites: if you wanted to find records of English ancestors that had not been digitized, you would have to go to England to get that information. I would much rather pay $160 for a year's subscription than the alternative.

However, I do believe that a monopolization effect can occur when big corporations hold the vast majority of sources and force you to pay. This is what I don’t like about it. Just like you (and I’m sure most others), I would prefer free access, which is usually government funded. But I will pay, if needed, to access the information I need.

I agree with you that accessing records online is cheaper than going to the country of origin. As you have stated, with genealogy you would otherwise have to go to the country that holds the records, so in that way it is the better option. But there is also the problem that not all documents have been digitized; only a small portion has been put online. I don't want to pay for access when the site does not even have what I am after. I know it takes time, but if you're paying and they have yet to make the material available, you almost feel you're being taken advantage of. I think the best approach is to check a few things before you pay. What information does the site have? Is it what you want? Is there another site that offers it for free?

That said, a few of the paid sites I've visited offer a free trial period. If you don't like the site, the trick is to make sure you cancel before they charge you, which I think could catch a few people out. http://www.ancestry.com.au/ is one example. UK heritage records alone cost $21.95 a month, and again you have to remember to cancel, as I don't know many people who will need to use it that much. Once you have the information you don't need the site, and that could be within a week, depending on what you are looking for. Not many people will need a long-term contract. With most databases it doesn't take long to find what you want, after which you have no use for the site and cancel. By then it has already cost you for the month or the year, whichever subscription you chose. For it to be fair, I think it needs to be free, though there are other options that are free. It all depends on what you want: some sites will have it and others won't. The problem, to me, is that some cost money.

I'm on the same page as Kendal in terms of the barriers to accessing online information about family history. History online is a great medium for archiving, researching, and gathering information for groups and individuals, but the cost barrier is a big one in my opinion. When researching aspects of my own family tree, I was a bit aghast at the prices of some of these databases. They have the POTENTIAL to connect you with others who may or may not have sources and information relevant to you. Some databases like Ancestry.com are arguably legitimate. Even so, I find it a little distasteful that they want to charge me money for the chance to share information with and connect to my own family members. By the same token, if other family members are already paying, they have access to my information.

I'm all for free online databases, like the maorilandonline.co.nz website, for example. I found a wealth of knowledge about my grandparents' and great-grandparents' land interests, some of which my family and I did not even know about! It didn't cost me a cent, and unlike some databases it didn't require me to sign up to access any of this information.

 

Kendal:

You make a good point about the cost of accessing history online. I agree with you that “it is a great thing that historical document and artifacts are being put online for not just historians but other people who may be interested in looking at them.” But on the question of cost, I have a different view.

If you want to use a database, you may be charged by the website or organization. Maybe you just want to look at a few files, but you still need to log in with an ID and pay a small fee the first time. But if the website is trustworthy, I would be happy to pay a little money to get a really good answer to my question. As we can see, databases make it easier for us to find what we want and save us time, and, as we know, good things don't come cheap. Why should they spend lots of money only to share valuable results, important information, and data with the public for free? Some of the public may just be looking for fun, not for study or teaching. That seems unfair and unsafe to me.

The companies or organizations set up these databases through their own work, so I think they should be paid back for that hard work. Maybe universities and government ministries could set up foundations to keep those websites running, but I still think they should charge for visits.

I see your point, and I agree with you up to a point. I agree that they do a lot of work to get things digitized and put the information online. It takes time and money, not to mention keeping it constantly updated with correct information. But I disagree that they should charge if they set up a foundation. I think some websites should receive funding, because I know a lot of people who will choose a free database over a paid one purely because they can't afford it. I think that's dangerous, as you can come across some very wrong and dodgy databases. So if they can get funding and make their sites free, people won't need to do that. Still, I do understand why they have to charge, and I can accept their reasoning. I wish they didn't have to, but I do understand it.

DigitalDatabase

I think digital databases are very useful for scholars and students. Resources like Google Books, the HathiTrust Digital Library, the Internet Archive, and Open Library offer really good information and books for doing research, shared freely and with no time limits. For example, finance students cannot write their reports without online databases, and media studies students also need databases for their research. That is really useful, but there are also some issues we need to think about. How can we make sure the information is right and correct? How can we control copyright online?

Shanfa, I agree with your thoughts about digital databases. They are very useful for students and scholars, as they help cut research time.

One way to make sure the information is right and correct is to assess the credibility of the database. Some websites are volatile because they let anyone edit the content; an example of this would be wikipedia.org. Other databases are likely to be correct based on the reputation of the institution that runs them. An example of a credible database is http://archives.govt.nz/. This website is trustworthy because it was built and is maintained by the Department of Internal Affairs New Zealand. They take many steps to ensure the content is reliable and reputable. If there are any issues regarding the content, they also have a complaints procedure so they can take the appropriate action. Since Archives NZ is funded and maintained by a government department, it abides by all copyright laws. These factors make it easy to assess the credibility of this website and be confident that its information is correct.

Your question in regards to copyright is an important one though and we do need to think about this when accessing digital databases.

Anthony:

Thank you for pointing that out, and yes, you are right: we can screen a website before we use it, and this is important. But sometimes we don't have many choices online. There are so many websites that we can only pick out a few famous ones, and some websites still look like the real thing but are not. So I think that in the future, technology may find a way to fix that, which would be very helpful.

I agree with you, Shanfa, about how useful digital databases are for students and scholars. They have made information much easier to access for study and data gathering. Having said that, there is the downside that some of the information is wrong or inaccurate. Take Wikipedia, where anyone with internet access can change an entry. Some people change things to be funny, and others are simply incorrect. But as historians we are taught to doubt everything and to check, which makes us lucky, as we question everything. Unfortunately, not everyone is taught this, and some people take things at face value and assume they are right.

So the problem, as you have pointed out, is making sure the information is correct. Anthony, you are right that one way is for people to check a database's credibility: to see how it is operated and how the information being put in is controlled. This is one way to solve that problem. But there is also the issue of what they may not show, since they control what information goes on the site. Some only put on what they think people will want to look for and ignore other information because of it. Still, I understand that it takes a lot of work to put so much data online, and that you have to keep adapting to internet technology, which is constantly changing.

Copyright is a massive problem online. Many people think that copyright does not apply online and that if something is out in cyberspace, it's free. But that's not the case; copyright applies online as well. Realistically, though, I can't see how this can be solved. People have tried, but it's just not possible to monitor it. The only thing I can think of is to educate people and hope they have enough respect for other people's work not to take advantage of it.

 

I'm on board with what Kendal is saying here about relying on copyright to manage people's behaviour with regard to databases. It is difficult to use it as a check and balance all the time. You really do just have to educate people and hope that they respect the work of others.

I have to disagree with Anthony that, just because it is a government department, it will adhere to ALL copyright laws. It is difficult for most people to adhere to them (or to have adequate knowledge of them), let alone a government department.

The other thing about copyright law in NZ is that proceedings for infringement of someone's copyright have to be brought BY the creator of the copyright work. If you are just an average person noticing someone else's infringement, there is no 'authority' to take it to.

Noting copyright is a courtesy, but I would think there is a need for a stronger way to assess credibility, etc.

Skye, you make some interesting points. I did not know that New Zealand law requires the creator to bring proceedings. That should not be the case, and maybe the law needs to change. Having said that, how would you monitor copyright? Still, there needs to be somewhere you can report that copyrighted work has been taken. I know it's not the most important thing to focus on in terms of law-breaking. But it is people's work, and it's pretty despicable that others are stealing it and not acknowledging them.

Hey Kendal,

Yup, NZ law requires that the creator of the copyright work bring proceedings against whoever has infringed their rights. It's the same with monitoring copyright: it's essentially up to the creator to do that. You don't really know what's going on with your 'works' until you see someone infringing them. Works protected by copyright fall under one of the following headings: literary works, dramatic works, artistic works, sound recordings, films, communication works, and typographical arrangements of published works. For infringement to occur at all, the whole work or a substantial part of it has to be used; if not, infringement is questionable. There are also a number of acts that don't amount to infringement, e.g. using a work for private study, or use by the government. But I agree it's pretty bad that people can steal your work and get away with it. I also totally agree that parts of the law need to be looked at again in this country.

Hi Benjamin,

This was definitely an interesting read; thanks for sharing. I just wanted to touch on one of your suggested questions for discussion, regarding the ways in which digitized databases transform pedagogy as well as research. I noticed that you are a former high school teacher. I have almost finished studying to become a history teacher myself and would like to expand on this idea. In New Zealand, the senior-level history curriculum strongly pushes teachers to include research or the incorporation of digital records in their pedagogy. This push stems from the ‘key competencies’, or what we want students to get out of education in a holistic sense (see here: http://nzcurriculum.tki.org.nz/The-New-Zealand-Curriculum/Key-competencies). Basically, the curriculum calls for students to become effective users of and contributors to their chosen field of future professional pursuit. This in turn suggests they need to be fully equipped with the knowledge to access and use the research tools available. However, I do feel that we have not got it completely right yet, and I am concerned about the quality and reliability of things found online. Students can be quite lazy, which breeds passivity. They will select the first thing that comes up on Google without giving a second thought to authenticity, authorship, or reliability. Digital databases of varying kinds have allowed this change to occur in the high school setting.

I do believe, however, that digital databases have improved research and pedagogy at the university/tertiary level. Databases like ProQuest, JSTOR, and Project MUSE have absolutely improved my learning and research throughout my studies. I have no idea how I would have managed without them, as they make research so much faster and more efficient. This has transformed pedagogy in this setting in a huge way. It has given students more power in terms of efficiency, breadth of information, and the amount of research they can perform.

Therefore, I do think that age and scholarship play a huge role in the use of digital records and archives.  Those who have studied are able to identify quality records more effectively than younger audiences and the untrained eye.  I hope to teach this skill to my high school students, to give them an idea of how to access and use these databases correctly.
