Blog Post

TEI Headers and their Meaning


The TEI header has five primary parts: the file description (<fileDesc>), encoding description (<encodingDesc>), text profile (<profileDesc>), container element (<xenoData>), and revision history (<revisionDesc>). The file description contains a full bibliographic description of the file, and the encoding description documents how the text was transcribed, analyzed, and encoded. The text profile classifies the text and gives it context. The container element holds non-TEI metadata, and the revision history, as one might expect, records the changes made to the document.
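To make the five parts easier to picture, here is a minimal sketch of a header that uses all of them. Everything in square brackets is a placeholder, the date on the change entry is hypothetical, and the Dublin Core record inside the container element is just one assumed example of non-TEI metadata. None of this is copied from the file discussed in this post.

```xml
<teiHeader xmlns="http://www.tei-c.org/ns/1.0">

  <!-- 1. File description: bibliographic description of the electronic file -->
  <fileDesc>
    <titleStmt>
      <title>[Title of the document]</title>
    </titleStmt>
    <publicationStmt>
      <distributor>[Who distributes the file]</distributor>
    </publicationStmt>
    <sourceDesc>
      <p>[Where the text came from]</p>
    </sourceDesc>
  </fileDesc>

  <!-- 2. Encoding description: how the text was transcribed and encoded -->
  <encodingDesc>
    <p>[Notes on transcription and editorial decisions]</p>
  </encodingDesc>

  <!-- 3. Text profile: classification and context, such as the language -->
  <profileDesc>
    <langUsage>
      <language ident="en">English</language>
    </langUsage>
  </profileDesc>

  <!-- 4. Container element: metadata from a non-TEI scheme (assumed: Dublin Core) -->
  <xenoData>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description>
        <dc:title>[Title of the document]</dc:title>
      </rdf:Description>
    </rdf:RDF>
  </xenoData>

  <!-- 5. Revision history: one change entry per revision (hypothetical date) -->
  <revisionDesc>
    <change when="2000-01-01">[What was changed]</change>
  </revisionDesc>

</teiHeader>
```

Only the file description is required. The other four parts are optional, which is why a header can legitimately leave out the container element and the revision history.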

The file description in the provided file contains the title of the document and names the distributor (Dartmouth), with Professor Leuner's email address as the contact. It also contains a license for using the work, along with bibliographic information about the document and its publication. The encoding description states that the text was keyed from a scanned image on Google Books, and the text profile specifies that the document is in English. A container element is not used, and there is no revision history, which makes sense because we are just beginning to work on the document.

The header describes and provides metadata for the information contained in the body. Without the header, you lose the context for the document. You do not know how the document was created, whether editorial changes were made, or what else was involved when the document was encoded. The header does not need to be lengthy, but the shorter it is, the less of this context it preserves.

There are often multiple possible syntaxes that produce the same or nearly the same result: for example, using several <p> elements versus using a single <p> with line breaks inside it. Are there general community standards for choices like this, or is it usually project-specific? If it is project-specific, doesn’t this make some encodings more difficult to read than others?
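To make the question concrete, here are the two options side by side. This is a fragment rather than a complete document, the sample text is invented for illustration, and I am assuming the line breaks would be marked with the TEI <lb/> element.

```xml
<!-- Option A: a separate <p> for each line of the source -->
<p>First line of the passage,</p>
<p>second line of the passage.</p>

<!-- Option B: a single <p>, with <lb/> marking where each new line begins -->
<p>First line of the passage,
<lb/>second line of the passage.</p>
```

The two can look alike once rendered, but they say different things about the text: Option A claims there are two paragraphs, while Option B claims there is one paragraph that happens to be split across two lines.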



1 comment

Hi Kevin,

You tapped into one of the fundamental traits and debates of the TEI: the fact that there are lots of ways to tag or mark up the same thing. This flexibility is a great advantage in terms of having options, because, as we discussed, the tags you choose have big implications for the meaning you assign to aspects of a text. However, it also means that different projects might accomplish the same tagging task in different ways. To establish a kind of "best practice" for the community to follow, the TEI is not only a set of guidelines but also a consortium: a community of people who use the TEI and think about how to make it fit the needs of humanities projects as the work evolves. The TEI community (and I consider myself to be part of this) discusses our questions:

So, yes, there are some general community standards, but it often requires a little bit of discussion to ascertain how other projects are handling a given tagging scenario. Editors can then decide if that approach will serve their needs, or if they need to forge a new path using the guidelines. When "best practice" seems to deviate from what the guidelines say, the TEI Technical Council meets and makes changes to the guidelines so that they reflect how the community uses the TEI. Does this answer your question? Great work, by the way!