
Text without an identity


The TEI header has five main parts, each of which contains smaller parts. According to the TEI Guidelines:

<fileDesc> contains a full bibliographic description of an electronic file.

<encodingDesc> documents the relationship between an electronic text and the source or sources from which it was derived.

<profileDesc> provides a detailed description of non-bibliographic aspects of a text, specifically the languages and sublanguages used, the situation in which it was produced, the participants and their setting.

<xenoData> (non-TEI metadata) provides a container element into which metadata in non-TEI formats may be placed.

<revisionDesc> (revision description) summarizes the revision history for a file.
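To make the five parts concrete, here is a minimal sketch in Python that checks which of them appear in a header. The sample header is hypothetical and made up for illustration (it is not the header of the text discussed in this post), and the helper name present_parts is my own. It assumes the header uses the standard TEI namespace.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace; the sample below is assumed to use it.
TEI_NS = "http://www.tei-c.org/ns/1.0"

# The five main parts of the header, in the order listed above.
HEADER_PARTS = ["fileDesc", "encodingDesc", "profileDesc", "xenoData", "revisionDesc"]

# Hypothetical sample header, invented for illustration only.
SAMPLE = """<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <fileDesc>
    <titleStmt>
      <title>An Example Title</title>
      <author>Example Author</author>
    </titleStmt>
    <publicationStmt>
      <p>Scanned copy, partially proofread.</p>
    </publicationStmt>
    <sourceDesc>
      <p>Transcribed from a print edition.</p>
    </sourceDesc>
  </fileDesc>
  <profileDesc>
    <langUsage>
      <language ident="en">English</language>
    </langUsage>
  </profileDesc>
</teiHeader>"""


def present_parts(header_xml):
    """Return a dict mapping each main header part to True if it is a direct child of the header."""
    root = ET.fromstring(header_xml)
    found = {}
    for name in HEADER_PARTS:
        # Namespaced lookup, e.g. {http://www.tei-c.org/ns/1.0}fileDesc
        found[name] = root.find(f"{{{TEI_NS}}}{name}") is not None
    return found


parts = present_parts(SAMPLE)
for name, present in parts.items():
    print(f"{name}: {'present' if present else 'absent'}")
```

In this sample only <fileDesc> and <profileDesc> are filled in, so the other three parts are reported as absent.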

These parts tell us the basic data, such as author, title, and publication distributor, but also the more digital aspects of the item itself, such as its paid availability online, the publication statements regarding its status on Google Books, and its language. From the header I confirmed something I had mostly guessed from looking at the page: the electronic text file was scanned into Google Books and partially proofread for accuracy.

I see the relationship between these two items, the header and the text, as the division between a book cover and its pages. A cover carries publication data, title, author, and editor. The header also contains less obvious metadata, such as the publication statement, which is like the cover material or the antiquity factor: things that aren't specifically laid out but are necessary for the whole picture and visualization of the book.

The header's length is necessary mostly because its format provides organization through its white space and indentation. It's not dense with content, but it does take up a good amount of the page; this is just what comes with XML. Another language might represent the data without needing so much space, but then it wouldn't be as readable by a person. It may also be that the data just isn't filled in, as only one of the tags, <fileDesc>, is mandatory, and the other tags could simply be left with nothing in them. If you leave the header out completely, you may still get that data later in the body, but it's like missing the title of the book. If you look for the author, you know it's right there on the cover. Otherwise it could be scattered throughout the book. The fact that the text is in electronic form already takes away many aspects of a book, such as smell and feel. Taking away the header would distance the text even more from a tangible book, or even a book-esque item. Without the background details, it's just text without an identity.

 

 
