I just read this piece and thought some of you would enjoy it too: "The inevitable messiness of digital metadata" by David Weinberger of Harvard's Berkman Center and Library Innovation Lab (and a Cluetrain co-author, for fellow geeks who remember the pre-Web 2.0 days). He writes in response to an op-ed by Neil Jeffries in Wikipedia's Signpost.
I have one of those brains that really likes categories. So on first reading, I felt a stab of shock at the proposal to give up on the idea of singular, overarching data standards. But at the same time I feel completely overwhelmed when I even attempt to contemplate how such a standard would actually work. Some of the challenges also seem to overlap with the issues of digital badges in education.
I wonder what some of you, especially those doing technical digital humanities work or developing digital badge systems, think of this approach. How important are semantic data standards to your work? Are you making up your own data structures, or do you depend on interoperability?
Read David Weinberger's Harvard post below.
Neil sets out two basic assumptions: (1) Data has meaning only within context; (2) We are not going to agree on a single metadata standard. In fact, we could connect those two points: Contexts of meaning are so dependent on the discipline and the user's project and standpoint that it is unlikely that a single metadata standard could suffice. In any case, the proliferation of standards is simply a fact of life at this point.
Given those constraints, he asks, what's the best way to increase the interoperability of the knowledge and data that are accumulating online at a pace that provokes extremes of anxiety and joy in equal measure? He sees a useful consensus emerging on three points: (a) There are some common and basic types of data across almost all aggregations. (b) There is increasing agreement that these data types have some simple, common properties that suffice to identify them and to give us humans an idea about whether we want to delve deeper. (c) Aggregations themselves are useful for organizing data, even when they are loose webs rather than tight hierarchies.
Neil then proposes RDF and linked data as appropriate ways to capture the very important relationships among ideas, pointing to the Semantic MediaWiki as a model. But, he says, we need to capture additional metadata that qualifies the data, including who made the assertion, links to differences of scholarly opinion, omissions from the collection, and the quality of the evidence. "Rather than always aiming for objective statements of truth we need to realise that a large amount of knowledge is derived via inference from a limited and imperfect evidence base, especially in the humanities," he says. "Thus we should aim to accurately represent the state of knowledge about a topic, including omissions, uncertainty and differences of opinion."
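To make the idea of qualified assertions a little more concrete, here is a rough sketch in Python. It is not taken from Neil's op-ed or from any RDF library: the Assertion class, its field names, the evidence labels, and the sample records are all hypothetical, chosen only to show how a claim can carry its source and evidence quality, and how competing claims can be kept side by side instead of being collapsed into one "true" value.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """One claim plus the metadata that qualifies it (all fields hypothetical)."""
    subject: str
    predicate: str
    obj: str
    asserted_by: str   # who made the claim
    evidence: str      # illustrative labels: "strong", "weak", "inferred"


def state_of_knowledge(assertions):
    """Group claims by (subject, predicate) so differing opinions stay together."""
    grouped = defaultdict(list)
    for a in assertions:
        grouped[(a.subject, a.predicate)].append(a)
    return dict(grouped)


def contested(assertions):
    """Return the (subject, predicate) pairs where sources give different values."""
    return sorted(
        key
        for key, claims in state_of_knowledge(assertions).items()
        if len({c.obj for c in claims}) > 1
    )


# Made-up sample data: two scholars disagree about authorship.
claims = [
    Assertion("letter-17", "written_by", "Author A", "Scholar X", "inferred"),
    Assertion("letter-17", "written_by", "Author B", "Scholar Y", "weak"),
    Assertion("letter-17", "held_at", "Archive Z", "Catalogue", "strong"),
]

for subject, predicate in contested(claims):
    print(f"Disputed: {subject} {predicate}")
    for c in state_of_knowledge(claims)[(subject, predicate)]:
        print(f"  {c.obj} (per {c.asserted_by}, evidence: {c.evidence})")
```

The point of the sketch is only that disagreement is stored as data rather than resolved away; a real linked-data system would express the same thing with its own vocabulary for provenance and uncertainty.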
Neil's proposals have the strengths of acknowledging the imperfection of any attempt to represent knowledge, and of recognizing that the value of representing knowledge lies mainly in its being linked to its sources, its context, its controversies, and to other disciplines. It seems to me that such a system would not only have tremendous pragmatic advantages; for all its messiness and lack of coherence, it is in fact a more accurate representation of knowledge than a system that is fully neatened up and nailed down. That is, messiness is not only the price we pay for scaling knowledge aggressively and collaboratively; it is a property of networked knowledge itself.