Blog Post

Swine flu and information compression

Bruce Schneier, one of the world's foremost experts on cryptography and information security, has a great blog with tidbits from around the Internet. Last week he posted this gem, a hat tip to a gentleman who calculated the number of bits of entropy present in the genetic code of the H1N1 swine flu virus. The number quoted (25 kilobits or 3.2 kilobytes of data) sparked an incredible discussion in the comments beneath the post, both about the virus itself and about the nontrivial problem of representing genetic "code" in computer-science terms. In reality, a very small amount of information is processed in complex, multi-layered ways to draw out much more information than the four-base combinations might initially seem to signify.
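For anyone who wants to see where a figure like that comes from, here is a rough sketch of the arithmetic. It assumes the simplest possible encoding, where each of the four bases (A, C, G, T) costs log2(4) = 2 bits, and it also computes the Shannon entropy per base from the observed base frequencies. The function names and the toy sequence are made up for illustration; this is not the actual H1N1 genome, and not necessarily the method used in the calculation Schneier linked to.

```python
import math
from collections import Counter

# Assumption: four equally possible bases, so log2(4) = 2 bits per base.
BITS_PER_BASE = math.log2(4)


def raw_bits(sequence: str) -> float:
    """Size of the sequence at a fixed 2 bits per base."""
    return len(sequence) * BITS_PER_BASE


def shannon_bits_per_base(sequence: str) -> float:
    """Shannon entropy per base, estimated from observed base frequencies."""
    counts = Counter(sequence)
    total = len(sequence)
    return abs(-sum((n / total) * math.log2(n / total) for n in counts.values()))


# Toy sequence for illustration only, not real viral data.
toy = "ACGTACGTAAAA"
print(raw_bits(toy), "bits at 2 bits per base")
print(round(shannon_bits_per_base(toy), 3), "bits per base by frequency")
```

The frequency-based figure can never exceed 2 bits per base, and it drops as the base mix gets more lopsided. Neither number says anything about what the cell's machinery goes on to do with the sequence, which is the point the commenters were arguing about.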

The bigger problem is how to compare informational representations. I remember a computer book I read as a little kid telling me that your average CD could hold 72 minutes of music, or 600 copies of the Bible (text), or 19 hours of speech, or...many different things, all of them in different formats and all of them representing "information" in some way or form. The saying "a picture is worth a thousand words" comes to mind here. But a JPEG file is only recognizable as a picture by a human being, who may or may not draw the interpretation or meaning that the creator intended; it's also just a collection of bits. That same number of bits could be used to represent a text file, which again would only be human-readable and subject to different interpretations. A picture of a man holding a rose has different connotations for a viewer than the text "a man holding a rose", which will generate an infinite number of mental images, none of which will have the clarity of the aforementioned picture. Which is "more" data?
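To make the "just a collection of bits" point concrete, here is a small sketch that takes one fixed run of bytes and reads it three different ways. The variable names are mine, and treating the byte values as grayscale pixel intensities is only an assumed interpretation for the sake of the example.

```python
# One fixed payload of bytes, read three different ways.
payload = b"a man holding a rose"

as_text = payload.decode("ascii")            # read as characters
as_numbers = list(payload)                   # read as values 0-255 (say, pixel intensities)
as_integer = int.from_bytes(payload, "big")  # read as one very large integer

print(len(payload) * 8, "bits in every case")
print(as_text)
print(as_numbers[:5])
print(as_integer)
```

The bit count is identical under all three readings; only the convention applied by the reader changes, which is why "more data" is hard to pin down by size alone.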


1 comment

Really interesting, Harrison. How folks interpret the representation gets into the idea of user-centered design. Representing knowledge should be done in a way that considers the user of the representation, yes. There's a lot of discussion about how library catalogs, being representations of knowledge in the form of taxonomies, have shaped the way people conceive of the knowledge the library contains. So that's the complement to what you're saying, in a way: the representation can influence the user's interpretation of what is being represented. You might be interested in Simon Spero's work (at UNC).

Also tangentially related to this is the work I'm doing with Stephanie Haas, looking at communication between public health workers in NC on responding to and managing H1N1. It's a social network analysis, so we will use sociograms to represent the way this network works. (Just coincidental that it's also about H1N1.) Sociograms have been deemed a good way to represent social roles and connections. Sociograms are one type of model, as are some of the systems analysis models you and I often talk about.

I love the whole notion of information/knowledge representation.