Bruce Schneier, one of the world's foremost experts on cryptography and information security, has a great blog with tidbits from around the Internet. Last week he posted this gem (link: http://www.schneier.com/blog/archives/2009/09/hacking_swine_f.html), a hat tip to a gentleman who calculated the number of bits of entropy present in the genetic code of the H1N1 swine flu virus. The number quoted (25 kilobits or 3.2 kilobytes of data) sparked an incredible discussion in the comments beneath the post, both about the virus itself and about the nontrivial problem of representing genetic "code" in computer-science terms. In reality, a very small amount of information is processed in complex, multi-layered ways that draw out far more information than the four-base combinations alone would suggest.
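For the curious, here is a rough sketch of how a back-of-the-envelope count like that might be done. I'm assuming the simplest possible model: four bases at 2 bits each as an upper bound, plus a zeroth-order Shannon entropy based on how often each base appears. The function names and the toy sequence are my own inventions for illustration; this is not the method used in the linked post, and the sequence is not real H1N1 data.

```python
import math
from collections import Counter


def naive_bits(seq):
    """Upper bound: four possible bases, so 2 bits per base."""
    return 2 * len(seq)


def empirical_entropy_bits(seq):
    """Zeroth-order Shannon entropy of the base frequencies, times length.

    This ignores all structure between neighboring bases, so it is only
    a crude estimate of the information content.
    """
    n = len(seq)
    counts = Counter(seq)
    per_base = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return per_base * n


# Made-up toy sequence, purely for illustration (not real viral data).
toy = "ACGTACGTAAAA"
print(naive_bits(toy), "bits at 2 bits per base")
print(round(empirical_entropy_bits(toy), 2), "bits by base frequency")
```

The frequency-based figure can never exceed the 2-bits-per-base figure, and it drops as the sequence becomes more lopsided. Neither number says anything about what the cell actually does with the sequence, which is exactly the point the commenters were arguing about.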
The bigger problem is how to compare representations of information. As a little kid, I remember reading in one of my books about computers that your average CD could hold 72 minutes of music, or 600 copies of the Bible (text), or 19 hours of speech, or...many different things, all of them in different formats and all of them representing "information" in some way or form. The saying "a picture is worth a thousand words" comes to mind here. But a JPEG file is only recognizable as a picture by a human being, who may or may not draw the interpretation or meaning that the creator intended; it's also just a collection of bits. That same number of bits could be used to represent a text file, which again would only be human-readable and subject to different interpretations. A picture of a man holding a rose has different connotations for a viewer than the text "a man holding a rose", which will generate an endless variety of mental images, none of which will have the clarity of the aforementioned picture. Which is "more" data?
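To make the "just a collection of bits" point concrete, here is a small sketch in which the same bytes are read three different ways. The variable names and the choice of interpretations are mine, picked only for illustration; nothing about the bytes themselves says which reading is the "right" one.

```python
# The same bytes, read three different ways.
data = "a man holding a rose".encode("ascii")

as_text = data.decode("ascii")           # read as characters
as_pixels = list(data)                   # read as grayscale values, 0 to 255
as_number = int.from_bytes(data, "big")  # read as one very large integer

print(len(data) * 8, "bits in every case")
print(as_text)
print(as_pixels)
print(as_number)
```

The bit count is identical for all three readings. What changes is the convention the reader brings to it, which is why counting bits alone can't settle which representation carries "more" data.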