Information entropy – ah, hmmm, huh?
A friend of mine recently reviewed a book chapter for me, in which I examined what lies behind the concept of information overload. She asked why I’d chosen not to touch on information entropy. My answer was simple and somewhere along the lines of: “Duh?”
In the physics lab, “entropy” is used to describe certain states in thermodynamics. I’m no physicist, so bear with me on this one; the lab rats have been doing their best to explain things to me. So, in lay terms, entropy is used to describe:
- Energy that is no longer available (an example of this would be a car where the brakes have been applied and where energy has been lost in road friction / heat).
- The amount of disorder or randomness in a system. Gas, as it whooshes about, being more random / disordered than a solid. (Or a group of adults who get up from the dinner table on New Year’s Eve and start dancing to Jeff Beck and Hi Ho Silver Lining being more random than the same group when sitting and eating.)
Okay, that’s the end of Thermodynamics 101.
But there’s also Information Entropy. This is very different, but you need to know about the physics one (entropy as it features in the second law of thermodynamics) so you can ignore it completely (for the time being).
Anyway, you can trace Information Entropy back to the 1940s and Claude E. Shannon (1916-2001), known as the father of modern digital communications and information theory. His paper, “A Mathematical Theory of Communication” (1948, Bell System Technical Journal), looked at the engineering challenges involved in getting a message from one point to another.
The information content of a message, he theorized, could be reduced to the number of 1s and 0s it took to transmit it. This idea was gradually adopted by communications engineers and stimulated the technology which led to the binary language that underpins the digital information age. Shannon’s paper also brought the term “bit” for a binary digit to a wide audience, crediting the coinage to his colleague John W. Tukey.
Shannon Entropy sits within Information Theory, the mathematical discipline that looks at how information is stored, transmitted and reproduced. It measures the uncertainty in a message source, accounting for the number of possible outcomes and how likely each one is, e.g. flipping a fair coin (2 sides) has less entropy than rolling a fair die (6 sides). While Shannon Entropy strictly applies to the minimum amount of binary code required, on average, to transmit a message from A to B, it is also being deployed by non-mathematicians as a way of showing how much information is unequivocally captured within a message (its meaning to the recipient). Shannon himself didn’t get sidetracked by the semantic value (language comprehension and connotation) of the message, just the engineering challenge of transmitting it from A to B intact. In fact, the application of entropy to wider semantic issues of meaning hacked Shannon off quite a bit, apparently.
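For anyone who wants to see the coin and the die side by side, here is a minimal sketch in Python. It assumes a fair coin and a fair die, and the function name shannon_entropy and the loaded-coin probabilities are my own illustrations rather than anything from Shannon's paper. The formula is the standard one: add up p × log2(1/p) over every possible outcome.

```python
import math


def shannon_entropy(probabilities):
    """Return the Shannon entropy, in bits, of a discrete distribution."""
    # Outcomes with zero probability contribute nothing, so skip them.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


coin = [1 / 2] * 2        # fair coin: two equally likely sides
die = [1 / 6] * 6         # fair die: six equally likely faces
loaded_coin = [0.9, 0.1]  # hypothetical biased coin, for comparison

print(f"fair coin:   {shannon_entropy(coin):.3f} bits")
print(f"fair die:    {shannon_entropy(die):.3f} bits")
print(f"loaded coin: {shannon_entropy(loaded_coin):.3f} bits")
```

The fair coin comes out at exactly 1 bit and the die at about 2.585 bits (log2 of 6). The loaded coin scores lower than the fair one because its outcome is more predictable, so each flip tells you less.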
Time for a joke I think…
Back in the days before email (way, way back), people used to send messages via telegram. Such communications were expensive and often charged by the word, so people became very economical with their phraseology. This was particularly evident among professionals who used telegrams regularly – i.e. journalists.
Back in the 1960s a journalist sent a telegram to the home of veteran Hollywood star Cary Grant. It was a simple question, in theory, designed to establish the actor’s exact age. The telegram read: “How old Cary Grant.” The reply that came back was: “Old Cary Grant fine. How you?” The joke, I believe, establishes the potential difference between the minimum character / bit count for information delivery and the minimum required for accurate message comprehension / connotation. It would have been worth paying for the “is”.
You can also argue, well, I do, that the journalist was applying data compression: the minimum number of words / bits required to convey the information. The fact that the journo failed shouldn’t prevent us acknowledging that they tried. You can equally argue, well, I do, that the problem wasn’t in the data compression but in its decompression by Cary Grant, in what was probably a very knowing attempt to sidestep the sensitive subject of age.
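In engineering terms the telegram was a lossy compression: the “is” was thrown away and the recipient had to guess it back. Lossless compression makes no such gamble. The sketch below uses Python’s standard zlib module to show the round trip; the repeated message is a made-up example, chosen because repetition is exactly the kind of redundancy a compressor can squeeze out.

```python
import zlib

# A deliberately repetitive message (hypothetical example text).
message = ("How old is Cary Grant? " * 20).encode("utf-8")

packed = zlib.compress(message)     # fewer bytes to send or store
unpacked = zlib.decompress(packed)  # exact reconstruction at the far end

print(f"original:   {len(message)} bytes")
print(f"compressed: {len(packed)} bytes")
print(f"round trip intact: {unpacked == message}")
```

Unlike the telegram, what comes out is byte-for-byte what went in. A single short sentence, on the other hand, may not shrink at all, because the compressor adds a few bytes of overhead of its own.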
Data compression is useful because it reduces the space needed for information transmission and storage. But, at the level of language and message comprehension and connotation, we’re also trying to apply reduction (compression) techniques so that we can dispense meaning in the minimum space / time possible. On one level this may be a practical desire to reduce issues around “information overload”, but that doesn’t explain the phenomenal success of Twitter, where the 140 character limit is almost winsome. Data compression at a semantic level is becoming more important if we believe that one key to resolving information overload is to reduce the amount of information people have to deal with. I have an alternative view about this, which relates to how we feel about information and was the subject of a recent survey on this blog. But I’ll save that for another day.
Okay – back into the physics lab
You remember I told you to forget all about the second law of thermodynamics for a bit? Now’s the time to start thinking about it again. What happened with Information Entropy was actually a bit of a hijack. The mathematicians kinda stole the word entropy and messed with its meaning a bit, on the basis that most of the population wouldn’t notice or understand. But there are aspects of thermodynamic entropy that are interestingly applicable to information and how it becomes more random / disordered as changes take place. In thermodynamics the classic example involves the ordered structure of sugar crystals compared with the disordered / random nature of sugar dissolved in water.
If you think about information and how it changes, it’s remarkably like the sugar dissolved in water. Over time, different bits of information get de-structured and mixed with other bits. It can become impossible to disentangle this information and restore it to the order of its original components. Looked at one way, this could result in knowledge: high quality information brought together, some bits lost / discarded along the way, but resulting in something different but useful. (A negative outcome is also entirely possible, where poor information is brought together, resulting in dissatisfaction and misinformation.)
This makes for a slightly more refined version of the basic knowledge pyramid, which CDA used as the starting point for its Hierarchy of Mutuality and which is loosely modelled on Maslow’s Hierarchy of Needs*.
* Maslow argued that human beings required basic needs to be met in a hierarchy before they were free to realise themselves creatively and intellectually.
Maslow’s Hierarchy of Needs
Knowledge Pyramid
» CDA’s Hierarchy of Mutuality
The question is, where are we going with all this? CDA is currently actively engaged in developing measurement systems for online engagement. We believe that these have to be a mixture of qualitative and quantitative data to be truly meaningful, and that there comes a point where you have to park interpretation of the metrics (dwell times, page views, bounce rates) and simply ask “How was it for you?”
Contribute to the debate
I’m currently working on a second part to the article above, which will also cover The Triangle of Truth (thanks Clodagh). I’d be interested in any feedback on the argument so far.
» Email me at the lab cdacontentlab@webwordsworking.co.uk









