Edward Tufte has been hailed as the 'da Vinci of data' by the New York Times, and with good reason. He has spent much time researching and communicating best practices for presenting complex datasets, with some counter-intuitive suggestions at times. While his work is heavily applicable to presentation of statistical data in general, some insights can be gleaned and re-purposed for a new age of infographics and visualizations.
If his entire body of work were to be summarized in a single catchphrase (or two), it would be: show as much data as possible, and let the data speak for itself. Philosophically, he counsels treating the audience as if they were intelligent (see what I meant by counter-intuitive?) and treating the data - all of it, including the important bits, the favourable bits and the not-so-favourable bits - as sacrosanct.
In a tip of the hat to would-be visualizers, he thinks of graphic representation of data as "intelligence made visible" - which puts the onus on the presenter to make the data accessible and understandable, while not "dumbing it down". What does that mean in practice though? Tufte's take is that data helps decision-making - analytical thinking often calls for decisions to be made based on evidence. One way in which data can help decision-making is by demonstrating *how* things work - the mechanism, the tradeoffs, the process & dynamics, or cause & effect. From this, visualizers can take away one clear goal: demonstrate the deviations in the data that serve the analytical task at hand. In Tufte's view, whenever data is presented in order to inform, initiate dialogue or invoke action, intervention-thinking and policy-thinking demand causality-thinking.
But this is emphatically not a license for cherry-picking or otherwise misrepresenting data - a big no-no. It is a disservice to the audience and it is dishonesty, pure and simple.
Using the example of Dr John Snow, who traced a sudden string of cholera deaths in a localized area of London in the mid-19th century to a contaminated water pump, Tufte gleans the following heuristics for communicating data in a sincere and useful way:
- Place data in an appropriate context for assessing cause and effect.
- Make quantitative comparisons - this is what graphical representation is best at.
- Consider & present alternative explanations & exceptions. (Trust the data, and trust the audience.)
- Present the uncertainties and all efforts taken to minimize them.
Tufte is famous for coining terms like "chartjunk" (all presentational elements that do not directly contribute to comprehending the data, such as gridlines, faux-3D and trendlines), "data-ink ratio" (ink used for representing data / total ink used in the plot, which he wants kept high) and "1+1=3" (chartjunk in close proximity has a tendency to confuse more than clarify). As is probably obvious, he is a proponent of presenting as much data as possible in a minimal, no-frills style. Take heart, fellow visualizers: this isn't to say that infographics can't be pretty, just that good design is rarely surface veneer.
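To make the ratio concrete, here is a minimal sketch in Python. It assumes "ink" can be measured in some arbitrary unit such as pixel counts. The function name `data_ink_ratio` and the numbers are hypothetical illustrations, not anything Tufte prescribes.

```python
def data_ink_ratio(data_ink: float, total_ink: float) -> float:
    """Return the share of a graphic's ink that represents data (0.0 to 1.0)."""
    if total_ink <= 0:
        raise ValueError("total_ink must be positive")
    if not 0 <= data_ink <= total_ink:
        raise ValueError("data_ink must lie between 0 and total_ink")
    return data_ink / total_ink


# Hypothetical ink budgets (arbitrary units) for the same chart,
# before and after stripping gridlines and a faux-3D effect.
before = data_ink_ratio(data_ink=300, total_ink=1200)
after = data_ink_ratio(data_ink=300, total_ink=400)
print(f"before: {before:.2f}, after: {after:.2f}")
```

The data ink stays the same in both cases. Only the non-data ink shrinks, which is why the ratio climbs from 0.25 to 0.75 without a single data point being removed.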
Tufte, a statistician at heart, does not recommend graphic representation where it is unnecessary. A relevant quote by Ad Reinhardt: "if a picture isn't worth a thousand words, to hell with it!"
So that's Tufte's philosophy as it pertains to data visualization. Here's a link to the cliffnotes of Tufte's famous one-day "boot camp" for more tips on presenting statistical data, and presenting in general.