Chapter 15 Detect Lies and Reduce Bias
The goal of data visualization is to encode information into images that capture true and insightful stories. But we’ve warned you to watch out for people who lie with visualizations. Looking back at income inequality examples in the Introduction to this book, we intentionally manipulated charts in Figure 1.1 and Figure 1.2, and maps in Figure 1.3 and Figure 1.4, to demonstrate how the same data can be rearranged to paint very different pictures of reality. Does that mean all data visualizations are right? Definitely not. On closer examination, we declared that the second of the two charts about US income inequality was misleading because it intentionally used an inappropriate scale to hide the truth. But we also confided that the two world maps were equally truthful, even though the US appeared in a darker color (signaling a higher level of inequality) in one map than in the other.
How can two different visualizations be equally right? Our response may conflict with those who prefer to call their work data science, a label that suggests an objective world with only one right answer. Instead, we argue that data visualization is best understood as an interpretative skill, one that still depends on evidence but allows more than one valid portrayal of reality. As you recall, our field has only a few definitive rules about how not to visualize data, which we introduced in Chapter 7 on chart design and Chapter 8 on map design. Rather than a binary world, we argue that visualizations fall into three categories.
First, visualizations are wrong if they misstate the evidence or violate one of these rigid design rules. As examples of the latter, if a bar or column chart begins at a number other than zero, it’s wrong because those types of charts represent values through length or height, which readers cannot determine if the baseline has been truncated. Similarly, if the slices of a pie chart add up to more than 100 percent, it’s wrong because readers cannot accurately interpret the chart, which also incorrectly presents data.
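To make these two rules concrete, the short Python sketch below compares how long two bars are drawn relative to each other when the axis starts at zero versus a higher number, and checks whether pie slices exceed 100 percent. The function names, the sample values, and the half-point rounding tolerance are illustrative assumptions, not part of any charting tool.

```python
def drawn_length_ratio(a, b, baseline=0.0):
    """Ratio of the drawn lengths of two bars for values a and b
    when the value axis starts at `baseline` instead of zero."""
    return (a - baseline) / (b - baseline)


def pie_exceeds_100(percentages, tolerance=0.5):
    """True if pie slices sum to more than 100 percent.
    `tolerance` (an assumed allowance) forgives small rounding error."""
    return sum(percentages) > 100.0 + tolerance


# Hypothetical values: 52 versus 50.
honest = drawn_length_ratio(52, 50)                  # axis starts at zero
truncated = drawn_length_ratio(52, 50, baseline=49)  # axis starts at 49

print(f"zero baseline: first bar drawn {honest:.2f}x as long")
print(f"baseline of 49: first bar drawn {truncated:.2f}x as long")
print("slices 60/30/25 exceed 100 percent:", pie_exceeds_100([60, 30, 25]))
```

With a zero baseline the first bar is drawn only 4 percent longer than the second, which matches the data. Starting the axis at 49 draws it three times as long, so the lengths no longer represent the values.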
Second, visualizations are misleading if they technically follow the design rules, but unreasonably hide or twist the appearance of relevant data. We acknowledge that the word “unreasonably” can be subject to debate here, but we’ll review several examples in this chapter, such as using inappropriate scales or warping the aspect ratio. Inserting this category between wrong and truthful underscores how charts and maps can accurately display data and adhere to design rules, yet misdirect us from the truth, just as a magician knows how to misdirect their audience while performing sleight of hand tricks.
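As a rough illustration of how warping the aspect ratio changes what readers see, the Python sketch below computes the on-screen angle of the same trend line in a wide chart and in a tall one. The function name, the pixel dimensions, and the assumption that the line spans the full horizontal axis and half of the vertical axis are hypothetical choices made for this example.

```python
import math


def apparent_angle(rise_fraction, run_fraction, width_px, height_px):
    """On-screen angle in degrees of a line that covers `rise_fraction`
    of the vertical axis and `run_fraction` of the horizontal axis
    in a plot area of width_px by height_px pixels."""
    rise_px = rise_fraction * height_px
    run_px = run_fraction * width_px
    return math.degrees(math.atan2(rise_px, run_px))


# Same data in both cases: full width, half of the vertical range.
wide = apparent_angle(0.5, 1.0, width_px=800, height_px=200)
tall = apparent_angle(0.5, 1.0, width_px=200, height_px=800)

print(f"wide chart: line rises at about {wide:.0f} degrees")
print(f"tall chart: line rises at about {tall:.0f} degrees")
```

The data and the axis ranges are identical in both cases, yet the wide chart shows a nearly flat line while the tall chart shows a steep climb. Neither version breaks a rigid design rule, which is why this kind of choice falls into the misleading category rather than the wrong one.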
Third, visualizations are truthful if they show accurate data and follow the design rules. Still, there’s a wide spectrum of quality within this category. When looking at two visualizations that are equally valid, sometimes we say that one is better than the other because it illuminates a meaningful data pattern that we did not yet recognize. Or we may say that one is better because it portrays these patterns more beautifully, or with less ink on the page and greater simplicity, than the other. In any case, let’s agree that we’re aiming for truthful visualizations, with a preference for the better side of the quality spectrum.
In this chapter, you’ll learn to sort out differences between the three categories: wrong, misleading, and truthful. The best way to improve your lie detector skills is through hands-on tutorials in the art of data deception, to better understand how to lie with charts and how to lie with maps. As the saying goes, it takes a thief to catch a thief. Learning how to lie not only makes it harder for people to mislead you, but also educates you more deeply about the ethical decisions we make when designing visualizations that tell the truth, while recognizing there’s more than one path to that destination. Finally, we’ll discuss how to recognize and reduce four general categories of data bias (sampling, cognitive, algorithmic, and intergroup), as well as spatial biases that are more specific to working with maps. While we may not be able to stop bias entirely, in this chapter you’ll learn how to identify it in the work of other people, and strategies to reduce its presence in your own visualizations.38
The “how to lie” tutorials were inspired by several excellent works in data visualization: Cairo, The Truthful Art, 2016; Cairo, How Charts Lie, 2019; Darrell Huff, How to Lie with Statistics (W. W. Norton & Company, 1954), http://books.google.com/books?isbn=0393070875; Mark Monmonier, How to Lie with Maps, Third Edition (University of Chicago Press, 2018), https://www.google.com/books/edition/How_to_Lie_with_Maps_Third_Edition/MwdRDwAAQBAJ; Nathan Yau, “How to Spot Visualization Lies” (FlowingData, February 9, 2017), http://flowingdata.com/2017/02/09/how-to-spot-visualization-lies/; NASA JPL, “Educator Guide: Graphing Global Temperature Trends,” 2017, https://www.jpl.nasa.gov/edu/teach/activity/graphing-global-temperature-trends/.