Chapter 1 Introduction
In this book, we will learn how to create data visualizations through a series of design principles and step-by-step tutorials, in order to make your information-based analysis and writing more insightful and compelling. Just as sentences become more persuasive with supporting evidence and source notes, your data-driven arguments can become more powerful when accompanied by appropriate tables, charts, or maps. While words tell us stories, visualizations show us data stories by transforming quantitative, relational, or spatial patterns into images. Data visualizations work best when they draw our attention to highly meaningful patterns that would be difficult to communicate through text alone, yet jump out at our eyes in pictures.
Our book features a growing number of free and easy-to-learn digital tools for creating data visualizations. Unlike infographics, which are typically created as one-time artwork, visualizations can be reused and modified multiple times by updating the underlying data. We also focus on creating interactive data visualizations that you can embed on your website. While static visualizations can be distributed on paper or PDF documents, interactive visualizations will engage wider audiences on the internet by encouraging them to interact with the data, explore patterns that interest them, and download files if desired.
But the remarkable growth of data visualizations and the internet in recent decades also poses a serious problem. Today we encounter more words, numbers, and images than at any prior point in our lives, but much of it is not trustworthy. The “information age” has devolved into an “age of disinformation.” When presented with conflicting stories, who do you believe? How do you make wise decisions about what information to trust? In particular, for this book about data visualization, what types of evidence persuade you, and why?
For example, how can you tell whether or not we—the authors of this book—are lying to you?
Let’s start with a simple one-sentence statement:
Claim 1. Economic inequality has sharply risen in the United States since the 1970s.
Do you believe this claim—or not? Perhaps you’ve never thought about the topic in this particular way before now (and if so, it’s time to wake up). It’s possible your response depends on whether this statement blends in with your prior beliefs, or pushes against them. Or perhaps you’ve been taught to be skeptical of claims lacking supporting evidence (and if so, thank your teachers).
So let’s move on to a more complex two-sentence statement that also cites a source:
Claim 2. In 1970, the top 10 percent of US adults received an average income of about $135,000 in today’s dollars, compared to the bottom 50 percent who earned around $16,500. This inequality gap grew sharply over the next five decades, as the top tier income climbed to about $350,000, while the bottom half barely moved to about $19,000, according to the World Inequality Database.1
Is this second claim more believable than the first one? It now makes a more descriptive claim by defining economic inequality in terms of average income for the upper 10 percent versus the bottom 50 percent over time. Also, this sentence pins its claims to a specific source, and invites us to read further by following the footnote. But how do these factors influence its persuasiveness? Does the wording of the claim make you wonder about the 40 percent of the population in between the two extremes? Or, what about the authority of the source? Who is responsible for creating the World Inequality Database, and do you trust them?
To answer some of those questions, let’s supplement the second claim with a bit more information, as shown in Table 1.1.
|Year||Top 10 Percent||Middle 40 Percent||Bottom 50 Percent|
Note: Shown in constant 2019 US dollars. National income for individuals aged 20 and over, prior to taxes and transfers, but includes pension contributions and distributions. Source: World Inequality Database 2020
Does Table 1.1 make Claim 2 more persuasive? Since the table contains essentially the same information as the two sentences about top and bottom income levels, it shouldn’t make any difference. But the table communicates the evidence more effectively, and makes a more compelling case. For many people, it’s easier to read and grasp the relationship between numbers when they’re organized in a grid, rather than complex sentences. As your eyes skim down the columns, you automatically notice the huge jump in income for the top 10 percent, which nearly tripled over time, while the bottom 50 percent barely budged. In addition, the table fills in more information that was missing from the text about the middle 40 percent, whose income grew over time, but not nearly as much as the top tier. Furthermore, the note at the bottom of the table adds a bit more context about how the data is “shown in constant 2019 US dollars,” which means that the 1970s numbers were adjusted to account for changes to the cost of living and purchasing power of dollars over a half-century. The note also briefly mentions other terms used by the World Inequality Database to calculate income (such as taxes, transfers, and pensions), though you would need to consult the source for clearer definitions. Social scientists use different methods to measure income inequality, but generally report findings similar to those shown here.2
Now let’s substitute a data visualization—specifically the line chart in Figure 1.1—in place of the table, to compare which one is more persuasive.
Is Figure 1.1 more persuasive than Table 1.1? Since the line chart contains the same historical start and stop points as the table, it should not make any difference. But the line chart also communicates a powerful, visualized story about income gaps that grabs our attention more effectively than the table. As your eyes follow the colored lines horizontally across the page, the widening inequality between the top versus the middle and bottom tiers is striking. The chart also packs so much granular information into one image. Looking closely, you also notice how the top-tier income level was relatively stable during the 1970s, then spiked upward from the 1980s to the present, far distant from the other lines. Meanwhile, as the middle-tier income line rose slightly over time, the lowest-tier slightly declined, meaning that the bottom half of the population has less purchasing power in 2019 than they did at their peak in the early 2000s. The rich got richer, and the poor got poorer, as they saying goes. But the chart reveals how rapidly those riches grew, and how poverty became even worse, over time.
What’s going on? If Figure 1.2 contains the same data as Figure 1.1, why do they look so different? What happened to the striking growth in inequality gaps, which now seem to be smoothed away? Did the crisis suddenly disappear? Was it a hoax?
Although the chart in Figure 1.2 is technically accurate, it misleads readers. Look closely at the labels in the vertical axis. The distance between the first and second figures ($1,000 to $10,000) is the same as the distance between the second and the third ($10,000 to $100,000), but those jumps represent very different amounts of money ($9,000 versus $90,000). That’s because this chart was constructed with a logarithmic scale, which is most appropriate for showing exponential growth. You may recall seeing logarithmic scales during the early months of the Covid pandemic, when they were appropriately used to illustrate very high growth rates, which are difficult to display with a traditional linear scale. This second chart is technically accurate, because the data points and scale labels match up, but it’s misleading because there is no good reason to interpret this income data using a logarithmic scale, other than to disguise the crisis. We have the ability to illuminate the truth with charts, but we can also use charts to lie.
Let’s expand our analysis of income inequality beyond national borders. Here’s a new claim that introduces comparative evidence and its source. Unlike the prior US examples that showed historical data for three income tiers, this global example focuses on the most current year of data available for the top 1 percent in each nation. Also, instead of measuring income in US dollars, this international comparison measures the percentage share of the national income received by the top 1 percent. In other words, how large a slice of the pie goes to the richest 1 percent in each nation?
Claim 3. Income inequality is more severe in the United States, where the richest 1 percent of the population currently receives 20 percent of the national income. By contrast, in most European nations the richest 1 percent receives a smaller share, ranging between 6 to 15 percent of the national income.3
Following the same train of thought above, let’s supplement this claim with a visualization to evaluate its persuasiveness. While we could create a table or a chart, those would not be the most effective ways to quickly display data for all of the nations featured in our claim, and we also want to encourage readers to explore information around the globe. Since this is spatial data, let’s transform it into a map, as shown in Figure 1.3.
Is Figure 1.3 more persuasive than Claim 3? While the map and the text present the same data about income inequality in the US versus Europe, there should be no difference. But the map pulls you into a powerful story that vividly illustrates gaps between the rich and poor, similar to the chart example above. Colors in the map signal a crisis. The US and some other nations stand out in dark red, at the highest level of the legend, where the top 1 percent of the population receives 20 percent or more of the national income. By contrast, as your eye floats across the Atlantic, nearly all of the European nations appear in lighter beige and orange colors, showing that their top-tier receive a smaller share of the national income, typically between 5 and 15 percent. Income inequality burns red in the US, while Europe appears calm.
Why does the second map in Figure 1.4 look different than the first map in Figure 1.3? The US is now shaded in a lighter purple color, similar to that of Europe. Did the inequality crisis simply fade away from the US? Which map tells the truth?
This time, neither map is misleading. Both make valid interpretations of the data with reasonable design choices, even though they create very different impressions in our eyes. To understand why, look closely at the legend in the bottom-left corner of both maps. The first map sorts nations in three ranges (0-10, 10-20, 20+), while the second map uses slightly different range cutoffs (0-9, 9-21, 21+). Since the US share is 20.5 percent, in the first map it falls into the top bucket with the darkest color, but in the second map it falls into the middle bucket with a lighter color. Yet both ranges are equally valid, as there is no clear rule about using one versus the other in this case. In the same way, the first map displays three shades of red, while the second map uses purple. Based on design principles discussed later in the book, both are equally valid color choices.
To summarize, people can lie with charts and maps, and some design choices can intentionally mislead. But data visualization is also interpretive and open-ended. It’s common to create more than one picture of the truth.
That poses a challenge for us as authors of this book. Data visualizations do not always represent the truth. Like words, they can be manipulated to mislead your audience. To help you create true and meaningful charts and maps, we’ll point you toward principles of good design, encourage thoughtful habits of mind, and in some cases specifically tell you what not to do. But data visualization can be a slippery subject, sometimes more art than science. Often there is no single correct answer to a visualization problem, but rather several plausible ones, each with their own strengths and weaknesses. As a reader of this book who’s learning about data visualization, your job is to search for good answers without necessarily expecting to find the one right answer, especially as visualization tools and methods continue to evolve. Also, keep your guard up and continue to watch out for data visualizations that lie.
World Inequality Database, “Income Inequality, USA, 1913-2019,” 2020, https://wid.world/share/#0/countrytimeseries/aptinc_p50p90_z;aptinc_p90p100_z;aptinc_p0p50_z/US/2015/kk/k/x/yearly/a/false/0/400000/curve/false.↩︎
The World Inequality Database builds on the work of economists Thomas Piketty, Emmanuel Saez, and their colleagues, who have constructed US historical income data based not only on self-reported surveys, but also large samples of tax returns submitted to the Internal Revenue Service. See WID methods at World Inequality Database, “Methodology,” WID - World Inequality Database, 2020, https://wid.world/methodology/. See overview of methodological approaches in Chad Stone et al., “A Guide to Statistics on Historical Trends in Income Inequality,” Center on Budget and Policy Priorities, January 13, 2020, https://www.cbpp.org/research/poverty-and-inequality/a-guide-to-statistics-on-historical-trends-in-income-inequality. See comparable findings on US income inequality by the Pew Charitable Trust in Julia Menasce Horowitz, Ruth Igielnik, and Rakesh Kochhar, “Trends in U.S. Income and Wealth Inequality,” Pew Research Center’s Social & Demographic Trends Project, January 9, 2020, https://www.pewsocialtrends.org/2020/01/09/trends-in-income-and-wealth-inequality/↩︎
World Inequality Database, “Top 1% National Income Share,” 2020, https://wid.world/world/#sptinc_p99p100_z/US;FR;DE;CN;ZA;GB;WO/last/eu/k/p/yearly/s/false/5.070499999999999/30/curve/false/country.↩︎