Chart Design Principles
There are so many different types of charts. However, just because data can be made into a chart does not necessarily mean that it should be turned into one. Before creating a chart, stop and ask: Does a visualized data pattern really matter to your story? Sometimes a simple table, or even text alone, can communicate the idea more effectively to your audience. Since creating a well-designed chart requires time and effort, make sure it enhances your data story.
Although not a science, data visualization comes with a set of principles and best practices that serve as a foundation for creating truthful and eloquent charts. In this section, we’ll identify some important rules about chart design. But you may be surprised to learn that some rules are less rigid than others, and can be “broken” when necessary to emphasize a point, as long as you are honestly interpreting the data. To begin to understand the difference, let’s start by establishing a common vocabulary about charts.
Deconstruct a Chart
Let’s take a look at Figure 7.1. It shows basic chart components that are shared among most chart types.
A title is perhaps the most important element of any chart. A good title is short, clear, and tells a story on its own. For example, “Pandemic Hits Black and Latino Population Hardest”, or “Millions of Tons of Plastic Enter the Ocean Every Year” are both clear titles that quickly convey a larger story.
Sometimes your editor or audience will prefer a more technical title for your chart. If so, the two titles above could be changed, respectively, to “Covid-19 Deaths by Race in New York City, Spring 2020” and “Tons of Plastic Entering the Ocean, 1950–2020.”
A hybrid strategy is to combine a story-oriented title with a more technical subtitle, such as: “Pandemic Hits Black and Latino Population Hardest: Covid-19 Deaths by Race in New York City, Spring 2020.” If you follow this model, make your subtitle less prominent than your title by decreasing its font size, or changing its font style or color, or both.
Horizontal (x) and vertical (y) axes define the scale and units of measure.
A data series is a collection of observations, which is usually a row or a column of numbers, or data points, in your dataset.
Labels and annotations are often used across the chart to give more context. For example, a line chart showing US unemployment levels between 1900 and 2020 can have a “Great Depression” annotation around 1930s, and “Covid-19 Impact” annotation for 2020, both representing spikes in unemployment. You might also choose to label items directly instead of relying on axes, which is common with bar charts. In that case, a relevant axis can be hidden and the chart will look less cluttered.
A legend shows symbology, such as colors and shapes used in the chart, and their meaning (usually values that they represent).
You should add Notes, Data Sources, and Credits underneath the chart to give more context about where the data came from, how it was processed and analyzed, and who created the visualization. Remember that being open about these things helps build credibility and accountability.
If your data comes with uncertainty (or margins of error), use error bars to show it, if possible. If not, accompany your chart with a statement like “the data comes with uncertainty of up to 20% of the value”, or “for geographies X and Y, margins of error exceed 10%”. This will help readers assess the reliability of the data source.
In interactive charts, a tooltip is often used to provide more data or context once a user clicks or hovers over a data point or a data series. Tooltips are great for complex visualizations with multiple layers of data, because they declutter the chart. But because tooltips are harder to interact with on smaller screens, such as phones and tablets, and are invisible when the chart is printed, only rely on them to convey additional, nice-to-have information. Make sure all essential information is visible without any user interaction.
Some Rules are More Important than Others
Although the vast majority of rules in data visualization are open to interpretation, as long as you honestly interpret the data, here are two rules that cannot be bent: zero-baselines for bar and column charts, and 100-percent baselines for pie charts.
Bar and Column Charts Must Begin at Zero
Bar and column charts use length and height to represent value, therefore their value axis must start at the zero baseline. This ensures that a bar twice the length of another bar represents twice its value. Figure 7.2 contrasts a good and a bad example. The same rule applies to area charts, which display filled-in area underneath the line to represent value. Starting the baseline at a number other than zero is a trick commonly used to exaggerate differences in opinion polls and election results, as we describe later in Chapter 15: Detect Lies and Reduce Data Bias
But the zero-baseline rule does not apply to line charts, which represent values through the position and angle of the line, not its height. Visualization expert Alberto Cairo demonstrates that starting a line chart at a number other than zero does not necessarily distort its encoded information because we rely on its shape, not its height, to determine its meaning.17
TODO: Confirm revision with Ilya; Add visual here of line chart baseline at zero versus higher number to illustrate the argument?
Pie Charts Represent 100%
Pie charts is one of the most contentious issues in data visualization. Most dataviz practitioners will recommend avoiding them entirely, saying that people are bad at accurately estimating sizes of different slices. We take a less dramatic stance, as long as you adhere to the recommendations we give in the next section.
But the one and only thing in data visualization that every single professional will agree on is that pie charts represent 100% of the quantity. If slices sum up to anything other than 100%, it is a crime. If you design a survey titled Are you a cat or a dog person? and include I am both as the third option, forget about putting the results into a pie chart.
Remember that you create a chart to help the reader understand the story, not to confuse them. Decide if you want to show absolute numbers, percentages, or percent changes, and do the math for your readers.
Avoid chart junk
Start with a white background and add elements as you see appropriate. You should be able to justify each element you add. To do so, ask yourself: Does this element improve the chart, or can I drop it without decreasing readability? This way you won’t end up with so-called “chart junk” as shown in Figure 7.3, which includes 3D perspectives, shadows, and unnecessary elements. They might have looked cool in early versions of Microsoft Office, but let’s stay away from them today. Chart junk distracts the viewer and reduces chart readability and comprehension. It also looks unprofessional and doesn’t add credibility to you as a storyteller.
Do not use shadows or thick outlines with bar charts, because the reader might think that decorative elements are part of the chart, and thus misread the values that bars represent.
The only justification for using three dimensions is to plot three-dimensional data, which has x, y, and z values. For example, you can build a three-dimensional map of population density, where x and y values represent latitude and longitude. In most cases, however, three dimensions are best represented in a bubble chart, or a scatterplot with varying shapes and/or colors.
Beware of pie charts
Remember that pie charts only show part-to-whole relationship, so all slices need to add up to 100%. Generally, the fewer slices—the better. Arrange slices from largest to smallest, clockwise, and put the largest slice at 12 o’clock. Figure 7.4 illustrates that.
If your pie chart has more than five slices, consider showing your data in a bar chart, either stacked or split, like Figure 7.5 shows.
Don’t make people turn their heads to read labels
When your column chart has long x-axis labels that have to be rotated (often 90 degrees) to fit, consider turning the chart 90 degrees so that it becomes a horizontal bar chart. Take a look at Figure 7.6 to see how much easier it is to read horizontally-oriented labels.
Arrange elements logically
If your bar chart shows different categories, consider ordering them, like is shown in Figure 7.7. You might want to sort them alphabetically, which can be useful if you want the reader to be able to quickly look up an item, such as their town. Ordering categories by value is another common technique that makes comparisons possible. If your columns represent a value of something at a particular time, they have to be ordered sequentially, of course.
Do not overload your chart
When labelling axes, choose natural increments that space equally, such as [0, 20, 40, 60, 80, 100], or [1, 10, 100, 1000] for a logarithmic scale. Do not overload your scales.
Keep your typography simple, and use (but do not overuse) bold type to highlight major insights.
Consider using commas as thousands separators for readability (
1,000,000 is much easier to read than
Be careful with colors
The use of color is a complex topic, and there are plenty of books and research devoted to it. For an excellent overview, see Lisa Charlotte Rost’s “A Friendly Guide to Colors in Data Visualization” and “How to Pick More Beautiful Colors for Your Data Visualizations,” both on the Datawrapper blog.18 But some principles are fairly universal. First, do not use colors just for the sake of it, most charts are fine being monochromatic. Second, remember that colors come with some meaning attached, which can vary among cultures. In the world of business, red is conventionally used to represent loss, and it would be unwise to use this color to show profit. Make sure you avoid random colors.
TODO: Ilya please review both of Rost’s blog posts above. Decide if there are any additional key principles we should briefly mention below. Also, decide if we should apply any of her principles to our own color choices for sample charts and maps in this book.
Whatever colors you end up choosing, they need to be distinguishable (otherwise what is the point?). Do not use colors that are too similar in hue (for example, various shades of green––leave them for choropleth maps). Certain color combinations are hard to interpret for people with colorblindness, like green/red or yellow/blue, so be very careful with those. Figure 7.8 shows some good and bad examples of color use.
If you follow the advice, you should end up with a de-cluttered chart as shown in Figure 7.9. Notice how your eyes are drawn to the bars and their corresponding values, not bright colors or secondary components like the axes lines.
TODO: Compare our rules and recommendations to different dataviz style guides, such as Urban Institute, and cite them http://urbaninstitute.github.io/graphics-styleguide/
Cairo, How Charts Lie, p. 61.↩︎
Lisa Charlotte Rost, “Your Friendly Guide to Colors in Data Visualisation,” Chartable: A Blog by Datawrapper, July 31, 2018, https://blog.datawrapper.de/colorguide/; Lisa Charlotte Rost, “How to Pick More Beautiful Colors for Your Data Visualizations,” Chartable, accessed October 21, 2020, https://blog.datawrapper.de/beautifulcolors/index.html↩︎