- Histograms

A histogram chart is best for showing the distribution of raw data, with the number of values displayed in each bucket. Although a histogram may look similar to a column chart, the two are different. Since histograms show continuous data, you can adjust the bucket ranges to explore frequency patterns. For example, you can shift histogram buckets from 0–1, 1–2, 2–3, etc. to 0–2, 2–4, etc. Column (and bar) charts, by contrast, show categorical data (such as the number of apples, bananas, etc.), so their ranges cannot be adjusted. Figure 7.28 shows a histogram of average daily calorie consumption in 174 countries in 2006–2008, based on data from the United Nations Food and Agriculture Organization.

Figure 7.28: Histogram: Explore the full-screen interactive version.

Histograms are designed to represent distributions, not individual values. Figure 7.28 shows two peaks, one at 2,100–2,300 calories and a second at 2,700–3,100 calories (two consecutive buckets). We can also see that daily calorie consumption does not exceed 1,900 calories in five countries, and exceeds 3,700 calories in three countries. Histograms by themselves, without annotations, don’t tell us which countries those are.

Note: You can view and copy the Google Sheet with the calorie consumption data. Use the Sort sheet A->Z function to see that the five countries with the lowest per-capita calorie consumption are Eritrea (1,590), Burundi (1,680), Comoros (1,840), Haiti (1,850), and Zambia (1,880). The three highest are Greece (3,710), the USA (3,750), and Austria (3,800). Keep in mind that the dataset is over a decade old and things may be different today.

Histograms use buckets or bins, which are predefined ranges of values, and count how many data points (usually rows in your dataset) fall within each interval. Each interval’s count is represented by a bar. Intervals should not overlap, and we recommend making them all the same width. Consider making your ranges “pretty”, that is, using whole numbers such as multiples of 5 (5, 10, 15, 20) or 100 (1500, 1600, 1700, 1800) for breakpoints, but only if that makes sense for your data distribution.
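To make the counting step concrete, here is a minimal Python sketch of how values can be assigned to equal-width, non-overlapping buckets. It is only an illustration, not how Google Sheets works internally. The function name `bucket_counts`, the starting breakpoint of 1,500, and the bucket width of 200 are our own assumptions, and the sample holds only the eight values quoted in the note above rather than all 174 countries.

```python
from collections import Counter

# Eight values quoted in the note above (calories per capita per day).
# The full dataset has 174 countries; this is only a tiny sample.
calories = [1590, 1680, 1840, 1850, 1880, 3710, 3750, 3800]

def bucket_counts(values, start, width):
    """Count values in equal-width, non-overlapping buckets [low, low + width).

    `start` and `width` are illustrative parameters: the first breakpoint
    and the size of every bucket.
    """
    counts = Counter()
    for v in values:
        index = (v - start) // width   # which bucket the value falls into
        low = start + index * width    # lower breakpoint of that bucket
        counts[(low, low + width)] += 1
    return dict(sorted(counts.items()))

counts = bucket_counts(calories, start=1500, width=200)
for (low, high), n in counts.items():
    print(f"{low}-{high}: {n}")
```

Each value lands in exactly one bucket because the intervals include their lower breakpoint and exclude their upper one. Buckets with no values are simply absent from the result in this sketch, whereas a chart would show them as gaps.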

Now, let’s look at how to create the histogram in Figure 7.28 in Google Sheets. Our dataset contains two columns (Country and Average Daily Calorie Consumption) and 174 records, as shown in Figure 7.29. In practice, though, you need only one column that lists all values to build a histogram.

Figure 7.29: To create a histogram, you only need one column with numeric values.

Select a column with values, and go to Insert > Chart. Google Sheets will likely choose Histogram chart automatically as the Chart type in the Chart editor, but if not, use the dropdown to set it manually (you will find Histogram under the Other category). While some readers know what histograms represent, it may be useful to add a y-axis label (e.g., Number of countries) and a subtitle (e.g., Each bar represents the number of countries per calorie range). You can add both from the Customize tab in the Chart editor.

To further assist the reader in interpreting the histogram, you can break down each column into individual items (in our case, countries), which will appear as blocks with white boundaries. To do that, go to Customize > Histogram > Show item dividers in the Chart editor. You can also manually set the range of each bucket under Customize > Histogram > Bucket size. Larger intervals will contain more data points each and will appear wider in the chart, because fewer of them are needed to cover the entire range. Smaller intervals will contain fewer data points each and will appear narrower.
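The effect of bucket size can be checked with a short Python sketch outside of Google Sheets. It is only an illustration: the function name `histogram` and the widths of 100 and 500 are our own choices, buckets are assumed to start at multiples of the width, and the sample again holds just the eight values quoted in the earlier note.

```python
from collections import Counter

# Eight values quoted in the earlier note (calories per capita per day).
calories = [1590, 1680, 1840, 1850, 1880, 3710, 3750, 3800]

def histogram(values, width):
    """Map each bucket's lower breakpoint to its count.

    Buckets are assumed to be aligned to multiples of `width`,
    for example 1500-1599, 1600-1699 for width 100.
    """
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

narrow = histogram(calories, 100)  # many small buckets, few values in each
wide = histogram(calories, 500)    # few large buckets, more values in each

print(narrow)
print(wide)
```

With this sample, the narrow setting spreads the eight values over five occupied buckets, while the wide setting collapses them into two. The total count stays the same either way, so only the level of detail changes.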

Unfortunately, at the time of writing, there is no way to remove decimal points from the x-axis labels in a Google Sheets histogram, even when all breakpoints are integers.
