In the early stages of a visualization project, we often start with two interrelated questions: Where can I find reliable data? And after you find something, what does this data truly represent? If you leap too quickly into constructing charts and maps without thinking deeply about these two questions, you run the risk of creating meaningless or, perhaps worse, misleading visualizations. This chapter breaks down both of these broad questions by providing concrete strategies to guide your search, understand debates about public and private data, mask or aggregate sensitive data, navigate a growing number of open data repositories, document the origins of your data, and recognize bad data. Finally, once you’ve found some files, we propose some ways to question and acknowledge the limitations of your data.
Information does not magically appear out of thin air. Instead, people collect and publish data, with explicit or implicit purposes, within the social contexts and power structures of their times. As data visualization advocates, we strongly favor evidence-based reasoning over less-informed alternatives. But we caution against embracing so-called data objectivity, since numbers and other forms of data are not neutral. Therefore, when working with data, pause to inquire more deeply: Whose stories are told? Whose perspectives remain unspoken? Only by asking these types of questions, according to Data Feminism authors Catherine D’Ignazio and Lauren Klein, will we “start to see how privilege is baked into our data practices and our data products.”9