Open Data Repositories

Over the past decade, an increasing number of governmental and non-governmental organizations around the globe have begun to pro-actively share public data through open data repositories. While some of these datasets were previously available as individual files on isolated websites, these growing networks have made open data easier to find, enabled more frequent agency updates, and sometimes support live interaction with other computers. Open data repositories often include these features:

  • View and Export: At minimum, open data repositories allow users to view and export data in common spreadsheet formats, such as CSV, ODS, and XLSX. Some repositories also provide geographical boundary files for creating maps.
  • Built-in Visualization Tools: Several repositories offer built-in tools for users to create interactive charts or maps on the platform site. Some also provide code snippets for users to embed these built-in visualizations into their own websites, which you’ll learn more about in Chapter 7: Embed on Your Web.
  • Application Program Interface (APIs): Some repositories provide endpoints with code instructions that allow other computers to pull data directly from the platform into an external site or online visualization. When repositories continuously update data and publish an API endpoint, it can be an ideal way to display live or “almost live” data in your visualization, which you’ll learn more about in Chapter 10: Leaflet Map Templates.

Due to the recent growth of open data repositories, especially in governmental policy and scientific research, there is no single website that lists all of them. Instead, we list just a few sites from the US and around the globe to spark readers’ curiosity and encourage you to dig deeper:

  • Data.gov, the official repository for US federal government agencies.
  • Data.census.gov, the main platform to access US Census Bureau data. The Decennial Census is a full count of the population every ten years, while the American Community Survey (ACS) is an annual sample count that produces one-year, three-year, or five-year estimates for different census geographies, with margins of error.
  • Eurostat, the statistical office of the European Union.
  • Global Open Data Index, by the Open Knowledge Foundation.
  • Google Public Data.
  • Google Dataset Search.
  • IPUMS, Integrated Public Use Microdata Series, the world’s largest individual-level population database, with microdata samples from US and international census records and surveys, hosted by the University of Minnesota.
  • openAfrica, by Code for Africa.
  • Open Data Inception, a map-oriented global directory.
  • Open Data Network, a directory by Socrata, primarily of US state and municipal open data platforms.
  • World Bank Open Data, a global collection of economic development data.

In addition, students, staff, and faculty at better-funded institutions of higher education also may have access to a paid library subscription to “closed” data repositories. For example, Social Explorer includes decades of demographic, economic, health, education, religion, and crime data for local and national geographies, primarily for the US, Canada, and Europe. Previously, Social Explorer made many files available to the public, but it now requires a paid subscription or 14-day free trial.

Now that you’ve learned more about navigating open data repositories, the next section will teach you ways to properly source the data that you discover.