Geocode Locations into Coordinates with US Census or Google
Before addresses can be mapped, they need to be geocoded. Geocoding is a process of transforming a human-readable address, such as 300 Summit St, Hartford, CT into a latitude-longitude pair, such as 41.747,-72.692. These numbers are x- and y-coordinates that maps understand.
If you have just a few addresses, it might be faster to geocode them with Google Maps. Search for an address, right-click on that point, and select What’s here? to reveal a popup window with its latitude and longitude, as shown in Figure 11.2. You can copy and paste the coordinates into your spreadsheet. Similar tools also geocode one place at a time, such as LatLong.net.
But what if you need to geocode dozens, hundreds, or even thousands of addresses? In this section, we will look at two ways to geocode larger lists of addresses. First, you’ll learn how to use our custom-built Google Sheets Geocoder, which lets you upload up to 1,000 addresses and choose between Google Geocoder (available pretty much worldwide) and the US Census Geocoder (for US addresses only) Second, you’ll learn how to use a stand-alone US Census Geocoder that allows you to upload up to 10,000 addresses (for the US only) and download results.
TODO: I tried to add numerical limitations for each tool to clarify the “two ways”; check details
Geocode addresses with Google Sheets Geocoder
The Google Sheets Geocoder script lives inside a Google Sheet that you can copy to your own Google Drive (you don’t need editing access, just go to File > Make a copy).
The spreadsheet contains six columns. Populate the first column, Location, with your addresses. The remaining five columns will be filled by the geocoding script. Select all six columns, go to Geocoder in the menu, and choose which geocoding utility to use, like is shown in Figure 11.3.
Note: If your address data is split into multiple columns (such as Street, City, and State), revisit Clean Data with Spreadsheets section to remind yourself how to “glue” multiple cells into one.
If you run the script for the first time, Google Sheets may ask for permission or warn that the script is not safe. Disregard this message. The code is open-source and is available on GitHub, so you or your programmer friend can make sure it doesn’t steal your personal data.
TODO: Fix the Google App code to remove warning message
Once the script finishes executing, you will get a pop-up notification that will tell you how many addresses were successfully geocoded, and how many failed. Inspect Found and Quality columns to ensure the geocoder matched your addresses correctly. Then look at the failed addresses and see if you can spot problems with them. For tops about using Google geocoder, see documentation.
Note: The Geocoder plugin is a small Apps Script program that is connected to your Google sheet. It sends your addresses to either US Census Geocoder, or Google Geocoding API, and gets geocoded results as a response.
Geocode US addresses to census tracts with Google Sheets Geocoder
You can use a modified version of the Google Sheets Geocoder, available in its own spreadsheet, to assign census tracts and GeoIDs to addresses within the United States.
A GeoID is a unique identifier of a place according to the US Census. A sample 15-digit
090035245022001, consists of a state (09), followed by county (003),
followed by census tract (524502, or more conventional 5245.02),
followed by a census block group (2), and finally a census block (001).
Make a copy of the template spreadsheet into your own Google Drive by going to File > Make a copy.
You only need to populate the first column, Location. The rest seven columns will be populated by the Geocoder. Similar to the previous template, select all eight columns, and go to Geocoder > US Census 2010 Geographies, like is shown in Figure 11.4.
If you run this script for the first time, Google Sheets will ask you for permission to run, and will possibly warn you that this script is unsafe. Once again, you shouldn’t worry. The plugin is open-source and you can inspect it to make sure it doesn’t steal or retain your personal data.
Insert Google Sheets Geocoder script into your own spreadsheet
If you don’t want to make a copy of the Google Sheet templates from the previous examples, you can insert the open-source Geocoder scripts into your own Google sheet.
- In your personal Google spreadsheet, go to Tools > Script Editor. This should open up a new tab.
- Replace the empty
function myFunction()with the contents of
geocoder-census-google.gsfrom the plugin’s repo on GitHub.
- In Script Editor, click File > Save. An Edit Project Name window will pop up, where you should give the script a meaningful name, such as “Geocoder”.
- Close the Script Editor, and go back to your spreadsheet. Refresh and wait for a couple of seconds. Geocoder should appear in the menu.
Geocode up to 10,000 US addresses with US Census Geocoder
One of the fastest ways to geocode up to 10,000 US addresses at a time is to create a CSV file with 5 columns and upload it to Address Batch form of the US Census Geocoder. In the menu on the left-hand side, you can switch from Find Locations to Find Geographies if you wish to include census tract and GeoID data in addition to the coordinates.
Your CSV file needs to be formatted the following way, without the header row:
| 1 | 300 Summit St | Hartford | CT | 06106 | | 2 | 1012 Broad St | Hartford | CT | 06106 |
Here, the first column is unique IDs (make sure it is unique to each address, but they don’t have to start at 1 or be in an increasing order). The second column is street address. The third column is city. Column four is state, and the final fifth column is zip code.
Upload the file using the Browse… button of Select Address File, use Public_AR_Current Benchmark, and hit Get Results.
In a few moments (it usually takes longer for larger files), the tool will return a file named GeocodeResults.csv with geocoded results. Save it, and inspect it in your favorite spreadsheet tool. The resulting file is an eight-column CSV file with the original ID and address, match type (exact, non-exact, tie, or no match), and latitude/longitude coordinates.
Getting a tie matching means there are multiple possible results for your address. To see all possible matches of an address that got a tie, use One Line or Address tools in the left-hand side menu and search for that address.
Tip: If you see some unmatched addresses, use a filtering functionality of your spreadsheet to filter for unmatched addresses, then manually correct them, save as a separate CSV file, and re-upload. You can use the US Census Geocoder as many times as you want, as long as a single file doesn’t exceed 10,000 records.
- In reality only the first two columns, unique ID and street address, are required. City, state, and zip code may be left blank if you don’t have that data. But there are usually fewer exact matches for addresses with missing elements.
- If your data lacks ID values, you can create a column of consecutive numbers. See Calculate with Formulas and Functions section of this book to see how.
- Make sure your street addresses don’t contain city, state, and zip code data. If they do, use splitting text to columns technique, described in the Clean Data with Spreadsheets section of the book, to get rid of that extra data. But if your street addresses contain apartments, you can leave them in.
US Census Geocoder has a comprehensive overview and documentation that you can refer to if you encounter issues not covered here.
If for some reason you cannot geocode address-level data, but you need to produce some mapping output, you can use pivot tables to get counts of points for specific areas, such as towns or states. In the next section, we will look at hospital addresses in the US and how we can count them by state using pivot tables.