Bulk Geocode with US Census
In Chapter 3: Strengthen Your Spreadsheet Skills, you learned how to geocode addresses with a Google Sheets Add-On called Geocoding by SmartMonkey. Geocoding converts street addresses to latitude-longitude coordinates (such as 300 Summit St, Hartford CT, USA to 41.75, -72.69) that can be placed on maps. While the Geocoding by SmartMonkey Add-On for Google Sheets works well for medium-sized batches of addresses, sometimes you need a faster geocoding service for larger jobs.
One of the fastest ways to geocode up to 10,000 US addresses at a time is to use the US Census Geocoder. First, create a CSV file with 5 columns. Your file must not contain a header row, and needs to be formatted the following way:
| 1 | 300 Summit St | Hartford | CT | 06106 | | 2 | 1012 Broad St | Hartford | CT | 06106 |
- Column 1: Unique IDs for each address, such as 1, 2, 3, etc. While it does not necessarily have to start at 1 or be in consecutive order, this is the easiest. To quickly create a column of consecutive numbers in most spreadsheets, enter
1, select the bottom-right corner of the cell, hold down the Option or Control key and drag your mouse downward.
- Column 2: Street address.
- Column 3: City.
- Column 4: State.
- Column 5: Zip Code.
Although some of your data, such as zipcodes or states, may be missing and the geocoder may still be able to recognize and geocode the location, unique IDs are absolutely necessary to include for each row (address).
Tip: If your original data combines address, city, state, and zip into one cell, then see how to Split Data into Separate Columns in Chapter 5: Clean Up Messy Data. But if your street addresses contain apartment numbers, you can leave them in.
Second, upload your CSV file to the US Census Geocoder address batch form. Select Find Locations Using… > Address Batch, then choose your file to upload. Select Public_AR_Current as the benchmark, and click Get Results.
Note: In left-side menu, you can switch from Find Locations to Find Geographies
if you wish to obtain additional information, such as the GeoID for each address. The US Census assigns a unique 15-digit GeoID to every place, and a sample (such as
090035245022001) consists of the state (09), followed by the county (003),
the census tract (524502, or more conventional 5245.02),
the census block group (2), and finally the census block (001).
In a few moments the tool will return a file named GeocodeResults.csv with geocoded results. It usually takes longer for larger files. Save it, and inspect it in your favorite spreadsheet tool. The resulting file is an eight-column CSV file with the original ID and address, match type (exact, non-exact, tie, or no match), and latitude-longitude coordinates. A tie means there are multiple possible results for your address. To see all possible matches of an address that received a tie, use One Line or Address tools in the left-side menu and search for that address.
Tip: If you see some unmatched addresses, use a filtering functionality of your spreadsheet to filter for unmatched addresses, then manually correct them, save as a separate CSV file, and re-upload. You can use the US Census Geocoder as many times as you want, as long as a single file doesn’t exceed 10,000 records.
To learn more about this service, read the Overview and Documentation section of the US Census Geocoder.
If for some reason you cannot geocode address-level data, but you need to produce some mapping output, you can use pivot tables to get counts of points for specific areas, such as towns or states. In the next section, we will look at hospital addresses in the US and how we can count them by state using pivot tables.