Public versus Private Data
In addition to asking questions about the origins and limitations of your data, you should also understand the key distinctions between public and private data, and their implications for designing your visualizations. This section offers some general observations about data privacy based on our context in the United States. Since we are not lawyers (thank goodness!), please consult with legal experts for advice about your specific case.
In the United States, the 1966 Freedom of Information Act and its subsequent amendments have sought to open access to information in the federal government, with the view that increased transparency would promote public scrutiny and pressure on officials to make positive changes. In addition, state governments operate under their own freedom of information laws, sometimes called “open records” or “sunshine laws.” When people say they’ve submitted a “FOIA,” it means they’ve sent a written request to a government agency for information that they believe should be public under the law. But federal and state FOIA laws are complex, and courts have interpreted cases in different ways over time, as summarized in the Open Government Guide by the Reporters Committee for Freedom of the Press, and also by the National Freedom of Information Coalition. Sometimes government agencies quickly agree and comply with a FOIA request, while other times they may delay or reject it, which may pressure the requester to attempt to resolve the issue through time-consuming litigation. Around the world, over 100 nations have their own version of freedom of information laws, with the oldest being Sweden’s 1766 Freedom of the Press Act, but these laws vary widely.
What’s most important—and confusing—about access to US data is that individual-level data is usually considered private, except in certain areas where our governmental process has determined that a broader interest is served by making it public. On one hand, here are two categories where individual-level data is private under federal law:
Patient-level health data is generally protected under the Privacy Rule of the Health Insurance Portability and Accountability Act, commonly known as HIPAA. Public health officials regularly aggregate patient records into larger anonymized public datasets to track progress about various illnesses. This process keeps individual-level data about each patient private, but allows the public to benefit from information about broad trends.
Student-level education data is generally protected under the Family Educational Rights and Privacy Act, commonly known as FERPA. Public education officials regularly aggregate student records into larger anonymized public datasets to track the progress of schools, districts, and states. Once again, this process keeps individual-level data about each student private, but allows the public to benefit from information about broad trends.
On the other hand, here are three categories where government has ruled that the public interest is served by making individual-level data available to all:
Individual contributions to political candidates are public information in the US Federal Election Commission database. See related databases such as Follow The Money by the National Institute on Money in Politics and Open Secrets by the Center for Responsive Politics, which both describe more details about donations submitted through political action committees and controversial exceptions to campaign finance laws. Across the US, state-level political contribution laws vary widely, and public records are stored in separate databases. For example, anyone can search the Connecticut Campaign Reporting Information System to find donations made by the first author to state-level political campaigns.
Individual property ownership records are public, and increasingly hosted online by many local governments. A privately funded US public records directory provides links to county and municipal property records, where available. For example, anyone can search the property assessment database for the Town of West Hartford, Connecticut to find property owned by the first author, its square footage, and purchase price.
Individual salaries for officers of tax-exempt organizations are public, because these organizations are required to report them on Internal Revenue Service (IRS) 990 forms each year. For example, anyone can search 990 forms on ProPublica’s Nonprofit Explorer, and view the salary and other compensation of the top officers of the first author’s employer, Trinity College in Hartford, Connecticut.
The boundary between what types of individual-level data should remain private or become public is continually changing, and subject to political and social pressures. On one hand, critics of “big data” and “surveillance capitalism” charge that governments seek more power and corporations seek more profits by collecting and commodifying massive amounts of personal data about each individual. On the other hand, the Black Lives Matter movement has gradually made more individual-level data publicly available on violence by police officers. For example, New Jersey state law required local police departments to make “use of force” reports publicly available, but no one could easily search these paper forms until a team of journalists from NJ Advance Media created The Force Report public database, where anyone can look up individual officers and investigate possible patterns of violent behavior. Similarly, a team of ProPublica journalists created The NYPD Files public database, which now allows anyone to search closed cases of civilian complaints against New York City police officers, by name or precinct, for potential patterns of substantiated allegations. People working in the field of data visualization need to stay informed about the shifting boundary lines between private versus public individual-level data, and contribute to discussions about whose interests are served by making more data available.
TODO: ADD TO ABOVE? Similarly, the Washington Post teamed up with a West Virginia newspaper to obtain privately owned drug records through a court order, which they transformed into a public database that allows anyone to search individual doctors prescribing narcotics for potential patterns of substance abuse.
TODO: ADD: a deeper concern is privately owned individual-level data. The credit score companies know my purchases and my payment history on my mortgages and credit cards. Amazon knows my purchase history. Netflix knows my viewing history. Google knows my browsing history. Apple knows my location history via iPhone. When people criticize big data, they are usually referring to private companies compiling individual data.