Fixing cities' data privacy potholes

Several issues arise at the intersection of data privacy and smart cities' push toward open data frameworks. Municipal organizations must create policies, right now, to keep from spilling personally identifiable information and to protect privacy.

Back in 2015, Howard Matis, a physicist at the Lawrence Berkeley National Laboratory in California, didn't know that his local police department had license plate readers (LPRs). And if it did, so what? (It was Oakland. It did: 33 automated units at the time.)

Matis wasn’t particularly worried about police capturing his movements. What difference does it make? How private is your car anyway, sitting in plain sight, on a public city street?

But Matis changed his mind after he gave permission to Ars Technica’s Cyrus Farivar to get data about his car and its movements around town.

It wasn’t a difficult technical feat. The data is accessible via public records law. Using a Freedom of Information Act (FOIA) request, Farivar got the entire LPR data set of the Oakland Police Department, including more than 4.6 million reads of over 1.1 million unique plates captured in just over three years. Then Ars showed Matis what could be done with the information.

Ars hired a data visualization specialist who created a simple tool. It allowed the publication to search any given plate within the massive data set—we're talking 18 Excel spreadsheets—and to plot its locations on a map. That’s when things got a bit more worrisome for Mr. Matis.
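Ars hasn't published its tool, but a minimal sketch gives a sense of how little code such a lookup takes. The sketch below assumes the exported spreadsheets were saved as CSV files with hypothetical columns named plate, latitude, and longitude; it is an illustration, not Ars' actual implementation.

```python
# Minimal sketch of a plate-lookup tool like the one Ars commissioned.
# Assumes the LPR exports were saved as CSVs with hypothetical columns:
# plate, latitude, longitude.
import glob

import pandas as pd
import matplotlib.pyplot as plt

def load_reads(pattern="lpr_export_*.csv"):
    """Concatenate every exported spreadsheet into one DataFrame."""
    frames = [pd.read_csv(path) for path in glob.glob(pattern)]
    return pd.concat(frames, ignore_index=True)

def plot_plate(reads, plate):
    """Plot every location where cameras captured a single plate."""
    hits = reads[reads["plate"] == plate]
    plt.scatter(hits["longitude"], hits["latitude"])
    plt.title(f"{len(hits)} reads for plate {plate}")
    plt.xlabel("longitude")
    plt.ylabel("latitude")
    plt.show()

plot_plate(load_reads(), "6ABC123")  # hypothetical plate number
```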

After Ars ran his plate, the journalists showed the physicist a map of the five instances where a camera had captured Matis' car, guessing (correctly) that the locations were near where he lived or worked. These were places where, Matis confirmed, he and his wife go “all the time.”


How in the world was it OK to just hand that over to anybody who asked for it? Matis wondered. "If anyone can get this information, that's getting into Big Brother," he mused. "If I was trying to look at what my spouse is doing, [I could]. To me, that is something that is kind of scary. Why do they allow people to release this without a law enforcement reason? Searching it or accessing the information should require a warrant."

The issue gets to the heart of a problem with the open data on which smart cities run—and upon which they're building the cities of tomorrow. Namely, there's so much data, coming in at such a fast, furious pace from sensors everywhere, that "people are dealing with data at rates they just don't understand right now," says Charles Belle, founder and CEO of Startup Policy Lab, a nonprofit think tank dedicated to connecting policymakers and the startup community. Local governments are becoming more and more like Google: They're custodians of an increasing number of data sets, be it policing, transportation, housing, or infrastructure data, and that can lead to smart cities acting pretty dumb.

Read on for what city planners and technologists can do to avoid falling into these kinds of privacy potholes.

The problem with policies

The Oakland fiasco "highlighted how some people get upset that [the police] were collecting data," says Belle. Collecting that kind of data was actually common practice at the time, he says. But on the policymaking side, the concern was more basic. Belle points to Oakland having "zero policies to manage the data—how it could be used, not just internally but externally by third parties."

"Those are the kind of problems that should not occur," Belle says. "With expertise, you can ask, ‘Does it make sense to hand that over?’ Somebody can say no, we're not going to release that; you have to submit a more scoped request."

This isn't cyber wizardry. It's not a foreign government's hacking crew phishing email credentials out of city employees to get at citizens' data in order to commit mass identity fraud. And it's certainly not an Equifax-like failure to promptly patch software that then leads to a massive data breach. The problem with Oakland was certainly not that FOIA requests were a new, technology-introduced phenomenon.

It's simply that long-standing FOIA request policy was somehow never carried over to the gush of data that came with open data. This was one of five critical issues that arose at a roundtable on government data and privacy organized by the Startup Policy Lab and the Oakland city attorney.

In a nutshell, open data frameworks are typically proactive: New information is put online, where people are looking for it. Contrast that with the way local governments have tended to react to FOIA requests. First, FOIA requests cost labor and money, since they fall outside the normal workflow. Second, they typically come from journalists, who are often seen as antagonists to government agencies and hence treated as more of a nuisance than a benefit by anybody in government.

Open data is about opening up data. FOIA frameworks are typically closed. The policies don't automatically transfer from one framework to the other. At the heart of the problem is that open data initiatives are typically taking place in fits and starts. Thus, the differing frameworks coexist, creating confusion for government officials, citizens, companies, and others.

This is the kind of problem that can be spotted by somebody with the appropriate skills, Belle says. In other words, your city needs a chief data officer (CDO) and a team with the right legal and technical expertise, or at least a decent playbook for how to do this stuff. That, in fact, was one of the five roundtable takeaways: Hire a CDO, or if you can't afford one, look at what other cities have done and work with them to create the playbook you need.

It's tricky to keep data anonymous

In his upcoming book, The Social Dynamics of Open Data, Joel Gurin, president of the Center for Open Data Enterprise, writes that with proper privacy and security mechanisms in place—mechanisms that enable healthcare organizations to share anonymized data—large health data sets can be analyzed for public good. Such data sets can target services to underserved populations, for example. Or they might fuel the Precision Medicine Initiative, which is aimed at developing highly targeted medical treatments based on a range of inputs, including some extremely sensitive data sets such as personal medical histories and genetic analysis.

And anonymizing that type of data, as well as securing it from those who would steal it for criminal purposes, is crucial. Medical information is full of often immutable personally identifiable information (PII) that can be used in insurance, identity, and tax fraud. Time was, medical records were selling for up to $50 on the Dark Web. Given the onslaught of cyberattacks on healthcare organizations, the market right now is actually glutted. As of April 2017, you could get medical information for less than 1 cent per record. Still, thieves value it, and they've been coming after it.

So what do we do?

Those are just a few of the issues that crop up at the intersection of data privacy and smart cities' push toward open data frameworks. And here's the million-dollar question: What should municipal organizations and agencies do, right now, to keep from spilling PII and ensure that we get the benefits we expect from opening up that data?

Gurin suggests several best practices for setting up open data access. The goal is to balance risk, control access so only authorized agencies and individuals can eyeball the data, and build community engagement and trust in what's being done with citizens' information.

Check out what other cities are doing. You can find stories from cities that have taken part in the What Works Cities initiative. Part of Bloomberg Philanthropies, the initiative aims to help 100 midsize American cities enhance their use of data and evidence to improve services, inform local decision-making, and engage residents.

The list of cities is closed for now, but Gurin points out that a lot of cities are still figuring out what their options are. Before you even worry about privacy, he points out, you have to consider, “‘How do you release the data in the first place?’ The What Works Cities initiative has a lot of guidelines on that."

Identify targets where privacy issues are less severe but the public good is obvious. Take traffic data, for example. It's useful for planning public transport and for managing rush hour.

Traffic data shouldn't be much of a temptation for hackers. But keep in mind that it's tough to anticipate how disparate data sets can be stitched together via what's known as the Mosaic Effect: You can scrub a data set of names, addresses, and Social Security numbers, yet it can still be combined with other data sets. People can mix that data, reassembling it in unforeseen ways, like a mosaic. In a worst-case scenario, anonymity can be compromised by those with ill intent.
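A toy example makes the Mosaic Effect concrete. In the sketch below, neither invented data set alone links a name to a diagnosis, but joining them on shared quasi-identifiers does; researcher Latanya Sweeney showed years ago that ZIP code, birth date, and sex alone can uniquely identify a large share of Americans. All records and column names here are hypothetical.

```python
# Toy illustration of the Mosaic Effect: neither data set alone names
# anyone alongside sensitive facts, but joining them on shared
# quasi-identifiers re-identifies people. All records are invented.
import pandas as pd

# A "scrubbed" health data set: no names or SSNs, just quasi-identifiers.
health = pd.DataFrame({
    "zip": ["94612", "94110"],
    "birth_date": ["1961-07-28", "1985-02-14"],
    "sex": ["M", "F"],
    "diagnosis": ["hypertension", "asthma"],
})

# A public data set, such as a voter roll, that does carry names.
voters = pd.DataFrame({
    "name": ["A. Nguyen", "B. Ortiz"],
    "zip": ["94612", "94110"],
    "birth_date": ["1961-07-28", "1985-02-14"],
    "sex": ["M", "F"],
})

# The join stitches the mosaic together: names now attach to diagnoses.
linked = health.merge(voters, on=["zip", "birth_date", "sex"])
print(linked[["name", "diagnosis"]])
```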

Hold town halls to bring the public into the discussion. "Any community has developers and hackers," Gurin says. Those are the people from whom you want input. One idea: Hold a hackathon. You could invite the local Code for America group, for example.

Another important part of making a smart city that actually works for those who live in that city is to hear from other data holders, Gurin suggests. For example, the Orlando Police Department worked with sexual assault and domestic violence victim advocates to figure out how to balance transparency and victim privacy.

It's all about engaging in dialogue with the community, learning from citizens how open data can actually help, and earning community trust. "The best test of whether it serves the public interest is whether people in the community who handle the data see a use for it," Gurin says.

Employ de-identification technologies. It might seem impossible to de-identify data while still retaining its value to smart city analysts and researchers, but it's essential to try. Gurin suggests identifying individuals with unique ID numbers that make it possible to connect data about them across different data sets without revealing their identity. Another suggestion: Drop non-critical information. One common practice is to truncate an individual's ZIP code to its first three digits, for example.
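As a rough sketch of both ideas, the code below derives a stable pseudonymous ID with a keyed hash (so the same person can be linked across data sets, but the ID can't be reversed without the key) and generalizes the ZIP code. The key, field names, and record are hypothetical, and a real de-identification program would involve far more than these two steps.

```python
# Hedged sketch of two de-identification steps: a stable pseudonymous
# ID via keyed hashing, and ZIP code generalization. Illustrative only.
import hashlib
import hmac

SECRET_KEY = b"store-and-rotate-in-a-key-vault"  # hypothetical key

def pseudonym(identifier: str) -> str:
    """Derive a stable ID that can't be reversed without the key."""
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()[:16]

def truncate_zip(zip_code: str) -> str:
    """Generalize a ZIP code by keeping only its first three digits."""
    return zip_code[:3] + "XX"

record = {"ssn": "123-45-6789", "zip": "94612", "diagnosis": "hypertension"}
deidentified = {
    "person_id": pseudonym(record["ssn"]),
    "zip": truncate_zip(record["zip"]),
    "diagnosis": record["diagnosis"],
}
print(deidentified)  # no SSN, coarse location, still linkable by person_id
```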

Control access. Not every agency needs to see everything. Access to some data needs to be limited to qualified, vetted researchers, as opposed to being open to the public. Approaches include tiered levels of access; a federated model involving a cloud repository, limited access, and secure methods of sharing; or opt-in sharing levels, such as allowing individuals to share healthcare data, for example, in the hope that it can help researchers find better treatments.
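As a hedged sketch of the tiered approach, the code below tags each column of a hypothetical transit data set with the minimum tier allowed to see it, so a request returns only the columns its tier clears; the tier names and fields are invented for illustration.

```python
# Hedged sketch of tiered access: each column carries a minimum tier,
# and unknown columns default to the most restrictive tier.
from enum import IntEnum

class Tier(IntEnum):
    PUBLIC = 1    # open data portal
    VETTED = 2    # approved researchers
    INTERNAL = 3  # the custodial agency itself

# Minimum tier required to see each (hypothetical) column.
FIELD_TIERS = {
    "route": Tier.PUBLIC,
    "timestamp": Tier.PUBLIC,
    "zip3": Tier.VETTED,
    "person_id": Tier.INTERNAL,
}

def visible_fields(record: dict, tier: Tier) -> dict:
    """Return only the fields the requester's tier is cleared to see."""
    return {k: v for k, v in record.items()
            if FIELD_TIERS.get(k, Tier.INTERNAL) <= tier}

trip = {"route": "51A", "timestamp": "2017-04-03T08:15",
        "zip3": "946XX", "person_id": "9f2c"}
print(visible_fields(trip, Tier.PUBLIC))  # route and timestamp only
print(visible_fields(trip, Tier.VETTED))  # adds the coarse ZIP
```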
