
The privacy equation: Data is not information until it is useful

We want privacy, but we also want the efficiency that comes with data processing and the scientific advances that result from big data analytics. Is there a way to serve both individuals and agencies? And how do we get there? A Hugo Award-winning SF author gives his perspective.

The relationship between privacy and data collection is one of the most misunderstood parts of the digital environment. Data is mostly useless bits. Information is what you get when you process raw data. The digital environment works because corporations, insurers, healthcare providers, and others process large amounts of data to extract demographic information that does everything from maximizing profits to saving lives.

Here is a datum (i.e., a single piece of data): I do not like okra. It’s like eating zucchini with hair. I don’t care how you prepare it. I don’t like okra.

That specific fact is essentially useless. Unless you invite me to dinner. Then it’s useful.

There. That’s the difference between data and information.

Data is not information until it is useful.

At best, data is an unfiltered assemblage of facts. Information is derived from the processing of those facts to reveal a pattern or a trend.

Data is an unread encyclopedia; information is the specific article you’re looking for. Data is information only when it is pertinent and useful. It is information when it can be applied to produce a result.

This distinction is critical to understanding the privacy equation.

Your data. Their information.

There are people who are quitting Facebook because they think they are protecting their privacy.

That’s a nice thought, but…

We give away our data everywhere, every time we leave the house, every time we interact with the world, and in particular, every time we add another piece of data to our digital footprint. Every time you click on anything on the Internet, some click-counting software is going to notice it.

Every time we order something from Amazon, we give the company data about our preferences. Buy a camera, the site will recommend specific accessories. Buy a science fiction book, the site will recommend other science fiction books. Over time, Amazon gains enough data from your purchases to have useful information for its marketing—and for its advertisers, if Amazon chooses to sell that data.

Every time we use a debit or credit card, we’re adding data to the bank’s database about where we shop and what we buy. Your bank is unlikely to sell that information, but you did give it to them.

This is true even at the grocery store. Use that store’s “loyalty card” and the store gets a record of what you just bought. Bagels, lox, and cream cheese or white bread, bologna, and mayonnaise—you’re adding data to a database. Ever notice how many of the coupons on the back of the receipt are for products you might buy on your next visit?
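To make the data-versus-information point concrete, here is a toy sketch of how a pile of loyalty-card receipts could become coupon suggestions. It simply counts which products show up in the same basket; the basket contents, the function names `pair_counts` and `suggest_coupons`, and the pair-counting rule itself are hypothetical illustrations, not how any particular store actually does it.

```python
from collections import Counter
from itertools import combinations

# Hypothetical loyalty-card data: each set is one shopping trip.
baskets = [
    {"bagels", "lox", "cream cheese"},
    {"bagels", "cream cheese", "coffee"},
    {"white bread", "bologna", "mayonnaise"},
    {"bagels", "lox", "cream cheese", "capers"},
    {"white bread", "mayonnaise"},
]

def pair_counts(baskets):
    """Count how often each pair of products appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

def suggest_coupons(purchase, baskets, limit=2):
    """Rank products that other shoppers bought alongside items in `purchase`."""
    scores = Counter()
    for (a, b), n in pair_counts(baskets).items():
        if a in purchase and b not in purchase:
            scores[b] += n
        elif b in purchase and a not in purchase:
            scores[a] += n
    return [item for item, _ in scores.most_common(limit)]

print(suggest_coupons({"bagels"}, baskets))  # ['cream cheese', 'lox']
```

No single receipt says much; only after many baskets are processed together does a pattern (bagel buyers also buy cream cheese) emerge that the store can act on.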

Some of this is for your convenience, but most of it is for the convenience of the seller. It lets the company manage inventory more efficiently to help it keep costs (and prices) down.

This is the other side of the equation: The more information a store has on customer buying habits, the more that store can manage its supply chain to meet specific demands.

But beyond the needs of commerce, there are venues where massive data gathering is absolutely essential.


It’s not a privacy breach if it helps you

That applies to your doctor, for instance. Your health records should be as detailed as possible. You want your doctor to know if you are allergic to a medication or if your family has a history of diabetes. When a health provider has access to a massive database, it also has access to information about which treatments are most effective for specific situations. This could be life-saving.

Data gathering allows institutions of all kinds, both public and private, to compile statistics. Those statistics can be mined for all kinds of demographic information. We can discover patterns of economic distress, changing demand for electrical power, conditions of traffic congestion, patterns of legal abuses, whether specific laws are affecting behavior and to what degree, and hundreds of thousands of other situations that could not be understood without massive collections of data.

One of the biggest breakthroughs in data collection began when Herman Hollerith used punch cards to store the data gathered in the 1890 census. Each card represented a person. Each hole or set of holes represented a specific fact: male or female, married or single, age, and more. The cards were tabulated by metal pins passing through the holes to complete an electrical connection; each connection clicked a counter. (For a visual demonstration, watch “Connections: Faith in Numbers,” Episode 4.)
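The tabulating principle can be sketched in a few lines: a card is just the set of positions punched on it, and every punched position advances one counter. The hole layout below is hypothetical and much simpler than the real 1890 card, and the function name `tabulate` is an illustration only.

```python
from collections import Counter

# Hypothetical hole layout: each position stands for one fact about a person.
LAYOUT = {
    0: "male", 1: "female",
    2: "married", 3: "single",
    4: "age under 21", 5: "age 21 or over",
}

# Each card is one person, recorded as the set of positions punched on it.
cards = [
    {0, 2, 5},
    {1, 3, 4},
    {1, 2, 5},
    {0, 3, 5},
]

def tabulate(cards, layout):
    """Advance one counter per punched position, the way a pin passing
    through a hole closed a circuit and clicked its dial."""
    dials = Counter()
    for card in cards:
        for position in card:
            if position in layout:
                dials[layout[position]] += 1
    return dials

totals = tabulate(cards, LAYOUT)
for fact in LAYOUT.values():
    print(f"{fact:>16}: {totals[fact]}")
```

With these four cards, the "age 21 or over" dial ends at 3 and every other dial at 2 or 1. Adding a new fact only means assigning it a new hole position, which is the "increased granularity" the article goes on to describe.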

Hollerith’s punch cards were the ancestors of the IBM card, the ones with the dire warning, “Do not fold, spindle, or mutilate.” (Ask your grandparents.) The IBM cards were tabulated by an optical sensor able to sort hundreds of cards a minute.

Hardware and software advance in a leapfrog progression. Improved hardware makes it possible to run software faster, which also makes it possible to run more sophisticated software. More sophisticated software, in turn, demands more powerful hardware. (The Cretaceous era of personal computing was an especially frantic time.) Optically sorted punch cards made it possible not only to tabulate a massive amount of data, but also to gather much more data for tabulation, which we might call “increased granularity.”

The punch card has long since been retired to the Museum of Ancient Computing, but the essential principle is the same. A record in a database has fields for everything the data miners might consider relevant to their eventual information extraction. And most modern databases now have a built-in flexibility. (FileMaker Pro, for instance, allows you to modify a database as you go, adding or deleting fields in an already operational database.)
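As a rough illustration of that flexibility, the sketch below uses SQLite (through Python's built-in `sqlite3` module) as a stand-in; it is not FileMaker Pro. A new field is added to a table that already holds records, and the existing rows simply get an empty (NULL) value for it. The table, column names, and sample rows are hypothetical.

```python
import sqlite3

# In-memory database standing in for an "already operational" one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Ada", "Los Angeles"), ("Grace", "Arlington")],
)

# Later, someone decides a new field is relevant: add it in place.
# Rows that already exist get NULL (None in Python) for the new column.
conn.execute("ALTER TABLE customers ADD COLUMN favorite_genre TEXT")
conn.execute(
    "UPDATE customers SET favorite_genre = ? WHERE name = ?",
    ("science fiction", "Ada"),
)

columns = [row[1] for row in conn.execute("PRAGMA table_info(customers)")]
rows = list(conn.execute("SELECT name, favorite_genre FROM customers ORDER BY id"))
print(columns)  # ['id', 'name', 'city', 'favorite_genre']
print(rows)     # [('Ada', 'science fiction'), ('Grace', None)]
```

Removing a field works similarly with `ALTER TABLE ... DROP COLUMN`, although SQLite only supports that from version 3.35.0 onward, so the sketch sticks to adding one.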

We are gathering data faster than we can process it. The more data we gather, the more powerful the data-diddling software will have to be—and the more powerful the hardware will have to become. There are patterns and trends buried in all that data that we have yet to discover. Very likely, this is going to require advances in neural networks, fuzzy logic, and ultimately, some form of what is inappropriately called artificial intelligence. (I’ll have more to say about intelligence engines in another outing.)

In the meantime, what can an individual do to protect his or her privacy?

But what about my data?!

Well, first we need to understand that much of what we call privacy doesn’t exist in the digital era. We trade information about ourselves in return for services: when we apply for a driver’s license, a Social Security number, or a credit card, or when we register to vote. All those things put us “into the system.”

Where we can justifiably take issue is when that information is given or sold to someone who will use it unethically, or when it is not securely protected against thieves. We need to be able to trust those to whom we give our information. This is an area where the law is still playing catch-up.

In the meantime, you know all those fun little quizzes on Facebook (and elsewhere)? The ones that tell you which superhero you are, or where you get your pirate name by checking the color of your shirt and the month and year of your birth? Those aren’t fun games; they are data-mining tools. Some are created by advertisers, but the worst of them are circulated by potential identity thieves, because the answers (your birth date, your first pet’s name, the street you grew up on) are often the same details used to verify your identity or answer account security questions. Fill out enough of these and you might be giving away much more of your privacy than you realize.

Where do we draw the line? Until both the law and technology catch up—until legislators draw specific boundaries about the use of personal data, until data collectors can securely protect personal information from hackers—we are all going to have to be guardians of our own data. In fact, even after law and technology have both caught up, we still need to be rigorous about where and how we share our personal information. There are too many data collectors, too many data miners, and too many opportunities to be hacked. (Even as I write this, the bank is telling me that someone in Myanmar just tried to make a purchase with the credit card I keep locked in my safe. Go figure.)

At this writing, the privacy equation is still tilted against the individual—and that’s a situation that doesn’t look like it will change any time in the foreseeable future. So the best defense, the only defense, is to be cautious to the point of paranoia.

For my part, I’ve created my own database. Every time I have to share information, I note what I shared and with whom I shared it. I’m thinking that a generalized app that tracks what personal information gets shared, and where, might be the strongest tool we have for tracking down hackers.
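The article doesn't describe how that personal database is structured, so what follows is only a hypothetical sketch of what the core of such an app might look like. Each disclosure records a date, a recipient, and the pieces of information shared. Two queries then answer the questions that matter after a breach: who has a given piece of my information, and what could a given recipient leak? The class names, recipients, and dates are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Disclosure:
    """One entry in a personal sharing log: what was shared, with whom, when."""
    when: date
    recipient: str
    fields: frozenset

class SharingLog:
    """Hypothetical personal log of every disclosure of personal information."""

    def __init__(self):
        self.entries = []

    def record(self, when, recipient, *fields):
        self.entries.append(Disclosure(when, recipient, frozenset(fields)))

    def who_has(self, field):
        """Every recipient that has been given a particular piece of information."""
        return sorted({e.recipient for e in self.entries if field in e.fields})

    def exposed_by(self, recipient):
        """Everything a given recipient could leak if it were breached."""
        exposed = set()
        for e in self.entries:
            if e.recipient == recipient:
                exposed |= e.fields
        return sorted(exposed)

# Hypothetical entries.
log = SharingLog()
log.record(date(2019, 3, 1), "Online bookstore", "email", "mailing address", "credit card")
log.record(date(2019, 4, 12), "Grocery loyalty program", "email", "phone")
log.record(date(2019, 5, 20), "Bank", "credit card", "phone", "mailing address")

print(log.who_has("credit card"))                  # ['Bank', 'Online bookstore']
print(log.exposed_by("Grocery loyalty program"))   # ['email', 'phone']
```

If a card turns up in a fraud alert, `who_has("credit card")` narrows down the possible sources of the leak to the places it was actually given.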

There’s a business opportunity for someone.

This article/content was written by the individual writer identified and does not necessarily reflect the view of Hewlett Packard Enterprise Company.