Technically Speaking: Differential Privacy

What it means, and what that means for us

At the 2012 launch of the iMac, Phil Schiller said on stage that "you'll be able to go home and tell people today you heard about plasma deposition." You didn't. Nobody did. Yet at this year's WWDC, when Apple mentioned its latest new term, suddenly everyone was talking about it. Differential Privacy sounds just as much like jargon, but it concerns something that affects every single iPhone user. It's a term for something that is both technically intricate and politically on fire.

It's hard to think of a context these days where even the word privacy wouldn't cause a fuss or get attention. Couple it to Apple, prefix it with a word we understand but can't immediately connect to any of this, and you've got questions being asked. There are questions about how effective this idea is and how much it's worth, but let's answer the big one: what does the term actually mean?

Not the dictionary definition

It means Apple wants to have its cake and eat it -- and that this could well happen. Previously, Apple has been the firm that says it doesn't have and doesn't want details about you, while Google wants everything. A lot of screen pixels have been lit up by arguments over just how much data either company takes from us, and it boils down to this: if you want privacy, don't get a smartphone.



Look at President Obama: the most technologically savvy Commander in Chief the US has ever had, and he is stuck using a half-bricked phone because of security issues. We face those same issues; we just have possibly less important details to protect.

Yet this isn't a deal with the devil whereby we get to play Candy Crush Saga wherever we go in return for giving our bank account details to the bad guys. It's close, mind. To get the benefits that smartphones give us, our little devices have to be connected to the internet and they have to keep pumping out and receiving a ton of data.

For Apple, so far, that's been a fairly small ton: your iPhone says where it is in the world so your iPhone can then tell you where the nearest gas station is. While we don't and never will know the full details of what gets sent back and forth, Apple makes a big deal of how that data is used to give you this feature and how the company doesn't collate and keep and analyse where you make what requests from.

It's not very harsh to say that Google survives by doing the opposite: it's open about how it uses the mass of requests and other data it gets in order to improve its services before it shuts them down. Beyond improvements to Android, there are things like health: Google estimated the spread of flu and Dengue fever based on search patterns from its users -- or it did until it shut that down. We're not being anti-Google here about anything but the number of great ideas it launches and abandons. The company is even using Differential Privacy itself, for instance to collect usage data from people using its Chrome browser.

Apple is into health

That kind of gigantic use of data is a boon and really an unexpected one. The one search request you make for, say, a local pharmacy, is dealt with on the spot by giving you the result you need but it's also fed into a global system that can do stupendously clever and useful things for all of us.



Apple's managed to position itself ahead of Google as the company working on cutting-edge medical research technology, and it's also continually aiming to position itself as the company dedicated to your privacy. It's the company that doesn't try to build up a profile of you that it can then use to sell to advertisers. Privacy or crowd-sourced features: logically, Apple can't have it both ways.

Oh, yes, it can

There's you with your one iPhone and here's the entire world of iOS users. Apple wants to work with data from the latter without inconveniencing the former or making you fear for your privacy. It's planning to do that with this Differential Privacy.

So the object is to get enough information to be useful while at the same time not getting enough that individual people can be identified. You're thinking it's got to be easy to just not record, say, someone's phone number or email address. That's not enough, though: just knowing bits of information about someone can be enough to know too much.

Journalists are taught about avoiding what's called jigsaw identification: when someone can't be named for legal purposes, it's still possible to inadvertently reveal them because you add a detail to the picture. You might say that police have arrested "a 23-year-old man" and someone else might say "who works at Acme Shoes" while a third says "he's the company's only night security guard". At this point his name is practically irrelevant: a local lynch mob is on its way to have a chat.

It's very hard to avoid this yet journalists keep it in mind specifically to prevent illegally identifying someone. Hackers have no scruples. Not only do they have no scruples, they positively require jigsaw identification and they are very good at working out information that helps them.

Consequently, if you have any mass pot of real data about real people and what they really do, hackers will learn useful information that you don't want them to have. So change the rules: drop that word real. What Apple is essentially going to do is make the data fake.

Oh, no, it can't

Remember that the aim is not to amass a big database; it is to use the results from such a lot of data to produce useful details. To tell you that there is a traffic delay up ahead. To warn you that you sound like you've got prostate problems. (Or to get advertisers to pay to reach you with vital snake oil.)

You can get the same results if you take that mass of data and add in a known amount of fake detail. Say half of the data is accurate and half is random: 50 percent of the results you get will be wrong, but you know that, and you can work out the proportions. If 100 people ask for their nearest pharmacy and you add another fake 100 people whose search request is to do with the Kardashians, you apparently discover that half your users need a local chemist. At first glance it looks like a genuinely serious medical situation -- then you remember exactly how much fake information is in there, subtract the Kardashian fans, and relax enough to hang up on the Centers for Disease Control's hold muzak.
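The pharmacy example above describes a well-known building block of differential privacy called randomized response. Apple hasn't published its exact recipe, so this is only an illustrative sketch of the general technique, not Apple's implementation: each device sometimes reports its true answer and sometimes a coin flip, and because the proportion of noise is known, the aggregator can subtract it back out.

```python
import random

def randomized_response(true_answer: bool, p_truth: float = 0.5) -> bool:
    """Report the true answer with probability p_truth, otherwise a fair coin flip."""
    if random.random() < p_truth:
        return true_answer
    return random.random() < 0.5

def estimate_true_rate(reports, p_truth: float = 0.5) -> float:
    """Invert the known noise to estimate the real proportion of 'yes' answers.

    observed = p_truth * true_rate + (1 - p_truth) * 0.5, so solve for true_rate.
    """
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth) * 0.5) / p_truth

random.seed(42)
true_rate = 0.30  # suppose 30 percent of users really searched for a pharmacy
reports = [randomized_response(random.random() < true_rate) for _ in range(100_000)]
print(round(estimate_true_rate(reports), 2))  # should land close to 0.30
```

No single report can be trusted -- any individual "yes" may just be the coin flip -- yet over 100,000 users the aggregate estimate recovers the true rate to within a fraction of a percent.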



There are many ways to do this faking of data and Apple only alluded to them. Craig Federighi said at WWDC: "We believe you should have great features and great privacy. Differential privacy is a research topic in the areas of statistics and data analytics that uses hashing, subsampling and noise injection to enable...crowdsourced learning while keeping the data of individual users completely private. Apple has been doing some super-important work in this area to enable differential privacy to be deployed at scale."

Security experts, cryptographers and hackers all focused on the words "hashing, subsampling and noise injection". We've covered hashing before and all three terms are rabbit holes of interesting details that together only really give us a few clues about how Apple is planning to do this.
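To give a flavour of what "hashing", "subsampling" and "noise injection" can look like in practice -- and this is purely a toy illustration of those three general techniques, not Apple's actual pipeline -- a device can hash a raw search term into one of a small number of buckets so the original string never leaves the phone, occasionally report a random bucket instead, and only send anything at all for a random fraction of users:

```python
import hashlib
import random

NUM_BUCKETS = 256

def hashed_report(term: str, flip_prob: float = 0.25) -> int:
    """Hash a raw term to a small bucket, then sometimes report a random bucket.

    The server only ever sees bucket numbers, and any single report may be noise,
    so no individual report reveals what a particular user actually typed.
    """
    bucket = int(hashlib.sha256(term.encode("utf-8")).hexdigest(), 16) % NUM_BUCKETS
    if random.random() < flip_prob:  # noise injection
        return random.randrange(NUM_BUCKETS)
    return bucket

# Subsampling: only a random half of users send anything at all.
searches = ["pharmacy", "pharmacy", "kardashians", "gas station"] * 10
sampled = [term for term in searches if random.random() < 0.5]
reports = [hashed_report(term) for term in sampled]
```

Each ingredient trades a little accuracy for privacy: hashing discards the raw text, subsampling means most users reveal nothing, and noise injection makes every individual report deniable.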

What we can say for sure, though, is that this means Apple is going to do more data collection. How well that sits with its ambition to protect our privacy will rest entirely on how it implements and develops these Differential Privacy systems.

-William Gallagher (@WGallagher)