(Cross-posted from my personal blog.)
This post is an edited transcript of a talk I recently gave on the past and future of privacy. I argue that the story may be a more positive and hopeful one than people often realize.
The talk stemmed from work I’ve done at the Centre for the Governance of AI. The people whose writing most influenced its ideas are Joan Feigenbaum, Aaron Segal, Bryan Ford, and Tyler Cowen. Mainly: This paper and this blog post. I’ve also especially benefitted from conversations with Andrew Trask.
I think it’s fair to say that discussions of privacy issues tend to have a pessimistic edge to them. For a long time, the dominant narrative around privacy has been that it’s under threat. It’s either dying or it’s dead.
Here we have five decades of magazine covers announcing or predicting the death of privacy at the hands of technology. We also have the 1890 Harvard Law Review article “The Right to Privacy,” which is often seen as the starting point for modern political and legal discussions of privacy. The framing device for that article was that the camera and other new technologies posed severe threats to privacy. These threats were meant to be unprecedented enough to warrant, for the first time, the introduction of a new “right to privacy.” So it seems like we have been watching and working to stave off the death of privacy for a long time now.
The following narrative is a bit of a caricature, but I also do think it should be fairly recognizable. Something like it seems implicit in many discussions of privacy. The narrative goes like this: (1) People used to have a lot more privacy. (2) Unfortunately, over time, technological progress has gradually eroded this privacy. (3) Now new technologies, such as artificial intelligence, are continuing this trend. Soon we may have little or no privacy left.
Let’s call this narrative “Privacy Pessimism.”
Now, of course, the Privacy Pessimism narrative isn’t universally accepted. However, I think that even when people do decide to critique it, these critiques don’t often qualify as “heartening.” Here’s a quote from Privacy: A Very Short Introduction, pushing back against Privacy Pessimism. The author writes:
A requiem, however, is premature. The invaders are at the gate, but the citadel will not fall without a battle.
There is definitely an element of optimism here. But I still don’t know that the situation being described is really one we should be happy to find ourselves in. I don’t know how comforting it is for the people on the right-hand side of the painting up there to be told: “Well, at least there’s a battle on.”
What might it look like to be a proper optimist about privacy? One way to explore the space of narratives here is to just turn Privacy Pessimism on its head. The narrative we get out looks something like this: (1) It used to be the case that people had very little privacy. (2) Fortunately, technological progress has tended to increase privacy. (3) Now new technologies, like artificial intelligence, can be used to further increase and protect privacy.
Let’s call this narrative “Privacy Optimism.”
Privacy Optimism is obviously a much less familiar narrative than Privacy Pessimism. It also has a bit of a ring of contrarianism-for-the-sake-of-contrarianism. But is there anything to be said in favor of it?
My view is that, perhaps surprisingly, both narratives actually do capture important parts of the truth. Technology has given us more of certain kinds of privacy and less of other kinds. And both hopeful and distressing futures for privacy are plausible.
Even if losses and risks are very real, positive trends and possibilities ought to be a larger part of the conversation. It’s worth taking optimism about privacy seriously.
To lay out my plan for the rest of the talk: First, I’m going to be saying a little bit about what I have in mind when I talk about “privacy.” Second, I’m going to present an extremely condensed history of privacy over the past couple hundred years. I’m going to try to show that technology has had both positive and negative effects on privacy. I’ll actually try to argue that for many people the positive effects have probably been more significant. Third, I’m going to talk about why certain privacy problems are so persistent. Then, finally, I’m going to try to stand on top of this foundation to get a clearer view of the future of privacy. I’ll be giving special attention to artificial intelligence and the positive and negative effects it could ultimately have.
The word “privacy” has a lot of different definitions. There are actually entire books devoted to nothing other than just clarifying the concept of privacy. I’m going to be using a relatively simple definition of privacy. In the context of this talk, privacy is just the thing that we lose when people learn things about us that we’d rather they didn’t know.
Sometimes we would like to talk about “larger” or “smaller” violations of privacy. Here’s how I’ll be thinking about quantification in the context of this talk: The more unhappy we are about a loss of privacy — or the more unhappy we would be if properly informed — the more privacy we have lost.
To illustrate: Sometimes people are a bit annoyed when others discover their middle name. Maybe their middle name is “Rufus” or something. But typically people don’t care very much. Fully informing them of the implications of having their middle name known also probably wouldn’t lead them to care much more. So we can consider this case to be one where the loss of privacy is “small.” In contrast, most people would be quite a bit more upset if all of their texts with their significant other were published online. So that would count as a much larger violation of privacy.
Last bit of conceptual clarification: I’m also going to distinguish between two different kinds of privacy. I’m going to say that we suffer a loss of social privacy when people within communities that we belong to learn things about us. And we suffer a loss of institutional privacy when institutions such as governments and companies learn things about us.
To illustrate: Let’s say that someone would prefer for people in the workplace not to know their sexual orientation. If their orientation is revealed to their coworkers, against their wishes, then that would be a loss of social privacy. If some web advertising company is able to infer someone’s sexual orientation from their browsing activity and sends them targeted ads on that basis, then that would be a loss of institutional privacy.
These are the concepts I’m going to be applying throughout the rest of the talk.
Let’s move on to history. One choice we need to make when exploring the history of privacy is the choice of where to begin. I’m going to start in the centuries leading up to the Industrial Revolution. My justification here is that up until the start of industrialization — up until mechanization, the use of fossil fuels, and this sort of thing really started to take off — a lot of the core material and technological features of people’s lives didn’t tend to change very much from one generation to the next. Many of these features were also relatively constant from one region to the next. So while they did change and they did vary, they were still constant enough for us to talk about “pre-industrial privacy” without feeling totally ashamed about how much we’re overgeneralizing.
Let’s focus on social privacy first. I think it’s fair to say that life in the pre-industrial era — or, for that matter, present-day life in places that have yet to industrialize — is characterized by extremely little social privacy.
One significant fact is that it’s common for pre-industrial homes to have only a single room for sleeping. In some places and times, it may also be common for multiple families or for a large extended family to share a single home. It might even be common for people to be able to enter neighbors’ homes freely. So that means that if you’re at home, you don’t get space to yourself. You don’t have private places to stow possessions, have conversations, host guests, or have sex. If you do something at home, your family and perhaps many other people are likely to know all about it.
Another significant fact is that you most likely live out in the countryside, in a very low population density area. You probably very seldom travel far from home. A major limitation here is that the best way you have of getting around is probably simply to walk. This means that “strangers” are rare. Most people you interact with know most other people you interact with. This makes it very easy for information about you to get around. You can’t very easily, for example, choose to present a certain side of yourself to some people but not to others.
You also probably can’t read. Even if you could, you probably wouldn’t really have access to books or letters. These things are expensive. One thing this implies is that if you have a question, you can’t simply look up the answer to the question. You need to ask someone in person. This makes private intellectual exploration extremely difficult. You can’t have an intellectual interest that you pursue without alerting the people around you that you have this interest. Or you can’t, for example, have a private question about your body, about community politics, or about ethical dilemmas you might face.
Gossip is probably very prevalent and used to enforce strict behavioral norms. Because you probably live not very far above the subsistence level, you depend on your community as a social safety net. This means that disapproval and ostracism can be life-threatening. In extreme cases, you might even be subject to violence for violating important norms. Picking up and leaving to find a new community to accept you will also probably be extremely risky and difficult to do. It really is no small thing if information that you’ve been deviating from community norms gets around.
So the state of social privacy in the pre-industrial era was not good.
I think that technological progress has mostly been a boon to social privacy.
One of the most straightforward ways in which it’s been a boon to social privacy has been by driving economic growth. Around the start of the Industrial Revolution, most people on Earth had standards of living that were just slightly above the subsistence level. Today most people on Earth are much richer. This means we can afford to spend more money and time on things that enhance our privacy. For example, we can typically afford to spend more money on homes with multiple sleeping rooms. Another smaller example: My understanding is that until the second half of the nineteenth century, it wasn’t yet fully the norm for Americans to use envelopes when sending letters. Because paper envelopes were expensive and difficult to produce. Then people got richer and manufacturing technology got better and suddenly it made sense to buy that extra bit of privacy. Simply put: Poverty is bad for privacy. And technological progress has supported a dramatic decline in poverty.
Technological progress has also had a positive effect through new modes of transportation such as the bicycle, train, and car. These have helped people to have rich lives outside the watch of neighbors and family. It is relatively easy to go out to attend some meeting of a club, or religious group, or support group, without knowledge of your attendance or behavior at the meeting necessarily getting around. It’s also easier to just go out and hang out with whoever privately. There’s a lot of literature specifically on the liberating effect that the widespread adoption of the car had on teenagers and housewives in the context of American domestic life.
The effect of information technology has also been pretty huge. Again, I think that the ability to ask and receive answers to questions without alerting people in your social circle is a really big deal. I think it’s a big deal that girls can privately look up information about birth control and reproductive health, for example, and I think it’s a big deal that people in religious communities can privately explore doubts on online forums. I think that people’s ability to communicate somewhat more furtively with friends and romantic partners using letters, phones, and instant messenger platforms has also been very important.
There are also many small examples of improvements in information technology leading to improvements in social privacy. One recent case is the introduction of e-readers. There’s some evidence that this has actually changed people’s reading habits, by allowing people to read books without necessarily alerting the people around them as to what they’re reading. A lot of the people who read Fifty Shades of Grey, for example, probably would not have read it if they had needed to display a physical copy. There are so many small examples like this that we could pick. I think that together they add up into something quite significant.
Okay, so social privacy has mostly gotten better. How about institutional privacy? Unfortunately, I think this is a very different story. In my view, the past couple hundred years have seen institutional privacy decline pretty dramatically.
The basic idea here is that pre-modern states and other centralized institutions simply had a very limited capacity to know about individuals’ lives. Imagine that you’re the King of France in, say, 1400. You don’t have computers, you don’t have recording devices, you don’t have a means of transporting information faster than horse-speed, you don’t have a massive budget or a massive supply of literate civil servants. If you want to collect and process detailed information about the people in your kingdom, you’re going to have a hard time. Even just learning roughly how many people reside in your kingdom would be a difficult enough challenge.
Today, you can count on many different institutions owning and being able to make use of vast amounts of data about you. That’s something new and something that would not have been possible in the relatively recent past. Steady improvements in information technology have supported a steady ratcheting up of how large and usable these datasets are.
I probably don’t need to say very much about this trend, because so much has already been said about it elsewhere. One especially good book on the subject is Bruce Schneier’s Data and Goliath. While the book is pretty gloomy, it’s also focused primarily on the erosion of institutional privacy in an American context. Here the main institutions of interest are companies like Facebook and intelligence agencies like the NSA. I think the picture gets a lot bleaker if you decide to look at places with weaker institutional safeguards.
Recent news reports concerning the mass surveillance of the Uighur minority in China provide one of the most obviously bleak examples. A set of technologies ranging from smartphone apps to facial recognition cameras are apparently being leveraged to collect extensive data on Uighur individuals’ movements throughout cities, electronic communications, consumption patterns, and so on. It’s apparently made quite clear that if you don’t comply with certain expectations, if you seem to be in any way a threat to social order, then you’ll face a high risk of imprisonment in indoctrination camps. This is horrifying. And it’s a modern variety of horror.
The picture I’ve just painted of the past two hundred years is one of rising social privacy and declining institutional privacy. A more detailed history would acknowledge that these trends have not been totally steady. There have surely been individual periods where social privacy declined and institutional privacy grew. The move toward people having more unified online identities through platforms like Facebook would be an example of a technological development that probably decreased social privacy. And the move toward more companies using end-to-end encryption would likewise be an example of a technological development that probably increased institutional privacy. But at least if we zoom out enough I think the overall trends are pretty clear.
How about the net effect of these two trends? Have the past couple hundred years of change, overall, constituted decline or progress?
I think there are two main ways to approach this question. First, we can introspect. We can ask: “Would I be happy returning to the levels of both institutional and social privacy I would have had in Year X?” Returning to pre-industrial levels of institutional privacy, for example, might mean wiping all of your data from the servers of government agencies and tech giants. But returning to pre-industrial levels of social privacy might mean giving your extended family access to a live HD video stream of your bedroom. If you ultimately wouldn’t be happy to accept the trade-off, then this suggests that at least you personally have experienced an overall gain. You can also ask the same question for different pairs of dates to probe your intuitions about more short-term changes.
Second, we can observe our own behavior. We can ask: “When there are trade-offs between social and institutional privacy, what do I tend to choose?” It seems to me like opportunities to trade one form of privacy off against the other actually come up pretty frequently in modern life. You can ask a friend a personally revealing question, for example, or you can ask Google. You can buy a revealing product on Amazon, or you can buy it with cash at a local store. You can directly reveal romantic interests in people, within communities you belong to, or you can swipe on a dating app. The choices you make when you face these dilemmas provide at least some evidence about the relative importance of the two forms of privacy in your life.
My personal guess is that, for most people in most places, the past couple hundred years of changes in individual privacy have mainly constituted progress. I think that most people would not sacrifice their social privacy for the sake of greater institutional privacy. I think this is especially true in countries like the US, where there are both high levels of development and comparatively strong constraints on institutional behavior. I think that if we focus on just the past thirty years, which have seen the rise of the internet, the situation is somewhat more ambiguous. But I’m at least tentatively inclined to think that most people have experienced an overall gain.
For many other people, who have been acutely victimized by insufficiently constrained institutions, the trend has surely been a much more negative one. For the Uighur community in Xinjiang, for example, it’s very difficult to imagine classifying the past thirty years of change as “privacy progress.” It’s very difficult to imagine taking a rosy view of privacy trends in East Germany in the middle of the previous century. A whole lot of people have experienced very tangible and even horrific suffering due to the erosion of institutional privacy.
So it would be far too simple to call the story here a “positive” one. Nevertheless, I do think that the story is much more positive — and certainly much more nuanced — than many discussions of privacy issues suggest.
Before turning to the future, I want to spend a bit more time talking about the ongoing decline of institutional privacy. Specifically, I want to dwell on the question: “Why is it that we keep losing ground? What’s blocking us from having more institutional privacy than we do today?”
In the last section, I talked about how technological progress has made it possible for institutions to learn more about us. But of course, this is only a _necessary condition_ for institutions learning more about us. It’s not a sufficient condition. It doesn’t explain why they actually are learning all that they’re currently learning.
I think that one explanation here is practical necessity. Sometimes institutions learn about us because they would otherwise be unable to provide certain services, which are worth the loss of privacy. Credit services are one clear case. Credit card companies generally keep records of their customers’ purchases. These records are often pretty personally revealing. But the companies pretty much need to keep these records in order to provide customers with the services they want. It’s hard to bill someone at the end of the month if you have no idea what they bought that month.
The other explanation is abuse of power. Sometimes institutions learn about us in order to pursue objectives that don’t actually benefit us. I think that the totalitarian system of surveillance established in East Germany is again one especially clear case. The main point of the Stasi pretty clearly wasn’t to give the people of East Germany what they wanted. Instead it was about advancing the interests of a powerful group, even though it came at an enormous cost to the people below.
Both explanations are clearly very important. Here, though, I want to focus on practical necessity. One justification for this focus is that “practical necessity” is, in some sense, the more fundamental issue. The problem still exists even in the case of ideal governance, even in the case where everyone has good intentions and we get the design of institutions just right.
Another justification for this focus is that data collected on the grounds of “practical necessity” can also subsequently be abused. You can credibly or even correctly say that some surveillance system is necessary to keep people safe, that it’s necessary to provide people with security, and then turn around and abuse the information that the system allows you to collect. You can correctly say that you need to collect certain data from customers, then abuse or mishandle that data once you have it. So the fact that “practical necessity” works as a justification for data collection, in so many cases, also helps to explain why institutions have so many opportunities to abuse their power.
Let’s suppose that “practical necessity” is actually one major explanation for the decline in institutional privacy. This then suggests a deeper question: “Why is it practically necessary for institutions to learn so much about us?”
One simple answer is that information is lumpy. The idea here is that learning some simple fact, which you must know in order to provide a service, can require learning or being in a position to learn many other sensitive facts. So even institutions that provide us with very narrow services end up receiving huge lumps of information about us.
To make this idea concrete, I’m going to give a few brief examples.
First, let’s suppose that you want to learn whether someone else’s backpack has a bomb in it. If you don’t have access to any fancy tools, then you really only have a couple of options. First, you can open the backpack and look inside and learn whether or not there’s a weapon in there. In the process of doing this, though, you will also learn everything else that the backpack contains. Some of these items might of course turn out to be quite personal. Second, you can refrain from opening the backpack and thereby learn nothing. These seem to be the only two options. You need to either learn too much or learn too little.
Second, let’s say you’re a cellular service provider. To provide this service, you need to know the times and destinations of the messages your customers are sending. Unfortunately, by collecting this seemingly limited information, you’ve put yourself in a position where you can make a lot of inferences about this person’s life. You may be able to infer that this person is having an affair, or that they’re receiving chemotherapy, and so on, purely on the basis of their call logs. A lot of additional information may come along for the ride.
Third, let’s say that you want to study the statistical relationship between some health outcome and various lifestyle choices. You have no interest in learning about any individual person’s lifestyle. You only want to learn about overall statistical relationships. It unfortunately may be quite hard to do this without collecting lots of information about individuals and being in a position to identify or infer things about them when doing your analysis.
To summarize, then, here’s a partial explanation for why institutional privacy continues to be so poor: Institutions provide us with services that often require learning about us. Unfortunately, the lumpiness of information means that they are often in a position to learn a lot about us. Institutions aiming to abuse their power can also collect more information than they need to, in the first place, by at least semi-credibly claiming that this information is necessary for the services they provide.
Institutional privacy has then been on the decline, in no small part, because the quantity and sophistication of the services that centralized institutions provide continue to grow. New services like telecommunication, online social networking, and protection from terrorism have been begetting further losses of institutional privacy.
If we want to hold out hope that one day this trend may reverse, then one natural question to ask is: Could information be made less lumpy? Can we develop methods that allow institutions to have all the information they need to provide important services, without requiring so much other information to come along for the ride?
At least in principle, I think that we can. We already have examples of technological innovations that reduce the problem of lumpy information in certain limited contexts.
Just as one example, a bomb-sniffing dog can be considered a sort of technology. And what a bomb-sniffing dog allows you to do is to get out this one piece of information, the fact that some bag either does or does not have a bomb in it, without learning anything else about what’s inside the bag. Another example would be “differential privacy” techniques. These are a set of techniques that help people to discover overall statistical patterns in datasets, while being in a much weaker position to learn about the individuals represented in these datasets.
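To make the differential-privacy idea a bit more concrete, here is a minimal sketch of its best-known building block, the Laplace mechanism, applied to a counting query. The function names, parameters, and toy dataset are my own illustration, not drawn from any particular library:

```python
import math
import random

def laplace_noise(scale: float) -> float:
    # Sample from a Laplace(0, scale) distribution via inverse-CDF sampling.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(records, predicate, epsilon: float) -> float:
    # A counting query has sensitivity 1: adding or removing one person's
    # record changes the true count by at most 1. Adding Laplace noise with
    # scale 1/epsilon therefore gives epsilon-differential privacy.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Example: release an approximate count of smokers in a dataset, without
# letting the answer pin down whether any single individual smokes.
records = [{"smoker": True}] * 40 + [{"smoker": False}] * 60
noisy = private_count(records, lambda r: r["smoker"], epsilon=0.5)
```

The point of the sketch is the trade-off it encodes: a smaller epsilon means more noise and stronger privacy for individuals, while the overall statistical pattern (roughly 40 smokers) remains recoverable.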
However, I think it’s pretty clear that these sorts of technologies have had only a comparatively small effect so far. The effect hasn’t been nearly large enough to offset the other forces that are pushing us toward a world with less and less institutional privacy.
In this last section, I’m going to be turning to talk specifically about artificial intelligence.
There’s recently been a lot of worry expressed about the privacy implications of AI. I also just sort of think, in general, that progress in AI is likely to be one of the most important things that happens over the next several decades and centuries. So I’d like to better understand what progress in AI implies for the future of privacy.
Institutional privacy is again going to be the main focus of this section, because I think this is currently where the most noteworthy and potentially scary developments seem to be happening. But I will first say a bit about social privacy.
One way that AI might support social privacy is by helping to automate certain tasks that would otherwise be performed by caregivers, assistants, translators, or people in other professions that essentially require sustained access to sensitive aspects of someone’s life.
For example, there are people working on AI applications that could help people with certain disabilities or old-age-related limitations live more independently. This includes voice command and text-to-speech software, self-driving vehicles, smartphone applications that “narrate the world” to vision-impaired people by describing what’s in front of them, and so on. Greater independence will often mean greater privacy.
On the other hand, one way that AI could erode social privacy is by making it easier to look other people up online. We may get to a point where consumers have access to software that lets you snap a photo of someone and then, taking advantage of facial recognition systems, easily search for other photos of them online. This could make people less anonymous to one another when they’re out and about and make it difficult to compartmentalize different aspects of your life (insofar as these aspects are documented anywhere in online photos).
It’s obviously difficult to say what the overall effect on social privacy will be. I would loosely expect it to be positive, mainly on the basis of the historical track record. But history of course provides no guarantees. A lot more thinking can certainly be done here.
Moving on, now, to institutional privacy. A lot of people have raised concerns about the impact that AI could have on this form of privacy. I think these concerns are very often justified.
I think there are essentially two main reasons to worry here. The first reason to worry is that AI can make it faster and cheaper to draw inferences from data. It can automate certain tasks that you would normally need human analysts to perform. Picking out individuals in surveillance footage would be one example. To stress the “scariness angle” here, the Stasi relied on hundreds of thousands of employees and part-time informants. Imagine if it was possible for contemporary authoritarian countries to do what was done in East Germany for only one hundredth of the cost.
The second reason to worry is that AI can enable new inferences from data. AI can allow people to learn things from data that they otherwise wouldn’t easily be able to learn. One simple illustration of this phenomenon is a study, from a few years ago, that involved giving a lot of people personality tests, looking at their Facebook profiles, and then analyzing correlations between their test results and the pages they like on Facebook. The researchers then used the data they collected to train a model that can predict an individual’s personality traits on the basis of a relatively small number of Facebook likes. At least the headline result from the study is that, if you know about 300 of the Facebook pages that someone has liked, then you can predict how they’d score on a personality test about as well as their spouse can. So this is an example of using AI to draw new inferences about individuals from fairly small amounts of data.
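To give a flavor of how this kind of inference works, here is a toy sketch. This is my own illustration, not the researchers’ actual method (which used far larger data and additional modeling steps): fit a simple linear model that maps a binary vector of page likes to a personality-test score.

```python
def fit_linear(X, y, lr=0.1, steps=2000):
    # Plain batch gradient descent on mean squared error.
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            for j in range(d):
                grad[j] += err * xi[j]
        for j in range(d):
            w[j] -= lr * grad[j] / n
    return w

def predict(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

# Columns = hypothetical pages; rows = users' binary like-vectors.
likes = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
extraversion = [2.5, -1.0, 1.0, 0.5]  # toy personality-test scores
weights = fit_linear(likes, extraversion)
```

Once the weights are fit on people who took the personality test, the model can be pointed at anyone whose likes are visible — which is exactly why seemingly trivial data can end up being so revealing.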
One way to rephrase the second concern is just to say that AI may make information even more lumpy than it is today. There may be contexts where you’re sharing what seems to you like a relatively small or narrow piece of information, but AI makes it possible to draw far-reaching inferences from that information that unaided humans would otherwise struggle to make.
My view is that it’s very reasonable to worry about where things are going. I think that the effect that AI has had on institutional privacy so far has almost certainly been negative. At least in the near-term, I think that effect will probably continue to be negative as well.
At the same time, though, I do think that there are also some ways in which AI can help to support or safeguard institutional privacy. I think a lot of the applications that are promising in this regard have only really begun to be seriously explored in the past few years, and so have yet to leave major footprints. I’m first going to walk through a few key varieties of applications. Then I’m going to make a speculative case for optimism. I’m going to argue that if certain technical challenges are overcome, and these kinds of applications become sufficiently widely adopted, then progress in AI may eventually allow us to partially reverse the decline of institutional privacy.
Here’s the first way that artificial intelligence can help support institutional privacy: It can be used to strip sensitive information from datasets, before humans look at them.
As one example of this sort of application, the police equipment company Axon is currently using AI to automatically detect and blur faces that are incidentally captured in body camera footage. They’re essentially using facial recognition software to not draw out additional information from the footage, but rather to remove information that it would otherwise be very costly to remove. A similar example is using image and text recognition software to automatically redact sensitive information from documents and ensure that nothing has accidentally been left in.
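The structure of this kind of redaction pipeline is simple enough to sketch. Everything below is a hypothetical stand-in: `detect_boxes` represents whatever face detector you have, and the “blur” is just a mean over the region, which is enough to show the shape of the idea:

```python
def blur_region(image, top, left, height, width):
    # Replace a rectangular region with its mean value -- a crude stand-in
    # for a real blur, enough to make the redaction idea concrete.
    total = sum(image[r][c]
                for r in range(top, top + height)
                for c in range(left, left + width))
    avg = total / (height * width)
    for r in range(top, top + height):
        for c in range(left, left + width):
            image[r][c] = avg
    return image

def redact(image, detect_boxes):
    # detect_boxes stands in for a face detector; it is assumed to return
    # (top, left, height, width) bounding boxes, one per detected face.
    for box in detect_boxes(image):
        blur_region(image, *box)
    return image

# A tiny 4x4 grayscale "frame" and a stub detector that flags the
# top-left corner as containing a face.
frame = [[1, 3, 0, 0],
         [5, 7, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
redacted = redact(frame, lambda img: [(0, 0, 2, 2)])
```

The notable design point is the direction of the pipeline: the recognition model runs first, and its output is used to destroy information before any human sees the footage, rather than to extract more of it.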
Another illustrative example, although this may not deserve the classification “AI,” is the software used to process data from airport body scanners. Just to give some context: About a decade ago, you may remember, airports started introducing these full-body scanners. You’d step into them, and then TSA agents would see, on their screens, essentially a pretty high-resolution scan of your naked body. A lot of people were pretty understandably concerned about that invasion of privacy. The TSA then responded to these concerns by writing software that essentially takes the raw scans and then transforms them into much more abstract and cartoonish images that don’t reveal personal anatomy. The software now allows them to learn whether people are carrying weapons without also learning what these people look like naked.
Here’s a second way that AI can help support institutional privacy: It can help to automate information-rich tasks that human employees would otherwise need to perform.
To use an analogy, let’s return to the bomb-sniffing dog case. A bomb-sniffing dog can be seen as a “tool” for automating the process of analyzing the contents of a bag and determining if there’s anything dangerous inside. Because we can automate the task, in this way, it’s not necessary for any human to “access” the “raw data” about what’s inside the bag. Only the dog accesses the raw data. The dog has also essentially been “designed” not to “leak” the data to humans.
So if you’re automating the process of responding to customer requests, or the process of looking for signs of criminal activity in some dataset, then this reduces the need for any humans to actually look at the data that’s been collected. This may somewhat reduce concerns about individual employees abusing personal data and concerns about personal data being leaked. It may also just reduce the additional sense of “violation” that some people have, knowing that an actual human is learning about them. I think that most people tend to feel a lot more comfortable if it’s just a dog or some dumb piece of software that’s doing the learning.
Here’s a third and final way that AI can help support institutional privacy: It can reduce the need for institutions to collect data in the first place.
There’s an obvious objection that can be raised to the points I just raised about automation. Even if humans don’t need to look at certain data, they may still choose to. For example, Google Docs is a highly automated service. No Google employee needs to look at the personal diary that you keep in a doc file. But you may still have a sense of personal squeamishness about the fact that the diary is being stored as a readable file on a server Google owns. You might have concrete concerns about your privacy one day being violated, for instance if a major data breach ever occurs. Or you might just dislike the thought, however hypothetical, that someone could look at the diary.
These sorts of concerns can become a lot more serious and well-justified if a less well-established company, with weaker incentives to maintain a good reputation, is collecting your data. They can also, of course, become quite serious if your data is being held by a corrupt government or any organization that could plausibly use the information to coerce you. So automation only really gets you so far. What would really be nice is if institutions didn’t need to collect your data in the first place.
As it turns out — although this may initially sound like magic — it actually is possible to process data without collecting it in any readable form. There are a number of relevant techniques with wonky names like “secure multiparty computation” and “homomorphic encryption.” These techniques are all currently extremely slow, costly, and difficult to use. This explains why they’re still so obscure. But over the past decade a lot of progress has been made on the challenge of making them practical. We still don’t know just how much more progress will eventually be made.
Ultimately, progress in this domain and progress in AI could interact to enhance privacy. Progress on techniques like secure multiparty computation could make it possible for institutions to create and use certain AI systems without collecting data. And progress in AI could create more opportunities to apply techniques like secure multiparty computation. These techniques are of no real use, of course, if it’s still human beings doing the data processing.
It really is important to stress, again, that techniques for privacy-preserving computing are still wildly impractical to apply in most cases. They are currently associated with very serious practical limitations. You are absolutely not going to find them being used everywhere next year. You also probably won’t see them being used everywhere next decade.
My understanding is that no one really knows exactly how efficient and user-friendly these techniques could ultimately get. It’s possible they’ll never get good enough to matter very much. But they might also one day get good enough for their pervasive adoption to become practical.
From a privacy optimism perspective, I think it’s useful to consider the best-case-scenario. If the practical limitations can mostly be overcome, then what’s the most that we could reasonably hope for?
To probe this question, I’m going to focus on one particular technique called “secure multiparty computation” (MPC). There still aren’t very many examples of MPC being used in the wild. It also has a pretty boring name. At the same time, though, it has at least two appealing properties. First, at the moment, MPC is at least _more practical_ to apply than a number of other techniques. Second, the theoretical limits on what MPC can accomplish are extremely loose. If enough practical limitations are overcome, then the “ceiling” on its long-run impact is especially high.
Before explaining exactly what “secure multiparty computation” is, I sort of need to introduce a few basic background concepts.
We can think of any given information processing task as involving three (often overlapping) sets of participants. First, there are input parties who provide inputs to be processed. This might be, for example, two Tinder users inputting their “swipes” into the app. Or it might be a group of patients whose data is being used to train a medical diagnosis system.
Then there are the computing parties. These are the parties who process the inputs. In the first of the two cases just mentioned, Tinder would be the computing party. It’s computing whether the two people have matched. In the second case, the computing party would be the medical team training the AI system.
The people who receive the outputs of the computation are the results parties. The two Tinder users are again the results parties in the first case. The team developing the system is again the results party in the second case. They’re the ones who end up with the trained AI system, which is what constitutes the output in this case.
Just to make the idea totally clear, here’s a little chart with four different cases laid out. You can also see how the breakdown works in the case of domestic surveillance and personal genetic data analysis.
Privacy issues basically arise when information travels from one class of parties to another. For example, Tinder ends up with information about who their users have swiped left and right on. The medical researchers end up with information about the patients who have provided their data. In general, institutions that serve as computing parties are often in a position to extract a lot of information from the inputs they receive and the outputs they send out.
Okay now. Preamble over. What secure multiparty computation (or MPC) refers to is: a set of protocols that remove the need for computing parties to learn the inputs or outputs of a computation.
And the key result here is this: so long as a given computation involves at least two computing parties, and so long as at least one of them is correctly following the expected protocol, then it’s not in principle necessary for the computing parties to ever gain access to the inputs or outputs. They can instead receive what looks to them like nonsense, process this apparent nonsense, and then send out a different form of apparent nonsense. The results parties can then reconstruct the correct output from the apparent nonsense they receive. The computing parties never need to be in a position to learn anything about the inputs and outputs.
It doesn’t matter what the computation is. The result always holds. In principle, it is possible to perform any computation using MPC.
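To make the “apparent nonsense” idea concrete, here’s a minimal sketch of one of the simplest building blocks of MPC, additive secret sharing. The scenario and numbers are invented for illustration: two input parties split their private values into random-looking shares, two computing parties add the shares they hold, and a results party reconstructs the sum. Neither computing party ever sees anything but uniformly random numbers.

```python
import secrets

P = 2**61 - 1  # a large prime; all arithmetic is done modulo P

def share(value, n_parties):
    """Split a value into n random-looking additive shares that sum to it mod P."""
    shares = [secrets.randbelow(P) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

def reconstruct(shares):
    """Recombine shares into the original value."""
    return sum(shares) % P

# Two input parties each split their private input between two computing parties.
alice_shares = share(37, 2)
bob_shares = share(5, 2)

# Each computing party sees only random-looking numbers, yet can add the
# shares it holds. The per-party sums are themselves shares of 37 + 5.
party1_sum = (alice_shares[0] + bob_shares[0]) % P
party2_sum = (alice_shares[1] + bob_shares[1]) % P

# The results party reconstructs the output; no one ever saw the raw inputs.
print(reconstruct([party1_sum, party2_sum]))  # 42
```

Real MPC protocols extend this trick to multiplication and arbitrary circuits, which is what makes the fully general result possible.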
Let’s return to a couple of concrete cases. Let’s say two groups are cooperating to analyze medical data. If they split the work of analyzing it, then they can use an MPC protocol to learn whatever they need to learn without actually collecting any of the medical data in a legible form. Likewise: If two different companies – or two different groups within a company – cooperate to process dating app swipes, then it’s possible to tell users whether they’ve matched without either party knowing which way each user swiped or even knowing what the users are being told.
This may sound like magic. But it actually is possible.
The core assumption that we need to make, when using MPC, is that at least one of the computing parties is following the expected MPC protocol honestly. If all of the computing parties collude to break from the protocol, then they actually can learn what the inputs and outputs are. If at least one party is honest, however, then this can be enough to keep them all entirely in the dark. Given the involvement of enough independent parties, or given your own partial involvement in the computation, the privacy guarantee here can ultimately become extremely strong.
So here’s a general strategy for providing privacy-preserving digital services: Involve multiple parties in the provision of the service. Use MPC. Then, so long as at least one of the parties is honest, it won’t be necessary for any of them to be in a position to learn anything about what they’re sending and receiving.
Again, it’s important to stress that it’s not yet practical to use MPC in most cases. The “overhead” associated with MPC is still pretty enormous. User-friendliness is also still very low. So I’m mainly just talking about what’s possible in principle.
Nonetheless, thanks to various improvements that have been made over the past decade, MPC has started to find its first practical applications. These applications aren’t necessarily earth-shattering. But, just to ground my discussion of MPC a bit more, I do think it’s worth taking the time to describe a few of them.
First, there was a case a few years ago, where Estonian researchers associated with the company Cybernetica wanted to know whether working while you’re in college makes you less likely to graduate. They knew about the existence of a couple of datasets that could be used to answer this question. One dataset was held by the Ministry of Education. That dataset contained information about who was in school in what years and who graduated. Then there was another dataset held by the Estonian equivalent of the IRS, which contained information about who was doing part-time work in what years. It’s pretty obvious how you could put these two datasets together to see if there was a relationship between working and not graduating.
The issue, of course, is that both of these datasets are extremely sensitive. Researchers certainly shouldn’t be given free access to them. The datasets also shouldn’t be combined into some master dataset. They’re held by different government agencies for good reason. The researchers’ clever solution to this problem, then, was to use MPC. They performed the necessary computation in conjunction with the two government agencies. By using MPC, they were able to calculate the correlation between working and not graduating without getting access to any of the data sets or requiring the datasets to be combined.
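Stripped of the MPC machinery, the underlying computation in the Estonian study is just a join and a correlation. Here’s a plaintext sketch of that computation on invented toy data; the field names and values are my own assumptions, and in the real study MPC allowed this to be computed without either dataset ever being visible in this readable form.

```python
# Hypothetical miniature versions of the two sensitive datasets.
education = {  # person_id -> graduated (Ministry of Education)
    1: True, 2: False, 3: True, 4: False, 5: True,
}
tax = {  # person_id -> worked part-time during college (tax board)
    1: False, 2: True, 3: False, 4: True, 5: False,
}

# Join the two datasets on person id, then compute the correlation
# between working and graduating (the Pearson/phi coefficient).
pairs = [(int(tax[pid]), int(grad)) for pid, grad in education.items()]
n = len(pairs)
mean_w = sum(w for w, _ in pairs) / n
mean_g = sum(g for _, g in pairs) / n
cov = sum((w - mean_w) * (g - mean_g) for w, g in pairs) / n
var_w = sum((w - mean_w) ** 2 for w, _ in pairs) / n
var_g = sum((g - mean_g) ** 2 for _, g in pairs) / n
correlation = cov / (var_w * var_g) ** 0.5
print(round(correlation, 2))  # -1.0 on this toy data: workers never graduate
```

Since the whole computation is built from sums and products, it’s exactly the kind of arithmetic that MPC protocols can evaluate over secret-shared data.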
Another similar case, associated with the same Estonian research group, is one where researchers used MPC to detect instances of value-added tax fraud from sets of financial records. Essentially, they performed a computation over different privately held sets of financial records and detected discrepancies that were suggestive of fraud. The group didn’t need to collect any of the records to do this.
A final example is the use of MPC to perform “set intersection searches.” These are essentially searches that produce a list of individuals who show up in multiple datasets of interest. It turns out that set intersection searches are used in a number of law enforcement contexts. One specific case I want to focus on comes from a paper by Aaron Segal, Bryan Ford, and Joan Feigenbaum. Let’s say that some unidentified person has robbed multiple banks in different towns. You are the FBI, and, to make progress on the case, you’d like to generate a list of people who were in the right town on the day of each robbery. To do this you might want to collect cell records and run a set intersection search on them. You want to see if anyone made calls in the right towns on the right days.
The way you would traditionally do this is to collect call logs from multiple cell towers. The obvious privacy downside here, though, is that these logs contain a lot of phone calls. By collecting them, you’ve put yourself in a position where you could easily learn a lot of information about a lot of different people. A potentially better option, then, is to use MPC. You can carry out the set intersection search jointly with the holders of the different call logs. Or, in theory, you could carry it out jointly with other independent government bodies. This allows you to get out the list of common phone numbers without collecting any of the underlying data in a legible form.
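Here’s what the set intersection search itself computes, sketched in plaintext on invented call logs. The phone numbers and towns are hypothetical; the point of the MPC version is that this same output could be computed jointly with the log holders without any individual log ever being revealed.

```python
# One set of phone numbers per cell tower log, one log per robbery.
tower_logs = [
    {"555-0101", "555-0202", "555-0303"},  # town A, day of robbery 1
    {"555-0202", "555-0404", "555-0505"},  # town B, day of robbery 2
    {"555-0202", "555-0303", "555-0606"},  # town C, day of robbery 3
]

# The only output: numbers that appear in every log, i.e. phones active in
# the right town on the day of each robbery. Everything else stays private.
suspects = set.intersection(*tower_logs)
print(suspects)  # {'555-0202'}
```

Note how little information the output contains compared to the inputs: three full call logs go in, and a single short list of candidate numbers comes out.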
My impression is that this sort of thing has not yet been done by any law enforcement agency. The idea has only been proposed. But it does seem to already be feasible today.
Let’s step back again from this handful of concrete, present-day cases. From a long-run-oriented perspective, I think there are a couple especially key points to stress.
The existence of MPC protocols implies that, in principle, training an AI system does not require collecting or in any way accessing the data used to train it. Likewise, in principle, applying a trained AI system to an input does not require access to this input or even to the system’s output.
The implication, then, is this: Insofar as an institution can automate the tasks that its members perform by training AI systems to perform them instead, and insofar as the institution can carry out the relevant computations using MPC, then in the limit the institution does not need to collect any information about the people it serves. I’m talking about no “lumps” at all.
There is essentially no principled ceiling on institutional privacy.
Coming back to the idea of optimism: I think that this analysis essentially suggests an optimistic story with a big conditional. If major practical limitations can be overcome, then artificial intelligence and privacy-preserving computing techniques together could dramatically reduce the practical need for institutions to learn about the people they serve.
In this hopeful world, dishonest excuses for excess information collection would likewise become increasingly untenable. Pure abuses of power would, at least, become much more difficult to spin as something other than what they are.
We don’t currently know how far we can move in the desired direction. Major progress would be required. But researchers have already come a long way over the past few decades of work on artificial intelligence and privacy-preserving computing. It’s really hard to justify any sort of confident cynicism here, especially if we are thinking in terms of decades-long or even centuries-long timescales.
Summing up this section: I think one reasonable guess is that, like many past technologies, artificial intelligence will tend to enhance social privacy and erode institutional privacy. However, in the long run, AI and other associated technologies might also be used to protect or even increase institutional privacy. If certain major practical limitations can be overcome, then it is possible to envision a world where “practical necessity” no longer requires institutions to learn very much at all about the people they serve.
Summing up all the sections of this talk together: I think that the historical effect of technological progress on privacy, while certainly very mixed, has been much more positive than standard narratives suggest. It’s enhanced our privacy in a lot of ways that typically aren’t given very much attention, but that do hold great significance both practically and morally.
I think that the long-run trend seems to be one of rising social privacy and declining institutional privacy. Whether or not this corresponds to a “net improvement” depends on the strength of institutional safeguards. So, insofar as technology has given certain people more “overall privacy,” technology has not done this on its own. Good governance and good institution design have also been essential.
One might expect AI to continue the long-run trend, further increasing social privacy and further decreasing institutional privacy. I don’t think it’s unreasonable, though, to hope and work toward something more. It’s far too soon to rule out a future where we have both much more social privacy and much more institutional privacy than we do today.
In short: You don’t need to be totally nuts to be an optimist about privacy.
 This talk carefully avoids taking a stance on a number of complicated ethical questions around privacy. For example, I’m not taking a stance on whether privacy has intrinsic moral value or whether privacy only matters because it fosters certain other valuable things like freedom, happiness, and security. I’m also not taking a stance on when it is and isn’t ethically permissible to violate someone else’s privacy.
Probably the most questionable assumption implicit in this talk is that privacy tends to be a good thing. Of course, we can all agree that privacy isn’t always a good thing. As one especially clear-cut example: Someone should not be able to keep the fact that they’re a serial killer a secret from the government or from the community they live in. Horrible murders aside, I do tend to have a pretty pro-privacy attitude in most contexts. But there is also a far-from-nuts case to be made that harmful forms of privacy are much more pervasive than most people think.
In general: The less people know about each other, the less able they are to make informed decisions about how to interact with one another. If you’re going on a date, knowing whether or not someone is a serial killer is obviously helpful. But so is knowing how former partners feel about them, knowing what their health and finances are like, and knowing whether they quietly hold any political beliefs that you find abhorrent. It’s considered socially acceptable to keep this sort of information private, at least from people you don’t yet know very well, but there is some sense in which keeping it private reduces the level of “informed consent” involved in social interactions. There’s a big economics literature on the various inefficiencies and social ills that can be produced by “private information.”
So that’s one broad way in which privacy can be harmful. People’s decisions to keep certain information private can cause external harm. I think that sometimes, though, the pursuit of at least social privacy can also hurt the pursuer. Here’s one simple example. Given the choice, most college freshmen will prefer to live in a room with only a single bed rather than a room they share with someone else. Privacy is presumably one of the major motivations here. But living in close quarters with someone else can also foster a sort of intimacy and friendship that it’s otherwise pretty hard to achieve. I think that sometimes decisions to pursue social privacy in certain settings can have long-run interpersonal costs that it’s easy to underestimate.
For these two broad reasons, all cultures have at least some norms that are meant to limit privacy. The character of these norms varies a lot from place to place. If you drop in on a randomly chosen tribe, you’ll probably find they have a pretty different orientation toward privacy than you do. I generally feel like the more I think about privacy, and the more I consider the cross-cultural variation in orientations toward privacy, the more uncertain I become about just exactly which expansions of privacy are worth applauding. This stuff is confusing.
 Of course, the boundary between “practical necessity” and “abuse of power” is often heavily contested. If you pick any given government surveillance program and ask both a government official and someone writing from The Intercept to classify it, you will tend to get different classifications out.
A lot of cases are in fact very blurry. Suppose that some free application collects a lot of user data, which is then used to target ads. The company may say that it’s “practically necessary” to do this, because they otherwise could not afford to offer the relevant service for free or to invest in improving it. The company may also say that ad targeting is itself a service that users benefit from. On the other hand, it’s probably the case that at least some subset of users will be upset about the loss of privacy. Almost no users on either side of the issue will have a good understanding of how their data is actually being used, whether a more privacy-preserving version of the app would actually be economically viable, and so on. Questions about whether users are meaningfully consenting to privacy losses and about what consent implies are also tricky. Overall, the right label for any given case is bound to be controversial.
 When I say “artificial intelligence,” I essentially mean “machine learning.”
 I think that social norms can play an important role in protecting against potential threats to social privacy. If there’s a strong social consensus that people who use some privacy-reducing tool are jerks, then this can actually reduce how much the tool is used. Google Glass is one famous example of a threat to social privacy that was stopped in its tracks by anti-jerk norms.
Of course, the harder it is to find out that someone is using a tool, the weaker the effect of social norms will tend to be. But I think that social norms can still have important effects even when violations of these norms are tricky to catch. For example: Most people will refrain from searching through the folders of their friend’s computer, even if they’ve left it unlocked and unattended while running out to the store.
 A fourth concern is that, since certain useful AI systems are created by “training” them on fairly personally revealing data, the rise of AI is introducing additional incentives to collect this sort of data.
 One survey of AI researchers suggests that the vast majority of people in the field believe that AI systems will eventually be able to do anything that people can. However, it goes without saying, this could ultimately take a very long time. The median respondent thought there was about a 50/50 chance that the complete automation of all jobs will become technically feasible within the next 125 years.
 The OpenMined community is one group currently working toward this future. They focus primarily on increasing the accessibility of existing but little-used techniques for privacy-preserving machine learning.
Fascinating article! The distinction between social and institutional privacy is great.
I think there's a problem with the introspective method of comparing the two. It seems that the sort of privacy you prefer is affected by the sort other people prefer. So, for example, if everyone read a hardcopy of Fifty Shades of Grey you might be more comfortable giving up the privacy of the e-reader. But if everyone reads it on the e-reader, and you're reading a hardcopy, then it becomes far less socially acceptable and you become the only one admitting to reading it!
Similarly with asking a question. If everyone asks their friends, then everyone is revealing their ignorance. If everyone is looking up answers, you're the only one revealing your ignorance.
So the amount of social privacy other people have affects the amount of social privacy you would want to have. I'm not sure if the same applies to institutional privacy, but if it's a trade then the more people trade social for institutional privacy the more you'll prefer to do so too.
The section on Secure Multi-Party Computation was also particularly interesting, and I'm off to learn more about it :)