What must machine learning models memorize?

by Gavin Brown

Machine learning models serve a variety of purposes in today’s society, such as identifying faces in images or helping customers via a chatbot. Many models are trained on vast amounts of (possibly sensitive) data. The models themselves are large: OpenAI’s celebrated GPT-3 model for generating text has 175 billion parameters. Broadly speaking, these models are designed to distill the relevant information from their training data and perform well at some task. But some successful algorithms go beyond this goal. Recent work has shown that state-of-the-art models for generating text will store and regurgitate entire training examples verbatim. This happens “by accident,” since the learning algorithms are not constructed with memorization in mind. Comically, researchers found that GPT-3 memorized an entire passage from Harry Potter and the Sorcerer’s Stone. More worrying is the memorization of personal details such as emails and phone numbers or, in models for producing computer code such as GitHub’s Copilot, the propagation of security vulnerabilities. Again, such behavior is not the goal. But the best-performing machine learning methods happen to be adept at memorizing training data.

Tasks Where Memorization is Necessary

One might hope that, with more care, we could avoid these issues and produce models that simultaneously succeed at their task and do not memorize their training data. We might hope for small models, requiring fewer resources to operate. In a recent paper, we (RISCS researchers Gavin Brown, Mark Bun, and Adam Smith and Apple researchers Vitaly Feldman and Kunal Talwar) show that these goals are sometimes incompatible. There exist natural learning tasks on which every near-optimal learning algorithm must memorize a large fraction of its training data. Perhaps surprisingly, we show that much of the information memorized is actually irrelevant to the learning task. As an analogy, Harry Potter is irrelevant to the task of literacy; many people speak English without ever reading J.K. Rowling! Models succeeding at our tasks must memorize details that ultimately don’t matter. In our paper, this irrelevant information takes the form of noise or uninformative features.

For those interested in the technical details, our results are stated using information theory, which provides a mathematical way to discuss how one object (such as a machine learning model) informs us about another (such as a data set). In our setup, we work with a data set X consisting of n examples, each of which is a list of d bits. The data set contains approximately nd bits of information. The examples are all drawn from some probability distribution P. If you know P exactly, then you know how the data set is generated and thus you know everything about “the task.” Let the model M be the output of the learning algorithm. We prove that, for all learning algorithms with near-optimal accuracy on fresh examples, we have

I(M; X | P) ≳ nd

where the ≳ notation hides a constant factor and I denotes mutual information. Intuitively, this means that even someone who already knows the underlying distribution P learns a great deal about the data set X by observing the trained model M. Since we condition on P, the model must contain a lot of information that is specific to the data set and irrelevant to the task.
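To make the quantities concrete, here is a toy numpy sketch of the setup. It is a simplification for intuition only: the examples are taken to be uniformly random bit vectors rather than the distribution used in our proofs, and the “learner” simply copies its training set into the model, which is the extreme case where observing M reveals essentially all nd bits of X.

```python
import numpy as np

# Toy illustration of the setup: a data set X of n examples,
# each a vector of d bits. For uniformly random bits the data set
# carries about n*d bits of information (a simplification, not the
# distribution used in the paper's proofs).
n, d = 100, 50
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(n, d))   # n examples, d bits each

# A learner that memorizes everything: its "model" M is a verbatim
# copy of the training set, so observing M reveals all ~n*d bits,
# i.e. I(M; X | P) is as large as it can possibly be.
M = X.copy()

print(f"Data set carries about {n * d} bits; "
      f"a verbatim copy reveals all of them.")
```

Our theorem says that any near-optimal learner, not just this extreme one, must reveal information on that same nd scale.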

Why Does Memorization Occur?

We demonstrate memorization in two problems that abstract common learning paradigms. The first is a sequence prediction task, like the language- and code-generation tasks discussed above. The second, designed to capture a variety of common problems including image labeling, is a high-dimensional multilabel prediction problem. Both of these tasks are simple but contain important elements of real learning problems. One crucial aspect is that the learning algorithm will commonly be unable to distinguish relevant information from irrelevant.

As often occurs in practice, our data sets arise from a mixture of subpopulations, each with its own particular rules. In our tasks, when a learning algorithm receives a large number of samples from a single subpopulation, it can learn those rules and discard irrelevant details. But when it receives just one example, it is forced to memorize in order to predict well on future examples of the same type.
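The toy sketch below illustrates this effect. The noise model and numbers are made up for illustration and are not the exact distributions from the paper: each subpopulation has a d-bit “signature” that future examples will share, samples are noisy copies of it, and a simple majority vote recovers the signature only when the subpopulation is well represented. With a single sample, the best the learner can do is keep that sample essentially verbatim.

```python
import numpy as np

# Hypothetical toy mixture of subpopulations (illustrative, not the
# paper's construction). Each subpopulation k has a d-bit signature;
# samples from k are noisy copies of it.
rng = np.random.default_rng(1)
d, noise = 30, 0.05

def noisy_copy(signature):
    # Flip each bit independently with probability `noise`.
    flips = rng.random(d) < noise
    return np.where(flips, 1 - signature, signature)

# Well-represented subpopulation: 50 samples, so a majority vote
# recovers the signature and the noisy details can be discarded.
sig_big = rng.integers(0, 2, size=d)
samples_big = np.array([noisy_copy(sig_big) for _ in range(50)])
estimate_big = (samples_big.mean(axis=0) > 0.5).astype(int)

# Singleton subpopulation: one noisy sample is all the learner sees,
# so its best guess for future examples is to memorize that sample.
sig_small = rng.integers(0, 2, size=d)
only_sample = noisy_copy(sig_small)
estimate_small = only_sample

print("bit errors with 50 samples:", int((estimate_big != sig_big).sum()))
print("bit errors with 1 sample  :", int((estimate_small != sig_small).sum()))
```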

Implications

Our lower bounds on mutual information imply a lower bound on model size: a model that can be written down in B bits can reveal at most B bits of information about the training data, so the models must be huge (on the order of the size of the data set) in order to contain that much information. For some tasks, then, approaches that seek to compress the data or reduce the size of the model cannot succeed.

The successful framework of differential privacy provides a theoretical means to address privacy loss in data analysis. Because such algorithms limit what their output can reveal about any individual example, algorithms satisfying reasonable levels of differential privacy cannot achieve high accuracy on our tasks. This type of impossibility result, by itself, is not new. However, our subpopulation structure suggests a new way in which privacy might be hard: the areas where data is scarcest are often exactly where we most value user privacy. Alas, these areas may be exactly where algorithms stand to gain the most from memorization.

Our formal statements do not immediately imply that an attacker can reconstruct the training data from the model. To supplement our theorems, we experiment on synthetic data. We generate data sets from a distribution used in our proofs, train neural networks to high accuracy, and attempt to extract training data. We find that simple attacks can reconstruct many training samples to high accuracy, showing that efficient reconstruction from some common models is possible.
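The sketch below is a simplified stand-in for that experiment; the architecture and data layout are illustrative assumptions, not the exact construction from the paper. Each singleton subpopulation is encoded as a one-hot input carrying a random d-bit label vector, a small network is fit to high training accuracy, and “reconstruction” amounts to querying the network with each identifier and reading the bits back off.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Toy stand-in for the reconstruction experiment (assumed setup):
# each subpopulation is identified by a one-hot input and appears
# exactly once in training, carrying a random d-bit label vector.
rng = np.random.default_rng(2)
k, d = 40, 20                              # subpopulations, bits per label
ids = np.eye(k)                            # one-hot subpopulation identifiers
labels = rng.integers(0, 2, size=(k, d))   # each subpopulation's d-bit labels

# Fit a small multilabel network to high training accuracy.
net = MLPClassifier(hidden_layer_sizes=(128,), max_iter=3000, random_state=0)
net.fit(ids, labels)

# "Reconstruction attack": query the model on each identifier and compare
# the predicted bits to the training labels it saw exactly once.
recovered = net.predict(ids)
print("fraction of training bits recovered:", (recovered == labels).mean())
```

When the fit is tight, nearly all of the once-seen bits come back out of the model, which is the qualitative behavior our experiments on the paper's distributions exhibit.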

Gigantic models dominate the current cutting edge of machine learning. Recent empirical work shows that these models both remember and output large amounts of training data verbatim. Our research shows that this behavior, of relevance to both data privacy and the foundations of machine learning, may in fact be necessary for high performance.

For further technical reading, please see our paper.

Gavin Brown is a fifth-year PhD student in the Boston University Department of Computer Science, advised by Adam Smith. He is originally from Ohio and received a BS in Mathematics from Case Western Reserve University in 2015.  His research focuses on the theory of data privacy and machine learning. He wants to understand when and why machine learning models memorize large parts of their training data. He also works on differential privacy, and is interested in both creating algorithms for fundamental tasks and proving lower bounds.


“Machine Learning & Artificial Intelligence” by mikemacmarketing is licensed under CC BY 2.0
