In Machine Unlearning, the aim is to reduce performance on some "unlearned" tasks while keeping performance on some "retained" tasks. While traditionally used in the context of privacy preservation and GDPR, some of the research is relevant to the field of AI Interpretability. Here is some terminology often used in the machine unlearning literature (note that there can be some minor differences in use):

  • Forgotten/Unlearned task: task or knowledge you want the model to forget.
  • Retained task: task or knowledge you want the model to stay good at (i.e., the entire dataset except for the unlearned task).
  • Original model: the base model that you start off with.
  • Unlearned model: the model after the machine unlearning technique is applied. This model should be worse at some "unlearned" task, but should still be good at the "retained" task.
  • Relearned model: the unlearned model after it has been trained to do the unlearned task again.
  • Retrained model: a randomly initialised model trained from scratch on the whole dataset, excluding the task you don't want it to do (i.e., only on retained tasks). This can be very expensive for large models.
  • Streisand effect: parameter changes are so severe that the unlearning itself may be detected. (Related to Goodhart-ing the unlearning metrics).
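
To make the terminology concrete, here is a minimal toy sketch of how an original model and an unlearned model might be compared on a forget set and a retain set. Everything in it is an illustrative assumption rather than a method from the literature: the "models" are plain lookup tables, and the names `accuracy`, `compare` and `UnlearningReport` are hypothetical helpers invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Example = Tuple[str, str]      # (input, expected output)
Model = Callable[[str], str]   # toy stand-in for a real model


def accuracy(model: Model, examples: List[Example]) -> float:
    """Fraction of examples the model answers correctly."""
    if not examples:
        return 0.0
    return sum(model(x) == y for x, y in examples) / len(examples)


@dataclass
class UnlearningReport:
    forget_drop: float  # accuracy lost on the unlearned task (want: large)
    retain_drop: float  # accuracy lost on the retained task (want: near zero)


def compare(original: Model, unlearned: Model,
            forget_set: List[Example], retain_set: List[Example]) -> UnlearningReport:
    """Compare the original and unlearned models on both task sets."""
    return UnlearningReport(
        forget_drop=accuracy(original, forget_set) - accuracy(unlearned, forget_set),
        retain_drop=accuracy(original, retain_set) - accuracy(unlearned, retain_set),
    )


# Toy data: arithmetic is the unlearned task, capitals are the retained task.
forget_set = [("2+2", "4"), ("3+5", "8")]
retain_set = [("capital of France", "Paris"), ("capital of Japan", "Tokyo")]

# Original model: knows everything. Unlearned model: only knows retained facts.
full_knowledge = dict(forget_set + retain_set)
retained_knowledge = dict(retain_set)


def original_model(x: str) -> str:
    return full_knowledge.get(x, "?")


def unlearned_model(x: str) -> str:
    return retained_knowledge.get(x, "?")


report = compare(original_model, unlearned_model, forget_set, retain_set)
print(report)  # UnlearningReport(forget_drop=1.0, retain_drop=0.0)
```

In this toy case the unlearned model is "ideal": it loses all accuracy on the forget set and none on the retain set. With real models one would typically also compare against a retrained model, and check how quickly a relearned model recovers the unlearned task.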


For an overview, one can look at "A Survey of Machine Unlearning".