TL;DR
This post investigates how the sampling method used at inference time can lead a supervised learning model to exhibit strategic behavior, even when such behavior is rare in the training data. Using a toy example, we show that a model trained purely to predict sequences may initially pick lower-probability options in order to make its later predictions easier. The takeaway is that how we use AI models, including seemingly minor choices like the sampling strategy, can significantly shape their behavior.
Introduction
Guiding Question: Under what circumstances can an AI model, trained only with supervised learning to predict future events, learn to exhibit strategic behavior?
In machine learning, particularly with models like GPT-style transformers, the sampling method used during inference can profoundly impact...
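To make concrete what "sampling method" means here, the following is a minimal sketch (not code from this post; the toy logits and helper names are hypothetical) contrasting greedy decoding, which always takes the highest-probability token, with temperature sampling, which sometimes selects lower-probability tokens:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Convert raw logits to a probability distribution at a given temperature."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()  # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Hypothetical next-token logits over a 3-token vocabulary.
logits = [2.0, 1.5, 0.2]
vocab = ["A", "B", "C"]

# Greedy decoding: deterministically take the argmax token.
greedy_token = vocab[int(np.argmax(logits))]

# Temperature sampling: draw from the softmax distribution,
# so lower-probability tokens are occasionally chosen.
rng = np.random.default_rng(seed=0)
probs = softmax(logits, temperature=1.0)
sampled_token = vocab[rng.choice(len(vocab), p=probs)]

print("greedy:", greedy_token)
print("sampled:", sampled_token)
print("probs:", probs.round(3))
```

Because the model's sampled outputs are fed back as context for its own future predictions, which decoding rule is used determines which sequences the model ends up conditioning on, and that is the lever the rest of the post examines.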