A ToM-Inspired Agenda for AI Safety Research
TLDR: Theory of Mind (ToM) is the cognitive ability to attribute mental states to oneself and others, allowing individuals (or AIs) to understand that others have perspectives different from their own. Understanding ToM in models can help us mitigate three high-stakes problems posed by transformative AI: 1. Our agentic ecosystems are...
Mar 247