Leo Zovic presents a less-explored technique: optimizing models to maintain similar internal representations when they reason about themselves and when they reason about others.
The approach is scalable and reduces deceptive behavior in AI systems, and agents' self-other overlap values can be used to perfectly classify which ones are deceptive.
Event Schedule
6:00 to 6:45 - Networking and refreshments
6:45 to 8:00 - Main Presentation
8:00 to 9:00 - Breakout Discussions