AI Safety Thursdays: Understanding The Self-Other Overlap Approach

Juliana Eberschlag

The host has requested RSVPs for this event

Description

Leo Zovic presents self-other overlap, a less-explored technique that optimizes models to maintain similar internal representations when reasoning about themselves and about others.

This scalable approach not only reduces deceptive behavior in AI systems but can also perfectly classify deceptive agents based on their self-other overlap values.
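
To make the idea concrete, here is a minimal sketch of what "maintaining similar internal representations" could look like as a training penalty. It is an illustration under stated assumptions, not the method from the talk: the function names, the mean-squared-difference distance, the `overlap_weight` parameter, and the threshold rule are all hypothetical, and the activation vectors are assumed to come from the same layer on a self-referencing input and a matching other-referencing input.

```python
from typing import Sequence


def self_other_distance(self_acts: Sequence[float], other_acts: Sequence[float]) -> float:
    """Mean squared difference between two activation vectors.

    Lower values mean more self-other overlap. The choice of distance
    is an assumption made for this sketch.
    """
    if len(self_acts) != len(other_acts):
        raise ValueError("activation vectors must have the same length")
    if not self_acts:
        raise ValueError("activation vectors must be non-empty")
    total = sum((s - o) ** 2 for s, o in zip(self_acts, other_acts))
    return total / len(self_acts)


def combined_loss(
    task_loss: float,
    self_acts: Sequence[float],
    other_acts: Sequence[float],
    overlap_weight: float = 1.0,
) -> float:
    """Task loss plus a weighted penalty for low self-other overlap.

    `overlap_weight` is a hypothetical hyperparameter.
    """
    return task_loss + overlap_weight * self_other_distance(self_acts, other_acts)


def flag_low_overlap(distance: float, threshold: float) -> bool:
    """Flag an agent whose self-other distance exceeds a chosen threshold.

    A simple threshold rule, assumed here only to illustrate classifying
    agents from their overlap values.
    """
    return distance > threshold
```

In this sketch, minimizing `combined_loss` pushes the two activation vectors together while still fitting the task, and `flag_low_overlap` shows how a single overlap value could be turned into a classification signal.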

Event Schedule

6:00 to 6:45 - Networking and refreshments
6:45 to 8:00 - Main Presentation
8:00 to 9:00 - Breakout Discussions

Posted on: 15th May 2025
