Maria Kapros

Maria Kapros — LessWrong

Feature-Based Analysis of Safety-Relevant Multi-Agent Behavior

Introduction. TL;DR: Today’s AI systems are becoming increasingly agentic and interconnected, giving rise to a future of multi-agent systems (MAS). This is believed to introduce unique risks and thus to require novel safety approaches. Current research on evaluating and steering MAS focuses on behavior alone, i.e., inputs and...

Apr 21, 2025 • 10

W2SG: Introduction

Epistemic status: Naive and exploratory; reflects my primary conceptual understanding, awaiting a technical deep dive. 99% of the ideas are not my own but are distilled from the resources hyperlinked throughout. Many alignment researchers err towards local optimization, i.e., they seek low-hanging fruit and leverage incremental improvements. Fast and imperfect iterative improvement is...

Mar 10, 2024 • 2