x

Angie Normandale

Subscribe

Message

8

2

2y

Angie Normandale

Subscribe

Message

8

2

2y

Why modelling multi-objective homeostasis is essential for AI alignment (and how it helps with AI safety as well). Subtleties and open challenges.

Angie Normandale2y50

I've been exploring this for the last year, I think it's a promising avenue for solving some key alignment issues. Homeostatic approaches are well documented in neuroscience but a surprisingly neglected approach to alignment.

Research in support:

Managing competing drives, homeostatic approaches ensure safety wins out
-Mathematical formalisations by Laurençon et al, Kermati and Gutkin
-Robotics researchers successfully implemented a homeostatic approach to help a system manage competing drives
-Friston suggests it's a way to manag... (read more)

Reply

Inducing Unprompted Misalignment in LLMs

Angie Normandale2y51

Great paper! Important findings.

What’s your intuition re ways to detect and control such behaviour?

An interesting extension would be training a model on a large dataset which includes low level but consistent elements of primed data. Do the harmful behaviours persist and generalise? If yes, could be used to exploit existing ‘aligned’ models which update on publicly modifiable datasets.

Reply