Davide_Zagami — LessWrong

Vulnerabilities in CDT and TI-unaware agents

by PabloAMC, Davide_Zagami, and Chris_Leong

This post aims to illustrate why decision-making and incentive considerations must be taken into account when designing agents, and to show that these considerations are important for ensuring agent safety. We will also postulate that there exist some agents that are...

Mar 10, 2020•5

Am I understanding the problem of fully updated deference correctly?

I understand that one solution to AI alignment would be to build an agent with uncertainty about its utility function, so that by observing the environment and in particular us, it can learn our true utility function and optimize for that. And according to the problem of fully updated deference,...

Sep 30, 2018•3