https://www.linkedin.com/in/tymekm/
Risks from Learned Optimization: Introduction
astrobiscuit · 6y

To me it seems that the mesa-optimizer in the toy example is:

  • aligned with its goal - it reaches the door (which it incorrectly identifies)
  • dysfunctional - it incorrectly identifies doors.

From a consequentialist perspective this may be irrelevant, but from a safety point of view the distinction is an important one.

In the context of this article, I believe that misalignment (pseudo-alignment) would occur when the goal of the mesa-optimizer diverges from its original goal (changes completely, is extended, etc.).
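
To make the distinction concrete, here is a minimal sketch of my own (not from the article). All names are hypothetical, and I am assuming a simplified version of the toy example in which doors happen to be red during training. One agent wants doors but perceives them badly; the other perceives accurately but wants red things.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Thing:
    name: str
    is_door: bool
    is_red: bool


# A perception function maps a thing to what the agent *believes* about it.
Perception = Callable[[Thing], Dict[str, bool]]


def accurate(t: Thing) -> Dict[str, bool]:
    return {"door": t.is_door, "red": t.is_red}


def faulty(t: Thing) -> Dict[str, bool]:
    # Assumption: the broken detector treats "red" as evidence of "door".
    return {"door": t.is_red, "red": t.is_red}


def make_agent(goal: str, perceive: Perception) -> Callable[[List[Thing]], Optional[str]]:
    """Return an agent that walks to the first thing it believes satisfies its goal."""
    def act(things: List[Thing]) -> Optional[str]:
        for t in things:
            if perceive(t)[goal]:
                return t.name
        return None
    return act


# Training: every door is red. Test: colour and door-ness come apart.
training_room = [Thing("wall", False, False), Thing("red door", True, True)]
test_room = [Thing("red crate", False, True), Thing("blue door", True, False)]

dysfunctional = make_agent("door", faulty)   # right goal, wrong perception
diverged = make_agent("red", accurate)       # wrong goal, right perception
repaired = make_agent("door", accurate)      # right goal, perception fixed

for name, agent in [("dysfunctional", dysfunctional),
                    ("diverged", diverged),
                    ("repaired", repaired)]:
    print(name, agent(training_room), agent(test_room))
```

The first two agents behave identically in both rooms, which is the consequentialist point. The difference only shows up under intervention: fixing the first agent's perception sends it to the door, whereas no improvement in perception changes where the second agent goes.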

(As a secondary point that I haven't thought much about: it seems problematic to discuss alignment unless the mesa-optimizer's goal liberally contains the base goal, e.g. "find doors in order to achieve O_base".)
