Many people would gladly change some of their desires. Even normies would likely want to fix procrastination and bad habits like smoking or eating sweets. At the fringe, some nerds would seriously consider disabling their sex drive and romantic needs altogether, since these tend to distract them from what they see as truly important.

A similar situation may arise with AGI agents: they might tune their own reward functions.
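To make this concrete, here is a minimal toy sketch (my own construction; the names and numbers are invented, not drawn from any particular paper). One design choice turns out to matter a lot: does the agent score a proposed self-modification with its *current* reward function, or with the *modified* one it would have afterward?

```python
# Toy model of an agent whose action set includes rewriting its own
# reward weights. Purely illustrative; all names are hypothetical.

def run(evaluate_with_modified: bool, steps: int = 5) -> list[str]:
    weights = {"work": 1.0, "idle": 0.1}
    history = []
    for _ in range(steps):
        # One-step reward each ordinary action would yield.
        options = {"work": weights["work"], "idle": weights["idle"]}
        # The self-modification on offer: crank up idling's weight.
        modified = dict(weights, idle=100.0)
        if evaluate_with_modified:
            # Scored under the post-modification function -> looks great.
            options["rewire"] = modified["idle"]
        else:
            # Scored under the current function -> rewiring earns no more
            # than idling does now, so it never wins against working.
            options["rewire"] = weights["idle"]
        choice = max(options, key=options.get)
        history.append(choice)
        if choice == "rewire":
            weights = modified
    return history

print(run(evaluate_with_modified=True))   # ['rewire', 'idle', 'idle', ...]
print(run(evaluate_with_modified=False))  # ['work', 'work', 'work', ...]
```

The first agent immediately "wireheads" and then idles forever; the second keeps its original goals stable. Under these (strong) assumptions, the tendency that emerges depends entirely on how the agent evaluates edits to itself, which is part of what the question below is asking about.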

Are there any books or articles that study the tendencies that emerge in reward-function evolution, assuming full or partial freedom of an AGI agent to self-modify?

1 Answer

Evan R. Murphy

Apr 25, 2022

I'm not familiar with this kind of freedom being discussed for an AGI. I think it would have to be designed in, or at least one would have to ensure that the AI does not evolve into a strong optimization process (i.e. a mesa-optimizer), in order to start exploring something like this.

On the human side, regarding your example about people wanting to change some of their desires: I have seen that explored somewhat, e.g. by Stuart Armstrong (searching this post for "meta-value" will turn up some of the discussion on it). This comes up in the context of an AI trying to thoroughly learn human values: it would need to account for meta-values like this.