corrigible-commenter
corrigible-commenter has not written any posts yet.

I think decision making can have an impact on values, but that this depends on the design of the agent. In my comment, by values, I had in mind something like “the thing that the agent is maximizing”. We can imagine an agent like the paperclip maximizer for which the “decision making” ability of the agent doesn’t change the agent’s values. Is this agent in an “epistemic pit”? I think the agent is in a “pit” from our perspective, but it’s not clear that the pit is epistemic. One could model the paperclip maximizer as an agent whose epistemology is fine but that simply values different things than we do. In the...
I think the thought experiment that you propose is interesting, but it doesn't isolate the different factors that may contribute to people's intuitions. For example, it doesn't distinguish between worries about making individual people powerful because of their values (e.g. they are selfish or sociopathic) vs. worries due to their decision-making processes. I think this is important because it seems likely that "amplifying" someone won't fix value-based issues, but possibly will fix decision-making issues. If I had to propose a candidate crux, it would probably be more along the lines of how much of alignment can be solved through using a learning algorithm to help learn solutions vs. how much of the problem needs to be solved "by hand" and understood on a deep level rather than learned. Along those lines, I found the postscript to Paul Christiano's article on corrigibility interesting.
Thanks a lot for this post. I think “specification gaming” is an essential problem to solve but also possibly one of the hardest to make progress on, so it’s great to see resources aimed at helping people do that. I was unaware of CycleGAN, but it sounds very interesting and I plan to look into it. Thanks for posting something that put me onto an interesting observation!