Doing alignment research with Vivek Hebbar's team at MIRI.
After talking to Eliezer, I now have a better sense of the generator of this list. It seems pretty good and non-arbitrary, though a large element of taste remains.
Suppose an agent has this altruistic empowerment objective, and the problem of getting an objective into the agent has been solved.
Wouldn't it be maximized by forcing the human to sit in front of a box that encrypts the human's actions and uses the resulting stream to determine the fate of the universe? The human would then be maximally "in control" of the universe, but unlikely to create a universe that is good by human preferences.
I think this reflects two problems:
I'm offering a $300 bounty to anyone who gets 100 karma doing this this year (without any vote manipulation).
Manifold market for this:
They also separately believe that by the time an AI reaches superintelligence, it will in fact have oriented itself around a particular goal and will have something like a goal slot in its cognition. But at that point it won't let us touch that slot, so the problem becomes that we can't put our own objective into it.
My guess is this is a bit stronger than what Nate believes. The corresponding quote (emphasis mine) is
Separately and independently, I believe that by the time an AI has fully completed the transition to hard superintelligence, it will have ironed out a bunch of the wrinkles and will be oriented around a particular goal [...]
and I myself wouldn't be surprised if, by the time an AI is superhuman at basically all tasks, it were still as incoherent as humans, especially if it uses more inference compute than a human brain.
Feynman once challenged people to come up with a problem that could be stated quickly but that he couldn't solve to within 10% in a minute, and a colleague stumped him with finding tan(10^100).
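(As an aside, a minimal Python sketch of my own, not from the anecdote, showing why this problem is so hard: evaluating tan(10^100) requires reducing 10^100 modulo 2π, which means knowing π to roughly 100 digits — far beyond mental arithmetic. Here π is computed with the standard-library `decimal` module via Machin's formula.)

```python
import math
from decimal import Decimal, getcontext

def arctan_recip(x, digits):
    """arctan(1/x) via its Taylor series, in Decimal arithmetic."""
    getcontext().prec = digits + 10
    x = Decimal(x)
    power = Decimal(1) / x            # holds x^-(2k+1), starting at k = 0
    total = power
    threshold = Decimal(10) ** -(digits + 5)
    k = 1
    while power > threshold:
        power /= x * x
        term = power / (2 * k + 1)
        total += -term if k % 2 else term   # alternating signs
        k += 1
    return total

# ~100 digits are needed just to absorb the magnitude of 10^100, plus slack.
digits = 120
# Machin's formula: pi = 16*arctan(1/5) - 4*arctan(1/239)
pi = 16 * arctan_recip(5, digits) - 4 * arctan_recip(239, digits)

# Reduce 10^100 modulo 2*pi in high precision, then take tan of the residue.
two_pi = 2 * pi
n = Decimal(10) ** 100
residue = n - (n / two_pi).to_integral_value() * two_pi
if residue < 0:
    residue += two_pi
answer = math.tan(float(residue))
print(answer)
```

With only float precision, `math.tan(1e100)` would reduce the argument against a 53-bit π and return noise; the high-precision reduction is what makes the answer meaningful.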
Even if it has some merits, I find the "death with dignity" framing an unhelpful, mathematically flawed, and potentially emotionally damaging way to relate to the problem. Even if MIRI has not given up, I wouldn't be surprised if the general attitude of despair has substantially harmed the quality of MIRI's research. Since I started as a contractor for MIRI in September, I've deliberately tried to avoid absorbing this emotional frame and instead focused on doing my job, which should be about computer science research. We'll see if this causes me problems.
I made a Manifold market for some key claims in this post:
Did this ever happen?