These three things are all instances of being orders of magnitude (OOM) better at something specific. If you consider the LLM roughly human-level at the thing it does, this suggests that it's doing it in a way which is very different from what a human does.

That said, I'm not confident about this; I can sense there could be an argument that this counts as human but ramped up on some stats, and not an alien shoggoth.


More people getting into AI safety should do a PhD

rotatingpaguro2mo31

If I had to give only one line of advice to a randomly sampled prospective grad student: you don't actually have to do what the professor says.


Richard Ngo's Shortform

rotatingpaguro2mo30

Ok. Then I'll say that randomly assigned utilities over full trajectories are beyond wild!

The basin of attraction just needs to be large enough. AIs will intentionally be created with more structure than that.


Richard Ngo's Shortform

rotatingpaguro2mo10

I read the section you linked, but I can't follow it. Anyway, here is its concluding paragraph:

Conclusion: Optimal policies for u-AOH will tend to look like random twitching. For example, if you generate a u-AOH by uniformly randomly assigning each AOH utility from the unit interval [0,1], there's no predictable regularity to the optimal actions for this utility function. In this setting and under our assumptions, there is no instrumental convergence without further structural assumptions.

From this alone, I get the impression that he hasn't proved that "there isn't instrumental convergence", but that "there isn't a totally general instrumental convergence that applies even to very wild utility functions".
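To make the quoted claim concrete for myself, here is a toy sketch. It assumes a deterministic environment, so that an action-observation history is fully determined by the action sequence; that simplification, the function names, and the parameter values are mine, not from the linked section. Each full trajectory gets an independent uniform utility, and we record which first action the optimal trajectory starts with.

```python
import itertools
import random
from collections import Counter


def optimal_first_action(num_actions, horizon, rng):
    """Draw a fresh random utility for every full trajectory and
    return the first action of the highest-utility trajectory."""
    best_u, best_traj = -1.0, None
    for traj in itertools.product(range(num_actions), repeat=horizon):
        u = rng.random()  # utility drawn uniformly from [0, 1)
        if u > best_u:
            best_u, best_traj = u, traj
    return best_traj[0]


def first_action_counts(num_actions=3, horizon=4, samples=3000, seed=0):
    """Count how often each first action is optimal across many
    independently sampled random utility functions."""
    rng = random.Random(seed)
    return Counter(
        optimal_first_action(num_actions, horizon, rng) for _ in range(samples)
    )


counts = first_action_counts()
print(counts)
```

In this toy, no first action is favored: each is optimal about a third of the time. That matches the "no regularity" reading, and it says nothing about utility functions with more structure.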


Shortform

rotatingpaguro2mo83

It's AI-based, so my guess is that it uses a lot of somewhat superficial correlates that could be gamed. I expect that if it went mainstream it would be Goodharted.

I expect Goodhart would hit particularly hard in the kind of usage I guess you are implying, which is searching for a few very well selected people. A selective search is a strong optimization, and so it Goodharts more.

A more concrete example I have in mind, which may already apply to the technology as it is right now: there are people who are good at lying to themselves.
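The point above about selective search can be sketched with a small simulation. It covers only the mildest case, where the score is the true trait plus independent noise; deliberate gaming would presumably be worse. The Gaussian model, the function name `selection_gap`, and all parameter values are assumptions for illustration.

```python
import random
import statistics


def selection_gap(pool_size, top_k, noise_sd=1.0, trials=100, seed=0):
    """Average amount by which the proxy score overstates the true
    trait among the top_k candidates ranked by proxy score."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        people = []
        for _ in range(pool_size):
            true = rng.gauss(0, 1)                   # hidden true trait
            proxy = true + rng.gauss(0, noise_sd)    # noisy measured score
            people.append((proxy, true))
        people.sort(reverse=True)                    # rank by proxy score
        chosen = people[:top_k]
        gaps.append(statistics.mean(p - t for p, t in chosen))
    return statistics.mean(gaps)


loose = selection_gap(pool_size=1000, top_k=500)   # keep the top half
strict = selection_gap(pool_size=1000, top_k=5)    # keep only the top 0.5%
print(f"loose: {loose:.2f}, strict: {strict:.2f}")
```

Under these assumptions the strict search overstates the selected people's true trait by noticeably more than the loose one does, which is the sense in which stronger selection Goodharts more.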
