ErickBall — LessWrong

LESSWRONG
is fundraising!
LW

Okay but the six year trend is not super indicative of the future trend given the way the AI industry has developed in the last couple years. It would be much more useful to have a good measure of the recent trend, and it's clear we don't really have that. The wide uncertainty bounds already should have made it clear how noisy this result was, but people have tended to ignore the range and put a lot of stock in the point estimate. The log linear trend may well be legit (probably is) but the use of just 14 tasks to fit the key portion of the results is absurd.

Stop Applying And Get To Work

ErickBall1mo20

I think this is true in most fields, and even in technical fields where you can learn a lot from reading papers, you learn a lot of very different (and more practical) skills by working with experienced people before you strike out on your own.

Gemini 3 is Evaluation-Paranoid and Contaminated

ErickBall1mo41

Doing well on SimpleQA could also be evidence of benchmark contamination in the training data.

The "impossible scenario of actual time travel" [forward in time] is pretty funny. Thanks for replicating the key findings from the post.

Dominance: The Standard Everyday Solution To Akrasia

ErickBall1mo73

This seems valid as far as it goes, but there are common situations where dominance/accountability as a motivation strategy starts to fall apart. I'm thinking mainly of some types of white-collar knowledge work (at a fairly high level) where it's not easy to tell how hard or effectively someone is working from the outside. Maybe somebody with more management experience can comment on this, but my impression is: As a manager, you want to avoid micromanaging, both because you don't have the time or expertise for it and because it pisses people off. I think it's much easier, and maybe more reliable, to try to hire people with "internal motivation" than to try to motivate them via dominance. Dominance as a motivator very easily turns into fear and is fundamentally adversarial. You don't want your employees to feel like they're working against you, right? Complex intellectual work benefits from a collaborative environment. So unless you can make the work super interesting, it mostly falls to the employees to manage their own akrasia.

Is 90% of code at Anthropic being written by AIs?

ErickBall1mo10

I think a more traditional software company would be a much better measuring stick than Anthropic. Then your metric could be something closer to "lines of production code committed" and that code would account for most of the meaningful coding work done by the company (versus an AI dev company where a lot of the effort goes into experiments, training, data analysis, and other non-production code). Though, of course, "90% of the code written by AI" still wouldn't mean that the AI did 90% of the work, since the humans would probably do the hardest parts and also supervise the AI and check its output.

Legible vs. Illegible AI Safety Problems

ErickBall1mo91

This seems to assume that legible/illegible is a fairly clear binary. If legibility is achieved more gradually, then for partially legible problems, working on solving them is probably a good way to help them get more legible.

If Anyone Builds It Everyone Dies, a semi-outsider review

ErickBall2mo10

It seems like the pressing circumstances are likely to be "some other AI could do this before I do" or even just "the next generation of AI will replace me soon so this is my last chance." Those are ways that a roughly human level AI might end up trying a longshot takeover attempt. Or maybe not, if the in between part turns out to be very brief. But even if we do get this kind of warning shot, it doesn't help us much. We might not notice it, and then we're back where we started. Even if it's obvious and almost succeeds, we don't have a good response to it. If we did, we could just do that in advance and not have to deal with the near-destruction of humanity.

Self-preservation or Instruction Ambiguity? Examining the Causes of Shutdown Resistance

ErickBall5mo30

The narrow instrumental convergence you see here doesn't (necessarily) reflect an innate self-preservation drive, but it still follows the same logic that we would expect to cause self-preservation if the model has any goal. Currently the only available way to give it a goal is to provide instructions. It would be interesting to see some tests where the conflict is with a drive created by fine-tuning. Based on the results here, it seems like shutdown resistance might then occur even without conflicting instructions.

Also, the original instructions with the shutdown warning really weren't very ambiguous. If someone told you to take a math quiz, and if someone comes in and tries to take your pen, let them take it, would you try to hide the pen? It makes sense that making the precedence order more explicit makes the model behavior more reliable, but it's still weird that it was resisting shutdown in the original test.

Consider chilling out in 2028

ErickBall6mo30

As you implied above, pessimism is driven only secondarily by timelines. If things in 2028 don't look much different than they do now, that's evidence for longer timelines (maybe a little longer, maybe a lot). But it's inherently not much evidence about how dangerous superintelligence will be when it does arrive. If the situation is basically the same, then our state of knowledge is basically the same.

So what would be good evidence that worrying about alignment was unnecessary? The obvious one is if we get superintelligence and nothing very bad happens, despite the alignment problem remaining unsolved. But that's like pulling the trigger to see if the gun is loaded. Prior to superintelligence, personally I'd be more optimistic if we saw AI progress requiring even more increasing compute than the current trend--if the first superintelligences were very reliant on massive pools of tightly integrated compute, and had very limited inference capacity, that would make us less vulnerable and give us more time to adapt to them. Also, if we saw a slowdown in algorithmic progress despite widespread deployment of increasingly capable coding software, that would be a very encouraging sign that recursive self-improvement might happen slowly.

The Intelligence Symbiosis Manifesto - Toward a Future of Living with AI

ErickBall6mo10

I hate to rain on your parade, but equal coexistence with artificial superintelligence would require solving the alignment problem, or the control problem. Otherwise it can and eventually will do something we consider catastrophic.

LESSWRONG
is fundraising!
LW

LESSWRONG
is fundraising!
LW

Posts

Wikitag Contributions

Comments

Posts

Wikitag Contributions

Comments