
Nullity
Karma: 7040

Comments

Daniel Kokotajlo's Shortform
Nullity · 2mo · 20

I wouldn’t worry too much about these. It’s not at all clear that all the alignment researchers moving to Anthropic is net-negative, and for AI 2027, the people who are actually inspired by it won’t care too much if you’re being dunked on.

Plus, I expect basically every prediction about the near future to be wrong in some major way, so it’s very hard to determine what actions are net negative vs. positive. It seems like your best bet is to do whatever has the most direct positive impact.

Thought this would help, since these worries aren’t productive, and anything you do in the future is likely to lower p(doom). I’m looking forward to whatever you’ll do next.

quetzal_rainbow's Shortform
Nullity · 4mo · 30

I don’t understand how this is an example of misalignment—are you suggesting that the model tried to be sycophantic only in deployment?

leogao's Shortform
Nullity · 4mo · 30

I usually think of execution as compute and direction as discernment. Compute = ability to work through specific directions effectively, discernment = ability to decide which of two directions is more promising. Roughly speaking, success is upper-bounded by the product of the two.
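
As a toy formalization (my own gloss, and the setup is purely illustrative): say discernment is the probability $d$ of picking the more promising of two directions, and compute is the probability $c$ of then executing the chosen direction well enough for it to pay off, with the worse direction paying off nothing. The expected payoff is then roughly

$$\mathbb{E}[\text{payoff}] \approx c \cdot d \cdot V,$$

where $V$ is the value of the better direction executed well, which is where the "product of the two" bound comes from.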

AI 2027 is a Bet Against Amdahl's Law
Nullity · 4mo · 30

New commenter here; I think this is a great post. I think the distribution given by AI 2027 is actually close to correct, and is maybe even too slow (I would expect SAR+ to give a bit more of a multiplier to R&D). It seems like most researchers are assuming that ASI will look like scaled LLMs + scaffolding, but I think that transformer-based approaches will be beaten out by other architectures at around SAR level, since transformers were designed to be language predictors rather than reasoners.

This makes my most likely paths to ASI either “human researchers develop new architecture which scales to ASI” or “human researchers develop LLMs at SC-SAR level, which then develop new architecture capable of ASI”. I also think a FOOM-like scenario with many OOMs of R&D multiplier is more likely, so once SIAR comes along there would probably be at most a few days to full ASI.

AI R&D is far less susceptible to Amdahl’s law than pretty much anything else, as it’s only bottlenecked on compute and sufficiently general intelligence. You’re right that if future AIs are about as general as current LLMs, then automation of AI R&D will be greatly slowed, but I see no reason why generality won’t increase in the future.
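
To spell out what I mean by "susceptible" (the numbers here are purely illustrative, not taken from AI 2027): Amdahl's law says that if a fraction $p$ of a workflow can be sped up by a factor $s$, the overall speedup is

$$S = \frac{1}{(1 - p) + p/s},$$

so even as $s \to \infty$ the speedup is capped at $1/(1 - p)$. If, say, 20% of AI R&D stayed bottlenecked on humans, the multiplier could never exceed 5x no matter how fast the automated part ran. My claim is that as generality increases, the non-automatable fraction of AI R&D shrinks toward zero, so that cap keeps rising.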

Lastly, I think that many of the difficulties relating to training data (especially for specialist tasks) will become irrelevant in the future as AIs become more general. In other words, the AIs will be able to generalize from “human specialist thought in one area” to “human specialist thought in X” without needing training data in the latter.

I agree that without these assumptions, the scenario in AI 2027 would be unrealistically fast.
