teradimich — LessWrong

We won’t get docile, brilliant AIs before we solve alignment

But is it appropriate to be ~98% sure that ASI will be achieved in the coming years?
If not, then it seems reasonable to allow more uncertainty.
To show that the forecasts are well calibrated, it would be worthwhile to make more verifiable statements. I have often seen claims that Yudkowsky has perfectly calibrated probabilities, but judging by his other public forecasts and his page on Manifold, that does not seem to be the case.

Vladimir_Nesov's Shortform

teradimich3mo20

What do you think about GPT-5? Is it a GPT-4.5-scale model, but with a lot of RLVR training?

Permanent Disempowerment is the Baseline

teradimich3mo10

keeps the future of humanity in a good shape (as well as making it harmless)

Is this the result you expect by default? Or is this just one of many unlikely scenarios (like Hanson's 'The Age of Em') that are worth considering?

America’s AI Action Plan Is Pretty Good

teradimich3mo21

I am sitting here crying as the last remaining bits of diplomatic goodwill and hope for internationally coordinated treaties on coordinating the AI takeoff evaporates.

We can still hope that we won't get AGI in the next couple of years. Society's attitude towards AI is already negative, and we're even seeing some congressmen openly discuss the existential risks. This growing awareness might just lead to meaningful policy changes in the future.

Zach Stein-Perlman's Shortform

teradimich4mo10

plausibly about 3e26 FLOPs

Or 6e26 (in FP8 FLOPs).

And already on February 17th, Colossus had 150k+ GPUs. It seems that in the April message they were talking about 200k GPUs. Judging by Musk's interview, this could mean 150,000 H100s and 50,000 H200s. Perhaps the time and the number of GPUs were enough to train a GPT-5-scale model?
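
A rough way to sanity-check figures like these is the usual back-of-the-envelope estimate: GPUs × peak throughput × utilization × time. The sketch below is only an illustration. The utilization (40%) and run length (90 days) are assumptions that do not come from this thread, and the peak numbers are the approximate dense H100 SXM figures from NVIDIA's public specs (about 989 TFLOP/s in BF16 and about 1,979 TFLOP/s in FP8). The FP8 peak being roughly double the BF16 peak is what makes the 3e26 and 6e26 figures two views of the same run.

```python
# Back-of-the-envelope training compute estimate.
# All inputs below are illustrative assumptions, not reported facts.

H100_BF16_DENSE = 989e12   # approx. peak dense BF16 FLOP/s per H100 SXM
H100_FP8_DENSE = 1979e12   # approx. peak dense FP8 FLOP/s per H100 SXM


def training_flops(num_gpus, peak_flops_per_gpu, utilization, days):
    """Total FLOPs = GPUs x peak FLOP/s x utilization x seconds of training."""
    seconds = days * 24 * 60 * 60
    return num_gpus * peak_flops_per_gpu * utilization * seconds


if __name__ == "__main__":
    # Hypothetical run: 150k H100s, 40% utilization, 90 days.
    bf16 = training_flops(150_000, H100_BF16_DENSE, 0.4, 90)
    fp8 = training_flops(150_000, H100_FP8_DENSE, 0.4, 90)
    print(f"BF16-equivalent: {bf16:.2e} FLOPs")
    print(f"FP8-equivalent:  {fp8:.2e} FLOPs")
```

Under these assumed inputs the BF16 estimate comes out at roughly 4.6e26 FLOPs, so a total in the 3e26 range corresponds to a somewhat shorter run or lower utilization on the same hardware.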

A Slow Guide to Confronting Doom

teradimich7mo112

I sympathize with this line of thinking, but I've never understood something like P(doom)>0.8.

The analogies with cancer or poison seem a bit odd, because we're trying to estimate the probability of an event that has never happened before, without being able to rely on anything like physical laws and without anything close to a consensus. Even among the people who proposed the key ideas of the AI risk discussion, not all were confident pessimists.

We have too many unknowns. We don't know when superintelligence will appear. We can't predict how governments and corporations will treat AI in the coming years. We don't know what will happen if someone tries to use a sufficiently advanced AI for automated safety research. Or narrow AI might change the situation in the world before superintelligence appears. Our civilization could collapse for any number of reasons.
And I don't think we can say for sure what superintelligence will do to humans.

OpenAI: Detecting misbehavior in frontier reasoning models

teradimich8mo30

Earlier, you wrote about a change to your AGI timelines.
What about p(doom)? It seems that in recent months there have been reasons for both optimism and pessimism.

Towards_Keeperhood's Shortform

teradimich8mo10

It seems a little surprising to me how rarely confident pessimists (p(doom)>0.9) argue with moderate optimists (p(doom)≤0.5).
I'm not talking specifically about this post. But it would be interesting if people aired their disagreements more often.

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

teradimich8mo40

Thanks for the reply. I remembered a recent article by Evans and thought that reasoning models might show different behavior. Sorry if this sounds silly.

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

teradimich8mo30

Are you planning to test this on reasoning models?
