LESSWRONG

sanxiyn
911141565

Posts

Wikitag Contributions

Comments

ryan_greenblatt's Shortform
sanxiyn · 1mo · 30

I think the IMO results were driven by general-purpose advances, but I agree I can't conclusively prove it because we don't know the details. Hopefully we will learn more as time goes by.

An informal argument: I think agentic software engineering is currently blocked on context rot, among other things. I expect IMO systems to have improved on this, since the IMO time control is 1.5 hours per problem.

ryan_greenblatt's Shortform
sanxiyn · 1mo · 94

I think non-formal IMO gold was unexpected, and we heard explicitly that it won't be in GPT-5. So I would wait to see how it pans out. It may not matter in 2025, but I think it could in 2026.

GDM also claims IMO gold medal
sanxiyn · 1mo · 70

I think it is important to note "Gemini 2.5 Pro Capable of Winning Gold at IMO 2025": Gemini 2.5 Pro can do so with good enough scaffolding and prompt engineering.

Grok 4 Various Things
sanxiyn · 2mo · 30

Do you know any Solomonoff inductors? I don't, and I would like an introduction.

OpenAI Model Differentiation 101
sanxiyn · 2mo · 10

Ethan Mollick's "Using AI Right Now: A Quick Guide" from 2025-06 is in the same genre and says pretty much the same thing, but the presentation is a bit different and may suit you better, so check it out. Naturally it doesn't discuss Grok 4, but it does discuss some things missing here.

The next wave of model improvements will be due to data quality
sanxiyn · 2mo · 50

Anthropic does have a data program, although it is only for Claude Code, and it is opt-in. See "About the Development Partner Program". It gives you a 30% discount in exchange.

Serving LLM on Huawei CloudMatrix
sanxiyn · 2mo · 20

CloudMatrix was not, but Huawei Ascend has been there for a long time, and was used to train LLMs even back in 2022. I didn't realize AI 2027 predated CloudMatrix, but I still think ignoring China for Compute Production was unjustified.

Serving LLM on Huawei CloudMatrix
sanxiyn · 3mo · 10

This is a good argument and I think it is mostly true, but this absolutely should be on the AI 2027 Compute Forecast page. Simply not saying a word about the topic makes it look unserious and incompetent. In fact, that reaction came up repeatedly in discussions with my friends in South Korea.

AI companies' eval reports mostly don't support their claims
sanxiyn · 3mo · 140

I know the cyber eval results reflect underelicitation. Sonnet 4 can find zero-day vulnerabilities, which we are now in the process of disclosing. If you can't get it to do that, it's a skill issue on your end.

Eliezer and I wrote a book: If Anyone Builds It, Everyone Dies
sanxiyn · 3mo · 50

Preordered the ebook version on Amazon. I am also interested in doing a Korean translation.

DeepMind · 3y · (+55/-43)
AlphaStar · 3y
DeepMind · 3y · (+20)
AlphaTensor · 3y
24 · Serving LLM on Huawei CloudMatrix · 3mo · 7
10 · OpenAI lied about SFT vs. RLHF · 7mo · 2
-5 · Next automated reasoning grand challenge: CompCert · 1y · 0
9 · National Telecommunications and Information Administration: AI Accountability Policy Request for Comment · 2y · 0
7 · Cyberspace Administration of China: Draft of "Regulation for Generative Artificial Intelligence Services" is open for comments · 2y · 2
5 · Large language models aren't trained enough · 2y · 4
26 · Alpaca: A Strong Open-Source Instruction-Following Model · 2y · 2
31 · Adversarial Policies Beat Professional-Level Go AIs · 3y · 35
15 · DeepMind on Stratego, an imperfect information game · 3y · 9
3 · Russia will do a nuclear test · 3y · 7