faul_sname — LessWrong

Why is OpenAI releasing products like Sora and Atlas?

also "crowdsource the search for bugs in the implicit physics engine of the video generator"

Epistemic status: brain noise: "the AGI" -> "a country of geniuses in a datacenter" -> "a galaxy-scale civilization of midwits in the internet".

faul_sname's Shortform

faul_sname6d126

Does significant RL just make model reasoning chains weird, or is there some other reason Anthropic has quietly stopped showing raw thinking outputs?

Back when extended thinking for Claude Sonnet 3.7 was released, Anthropic showed the full reasoning chain.

As well as giving Claude the ability to think for longer and thus answer tougher questions, we’ve decided to make its thought process visible in raw form.

Then with Claude 4 they introduced reasoning summaries, but said

Finally, we've introduced thinking summaries for Claude 4 models that use a smaller model to condense lengthy thought processes. This summarization is only needed about 5% of the time—most thought processes are short enough to display in full.

On September 18, 2025, Anthropic posted an article Extended Thinking: Differences in Thinking Across Model Versions

The Messages API handles thinking differently across Claude Sonnet 3.7 and Claude 4 models, primarily in redaction and summarization behavior. See the table below for a condensed comparison:

Feature Claude Sonnet 3.7 Claude 4 Models

Thinking Output Returns full thinking output Returns summarized thinking

Interleaved Thinking Not supported Supported with interleaved-thinking-2025-05-14 beta header

Feature	Claude Sonnet 3.7	Claude 4 Models
Thinking Output	Returns full thinking output	Returns summarized thinking
Interleaved Thinking	Not supported	Supported with `interleaved-thinking-2025-05-14` beta header

leogao's Shortform

faul_sname9d30

My understanding is that Bridgewater has a bunch of people like this, but they are unlikely to share their answers with the broader world.

faul_sname's Shortform

faul_sname11d3810

Why is it worse for x risk for China to win the AI race?

My understanding of the standard threat model is that, at some point, governments will need to step in and shut down or take control over profitable and popular projects for the good of all society. I look at China, and I look at the US, and I can't say "the US is the country I would bet on to hit the big red button here".

There's got to be something I'm missing here.

Humans Are Spiky (In an LLM World)

faul_sname13d20

My point is more that we have millennia of experience building tools and social structures for making humans able to successfully accomplish tasks, and maybe 2 years of experience building tools and structures for making LLM agents able to successfully accomplish tasks.

I do agree that there's some difference in generality, but I expect that if we had spent millennia gathering experience building tools and structures tailored towards making LLMs more effective, the generality failures of LLMs would look a lot less crippling.

If you take a bunch of LLMs and try to get them to collaboratively build a 1GW power plant, they are going to fail mostly in ways like

they have hilariously poor vision
they don't make effective use of new tools
they don't create new tools to trivialize repetitive tasks
they get caught in loops of trying the same ineffective thing over and over

All of these are failure modes which can be substantially mitigated by better scaffolding of the sort that is hard to design in one shot but easy to iteratively improve over time.

Humans Are Spiky (In an LLM World)

faul_sname13d20

Humans are hilariously bad at wilderness survival in the absence of societal knowledge and support. The support doesn't need to be 21st-century-shaped but we do need both physical and social technology to survive and reproduce reliably.

That doesn't matter much, though, because humans live in an environment which contains human civilization. The "holes" in our capabilities don't come up very often.

The right tools could also paper over many of the deficiencies of LLM agents. I don't expect the tools which make groups of LLM agents able to collectively do impressive things to result in particularly human-shaped agents though.

Concretely, sample efficiency is very important if you want a human-like agent that can learn on the job in a reasonable amount of time. It's much less important if you can train once on how to complete each task with a standardized set of tools, and then copy the trained narrow system around as needed.

(Note: perhaps I should say "language-capable agent" rather than "llm-based agent")

Daniel Tan's Shortform

faul_sname18d20

I no longer consider scaffolded LLMs as a relevant concern/threat.

I am extremely surprised to see you say that, to the point that I think I must be misinterpreting you. What tools an LLM has the ability to use seems to have huge effects on its ability to do things.

Concretely, Claude 3.5 Sonnet can do far more useful coding tasks with a single tool to execute bash commands on a VM than Claude 4.5 Sonnet can in the absence of that tool. Or is "while loop plus tools" not the type of scaffolding you're referring to?

Wei Dai's Shortform

faul_sname18d94

I think it's likely that without a long (e.g. multi-decade) AI pause, one or more of these "non-takeover AI risks" can't be solved or reduced to an acceptable level.

I think it is also worth considering the possibility that these risks aren't the sort of thing which can be reduced to an acceptable level with a decade-scale AI pause either. Particularly the ones which people have been trying to solve for centuries already (e.g. principal-agent problem).

LESSWRONG
LW

LESSWRONG
LW

Posts

Wikitag Contributions

Comments