cozyfae

20

10

1y

Recent LLMs can use filler tokens or problem repeats to improve (no-CoT) math performance

cozyfae · 4mo · 1 · -2

That context-less intermediate tokens improve LLM performance isn't surprising, given the theoretical analysis showing that generating intermediate tokens allows constant-size transformers to solve more complex problems (see Denny Zhou's presentation).

It is indeed curious that recent LLMs perform significantly better than older ones at this. But I wonder whether there's a different, better explanation than "meta-cognition".

Daniel Kokotajlo's Shortform

cozyfae · 5mo · 1 · 0

Thanks for the clarification!

Daniel Kokotajlo's Shortform

cozyfae · 5mo · 1 · -1

I'm under the impression that the harness has been adjusted over time to fit Claude's deficiencies: https://www.lesswrong.com/posts/7mqp8uRnnPdbBzJZE/is-gemini-now-better-than-claude-at-pokemon

Therefore, this benchmark is really measuring combined human+AI capability.

Was Barack Obama still serving as president in December?

cozyfae · 6mo · 1 · 0

This is not very surprising when we think about the fact that LLMs are statistical next-token predictors.

The Doomers Were Right

cozyfae · 7mo · 1 · 0

Fair scunes for all!

MakoYass's Shortform

cozyfae · 7mo · 1 · 0

there's going to be a lot of pressure to make this set of beliefs legible and accountable to the safety team or to states or to the general public.

Where does this pressure come from?

On "ChatGPT Psychosis" and LLM Sycophancy

cozyfae · 10mo · 9 · 1

How does sycophancy compare between the o-series models and 4o? AFAIK, only the o-series models have had deliberative alignment applied to them.

Recent AI model progress feels mostly like bullshit

cozyfae · 1y · 8 · -5

These machines will soon become the beating hearts of the society in which we live.

An alternative future: due to high failure rates, we don't end up deploying these machines widely in production settings, just as autonomous driving had breakthroughs long ago but still isn't widely deployed today.

We Should Prepare for a Larger Representation of Academia in AI Safety

cozyfae · 1y · 6 · 0

How have these predictions shaken out? How does the growth rate of AI Safety researchers compare between academia and industry?

How to Make Superbabies

cozyfae · 1y · 1 · 0

There may be additional societal and political problems afterwards. But none of those problems actually matter unless the technology works.

What do you think of the argument that "There may be additional technical problems afterwards. But none of those problems actually matter unless we have answers for societal and political problems."?