LESSWRONG

3435
Lukas Petersson
3956110

Posts

Comments

You can't eval GPT5 anymore
Lukas Petersson 2d 10

I see. Thank you!

You can't eval GPT5 anymore
Lukas Petersson 3d 50

Hi again, should I assume it's not happening?

LLM robots can't pass butter (and they are having an existential crisis about it)
Lukas Petersson 4d 20

RT-2 (the paper you cited) is a VLA, not an LLM. VLAs are what the "executor" in our diagram uses.

You can't eval GPT5 anymore
Lukas Petersson 22d 30

Hey Ted! Any updates? :)

You can't eval GPT5 anymore
Lukas Petersson 1mo 30

We set it to some date in the future.

You can't eval GPT5 anymore
Lukas Petersson 1mo 20

Thanks! Vending-Bench v2 is going to be fire. Would love to include gpt5 <3

You can't eval GPT5 anymore
Lukas Petersson 1mo 10

This is a great point. I admit I need to better understand what each model provider does behind the scenes in the API. It would be sad if the days of access to the model are gone.

You can't eval GPT5 anymore
Lukas Petersson 1mo 100

We thought about that, but then it wouldn't be reproducible if we want to run it for new models later.

You can't eval GPT5 anymore
Lukas Petersson 1mo 172

Thanks, that would be great!

Project Vend: Can Claude run a small shop?
Lukas Petersson 4mo 40

Thanks for highlighting our work!

98 LLM robots can't pass butter (and they are having an existential crisis about it)
6d
6
158 You can't eval GPT5 anymore
1mo
15
39 AI misbehaviour in the wild from Andon Labs' Safety Report
2mo
0
7 The Same Heaven
7mo
1
5 Linguistic Imperialism in AI: Enforcing Human-Readable Chain-of-Thought
8mo
0
58 AI Safety as a YC Startup
10mo
9