LESSWRONG
Lennart Finke

Statistics student at ETH. Previously thought about evals as a research fellow at Pivotal.

Posts

What GPT-oss Leaks About OpenAI's Training Data (26 points, 1d)
SimpleStories: A Better Synthetic Dataset and Tiny Models for Interpretability (15 points, 5mo)
Why does Claude Speak Byzantine Music Notation? (18 points, 6mo)
A Visual Task that's Hard for GPT-4o, but Doable for Primary Schoolers (25 points, 1y)

Comments
What GPT-oss Leaks About OpenAI's Training Data
Lennart Finke · 22h

It is no secret that labs scrape indiscriminately from all over the internet, but usually a filter is applied to remove unwanted content. Since I assume the pretraining team would consider these strings unwanted content, we can infer there is room to improve the pretraining filtering. I think better pretraining filtering is useful for mitigating emergent misalignment.
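As a rough illustration of the kind of filtering I mean (the patterns below are made up for the example, not anything OpenAI actually uses), a minimal blocklist over pretraining documents might look like:

```python
import re

# Hypothetical unwanted-content patterns; a real pipeline would use a far
# larger, curated list plus classifiers, not just regexes.
BLOCKLIST = [
    re.compile(r"(?i)lorem ipsum"),
    re.compile(r"(?i)click here to subscribe"),
]

def keep_document(text: str) -> bool:
    """Return False if any unwanted pattern appears in the document."""
    return not any(pattern.search(text) for pattern in BLOCKLIST)

docs = [
    "A useful article about statistics.",
    "Lorem ipsum dolor sit amet",
]
filtered = [d for d in docs if keep_document(d)]  # keeps only the first doc
```

The point is just that any string class you can characterize up front can be dropped before training; the strings discussed in the post apparently slipped past whatever filter was in place.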

Why does Claude Speak Byzantine Music Notation?
Lennart Finke · 6mo

The component of ignoring two intervening characters is less mysterious to me. For example, a numbered list like "1. first_token 2. second_token ..." would need this pattern. What I mostly wonder about is why the specific map from b'\xa1'–b'\xba' to a–z is learned.
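For concreteness, that map can be written as a constant byte offset, since 0xA1–0xBA spans exactly 26 values (this is just a sketch of the mapping itself, not a claim about how the model represents it internally):

```python
# 0xA1 - ord('a') == 64, so subtracting a fixed offset turns each byte
# in 0xA1..0xBA into the corresponding letter a..z.
OFFSET = 0xA1 - ord("a")

def decode(raw: bytes) -> str:
    """Map each byte in the range 0xA1-0xBA to a letter in a-z."""
    return "".join(chr(b - OFFSET) for b in raw)

print(decode(bytes([0xA1, 0xA2, 0xBA])))  # -> abz
```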

A Visual Task that's Hard for GPT-4o, but Doable for Primary Schoolers
Lennart Finke · 8mo

A much appreciated update, thank you!

A Visual Task that's Hard for GPT-4o, but Doable for Primary Schoolers
Lennart Finke · 1y

Agreed, although that in turn makes me wonder why it does perform a bit better than random. Maybe there is some nondeclarative knowledge about the image, or some blurred position information? I might test next how much vision is bottlenecking here by providing a text representation of the grid, as in Ryan Greenblatt's work on ARC-AGI.
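Something like the following (an illustrative sketch, not Ryan Greenblatt's actual setup) is what I have in mind for the text representation: serialize each cell as a character and each row as a line, so the model never has to rely on its vision encoder at all.

```python
# Turn a 2D grid of cell values into a plain-text representation that can
# be pasted directly into a prompt, bypassing the vision pathway.
def grid_to_text(grid: list[list[int]]) -> str:
    return "\n".join(" ".join(str(cell) for cell in row) for row in grid)

grid = [
    [0, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
]
print(grid_to_text(grid))
```

If accuracy jumps with this input format, that would suggest the bottleneck is vision rather than the reasoning about the grid itself.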
