
It is no secret that labs scrape indiscriminately from all over the internet, but usually a filter is applied to remove unwanted content. Because I assume the pretraining team would consider these strings unwanted content, we can infer that there is room to improve the pretraining filtering. I think that better pretraining filtering is useful for mitigating emergent misalignment.

Why does Claude Speak Byzantine Music Notation?

Lennart Finke 6mo 10

The component of ignoring two intervening characters is less mysterious to me. For example, a numbered list like "1. first_token 2. second_token ..." would need this pattern. I am wondering mostly why the specific map from b'\xa1'-b'\xba' to a-z is learned.
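To make the byte map in question concrete, here is a minimal Python sketch. It assumes the cipher shifts each lowercase letter by 0x1D000, the start of the Unicode Byzantine Musical Symbols block; that offset and the helper names `encode`, `last_byte_to_letter`, and `decode_last_bytes` are my own illustration, not something stated in this thread. Under that assumption, the final UTF-8 byte of each encoded character falls in the range 0xA1 to 0xBA, one byte per letter of a-z.

```python
# Assumed shift: U+1D000 is the start of the Byzantine Musical Symbols block.
OFFSET = 0x1D000


def encode(text: str) -> str:
    """Shift lowercase ASCII letters by OFFSET; leave everything else alone."""
    return "".join(
        chr(ord(c) + OFFSET) if "a" <= c <= "z" else c for c in text
    )


def last_byte_to_letter(b: int) -> str:
    """Map a byte in 0xA1..0xBA (26 values) onto 'a'..'z'."""
    if not 0xA1 <= b <= 0xBA:
        raise ValueError(f"byte {b:#x} is outside 0xA1..0xBA")
    return chr(b - 0xA1 + ord("a"))


def decode_last_bytes(text: str) -> str:
    """Decode by looking only at the last UTF-8 byte of 4-byte characters."""
    out = []
    for ch in text:
        raw = ch.encode("utf-8")
        if len(raw) == 4 and 0xA1 <= raw[-1] <= 0xBA:
            out.append(last_byte_to_letter(raw[-1]))
        else:
            out.append(ch)  # spaces, punctuation, etc. pass through
    return "".join(out)
```

With this sketch, `decode_last_bytes(encode("hello world"))` round-trips to the original string, which shows that the last byte alone carries the letter identity. It does not explain why a model would learn that map.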

A Visual Task that's Hard for GPT-4o, but Doable for Primary Schoolers

Lennart Finke 8mo 10

A much appreciated update, thank you!

A Visual Task that's Hard for GPT-4o, but Doable for Primary Schoolers

Lennart Finke 1y 30

Agreed, although that in turn makes me wonder why it performs a bit better than random. Maybe there is some nondeclarative knowledge about the image, or some blurred position information? Next, I might test how much vision is the bottleneck here by providing a text representation of the grid, as in Ryan Greenblatt's work on ARC-AGI.