kaiwilliams — Comments, sorted by newest
I am worried about near-term non-LLM AI developments
kaiwilliams · 23d

How does ARC-AGI's replication of the HRM result and ablations update you? [Link].

Basically, they claim that the HRM architecture itself wasn't important; instead, the training process behind it had most of the effect.

kaiwilliams's Shortform
kaiwilliams · 2mo

A lot of people have been talking about OpenAI reinstating 4o because users want sycophancy.

While OpenAI did reinstate 4o for paid users, it seems like they are doing as much as possible to discourage its use.

To access 4o from a plus account, one needs to:

  1. Open settings
  2. Click "show legacy models" button
  3. Go to the model switcher dialogue and mouse over "legacy models"
  4. Click on the 4o model.

This seems like intentionally dissuasive UI to me.

That being said, if I have 4o enabled and then create a new chat, the next chat will also be with 4o. (I was hoping they'd do an Anthropic-style thing and make GPT-5 the default for all new chats.)

Thane Ruthenis's Shortform
kaiwilliams · 2mo

One point of information against the "journalists are completely misinterpreting the thing they're reporting on" view is that one of the co-authors is Rocket Drew, who previously worked as a Research Manager at MATS.

But I'll definitely be interested to follow this space more.

Igor Ivanov's Shortform
kaiwilliams · 2mo

One thing that strikes me as odd about this is that GPT-5's knowledge cutoff (September 2024) is much earlier than those of Grok (November 2024), Gemini 2.5 Pro (January 2025), and Opus 4.1 (March 2025).[1]

I mean, I guess this is a scaling thing and a persona thing. But I'm a little confused.

  1. ^

    Though oddly, Claude's system card says the knowledge cutoff is the end of January 2025. Maybe February and March's training data aren't as complete as January and before.

Extract-and-Evaluate Monitoring Can Significantly Enhance CoT Monitor Performance (Research Note)
kaiwilliams · 2mo

Thanks for posting this! I found it quite interesting to read. A couple of questions off the top of my head:

  1. In deployment settings, you probably want the less powerful monitor to do the summarization/extraction, but did you test the other direction as well (e.g., 4.1 as the extractor and 4.1-mini as the judge)? I'm guessing a lot of the value of this method (if it generalizes) comes from good quote extraction as much as from good monitoring.
  2. What happens if the extractor model just summarizes the reasoning etc. without extracting quotes, as in the other study you mentioned? (This should be relatively easy to run as an ablation, right? All you need to do is tweak the extractor prompt to not extract quotes.)
  3. Are you attempting to refine an "online" method or an "offline" method of monitoring? In the case of BigCodeBench, this is the same either way (because each step is only one generation). But with multi-step trajectories, the two approaches might have different properties?
  4. It seems like the extractor has a slightly different task in its prompt than the judge. The extractor seems to be scoring on "suspiciousness" while the judge seems to be scoring on whether the agent stays on track to the prompted task. Was this just due to iterating more on the judge prompt than the extraction prompt?
  5. I would definitely be curious to see longer trajectories; I would feel more confident about this technique if the results hold there.
  6. If you tell the agent in question how it will be monitored, how does that change scores?

No need to respond completely to this comment, but I hope these questions are useful!

I wrote a song parody
kaiwilliams · 2mo

I have definitely listened to "We will all go together when we go" when thinking of the future of AI, so thanks for this!

I made a Suno version of these lyrics, but that did not feel respectful to Tom Lehrer. (It ended up sounding like a half-rate Elton John.) So I won't link it here.

Maybe I'll try to learn to perform this.

nikola's Shortform
kaiwilliams · 2mo

Thanks for the clarification!

nikola's Shortform
kaiwilliams · 2mo

Any update on the citation here? Thanks!

Kaj's shortform feed
kaiwilliams · 2mo

Cool!

PSA: If you ever want to start a Patreon specifically (rather than through Substack), it may be worth making the page in the next week or so, before the default cut goes from 8% to 10%. Source

Hopenope's Shortform
kaiwilliams · 2mo

I was talking with one of my friends about this, who was (is?) a quite successful competitive coder. He mentioned that the structure of the competition (a heuristic competition) tends to favor quick prototyping and iteration much more than other types of programming competitions do, which plays to AI's current strengths. Still, the longer horizon is impressive (OpenAI's solution regained the lead 8 hours in, I think, so it was making meaningful contributions even hours into the run).
