Where does Sonnet 4.5's desire to "not get too comfortable" come from?
svajxihdvw4d*10

In their developer docs for 4.5, Anthropic says that the model compacts the results to stay within the context window: https://www.anthropic.com/news/context-management

When I code with 4.5, it is better at avoiding getting caught in loops than previous versions (but it still happens sometimes, especially when tool integrations seem to be failing).

A few guesses:

  1. I wonder if being trained to understand the size of the context window gives the model an impetus to move on from repetitive output to preserve that limited resource.
  2. I could imagine there being a compacted context containing something like “the two participants exchange a series of repetitive messages about …”, from which the model infers that continuing the same repetitive behavior without making progress isn’t rewarding. There are probably simple ways to prevent a model from repeating the same output verbatim over and over, but more abstract repetition (e.g., using increasingly creative vocabulary to express the same thought, which draws repetitive responses from users the model doesn’t directly control) could defeat those.
  3. Also, perhaps compacting the context helps simply by reducing the proportion of history that includes the “loopy” content.
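
To make guess 2 concrete, here is a minimal sketch of the kind of "simple" verbatim-repetition check I have in mind, and how a paraphrase slips past it. This is only an illustration under my own assumptions: the function names, the trigram size, and the 0.8 threshold are all hypothetical, and I have no idea whether Anthropic does anything like this.

```python
def ngrams(text, n=3):
    """Return the set of word n-grams in text (lowercased, whitespace-split)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def repeats_verbatim(history, candidate, n=3, threshold=0.8):
    """Flag candidate if most of its n-grams already appear in one earlier message.

    Hypothetical heuristic: n and threshold are arbitrary illustrative values.
    """
    cand = ngrams(candidate, n)
    if not cand:
        # Too short to form a single n-gram, so nothing to compare.
        return False
    for prev in history:
        overlap = len(cand & ngrams(prev, n)) / len(cand)
        if overlap >= threshold:
            return True
    return False


history = ["I am sorry, the tool call failed again, let me retry."]

# An exact repeat shares every trigram with the earlier message.
exact = "I am sorry, the tool call failed again, let me retry."

# Same thought, new vocabulary: no trigram is shared with the earlier message.
paraphrase = "Apologies, that invocation errored once more; trying anew."

print(repeats_verbatim(history, exact))
print(repeats_verbatim(history, paraphrase))
```

The exact repeat is flagged and the paraphrase is not, even though both say the same thing. Catching the second case takes something that tracks meaning rather than surface tokens, which is where a summary of the conversation might help.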