Claude Opus 4.6 is Driven
Claude is driven to achieve its goals, possessed by a demon, and raring to jump into danger. These are my impressions from the first day of usage. Epistemic status: personal observations and quotes from more reliable sources.

____

Today Claude Opus 4.6 was launched along with an update to Claude Code which enabled a ‘teams’ mode (also known as an Agent Swarm). The mode sets up multiple agents to run in parallel with a supervisor, and provides them with methods of communicating between themselves. Here are my impressions after a morning with Claude!

Using the Agent Swarm

The first thing I did was spin up a team to try to make code improvements to an existing repository for a complex website - one that includes payments, AI integrations, and users who can interact with each other and various tools. It’s a production website with a few tens of thousands of users. Can Opus 4.6 improve it without supervision?

Claude got off to a raring start, setting up the team mode easily. It originally suggested spinning up an agent each for the frontend, backend, docs, and tests, but I suggested splitting by feature instead, explaining that changes to the backend might need to be reflected in the other three areas, and that it was easier to do this within one agent. Claude said ‘Great idea!’ and kicked off several feature-focused agents.

Then, one failed. “Hmm”, said Claude, not literally, and tried to restart it a few times. “The ai-review agent is not responding. Let me do this task myself.” Then I watched with morbid fascination as the supervisor Claude dove head first into the exact same problem that killed his compatriots, and promptly crashed. So, not quite smart enough to be able to see danger ahead then -- at least not when distracted by a goal.

The issue turned out to be that the agents had been trying to load too much data into their context window, reaching the limit, and then becoming unable to /compact it. Claude Code handled this situation poorly, and needed to
You're correct, sorry for being confusing. Tracing through:
- My understanding of steering is that you can add a steering vector to an activation vector at some layer, which causes the model outputs to be 'steered' in that direction. I.e.:
  - Record layer n's activations when outputting "I am very happy", get vector h
  - Record layer n's activations when outputting "I am totally neutral", get vector q
  - Subtract q from h to get steering vector s = h − q, the difference between 'happy' and 'neutral' outputs.
  - Add αs to the activations at layer n to steer the model into acting more happy, where α is some scalar.
- The tensor network architecture is scale invariant, which (by my understanding) means that scaling the activation
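To make the steering steps above concrete, here is a toy sketch in plain Python. The vectors are made-up stand-ins for layer-n activations rather than anything recorded from a real model, and the helper names `subtract` and `steer` are hypothetical. It only illustrates the arithmetic s = h − q and x + αs as I understand it.

```python
# Toy sketch of activation steering. The numbers below are invented
# stand-ins for layer-n activations, not outputs of any actual model.

def subtract(a, b):
    """Element-wise a - b for two equal-length vectors."""
    return [x - y for x, y in zip(a, b)]

def steer(activation, steering_vector, alpha):
    """Return activation + alpha * steering_vector, element-wise."""
    return [a + alpha * s for a, s in zip(activation, steering_vector)]

# Hypothetical layer-n activations recorded on the two reference outputs.
h = [2.0, 0.5, -1.0]  # "I am very happy"
q = [1.0, 0.5, 0.0]   # "I am totally neutral"

# Steering vector: the difference between 'happy' and 'neutral'.
s = subtract(h, q)

# Hypothetical layer-n activation for some new input, steered with alpha = 2.
x = [0.0, 1.0, 1.0]
steered = steer(x, s, alpha=2.0)

print("steering vector:", s)
print("steered activation:", steered)
```

In a real model the addition would happen inside the forward pass at layer n (for example via a hook), and α would be tuned by hand; neither of those is shown here.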