abkhur
CS senior at Virginia Tech (May '26). Building Polity, a multi-agent institutional sandbox for studying whether alignment failures can emerge at the level of institutions rather than individual models. TS/SCI. Interested in multi-agent alignment, AI governance, and the intersection of political economy and AI safety.
I think this is pointing at a real gap in AI character design. Once systems are more autonomous and embedded in institutions, "good assistant" no longer seems like the obvious target.
My main hesitation is about the implementation story. Much of this post seems to rely on the idea that we can train something like context-dependent civic virtues: prosocial tendencies that activate selectively, stay subordinate to higher-priority constraints, and reduce collusion/takeover risk precisely because they're not just global goals.
I’m not sure current ...