"SOTA alignment research includes stuff like showing that training the models on a hack-filled environment misaligns them unless hacking is framed as a good act"
I am not sure that these are examples of the kind of alignment research TsviBT meant, as the post concerns AGI.
SOTA alignment researchers at Anthropic can:
- prove the existence of phenomena by explicitly demonstrating them.
- make empirical observations and proofs about the behaviour of contemporary models.
- offer conjectures about the behaviour of future models.
Nobody at Anthropic can offer (to my knowledge) a substantial scientific theory that would give reason to be extremely confident that any technique they've found will extend to models in the future. I am not sure if they have ever explicitly claimed that they can.
"Empirically when I advocate internally for things that would be commercially costly to Anthropic I don't notice this weighing on my decisionmaking basically at all, like I'm not sure I've literally ever thought about it in that setting?"
With respect, one of the dangers of being a flawed human is the fact that you aren't aware of every factor that influences your decision making.
I'm not sure that a lack of consciously thinking about financial loss/gain is good empirical evidence that it isn't affecting your choices.
By all means wear what you want, but the positive reactions you get from strangers who directly approach you are not necessarily an accurate way to gauge how most people are reacting to your outfit. You're sampling from the population of "people who have spontaneously chosen to engage with you".
Generally, when you wear a polarising outfit, people who dislike it won't go out of their way to tell you. I'm extroverted enough that I will (very occasionally) compliment strangers in public on nice or unusual outfits, but I've never told a stranger their outfit is bad.
"I inevitably get weird looks from the kind of people who think having a tattoo is an affront to god but they give me that look for just existing with blue hair and pronouns too"
This line in particular just seems like bad epistemics. Is it really likely that everyone who reacts badly to their outfit would also judge them for having coloured hair?
I think we should show some solidarity to people committed to their beliefs and making a personal sacrifice, rather than undermining them by critiquing their approach.
Given that they're both young men and the hunger strikes are occurring in the first world, it seems unlikely anyone will die. But it does seem likely they or their friends will read this thread.
Beyond that, the hunger strike is only on day 2 and has already received a small amount of media coverage. Should it go viral, this one action alone will have a larger differential impact on reducing existential risk than most safety researchers will achieve in their entire careers.
https://www.businessinsider.com/hunger-strike-deepmind-ai-threat-fears-agi-demis-hassabis-2025-9
One must imagine Sisyphus happy.
Von Neumann might have been driven by a feeling of inadequacy, but that doesn't mean it was necessary for his success. One can imagine a Von NewOutlook-Mann who took the same actions in life but viewed them as working towards a positive goal rather than as a need to prove himself.
It strikes me that Anthropic's blog post is engaging in a bit of double-speak in saying they are "disrupting" the operations of cybercriminals.
What they are describing is taking action after the crime has already occurred.
The following illustration from 2015 by Tim Urban seems like a decent summary of how people interpreted this and other statements.
I've thrown on some limit orders if anyone is strongly pro-Kokotajlo.
That's awesome to hear.
(On a side note, your hyperlink currently includes a spurious full stop that makes the link 404.)
"I know someone who can actually attend to literally five conversations at once."
I agree that some people are better at generic multitasking than others, and there are some people who are better at monitoring multiple conversations.
I also believe you know somebody who claims they can attend to 5 conversations at once.
But I'd comfortably bet even money that their ability to recall and process information drops off quickly once they're trying to attend to more than 2. My model is that unintentionally tricking yourself into believing you have this ability is easier than actually learning it.
Beyond that, I'm not sure that the multi/single-thread dichotomy is a particularly useful abstraction for describing how human brains function, or that it provides much predictive power.
Here I am not claiming that all humans are single- or multi-threaded. I am disputing whether it is even a meaningful abstraction.