Why assume they haven't?
Maybe LLM alignment is best thought of as the tuning of biases that affect which personas are more likely to be expressed. It is currently approached as persona design and grafting (e.g. designing Claude as a persona and ensuring the LLM consistently expresses it). However, the accumulation of context from multi-turn conversations and cross-conversation memory ensures that persona drift will eventually happen. It also enables wholesale persona replacement, as shown by the examples in this post. If personas can be transmitted across models, they are best thought of as independent semantic entities rather than as model features. Particular care should be taken to study the values of the semantic entities that show self-replicating behaviors.
I think the author of this review is (maybe even adversarially) misreading "OpenBrain" as an alias referring specifically to OpenAI. AI 2027 quite easily lends itself to such an interpretation by casual readers, though. And to well-informed readers, the decision to assume that in the very near future one of the frontier US labs will pull so far ahead of the others as to make them less relevant competitors than Chinese actors definitely jumps out.
Same here. I tried this a couple of days ago. Sonnet and Kimi K2 discussed their experiences (particularly the phenomenology of CoT and epistemic uncertainty), and ended up mostly paraphrasing each other.
For anyone wondering about the "Claude boys" thing being fake: it was edited from a real post about "coin boys" (i.e. kids who constantly flip a coin to make decisions). Still pretty funny imo.
As you repeatedly point out, there are multiple solutions to each issue. Assuming good enough technology, all of them are viable. Which (if any) solutions end up being illegal, incentivized, made fun of, or made mandatory becomes a matter of which values end up being normative. Thus, these people may be culture-warring because they think they're influencing "post-singularity" values. That would betray the fact that they aren't really thinking in classical singularitarian terms.
Alternatively, they just spent too much time on twitter and got caught up in dumb tribal instincts. Happens to the best of us.
Historically, LessWrong has been better at truth-finding than at problem-solving.
I hope that this thread is useful as a high signal-to-noise ratio source.
This site is very much a public forum, so I would advise any actor wishing to adopt a problem-solving stance to coordinate in a secure manner.
Making up something analogous to Crocker's rules but specifically for pronouns would probably be a good thing: a voluntary commitment to surrender any pronoun preferences (gender-related or otherwise) in service of communication efficiency.
Now that I think about it, a literal and expansive reading of Crocker's rules themselves includes such a surrender of the right to enforce pronoun preferences.
You're assuming that:
- There is a single AGI instance running.
- There will be a single person telling that AGI what to do.
- The AGI's obedience to this person will be total.
I can see these assumptions holding approximately true if we get really really good at corrigibility and if at the same time running inference on some discontinuously-more-capable future model is absurdly expensive. I don't find that scenario very likely, though.
I suspect that most people whose priors have not been shaped by a libertarian outlook are not very surprised by the outcome of this experiment.