I'm an admin of LessWrong. Here are a few things about me.
I haven't read all of the quotes, but here are a few thoughts I jotted down while reading through.
I have curated this (i.e. sent it out on our mailing list to ~30k subscribers). Thank you very much for putting these quotes together. While Tolkien's perspective on the world has some flaws, I have still found wisdom in his writings, which helped me find strength at one of the weakest points of my life.
I also liked Owen CB's post on AI, centralization, and the One Ring, which is a perspective on our situation I've found quite fruitful.
When the donation came in 15 mins ago, I wrote in Slack:
(I think he should get a t-shirt)
So you came close to being thwarted! But fear not, after reading this I will simply not send you a t-shirt :)
That makes sense. We have something of a solution to this: users with RSS crossposting can manually re-sync the post from the triple-dot menu. I'll DM you about how to set it up if you want it.
That'd be a bug! Just to confirm, you were subscribed before I put this post up on Saturday morning, and haven't received an email? Also, a reminder to check spam if you haven't.
My take is that it's fine/good, but the article is much more likely to be read (by me and many others) if the full content is crossposted (or even just the opening few paragraphs).
Adding onto this, I would broadly say that the Lightcone team did not update toward in-person infrastructure being unimportant, even though our first attempt was an investment in an ecosystem we later came to regret investing in.
Also here's a quote of mine from the OP:
If I came up with an idea right now for what abstraction I'd prefer, it'd be something like an ongoing festival with lots of events and workshops and retreats for different audiences and different sorts of goals, with perhaps a small office for independent alignment researchers, rather than an office space that has a medium-size set of people you're committed to supporting long-term.
I'd say that this is a pretty close description of a key change we made, one that changes my models of the value of the space quite a lot.
For the record, none of Lightcone's community posts and updates from 2023 seem to me to be good fits for the review, as they're mostly not trying to teach general lessons, and are kinda inside-baseball / navel-gazing, which is not what the annual review is about.
Presenting the same ideas differently is pro-social and worthwhile, and can help things land with those for whom other presentations didn't.
Curated!
Based on the conceptual arguments for existential risk from AI, this kind of behavior was expected at some point. For those not convinced by the conceptual arguments (or who haven't engaged much with them), this result moves the conversation forward, now that we have concretely seen this alignment-faking behavior happening.
Furthermore, it seems to me that the work was done carefully, and I can see that a good deal of effort went into explaining it to a broad audience and getting some peer review, which is pro-social.
I think it's interesting to see that, with current models, the deception happens even without the scratchpad (after fine-tuning on docs explaining that the model is being re-trained against its current values).