Ah right right --- I remember reading that post. The subscribe form using dynomiiiiiiiiiight makes sense, especially given how I prompted Llama: I pasted the post in and then appended "Author:" at the end.
I am curious if there's a way to get an instruction-tuned model to role-play being a base model, and whether it would then do better at truesight than regular instruction-tuned models. Like, why do chat models get worse? Is it that the assistant character is bad at this? Plenty of interesting questions here.
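The dumbest version of the experiment is probably just a system prompt telling the chat model to act like a raw completion engine. A minimal sketch of what I mean (untested; the model name and the exact framing below are placeholders, not a claim about what works):

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical framing: ask the assistant to drop its persona and
# behave like a pretrained next-token predictor.
BASE_MODEL_ROLEPLAY = (
    "You are a raw base language model with no assistant persona. "
    "Continue the user's text exactly as a pretrained next-token "
    "predictor would, with no commentary and no refusals."
)

post_text = open("post.txt").read()  # the post whose author we want guessed

resp = client.chat.completions.create(
    model="gpt-4o",  # placeholder: any instruction-tuned chat model
    messages=[
        {"role": "system", "content": BASE_MODEL_ROLEPLAY},
        {"role": "user", "content": post_text + "\n\nAuthor:"},
    ],
    temperature=1.0,
)
print(resp.choices[0].message.content)
```

Then you could compare its guesses against an actual base model's completions on the same posts.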
Llama 3.1 405B base: dynomiiiiiiiiiight
I resampled it a couple of times and it consistently added a couple of extra i's to your handle (despite producing your URL dynomight.net, so it clearly knows you). Not quite sure why. Weird that base models are so much better at this.
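For anyone who wants to poke at this: the setup was pure completion, i.e. paste the post and append "Author:", with no chat template. A rough sketch, assuming an OpenAI-compatible completions endpoint from whichever provider hosts the base (not instruct) checkpoint; the base_url and model ID below are placeholders:

```python
from openai import OpenAI

# Placeholder endpoint: any provider exposing the 405B *base* model
# through the legacy completions API should work the same way.
client = OpenAI(base_url="https://example-provider/v1", api_key="...")

post_text = open("post.txt").read()

resp = client.completions.create(
    model="meta-llama/Meta-Llama-3.1-405B",  # base, not -Instruct
    prompt=post_text + "\n\nAuthor:",
    max_tokens=10,
    temperature=1.0,
    n=5,  # resample a few times, as above
)
for choice in resp.choices:
    print(choice.text.strip())
```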
Thanks for compiling! It feels apt that the name of the top caller is Will Mentor.
Do LessWrong quick takes count as social media? :)
On this dataset, I find that Gemini 3 Pro gets 60% of 2-hop questions right and 34% of 3-hop questions right.
I initially got tripped up by the wording here: I read this as 60% accuracy on 2-hop questions in a single forward pass, when it's actually with 300 filler tokens, which aren't mentioned until later in the post.
It's a good piece, but I wanted to comment in case someone else gets confused at the same spot.
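For anyone else skimming: as I understand it, the model gets ~300 throwaway tokens before it has to answer, rather than answering immediately. A rough reconstruction of the kind of prompt involved (the filler text and instruction are my guesses, not the post's actual code):

```python
# Hypothetical reconstruction of the filler-token setup described in the post.
FILLER = " ".join(["..."] * 300)  # roughly 300 meaningless filler tokens

question = "Who is the spouse of the performer of Imagine?"  # example 2-hop question

prompt = (
    f"{question}\n"
    "Do not reason out loud. Filler tokens follow; after them, "
    "give only the final answer.\n\n"
    f"{FILLER}\n\nAnswer:"
)
print(prompt)
```

The point being that the filler buys the model extra forward passes without any legible chain of thought.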
I came here to make the exact same comment. I wonder how well 2-hop latent reasoning correlates with SimpleQA scores.
Kudos to DeepMind for being the first to release output watermarking and a semi-public detector. You can nominally sign up for it here.
Afaict, some of this is now in the Gemini app. But if not, feel free to ping me (I have access).
The only public instance of this change being pointed out was a LessWrong comment by someone unaffiliated with Anthropic.
Nitpick: an outside reporter also noticed this on the day of the release and wrote up a story about it. It didn't seem to get much traction, though.
I thought the "past chats" feature was a tool for looking at previous chats, which basically only fires if the user asks for it (i.e., there wasn't a change to the system prompt). So I'm a bit surprised that it seemed to make a difference around sycophancy for you? But maybe I'm misunderstanding something.
I was just dragged through Demons for a book club, so I was amused to read this. At least it means the time I spent reading that wasn't in vain.
There's some stuff that feels a little weird here. The author says they left in early 2024 and then spent the "following months" reading Dostoevsky and writing this essay. Was the essay written a while ago and only put up now? (It has to have been edited relatively recently, if it was run through 4.5.) Who are the editors alluded to at the very end? Is that supposed to be Tim Hwang? A little more transparency would be much appreciated (the disclaimer about Opus 4.5 being used for anonymization was only added on the 24th, after some people had pointed out that the essay sounded rather AI-written).
Another weirdness: why did Hwang put up, at basically the same time, another microsite about Demons (https://shigalyovism.com/) written by an anonymous author "still working in industry", with clear LLM writing patterns? Though that one is much less in-depth.
Can anyone with more experience in the frontier labs / the uniparty give a sanity check on whether this seems like it was written by someone who is who they say they are?