Fair enough. If/when you get any empirical data on how well this post works, writing it up would be pretty valuable and would likely resolve any remaining disagreements we have.
Then I'd lean away from the "this is for people who've awoken ChatGPT" framing. E.g. change your title to something along the lines of "so you think LLMs make you smarter".
This essay seems like it's trying to address two different audiences: LW, and the people who get mind-hacked by AIs. That's to its detriment, IMO.
E.g. the questions in the Corollary FAQ don't sound like the questions you'd expect from someone who's been mind-hacked by AI. Like, why expect someone in a sycophancy doom loop to ask whether it's OK to use AI for translation? Also, texts produced by sycophancy doom loops look pretty different from AI-translated texts. Both share a resemblance to low-quality LLM-assisted posts, yes. But you're addressing people who think they've awoken ChatGPT, not low-quality posters who use LLM assistance.
But reasoning models don't get reward during deployment. In what sense are they "optimizing for reward"?
Yep, and getting a solid proof of those theorems would provide insight IMO. So it isn't what I'd call "trivial". Though I suppose it's possible that raising the theorem statement from entropy would require many bits of info, while finding a proof would require almost none.
have some threaded chat component bolted on (I have takes on best threading system).
I wish to hear these takes.
Man, that's depressing. It gives too low an estimate for a good outcome, IMO. Very cool tool, though.
A key question is whether the typical goal-directed superintelligence would assign any significant value to humans. If it does, that greatly reduces the threat from superintelligence. We have a somewhat relevant article earlier in the sequence: AI's goals may not match ours.
BTW, if you're up for helping us improve the article, would you mind answering some questions? Like: do you feel like our article was "epistemically co-operative"? That is, do you think it helps readers orient themselves in the discussion on AI safety, makes the assumptions clear, and generally tries to explain rather than persuade? And what's your general level of familiarity with AI safety?
Personally, I liked "it's always been the floor". Feels real. I've certainly said/heard people say things like that in strained relationships. Perhaps "it's always the floor" would have been better. Or "it always is". Yes, that sounds right.
Strong upvoted, not because I agree with your take on alignment (though there is perhaps some merit to it, despite how hard it is to buy results in alignment), but because I think LW needs to focus more on the practice of rationality, and instrumental rationality in particular.