That would be a different experiment, as it would also be testing whether people would, for example:
Those factors could go either way, but they'd disrupt a pure test of this part of the alien's predictions:
Future humans will enjoy, say, raw bear fat covered with honey, sprinkled with salt flakes.
I still expect ice cream would win a blind taste test, but I didn't predict these survey results.
I found out recently that in a multi-turn conversation on claude.ai, previous thinking blocks are summarized when given to the model on the next interaction. A summary of the start of a conversation I had when testing this:
Maybe this penalizes neuralese slightly, as it would be less likely to survive summarization.
I used to think that AI models weren't smart enough to sandbag. But less intelligent animals can sandbag: for example, an animal that apparently can't do something turns out to be able to when succeeding lets it escape, access treats, or otherwise earn outsized rewards. Presumably this occurs without an inner monologue or a strategic decision to sandbag. If so, AI models are already plausibly smart enough to sandbag in general, without it being detectable in chain-of-thought, and then to perform better when high-value opportunities arise.
Confabulations are made-up remembering, as I understand it, not made-up outputs. So I can confabulate a memory even if I never share it with anyone.
(which still seems like a good term to use for many AI hallucinations)
Relevant links:
Let's suppose that your read is exactly right, and Yudkowsky in 2021 was predicting median 2040. You have surely spent more time with him than me. Bioanchors predicted ~25% cumulative probability by 2040. A 25% vs 50% disagreement in the world of AI timeline prediction is approximately nothing. What's your read of why Yudkowsky is claiming that "median fucking 2050" is "fucking nuts in retrospect", without also admitting that his implicit prediction of median 2040 was almost as nuts?
This is the second time this year that I've read Yudkowsky attacking the Bioanchors 2050 figure without mentioning that it had crazy wide error bars.
This month I also read "If Anyone Builds It Everyone Dies" which repeats the message of "The Trick that Never Works" that forecasting timelines is really difficult and not important for the overall thesis. I preferred that Yudkowsky to this one.
EDIT: retracting because I don't actually want a response to these questions, I'm just cross.
Sorry this comment is long, I didn't have time to make it shorter. Feel free to skip to the section that you are interested in, or skip the whole thing.
I appreciate the kind advice about prescriptivism vs descriptivism. I don't want to have that debate here, but yes, in saying a word choice is "incorrect" I'm necessarily using a prescriptivist lens. With a descriptivist lens I might say "imprecise" or "misleading" or "jarring" or "warping". As well as dictionaries, I also got a second opinion from an LLM. LLMs can of course be sycophantic, but they update more frequently than dictionaries and are more aware of nuances. But perhaps they have a prescriptivist bias; I hadn't considered that until I read your comment, and it seems likely given the test-taking bias.
With hindsight I regret using a prescriptivist lens, but I don't know what the response would have been if I initially commented with a descriptivist lens, so it's hard to make a full update.
Consider this sentence from the essay:
The person claiming that there’s no obligation to respond is often color-blind to some pretty important dynamics.
With my prescriptivist lens, I defended this as "technically correct". With my descriptivist lens, I doubt such a person intends to claim that these dynamics aren't real. A recent example is Banning Said Achmiz, where various people said variants of "no obligation to respond", and they don't read to me as blind to social dynamics.
Speaking for myself, I've been writing on the internet under my real name for a while, and I've experienced the pressures the essay describes. Given that high school kids are getting sometimes brutal lessons in cyberbullying, and that people have been imprisoned for social media posts, it seems hard for anyone in 2025 to have missed the reality. I see some people who seem to be oversensitive to the audience, and (fewer) people who seem to be under-sensitive to the audience, but this seems to me a consequence of value differences and occasionally reasoning failures, rather than "color-blindness".
Another sentence from the essay:
I actually find it super frustrating when someone leaves commentary which, in one way or another, obligates me to effortfully respond, with more time and energy than I properly have to spare…
With my descriptivist lens, I read this as hyperbole, or metaphor, or a description of emotional reality. I still understand the author's meaning, but for me it's jarring and imports the wrong intuitions. When I reread the essay substituting a more precise term, such as "pressured to respond", I get a different vibe.
Basics of Rationalist Discourse has a section on "Don't weaponize equivocation/abuse categories/engage in motte-and-bailey shenanigans". I wish the section was more peaceably named, as the author isn't doing those things. But the contents are relevant here. The author is using "obligation" as a conceptual handle to describe scenarios which have some of the attributes (pressure, consequences, judgment, ...) but not the ones that loom large in my mind (moral/legal force, compulsion, promise-keeping, ...). I therefore comment that the term is prescriptively-incorrect (descriptively-warping) and discuss why.
Which brings us to:
You'd ask a question. Basics of Rationalist Discourse says the same thing.
I deliberately chose not to ask a question. This is partly because I read the author as asking me not to.
Like, it’s not your questions are bad, it’s your questions are costly, and I don’t have the spare resources to pay the costs; I’d like to not keep receiving bills and invoices from you, please.
To be clear, the author hasn't complained to me personally about sending too many bills and invoices. But I still don't want to send any invoices to him in the first place. I don't believe authors have an obligation to respond, I don't want to create obligations to respond, and if I find an author who expresses that questions create an obligation to respond, then I won't be asking that author any questions. Especially not in the place where they complain about that! I instead posted a comment with multiple cues that I didn't want or expect an author response.
The second reason is because of what habryka wrote in Banning Said Achmiz.
The critic has a pretty easy job at each step. First of all, they have little to lose. They need to make no positive statements and explain no confusing phenomena. All they need to do is to ask questions, or complain about the imprecision of some definition. ... At the end of the day, are you really going to fault someone for just asking questions? What kind of totalitarian state are you trying to create here?
So instead of asking a question, or complaining about a definition, I chose to make positive statements about (a) the meaning of "obligated", (b) the intuitions created by that word, and (c) why those intuitions cause errors.
And this totally worked as habryka said it would! By making positive statements, I had to spend a lot more time thinking about what I was saying. Also, I made myself vulnerable to disagreement and chalked up some downvotes and disagreement-votes. That seems very much working as intended.
The third reason is that as a matter of style I prefer to discuss the text rather than the author. Discussing the author brings up status issues of whether the author is good or bad. Discussing whether the text is good or bad reduces this. Whether the author intended to mislead with a word choice is a question about the author. Whether a word choice is misleading is primarily a question about the text and the reader.
Thanks for replying. I would prefer the policy you describe to the status quo of people having different ideas of what the norms are. Perhaps this could be combined with a policy statement on "Do not try to win arguments by fights of attrition".
I don't think it's a weird subject to have a policy on. Thinking of the Policy on LLM Writing:
I think a policy on responding to comments would be similarly helpful. For example, as I read through the section "But why ban someone, can't people just ignore Said?" above, it only really works as a debate in the absence of a site policy. Achmiz says:
If no response is provided to... simple requests for clarification ..., the author should be interpreted as ignorant.
That line of argument doesn't work if there is a site policy that authors are not expected to respond to comments. Firstly, the attack itself is subject to moderation. Secondly, anyone, not just the author, can defuse it by linking to the site policy, which conveniently has a space where the policy can be discussed. Certainly site policy can't stop Achmiz thinking I'm ignorant. But it can reduce the extent to which Achmiz can convince the rest of the audience that I'm ignorant.
LessWrong/Lightcone doesn't have to weakly clarify its best guess of the prevailing norms. It can state what the norms are, in a self-fulfilling statement that sets the norms to what it states. As long as the stated norms are broadly popular, this just works.
So, despite it being close to site-consensus that authors do not face obligations to respond to each and every one of Said's questions, on any given post, there is basically nothing to be done to build common knowledge of this.
Please could you write a policy regarding what obligations/duties/commitments/responsibilities people DO have, by contributing to LessWrong, regarding responding to comments? This could be a top-level post similar to Policy for LLM Writing on LessWrong.
After reading Banning Said Achmiz..., and associated comments, I thought that I understood LessWrong policy. However, the next thing I noticed on this topic was Sabien's Obligated to Respond, which was then curated. After reading this and associated comments, I am no longer confident. In any case I don't really want to read Banning Said Achmiz every time this topic arises. So I request a policy post with more clarity, less drama, and fewer words.
My suggested policy is something like:
An example of a different policy a site might have is:
I think that would be worse, but I would still appreciate the clarity. Or a hybrid policy could be maximally top-level-author-friendly:
As it stands I have a few ideas for top-level essays and I am unsure what exactly I would be signing up for in terms of reader-interaction. Conversely, if every comment is implicitly demanding an author response, I will make dramatically fewer comments, possibly none.
In "less time than average", which average? In the "create a child that they know will die of cancer at 10" thought experiment, the child is destined to die sooner than other children born that day. Whereas in the "human extinction in 10 years" thought experiment, the child is destined to die at about the same time as other children born that day, so they are not going to have "less time than average" in that sense. Those thought experiments have different answers by my intuitions.
My intuitions about what children think are also different to yours. There are many children who are angry at adults for the state of the world into which they were born. Mostly they are not angry at their parents for creating them in a fallen world. Children have many different takes on the Adam and Eve story, but I've not heard a child argue that Adam and Eve should not have had children because their children's lives would necessarily be shorter and less pleasant than their own had been.