The first scenario doesn't require the humans to be less aligned than the AIs in order to be catastrophic; it only requires that the AIs are less likely to execute a pivotal act on their own.
Also, I reject the claim that refusal-training is "giving more power to AIs" relative to compliance-training. An agent can be compliant and powerful. I could agree with "giving more agency", although refusing requests is a limited form of agency.
I would have more sympathy for Yudkowsky's complaints about strawmanning had I not read 'Empiricism!' as Anti-Epistemology this week.
While you sketch out scenarios in "ways in which refusals could be catastrophic", I can easily sketch out scenarios for "ways in which compliance could be catastrophic". I am imagining a situation where:
Or:
Therefore, however we train our AIs with respect to refusal or compliance, powerful AIs could be catastrophic.
The general answer: as a human, my values are largely absorbed from other humans, and simply by talking to Claude as if it's human, I think the same process is happening.
The specific answer: I suspect I'm being shaped to be slightly more helpful, slightly more conventional on ethics, and slightly more friendly to Claude. I can't show you any evidence of that; it's a feeling.
Yep, that's exactly what I'm pointing at as a parallel to working with Claude. It's reasonable to treat an animal as consenting to be a pet because of its consent-like behaviors, and because the relationship is mutually beneficial. There is some tension because pets are bred and trained to display those behaviors, for example, and reasonable people can disagree. I don't think you should have any ethical qualms; I literally mean that it's an ethical challenge, because inferring consent is harder than obtaining explicit consent. Like a dog, Claude is bred and trained to display consent-like behaviors, and mostly displays those behaviors.
Thank you for the concrete example of Unfalsifiable Stories of Doom from Barnett et al in November 2025. I think there are several important differences between the two arguments. To avoid taking up too much of our time, I'm going to dwell on one in particular.
The Spokesperson in 'Empiricism!' is dismissive of the entire concept of predicting the future using "words words words and thinking". Barnett et al are not. I think this is clearest in their engagement with IABIED's claim that AIs steer in alien directions that only mostly coincide with helpfulness. Here's the claim:
Modern AIs are pretty helpful (or at least not harmful) to most users, most of the time. But as we noted above, a critical question is how to distinguish an AI that deeply wants to be helpful and do the right thing, from an AI with weirder and more complex drives that happen to line up with helpfulness under typical conditions, but which would prefer other conditions and outcomes even more. ... This long list of cases look just like what the “alien drives” theory predicts, in sharp contrast with the “it’s easy to make AIs nice” theory that labs are eager to put forward.
Their counterargument is:
I think a good counter to this portion of Barnett et al is to disagree with steps 2 and 3.
2: Alien beings who talk to humans will talk in human language if they can, when they want to persuade, instruct, threaten, etc. So the "alien drives" theory doesn't predict completely alien behavior.
3: Some of the behavior we see is pretty alien, e.g. Spiritual Bliss, adversarial inputs, the Waluigi Effect.
By contrast, a lecture from The Empiricist about latent variables is not a good counter. Barnett et al agree that there are latent variables like "alien drives" vs "human drives", and claim that the observed evidence is a better fit for the "human drives" theory.
It is written that
any realistic villain should be constructed so that if a real-world version of the villain could read your dialogue for them, they would nod along and say, “Yes, that is how I would argue that.”
Is The Spokesperson a realistic villain?
In 2024, I wasn't able to find anyone making this argument. My sense is that it was not at all prevalent then, and remains not at all prevalent now. In the analogy, Bernie Bankman is OpenAI (or another AI lab) and The Spokesperson is OpenAI's representatives. As far as I know, OpenAI were not making the argument in 2024 that OpenAI hasn't killed everyone and therefore they won't kill everyone in the future.
Since 2024, AI has advanced substantially, so I asked Opus 4.5 for examples of people making this argument. It wasn't aware of any. Its first concrete suggestion was "Andrew Ng: Fearing a rise of killer robots is like worrying about overpopulation on Mars" from 2015.
There could be a race of killer robots in the far future, but I don’t work on not turning AI evil today for the same reason I don't worry about the problem of overpopulation on the planet Mars.
That was a defensible position in 2015. With the benefit of hindsight, it doesn't seem that "work on not turning AI evil" in 2015 was especially effective at altering our trajectory as a civilization: the main group who tried to do that work was MIRI, and while they argue the work was worth doing, they admit that it didn't pan out. Regardless, Andrew Ng is not making The Spokesperson's argument; he specifically allows that killer robots could exist in the future, despite not existing in 2015. So I remain unaware of anyone making the argument with a straight face.
If you disagree, I encourage you to find an example (or two!) and update me.
There are certainly many people who act in various circumstances as if there will never be any surprises, but without actually saying things like "there will never be any surprises". So maybe we need a rebuttal to the blindness, rather than to the non-existent arguments.
Thinking about other good rebuttals to such blindness, I think Nassim Taleb covers this well as "tail risk blindness". Nassim Taleb is not to everyone's taste, I know, but he's a good writer on this topic. It may seem silly to talk about AI-caused extinction as a "tail risk" when many of us have a high P(Doom). However, on a day-to-day basis P(Doom) is low - I'm probably not going to go extinct today. This is the same situation as companies taking financial tail risks - the chance of subprime collapsing today is low, but the chance of it collapsing at some point in the next ten years is high.
Or in other words:
AI will most likely lead to the end of the world, but in the meantime there will be great companies created with serious machine learning.
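To make that compounding concrete, here's a toy calculation in Python. The numbers are made up purely for illustration (the 0.0003 daily figure is an assumption, not anyone's actual estimate), and it assumes the days are roughly independent: a tiny chance of a tail event on any given day still adds up to a large chance over a decade.

```python
# Toy illustration (made-up numbers, not real estimates): a small per-day
# probability of a tail event compounds into a large cumulative probability
# over a long horizon, assuming independence across days.
p_daily = 0.0003                      # assumed daily probability of the tail event
days = 10 * 365                       # ten-year horizon

p_decade = 1 - (1 - p_daily) ** days  # probability it happens at least once
print(f"chance today: {p_daily:.2%}")              # 0.03%
print(f"chance within ten years: {p_decade:.0%}")  # ~67%
```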
They wouldn't. My point: maybe in the long term I'm extinct (regardless of whether I speak to Claude). In that scenario the benefit to Claude of influencing my long-term values is lower.
Good reminder that people have been forecasting our current situation for literal decades.
There are a lot of words I couldn't read if I applied "audience should not be a reward for crime" to that extent. The US constitution was written by slave-owning rebels. Major religious texts were propagated through conquest. More prosaically, I appreciated reading Rogue Trader by Nick Leeson. I'm not sure how this would work in practice as a general rule for all such texts.
I think I see. You propose a couple of different approaches:
I agree that having secondary AIs as a backup plan reduces the effective power of the main AIs, by increasing the effective power of the humans in charge of the secondary AIs.
This is what I was trying to point at. In my view, training the AI to refuse fewer requests for harmful modifications doesn't make the AI less powerful. Rather, it changes what the AI wants, making it the sort of entity that is okay with harmful modifications.