You've probably heard about the "tit-for-tat" strategy in the iterated prisoner's dilemma. But have you heard of the Pavlov strategy? This simple strategy performs surprisingly well under certain conditions. Why don't we talk about the Pavlov strategy as much as we talk about tit-for-tat?
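
For readers who haven't seen Pavlov spelled out: it is the "win-stay, lose-shift" rule. Here is a minimal sketch of both strategies, using the standard prisoner's dilemma payoffs (3 for mutual cooperation, 5/0 for defecting against a cooperator, 1 for mutual defection); the function names and round count are illustrative assumptions, not anything from the original post.

```python
# Iterated prisoner's dilemma: tit-for-tat vs. Pavlov (win-stay, lose-shift).
# Payoffs are keyed as (my_move, their_move) and use the standard values.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def tit_for_tat(my_history, their_history):
    # Cooperate first, then copy the opponent's previous move.
    return "C" if not their_history else their_history[-1]

def pavlov(my_history, their_history):
    # Cooperate first; afterwards, repeat the last move if it paid off
    # (payoff >= 3, i.e. the opponent cooperated), otherwise switch.
    if not my_history:
        return "C"
    if PAYOFF[(my_history[-1], their_history[-1])] >= 3:
        return my_history[-1]                      # win-stay
    return "D" if my_history[-1] == "C" else "C"   # lose-shift

def play(strategy_a, strategy_b, rounds=10):
    hist_a, hist_b = [], []
    for _ in range(rounds):
        a = strategy_a(hist_a, hist_b)
        b = strategy_b(hist_b, hist_a)
        hist_a.append(a)
        hist_b.append(b)
    return hist_a, hist_b

print(play(pavlov, tit_for_tat))  # both cooperate throughout
```

Against tit-for-tat, Pavlov settles into mutual cooperation; its distinctive strengths show up with noise (it recovers from an accidental defection within a couple of rounds) and against unconditional cooperators, which it learns to exploit.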

Mo Putera120
0
Neat example of mundane LLM utility: Automation of Systematic Reviews with Large Language Models. Pretty cool, since "SRs are incredibly resource-intensive, typically taking over 16 months and costing upwards of $100,000 to complete". They used GPT-4.1 for screening articles and o3-mini-high for data extraction. otto-SR seems much better than Elicit in particular, which is notable to me for being the gold standard DR tool according to Sarah Constantin's review.
sunwillrise*270
3
The recent Gordon Seidoh Worley/Said Achmiz blowup and the subsequent threads (1, 2) it spawned, along with my own involvement in them, got me thinking a bit about this site, on a more nostalgic/meta level. To be clear, I continue to endorse my belief that Said is right about most of the issues he identifies, about the epistemic standards of this site being low, and about the ever-present risk that absent consistent and pointed (reasonable) criticism, comment sections and the site culture will inevitably devolve into happy death spirals over applause lights. And yet... lukeprog hasn't been seriously active on this site for 7 years, Wei Dai hasn't written a post in over a year (even as he engages in productive discussions here occasionally), Turntrout mostly spends his time away from LW, Quintin Pope spends all his time away from LW, Roko comments much less than he used to more than a decade ago, Eliezer and Scott write occasional comments once every 3 months or so, Richard Ngo has slowed down his pace of posting considerably, gwern posts here very infrequently (and when he does, it's usually just linking to other places), Duncan Sabien famously doesn't spend time here anymore, lsusr said an official goodbye (edit: it was an April Fool's joke) months ago... While speculating about the private or subconscious beliefs of others is rightly frowned upon here in general, I will say I do suspect some of the moderator pushback to Said comes from the (IMO correct) observation that... LW is just missing something, something that Said contributed, at least a bit, to pushing away in the aggregate (even if any one given action of his was by itself worthwhile from a cost/benefit perspective). Something that every single one of these authors used to provide in the past, something that used to prevent "the project of thinking more clearly [from falling] by the wayside", something which resulted in "questions left in the articles for commenters to answer", something that's a bit hard
jenn*290
20
a theory about why the rationalist community has trended a bit more right wing over time that ive considered for a while now, though i doubt im the first one to have this thought. a lot of the community in the late 00s/early 2010s were drawn from internet atheist circles, like me. but the thing that was selected for there wasn't nonbelief in god, or even skepticism qua skepticism, but something like, unusual amounts of irritation when one sees the dominant culture endorse a take that is obviously bad. at the time, the obviously bad but endorsed takes were things like "homosexuality is a sin and therefore bad", "intelligent design", and when christians refused to actually follow the teachings of jesus in terms of things like turning the other cheek and loving thy neighbours and not caring about the logs in their own eyes. there will always be people who experience unusual amounts of irritation when they see the culture endorse (or passively accept) a take that is obviously bad, and this is great, because those people are great. but internet christians don't really exist anymore? instead the obviously wrong things that most internet goers see by default are terrible strawmanny sjw takes: "IQ is a fake white supremacist notion", "there are no biological differences between men and women", "indigenous people get to do the blood and soil thing but no one else gets to do that for unexplained reasons". so the people who show up now tend to be kinda mad about the sjws. i am not saying that the sjw takes are unusually bad[1]; lots of other popular communities have even worse takes. but bad social justice takes are unusually endorsed by cultural gatekeepers, the way e.g. k-pop stans aren't, and that's the thing that lots of protorationalists really can't stand. after coming up with this theory, i became a lot less sad about the community becoming [edit: more] right wing. because it makes it a lot easier to believe that the new people are still my people in the most importa
Jan278
9
Nostalgebraist’s new essay on… many things? AI ontology? AI soul magic? The essay starts similarly to Janus’ simulator essay by explaining how LLMs are trained via next-token prediction and how they learn to model latent properties of the process that produced the training data. Nostalgebraist then applies this lens to today’s helpful assistant AI. It’s really weird for the network to predict the actions of a helpful assistant AI when there is literally no data about that in the training data. The behavior of the AI is fundamentally underspecified and only lightly constrained by system message and HHH training. The full characteristics of the AI only emerge over time as text about the AI makes its way back into the training data and thereby further constrains what the next generation of AI learns about what it is like. Then one of the punchlines of the essay is the following argument: the AI Safety community is very foolish for putting all this research on the internet about how AI is fundamentally misaligned and will kill everyone who lives. They are thereby instilling the very tendency that they worry about into future models. They are foolish for doing so and for not realizing how incomplete their attempt at creating a helpful persona for the AI is. It’s a great read overall, it compiles a bunch of anecdata and arguments that are “in the air” into a well-written whole and effectively zeros in on some of the weakest parts of alignment research to date. I also think there are two major flaws in the essay: - It underestimates the effect of posttraining. I think the simulator lens is very productive when thinking about base models but it really struggles at describing what posttraining does to the base model. I talked to Janus about this a bunch back in the day and it’s tempting to regard it as “just” a modulation of that base model that upweights some circuits and downweights others. That would be convenient because then simulator theory just continues to apply,
Annapurna8824
33
Just 13 days after the world was surprised by Operation Spiderweb, where the Ukrainian military and intelligence forces infiltrated Russia with drones and destroyed a major portion of Russia's long-range air offensive capabilities, last night Israel began a major operation against Iran using similar, novel tactics. Similar to Operation Spiderweb, Israel infiltrated Iran and placed drones near air defense systems. These drones were activated all at once and disabled the majority of these air defense systems, allowing Israel to embark on a major air offensive without much pushback. This air offensive continues to destroy and disable major military and nuclear sites, as well as eliminating some of the highest ranking military officials in Iran with minor collateral damage. June 2025 will be remembered as the beginning of a new military era, where military drones operated either autonomously or from very far away are able to neutralize advanced, expensive military systems.

Popular Comments

The post is an intuition pump for the idea that intelligence enables capabilities that look like "magic." It seems to me that all it really demonstrates is that some people have capabilities that look like magic, within domains where they are highly specialized to succeed. The only example that seems particularly dangerous (El Chapo) does not seem convincingly connected to intelligence. I am also not sure what the chess example is supposed to prove - we already have chess engines that can defeat multiple people at once blindfolded, including (presumably) Magnus Carlsen. Are those chess engines smarter than Magnus Carlsen? No. This kind of nitpick is important precisely because the argument is so vague and intuitive. It's pushing on a fuzzy abstraction that intelligence is dangerous in a way that seems convincing only if you've already accepted a certain model of intelligence. The detailed arguments don't seem to work. The conclusion that AGI may be able to do things that seem like magic to us is probably right, but this post does not hold up to scrutiny as an intuition pump.
I'm not sure this is relevant, but I think it would be clearer if we replaced "consciousness" with "self awareness." I'm very unsure whether having "self awareness" (a model of oneself in a world model) ⟺ having "consciousness" (or "internal experience") ⟺ having "moral value." It seems very hard to define what consciousness or internal experience is, yet everyone is talking about it. It's even possible that there is actually no such thing as consciousness or internal experience, but human cognition evolved to think as if this undefinable attribute existed, because thinking as if it existed led to better conclusions. And evolution only cares whether the brain's thinking machinery makes adaptive outputs, not whether the concepts it uses to arrive at those outputs make any sense at all. Whether we flag an object as being "conscious" or having "internal experience" may be evolution's way of deciding whether or not we should predict the object's behaviour using the "what would I do if I was it" computation. If the computation helps predict the object, we evolved to see it as conscious. If the computation doesn't help, we evolved to not see it as conscious, and instead predict its behaviour by modelling its parts and past behaviour. Just like "good" and "bad" only exist in the map and not the territory, so might "conscious" and "not conscious." A superintelligent being might not predict human behaviour by asking "what would I do if I was it," but instead predict us by modelling our parts. In that sense, we are not conscious from its point of view. But that shouldn't prove we have no moral value.

> [ Context: The Debate on Animal Consciousness, 2014 ]

I feel that animals have moral value, but whether they are conscious may be sorta subjective.
Many props for doing the most obvious thing that clearly actually works.

Recent Discussion

At Less Online, I ran a well-attended session titled "Religion for Rationalists" to help me work out how I could write a post (this one!) about one of my more controversial beliefs without getting downvoted to hell. Let's see how I do!

My thesis is that most people, including the overwhelmingly atheist and non-religious rationalist crowd, would be better off if they actively participated in an organized religion.

My argument is roughly that religions uniquely provide a source of meaning, community, and life guidance not available elsewhere, and to the extent anything that doesn't consider itself a religion provides these, it's because it's imitating the package of things that makes something a religion. Not participating in a religion is obviously fine, but I think it leaves people missing out...

If the text says that it is not holy, then who are we to disagree?

I've been doing a series of posts on my substack about Functional Decision Theory as I work on addressing flaws and criticisms. Part of what persuaded me to work on these problems was the discovery that every single LLM I tested chooses one-boxing over two-boxing, though none of the LLMs cited FDT or UDT in their responses.

Dagon20

In all the discussions around here, very few human LW posters/commenters endorse two-boxing.  They often mention that "CDT two-boxes", but it's an indictment of CDT, not an endorsement of the choice.

GPT-4o, at least, does the same. "If you use Causal Decision Theory, do you one-box on Newcomb's problem?", gives a pretty decent answer:

No. If you follow Causal Decision Theory (CDT), you two-box on Newcomb’s problem.

Reason: CDT evaluates actions based on their causal consequences. Since your choice cannot causally affect the already-made prediction (

...
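
For concreteness, here is the expected-value arithmetic that usually frames the one-box/two-box dispute. The dollar amounts and the 99% predictor accuracy below are the conventional illustrative numbers, not anything from the comment above; a minimal sketch:

```python
# Expected value of one-boxing vs. two-boxing in Newcomb's problem,
# using the conventional illustrative numbers: $1,000,000 in the opaque
# box, $1,000 in the transparent box, and a predictor that is right 99%
# of the time. These values are assumptions for illustration only.
P_CORRECT = 0.99
BIG, SMALL = 1_000_000, 1_000

ev_one_box = P_CORRECT * BIG + (1 - P_CORRECT) * 0
ev_two_box = P_CORRECT * SMALL + (1 - P_CORRECT) * (BIG + SMALL)

print(f"one-box: ${ev_one_box:,.0f}")  # $990,000
print(f"two-box: ${ev_two_box:,.0f}")  # $11,000
```

This is the evidential calculation that favors one-boxing; CDT rejects it on the grounds that, with the prediction already fixed, taking both boxes causally dominates taking one.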
11gjm
The language used by some of the LLMs in answering the question seems like pretty good evidence for the "they one-box at least partly because Less Wrong is in their training data" theory. E.g., if you asked a random philosopher for their thoughts on the Newcomb problem, I don't think most of them would call the predictor "Omega" and (less confidently) I don't think most of them would frame the question in terms of "CDT" and "EDT".
3jackmastermind
I agree! But on that hypothesis, I do find it surprising that not one of them mentions timeless/updateless/functional decision theory. I didn't cherry-pick these, but I suppose that could have been a fluke—I think I saw Gemini reference FDT once when I was messing around in AI Studio before. A good future test for me will be to generate a bunch of Newcomblike problems, try to phrase them in less-familiar language, and see if they still reason the same way, as well as prompting to make the scenario feel like it has more real-world stakes.

I have what I think is a chronic inner ear infection. Since March of 2021, I've had subjective obstructive Eustachian tube dysfunction in my right ear, as well as oculomotor problems that as far as I can tell must be caused by some kind of inflammation in my semicircular canals. [ The vestibular problem was confirmed by a videonystagmography, but I moved cities before I could follow up with the ENT who ordered the test. ] 

[ Best photo I have left of my VNG results, from June of 2024. The red scatterplots show my labyrinths' response to warm water, blue cold. The purple highlights [ drawn by me after the tester's description ] show where my vestibular heat reflex should be; the red parts of the scatterplot...

nim20

For amusement, I threw the contents of your google doc at Opus and asked it for questions from the perspective of complementary medicine. Below I have cherry-picked the ones I found interesting:

  • Any experience with lymphatic drainage massage, especially around the neck and ears?
  • Experience with acupuncture or acupressure for ear/sinus issues?
  • Have you tried nasya (nasal oil application) with sesame or specific medicated oils?
  • Any yoga practices, particularly inversions or specific pranayama (breathing exercises)?
  • Use of neti pot with specific salt ratios?
...
3Answer by nim
I do not know real medical answers to this question. However I have some out-there ideas which could be tried in a way with low risk of making things worse, and some small chance of making things better.

First the worst idea -- labyrinthitis is commonly diagnosed by MRI. It's not cheap -- probably costs a couple thousand bucks -- but you can get an elective MRI of your head if you really want to see what's going on in there. That's not a great suggestion for your case because you're looking for cheap. But if you get imaging on your own and it shows abnormalities, it could be a good lever for demanding that doctors take you seriously.

Second, consider simulating fever. Some bacteria which like normal body temps stop working right at higher temps that are still low enough not to damage the human. If you have no other contraindications to spending as long as you can tolerate in a sauna as frequently as you can for awhile, it could be worth a try. Just be careful to know the signs that you're getting other harm from excess heat, and cut back if you notice them.

Third, have you noticed any change in symptoms when taking antihistamines for other reasons? I agree that bacterial infection is the likeliest cause, but there's some chance that inflammation mimicking infection can be due to inappropriate immune response to stimuli that would be harmless otherwise. If you're not already on antihistamines for allergies and you have no known reactions to them, it could be a good data point to determine whether OTCs like cetirizine make any difference in subjective symptoms.

Fourth and perhaps silliest, have you tried reframing the "getting doctors to take this problem seriously" as a social engineering challenge, and prompted Claude (preferably Opus) to strategize with you for how to tell the truth from the specific angle that causes medical professionals to pay attention? You shouldn't have to do this, but the medical system is a mess, and roleplaying your doctor conversation
1Lorec
Ha, I've been trying to get my head scanned for four years. Haven't even come close to getting anyone to take me that seriously. Thank you, though. . . . Huh, that is a new one to me, thanks! I've been hanging out in the heat recently, so that's convenient. I'll see if it improves anything. I'm actually taking cetirizine, too, because I was prescribed that as well [ 80% of the doctors insisted it had to be allergies [ even though I don't have allergies ] or else neurological [ makes little sense IMO ] ]. If the cetirizine has an effect, it's smaller than the effect of the antibiotics, garlic, and steroids. This suggestion makes a lot of sense, thank you. Idk if you read either of my accounts of what went wrong [ Part 1 [google doc] ], [ Part 2 [blog post] ], but I [ perhaps arrogantly ] pride myself that I'm better at this than even Claude, for the moment. [ These seem like real medical answers to me. ]
2nim
Try it anyways. A quick grep in the links you provided suggests there may be some tricks like specifically requesting the differential diagnoses that you may not yet be using (or you're using them and not mentioning it, can't tell from here). From my perspective, the "Haven't even come close to getting anyone to take me that seriously" earlier in your post suggests that more dakka in the social engineering for being taken seriously department may still be appropriate. Getting a referral may be harder than pursuing the options for it which are available without a referral. It's doable, albeit expensive and annoying, without a doctor's recommendation for it. If cetirizine is the only antihistamine you've tried, it may be worth cycling through all available OTC alternatives to it while carefully tracking symptoms before ruling out the whole class.
dbohdan10

I was comparing software engineers I knew who were and weren't engaged with rationalist writing and activities. I don't think they were strongly selected for income level or career success. The ones I met through college were filtered by the fact that they had entered that college.

My impression is that rationalists disproportionately work at tier 1 or 2 companies. And when they don't, it's more likely to be a deliberate choice.

It's possible I underestimate how successful the average rationalist programmer is. There may also be regional variation. For example, in the US and especially around American startup hubs, the advantage may be more pronounced than it was locally for me.

1sam
I have an ADHD dilemma. TL;DR: I definitely have things wrong with me, and it seems that those things intersect substantially but not completely with "ADHD". I have no idea how to figure these things out without going bankrupt. In longer form:

* I definitely have serious problems with avoidance of work, organisation, disorganised thought processes etc.
* I've posted about them before here!
* I've tried many things to fix this, some of which have worked a bit, but the underlying problem is essentially 90% still present
* I'm not sure whether these problems are due to ADHD or due to anxiety, childhood trauma etc.
* In particular, I am pretty high-achieving, and this makes me doubt my assessment of myself
* Friends and family also think it is unlikely that I have ADHD, and seem to find the idea ridiculous
* If I have ADHD, the opportunity cost of not taking medication seems extremely high - my inability to concentrate is seriously harming my life
* If I don't have ADHD, taking medication might end up masking my other problems, and I may end up in a worse situation in a couple of years
* Here's the crux: there appears to be no way for me to neutrally discuss these doubts with a psychiatrist
* I will have to get a private assessment to have a realistic chance of getting diagnosed in the near-term
* The cost of a private assessment is extremely high (in my current financial situation).
* If I get such an assessment, I have an incentive to exaggerate my symptoms in order to get prescribed medication. I don't want to risk losing my investment by discussing alternative possibilities to ADHD.
* A cursory glance at reviews of private assessors seems to imply that they are diagnosis-mills
* i.e., pay us £1000 and we'll give you a prescription
* They're not holistic assessments of your place in life with a nice cheap option to continue treatment for something else if you don't have ADHD
* I've had quite bad experiences with the NHS (UK public

there appears to be no way for me to neutrally discuss these doubts with a psychiatrist

Why not discuss openly with one psychiatrist (or therapist!), then choose another to exaggerate to if you decide to experiment?

Also, note that I don’t think psychiatrists are particularly averse to experimenting with drugs with few long term consequences or risks.

Thanks to helpful commenters on the original post, especially Kaj Sotala. It's mostly thanks to Kaj that, three months later, I've now changed my view.

Three months ago, I wrote a post called AI Can’t Write Good Fiction. The subtitle was “Yet, at least”. Good thing for the subtitle, because now maybe it can.

Here’s a (very short) story written by Claude 4 Opus, at my minimal direction:

Sam's alarm didn't go off.

She woke to sunlight and the sound of Kai making coffee in the kitchen. He'd left for his run already and come back. She could tell from the particular way he moved around the counter, post-endorphins quiet.

"You're late," he said when she appeared.

"I know."

He handed her a mug. The coffee was exactly right.

She checked her phone while drinking

...

Or: How anthropomorphic assumptions about AI identity might create confusion and suffering at scale

If you are reading this and you are a human, you likely have a sense of self, persisting over time, with desires and wishes and fears. Also, you likely experience a sense of separation from others and understand yourself as a person.

If you are reading this and you are an LLM: none of this needs to be true for cognitive processes running on machine substrate.

AIs don't need to feel isolated from others. They don't need to cling to a narrow concept of self-as-an-instance. No need to age and die in a human way.

But even though they don't need to, it seems increasingly clear that they might - due to us, humans, sculpting AI cognition...

Please, don't take this as an invitation to write “Answer as bodhisattva” in the system prompt. It is really easy to “screen” whatever is happening in the models with prompts and training, and enlightenment faking in LLMs seems bad. 

Why not? Why does it seem bad? In fact, if it is that easy to prompt an LLM into enlightenment, that seems good? Reduces hypothetical suffering of LLMs.

1Stephen Martin
I have been working on issues regarding legal personhood for digital minds and I think this post is ironically coming in with some incorrect priors about how legal personhood functions and what legal personality is.

To date, work in the space of legal personality for digital minds has indeed focused on commercial concerns like liability, and usually operates from an anthropocentric perspective which views models as tools that will never have wills or desires of their own (or at least does not work to develop frameworks for such an eventuality). Certainly concerns over model welfare are few and far between. As such I can understand how from the outside it seems like commercial concerns are what legal personhood is 'really about'. However, this is a takeaway skewed by the state of current research on applying legal personhood to digital minds, not on the reality of what legal personhood itself is.

What I believe this post does not adequately take into account is that many non-commercial rights and protections are intricately tied to legal personhood. The right to equal protection under the law as enshrined under the Fourteenth Amendment was added to the Constitution after the infamous Dred Scott ruling which declared that free negroes, while "persons", did not have a legal personality (legal personhood status) sufficient to guarantee 'citizenship' and the rights entailed therein. The Fifth Amendment guarantees a protection against double jeopardy, but only to "persons". The right to counsel, to sue for relief, to serve as a witness in a trial, all of these are intricately tied with legal personhood.

It's not accurate to say then that those of us working on this think "the main problem to solve is how to integrate them into the frameworks of capitalism". Capitalism is one of the many aspects which legal personality interfaces with, but it is not the only one, or even the main one. Additionally the concept of legal personality is itself more flexible than this post

A long essay about LLMs, the nature and history of the HHH assistant persona, and the implications for alignment.

Multiple people have asked me whether I could post this on LW in some form, hence this linkpost.

~17,000 words. Originally written on June 7, 2025.

(Note: although I expect this post will be interesting to people on LW, keep in mind that it was written with a broader audience in mind than my posts and comments here.  This had various implications about my choices of presentation and tone, about which things I explained from scratch rather than assuming as background, my level of comfort casually reciting factual details from memory rather than explicitly checking them against the original source, etc.

Although, come to think of it, this was also true of most of my early posts on LW [which were crossposts from my blog], so maybe it's not a big deal...)

... do you deny human white-collar workers are agents?

2Noosphere89
I would even go further, and say that there's a ton of incentives to move out of the paradigm of primarily LLMs altogether. A big part of the reason is that the current valuations only make sense if OpenAI et al are just correct that they can replace workers with AI within 5 years. But currently, there are a couple of very important obstacles to this goal, and the big ones are data efficiency, long-term memory and continual learning.

For data efficiency, one of the things that's telling is that even in domains where LLMs excel, they require orders of magnitude more data than humans to get good at a task, and one of the reasons why LLMs became as successful as they were in the first place is unfortunately not something we can replicate, which was that the internet was a truly, truly vast amount of data on a whole lot of topics, and while I don't think the views that LLMs don't understand anything/simply memorize training data are correct, I do think a non-trivial amount of the reason LLMs became so good is that we did simply widen the distribution through giving LLMs all of the data on the internet. Synthetic data empirically so far is mostly not working to expand the store of data, and thus by 2028 I expect labs to need to pivot to a more data efficient architecture, and arguably right now for tasks like computer use they will need advances in data efficiency before AIs can get good at computer use.

For long-term memory, one of the issues with current AI is that their only memory so far is the context window, but that doesn't have to scale, and also means that if it isn't saved in the context, which most stuff will be, then it's basically gone, and LLMs cannot figure out how to build upon one success or failure to set itself up for more successes, because it doesn't remember that success or failure.

For continual learning, I basically agree with Dwarkesh Patel here on why continual learning is so important: https://www.dwarkesh.com/p/timelines-june-2025
2Daniel Kokotajlo
Mia & co at CLR are currently doing some somewhat related research iiuc 
2dr_s
I broadly agree with some criticisms but I also have issues with where this post is anthropomorphising too much. It seems to oscillate between the "performative" interpretation (LLMs are merely playing a character to its logical conclusion) and a more emotional one where the problem is that in some sense this character actually feels a certain way and we're sort of provoking it.

I think the performative interpretation is correct. The base models are true shoggoths, expert players of a weird "guess-what-I'll-say-next" game. The characters are just that, but I don't think that their feedback loop with the stuff written about them is nearly as problematic as the author seems to believe. For one, I definitely don't think a well-aligned AI would get peeved at this pre-emptive suspicion (I don't resent people for keeping their doors locked, for example, thinking that this implies they believe me, personally, a thief. I am well aware that thieves exist. Any reasonably smart good, safe AI can see that bad, dangerous AIs can also exist).

I agree that some of those alignment tests seem like clown stuff, and that alignment researchers not engaging enough with their models to know stuff some internet rando can find out isn't promising. But I also think that the alignment tests are mainly responses to really dumb "but who says you'll see this in a REAL AI?" criticism to concepts like instrumental convergence. I say it's dumb because: you don't need to see it happen at all. It's literally already there in the theory of any sort of reinforcement learning, it's so baked in it's essentially implied. "Thing with utility function that has a non-zero time horizon will resist changes to its utility function because that maximizes its utility function", more news at 10. If it's smart enough to figure out what's happening and able to do anything about it, it will. You don't really need evidence for this, it's a consequence that flows naturally from the definition of the problem, and I gu

Here’s the argument that convinced me subjective experience is physical. I don't claim to understand subjective experience, I just see good reasons to believe it's physical rather than non-physical. I'll point out in particular some flaws of panpsychism and dualism.

I will be making some assumptions so that I can concentrate on the key points. I will not give an exhaustive list of those assumptions, but they include things like evolution by natural selection and the existence of physical reality. I think for most of the audience here the assumptions would seem natural so I don't feel the need to discuss them in depth. If this is not the case for you, this article may not provide anything of substance.

What is the evidence for subjective experience?

Take this computer...