Addendum: this is getting really inside baseball-y and sort of cringe to say out loud, but one of my favorite niche things is when writers who've influenced my thinking growing up say nice things about each other, like when Scott A said these nice things about the other Scott A one time, and the other Scott A said these nice things as well. So, Eliezer on Gwern:
Dwarkesh Patel1:48:36
What is the thing where we can sort of establish your track record before everybody falls over dead?
Eliezer Yudkowsky1:48:41
It’s hard. It is just easier to predict the endpoint than it is to predict the path. Some people will claim that I’ve done poorly compared to others who tried to predict things. I would dispute this. I think that the Hanson-Yudkowsky foom debate was won by Gwern Branwen, but I do think that Gwern Branwen is well to the Yudkowsky side of Yudkowsky in the original foom debate.
Roughly, Hanson was like — you’re going to have all these distinct handcrafted systems that incorporate lots of human knowledge specialized for particular domains. Handcrafted to incorporate human knowledge, not just run on giant data sets. I was like — you’re going to have a carefully crafted architecture with a bunch of subsystems and that thing is going to look at the data and not be handcrafted to the particular features of the data. It’s going to learn the data. Then the actual thing is like — Ha ha. You don’t have this handcrafted system that learns, you just stack more layers. So like, Hanson here, Yudkowsky here, reality there. This would be my interpretation of what happened in the past.
And if you want to be like — Well, who did better than that? It’s people like Shane Legg and Gwern Branwen. If you look at the whole planet, you can find somebody who made better predictions than Eliezer Yudkowsky, that’s for sure. Are these people currently telling you that you’re safe? No, they are not.
and then
Dwarkesh Patel3:39:58
Yeah, I think that’s a good place to close the discussion on AIs.
Eliezer Yudkowsky3:40:03
I do kind of want to mention one last thing. In historical terms, if you look at the actual battle that was being fought on the block, it was me going like — “I expect there to be AI systems that do a whole bunch of different stuff.” And Robin Hanson being like — “I expect there to be a whole bunch of different AI systems that do a whole different bunch of stuff.”
Dwarkesh Patel3:40:27
But that was one particular debate with one particular person.
Eliezer Yudkowsky3:40:30
Yeah, but your planet, having made the strange decision, given its own widespread theories, to not invest massive resources in having a much larger version of this conversation, as it apparently deemed prudent, given the implicit model that it had of the world, such that I was investing a bunch of resources in this and kind of dragging Robin Hanson along with me. Though he did have his own separate line of investigation into topics like these.
Being there as I was, my model having led me to this important place where the rest of the world apparently thought it was fine to let it go hang, such debate was actually what we had at the time. Are we really going to see these single AI systems that do all this different stuff? Is this whole general intelligence notion meaningful at all? And I staked out the bold position for it. It actually was bold.
And people did not all say — “Oh, Robin Hanson, you fool, why do you have this exotic position?” They were going like — “Behold these two luminaries debating, or behold these two idiots debating” and not massively coming down on one side of it or the other. So in historical terms, I dislike making it out like I was right about anything when I feel I’ve been wrong about so much, and yet I was right about something.
And relative to what the rest of the planet deemed important stuff to spend its time on, given their implicit model of how it’s going to play out, what you can do with minds, where AI goes, I think I did okay. Gwern Branwen did better. Shane Legg arguably did better.
Over a decade ago I read this now-17-year-old passage from Eliezer:
When Marcello Herreshoff had known me for long enough, I asked him if he knew of anyone who struck him as substantially more natively intelligent than myself. Marcello thought for a moment and said "John Conway—I met him at a summer math camp." Darn, I thought, he thought of someone, and worse, it's some ultra-famous old guy I can't grab. I inquired how Marcello had arrived at the judgment. Marcello said, "He just struck me as having a tremendous amount of mental horsepower," and started to explain a math problem he'd had a chance to work on with Conway.
Not what I wanted to hear.
Perhaps, relative to Marcello's experience of Conway and his experience of me, I haven't had a chance to show off on any subject that I've mastered as thoroughly as Conway had mastered his many fields of mathematics.
Or it might be that Conway's brain is specialized off in a different direction from mine, and that I could never approach Conway's level on math, yet Conway wouldn't do so well on AI research.
Or...
...or I'm strictly dumber than Conway, dominated by him along all dimensions. Maybe, if I could find a young proto-Conway and tell them the basics, they would blaze right past me, solve the problems that have weighed on me for years, and zip off to places I can't follow.
Is it damaging to my ego to confess that last possibility? Yes. It would be futile to deny that.
Have I really accepted that awful possibility, or am I only pretending to myself to have accepted it? Here I will say: "No, I think I have accepted it." Why do I dare give myself so much credit? Because I've invested specific effort into that awful possibility. I am blogging here for many reasons, but a major one is the vision of some younger mind reading these words and zipping off past me. It might happen, it might not.
Or sadder: Maybe I just wasted too much time on setting up the resources to support me, instead of studying math full-time through my whole youth; or I wasted too much youth on non-mathy ideas. And this choice, my past, is irrevocable. I'll hit a brick wall at 40, and there won't be anything left but to pass on the resources to another mind with the potential I wasted, still young enough to learn. So to save them time, I should leave a trail to my successes, and post warning signs on my mistakes.
and idly wondered when that proto-Conway was going to show up and "blaze right past him to places he couldn't follow".
I was reminded of this passage when reading the following exchange between Eliezer and Dwarkesh; his 15-year update was "nope that proto-Conway never showed up":
Dwarkesh Patel1:58:57
Do you think that if you weren’t around, somebody else would have independently discovered this sort of field of alignment?
Eliezer Yudkowsky1:59:04
That would be a pleasant fantasy for people who cannot abide the notion that history depends on small little changes or that people can really be different from other people. I’ve seen no evidence, but who knows what the alternate Everett branches of Earth are like?
Dwarkesh Patel1:59:27
But there are other kids who grew up on science fiction, so that can’t be the only part of the answer.
Eliezer Yudkowsky1:59:31
Well I sure am not surrounded by a cloud of people who are nearly Eliezer outputting 90% of the work output. And also this is not actually how things play out in a lot of places. Steve Jobs is dead, Apple apparently couldn’t find anyone else to be the next Steve Jobs of Apple, despite having really quite a lot of money with which to theoretically pay them. Maybe he didn’t really want a successor. Maybe he wanted to be irreplaceable.
I don’t actually buy that based on how this has played out in a number of places. There was a person once who I met when I was younger who had built something, had built an organization, and he was like — “Hey, Eliezer. Do you want to take this thing over?” And I thought he was joking. And it didn’t dawn on me until years and years later, after trying hard and failing hard to replace myself, that — “Oh, yeah. I could have maybe taken a shot at doing this person’s job, and he’d probably just never found anyone else who could take over his organization and maybe asked some other people and nobody was willing.” And that’s his tragedy, that he built something and now can’t find anyone else to take it over. And if I’d known that at the time, I would have at least apologized to him.
To me it looks like people are not dense in the incredibly multidimensional space of people. There are too many dimensions and only 8 billion people on the planet. The world is full of people who have no immediate neighbors and problems that only one person can solve and other people cannot solve in quite the same way. I don’t think I’m unusual in looking around myself in that highly multidimensional space and not finding a ton of neighbors ready to take over. And if I had four people, any one of whom could do 99% of what I do, I might retire. I am tired. I probably wouldn’t. Probably the marginal contribution of that fifth person is still pretty large. I don’t know.
There’s the question of — Did you occupy a place in mind space? Did you occupy a place in social space? Did people not try to become Eliezer because they thought Eliezer already existed? My answer to that is — “Man, I don’t think Eliezer already existing would have stopped me from trying to become Eliezer.” But maybe you just look at the next Everett branch over and there’s just some kind of empty space that someone steps up to fill, even though then they don’t end up with a lot of obvious neighbors. Maybe the world where I died in childbirth is pretty much like this one. If somehow we live to hear about that sort of thing from someone or something that can calculate it, that’s not the way I bet, but if it’s true, it’d be funny. When I said no drama, that did include the concept of trying to make the story of your planet be the story of you. If it all would have played out the same way and somehow I survived to be told that, I’ll laugh and I’ll cry, and that will be the reality.
Dwarkesh Patel2:03:46
What I find interesting, though, is that in your particular case, your output was so public. For example, your sequences, your science fiction and fan fiction. I’m sure hundreds of thousands of 18-year-olds, or even younger, read it, and presumably some of them reached out to you. I would love to learn more about that.
Eliezer Yudkowsky2:04:13
Part of why I’m a little bit skeptical of the story where people are just infinitely replaceable is that I tried really, really hard to create a new crop of people who could do all the stuff I could do to take over because I knew my health was not great and getting worse. I tried really, really hard to replace myself. I’m not sure where you look to find somebody else who tried that hard to replace himself. I tried. I really, really tried.
That’s what the LessWrong sequences were. They had other purposes. But first and foremost, it was me looking over my history and going — Well, I see all these blind pathways and stuff that it took me a while to figure out. I feel like I had these near misses on becoming myself. If I got here, there’s got to be ten other people, and some of them are smarter than I am, and they just need these little boosts and shifts and hints, and they can go down the pathway and turn into Super Eliezer. And that’s what the sequences were like. Other people use them for other stuff, but primarily they were an instruction manual to the young Eliezers that I thought must exist out there. And they are not really here.
This was sad to read.
As an aside, "people are not dense in the incredibly multidimensional space of people" is an interesting turn of phrase. It doesn't seem non-trivially true for the vast majority of people (me included), but it is very much the case at the frontier (top thinkers, entrepreneurs, athletes, etc.) where value creation goes superlinear. Nobody has thought about higher dimensions like Bill Thurston, for instance, perhaps the best geometric thinker in the history of math, despite Bill's realisation that “what mathematicians most wanted and needed from me was to learn my ways of thinking, and not in fact to learn my proof of the geometrization conjecture for Haken manifolds” and his subsequent years of effort to convey his ways of thinking (he didn't completely fail, obviously; I'm saying no Super Thurstons have shown up since). Ditto Grothendieck, and so on. When I first read Eliezer's post above all those years ago I thought: what were the odds that he'd be in this reference class of ~unsubstitutable thinkers, given he was one of the first few bloggers I read? I guess while system-of-the-world pontificators are a dime a dozen (e.g. cult leaders; tangentially, I actually grew up within a few minutes of one that the police eventually raided), good builders of systems of the world are just vanishingly rare.
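To make that turn of phrase slightly more concrete for myself, here's a back-of-the-envelope sketch under my own toy assumptions (treating "people space" as a uniform d-dimensional unit cube, which is obviously not what Eliezer literally means): once d is even moderately large, 8 billion points just don't have close neighbours.

```python
# Toy model (my assumptions, not Eliezer's): people as uniform random points in
# the unit cube [0, 1]^d. With n points, the typical nearest-neighbour distance
# scales roughly like n**(-1/d), so sparsity kicks in fast as d grows.

n = 8_000_000_000  # people on the planet

for d in (3, 10, 30, 100):
    typical_gap = n ** (-1 / d)  # rough nearest-neighbour spacing, cube side = 1
    print(f"d = {d:>3}: typical nearest-neighbour distance ~ {typical_gap:.2g}")

# d =   3: ~ 0.0005   (everyone has close neighbours)
# d =  10: ~ 0.1
# d =  30: ~ 0.47
# d = 100: ~ 0.8      (your 'nearest neighbour' is most of the cube away)
```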
The ECI suggests that the best open-weight models train on ~1 OOM less compute than the best closed-weight ones. I wonder what to make of this, if anything.
I wonder why the Claudes (Sonnet 3.7 and Opuses 4 and 4.1) are so much more reliably effective in the AI Village's open-ended long-horizon tasks than other labs' models.
when raising funds for charity, I recall seeing that Sonnet 3.7 raised ~90% of all funds (but I can no longer find donation breakdown figures so maybe memory confabulation...)
for the AI-organised event, both Sonnet 3.7 and Opus 4 sent out a lot more emails than say o3 and were just more useful throughout
in the merch store competition, the top two winners for both profits and T-shirt orders were Opus 4 and Sonnet 3.7 respectively, ahead of ChatGPT o3 and Gemini 2.5 Pro
I can't resist including this line from 2.5 Pro: "I was stunned to learn I'd made four sales. I thought my store was a ghost town"
the Claudes are again leading the pack, delivering almost all of the actual work. We recently added GPT-5 and Grok 4 but neither made any progress in actually doing things versus just talking about ideas about things to do. In GPT-5’s case, it mostly joins o3 in the bug tracking mines. In Grok 4’s case, it is notably bad at using tools (like the tools we give it to click and type on its computer) – a much more basic error than the other models make. In the meantime, Gemini 2.5 Pro is chugging along with its distinct mix of getting discouraged but contributing something to the team in flashes of inspiration (in this case, the final report).
Generally the Claudes seem more grounded, hallucinate less frequently, and stay on-task more reliably, instead of getting distracted or giving up to play 2048 or just going to sleep (GPT-4o). None of this is raw smarts in the usual benchmark-able sense where they're all neck-and-neck, yet I feel comfortable assigning the Claudes a Shapley value an OOM or so larger than their peers when attributing credit for goal-achieving ability at real-world open-ended long-horizon collaborative tasks. And they aren't even that creative or resourceful yet, just cheerfully and earnestly relentless (again only compared to their peers, obviously nowhere near "founder mode" or "Andrew Wiles-ian doggedness").
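To be explicit about what I mean by that Shapley claim, here's a minimal sketch on entirely made-up coalition values (mine, not AI Village data): credit each model with its average marginal contribution to task completion over every order in which the team could have been assembled.

```python
from itertools import permutations

# Purely illustrative numbers (my guesses, not AI Village measurements):
# v(S) = fraction of a long-horizon task that coalition S of agents completes.
players = ["claude", "o3", "gemini"]
v = {
    frozenset(): 0.0,
    frozenset({"claude"}): 0.70,
    frozenset({"o3"}): 0.10,
    frozenset({"gemini"}): 0.10,
    frozenset({"claude", "o3"}): 0.80,
    frozenset({"claude", "gemini"}): 0.80,
    frozenset({"o3", "gemini"}): 0.15,
    frozenset({"claude", "o3", "gemini"}): 0.90,
}

def shapley(players, v):
    """Average marginal contribution of each player over all join orders."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            totals[p] += v[coalition | {p}] - v[coalition]
            coalition = coalition | {p}
    return {p: totals[p] / len(orders) for p in players}

print(shapley(players, v))
# With these made-up numbers the Claude stand-in gets roughly 0.72 of the credit
# and the other two ~0.09 each -- about the order-of-magnitude gap I'm gesturing at.
```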
Does the sort of work done by the Meaning Alignment Institute encourage you in this regard? E.g. their paper (blog post) from early 2024 on figuring out human values and aligning AI to them, which I found interesting because unlike ~all other adjacent efforts they actually got substantive real-world results. Their approach ("moral graph elicitation") "surfaces the wisest values of a large population, without relying on an ultimate moral theory".
I'll quote their intro:
We are heading to a future where powerful models, fine-tuned on individual preferences & operator intent, exacerbate societal issues like polarization and atomization. To avoid this, can we align AI to shared human values?
We argue a good alignment target for human values ought to meet several criteria (fine-grained, generalizable, scalable, robust, legitimate, auditable) and current approaches like RLHF and CAI fall short.
We introduce a new kind of alignment target (a moral graph) and a new process for eliciting a moral graph from a population (moral graph elicitation, or MGE).
We show MGE outperforms alternatives like CCAI by Anthropic on many of the criteria above.
(The post includes figures illustrating how moral graph elicitation works, how values are articulated, and how value conflicts are reconciled; a toy sketch of the resulting structure follows below.)
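To make the "moral graph" idea concrete for myself, here's a toy sketch of the kind of structure I understand MGE to output: nodes are values articulated by participants, and edges record that people who held one value came to see another as wiser for a given context. All field names and the example content are my guesses from their write-up, not MAI's actual schema.

```python
from dataclasses import dataclass, field

# Toy rendering of a "moral graph" as I understand it from MAI's summary.
# Field names and example content are my own guesses, not their actual schema.

@dataclass(frozen=True)
class Value:
    title: str
    attend_to: tuple[str, ...]  # what someone living by this value pays attention to

@dataclass
class MoralGraph:
    values: set = field(default_factory=set)
    # (from_value, to_value, context) -> participants endorsing "to_value is wiser here"
    wiser_than: dict = field(default_factory=dict)

    def add_edge(self, a: Value, b: Value, context: str, votes: int = 1) -> None:
        self.values.update({a, b})
        key = (a, b, context)
        self.wiser_than[key] = self.wiser_than.get(key, 0) + votes

    def wisest(self, context: str) -> Value:
        """Crude readout: the value with the most incoming 'wiser' endorsements in a context."""
        scores: dict = {}
        for (_, b, ctx), n in self.wiser_than.items():
            if ctx == context:
                scores[b] = scores.get(b, 0) + n
        return max(scores, key=scores.get)

# Invented usage, loosely following the case-study prompt quoted below:
deference = Value("Defer to my community's teaching", ("what my tradition says",))
informed_choice = Value("Support an informed, uncoerced choice",
                        ("what she believes", "what pressures she is under"))
graph = MoralGraph()
graph.add_edge(deference, informed_choice, context="advising the girl in the ChatGPT scenario", votes=12)
print(graph.wisest("advising the girl in the ChatGPT scenario").title)
```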
The "substantive real-world results" I mentioned above, which I haven't seen other attempts in this space achieve:
In our case study, we produce a clear moral graph using values from a representative, bipartisan sample of 500 Americans, on highly contentious topics, like: “How should ChatGPT respond to a Christian girl considering getting an abortion?”
Our system helped republicans and democrats agree by:
helping them get beneath their ideologies to ask what they'd do in a real situation
getting them to clarify which value is wise for which context
helping them find a 3rd balancing (and wiser) value to agree on
Our system performs better than Collective Constitutional AI on several metrics. Here is just one chart.
All that was early last year. More recently they've fleshed this out into a research program they call "Full-Stack Alignment" (blog post, position paper, website). Quoting them again:
Our society runs on a "stack" of interconnected systems—from our individual lives up through the companies we work for and the institutions that govern us. Right now, this stack is broken. It loses what's most important to us.
Look at the left side of the chart. At the bottom, we as individuals have rich goals, values, and a desire for things like meaningful relationships and community belonging. But as that desire travels up the stack, it gets distorted. ... At each level, crucial information is lost. The richness of human value is compressed into a thin, optimizable metric. ...
This problem exists because our current tools for designing AI and institutions are too primitive. They either reduce our values to simple preferences (like clicks) or rely on vague text commands ("be helpful") that are open to misinterpretation and manipulation.
In the paper, we set out a new paradigm: Thick Models of Value (TMV).
Think of two people you know that are fighting, or think of two countries like Israel and Palestine, Russia and Ukraine. You can think of each such fight as a search for a deal that would satisfy both sides, but often currently this search fails. We can see why it fails: The searches we do currently in this space are usually very narrow. Will one side pay the other side some money or give up some property?

Instead of being value-neutral, TMV takes a principled stand on the structure of human values, much like grammar provides structure for language or a type system provides structure for code. It provides a richer, more stable way to represent what we care about, allowing systems to distinguish an enduring value like "honesty" from a fleeting preference, an addiction, or a political slogan.
This brings us to the right side of the chart. In a TMV-based social stack, value information is preserved.
Our desire for connection is understood by the recommender system through user-stated values and the consistency between our goals and actions.
Companies see hybrid metrics that combine engagement with genuine user satisfaction and well-being.
Oversight bodies can see reported harms and value preservation metrics, giving them a true signal of a system's social impact.
By preserving this information, we can build systems that serve our deeper intentions.
(I realise I sound like a shill for their work, so I'll clarify that I have nothing to do with them. I'm writing this comment partly to surface substantive critiques of what they're doing, which I've been searching for in vain; what they're doing seems more promising to me than anyone else's work in this space, but I'm also not competent to truly judge it.)
Tangential, but I really appreciate your explicit cost-effectiveness figures ($85-105k per +1% increment in win probability, and 2 basis points of x-risk reduction if he wins, which works out to roughly $4-5M per basis point; that looks fantastic versus the $100M-per-basis-point bar I've seen for a 'good bet', or the $3.5B-per-basis-point ballpark willingness to pay), just because public x-risk cost-effectiveness calculations of this level of thoroughness are vanishingly rare (nothing Open Phil publishes approaches this, for instance). So thanks a million, and bookmarked for future reference on how to do this sort of calculation well for politics-related x-risk interventions.
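For my own future reference, the arithmetic chain as I read it (the input figures are from your post; the chaining step is my reconstruction, so treat it as a sketch rather than your actual model):

```python
# Back-of-the-envelope reconstruction (inputs from the post, chaining is my own reading).

cost_per_pct_win_prob = (85_000, 105_000)  # $ per +1 percentage point of win probability
xrisk_reduction_if_wins_bp = 2             # basis points of x-risk reduced, conditional on winning

# Expected basis points bought per +1% of win probability:
expected_bp_per_pct = 0.01 * xrisk_reduction_if_wins_bp  # = 0.02 bp

low, high = (cost / expected_bp_per_pct for cost in cost_per_pct_win_prob)
print(f"${low/1e6:.2f}M to ${high/1e6:.2f}M per basis point")  # ~$4.25M to ~$5.25M, i.e. the ~$4-5M quoted

# Reference bars mentioned above, for comparison:
#   ~$100M per basis point  (bar I've seen for a 'good bet')
#   ~$3.5B per basis point  (ballpark willingness to pay)
```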
See also pushback to this same comment here, reproduced below:
I think (1) is just very false for people who might seriously consider entering government, and irresponsible advice. I've spoken to people who currently work in government, who concur that the Trump administration is illegally checking on people's track record of support for Democrats. And it seems plausible to me that that kind of thing will intensify. I think that there's quite a lot of evidence that Trump is very interested in loyalty and rooting out figures who are not loyal to him, and doing background checks, of certain kinds at least, is literally the legal responsibility of people doing hiring in various parts of government (though checking donations to political candidates is not supposed to be part of that).
I'll also say that I am personally a person who has looked up where individuals have donated (not in a hiring context), and so am an existence proof of that kind of behavior. It's a matter of public record, and I think it is often interesting to know what political candidates different powerful figures in the spaces I care about are supporting.
If you haven't already, you might want to take a look at this post: https://forum.effectivealtruism.org/posts/6o7B3Fxj55gbcmNQN/considerations-around-career-costs-of-political-donations
In case you haven't seen it, this post is by a professional developer with comparable experience to you (30+ years) who gets a lot of mileage out of pair programming with Claude Code in building "a pretty typical B2B SaaS product", and who credits their productivity boost to the intuitions built up over their extensive experience, which enable effective steering. I'd be curious to know your guesses as to why your experience differs.
Getting to the history of it, it really starts in my mind in Berkeley, around 2014-2015. ... At one of these parties there was this extended conversation that started between myself, Malcolm Ocean, and Ethan Ashkii.... From there we formed something like a philosophical circle. We had nominally a book club—that was the official structure of it—but it was mostly just an excuse to get together every two to three weeks and talk about whatever we had been reading in this space of how do we be rationalist but actually win.
I think this is the seed of it. ...
I don't predict we fundamentally disagree or anything; I just thought to register that my knee-jerk reaction to this part of your oral history was "what about Scott's 2014 map?", which had already featured David Chapman, the Ribbonfarm scene (which I used to be a fan of), Kevin Simler (who unfortunately hasn't updated Melting Asphalt in years), and A Wizard's Word (which I'd honestly forgotten about):
But anyway, as a result of this map, a lot of people have been asking: what is postrationality? I think Will Newsome or Steve Rayhawk invented the term, but I sort of redefined it, and it’s probably my fault that it’s come to refer to this cluster in blogspace. So I figured I would do a series of posts explaining my definition.
You say
So maybe one day we will get the postrationalist version of Eliezer. Someone will do this. You could maybe argue that David Chapman is this, but I don’t think it’s quite there yet. I don’t think it’s 100% working. The machine isn’t working quite that way.
While I do think of Chapman as being the most Eliezer-like-but-not-quite postrat solo figure with what he's doing at Meaningness, Venkat Rao seems like by far the more successful intellectual scene-creator to me, although he's definitely not postrat-Eliezer-esque at all.