All of Nisan's Comments + Replies

What's more, even selfish agents with de dicto identical utility functions can trade: If I have two right shoes and you have two left shoes, we'd trade one shoe for another because of decreasing marginal utility.
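The trade can be sketched with a toy utility function (the `pairs` helper and the shoe counts are invented for illustration, not from the comment):

```python
# Toy model of the shoe trade. Both agents have the same (de dicto
# identical) utility function: the number of complete pairs they own.
def pairs(left_shoes, right_shoes):
    return min(left_shoes, right_shoes)

# Before: I hold two right shoes, you hold two left shoes.
my_utility_before = pairs(0, 2)    # 0 pairs
your_utility_before = pairs(2, 0)  # 0 pairs

# After trading one shoe for one shoe, each of us holds one of each.
my_utility_after = pairs(1, 1)     # 1 pair
your_utility_after = pairs(1, 1)   # 1 pair
```

The second shoe of a given foot adds nothing, so each agent gains a whole pair by giving one shoe up: decreasing marginal utility makes the trade positive-sum even with identical preferences.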

Recent interviews with Eliezer:

That's the one, thank you!

The nanobots, from the bloodstream, in the parlor, Professor Plum.

You could have written Colonel Mustard!

Answer by Nisan, Apr 11, 2023
  • Figure out why it's important to you that your romantic partner agree with you on this. Does your relationship require agreement on all factual questions? Are you contemplating any big life changes because of x-risk that she won't be on board with?

  • Would you be happy if your partner fully understood your worries but didn't share them? If so, maybe focus on sharing your thoughts, feelings, and uncertainties around x-risk in addition to your reasoning.

I have to click twice on the Reply link, which is unintuitive. (Safari on iOS.)

I tried a couple other debates with GPT-4, and they both ended up at "A, nevertheless B" vs. "B, nevertheless A".

I expressed some disagreement in my comment, but I didn't disagree-vote.

It seemed pretty clear from the text of your comment that you didn't think mine deserved a strong disagree-vote. 

I like your upper bound. The way I'd put it is: If you buy $1 of Microsoft stock, the most impact that can have is if Microsoft sells it to you, in which case Microsoft gets one more dollar to invest in AI today.

And Microsoft won't spend the whole dollar on AI. Although they'd plausibly spend most of a marginal dollar on AI, even if they don't spend most of the average dollar on AI.

I'm not sure what to make of the fact that Microsoft is buying back stock. I'd guess it doesn't make a difference either way? Perhaps if they were going to buy back $X worth of ... (read more)

That could be, but also maybe there won't be a period of increased strategic clarity. Especially if the emergence of new capabilities with scale remains unpredictable, or if progress depends on finding new insights.

I can't think of many games that don't have an endgame. These examples don't seem that fun:

  • A single round of musical chairs.
  • A tabletop game that follows an unpredictable, structureless storyline.
Zach Stein-Perlman, 3mo
Agree. I merely assert that we should be aware of and plan for the possibility of increased strategic clarity, risk awareness, etc. (and planning includes unpacking "etc."). Probably taking the analogy too far, but: most games-that-can-have-endgames also have instances that don't have endgames; e.g. games of chess often end in the midgame.

I don't think this is a good argument. A low probability of impact does not imply the expected impact is negligible. If you have an argument that the expected impact is negligible, I'd be happy to see it.

I'm not claiming "low probability implies expectation is negligible", and I apologize if what I wrote gave the impression that I was. The thing that seems intuitively clear to me is pretty much "expectation is negligible".

What is the actual effect on Microsoft of a small decrease in their share price? Before trying to answer that in a principled way, let's put an upper bound on it. The effect on MS of a transaction of size X cannot be bigger than X, because if e.g. every purchase of $1000 of MS stock made them $1000 richer then it would be to their advantage to buy stock; as they bought more, the size of this effect would go down, and they would keep buying until the benefit from buying $1000 of MS stock became <= $1000. (MS does on net buy stock every year, or at least has for the last several years, but my argument isn't "this can't happen because if it did then MS would buy their own stock" but "if this happened MS would buy enough of their own stock to make the effect go away".)

My intuition says that this sort of effect should be substantially less than the actual dollar value of the transaction, but if there's some way to prove this it isn't known to me. This intuition is the reason why I expect little effect on MS from you buying or selling MS shares. But let's set that intuition aside and see if we can make some concrete estimates (they will definitely require a lot of guesswork and handwaving). How does a higher share price actually help them?

  • If they choose to raise money by selling shares, they can get a bit more money by doing so.

  • If they choose to use shares for money-like purposes (e.g., buying another company with MS shares rather than cash, incentivizing employees by giving them shares), any fixed fraction of the company they are willing to part with is worth a bit more.

  • If they choose to borrow money, they can probably get slightly better terms because their company is seen as more valuable, hence better able to raise
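The self-arbitrage bound can be illustrated with a toy numeric sketch (the initial effect size and the decay rate are invented for illustration; only the equilibrium logic matters):

```python
# Hypothetical dynamics: suppose each $1 of buy pressure initially
# benefits Microsoft by $1.50, with diminishing returns as more stock
# is bought. These numbers are made up for illustration.
effect_per_dollar = 1.5
total_bought = 0

while effect_per_dollar > 1.0:   # buying is profitable, so MS buys
    total_bought += 1_000
    effect_per_dollar *= 0.9     # diminishing returns to purchases

# At equilibrium, a marginal purchase benefits MS by at most its size.
print(total_bought, round(effect_per_dollar, 3))
```

Whatever the starting numbers, the loop terminates with the marginal effect at or below $1 per $1, which is the upper bound the comment argues for.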

After way more effort than I thought it could possibly require, there is now a full transcript here.

If there isn't, I recommend that the podcast creator consult with e.g. the Clearer Thinking podcast [] team on how they do cost-effective, partly-automated transcripts nowadays. Here's an article on their thinking from early 2022 [], which was before e.g. OpenAI Whisper was released. I think this LW post would be significantly more useful with a full transcript, even if automated, for instance because it's easier to discuss quotes in the comments. (On the other hand, there's a risk of getting misquoted or directing excessive scrutiny to language that's less polished than it would be in essay form, or that may suffer from outright transcription errors.)
No, sorry. Since a few people have asked: transcripts are pretty money- and time-consuming to produce, and I wanted to have a podcast where I make the trade-off of having more episodes but with less polish.

We had the model for ChatGPT in the API for I don't know 10 months or something before we made ChatGPT. And I sort of thought someone was going to just build it or whatever and that enough people had played around with it.


I assume he's talking about text-davinci-002, a GPT 3.5 model supervised-finetuned on InstructGPT data. And he was expecting someone to finetune it on dialog data with OpenAI's API. I wonder how that would have compared to ChatGPT, which was finetuned with RL and can't be replicated through the API.

You can't finetune GPT-3.5 through the API, just GPT-3.

I agree that institutional inertia is a problem, and more generally there's the problem of getting principals to do the thing. But it's more dignified to make alignment/cooperation technology available than not to make it.

I'm a bit more optimistic about loopholes because I feel like if agents are determined to build trust, they can find a way.

I agree those nice-to-haves would be nice to have. One could probably think of more.

I have basically no idea how to make these happen, so I'm not opinionated on what we should do to achieve these goals. We need some combination of basic research, building tools people find useful, and stuff in-between.

Your poster talks about "catastrophic outcomes" from "more-powerful-than-human" AI. Does that not count as alarmism and x-risk talk? This isn't meant to be a gotcha; I just want to know what counts as too alarmist for you.

Marius Hobbhahn, 8mo
Maybe I should have stated this differently in the post. Many conversations end up talking about x-risks at some point, but usually only after going through the other stuff. I think my main learning was just that starting with x-risk as the motivation did not seem very convincing. Also, there is a big difference in how you talk about x-risk. You could say stuff like "there are plausible arguments why AI could lead to extinction, but even experts are highly uncertain about this" or "We're all gonna die", and the more moderate version seems clearly more persuasive.

Setting aside tgb's comment, shouldn't it be ? The formula in the post would have positive growth even if , which doesn't seem right.

It only took 7 years to make substantial progress on this problem: Logical Induction by Garrabrant et al.

Taking on a 60-hour/week job to see if you burn out seems unwise to me. Some better plans:

  • Try lots of jobs on lots of teams, to see if there is a job you can work 60 hours/week at.
  • Pay attention to what features of your job are energizing vs. costly. Notice any bad habits that might cause burnout.
  • Become more productive per hour.

Hi Bob, I noticed you have some boxes of stuff stacked up in the laundry room. I can't open the washing machine door all the way because the boxes are in the way. Could you please move them somewhere else?

Dear Alice,

Some of the boxes in that stack belong to my partner Carol, and I'd have to ask her if she's okay with them being moved.

In theory I could ask Carol if she's all right with the idea of moving the boxes. If Carol were to agree to the idea, I would need to find a new place for the boxes, then develop a plan for how to actually move the

... (read more)

Thanks for sharing your reasoning. For what it's worth, I worked on OpenAI's alignment team for two years and think they do good work :) I can't speak objectively, but I'd be happy to see talented people continue to join their team.

I think they're reducing AI x-risk in expectation because of the alignment research they publish (1 2 3 4). If anyone thinks that research or that kind of research is bad for the world, I'm happy to discuss.

Thanks for your constructive attitude to my words.

Why do you think the alignment team at OpenAI is contributing on net to AI danger?

Maybe I don't know enough about OpenAI's alignment team to criticize it in public? I wanted to name one alignment outfit because I like to be as specific as possible in my writing. OpenAI popped into my head because of the reasons I describe below. I would be interested in your opinion. Maybe you'll change my mind.

I had severe doubts about the alignment project (the plan of creating an aligned superintelligence before any group manages an unaligned one) even before Eliezer went public with his grave doubts in the fall of last year. It's not that I consider... (read more)

Also, chess usually ends in a draw, which is lame. Go rarely if ever ends in a draw.

Agreed. It'd be nice if the chess folk took some low-hanging-fruit rule changes seriously. Treating stalemate as a loss is the most obvious. I'd be interested to know how much this would change things at the highest level. Ah - I see DM tried this (gwern's link), with disappointingly little impact.

A more 'drastic' (but IMO interesting) endgame change would be to change the goal of chess from "capture the king" to "get the king to the opponent's throne" (i.e. white wins by getting the king to e8, black wins by getting the king to e1; checkmate/stalemate wins immediately).

You get some moderately interesting endgames with this rule - e.g. king+bishop can win against king from most positions, as can king+knight. This means that many liquidate-material-to-drawn-endgame tactics no longer work. For more general endgame positions, the e8 and e1 squares become an extra weakness. So positions where it was hard/impossible to convert an advantage (difficult with only one weakness to exploit) become winnable (two weaknesses often being enough).

I don't know how it'd work out in practice. It'd be fun to see how [this + chess960] worked out at high level.

CFAR used to have an awesome class called "Be specific!" that was mostly about concreteness. Exercises included:

  • Rationalist taboo
  • A group version of rationalist taboo where an instructor holds an everyday object and asks the class to describe it in concrete terms.
  • The Monday-Tuesday game
  • A role-playing game where the instructor plays a management consultant whose advice is impressive-sounding but contentless bullshit, and where the class has to force the consultant to be specific and concrete enough to be either wrong or trivial.
  • People were encouraged t
... (read more)

Agents who model each other can be modeled as programs with access to reflective oracles. I used to think the agents have to use the same oracle. But actually the agents can use different oracles, as long as each oracle can predict all the other oracles. This feels more realistic somehow.

I'm not sure there's a functional difference between "same" and "different" oracles at this level of modeling.

Ok, I think in the OP you were using the word "secrecy" to refer to a narrower concept than I realized. If I understand correctly, if Alice tells Carol "please don't tell Bob", and then five years later when Alice is dead or definitely no longer interested or it's otherwise clear that there won't be negative consequences, Carol tells Bob, and Alice finds out and doesn't feel betrayed — then you wouldn't call that a "secret". I guess for it to be a "secret" Carol would have to promise to carry it to her grave, even if circumstances changed, or something.

In that case I don't have strong opinions about the OP.

Become unpersuadable by bad arguments. Seek the best arguments both for and against a proposition. And accept that you'll never be epistemically self-sufficient in all domains.

Suppose Alice has a crush on Bob and wants to sort out her feelings with Carol's help. Is it bad for Alice to inform Carol about the crush on condition of confidentiality?

In the most common branch of this conversation, Alice is predictably going to tell Bob about it soon, and is speaking to Carol first in order to sort out details and gain courage. If Carol went and preemptively informed Bob, before Alice talked to Bob herself, this would be analogous to sharing an unfinished draft.

This would be bad, but the badness really isn't about secrecy. The contents of an unfinished draft headed for publication aren't secret, except in a narrow and time-limited sense. The problem is that the sharing undermines the impact of the later publication, causes people to associate the author with a lower quality product, and potentially misleads people about the author's beliefs. Similarly, if Carol goes and preemptively tells Bob about Alice's crush, then this is likely to give Bob a misleading negative impression of Alice. It's reasonable for Alice to ask Carol not to do that, and it's okay for them to not have a detailed model of all of the above.

But if Alice never tells Bob, and five years later Bob and Carol are looking back on the preceding years and asking if they could have gone differently? In that case, I think discarding the information seems like a pure harm.

Your Boycott-itarianism could work just through market signals. As long as your diet makes you purchase less high-cruelty food and more low-cruelty food, you'll increase the average welfare of farm animals, right? Choosing a simple threshold and telling everyone about it is additionally useful for coordination and maybe sending farmers non-market signals, if you believe those work.

If you really want the diet to be robustly good with respect to the question of whether farm animals' lives are net-positive, you'd want to tune the threshold so as not to change... (read more)

Yep, I skimmed it by looking at the colorful plots that look like Ising models and reading the captions. Those are always fun.

No, I just took a look. The spin glass stuff looks interesting!

Are we talking about the same thing? []

I think you're saying , right? In that case, since embeds into , we'd have embedding into . So not really a step up.

If you want to play ordinal games, you could drop the requirement that agents are computable / Scott-continuous. Then you get the whole ordinal hierarchy. But then we aren't guaranteed equilibria in games between agents of the same order.

I suppose you could have a hybrid approach: Order is allowed to be discontinuous in its order- beliefs, but higher orders have to be continuous? Maybe that would get you to .... (read more)

And as a matter of scope, your reaction here is incorrect. [...] Reacting to it as a synecdoche of the agricultural system does not seem useful.

On my reading, the OP is legit saddened by that individual turkey. One could argue that scope demands she be a billion times sadder all the time about poultry farming in general, but that's infeasible. And I don't think that's a reductio against feeling sad about an individual turkey.

Sometimes, sadness and crying are about integrating one's beliefs. There's an intuitive part of your mind that doesn't understand ... (read more)


I apologize, I shouldn't have leapt to that conclusion.

Apology accepted.


it legitimately takes the whole 4 years after that to develop real AGI that ends the world. FINE. SO WHAT. EVERYONE STILL DIES.

By Gricean implicature, "everyone still dies" is relevant to the post's thesis. Which implies that the post's thesis is that humanity will not go extinct. But the post is about the rate of AI progress, not human extinction.

This seems like a bucket error, where "will takeoff be fast or slow?" and "will AI cause human extinction?" are put in the same bucket.


The central hypothesis of "takeoff speeds" is that at the time of serious AGI being developed, it is perfectly anti-Thielian in that it is devoid of secrets

No, the slow takeoff model just precludes there being one big secret that unlocks both 30%/year growth and Dyson spheres. It's totally compatible with a bunch of medium-sized $1B secrets that different actors discover, adding up to hyperbolic economic growth in the years leading up to "rising out of the atmosphere".

Rounding off the slow takeoff hypothesis to "lots and lots of little innovations addin... (read more)


"Takeoff Speeds" has become kinda "required reading" in discussions on takeoff speeds. It seems like Eliezer hadn't read it until September of this year? He may have other "required reading" from the past four years to catch up on.

... (read more)
[This comment is no longer endorsed by its author]

I read "Takeoff Speeds" at the time.  I did not liveblog my reaction to it at the time.  I've read the first two other items.

I flag your weirdly uncharitable inference.

I don't think "viciousness" is the word you want to use here.

You are right, but for a slightly different reason. I had thought I meant a meaning analogous to the one quoted in Epistemic Viciousness []... Except when I actually reread the essay, this is not what I wanted to imply. (For some reason, I had remembered "epistemic viciousness" as "epistemic fierceness.") I've currently edited to "fierceness." I'll keep an eye out for a yet better word.

Ah, great! To fill in some of the details:

  • Given agents $A$ and $B$ and numbers $p$ and $q$ such that $p + q = 1$, there is an aggregate agent called $pA + qB$, which means "agents $A$ and $B$ acting together as a group, in which the relative power of $A$ versus $B$ is the ratio of $p$ to $q$". The group does not make decisions by combining their utility functions, but instead by negotiating or fighting or something.

  • Aggregation should be associative, so $p(qA + q'B) + p'C = pq\,A + pq'\,B + p'\,C$.

  • If you spell out all the associativity relations, you'll find that

... (read more)

I like that you glossed the phrase "have your cake and eat it too":

It's like a toddler thinking that they can eat their slice of cake, and still have that very same slice of cake available to eat again the next morning.

I also like that you explained the snowclone "lies, damned lies, and statistics". I'm familiar with both of these cliches, but they're generally overused to the point of meaninglessness. It's clear you used them with purpose.

Interestingly, if my research is not mistaken, "eat your cake and have it too" was the original form of the phrase and is much clearer imo; I was always confused by "have your cake and eat it too" because that seemed to be just ... describing the normal order of operations?

The psychotic break you describe sounds very scary and unpleasant, and I'm sorry you experienced that.

Typo: "common, share, agreed-on" should be "...shared...".

"Shut up (−1/3)i and calculate." is a typo that isn't present in the original post.

People are fond of using the neologism "cruxy", but there's already a word for that: "crucial". Apparently this sense of "crucial" can be traced back to Francis Bacon.

The point of using a word like this is to point to different habits of thought. If you use an existing word, that's unlikely to happen in listeners. If you don't do that, you get a lot of motte-and-bailey issues.
A cruxy point doesn't have to be important, the whole question being considered doesn't have to be important. This is an unfortunate connotation of "crucial", because when I'm pointing out that the sky is blue, I'm usually not saying that it's important that it's blue, or that it's important for this object level argument to be resolved. It's only important to figure out what caused a simple mistake that's usually reliably avoided, and to keep channeling curiosity to fill out the map, so that it's not just the apparently useful parts that are not wild conjecture.

This story originally had a few more italicized words, and they make a big difference:

"Don't," Jeffreyssai said. There was real pain in it.


"I do not know," said Jeffreyssai, from which the overall state of the evidence was obvious enough.

Some of the formatting must have been lost when it was imported to LessWrong 2.0. You can see the original formatting at and in Rationality: AI to Zombies.

There are also minor rewordings throughout. The LessWrong version differs from the other two, which (from a limited sample) agree with each other. I would guess that the latter is the authorised version.

It seems to me that if I make some reasonable-ish assumptions, then 2 micromorts is equivalent to needing to drive for an hour at a random time in my life. I expect the value of my time to change over my life, but I'm not sure in which direction. So equating 2 micromorts with driving for an hour tonight is probably not a great estimate.

How do you deal with this? Have you thought about it and concluded that the value of your time today is a good estimate of the average value over your life? Or are you assuming that the value of your time won't change by more than, say, a factor of 2 over your life?

Optimization Process, 2y
That's a great point! My rough model is that I'll probably live 60 more years, and the last ~20 years will be ~50% degraded, so my 60 remaining life-years are only 50 QALYs. But... as you point out, on the other hand, my time might be worth more in 10 years, because I'll have more metis, or something. Hmm. (Another factor: if your model is that awesome life-extension tech / friendly AI will come before the end of your natural life, then dying young is a tragedy, since it means you'll miss the Rapture; in which case, 1 micromort should perhaps be feared many times more than this simple model suggests. I... haven't figured out how to feel about this small-probability-of-astronomical-payoff sort of argument.)
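For what it's worth, the micromort-to-hours equivalence in the parent comment can be sketched numerically (assuming ~60 undiscounted remaining years; the numbers are illustrative assumptions, not from the thread):

```python
# Expected life-hours lost to 2 micromorts, under a flat model of
# ~60 remaining years of life.
remaining_years = 60
hours_remaining = remaining_years * 365.25 * 24  # about 526,000 hours

micromort = 1e-6                  # a one-in-a-million chance of death
expected_hours_lost = 2 * micromort * hours_remaining
print(round(expected_hours_lost, 2))  # roughly one hour
```

Discounting the later years (e.g. down to 50 QALYs) shrinks the hour count proportionally, which is exactly the adjustment the question above is asking about.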