All of Zach Stein-Perlman's Comments + Replies

Uncontroversially good legislation

I agree in part; I confess I was more thinking of the laxer "could be uncontroversial in a few years" standard (and more thinking of problems than policies). But at the least, I think narrow reforms in all but 5 and 10 could be uncontroversial. We don't need to abolish the FDA or "Buy American" to make improvements.

Aside from biosecurity and possibly medical license reciprocity, I think these are all pretty controversial.

Without checking the numbers, I'm pretty sure at least some of 7 and 9 are also quite popular (and maybe part of 6, depending on scop... (read more)

1 · korin43 · 7d: I suspect part of why these seem uncontroversial is that they're not specific enough. Everyone agrees that we should make hospitals/housing/criminal justice better, but if you actually propose a specific policy to do that, it will give money to corporate interests/be socialist/make things better for 'bad' people/make some subgroup worse off/etc.
Uncontroversially good legislation

Thinking at the federal level, and mostly thinking about big stuff, but not claiming either choice is optimal. Explanation omitted because I don't have time to explain everything. Not systematic; not optimized for tractability; unordered.

  1. FDA reform.
  2. Biosecurity: regulation of hazardous research + pandemic preparedness + promoting biosecurity research.
  3. Medical licensing reciprocity (h/t Scott Alexander).
  4. Hospitals: the federal government could better-coordinate hospitals and make them more transparent (h/t Scott Alexander).
  5. Fund the IRS: auditing the rich
... (read more)
2 · korin43 · 7d: Aside from biosecurity and possibly medical license reciprocity, I think these are all pretty controversial. Outside of LessWrong, everyone loves the FDA and how they "protect" us. The IRS funding plan was discussed in NR (a major Republican media company): https://www.nationalreview.com/2021/11/when-it-comes-to-the-irs-bigger-is-not-better/. Housing reform would allow undesirable people to buy houses, and would probably make house prices drop. Corporate welfare "creates jobs". Etc. Not saying any of these are bad ideas, but they're not uncontroversial.
Prizes for ELK proposals

Ask dumb questions! ... we encourage people to ask clarifying questions in the comments of this post (no matter how “dumb” they are)

ok... disclaimer: I know little about ML and I didn't read all of the report.

All of our counterexamples are based on an ontology mismatch between two different Bayes nets, one used by an ML prediction model (“the predictor”) and one used by a human.

I am confused. Perhaps the above sentence is true in some tautological sense I'm missing. But in the sections of the report listing training strategies and corresponding coun... (read more)

2 · Ajeya Cotra · 15d: In the report, the first volley of examples and counterexamples is not focused solely on ontology mismatch, but everything after the relevant section [https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/edit#heading=h.u45ltyqgdnkk] is. ARC is always considering the case where the model does "know" the right answer to whether the diamond is in the room in the sense that it is discussed in the self-contained problem statement appendix here [https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/edit#heading=h.jk61tc933p1].

The ontology mismatch problem is not referring to the case where the AI "just doesn't have" some concept -- we're always assuming there's some "actually correct / true" translation between the way the AI thinks about the world and the way the human thinks about the world which is sufficient to answer straightforward questions about the physical world like "whether the diamond is in the room," and is pretty easy for the AI to find. For example, if the AI discovered some new physics and thinks in terms of hyper-strings in a four-dimensional manifold, there is some "true" translation between that and normal objects like "tables / chairs / apples" because the four-dimensional hyper-strings are describing a universe that contains tables / chairs / apples; furthermore, an AI smart enough to derive that complicated physics could pretty easily do that translation -- if given the right incentive -- just as human quantum physicists can translate between the quantum view of the world and the Newtonian view of the world or the folk physics view of the world.

The worry explored in this report is not that the AI won't know how to do the translation; it's instead a question of what our loss functions incentivize. Even if it wouldn't be "that hard" to translate in some absolute sense, with the most obvious loss functions we can come up with it might be simpler / more natural / lower-loss to simply do infe
Newcomb's Problem as an Iterated Prisoner's Dilemma

I don't understand the second paragraph.

I buy (what I understand of) this if Omega makes its prediction by simulating you (and not if it makes its prediction by, say, scanning your DNA).

1 · Daniel Amdurer · 15d: The second paragraph is a bit hand-wavy. It's basically the bit that turns Newcomb into an iterated game. Since there's this causal loop, it can be unlooped by converting it into an iterated game and using your action in the previous round as a proxy for your action in that round. So Omega plays based on your previous action, which is the same as your next one.
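To illustrate the mechanism described in that reply, here is a toy simulation (my own sketch, not from the comment; it assumes standard Newcomb payoffs of $1,000 and $1,000,000 and that Omega's first-round prediction defaults to one-boxing):

    # Toy "iterated Newcomb": Omega fills the opaque box based on your action
    # in the previous round, used as a proxy for your current action.
    # Assumptions (mine): standard payoffs ($1,000 transparent box,
    # $1,000,000 opaque box) and Omega initially predicts one-boxing.

    def play(strategy, rounds=100):
        """strategy(round) -> 'one-box' or 'two-box'; returns total winnings."""
        total = 0
        predicted = "one-box"  # Omega's initial prediction (assumption)
        for r in range(rounds):
            action = strategy(r)
            opaque = 1_000_000 if predicted == "one-box" else 0
            total += opaque if action == "one-box" else opaque + 1_000
            predicted = action  # next round, Omega predicts this round's action
        return total

    print(play(lambda r: "one-box"))   # ~100 * $1,000,000
    print(play(lambda r: "two-box"))   # $1,001,000 in round one, then $1,000/round

Under these assumptions the consistent one-boxer ends up far ahead, which is the sense in which unrolling the causal loop into an iterated game makes one-boxing look like cooperating in an iterated prisoner's dilemma.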
Open Thread - Jan 2022 [Vote Experiment!]
Zach Stein-Perlman · 18d · 40 · Truth 3 · Aim 8 · Clarity 14 · Seeking 7 · 🤨 6 · ❤️ 4 · 😮 1

Please tell us what you think! Love it/hate it/think it should be different? Let us know.

I think it's a fine experiment but... right now I'm closest to "hate it," at least if it was used for all posts (I'd be much happier if it was only for question-posts, or only if the author requested it or a moderator thought it would be particularly useful, or something).

  • It makes voting take longer (with not much value added).
  • It makes reading comments take longer (with not much value added). You learn very little from these votes beyond what you learn from readi
... (read more)
3 · MikkW · 10d: I agree that it should be an option to turn this off for oneself, but I currently feel that this will be net-positive for most users.
__nobody · 10d · 11 · Truth 2 · Aim 3 · Clarity -1 · 🎉 1

I largely agree with this. Multi-axis voting is probably more annoying than useful for the regulars who have a good model of what is considered "good style" in this place. However, I think it'd be great for newbies. It's rare that your comment is so bad (or good) that someone bothers to reply, so mostly you get no votes at all or occasional down votes, plus the rare comment that gets lots of upvotes. Learning from so little feedback is hard, and this system has the potential to get you much more information.

So I'd suggest yet another mode of use for this: ... (read more)

2 · Gunnar_Zarncke · 18d: The ability for the user to disable the voting is valuable. An alternative would be to make it optional for authors to enable. Or require a minimum karma.
Omicron Post #11

We now know exactly what was up with the CDC nowcast. It was obvious nonsense.

Exactly what was up with it?

Omicron: My Current Model

Good point; I was imprecise. Thanks.

I meant 100K+ deaths in America, which has about 4% of the global population, so millions of deaths globally; and I was implicitly thinking of "disease" as contagious disease that exists in rich countries too. I haven't looked for numbers, but I suspect that COVID is quite a large fraction of American deaths from contagious disease, such that even in April it will not merely be one such disease among many.

2 · Razied · 23d: But this number of deaths from covid won't last; immunity from vaccines and past infections will reach an equilibrium with some immune escape from perpetual new variants and declining immunity over time, just as happens for flu strains. The finite number of very vulnerable old people will all die out, and over time the only people who die of covid will be the people who age into vulnerability, just as happens with the flu.
Omicron: My Current Model

Covid-19 will be one more disease among many, and life will be marginally worse, but by about April you shouldn’t act substantially differently than if it no longer existed.

This seems quite bold given our history of variants emerging. And if Omicron infects billions, then prima facie there's great opportunity for mutation. I'd be interested to hear your credence in the following proposition:

From 1 May 2022 to 1 Jan 2030, Zvi won't act substantially differently due to risk of SARS-CoV-2 infection.

Additionally, "one more disease among many" suggests (... (read more)

3jaspax23d"One more disease among many" -- this Wikipedia graphic [https://en.wikipedia.org/wiki/File:Leading_cause_of_death_world.png] suggests that respiratory diseases already kill about 3.6M people/year, and pre-COVID I barely spent even a thought on them, nor did I expend extraordinary efforts avoiding them. We could add COVID to the mix and bump the number up to 3.7M people/year, and neither of the above would change.

Perhaps I should have explicitly put 'barring another major variant that disrupts this' there, but if Omicron infects most people on top of the vaccines, the damage a new variant does next time should be pretty low, and someone like me should be able to shrug it off and not care. 

Zvi’s Thoughts on the Survival and Flourishing Fund (SFF)

This makes some sense. On the other hand, not naming such organizations means you can't share your skepticism about specific organizations with the rest of us, who might benefit from hearing it.

Zvi’s Thoughts on the Survival and Flourishing Fund (SFF)

OK, thanks; this sounds reasonable.

That said, I fear that people in my position—viz., students who don't really know non-student EAs*—don't have the information to "figure out what's going on and what that means." So I want to note here that it would be valuable for people like me if you or someone else someday wrote a post explaining more what's going on in organized EA (and I'll finish reading this post carefully, since it seems relevant).

*I run my college's EA group; even relative to other student groups I/we are relatively detached from organized EA.


... (read more)
Zvi’s Thoughts on the Survival and Flourishing Fund (SFF)

I am not an Effective Altruist.

...

I know many EAs and consider many of them friends, but I do not centrally view the world in EA terms, or share the EA moral or ethical frameworks. I don’t use what seem to for all practical purposes be their decision theories. I have very large, very deep, very central disagreements with EA and its core components and central organizations and modes of operation. I have deep worries that important things are deeply, deeply wrong, especially epistemically, and results in an increasingly Goodharted and inherently poli

... (read more)

I intentionally dodged giving more details in these spots, because I want people to reason from the information and figure out what's going on and what that means, and I don't think updating 'against' (or for) things is the way one should be going about updating. 

Also because Long Post Is Long and getting into those other things would be difficult to write well, make things much longer, and be a huge distraction from actually processing the information. 

I think there's a much better chance of people actually figuring things out this way. 

Tha... (read more)

-5 · ChristianKl · 1mo
Zvi’s Thoughts on the Survival and Flourishing Fund (SFF)

There will inevitably be four different copies of this post, if not more – My Substack and WordPress copies, the LessWrong copy, and then EA Forum. I apologize in advance for the inevitable lack of engagement in some or all of those places, depending on how it goes.

Do you prefer comments on a particular copy? I'm not sure what the Schelling point is.

3 · Zvi · 1mo: They're very different places, so I don't think there's a 'right' place for this. The EA Forum copy I expect to be relatively disconnected from for overdetermined reasons, but they were going to end up with a copy regardless so I figured I'd provide it myself (all my writing is creative commons provided authorship and links back are provided, and I'm sure they will Have Thoughts). My guess is that if you encountered it here first you should be discussing it here.
The 2020 Review [Updated Review Dashboard]

Fair, I was implicitly assuming that no donation figures would be public.

8 · Raemon · 2mo: I'm unsure exactly how much we'll be making public, but I do expect at least aggregate donations to be public. The entire point here is to not merely have a few people committing to give money, but to build an entire system that helps authors have a justified expectation that important posts generally get money. And among the more credible ways to signal that this is going to happen in the future is to start doing it now.
The 2020 Review [Updated Review Dashboard]

Something I’d like LessWrong to do better is to allow authors to transition from hobbyists, to professionals that get paid to research and write full time.

Earlier this year, I was thinking about whether LessWrong should become more like substack, where there's an easy affordance to start financially supporting authors you like. I liked the idea but wasn't sure it'd be healthy for LessWrong – the sorts of posts that make people excited to donate are often more tribal/political. But this seemed less worrisome during The Review. It's a time whe

... (read more)
5 · Richard_Ngo · 2mo: Why is that the case? Is it just that people can't see how much you've donated via donation buttons? I assume that some aggregate donation figures will be made public later on, though, so making those figures higher seems pretty similar to you announcing donations personally.
Omicron Variant Post #2

Thanks, Zvi, for these updates. They have quite high counterfactual impact in educating me, and presumably the same is true for many others.

Other recent blogposts for those who haven't seen them yet: Noah Smith and Scott Alexander.

Zach Stein-Perlman's Shortform

I agree that near-optimal is unlikely. But I would be quite surprised by 1%-99% futures because (in short) I think we do better if we optimize for good and do worse if we don't. If our final use of our cosmic endowment isn't near-optimal, I think we failed to optimize for good and would be surprised if it's >1%.

Christiano, Cotra, and Yudkowsky on AI progress

since you disagree with them eventually, e.g. >2/3 doom by 2030

This apparently refers to Yudkowsky's credences, and I notice I am surprised — has Yudkowsky said this somewhere? (Edit: the answer is no, thanks for responses.)

8 · Rob Bensinger · 2mo: I think Ajeya is inferring this from Eliezer's 2017 bet with Bryan Caplan [https://www.econlib.org/archives/2017/01/my_end-of-the-w.html]. The bet was jokey and therefore (IMO) doesn't deserve much weight, though Eliezer comments [https://www.econlib.org/archives/2017/01/my_end-of-the-w.html#comment-166919] that it's maybe not totally unrelated to timelines he'd reflectively endorse:

In general, my (maybe-partly-mistaken) Eliezer-model...
  • thinks he knows very little about timelines (per the qualitative reasoning in There's No Fire Alarm For AGI [https://intelligence.org/2017/10/13/fire-alarm/] and in Nate's recent post [https://www.lesswrong.com/posts/cCMihiwtZx7kdcKgt/comments-on-carlsmith-s-is-power-seeking-ai-an-existential#Timelines] -- though not necessarily endorsing Nate's quantitative probabilities);
  • and is wary of trying to turn 'I don't know' into a solid, stable number for this kind of question (cf. When (Not) To Use Probabilities [https://www.lesswrong.com/posts/AJ9dX59QXokZb35fk/when-not-to-use-probabilities]);
  • but recognizes that his behavior at any given time, insofar as it is coherent, must reflect some implicit probabilities.

Quoting Eliezer back in 2016 [https://www.econlib.org/archives/2016/03/so_far_my_respo.html/#comment-158703]:
Discussion with Eliezer Yudkowsky on AGI interventions

we don't have enough time

Setting aside this proposal's, ah, logistical difficulties, I certainly don't think we should ignore interventions that target only the (say) 10% of the probability space in which superintelligence takes longest to appear.

Split and Commit

I'm curious what examples you or others who found the opening examples distracting would prefer. Something like those examples is standard for describing moral progress, at least in my experience, so I'm curious if you would frame moral progress differently or just use other examples.

Split and Commit

Or! This idea sounds superficially reasonable and even (per the appendix) gets praise from a few people, but is actually useless or harmful. Currently working out a hypothesis for how that could be the case...

1 · countingtoten · 2mo: Ahem [https://www.lesswrong.com/posts/t2LGSDwT7zSnAGybG/split-and-commit?commentId=HvqGGns7HYE8kCrRT]. As with F=ma (I think) it's not so much wrong, or useless, as asking the wrong question on a different level.
6 · LVSN · 2mo: ahaha
Study Guide

Thank you for writing this! I once thought about asking LW for something like this but never got around to it.

I'm an undergraduate; I expect to take several more late-undergraduate- to early-graduate-level math courses. Presumably some will turn out to be much more valuable to me than others, and presumably this is possible to predict better-than-randomly in advance. Do you [or anyone else] have thoughts on how to choose between math courses other than those you mention, either specific courses (and why they might be valuable) or general principles (any wh... (read more)

4 · johnswentworth · 2mo: I don't have recommendations for courses or principles to select them beyond what's in the post. (Otherwise I would have put them in the post.) I don't think you're going to find anybody with existing good answers. The embedded agency sequence [https://www.lesswrong.com/s/Rm6oQRJJmhGCcLvxh] is the best articulation of the problems which I currently know of. (Even there I disagree with the degree of emphasis placed on various subproblems/frames, but it is nonetheless very good.) If you want a useful starting point to think about these things yourself: ask how to calculate the world-model and preferences of an e-coli directly from a low-level specification of the cell (i.e. all the reaction dynamics and concentrations and forces and whatnot).
5 · particularuniversalbeing · 2mo: Yes, but not in a uniform way. The mathematical frontier is so large, and semesters so short, that Professor A's version of, for instance, a grad-level "Dynamical Systems" course can have literally no overlap with Professor B's version. Useful advice here is going to have to come from Professors A and B (though not necessarily directly).

Underdeveloped. There's some interesting work coming out of the programming language theory / applied category theory region these days (Neil Ghani and David Spivak come to mind), but "the math of agency" is not even an identifiable field yet, let alone one mature enough to show up in curricula.

Consequentialism might harm survival

In general, the correctness of [a principle] is one matter; the correctness of accepting it, quite another. I think you conflate the claims "consequentialism is true" and "naive consequentialist decision procedures are optimal." Even if we have decisive epistemic reason to accept consequentialism (of some sort), we may have decisive moral or prudential reason to use non-consequentialist decision procedures. So I would at least narrow your claims to be about consequentialist decision procedures.

evolution as a force typically acts

... (read more)
1acylhalide3mo"Accept" implies choice. I'm making observations about chances of survival without assuming a notion of choice or free will. The outside perspective, where we are purely governed by physical laws. I have addressed both the claims though. From the outside perspective, I state: "consequentialism is true in an objective metaphysical sense" is almost certainly false, moral non-realism "naive consequentalist decision procedures have highest odds of survival" may be untrue, and some evidence is in the post -- The evolutionary point I'll give an example. Consider a trait (X) that makes individuals willing to mass-murder others if their own survival is threatened. These could be humans or plants or whatever. Consider any society - be it all X, all not-X or some X some not-X. If you (with god-like ability) wanted to insert one individual into this society with higher odds of survival, you would insert an individual with X and not not-X. X increases individual chances of survival and individual lifepsan, cause you're willing to kill others to save yourself. Now consider the three societies themselves. All not-X society has highest chances of surviving over large number of generations. X reduces collective chances of survival. Cause you never lose 10 people to one murderer, and other coordination problems can be solved etc etc. In the real world, over sufficiently many generations, its collective survival that is optimised for by nature. So you'll find more species full of not-X. Edited again :p
Rationalism for New EAs

Interesting.

My hot take is that you might want to be careful about how much of Less Wrong you throw at people right away.

I hadn't thought about it this way before and don't have a great model of how new people might respond to LW. Would the same apply to SSC, or is Scott Alexander less "weird in a way that repulses a decent number of people"? (I'll strongly consider putting more emphasis on Scout Mindset-ish stuff regardless, and would appreciate suggestions for more readings "that teach[] a generalizable core rationality skill.")

5 · G Gordon Worley III · 3mo: I think SSC is less off-putting for a wider audience because it's less weird. It still repulses its fair share of folks, but in my estimation there are fewer false negatives there: the people repulsed are the people least likely to click with rationalist ideas, whereas I know folks who are fans of SSC who bounce off LW for cultural reasons that have nothing to do with the core ideas.
Your Time Might Be More Valuable Than You Think

I mostly agree. Two thoughts:

  • Rather than thinking in terms of wages, w(t), I think we should just think in terms of time-value or marginal utility, u(t). Clearly everything you say applies to all value-we-can-get-from-time, not just wages.
  • Some of your conclusions (e.g., "if you would be willing to trade an hour for $1000 in the future [i.e., and gain the hour], you should also be willing to do so now") only apply when the following is true: for the rest of the person's life, their value-from-time is a (nondecreasing) function of their time spent working
... (read more)
SIA > SSA, part 1: Learning from the fact that you exist

I agree with all of this (and I admire its clarity). In addition, I believe that the SIA-formulated questions are generally the important ones, for roughly the reason that the consequences of our choices are generally more like "value is proportional to the number of correct actions" than "value is proportional to the fraction of actions correct" (across all observers subjectively indistinguishable from me). (Our choices seem to be local in some sense; their effects are independent of the choices of our faraway subjectively-indistinguishable counterparts, and their effects seem... (read more)

The Simulation Hypothesis Undercuts the SIA/Great Filter Doomsday Argument

Ah, I agree. I misread that bit as about filters for us given that we are non-simulated, but really it's about filters for non-simulated civilizations, which under the simulation argument our existence doesn't tell us much about. Thanks.

The Simulation Hypothesis Undercuts the SIA/Great Filter Doomsday Argument

Ha, I wrote a comment like yours but slightly worse, then refreshed and your comment appeared. So now I'll just add one small note:

To the extent that (1) normatively, we care much more about the rest of the universe than our personal lives/futures, and (2) empirically, we believe that our choices are much more consequential if we are non-simulated than if we are simulated, we should in practice act as if there are greater odds that we are non-simulated than we have reason to believe for purely epistemic purposes. So in practice, I'm particularly interested... (read more)

3 · Lanrian · 4mo: Re your edit: That bit seems roughly correct to me. If we are in a simulation, SIA doesn't have strong views on late filters for unsimulated reality. (This is my question (B) above.) And since SIA thinks we're almost certainly in a simulation, it's not crazy to say that SIA doesn't have strong views on late filters for unsimulated reality. SIA is very ok with small late filters, as long as we live in a simulation, which SIA says we probably do. But yeah, it is a little bit confusing, in that we care more about late filters in unsimulated reality if we live in unsimulated reality. And in the (unlikely) case that we do, then we should ask my question (C) above, in which case SIA does have strong views on late filters.
The LessWrong Team is now Lightcone Infrastructure, come work with us!

You want "to build a thriving in-person rationality and longtermism community in the Bay Area." That sounds great. How do you plan to do it, at any level of generality? 'Thriving community' can mean a lot of different things.

SIA > SSA, part 1: Learning from the fact that you exist

I strongly support SIA over SSA. I haven't read this sequence yet. But it looks like the sequence is about why the consequences of SIA are superior to those of SSA. This is a fine project. But a set of reasons for SIA over SSA that is, I think, just as strong as its more acceptable consequences is its great theoretical coherence.

SIA says: given your prior, multiply every possible universe by the number/volume of observers indistinguishable from you in that universe, then normalize. This is intuitive, it has a nice meaning,* and it doesn't have a discontinuity at z... (read more)
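To make that rule concrete, here is a minimal formalization (my notation, not taken from the sequence): if P(w) is your prior over possible universes w and n_w is the number (or volume) of observers in w who are subjectively indistinguishable from you, then

$$P_{\mathrm{SIA}}(w) \;=\; \frac{n_w \, P(w)}{\sum_{w'} n_{w'} \, P(w')}.$$

SSA instead weights each universe by the fraction of its reference-class observers who are in your epistemic situation, rather than by their absolute number, which is where the two approaches come apart.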

Great Power Conflict

Oh, interesting.

Speaking about states wanting things obscures a lot.

So I assume you would frame states as less agenty and frame the source of conflict as decentralized — arising from the complex interactions of many humans, which are less predictable than "what states want" but still predictably affected by factors like bilateral tension/hostility, general chaos, and various technologies in various ways?

Great Power Conflict

Thanks for your comment.

It's unclear how strongly the control about the individual actors are controlled by their respective governments.

Good point. If I understand right, this is an additional risk factor: there's a risk of violence that neither state wants due to imperfect internal coordination, and this risk generally increases with international tension, the number of humans in a position to choose to act hostilely or to attack, general confusion, and perhaps the speed at which conflict occurs. Please let me know if you were thinking of something else.

The co

... (read more)
2 · ChristianKl · 4mo: Speaking about states wanting things obscures a lot. I expect that there's a good chance that Microsoft, Amazon, Facebook, Google, IBM, Cisco, Palantir, and maybe a few other private entities have strong offensive capabilities. Then there are a bunch of different three-letter agencies who likely have offensive capabilities. The US government of course hacks Russian targets, but sophisticated private actors won't simply attack Russia and demand ransom be paid to them. There are plenty of people who currently mainly do penetration testing for companies, and who are very capable at actually attacking, who might consider it worthwhile to attack Russian targets for money if that were possible without legal repercussions. US-government-sponsored attacks aren't about causing damage in the way attacks targeted at getting ransom are. It would get more serious private players involved in attacking who are outside of government control. Take someone like https://www.fortalicesolutions.com/services. Are those people currently going to attack Russian targets outside of retaliation? Likely not.
Great Power Conflict

Ha, I took intro IR last semester so I should have caught this. Fixed, thanks.

Zach Stein-Perlman's Shortform

Maybe AI Will Happen Outside US/China

I'm interested in the claim that important AI development (in the next few decades) will largely occur outside any of the states that currently look likely to lead AI development. I don't think this is likely, but I haven't seen discussion of this claim.[1] This would matter because it would greatly affect the environment in which AI is developed and affect which agents are empowered by powerful AI.

Epistemic status: brainstorm. May be developed into a full post if I learn or think more.

 

I. Causes

The big tech companie... (read more)

Zach Stein-Perlman's Shortform

Related idea, off the cuff, rough. Not really important or interesting, but might lead to interesting insights. Mostly intended for my future selves, but comments are welcome.

Binaries Are Analytically Valuable

Suppose our probability distribution for alignment success is nearly binary. In particular, suppose that we have high credence that, by the time we can create an AI capable of triggering an intelligence explosion, we will have

  • really solved alignment (i.e., we can create an aligned AI capable of triggering an intelligence explosion at reasonable ex
... (read more)
What 2026 looks like
  • Consequences (in expectation) if widely accepted: very good.
  • Compressibility: poor (at least, good compressions are not obvious).
  • Probability of (a compressed version) becoming widely accepted or Respectable Opinion: moderately low due to weirdness. Less weird explanations of why AI might not do what we want would be more Respectable and acceptable.
  • Leverage (i.e., increase in that probability from increased marginal effort to promote that meme): uncertain.
2 · Daniel Kokotajlo · 5mo: I disagree about compressibility; Elon said "AI is summoning the demon" and that's a five-word phrase that seems to have been somewhat memorable and memetically fit. I think if we had a good longer piece of content that expressed the idea and that lots of people could read/watch/play, then that would probably be enough.
What 2026 looks like

I tentatively agree but "people realizing it's more dangerous than nukes" has potential negative consequences too — an arms race is the default outcome of such national security threats/opportunities. I've recently been trying to think about different memes about AI and their possible effects... it's possible that memes like "powerful AI is fragile" could get the same regulation and safety work with less arms racing.

2 · Daniel Kokotajlo · 5mo: How about "AI is like summoning a series of increasingly powerful demons/aliens and trying to make them do what we want by giving them various punishments and rewards?"
What 2026 looks like

Thanks for writing this. Stories like this help me understand possibilities for the future (and understand how others think).

The US and many other Western governments are gears-locked, because the politicians are products of this memetic environment. People say it's a miracle that the US isn’t in a civil war already.

So far in your vignette, AI is sufficiently important and has sufficient public attention that any functional government would be (1) regulating it, or at least exerting pressure on the shape of AI through the possibility of regulation, and... (read more)

5 · Daniel Kokotajlo · 5mo: Excellent questions & pushback, thanks! Hmm, let me think... I think that if we had anything close to an adequate government, AI research would be heavily regulated already. So I'm not sure what you mean by "functional government." I guess you are saying that probably by 2026, if things go according to my story, the US government would be provoked by all the AI advancements to do more regulation of AI, a lot more, that would change the picture in some way and thus be worthy of mention? I guess my expectation is:

(a) The government will have "woken up" to AI by 2026 more than it has by 2021. It will be attempting to pass more regulations as a result.

(b) However, from the perspective of the government and mainstream USA, things in 2026 aren't that different from how they were in 2021. There's more effective censorship/propaganda and now there's all these nifty chatbot things, and there's more hype about the impending automation of loads of jobs, but whatever, hype cycles come and go and large numbers of jobs aren't actually being automated away yet.

(c) Everything will be hyperpartisan and polarized, so that in order to pass any regulation about AI the government will need to have a big political fight between Right and Left and whoever gets more votes wins.

(d) What regulations the government does pass will probably be ineffective or directed at the wrong goals. For example, when one party finally gets enough votes to pass stuff, they'll focus on whatever issues were most memetically fit in the latest news cycle rather than on the issues that actually matter long-term. On the issues that actually matter, meanwhile, they'll be listening to the wrong experts (those with political clout, fame, and the right credentials and demographics, rather than e.g. those with lots of alignment forum karma).

Thus, I expect that no regulations of note relating to AI safety or alignment will have been passed. For censorship and stuff, I expect that the left will be trying to undo
The Governance Problem and the "Pretty Good" X-Risk

Yes, I definitely consider (successful, philosophically sound) CEV to be a great use of superintelligence. An earlier draft mentioned CEV explicitly, but I decided to just mention the broader category "indirect normativity," which should include any sound method for specifying values indirectly.

Zach Stein-Perlman's Shortform

Value Is Binary

Epistemic status: rough ethical and empirical heuristic.

Assuming that value is roughly linear in resources available after we reach technological maturity,[1] my probability distribution of value is so bimodal that it is nearly binary. In particular, I assign substantial probability to near-optimal futures (at least 99% of the value of the optimal future), substantial probability to near-zero-value futures (between -1% and 1% of the value of the optimal future), and little probability to anything else.[2] To the extent that almost all of th... (read more)

1 · WilliamKiely · 2mo: After reading the first paragraph of your above comment only, I want to note that I assign much lower probability to near-optimal futures than near-zero-value futures. This is mainly because a lot of the "extremely good" possible worlds I imagine when reading Bostrom's Letter from Utopia [https://www.nickbostrom.com/utopia.html] are <1% of what is optimal. I also think the amount of probability I assign to 1%-99% futures is (~10x?) larger than the amount I assign to >99% futures. (I'd like to read the rest of your comment later (but not right now due to time constraints) to see if it changes my view.)
1 · Zach Stein-Perlman · 5mo: Related idea, off the cuff, rough. Not really important or interesting, but might lead to interesting insights. Mostly intended for my future selves, but comments are welcome.

Binaries Are Analytically Valuable

Suppose our probability distribution for alignment success is nearly binary. In particular, suppose that we have high credence that, by the time we can create an AI capable of triggering an intelligence explosion, we will have
  • really solved alignment (i.e., we can create an aligned AI capable of triggering an intelligence explosion at reasonable extra cost and delay) or
  • really not solved alignment (i.e., we cannot create a similarly powerful aligned AI, or doing so would require very unreasonable extra cost and delay).
(Whether this is actually true is irrelevant to my point.)

Why would this matter? Stating the risk from an unaligned intelligence explosion is kind of awkward: it's that the alignment tax [https://www.lesswrong.com/posts/HEZgGBZTpT4Bov7nH/mapping-the-conceptual-territory-in-ai-existential-safety#Alignment_tax_and_alignable_algorithms] is greater than what the leading AI project is able/willing to pay. Equivalently, our goal is for the alignment tax to be less than what the leading AI project is able/willing to pay. This gives rise to two nice, clean desiderata:
  • Decrease the alignment tax
  • Increase what the leading AI project is able/willing to pay for alignment
But unfortunately, we can't similarly split the goal (or risk) into two goals (or risks). For example, a breakdown into the following two goals does not capture the risk from an unaligned intelligence explosion:
  • Make the alignment tax less than 6 months and a trillion dollars
  • Make the leading AI project able/willing to spend 6 months and a trillion dollars on aligning an AI
It would suffice to achieve both of these goals, but doing so is not necessary. If we fail to reduce the alignment tax this far, we can compensate by doing better on the willingness-
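A compact way to state the point above (my notation, not in the original shortform): let T be the alignment tax and W be what the leading AI project is able/willing to pay. The goal is

$$T \le W,$$

and for any fixed threshold $c$ (e.g., 6 months and a trillion dollars), achieving both $T \le c$ and $W \ge c$ is sufficient for $T \le W$ but not necessary, since $T \le W$ only requires that some such threshold exist.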