Yeah, I do feel confused about the extent to which the solution to this problem is just "selectively become dumber" (e.g. as discussed by Habryka here). However, I have faith that there are a bunch of Pareto improvements to be made—for example, I think that less neuroticism helps you get less pwned without making you dumber in general. (Though as a counterpoint, maybe neuroticism was useful for helping people identify AI risk?) I'd like to figure out theories of virtue and emotional health good enough to allow us to robustly identify other such Pareto improvements.
A related thought that I had recently: fertility decline seems like a rough proxy for "how pwned are you getting by memes", and fertility is strongly anticorrelated with population-level intelligence. So you have East Asians getting hit hardest by the fertility crisis, then white populations, then South Asians, while African fertility is still very high. Obviously this is confounded by factors like development and urbanization, though, so it's hard to say if intelligence mediates the decline directly or primarily via creating wealth—but it does seem like e.g. East Asians are getting hit disproportionately hard. (Plausibly there's some way to figure this out more robustly by looking at subpopulations.)
Yepp, this is true. However, I believe that there are strategies for avoiding such memes other than "being smart". Two of these strategies broadly correspond to what we call "being virtuous" and "being emotionally healthy". See my exchange with Wei Dai here, and this sequence, for more.
But we run into problems when there are moral patients inside of, or under the care and responsibility of, moral agents (that might or might not be moral patients in and of themselves), because attending to the wellbeing of the inner moral patient entails violating the boundary of the larger moral agent or otherwise distorting just treatment of that agent.
Yepp, this is a great way of putting it.
I'm totally unwilling to write those people off because they happen to have been born into an unlucky situation. But it does seem like there's some philosophy to figure out here about how to help those people without creating bad incentives for the moral agents that they're contained within.
Yeah, agree that we shouldn't write them off, and that there's some way to balance these two things. (One way I think about politics is that one faction has refused to consider "without creating bad incentives" and in response the other faction is now polarizing towards refusing to consider "help those people". And we've now reached the point where these refusals commonly serve as vice signals on each side.)
Relatedly, my phrasing "the point of ethics" in my earlier message was too strong. I should have instead said something like "Although ethics has facets related to dealing with moral patients and other facets related to dealing with moral agents, the latter should generally have primacy, because (mis)aligning other moral agents is a big force multiplier (positively or negatively)."
“eg Most of a human's personal welfare depends on the country of their birth which is not due to their own behavior.”
But it was dependent on their ancestors’ behavior. And so insofar as you view tribes/ethnic groups/countries as playing games with/against each other, then the same logic applies at that higher level.
Now you might reject that viewpoint, and take a purely individualist stance. But my claim above is (loosely, I haven’t made it precise yet) that the point of ethics is to move us beyond thinking of ourselves as individual units, so that we can make decisions as a larger-scale moral agent.
And from the perspective of the larger moral agent you’re instantiated within, there’s a big difference between people who were born in your home country and people born on other continents—because the former are part of that same moral agent to a much greater degree than the latter. (And yes, they may be worse at “being part of the moral superagent” than a foreigner would be. But this is all negotiated via Schelling points in games between millions of agents, so you can’t just pick and choose your coalition. You need an initiation ritual like a naturalization process or a judicial trial to change that coalition.) Analogously, some of your time-slices are less good at “being Eli” (according to your dominant identity narrative) than time-slices of some other people. But it’s still just for them to benefit or lose out based on the actions of your other time-slices.
(Written quickly on phone, please forgive infelicities of phrasing.)
mostly it does not match my practical experience so far
I mostly wouldn't expect it to at this point, FWIW. The people engaged right now are by and large people sincerely grappling with the idea, and particularly people who are already bought into takeover risk. Whereas one of the main mechanisms by which I expect misuse of the idea is that people who are uncomfortable with the concept of "AI takeover" can still classify themselves as part of the AI safety coalition when it suits them.
As an illustration of this happening to Paul's worldview, see this Vox article titled "AI disaster won't look like the Terminator. It'll be creepier." My sense is that both Paul and Vox wanted to distance themselves from Eliezer's scenarios, and so Paul phrased his scenario in a way which downplayed stuff like "robot armies" and then Vox misinterpreted Paul to further downplay that stuff. (More on this from Carl here.) Another example: Sam Altman has previously justified racing to AGI by appealing to the idea that a slow takeoff is better than a fast takeoff.
Now, some of these dynamics are unavoidable—we shouldn't stop debating takeoffs just because people might misuse the concepts. But it's worth keeping an eye out for ideas that are particularly prone to this, and gradual disempowerment seems like one.
in practical terms, gradual disempowerment does not seem like a particularly convenient set of ideas for justifying that working at an AGI company on something very prosaic which helps the company is the best thing to do.
Well, it's much more convenient than "AI takeover", and so the question is how much people are motivated to use it to displace the AI takeover meme in their internal narratives.
when trying to support thinking about the problems, we use understanding-directed labels/pointers (Post-AGI Civilizational Equilibria), even though in many ways it could have been easier to use GD as a brand.
Kudos for doing so. I don't mean to imply that you guys are unaware of this issue or negligent; IMO it's a pretty hard problem to avoid. I agree that stuff like "understanding power" is nowhere near adequate as a replacement. However, I do think that there's some concept like "empowering humans" which is a way to address both takeover risk and gradual disempowerment risk, if we fleshed it out into a proper research field. (Analogously, ambitious mechinterp is a way to address both fast take-off and slow take-off risks.) And so I expect that a cluster forming around something like human empowerment would be more productive and less prone to capture.
avoiding using the term could be done mostly by either inventing another term for the dynamic, or not thinking about the dynamic, or similar moves, which seem epistemically unhealthy
Yeah, "avoid using it altogether" would be too strong. Maybe something more like "I'll avoid using it as a headline/pointer to a cluster of people/ideas, and only use it to describe the specific threat model".
Many of Paul Christiano's writings were valuable corrections to the dominant Yudkowskian paradigm of AI safety. However, I think that many of them (especially papers like Concrete Problems in AI Safety and posts like these two) also ended up providing a lot of intellectual cover for people to do "AI safety" work (especially within AGI companies) that isn't even trying to be scalable to much more powerful systems.
I want to register a prediction that "gradual disempowerment" will end up being (mis)used in a similar way. I don't really know what to do about this, but I intend to avoid using the term myself. My own research on related topics I cluster under headings like "understanding intelligence", "understanding political philosophy", and "understanding power". To me this kind of understanding-oriented approach seems more productive than trying to create a movement based around a class of threat models.
The real world is not just though
Fair point; I've just weakened my phrasing in the section you quoted.
However, I do think the world is much closer to just in some important ways than most cultural elites think. E.g. for questions like "whose fault is it that poor countries are poor?" or "whose fault is it that poor people in rich countries are poor?", the answer "it's mostly their own fault" is somewhat taboo in elite circles.
To be clear, considerations of justice and blame on a collective level rather than an individual level are pretty complicated. But I think we do have to grapple with them in order to reason about ethics in any sensible way.
Literally just as I was finishing writing up this post, I heard a commotion outside my house (in San Francisco). A homeless-looking man was yelling and throwing an electric guitar down the road. Apparently this had been going on for 5-10 minutes already. I sat in my window and watched for a few minutes; during that time, he stopped a car by standing in front of it and yelling. He also threw his guitar in the vicinity of several passers-by, including some old people and a mother cycling past with her kid.
There was a small gathering (of 5-10 people) at my house at this time. They were mostly ignoring it. I felt like this was wrong, and was slowly gathering up willpower to intervene. In hindsight I moved slowly because I was worried that a) he'd hit me with his guitar if I did, or b) he'd see which house I came out from and try to smash my windows or similar. But I wasn't very worried, because I knew I could bring a few friends out with me.
Before I ended up doing anything, though, a man stopped his car and started yelling at the homeless guy quite aggressively, things like "Get the fuck out of here!" I immediately went outside to offer support in case the homeless guy got aggressive, but he didn't need it; the homeless guy was already grabbing his stuff. He was somewhat apologetic but still kinda defensive (saying things like "it's not my fault, man, it's society"). At one point he turned to my friend and asked "were you bothered?" and my friend said "it was a bit loud".
As he left, he picked up his guitar again. The man who'd stopped turned around and yelled "Leave that guitar!" The homeless guy threw it again, the man ran over to pick it up, and then the homeless guy left. A few minutes later, two police cars pulled up—apparently someone else had called them.
Overall it was an excellent illustration of why virtue ethics is important. We should have confronted him as soon as we'd noticed him causing a ruckus, both so that (much more defenseless) passers-by didn't need to worry, and to preemptively prevent any escalation from him. But small niggles about him escalating meant that our fear ended up winning out, and made San Francisco a slightly less safe place. Even on the small things—like responding "it was a bit loud" instead of "you were being an asshole, quit scaring people"—it's very easy to instinctively flinch away from taking appropriate action. To avoid that, cultivating courage and honesty seems crucial.
A toy model of ethics which I've found helpful lately:
Consider society as a group of reinforcement learners, each getting rewards from interacting with the environment and each other.* We can then define two moral motivations: altruism, the desire to increase the rewards that other agents receive; and justice**, the desire that agents be rewarded or punished in accordance with how well they behave.
Importantly, if you have one faction that's primarily optimizing for altruism, and another that's primarily optimizing for justice, by default they'll undermine each other's goals: the altruists keep compensating people for the penalties that the justice-minded are trying to impose, while the justice-minded keep resisting help for people they consider undeserving.
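To make that concrete, here's a minimal numerical sketch. The agents, numbers, and update rules are my own illustrative assumptions (the model above doesn't pin any of them down); the point is just to show each motive partially undoing the other's work:

```python
# Toy illustration: a "justice" step and an "altruism" step acting on the same agents.
# Agents, scores, and update rules are invented for this sketch.

behavior = {"A": 3.0, "B": 1.5, "C": 0.5}   # how well each agent has behaved
rewards  = {"A": 1.0, "B": 1.0, "C": 1.0}   # rewards they currently receive

def justice_step(rewards, behavior):
    """Redistribute the existing total so rewards are proportional to behavior."""
    total = sum(rewards.values())
    weight = sum(behavior.values())
    return {a: total * behavior[a] / weight for a in rewards}

def altruism_step(rewards, boost=0.5):
    """Top up whoever is currently worst off, regardless of their behavior."""
    worst_off = min(rewards, key=rewards.get)
    return {a: r + (boost if a == worst_off else 0.0) for a, r in rewards.items()}

after_justice  = justice_step(rewards, behavior)   # C's reward drops to reflect bad behavior
after_altruism = altruism_step(after_justice)      # ...and is promptly topped back up
print(after_justice)    # {'A': 1.8, 'B': 0.9, 'C': 0.3}
print(after_altruism)   # {'A': 1.8, 'B': 0.9, 'C': 0.8}
```

Run these two steps in a loop and they fight indefinitely: the justice step keeps pushing C's reward down toward what C's behavior warrants, and the altruism step keeps topping it back up.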
One way of thinking about the last few decades (and possibly centuries) is that ethical thinking has become dominated by altruism, to the point where being ethical and being altruistic are near-synonymous to many people (especially utilitarians). At an extreme, it leads to reasoning like in the comic below:
Of course, positively reinforcing misbehavior will tend to produce more misbehavior (both by teaching those who misbehave to do it again, and by making well-behaved people feel like chumps). And so more thoughtful utilitarians will defend justice as an instrumental moral good, albeit not as a terminal moral good. Unfortunately, it seems very hard to actually hold this position without in practice deprioritizing justice (e.g. it's rare to see effective altruists reasoning themselves into trying to make society more just).
I think this difficulty is related to why consequentialism is wrong. This is a tricky topic to write about, but one core intuition is that before figuring out how to act, you need to figure out who is acting. For example, before trying to plan for the future, you need to have a sense of personal identity whereby your future self will feel a sense of continuity with and loyalty to your plans.
We can analogously view justice (and other moral intuitions which I'm ignoring in this simplified analysis) as mechanisms for holding society together as a moral agent which is able to act coherently at all. And so people who think that individuals should choose actions on the basis of their consequences are putting the locus of agency in the wrong place—it's like saying that each ten-second timeslice of you should choose actions based on their consequences. Instead, something closer to virtue ethics is a far better approach.
Is this still consistent with some version of consequentialism? In some sense yes, in another sense no. Mostly I expect that the viewpoint I've outlined above will, when explored carefully enough, dissolve the standard debate between different branches of ethics. This is conceptually tricky to work through, though, and I'll save further discussion for another post.
* The main reason I call this a toy model is that viewing people as reward-maximizers is itself assuming a kind-of-consequentialist viewpoint. I think we actually want a much richer conception of what it means to help and hurt people, but "increase or decrease reward" is so much easier to describe that I decided to use it here.
** Justice isn't quite the right term here, because it implies being rewarded/punished for a specific action rather than being rewarded/punished for being generally good/bad; the same goes for "accountability". "Fairness" might be better, except that it's been co-opted by egalitarian notions of fairness. Other suggestions welcome—maybe something related to karma?
(I expect that Scott, Abram or some others have already pointed this out, but somehow this clicked for me only recently. Pointers to existing discussions appreciated.)
A Bayesian update can be seen as a special case of a prediction market resolution.
Specifically, a Bayesian update is the case where each "hypothesis" has bet all its wealth across some combination of outcomes, and then the pot is winner-takes-all (or split proportionally when there are multiple winners).
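To see that this is an exact correspondence rather than a loose analogy, here's a small sketch (hypotheses, priors, and likelihoods are made up for illustration) in which parimutuel resolution of an all-in bet reproduces the Bayesian posterior:

```python
# Each hypothesis's wealth is its prior; it stakes that wealth across outcomes in
# proportion to its likelihoods; when the outcome is observed, the pot is split
# among the winners in proportion to their stakes on that outcome.

priors = {"H1": 0.5, "H2": 0.3, "H3": 0.2}
likelihoods = {                               # P(outcome | hypothesis)
    "H1": {"heads": 0.9, "tails": 0.1},
    "H2": {"heads": 0.5, "tails": 0.5},
    "H3": {"heads": 0.1, "tails": 0.9},
}

def market_resolution(priors, likelihoods, observed):
    stakes = {h: priors[h] * likelihoods[h][observed] for h in priors}
    pot = sum(priors.values())                # everyone bet all their wealth
    winning_stake = sum(stakes.values())
    # (If every hypothesis had put zero stake on the observed outcome,
    # winning_stake would be 0 and the payout would be undefined.)
    return {h: pot * stakes[h] / winning_stake for h in priors}

def bayes_update(priors, likelihoods, observed):
    evidence = sum(priors[h] * likelihoods[h][observed] for h in priors)
    return {h: priors[h] * likelihoods[h][observed] / evidence for h in priors}

print(market_resolution(priors, likelihoods, "heads"))
print(bayes_update(priors, likelihoods, "heads"))
# Both print the same distribution: approximately {'H1': 0.726, 'H2': 0.242, 'H3': 0.032}
```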
The problem with Bayesianism is then obvious: what happens when there are no winners? Your epistemology is "bankrupt", the money vanishes into the ether, and bets on future propositions are undefined.
So why would a hypothesis go all-in like that? Well, that's actually the correct "cooperative" strategy in a setting where you're certain that at least one of them is exactly correct.
To generalize Bayesianism, we want to instead talk about what the right "cooperative" strategy is when a) you don't think any of them are exactly correct, and b) each hypothesis has goals too, not just beliefs.