Comments

You've probably thought more about this scenario than I have, so I'd be interested in hearing more about how you think it will play out. (Do you have links to where you've discussed it previously?) I was speaking mostly in relative terms, as slowing down rival AGI efforts in the ways I described seems more promising/realistic/safer than any of the other "pivotal acts" I had previously heard of or thought of.

Coincidentally, I've been looking into why developed countries (especially East Asian ones) have such low and declining fertility. My conclusion so far is that people really, really value positional goods, i.e., things that signal social status, such as prestigious degrees and jobs, homes in the best locations in the biggest cities, luxury goods like cars and handbags, and children and mates that one can "show off", to such an extent that this often overrides their desires for family, companionship, comfort, and leisure, and even their respect for tradition and filial piety. Many choose to have one high-status (i.e., "successful") child instead of two or more lower-status children, or to remain single instead of marrying someone perceived to lower their social status, or to have no children instead of harming their career prospects, or to force their kids into after-school lessons instead of giving them happy childhoods.

Aside from making it hard to raise fertility rates (South Korea's fertility rate just declined from 0.78 to 0.72 in one year, despite government policies aimed at increasing fertility), the existence of positional goods also acts as a counterweight to non-rival goods: whereas non-rival goods suggest that a higher population benefits everyone, positional goods imply that a higher population can make me (and everyone else) worse off by increasing competition for them.
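
To make the "counterweight" point a bit more concrete, here is a minimal toy model (my own illustrative sketch; the functional forms are assumptions, not taken from anywhere): let person $i$'s utility be

$$U_i = g(N) + v\!\left(\frac{r_i}{N}\right) - c(N)$$

where $N$ is population size, $g(N)$ is the benefit from non-rival goods (increasing in $N$), $r_i/N$ is person $i$'s relative rank, and $c(N)$ is the cost of competing for position (credentials, housing in the best locations, etc.), also plausibly increasing in $N$. The positional term $v$ is zero-sum: the distribution of relative ranks doesn't improve as $N$ grows, so any population-scaling benefit has to come from $g$. If $c$ grows faster than $g$, a larger population leaves the average person worse off, which is the counterweight described above.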

ETA: Found a Twitter thread giving a bunch of other (perhaps more tractable) reasons for low fertility.

If humans are mostly a kludge of impulses, including the humans you are training, then... what exactly are you hoping to empower using "rationality training"? I mean, what wants-or-whatever will they act on after your training? What about your "rationality training" will lead them to take actions as though they want things? What will the results be?

To give a straight answer to this: if I were doing rationality training (if I were agenty enough to do something like that), I'd have the goal that the trainees finish the training with the realization that they don't know what they want, or don't currently want anything, but that they may eventually figure out what they want or come to want something, and that therefore in the interim they should accumulate resources/optionality, avoid doing harm (things that might eventually be considered irreversibly harmful), and push towards eventually figuring out what they want. And I'd probably also teach a bunch of things to mitigate the risk that the trainees too easily convince themselves that they've figured out what they want.

There's also a good scenario where the US develops an AGI that is capable of slowing down rival AGI development, but not so capable and misaligned that it causes serious problems, which gives people enough time to solve alignment well enough to bootstrap to AI solving alignment.

I'm feeling somewhat optimistic about this, because the workload involved in slowing down a rival AGI effort doesn't seem so high that it couldn't be monitored/understood fully or mostly by humans, and the capabilities required also don't seem so high that any AI that could do it would be inherently very dangerous or hard to control.

I'm pretty sure (though not 100%) that "science doesn't know for sure" that "benevolent government" is literally mathematically impossible. So I want to work on that! <3

Public choice theory probably comes closest to showing this. Please look into that if you haven't already. And I'm interested to know what approach you want to work on.

Personally, I think the entire concept of government should be rederived from first principles from scratch and rebooted, as a sort of "backup fallback government" for the entire planet, with AI and blockshit.

I think unfortunately this is very unlikely in the foreseeable future (absent superintelligent AI). Humans and their relationships are just too messy to fully model with our current theoretical tools, whereas existing institutions have often evolved to take more of human nature into account (e.g., academia leveraging people's status striving to produce knowledge for the world, militaries leveraging solidarity with fellow soldiers to overcome selfishness/cowardice).

As an investor I'm keenly aware that we're not even close to deriving the governance of a publicly held corporation from first principles. Once somebody solves that problem, I'd become much more excited about doing the same thing for government.

The US could try to slow down the Chinese AGI effort, for example:

  1. brick a bunch of their GPUs (hack their data centers and push firmware updates that put the GPUs into unusable/unfixable states)
  2. introduce backdoors or subtle errors into various deep learning frameworks
  3. hack their AGI development effort directly (in hard to detect ways like introducing very subtle errors into the training process)
  4. spread wrong ideas about how to develop AGI

If you had an AGI that you could trust to do tasks like these, maybe you could delay a rival AGI effort indefinitely?

What are the best "this is not a big deal" takes? (I want to see them before making up my mind on how big of a deal this is.)

Can you say something about your motivation to work on this? Why not leave it to future AI and/or humanity to figure out? Or what are the most important questions in this area to answer now?

It was not at all clear to me that you intended "they are likely to encounter" to have some sort of time horizon attached to it (as opposed to some other kind of restriction, or the possibility that you meant something quite different from the literal meaning, or that your argument/idea itself was wrong), and it's still not clear to me what sort of time horizon you have in mind.

So you don’t need to “target the inner search,” you just need to get the system to act the way you want in all the relevant scenarios.

Your original phrase was "all scenarios they are likely to encounter", but now you've switched to "relevant scenarios". Do you not acknowledge that these two phrases are semantically very different (or likely to be interpreted very differently by many readers), since the modern world is arguably a scenario that "they are likely to encounter" (given that they actually did encounter it) but you say "the modern world is not a relevant scenario for evolution"?

Going forward, do you prefer to talk about "all scenarios they are likely to encounter", or "relevant scenarios", or both? If the latter, could you clarify what you mean by "relevant"? (And please answer with respect to both evolution and AI alignment, in case the answer is different in the two cases. I'll probably have more substantive things to say once we've cleared up the linguistic issues.)