Thanks for the confirmation!

In addition to what you say, I would also guess that  is a reasonable guess for P(no events in time t) when t > T, if it's reasonable to assume that events are Poisson-distributed. (but again, open to pushback here :)

Great post, thanks for sharing! 

I don't have good intuitions about the Gamma distribution, and I'd like to have good intuitions for computing your Rule's outcomes in my head. Here's a way of thinking about it -- do you think it makes sense?

Let  denote either  or  (whichever your rule says is appropriate).

I notice that for , your probability of zero events , where  is what I'd call the estimated event rate 

So one nice intuitive interpretation of your rule is that, if we assume event times are exponentially distributed, we should model the rate as  . Does that sound right? It's been a while since I've done a ton of math, so I wouldn't be surprised if I'm missing something here. 
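For intuition, here is a minimal sketch of the standard Poisson fact this comment leans on. It is not the post's rule, whose exact formulas are not reproduced here. If events arrive as a Poisson process with a constant rate, waiting times are exponentially distributed and the probability of seeing zero events in a window of length t is exp(-rate * t). The function name `prob_no_events` and the rate value are assumptions made up for illustration.

```python
import math


def prob_no_events(rate: float, t: float) -> float:
    """P(zero events in a window of length t) for a Poisson process
    with a constant event rate (events per unit time)."""
    if rate < 0 or t < 0:
        raise ValueError("rate and t must be non-negative")
    return math.exp(-rate * t)


# Hypothetical numbers, for illustration only.
assumed_rate = 0.1  # events per year (an assumption, not from the post)
for window in (1, 10, 50):
    print(window, prob_no_events(assumed_rate, window))
```

How the rate itself should be estimated from the observation period is exactly what the post's rule addresses. This sketch simply takes the rate as given.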

In general, this post has prompted me to think more about the transition period between AI that's weaker than humans and stronger than all of human civilization, and that's been interesting! A lot of people assume that that takeoff will happen very quickly, but if it lasts for multiple years (or even decades) then the dynamics of that transition period could matter a lot, and trade is one aspect of that.

Some stray thoughts on what that transition period could look like:

  • Some doomy-feeling states don't immediately kill us. We might get an AI that's able to defeat humanity before it's able to cheaply replicate lots of human labor, because it gets a decisive strategic advantage via specialized skill in some random domain and can't easily skill itself up in other domains.
  • When would an AI prefer to trade rather than coerce or steal?
    • maybe if the transition period is slow, and it knows it's in the earlier part of the period, so reputation matters
    • maybe if it's being cleverly watched or trained by the org building it, since they want to avoid bad press 
    • maybe there's some core of values you can imprint that leads to this? but maybe actually being able to solve this issue is basically equivalent to solving alignment, in which case you might as well do that.
  • In a transition period, powerful human orgs would find various ways to interface with AI and vice versa, since they would be super useful tools / partners for each other. Even if the transition period is short, it might be long enough to change things, e.g. by getting the world's most powerful actors interested in building + using AI and not leaving it in the hands of a few AGI labs, by favoring labs that build especially good interfaces & especially valuable services, etc. (By contrast, in a world with a short takeoff rather than a long transition period, maybe big tech & governments don't recognize what's happening before ASI / doom.)

I love the genre of "Katja takes an AI risk analogy way more seriously than other people and makes long lists of ways the analogous thing could work." (the previous post in the genre being the classic "Beyond fire alarms: freeing the groupstuck.")

Digging into the implications of this post: 

In sum, for AI systems to be to humans as we are to ants, would be for us to be able to do many tasks better than AI, and for the AI systems to be willing to pay us grandly for them, but for them to be unable to tell us this, or even to warn us to get out of the way. Is this what AI will be like? No. AI will be able to communicate with us, though at some point we will be less useful to AI systems than ants could be to us if they could communicate.

I'm curious how much you think the arguments in this post should affect our expectations of AI-human relations overall? At its core, my concern is:

  • sure, the AI will definitely trade with useful human organizations / institutions when it's weak (~human-level), 
  • and it might trade with them a decent amount when it's strong but not world-defeating (~human-organization-level)
  • eventually AI will be human-civilization-level, and probably soon after that it's building Dyson spheres and stuff. Why trade with humanity then? Do we have a comparative advantage, or are we just a waste of atoms?

I can think of a few reasons that human-AI trade might matter for the end-state:

  1. We can bargain for the future while AIs are relatively weak. i.e., when humans have stuff AI wants, they can trade the stuff for an assurance that when the AI is strong, it'll give us .000001% of the universe. 
    1. This requires both leverage (to increase the share we get) and verification / trust (so the AI keeps its promise). If we have a lot of verification ability, though, we could also just try to build safe AIs? 
    2. (related: this Nate post saying we can't assume unaligned AIs will cooperate / trade with us, unless we can model them well enough to distinguish a true commitment from a lie. See "Objection: But what if we have something to bargain with?") 
  2. It seems possible that an AI built by humans and trained on human-ish content ends up with some sentimental desire for "authentic" human goods & services. 
    1. In order for this to end up good for the humans, we'd want the AI to value this pretty highly (so we get more stuff), and have a concept of "authenticity" that means that it doesn't torture / lobotomize us to get what it wants.  
    2. This is mostly by analogy to humans buying "authentic" products of other, poorer humans, but there's a spectrum between goods/services, and something more like "the pleasure of seeing something exist" a la zoo animals. 
      1. (goofy attempt at illustrating the spectrum: a woven shawl vs a performed monologue vs reality tv vs a zoo.)
      2. So a simpler, perhaps more likely, version of the 'desirable authentic labor' possibility is a 'human zoo', where the AI just likes having "authentic" humans around, which is not very tradelike. But maybe the best bad AI case we could hope for is something like this -- Earth left as a 'human zoo' while the AI takes over the rest of the lightcone.

Maybe one useful thought experiment is whether we could train a dog-level intelligence to do most of these tasks if it had the actuators of an ant colony, given our good understanding of dog training (~= "communication") and the fact that dogs still lack a bunch of key cognitive abilities humans have (so dog-human relations are somewhat analogous to human-AI relations). 

(Also, ant colonies in aggregate do pretty complex things, so maybe they're not that far off from dogs? But I'm mostly just thinking of Douglas Hofstadter's "Aunt Hillary" here :)

My guess is that for a lot of Katja's proposed trades, you'd only need the ants to have a moderate level of understanding, something like "dog level" or "pretty dumb AI system level". (e.g. "do thing X in situations where you get inputs Y that were associated with thing-we-actually-care-about Z during the training session we gave you".) 

The 'failure to communicate' is therefore in fact a failure to be able to think and act at the required level of flexibility and abstraction, and that seems more likely to carry over to our relations with some theoretical, super advanced AI or civilisation.

Definitely true that you're a more valuable trade partner if you're smarter. But there are some particularly useful intelligence/comms thresholds that we meet and ants don't -- e.g. the "dog level", plus some self-awareness stuff, plus not-awful world models in some domains.

Meta: the dog analogy ignores the distinction between training and trading. I'm eliding this here because it's hard to know what an ant colony's "considered opinion" / "reflective endorsement" would mean, let alone an ant's, but of course this matters a lot for AGI-human interactions. Consider an AGI that keeps humans around on a "human preserve" out of sentiment, but only cares about certain features of humanity and genetically modifies others out of existence (analogous to training out certain behaviors or engaging in selective breeding), or tortures / brainwashes humans to get them to act the way it wants. (These failure modes of "having things an AI wants, and being able to give it those things, but not defend yourself" are also alluded to in other comments here, e.g. gwern and Elisabeth's comments about "the noble wolf" and torture, respectively.)

Yeah. It's conceivable you have an AI with some sentimental attachment to humans that leaves part of the universe as a "nature preserve" for humans. (Less analogous to our relationship with ants and more to charismatic flora and megafauna.)

In light of the FTX thing, maybe a particularly important heuristic is to notice cases where the worst-case is not lower-bounded at zero. Examples:

  • Shorting stock vs buying put options
  • An ambitious startup that fails usually just goes to zero, but what if it has committed funding & tied its reputation to lots of important things that will now struggle?
  • More twistily -- what if you're committing to a course of action s.t. you'll likely feel immense pressure to take negative-EV actions later on, like committing fraud in order to save your company or pushing for more AI progress so you can stay in the lead?

This isn't to say you should never do things with potentially large negative downsides, but you can be a lot more willing to experiment when the downside is capped at zero.
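To make the first example concrete, here is a rough sketch of the two payoffs at expiry. It ignores fees, borrowing costs, margin calls, and early exercise, and all prices are hypothetical. The function names are made up for illustration. One caveat: the put's worst case is losing the premium paid rather than exactly zero, but the loss is capped, while the short's loss grows without bound as the price rises.

```python
def short_stock_pnl(entry_price: float, final_price: float) -> float:
    """Profit/loss per share from shorting at entry_price and
    buying back at final_price (no fees or borrow costs)."""
    return entry_price - final_price


def long_put_pnl(strike: float, premium: float, final_price: float) -> float:
    """Profit/loss per share at expiry from buying one put:
    the payoff max(strike - final_price, 0) minus the premium paid."""
    return max(strike - final_price, 0.0) - premium


# Hypothetical scenario: stock at 100, put struck at 100 costing 5.
for final_price in (50.0, 100.0, 200.0, 1000.0):
    print(
        final_price,
        short_stock_pnl(100.0, final_price),
        long_put_pnl(100.0, 5.0, final_price),
    )
```

Both positions profit if the price falls, but only the short position can lose more than was put in.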

Thanks for your posts, Scott! This has been super interesting to follow.

Figuring out where to set the AM-GM boundary strikes me as maybe the key consideration wrt whether I should use GM -- otherwise I don't know how to use it in practical situations, plus it just makes GM feel inelegant. 

From your VNM-rationality post, it seems like one way to think about the boundary is commensurability. You use AM within clusters whose members are willing to sacrifice for each other (are willing to make Kaldor-Hicks improvements, and have some common currency s.t. "K-H improvement" is well-defined; or, in another framing, have a meaningfully shared utility function). Maybe that's roughly the right notion to start with? But then it feels strange to me to not consider things commensurate across epistemic viewpoints, especially if those views are contained in a single person (though GM-ing across internal drives does seem plausible to me).

I'd love to see you (or someone else) explore this idea more, and share hot takes about how to pin down the questions you allude to in the AM-GM boundary section of this post: where to set this boundary, examples of where you personally would set it in different cases, and what desiderata we should have for boundary-setting eventually. (It feels plausible to me that having maximally large clusters is in some important sense the right thing to aim for).
