Daniel Kokotajlo

Philosophy PhD student, worked at AI Impacts, now works at Center on Long-Term Risk. Research interests include acausal trade, timelines, takeoff speeds & scenarios, decision theory, history, and a bunch of other stuff. I subscribe to Crocker's Rules and am especially interested to hear unsolicited constructive criticism. http://sl4.org/crocker.html

Sequences

Timelines Grab Bag
Takeoff and Takeover in the Past and Future

Comments

Prediction can be Outer Aligned at Optimum

Well, at this point I feel foolish for arguing about semantics. I appreciate your post, and don't have a problem with saying that the malignity problem is an inner alignment problem. (That is zero evidence that it isn't also an outer alignment problem though!)

Evan's footnote-definition doesn't rule out malign priors unless we assume that the real world isn't a simulation. We may have good pragmatic reasons to act as if it isn't, but I still think you are changing the definition of outer alignment if you think it assumes we aren't in a simulation. But *shrug* if that's what people want to do, then that's fine I guess, and I'll change my usage to conform with the majority.

Eight claims about multi-agent AGI safety

Right, so... we need to make sure selection in AIs also has that property? Or is the thought that even if AIs evolve to be honest, it'll only be with other AIs and not with humans?

As an aside, I'm interested to see more explanations for altruism lined up side by side and compared. I just finished reading a book that gave a memetic/cultural explanation rather than a genetic one.

Strategic implications of AIs' ability to coordinate at low cost, for example by merging

This post is excellent, in that it has a very high importance-to-word-count ratio. It'll take up only a page or so, but convey a very useful and relevant idea, and moreover ask an important question that will hopefully stimulate further thought.

Prediction can be Outer Aligned at Optimum

Thanks, this is helpful.

--You might be right that an AI which assumes it isn't in a simulation is OK--but I think it's too early to conclude that yet. We should think more about acausal trade before concluding it's something we can safely ignore, even temporarily. There's a good general heuristic of "Don't make your AI assume things which you think might not be true" and I don't think we have enough reason to violate it yet.

--You say

For every AI-specification built with the abstraction "Given some finite training data D, the AI predicts the next data point X according to how common it is that X follows D across the multiverse", I think that AI is going to be misaligned (unless it's trained with data that we can't get our hands on, e.g. infinite in-distribution data), because of the standard universal-prior-is-misaligned-reasons. I think this holds true even if we're trying to predict humans like in IDA. Thus, this definition of "optimal performance" doesn't seem useful at all.

Isn't that exactly the point of the universal-prior-is-misaligned argument? The whole point of the argument is that this abstraction/specification (and related ones) is dangerous. So... I guess your title made it sound like you were teaching us something new about prediction (as in, prediction can be outer aligned at optimum), when really you are just arguing that we should change the definition of outer-aligned-at-optimum, and your argument is that the current definition makes outer alignment too hard to achieve? If this is a fair summary of what you are doing, then I retract my objections, I guess, and will reflect more.
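(For concreteness, here's a toy sketch, entirely my own and not from your post, of the abstraction under discussion: predict the next bit by putting a 2^-length prior over a hypothesis class and keeping only hypotheses consistent with the data so far. The malignity worry is about what dominates that weight when the class is "all programs", which a three-hypothesis toy obviously can't show; this is just to pin down what "optimal prediction under the prior" means here.)

```python
from fractions import Fraction

# Tiny stand-ins for "all programs": (description length in bits, generator for bit i).
HYPOTHESES = [
    (2, lambda i: 0),        # "all zeros"
    (3, lambda i: 1),        # "all ones"
    (4, lambda i: i % 2),    # "alternating 0, 1, 0, 1, ..."
]

def predict_next(data):
    """P(next bit = 1 | data) under a 2^-length prior, discarding inconsistent hypotheses."""
    weight = {0: Fraction(0), 1: Fraction(0)}
    for length, gen in HYPOTHESES:
        if all(gen(i) == bit for i, bit in enumerate(data)):  # consistent with D so far
            weight[gen(len(data))] += Fraction(1, 2 ** length)
    total = weight[0] + weight[1]
    return weight[1] / total if total else Fraction(1, 2)

print(predict_next([0, 1, 0, 1]))  # only "alternating" survives; it predicts 0 next, so this prints 0
```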

An Exploratory Toy AI Takeoff Model
By working and investing a part of the money to buy more hardware (e.g. by a cloud provider). This should grow roughly exponentially, at a similar speed to the Gross World Product (although the model does not consider wall-clock time)

Why would it be that slow? Companies, hedge funds, individuals' savings accounts, etc. often scale up much faster than 3%/year or so.
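To make the gap concrete, here's a quick compound-growth comparison (the 30%/year figure is just an illustrative stand-in for "much faster", not a number from the post):

```python
def grow(initial, annual_rate, years):
    """Capital (or hardware budget) after compounding at annual_rate for `years` years."""
    return initial * (1 + annual_rate) ** years

for rate in (0.03, 0.30):
    print(f"{rate:.0%}/year for 20 years: x{grow(1.0, rate, 20):.0f}")
# 3%/year  -> roughly x2 over 20 years
# 30%/year -> roughly x190 over 20 years
```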

Prediction can be Outer Aligned at Optimum
Due to the ordinary arguments about the universal prior being malign, this wouldn’t be outer aligned at optimum. Since this definition would mean that almost nothing is outer aligned, it seems like a bad definition. ...
As far as practical consequences go, I think this should be treated the same as the more general problem of the universal prior being malign. Thus, I’d like to categorise it as a problem with inner alignment; and I’d like to assume that an AI that’s outer aligned at optimum would act like it’s not in a simulation, if it is in fact not in a simulation.
This happens by default if our chosen definition of optimal performance treats being-in-a-simulation as a fixed fact about its environment – that the AI is expected to know – and not as a source of uncertainty. I think my preferred solutions above capture this by default[3]. For any solution based on how humans generalise, though, it would be important that the humans condition on not being in a simulation.

This is unsatisfying to me. First you say that we can't define optimum in the obvious way because then very few things would be outer aligned; then you say we should define optimum in such a way that the only way to be outer aligned is to assume you aren't in a simulation. (How else would we get an AI that acts like it's not in a simulation, if it is in fact not in a simulation? You can't tell whether you are in a simulation or not, by definition, so the only way for such an AI to exist is for it to always act like it's not in a simulation, i.e. to assume it isn't.) An AI that assumes it isn't in a simulation seems like a defective AI to me, so it's weird to build that into the definition of outer alignment.

It's possible I'm misunderstanding you though!

A vastly faster vaccine rollout

What do you think about this: https://marginalrevolution.com/marginalrevolution/2021/01/fact-of-the-day-get-to-those-rooftops.html

It claims that we could have had billions of doses several months ago, if we had been willing to pay the vaccine producers a few billion dollars more early on.

The "Commitment Races" problem

Thanks! Reading this comment makes me very happy, because it seems like you are now in a similar headspace to me back in the day. Writing this post was my response to being in this headspace.

But... I dunno man. I figured the first rule of Acausal Trade was "build a galaxy brain and think really goddamn carefully about acausal trade and philosophical competence" before you actually try simulating anything, and I'm skeptical a galaxy brain can't figure out the right precommitments.

This sounds like a plausibly good rule to me. But that doesn't mean that every AI we build will automatically follow it. Moreover, thinking about acausal trade is in some sense engaging in acausal trade. As I put it:

Since real agents can't be logically omniscient, one needs to decide how much time to spend thinking about things like game theory and what the outputs of various programs are before making commitments. When we add acausal bargaining into the mix, things get even more intense. Scott Garrabrant, Wei Dai, and Abram Demski have described this problem already, so I won't say more about that here. Basically, in this context, there are many other people observing your thoughts and making decisions on that basis. So bluffing is impossible and there is constant pressure to make commitments quickly before thinking longer. (That's my take on it anyway)

As for your handwavy proposals, I do agree that they are pretty good. They are somewhat similar to the proposals I favor, in fact. But these are just specific proposals in a big space of possible strategies, and (a) we have reason to think there might be flaws in these proposals that we haven't discovered yet, and (b) even if these proposals work perfectly there's still the problem of making sure that our AI follows them:

Objection: "Surely they wouldn't be so stupid as to make those commitments--even I could see that bad outcome coming. A better commitment would be..."
Reply: The problem is that consequentialist agents are motivated to make commitments as soon as possible, since that way they can influence the behavior of other consequentialist agents who may be learning about them. Of course, they will balance these motivations against the countervailing motive to learn more and think more before doing drastic things. The problem is that the first motivation will push them to make commitments much sooner than would otherwise be optimal. So they might not be as smart as us when they make their commitments, at least not in all the relevant ways. Even if our baby AGIs are wiser than us, they might still make mistakes that we haven't anticipated yet. The situation is like the centipede game: Collectively, consequentialist agents benefit from learning more about the world and each other before committing to things. But because they are all bullies and cowards, they individually benefit from committing earlier, when they don't know so much.
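(To spell out the centipede-game analogy for anyone unfamiliar with it, here's a toy backward-induction calculation; the payoffs are made up, and "take" stands in for making an early commitment.)

```python
def solve_centipede(pots, end_split):
    """Backward induction on a centipede game. pots[k] = (payoff to the stage-k mover,
    payoff to the other player) if the mover "takes" at stage k; end_split = (payoff to
    player 0, payoff to player 1) if both keep passing. Players 0 and 1 alternate moves,
    starting with player 0. Returns (first stage where someone takes, (p0, p1) payoffs)."""
    outcome, first_take = end_split, len(pots)
    for k in range(len(pots) - 1, -1, -1):           # reason backwards from the last stage
        mover = k % 2
        take = (pots[k][0], pots[k][1]) if mover == 0 else (pots[k][1], pots[k][0])
        if take[mover] >= outcome[mover]:            # taking now beats whatever waiting leads to
            outcome, first_take = take, k
    return first_take, outcome

# The pot grows every stage, and waiting to the end would give both players 60...
pots = [(3, 1), (5, 2), (9, 4), (17, 8), (33, 16), (65, 32)]
print(solve_centipede(pots, end_split=(60, 60)))
# -> (0, (3, 1)): each player reasons "take before the other does", so the game
#    ends immediately with tiny payoffs, even though waiting benefits both.
```

The analogy isn't exact, of course; the point is just the individually-rational-but-collectively-costly pressure to move first.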

If you want to think and talk more about this, I'd be very interested to hear your thoughts. Unfortunately, while my estimate of the commitment races problem's importance has only increased over the past year, I haven't done much to actually make intellectual progress on it.

A vastly faster vaccine rollout

Thanks! (Strong-upvoting for going against the conventional wisdom here, being polite, and willing to back it all up with discussion. I really hope you are right.)

OK, here goes: This diagram makes it seem that large-scale production only began after Phase II completed: https://www.ema.europa.eu/en/human-regulatory/overview/public-health-threats/coronavirus-disease-covid-19/treatments-vaccines/covid-19-vaccines-development-evaluation-approval-monitoring

Is that true? If not, when did large-scale production begin? I guess I want to know what sub-process took 10 months. Was it that they needed 10 months to build new factories to produce the vaccine, because old factories couldn't be repurposed? Was it that old factories could be repurposed but needed new equipment, which couldn't be 3D-printed but had to be made on a traditional assembly line that itself needed to be purpose-built?

Eight claims about multi-agent AGI safety

I think I was thinking that in multi-agent training environments there might actually be group selection pressure for honesty. (Or at least, there might be whatever selection pressures produced honesty in humans, even if that turns out to be something other than group selection.)
