Daniel Kokotajlo

Philosophy PhD student, worked at AI Impacts, now works at Center on Long-Term Risk. Research interests include acausal trade, timelines, takeoff speeds & scenarios, decision theory, history, and a bunch of other stuff. I subscribe to Crocker's Rules and am especially interested to hear unsolicited constructive criticism. http://sl4.org/crocker.html

Sequences

AI Timelines
Takeoff and Takeover in the Past and Future

Comments

Daniel Kokotajlo's Shortform

Years after I first thought of it, I continue to think that this chain reaction is the core of what it means for something to be an agent, AND why agency is such a big deal, the sort of thing we should expect to arise and outcompete non-agents. Here's a diagram:

Roughly, plans are necessary for generalizing to new situations, for being competitive in contests where there hasn't been time for natural selection to do lots of optimization of policies. But plans are only as good as the knowledge they are based on. And knowledge doesn't come a priori; it needs to be learned from data. And, crucially, data varies in quality: most of it is irrelevant or unimportant, so the high-quality data that gives you useful knowledge is hard to come by. Indeed, you may need to make a plan for how to get it. (Or more generally, being better at making plans makes you better at getting higher-quality data, which makes you more knowledgeable, which makes your plans better.)


Intermittent Distillations #2

Thanks for doing this! I'm excited to see this sequence grow; it's the sort of thing that could serve the function of a journal or textbook.

Homogeneity vs. heterogeneity in AI takeoff scenarios

OK, thanks. YMMV, but some people I've read / talked to seem to think that before we have successful world-takeover attempts, we'll have unsuccessful ones--"sordid stumbles." If this were true, it would be good news, because it would make it a LOT easier to prevent successful attempts. Alas, it is not true.

A much weaker version of something like this may be true, e.g. the warning shot story you proposed a while back about customer service bots being willingly scammed. It's plausible to me that we'll get stuff like that before it's too late.

If you think there's something we are not on the same page about here--perhaps what you were hinting at with your final sentence--I'd be interested to hear it.

Review of "Fun with +12 OOMs of Compute"

I'm probably just mathematically confused myself; at any rate, I'll proceed with the p[Tk & e+] : p[Tk & e-] version, since that comes more naturally to me. (I think of it like this: your credence in Tk is split between two buckets, the Tk&e+ bucket and the Tk&e- bucket, and when you update you rule out the e- bucket. So what matters is the ratio between the buckets; if it's relatively high (compared to the ratio for other Tx's) your credence in Tk goes up, and if it's relatively low it goes down.)

Anyhow, I totally agree that this ratio matters and that it varies with k. In particular, here's how I think it should vary for most readers of my post:

for k>12, the ratio should be low, like 0.1.

for low k, the ratio should be higher.

for middling k, say 6<k<13, the ratio should be in between.

Thus, the update should actually shift probability mass disproportionately to the lower k hypotheses.
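
To make the bucket picture concrete, here's a minimal sketch in Python (all priors and per-hypothesis ratios below are made-up illustrative numbers, not estimates from the post) of how ruling out the e- buckets and renormalizing shifts mass toward the hypotheses whose ratio is relatively high:

```python
# Hypothetical prior over hypotheses T_k ("TAI needs k more OOMs of compute"),
# and a hypothetical ratio r_k = p[Tk & e+] : p[Tk & e-] for each k
# (numbers are made up for illustration only).
prior = {3: 0.10, 6: 0.20, 9: 0.25, 12: 0.25, 15: 0.20}
ratio = {3: 9.0, 6: 4.0, 9: 2.0, 12: 1.0, 15: 0.1}  # low for k > 12, higher for low k

# Updating on e+ means ruling out each T_k's e- bucket and renormalizing.
# The mass left in T_k's e+ bucket is prior[k] * r_k / (1 + r_k).
surviving = {k: prior[k] * ratio[k] / (1 + ratio[k]) for k in prior}
total = sum(surviving.values())
posterior = {k: v / total for k, v in surviving.items()}

for k in sorted(prior):
    print(f"k={k:>2}: prior {prior[k]:.2f} -> posterior {posterior[k]:.2f}")
# Hypotheses whose ratio is high relative to the others (the low-k ones here)
# gain probability; the k > 12 hypothesis loses most of its mass.
```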

I realize we are sort of arguing in circles now. I feel like we are making progress though. Also, separately, want to hop on a call with me sometime to sort this out? I've got some more arguments to show you...

A Brief Review of Current and Near-Future Methods of Genetic Engineering

Thanks for this post!

"If some of the more pessimistic projections about the timelines to TAI are realized, my efforts in this field will have no effect. It is going to take at least 30 years for dramatically more capable humans to be able to meaningfully contribute to work in this field. Using Ajeya Cotra's estimate of the timeline to TAI, which estimates a 50% chance of TAI by 2052, I estimate that there is at most a 50% probability that these efforts will have an impact, and a ~25% chance that they will have a large impact. Those odds are good enough for me."

How low would the odds have to be before you would switch to doing something else? Would you continue with your current plan if the odds were 20% and 10% instead of 50% and 25%?

Review of "Fun with +12 OOMs of Compute"
"So what ends up mattering is the ratio p[Tk | e+] : p[Tk | e-]. I'm claiming that this ratio is likely to vary with k."

Wait, shouldn't it be the ratio p[Tk & e+] : p[Tk & e-]? Maybe both ratios work fine for our purposes, but I certainly find it more natural to think in terms of &.
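
For what it's worth, here's a quick check (using only the definition of conditional probability) that the two ratios differ by a factor that is the same for every k, so either one works for comparing hypotheses:

```latex
\frac{p(T_k \wedge e^+)}{p(T_k \wedge e^-)}
  = \frac{p(T_k \mid e^+)\, p(e^+)}{p(T_k \mid e^-)\, p(e^-)}
  = \frac{p(T_k \mid e^+)}{p(T_k \mid e^-)} \cdot \frac{p(e^+)}{p(e^-)}.
```

Since p[e+] : p[e-] doesn't depend on k, the & version and the | version rank the Tk's identically.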

Another (outer) alignment failure story

Thanks for this, this is awesome! I'm hopeful that in the next few years there will be a collection of stories like this.

"This is a story where the alignment problem is somewhat harder than I expect, society handles AI more competently than I expect, and the outcome is worse than I expect. It also involves inner alignment turning out to be a surprisingly small problem. Maybe the story is 10-20th percentile on each of those axes."

I'm a bit surprised that the outcome is worse than you expect, considering that this scenario is "easy mode" for societal competence and inner alignment, which seem to me to be very important parts of the overall problem. Am I right to infer that you think outer alignment is the bulk of the alignment problem, more difficult than inner alignment and societal competence?

Some other threads to pull on:

--In this story, there aren't any major actual wars, just simulated wars / war games. Right? Why is that? I look at the historical base rate of wars, and my intuitive model adds to that by saying that during times of rapid technological change it's more likely that various factions will get various advantages (or even just think they have advantages) that make them want to try something risky. OTOH we haven't had a major war for seventy years, and maybe that's because of nukes + other factors, and maybe nukes + other factors will still persist through the period of takeoff? IDK, I worry that the reasons why we haven't had war for seventy years may be largely luck / observer selection effects; and separately, even if that's wrong, I worry that the reasons won't persist through takeoff (e.g. some factions may develop ways to shoot down ICBMs, or prevent their launch in the first place, or may not care so much if there is nuclear winter).

--Relatedly, in this story the AIs seem to be mostly on the same team? What do you think is going on "under the hood" so to speak: Have they all coordinated (perhaps without even causally communicating) to cut the humans out of control of the future? Why aren't they fighting each other as well as the humans? Or maybe they do fight each other but you didn't focus on that aspect of the story because it's less relevant to us?

--Yeah, society will very likely not be that competent IMO. I think that's the biggest implausibility of this story so far.

--(Perhaps relatedly) I feel like when takeoff is that distributed, there will be at least some people/factions who create agenty AI systems that aren't even as superficially aligned as the unaligned benchmark. They won't even be trying to make things look good according to human judgment, much less augmented human judgment! For example, some AI scientists today seem to think that all we need to do is make our AI curious and then everything will work out fine. Others seem to think that it's right and proper for humans to be killed and replaced by machines. Others will try strategies even more naive than the unaligned benchmark, such as putting their AI through some "ethics training" dataset, or warning their AI "If you try anything I'll unplug you." (I'm optimistic that these particular failure modes will have been mostly prevented via awareness-raising before takeoff, but I do a pessimistic meta-induction and infer there will be other failure modes that are not prevented in time.)

--Can you say more about how "the failure modes in this story are an important input into treachery"?

Discontinuous progress in history: an update

On the contrary, the graph of launch costs you link seems to depict Falcon 9 as a 15-ish-year discontinuity in cost to orbit; I think you are misled by the projection, which is based on hypothetical future systems rather than on extrapolating from actual existing systems.

Daniel Kokotajlo's Shortform

I'm betting that a little buzz on my phone which I can dismiss with a tap won't kill my focus. We'll see.

Daniel Kokotajlo's Shortform

Productivity app idea:

You set a schedule of times you want to be productive, and a frequency, and then it rings you at random (but with that frequency) to bug you with questions like:

--Are you "in the zone" right now? [Y] [N]

--(if no) What are you doing? [text box] [common answer] [common answer] [...]

The point is to cheaply collect data about when you are most productive and what your main time-wasters are, while also giving you gentle nudges to stop procrastinating/browsing/daydreaming/doomscrolling/working-sluggishly, take a deep breath, reconsider your priorities for the day, and start afresh.

Probably wouldn't work for most people but it feels like it might for me.
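
Here's a rough sketch of the core loop (Python; console prompts stand in for the phone buzz, and the window, frequency, and log-file name are made-up placeholders):

```python
import random
import time
from datetime import datetime, time as dtime

# Made-up settings: productive window and average pings per hour.
WINDOW_START = dtime(9, 0)   # 9:00
WINDOW_END = dtime(17, 0)    # 17:00
PINGS_PER_HOUR = 2           # average ping frequency

def in_window(now: datetime) -> bool:
    return WINDOW_START <= now.time() <= WINDOW_END

def ping() -> None:
    in_zone = input('Are you "in the zone" right now? [y/n] ').strip().lower()
    if in_zone != "y":
        answer = input("What are you doing? ")
        with open("pings.log", "a") as f:  # cheap data collection for later review
            f.write(f"{datetime.now().isoformat()}\t{answer}\n")

while True:
    # Exponentially distributed gaps give "random, but with that frequency" pings.
    time.sleep(random.expovariate(PINGS_PER_HOUR / 3600))
    if in_window(datetime.now()):
        ping()
```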
