Nominated Posts for the 2019 Review

Posts need at least 2 nominations to continue into the Review Phase.
Nominate posts that you have personally found useful and important.
31 Calibrating With Cards [anonymous] 3y

2019 Review Discussion

[epistemic status: that's just my opinion, man. I have highly suggestive evidence, not deductive proof, for a belief I sincerely hold]

"If you see fraud and do not say fraud, you are a fraud." --- Nassim Taleb

I was talking with a colleague the other day about an AI organization that claims:

  1. AGI is probably coming in the next 20 years.
  2. Many of the reasons we have for believing this are secret.
  3. They're secret because if we told people about those reasons, they'd learn things that would let them make an AGI even sooner than they would otherwise.

His response was (paraphrasing): "Wow, that's a really good lie! A lie that can't be disproven."

I found this response refreshing, because he immediately jumped to the most likely conclusion.

Near predictions generate more funding

Generally, entrepreneurs who

This statement is not supported by the link used as a reference. Was it a lie? The reference speaks to failed intelligence and political manipulation using the perceived gap. The phrasing above suggests conspiracy.

National Intelligence Estimate (NIE) 11-10-57, issued in December 1957, predicted that the Soviets would "probably have a first operational capability with up to 10 prototype ICBMs" at "some time during the period from mid-1958 to mid-1959." The numbers started to inflate. A similar report gathered only a few months later, NIE 11-5-58, released in August 1958, concluded that the USSR had "the technical and industrial capability... to have an operational capability with 100 ICBMs" some time in 1960 and perhaps 500 ICBMs "some time in 1961, or at the latest

...

If the thesis in Unlocking the Emotional Brain (UtEB) is even half-right, it may be one of the most important books that I have read. Written by the psychotherapists Bruce Ecker, Robin Ticic and Laurel Hulley, it claims to offer a neuroscience-grounded, comprehensive model of how effective therapy works. In so doing, it also happens to formulate its theory in terms of belief updating, helping explain how the brain models the world and what kinds of techniques allow us to actually change our minds. Furthermore, if UtEB is correct, it also explains why rationalist techniques such as Internal Double Crux [1 2 3] work.

UtEB’s premise is that much if not most of our behavior is driven by emotional learning. Intense emotions generate unconscious predictive models of how...

3 Tim Freeman 6d
I read the book before reading this review. I have recently had success with the Coherence Therapy technique they describe, so I highly recommend the book.

I actually started reading the book, rage-quit in the middle, then came back to it years later and found it useful. I rage-quit because the section on EMDR was about a patient with panic attacks, EMDR was done, and afterward the patient still had panic attacks but they claimed the treatment was a success anyway. Any sensible interpretation would call this failure. So at least one of the authors does motivated cognition. If several therapists are writing a book together and the outcome is motivated cognition, they are all making a mistake that is within their area of competence to fix, and they failed to fix it. But, nevertheless, the Coherence Therapy parts of the book actually seem to work for me. I have to assume the coauthors didn't check each other's work, one of them cannot find and fix their own wrong emotional learnings, and the one(s) who wrote the Coherence Therapy parts didn't have that problem. Or maybe the Coherence Therapy parts are useful by luck.

Another issue is that it is apparently not unusual for a problem to need to be solved with Coherence Therapy several times before the symptom goes away. This is not well explained by their theory, but it seems to be true. The typical number is around 3, based on the examples in the book and my experience using the technique on myself. I tend to be stubborn, so if CT failed for a problem that is important to me, I hope I would try to use CT on it at least ten times before giving up.

Another problem is that they claim to be agnostic about which learnings are true and which are false. Nevertheless they start the process by identifying a symptom. The word "symptom" presupposes that beliefs that justify it are false. Even though they aren't as agnostic as they say, their technique appears to work. You have to ignore the pretend agnosticism to succeed with

I have recently had success with the Coherence Therapy technique they describe


I actually started reading the book, rage-quit in the middle, then came back to it years later and found it useful. I rage-quit because the section on EMDR was about a patient with panic attacks, EMDR was done, and afterward the patient still had panic attacks but they claimed the treatment was a success anyway.

Huh, I didn't remember this from my read. Searching for "panic attacks" in my copy now, there's the story of Susan who got EMDR for panic attacks, but my copy seems ...

(cross posted from my personal blog)

Since middle school I've generally thought that I'm pretty good at dealing with my emotions, and a handful of close friends and family have made similar comments. Now I can see that though I was particularly good at never flipping out, I was decidedly not good at "healthy emotional processing". I'll explain later what I think "healthy emotional processing" is; right now I'm using quotes to indicate "the thing that's good to do with emotions". Here it goes...

Relevant context

When I was a kid I adopted a strong, "Fix it or stop complaining about it" mentality. This applied to stress and worry as well. "Either address the problem you're worried about or quit worrying about it!" Also being a kid, I had a limited...

I'm trying to decide to what extent this applies to my lived experience, but finding it difficult to distinguish between maintaining a healthy tranquility and cultivating habitual impassivity. My intuition is that I've had both experiences, but the internal feedback for either is very similar. Both seem to involve putting a functional amount of distance between yourself and your emotional response, and - in my experience - the healthy habit does reinforce itself, just like the negative version. But then, sometimes, I find myself noticing the lack of an emo...

The justification for modelling real-world systems as “agents” - i.e. choosing actions to maximize some utility function - usually rests on various coherence theorems. They say things like “either the system’s behavior maximizes some utility function, or it is throwing away resources” or “either the system’s behavior maximizes some utility function, or it can be exploited” or things like that. Different theorems use slightly different assumptions and prove slightly different things, e.g. deterministic vs probabilistic utility function, unique vs non-unique utility function, whether the agent can ignore a possible action, etc.

One theme in these theorems is how they handle “incomplete preferences”: situations where an agent does not prefer one world-state over another. For instance, imagine an agent which prefers pepperoni over mushroom pizza when it has pepperoni,...

I mean the former: like, whatever "utility" is is not a simple thing to define in terms of things we have a handle on ("pleasurable mental states" does not count as a simple definition), and even if you allow yourself access to standard language about mental states I don't think it's so easy (e.g. there are a bunch of different sorts of mental states that might fall under the broad umbrella of "pleasure").

I do agree that "not for the sake of happiness alone" argues against utilitarianism.

This is a linkpost for

In 2008, Steve Omohundro's foundational paper The Basic AI Drives conjectured that superintelligent goal-directed AIs might be incentivized to gain significant amounts of power in order to better achieve their goals. Omohundro's conjecture bears out in toy models, and the supporting philosophical arguments are intuitive. In 2019, the conjecture was even debated by well-known AI researchers.

Power-seeking behavior has been heuristically understood as an anticipated risk, but not as a formal phenomenon with a well-understood cause. The goal of this post (and the accompanying paper, Optimal Policies Tend to Seek Power) is to change that.


It’s 2008, the ancient wild west of AI alignment. A few people have started thinking about questions like “if we gave an AI a utility function over world states, and it actually maximized that...

I've been thinking about whether these results could be interpreted pretty differently under different branding. The current framing, if I understand it correctly, is something like, 'Powerseeking is not desirable. We can prove that keeping your options open tends to be optimal and tends to meet a plausible definition of powerseeking. Therefore we should expect RL agents to seek power, which is bad.' An alternative framing would be, 'Making irreversible changes is not desirable. We can prove that keeping your options open tends to be optimal. Therefore we should not expect RL agents to make irreversible changes, which is good.' I don't think that the second framing is better than the first, but I do think that if you had run with it instead then lots of people would be nodding their heads and feeling reassured about corrigibility, instead of feeling like their views about instrumental convergence had been confirmed. That makes me feel like we shouldn't update our views too much based on formal results that leave so much room for interpretation. If I showed a bunch of theorems about MDPs, with no exposition, to two people with different opinions about alignment, I expect they might come to pretty different conclusions about what they meant. What do you think? (To be clear I think this is a great post and paper, I just worry that there are pitfalls when it comes to interpretation.)
Consider an agent navigating a tree MDP, with utility on the leaf nodes. At any internal node in the tree, ~most utility functions will have the agent retain options by going towards the branch with the most leaves. But all policies use up all available options -- they navigate to a leaf with no more power. I agree that we shouldn't update too hard for other reasons. E.g. this post's focus on optimal policies seems bad because reward is not the optimization target.
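
The claim about tree MDPs above can be checked numerically. What follows is only a minimal sketch under stated assumptions, not anything taken from the post or paper: leaf utilities are drawn independently and uniformly from [0, 1], the agent stands at a single internal node with two branches, and the function name `fraction_preferring_bigger_branch` is made up for this illustration. Under those assumptions the best leaf is equally likely to be any leaf, so the share of utility functions whose optimal move heads into a branch should be close to that branch's share of the leaves.

```python
import random


def fraction_preferring_bigger_branch(small_leaves, big_leaves, trials=20000, seed=0):
    """Estimate how often the optimal move is into the branch with `big_leaves` leaves.

    Assumption (ours, for illustration): each leaf's utility is an independent
    uniform draw from [0, 1], and the optimal policy simply walks to the
    best leaf reachable from the current node.
    """
    rng = random.Random(seed)
    prefers_big = 0
    for _ in range(trials):
        small = [rng.random() for _ in range(small_leaves)]
        big = [rng.random() for _ in range(big_leaves)]
        # The optimal policy enters whichever branch contains the best leaf.
        if max(big) > max(small):
            prefers_big += 1
    return prefers_big / trials


# One branch with a single leaf versus one branch with three leaves.
print(fraction_preferring_bigger_branch(1, 3))
```

With one leaf on one side and three on the other, the estimate should land near 3/4. The sketch says nothing about the reversible-branch variant discussed in the replies, and a different distribution over utility functions could give different proportions.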

Fair enough, but in that example making irreversible decisions is unavoidable. What if we consider a modified tree such that one and only one branch is traversable in both directions, and utility can be anywhere?

I expect we get that the reversible branch is the most popular across the distribution of utility functions (but not necessarily that most utility functions prefer it). That sounds like cause for optimism—'optimal policies tend to avoid irreversible changes'.

The generalized efficient markets (GEM) principle says, roughly, that things which would give you a big windfall of money and/or status, will not be easy. If such an opportunity were available, someone else would have already taken it. You will never find a $100 bill on the floor of Grand Central Station at rush hour, because someone would have picked it up already.

One way to circumvent GEM is to be the best in the world at some relevant skill. A superhuman with hawk-like eyesight and the speed of the Flash might very well be able to snag $100 bills off the floor of Grand Central. More realistically, even though financial markets are the ur-example of efficiency, a handful of firms do make impressive amounts of money by...

In other words, how hard is it to reach the Pareto frontier?


Lately I've come to think of human civilization as largely built on the backs of intelligence and virtue signaling. In other words, civilization depends very much on the positive side effects of (not necessarily conscious) intelligence and virtue signaling, as channeled by various institutions. As evolutionary psychologist Geoffrey Miller says, "it’s all signaling all the way down."

A question I'm trying to figure out now is, what determines the relative proportions of intelligence vs virtue signaling? (Miller argued that intelligence signaling can be considered a kind of virtue signaling, but that seems debatable to me, and in any case, for ease of discussion I'll use "virtue signaling" to mean "other kinds of virtue signaling besides intelligence signaling".) It seems that if you get too much of one type


Hey folks, just to say I gave this talk on Intelligence Signalling :) inspired by Hanson

Credit for the question to Eli Tyre.

Preface + Epistemic Status

I spent roughly two days attempting to learn the answer to this question, plus several more writing it up. What is presented is more accurately described as a partial answer or contribution towards answering the question - this report isn’t actually a confident, solid answer. The question is too large for that.

I need to provide a couple of epistemic caveats:

  • I am not experienced at this kind of research, I don’t know what kind of rookie mistakes I might be making.
  • I have not attempted to assess the reliability of the historians who I quote, though my prior is to be less than completely confident. It might be too easy to start with a conclusion, a nice narrative, and
0 Spencer S 3mo
Chinese, and Northeast Asians in general, tend to have fewer geniuses. Europeans have a lower average IQ than NE Asians, but the tail end on the right side of the bell curve extends further out -- Europeans have more geniuses. Explosive technological progress is entirely dependent on having a critical mass of geniuses. It's really that simple. Below average people, average people, even very bright people, don't invent technologies that create a paradigm shift (to use the cliche). Those with an outlier IQ -- high abstract reasoning, but also high creativity, a sort of autistic, maverick attitude -- are the ones who make earth-shattering technologies and discoveries. The spirit of independence may also be a factor in the gap between East and West. NE Asians might lack psychological traits of independence and high creativity. Or at least they may lack them to the degree that Europeans have them. The European gene pool seems to have reached the peak of this genius sometime in the later 19th century and has been slowly declining since then. This explains the deceleration of progress.

holy citation needed batman! among other serious issues with this perspective, variance in capability has many input factors; while genetics is involved, memetics and material circumstances also make a significant difference - guns, germs, and steel all can serve to amplify or reduce effective intelligence. society has been on a 12k-year energy availability growth process since the start of farming, and there have been many variations on the way - I don't deny that there may be genetic differences. but I don't think we can even conclude there's been a dece...

1. A group wants to try an activity that really requires a lot of group buy in. The activity will not work as well if there is doubt that everyone really wants to do it. They establish common knowledge of the need for buy in. They then have a group conversation in which several people make comments about how great the activity is and how much they want to do it. Everyone wants to do the activity, but is aware that if they did not want to do the activity, it would be awkward to admit. They do the activity. It goes poorly.

2. Alice strongly wants to believe A. She searches for evidence of A. She implements a biased search, ignoring evidence against A. She finds justifications...

it was strange to read it. it was interesting - explaining a point i already know in a succinct and effective way. and it connects nicely with the extensive discussion on consent and boundaries. Boundaries: Your Yes Means Nothing if You Can’t Say No

and then, when i was reading the comments and still internalizing the post i got it - i actually re-invented this concept myself! it could have been so nice not to have to do it... i wrote my own post about it - in Hebrew. its name translates to Admit that sometimes the answer is "yes", and it starts with a ...

is it? i find it a very Christian way of thinking, and this thought pattern seems obviously wrong to me. it's incorporated into Western culture, but i live in a non-Christian place. you can believe in Heaven for all! some new-age people believe in that. you can believe in Heaven for all except the especially blameworthy - this is how i understand Mormonism. thanks for the insight! now i can recognize one pretty toxic thought-pattern as Christian influence, and understand it better!

This essay is an adaptation of a talk I gave at the Human-Aligned AI Summer School 2019 about our work on mesa-optimisation. My goal here is to write an informal, accessible and intuitive introduction to the worry that we describe in our full-length report.

I will skip most of the detailed analysis from our report, and encourage the curious reader to follow up this essay with our sequence or report.

The essay has six parts:

Two distinctions draws the foundational distinctions between “optimised” and “optimising”, and between utility and reward.

What objectives? discusses the behavioral and internal approaches to understanding objectives of ML systems.

Why worry? outlines the risk posed by the utility ≠ reward gap.

Mesa-optimisers introduces our language for analysing this worry.

An alignment agenda sketches different alignment problems presented by these ideas,...

A part of me is worried that the terminology invites viewing mesa-optimisers as a description of a very specific failure mode, instead of as a language for the general worry described above.

I have been very confused about the term for a very long time, and have always thought mesa-optimisers is a very specific failure mode.

This post helped me clear things up.


Kaj Sotala has an outstanding review of Unlocking The Emotional Brain; I read the book, and Kaj’s review is better.

He begins:

UtEB’s premise is that much if not most of our behavior is driven by emotional learning. Intense emotions generate unconscious predictive models of how the world functions and what caused those emotions to occur. The brain then uses those models to guide our future behavior. Emotional issues and seemingly irrational behaviors are generated from implicit world-models (schemas) which have been formed in response to various external challenges. Each schema contains memories relating to times when the challenge has been encountered and mental structures describing both the problem and a solution to it.

So in one of the book’s example cases, a man named Richard sought help for...

Whatever the case, I am often exhausted when dealing with such issues.

Good post though.

For instance, certain high-pitched sounds are terrible for my ears. They make me lose focus and make my eyes close.

It's so bad that I literally feel as though there is pain in my mind.

Schema? Or auditory thing? 

It never happens with other sounds, just with this pitch. 

Same problem with focus. 

I can clearly be aware how the little tribes in my mind come together to defeat the invaders, but once the battle is over they part ways, and go back, or if they have to ...

This post originally appeared here; I've updated it slightly and posted it here as a follow-up to this post.

David Friedman has a fascinating book on alternative legal systems. One chapter focuses on prison law - not the nominal rules, but the rules enforced by prisoners themselves.

The unofficial legal system of California prisoners is particularly interesting because it underwent a phase change sometime after the 1960’s.

Prior to the 1960’s, prisoners ran on a decentralized code of conduct - various unwritten rules roughly amounting to “mind your own business and don’t cheat anyone”. Prisoners who kept to the code were afforded some respect by their fellow inmates. Prisoners who violated the code were ostracized, making them fair game for the more predatory inmates. There was no formal enforcement; the...

I'm very curious about what the actual rules of the various gangs look like. If they exist in written form in an environment where it's easy to confiscate documents I would expect them to be publicly accessible.

Very interesting article. Some of the social engineering implications are really interesting. (I don't mean social engineering in a bad way). For example: if you believe the gang system in prisons is leading to increased re-offending rates as people who leave the prison system are recruited into gang structures outside of it then you can think about having more small prisons with prisoners moving between locations very infrequently.

This post begins the Immoral Mazes sequence. See introduction for an overview of the plan. Before we get to the mazes, we need some background first.

Meditations on Moloch

Consider Scott Alexander’s Meditations on Moloch. I will summarize here. 

Therein lie fourteen scenarios where participants can be caught in bad equilibria.

  1. In an iterated prisoner’s dilemma, two players keep playing defect.
  2. In a dollar auction, participants massively overpay.
  3. A group of fishermen fail to coordinate on using filters that efficiently benefit the group, because they can’t punish those who don’t profit by not using the filters.
  4. Rats are caught in a permanent Malthusian trap where only those who do nothing but compete and consume survive. All others are outcompeted.
  5. Capitalists serve a perfectly competitive market, and cannot pay a living wage.
  6. The tying of all good

damn, i'm not sure; maybe it was my Twitter cover picture: (this is content i modified from someone else)

Previously: Online discussion is better than pre-publication peer review, Disincentives for participating on LW/AF

Recently I've noticed a cognitive dissonance in myself, where I can see that my best ideas have come from participating on various mailing lists and forums (such as cypherpunks, extropians, SL4, everything-list, LessWrong and AI Alignment Forum), and I've received a certain amount of recognition as a result, but when someone asks me what I actually do as an "independent researcher", I'm embarrassed to say that I mostly comment on other people's posts, participate in online discussions, and occasionally a new idea pops into my head and I write it down as a blog/forum post of my own. I guess that's because I imagine it doesn't fit most people's image of what a researcher's


FP often leads to long, winding discussions that may end with two researchers agreeing, but the resulting transcript is not great for future readers.

Highly agree. Often valuable content in discussions - either what you wrote or what the other person wrote - just gets lost.

Rereading your old discussions and then distilling the useful stuff into posts is a full time job.

This is a major reason I find very lengthy comments not often worth writing. I wonder if there is a way to solve this problem. (Maybe more advanced search functionality on your comments?)

An actual debate about instrumental convergence, in a public space! Major respect to all involved, especially Yoshua Bengio for great facilitation.

For posterity (i.e. having a good historical archive) and further discussion, I've reproduced the conversation here. I'm happy to make edits at the request of anyone in the discussion who is quoted below. I've improved formatting for clarity and fixed some typos. For people who are not researchers in this area who wish to comment, see the public version of this post here. For people who do work on the relevant areas, please sign up in the top right. It will take a day or so to confirm membership.

Original Post

Yann LeCun: "don't fear the Terminator", a short opinion piece by Tony Zador and me that was just...

It would be great to have a summary or distillation of this conversation.

2 Evan R. Murphy 7mo
(Responding to the above comment years later...) It seems like "amateur" AI safety researchers have been the main ones willing to seriously think about AGI and on-the-horizon advanced AI systems from a safety angle though. However, I do think you're pointing to a key potential blindspot in the AI safety community. Fortunately AI safety folks are studying ML more, and I think ML researchers are starting to be more receptive to discussions about AGI and safety. So this may become a moot point.
Has LeCun changed his mind on any of these points since this debate?

I've been thinking more about partial agency. I want to expand on some issues brought up in the comments to my previous post, and on other complications which I've been thinking about. But for now, a more informal parable. (Mainly because this is easier to write than my more technical thoughts.)

This relates to oracle AI and to inner optimizers, but my focus is a little different.


Suppose you are designing a new invention, a predict-o-matic. It is a wondrous machine which will predict everything for us: weather, politics, the newest advances in quantum physics, you name it. The machine isn't infallible, but it will integrate data across a wide range of domains, automatically keeping itself up-to-date with all areas of science and current events. You fully expect that...

"If someone had a strategy that took two years, they would have to over-bid in the first year, taking a loss. But then they have to under-bid on the second year if they're going to make a profit, and--"

"And they get undercut, because someone figures them out."

I think one could imagine scenarios where the first trader can use their influence in the first year to make sure they are not undercut in the second year, analogous to the prediction market example. For instance, the trader could install some kind of encryption in the software that this company use...

I've been discoursing more privately about the corruption of discourse lately, for reasons that I hope are obvious at least in the abstract, but there's one thing I did think was shareable. The context is another friend's then-forthcoming blog post about the politicization of category boundaries.

In private communication, quoted with permission, Jessica Taylor wrote:

In a world where half the employees with bad jobs get good titles, aren't their titles predictively important in that they predict how likely they are to be hired by outside companies? Their likelihood of getting hired is, under these assumptions, going to be the same as that of people with good jobs and good titles, and higher than that of people with bad jobs and bad titles. So, in terms of things...

When I was halfway through this and read about the 4 stages, they immediately seemed to me to correspond to four types of news reporting:

  1. Accurate reporting
  2. Misleading reporting (i.e. distorting real events, and fooling many people)
  3. Fake news (i.e. completely made up, but still pretending to be news, and fooling some people)
  4. Obviously false or 'pure fiction' (i.e. not even pretending to be news, and fooling no-one). You do get this kind of thing in the crappiest tabloids like the UK's Sunday Sport or maybe the US's National Enquirer. A well-known example in th
...

Previously in Immoral Mazes sequence: Moloch Hasn’t Won

Perfect Competition

In Meditations on Moloch, Scott points out that perfect competition destroys all value beyond the axis of competition.

Which, for any compactly defined axis of competition we know about, destroys all value.

This is mathematically true.

Yet value remains.

Thus competition is imperfect.

(Unjustified-at-least-for-now note: Competition is good and necessary. Robust imperfect competition, including evolution, creates all value. Citation/proof not provided due to scope concerns but can stop to expand upon this more if it seems important to do so. Hopefully this will become clear naturally over course of sequence and/or proof seems obvious once considered.)

Perfect competition hasn’t destroyed all value. Fully perfect competition is a useful toy model. But it isn’t an actual thing.

Some systems and markets get close. For now they remain the...

Agreed. Zvi's proposition also simply doesn't align with first-world people's motivations, as far as I can tell. In short, first-worlders have a lot of other interesting ways that they can use their time.

[Epistemic Status: Scroll to the bottom for my follow-up thoughts on this from months/years later.]

Early this year, Conor White-Sullivan introduced me to the Zettelkasten method of note-taking. I would say that this significantly increased my research productivity. I’ve been saying “at least 2x”. Naturally, this sort of thing is difficult to quantify. The truth is, I think it may be more like 3x, especially along the dimension of “producing ideas” and also “early-stage development of ideas”. (What I mean by this will become clearer as I describe how I think about research productivity more generally.) However, it is also very possible that the method produces serious biases in the types of ideas produced/developed, which should be considered. (This would be difficult to quantify at the best of...

Re the "Depth-first vs Breadth-first" distinction for idea development: IDA* is ok as far as a loose analogy to personally searching the idea tree goes, but I think this is another instance where there's a (first-order) trade-off between individual epistemic rationality and social epistemology. What matters is that someone discovers good ideas on AI alignment, not whether any given person does. As such, we can coordinate with other researchers in order to search different branches of the idea tree, and this is more like multithreaded/parallel/distributed tree search. We want to search branches that are neglected, in our comparative advantage, and we shouldn't be trying to maximise the chance that we personally discover the best idea. Instead, we should collectively act according to the rule that maximises the chance that someone in the community discovers the best idea. Individually, we are parallel threads of the same search algorithm.
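
For readers who have not met IDA* (iterative deepening A*), the search procedure used as an analogy above, here is a minimal single-threaded sketch. It is not taken from the post or the comments: the toy "idea tree" `GRAPH`, its edge costs, and the zero heuristic are all hypothetical and chosen only for illustration. The search runs depth-first under a cost bound, then raises the bound to the smallest estimate that exceeded it and starts again.

```python
import math


def ida_star(start, is_goal, neighbors, heuristic):
    """Return a lowest-cost path from `start` to a goal node, or None.

    `neighbors(node)` yields (next_node, step_cost) pairs; `heuristic(node)`
    must never overestimate the remaining cost for the result to be optimal.
    """
    path = [start]

    def search(cost_so_far, bound):
        node = path[-1]
        estimate = cost_so_far + heuristic(node)
        if estimate > bound:
            return estimate  # report how far past the bound we went
        if is_goal(node):
            return True
        smallest_overshoot = math.inf
        for nxt, step_cost in neighbors(node):
            if nxt in path:  # avoid cycling back along the current path
                continue
            path.append(nxt)
            result = search(cost_so_far + step_cost, bound)
            if result is True:
                return True
            smallest_overshoot = min(smallest_overshoot, result)
            path.pop()
        return smallest_overshoot

    bound = heuristic(start)
    while True:
        result = search(0, bound)
        if result is True:
            return list(path)
        if result == math.inf:
            return None  # nothing left to deepen into
        bound = result  # deepen to the next promising cost level


# Hypothetical idea tree: node -> list of (child, cost of developing it).
GRAPH = {
    "root": [("a", 1), ("b", 4)],
    "a": [("c", 5), ("d", 1)],
    "b": [("goal", 1)],
    "c": [],
    "d": [("goal", 2)],
    "goal": [],
}

best_path = ida_star(
    "root",
    is_goal=lambda node: node == "goal",
    neighbors=lambda node: GRAPH[node],
    heuristic=lambda node: 0,  # trivially admissible placeholder
)
print(best_path)
```

A parallel or multi-researcher version, as discussed in this thread, would need some way to divide branches among searchers and share bounds; this sketch does not attempt that.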
I agree with the general sentiment that paying attention to group optimality, not just individual optimality, can be very important.

However, I am a bit skeptical of giving this too much importance when thinking about your research. If we're all doing what's collectively best, we must personally be doing what gives us the highest expectation of contributing (not of getting credit, but of contributing). If this were not the case, then it follows that there is at least one single person who could change their strategy to have a better chance of contributing. So "in an appropriate sense" we should still do what's best for our personal research.

It does not follow that if everyone is doing what seems personally best for their research, the group is following a collectively optimal path. However, I think it's somewhat hard to produce a counterexample which doesn't involve strategizing about who gets the credit.

Here's a simplistic argument that you're wrong: the "only way" to help create good ideas is by having good ideas. This isn't really true (for example, I might trigger your bright idea by some innocuous action). However, it seems mostly true in the realm of purposeful research.

Anyway, with respect to IDA*, I'm curious exactly what you meant by "first order".

I don't yet see why IDA* is unsuitable in the multi-researcher context. You can set up the search problem to be a sub-problem that's been farmed out to you, if you're part of a larger research program which is organized by farming out sub-questions. You can integrate information from other people in your search-tree, EG updating nodes with higher heuristic values if other people think they're promising. You can use an IDA* tree to coordinate a parallelized search, instead of a purely sequential search like IDA*. (Perhaps this is the change you're trying to point at?)

Some related questions:

  • How much should you focus on reading what other people do, vs doing your own things?
  • How much sho

I'm curious exactly what you meant by "first order". 

Just that the trade-off is only present if you think of "individual rationality" as "let's forget that I'm part of a community for a moment".  All things considered, there's just rationality, and you should do what's optimal.

First-order: Everyone thinks that maximizing insight production means doing IDA* over idea tree. Second-order: Everyone notices that everyone will think that, so it's no longer optimal for maximizing insights produced overall. Everyone wants to coordinate with everyone else...

Since the CAIS technical report is a gargantuan 210 page document, I figured I'd write a post to summarize it. I have focused on the earlier chapters, because I found those to be more important for understanding the core model. Later chapters speculate about more concrete details of how AI might develop, as well as the implications of the CAIS model on strategy. ETA: This comment provides updates based on more discussion with Eric.

The Model

The core idea is to look at the pathway by which we will develop general intelligence, rather than assuming that at some point we will get a superintelligent AGI agent. To predict how AI will progress in the future, we can look at how AI progresses currently -- through research and development (R&D)...

I agree that in the long term, agent AI could probably improve faster than CAIS, but I think CAIS could still be a solution.

Regardless of how it is aligned, aligned AI will tend to improve slower than unaligned AI, because it is trying to achieve a more complicated goal, human oversight takes time, etc. To prevent unaligned AI, aligned AI will need a head start, so it can stop any unaligned AI while it's still much weaker. I don't think CAIS is fundamentally different in that respect.

If the reasoning in the post that CAIS will develop before AGI holds up, then CAIS would actually have an advantage, because it would be easier to get a head start.
