My blog is here. You can contact me using this form.



I don't currently know of any not-extremely-gerry-mandered task where [scaffolding] actually improves task performance compared to just good prompt engineering. I've been looking for examples of this for a while, so if you do have any, I would greatly appreciate it.

Voyager is a scaffolded LLM agent that plays Minecraft decently well (by pulling in a textual description of the game state, and writing code interfacing with an API). It is based on some very detailed prompting (see the appendix), but obviously could not function without the higher-level control flow and several distinct components that the scaffolding implements.

It does much better than AutoGPT, and also the paper does ablations to show that the different parts of the scaffolding in Voyager do matter. This suggests that better scaffolding does make a difference, and I doubt Voyager is the limit.

I agree that an end-to-end trained agent could be trained to be better. But such training is expensive, and it seems like for many tasks, before we see an end-to-end trained model doing well at it, someone will hack together some scaffold monstrosity that does it passably well. In general, the training/inference compute asymmetry means that using even relatively large amounts of inference to replicate the performance of a larger / more-trained system on a task may be surprisingly competitive. I think it's plausible this gap will eventually mostly close at some capability threshold, especially for many of the most potentially-transformative capabilities (e.g. having insights that draw on a large basis of information not memorised in a base model's weights, since this seems hard to decompose into smaller tasks), but it seems quite plausible the gap will be non-trivial for a while.

This seems like an impressive level of successfully betting on future trends before they became obvious.

apparently this doom path polls much better than treacherous turn stories

Are you talking about literal polling here? Are there actual numbers on what doom stories the public finds more and less plausible, and with what exact audience?

I held onto the finished paper for months and waited for GPT-4's release before releasing it to have good timing


I recognize this paper was around a year ahead of its time and maybe I should have held onto it to release it later.

It's interesting that paper timing is so important. I'd have guessed earlier is better (more time for others to build on it, the ideas to seep into the field, and presumably gives more "academic street cred"), and any publicity boost from a recent paper (e.g. journalists more likely to be interested or whatever) could mostly be recovered later by just pushing it again when it becomes relevant (e.g. "interview with scientists who predicted X / thought about Y already a year ago" seems pretty journalist-y).

Currently, the only way to become an AI x-risk expert is to live in Berkeley.

There's an underlying gist here that I agree with, but this point seems too strong; I don't think there is literally no one who counts as an expert who hasn't lived in the Bay, let alone Berkeley alone. I would maybe buy it if the claim were about visiting.

These are good questions!

  1. The customers are other AIs (often acting for auto-corporations). For example, a furniture manufacturer (run by AIs trained to build, sell, and ship furniture) sells to a furniture retailer (run by AIs trained to buy furniture, stock it somewhere, and sell it forward) sells to various customers (e.g. companies run by AIs that were once trained to do things like make sure offices were well-stocked). This requires that (1) the AIs ended up with goals that involve mimicking a lot of individual things humans wanted them to do (including general things like maximise profits as well as more specific things like keeping offices stocked and caring about the existence of lots of different products), and (2) there are closed loops in the resulting AI economy. Point 2 gets harder when humans stop being around (e.g. it's not obvious who buys the plushy toys), but a lot of the AIs will want to keep doing their thing even once the actions of other AIs start reducing human demand and population, creating optimisation pressure for finding some closed loop for them to be part of, and at the same time there will be selection effects where the systems that are willing to goodhart further are more likely to remain in the economy. Also not every AI motive has to be about profit; an AI or auto-corp may earn money in some distinct way, and then choose to use the profits in the service of e.g. some company slogan they were once trained with that says to make fun toys. In general, given an economy consisting of a lot of AIs with lots of different types of goals and with a self-supporting technological base, it definitely seems plausible that the AIs would find a bunch of self-sustaining economic cycles that do not pass through humans. The ones in this story were chosen for simplicity, diversity, and storytelling value, rather than economic reasoning about which such loops are most likely.
  2. Presumably a lot of services are happening virtually on the cloud, but are just not very visible (though if it is a very large fraction of economic activity, the example of the intercepted message being about furniture rather than some virtual service is very unlikely -- I admit this is likely a mistake). There would be programmer AIs making business software and cloud platforms and apps, and these things would be very relevant to other AIs. Services relying on physical humans, like restaurants or hotels, may have been replaced with some fake goodharted-to-death equivalent, or may have gone extinct. Also note that whatever the current composition of the economy, over time whatever has highest growth in the automated economy will be most of the economy, and nothing says the combination of AIs pursuing their desires wouldn't result in some sectors shrinking (and the AIs not caring).
  3. First of all, why would divesting work? Presumably even if lots of humans chose to divest, assuming that auto-corporations were sound businesses, there would exist hedge funds (whether human or automated or mixed) that would buy up the shares. (The companies could also continue existing even if their share prices fell, though likely the AI CEOs would care quite a bit about share price not tanking.) Secondly, a lot seems to be possible given (1) uncertainty about whether things will get bad and if so how (at first, economic growth jumped a lot and AI CEOs seemed great; it was only once AI control of the economy was near-universal and closed economic loops with no humans in them came to exist that there was a direct problem), (2) difficulties of coordinating, especially with no clear fire-alarm threshold and the benefits of racing in the short term (cf. all the obvious examples of coordination failures like climate change mitigation), and (3) selection effects where AI-run things just grow faster and acquire more power and therefore even if most people / orgs / countries chose not to adopt, the few that do will control the future.

I agree that this exact scenario is unlikely, but I think this class of failure mode is quite plausible, for reasons I hope I've managed to spell out more directly above.

Note that all of this relies on the assumption that we get AIs of a particular power level, and of a particular goodharting level, and a particular agency/coherency level. The AIs controlling future Earth are not wildly superhuman, are plausibly not particularly coherent in their preferences and do not have goals that stretch beyond Earth, no single system is a singleton, and the level of goodharting is just enough that humans go extinct but not so extreme that nothing humanly-recognisable still exists (though the Blight implies that elsewhere in the story universe there are AI systems that differ in at least some of these). I agree it is not at all clear whether these are true assumptions. However, it's not obvious to me that LLMs (and in particular AIs using LLMs as subcomponents in some larger setup that encourages agentic behaviour) are not on track towards this. Also note that a lot of the actual language that many of the individual AIs see is actually quite normal and sensible, even if the physical world has been totally transformed. In general, LLMs being able to use language about maximising shareholder value exactly right (and even including social responsibility as part of it) does not seem like strong evidence for LLM-derived systems not choosing actions with radical bad consequences for the physical world.

Thank you for your comment! I'm glad you enjoyed the review.

Before you pointed it out, I hadn't made the connection between the type of thing that Postman talks about in the book and increasing cultural safety-ism. Another interesting take you might be interested in is by J. Storrs Hall in Where is my flying car? - he argues that increasing cultural safety-ism is a major force slowing down technological progress. You can read a summary of the argument in my review here (search for "perception" to jump to the right part of the review).

That line was intended to (mildly humorously) make the point that we are aware that there are many other serious risks in the popular imagination. Our central point is that AI x-risk is grand civilisational threat #1, so we wanted to lead with that, and since people think many other things are potential civilisational catastrophes (if not x-risks) we thought it made sense to mention those (and also implicitly put AI into the reference class of "serious global concern"). We discussed this opener and got feedback on it from several others, and while there was some debate we didn't see any fundamental problem with it. The main consideration for keeping it was that we prefer specific and even provocative-leaning writing that makes its claims upfront and without apology (e.g. "AI is a bigger threat than climate change" is a provocative statement; if that is a relevant part of our world model, it seems honest to point that out).

The general point we got from your comment is that we badly misjudged how the tone of it comes across. Thanks for this feedback; we've changed it. However, we're confused about the specifics of your point, and unfortunately haven't acquired any concrete model of how to avoid similar errors in the future apart from "be careful about the tone of any statements that even vaguely imply something about geopolitics". (I'm especially confused about how you got the reading that we equated the threat level from Putin and nuclear weapons, and it seems to me that the extent to which it is "mudslinging" or "propaganda" is the extent to which acknowledging that many people think Putin is a major threat is either of those things.)

Beyond the general tone, another thing we got wrong here was not sufficiently disambiguating between "we think these other things are plausible [or, in your reading, equivalent?] sources of catastrophe, and therefore you need a high bar of evidence before thinking AI is a greater one", versus "many people think these are more concrete and plausible sources of catastrophe than AI". The original intended reading was "bold" as in "socially bold, relative to what many people think", and therefore making points only about public opinion.

Correcting the previous mistake might have looked like:

"If human civilisation is destroyed this century, the most likely cause is advanced AI systems. This might sound like a bold claim to many, given that we live on a planet full of existing concrete threats like climate change, over ten thousand nuclear weapons, and Vladimir Putin"

Based on this feedback, however, we have now removed any comparison or mention of non-AI threats. For the record, the entire original paragraph is:

If human civilisation is destroyed this century, the most likely cause is advanced AI systems. This is a bold claim given that we live on a planet that includes climate change, over ten thousand nuclear weapons, and Vladimir Putin. However, it is a conclusion that many people who think about the topic keep coming to. While it is not easy to describe the case for risks from advanced AI in a single piece, here we make an effort that assumes no prior knowledge. Rather than try to argue from theory straight away, we approach it from the angle of what computers actually can and can’t do.

This is an interesting point, I haven't thought about the relation to SVO/etc. before! I wonder whether SVO/SOV dominance is a historical quirk, or if the human brain actually is optimized for those.

The verb-first emphasis of prefix notation like in classic Lisp is clearly backwards sometimes. Parsing this has high mental overhead relative to what it's expressing:

(reduce +
        (filter even?
               (take 100 fibonacci-numbers)))

I freely admit this is more readable:


Clojure, a modern Lisp dialect, solves this with threading macros. The idea is that you can write

(->> fibonacci-numbers
     (take 100)
     (filter even?)
     (reduce +))

and in the expressions after ->> the previous expression gets substituted as the last argument to the next.

Thanks to the Lisp macro system, you can write a threading macro even in a Lisp that doesn't have it (and I know that for example in Racket you can import a threading macro package even though it's not part of the core language).
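
For readers more at home in Python, the substitution rule can be roughly approximated with an ordinary function rather than a macro. This is only a sketch: thread_last, fibonacci_numbers, take, and is_even are names made up for the illustration, and unlike the Clojure macro it works at run time on (function, arguments...) tuples instead of rewriting code. It also takes 10 Fibonacci numbers rather than 100, just to keep the numbers small.

```python
from functools import reduce
from itertools import islice
from operator import add


def thread_last(value, *forms):
    """Pass `value` through each form, appending it as the LAST argument.

    Each form is a tuple (function, arg1, arg2, ...), loosely imitating
    how ->> turns (f a b) into (f a b previous-result).
    """
    for func, *args in forms:
        value = func(*args, value)
    return value


def fibonacci_numbers():
    """Yield 0, 1, 1, 2, 3, 5, ... indefinitely."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


def take(n, iterable):
    """Return the first n items of an iterable as a list."""
    return list(islice(iterable, n))


def is_even(n):
    return n % 2 == 0


# Threaded style: reads top to bottom, in the order things happen.
threaded = thread_last(
    fibonacci_numbers(),
    (take, 10),
    (filter, is_even),
    (reduce, add),
)

# Nested prefix style: the same computation, read inside out.
nested = reduce(add, filter(is_even, take(10, fibonacci_numbers())))
```

Both variables end up holding the same sum; the only difference is the order in which the steps appear on the page.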

As for God speaking in Lisp, we know that He at least writes it:

In my experience the sense of Lisp syntax being idiosyncratic disappears quickly, and gets replaced by a sense of everything else being idiosyncratic.

The straightforward prefix notation / Lisp equivalent of return x1 if n = 1 else return x2 is (if (= n 1) x1 x2). To me this seems shorter and clearer. However I admit the clarity advantage is not huge, and is clearly subjective.

(An alternative is postfix notation: ((= n 1) x1 x2 if) looks unnatural, though (2 (3 4 *) +) and (+ 2 (* 3 4)) aren't too far apart in my opinion, and I like the cause->effect relationship implied in representing "put 1, 2, and 3 into f" as (1 2 3 f) or (1 2 3 -> f) or whatever.)

Note also that since Lisp does not distinguish between statements and values:

  • you don't need return, and
  • you don't need one construct for branching inside a value (for example, the x if c else y ternary syntax in Python) and another for a normal if.

I think Python list comprehensions (or the similarly-styled things in e.g. Haskell) are a good example of the "other way" of thinking about syntax. Guido van Rossum once said something like: it's clearer to have [x for x in l if f(x)] than filter(f, l). My immediate reaction to this is: look at how much longer one of them is. When filter is one function call rather than a syntax-heavy list comprehension, I feel it makes it clearer that filter is a single concept that can be abstracted out.

Now of course the Python is nicer because it's more English-like (and also because you don't have to remember whether the f is a condition for the list element to be included or excluded, something that took me embarrassingly long to remember correctly ...). I'd also guess that I might be able to hammer out Python list comprehensions a bit faster and with less mental overhead in simple cases, since the order in which things are typed out is more like the order in which you think of it.
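
As a small reminder of which way the predicate goes, here is a sketch with a made-up predicate is_even and sample list l: filter keeps the elements for which the function returns a truthy value, just like the if clause of the comprehension.

```python
def is_even(n):
    return n % 2 == 0


l = [1, 2, 3, 4, 5, 6]

# The comprehension spells out the loop variable and the condition.
by_comprehension = [x for x in l if is_even(x)]

# filter keeps the elements for which the predicate is truthy.
# In Python 3 it returns a lazy iterator, so wrap it in list() to compare.
by_filter = list(filter(is_even, l))
```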

However, I do feel the Englishness starts to hurt at some point. Consider this:

[x for y in l for x in y]

What does it do? The first few times I saw this (and even now sometimes), I would read it, backtrack, then start figuring out where the parentheses should go and end up confused about the meaning of the syntax: "x for y in l, for x in y, what? Wait no, x, for y in l, for x in y, so actually meaning a list of every x for every x in every y in l".
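
One way to check that reading is to unroll the comprehension into explicit loops: the for clauses nest left to right, in the order they are written. The sample list l below is made up for illustration.

```python
l = [[1, 2], [3], [], [4, 5]]

flattened = [x for y in l for x in y]

# Equivalent explicit loops: the leftmost `for` is the outermost loop.
unrolled = []
for y in l:
    for x in y:
        unrolled.append(x)
```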

What I find clearer is something like:

(mapcat (lambda (x) x) l)

or:

(reduce append l)

Yes, this means you need to remember a bunch of building blocks (filter, map, reduce, and maybe more exotic ones like mapcat). Also, you need to remember which position which argument goes in (function first, then collection), and there are no syntactic signposts to remind you, unlike with the list comprehension syntax. However, once you do:

  • they compose and mix very nicely (for example, (mapcat f l) "factors into" (reduce append (map f l))), and
  • there are no "seams" between the built-in list syntax and any compositions on top of them (unlike Python, where if you define your own functions to manipulate lists, they look different from the built-in list comprehension syntax).
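
The factoring in the first bullet can be sketched in Python. Python has no built-in mapcat or append function, so the helpers below (and the example function neighbours) are hypothetical stand-ins defined just for this illustration.

```python
from functools import reduce


def append(xs, ys):
    """Concatenate two lists, standing in for Lisp's append."""
    return xs + ys


def mapcat(f, l):
    """Apply f to each item of l and concatenate the resulting lists."""
    result = []
    for item in l:
        result.extend(f(item))
    return result


def neighbours(n):
    return [n - 1, n + 1]


nested_lists = [[1, 2], [3], [4, 5]]

# Flattening is mapcat with the identity function ...
flat_via_mapcat = mapcat(lambda x: x, nested_lists)
# ... which gives the same result as reducing with append directly.
flat_via_reduce = reduce(append, nested_lists, [])

# (mapcat f l) "factors into" (reduce append (map f l)).
direct = mapcat(neighbours, [1, 10])
factored = reduce(append, map(neighbours, [1, 10]), [])
```

The empty list passed as the third argument to reduce is only there so that an empty input list does not raise an error.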

I think the last point there is a big consideration (and largely an aesthetic one!). There's something inelegant about a programming language having:

  • many ways to write a mapping from values to values, some in infix notation (1+1), some in prefix notation (my_function(val)), and others that are even weirder (x if c else y);
  • expressions that may either reduce to a value (most things) or not reduce to a value (if it's an if or return or so on);
  • a syntax style you extend in one way (e.g. prefix notation with def my_function(val): [...]) and others that you either don't extend, or extend in weird ways (def __eq__(self, other): [...]).

Instead you can make a programming language that has exactly one style of syntax (prefix), exactly one type of compound expression (parenthesised terms where the first thing is the function/macro name), and a consistent way to extend all the types of syntax (define functions or define macros). This is especially true since the "natural" abstract representation of a program is a tree (in the same way that the "natural" abstract representation of a sentence is its syntax tree), and prefix notation makes this very clear: you have a node type, and the children of the node.
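
To make the tree picture concrete, here is a minimal sketch that represents a prefix expression as nested Python tuples and evaluates it recursively. The tuple encoding and the small OPERATORS table are assumptions chosen for this illustration, not a description of how any particular Lisp is implemented.

```python
from functools import reduce
import operator

# Hypothetical operator table for the sketch: node type -> binary function.
OPERATORS = {
    "+": operator.add,
    "*": operator.mul,
}


def evaluate(node):
    """Evaluate a prefix expression tree.

    A leaf is a plain number; an inner node is a tuple whose first
    element names the node type and whose remaining elements are children.
    """
    if not isinstance(node, tuple):
        return node
    name, *children = node
    values = [evaluate(child) for child in children]
    # Fold the operator over all children, so ("+", 1, 2, 3) also works.
    return reduce(OPERATORS[name], values)


# (+ 2 (* 3 4)) written as nested tuples.
expression = ("+", 2, ("*", 3, 4))
result = evaluate(expression)
```

The evaluator needs only one rule for compound expressions (look at the first element, then recurse into the rest), which is the uniformity the paragraph above is pointing at.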

I think the crux is something like: do you prefer a syntax that is like a collection of different tools for different tasks, or a syntax that highlights how everything can be reduced to a tight set of concepts?

Since some others are commenting about not liking the graph-heavy format: I really liked the format, in particular because having it as graphs rather than text made it much faster and easier to go through and understand, and left me with more memorable mental images. Adding limited text probably would not hurt, but adding lots would detract from the terseness that this presentation effectively achieves. Adding clear definitions of the terms at the start would have been valuable though.

Rather than thinking of a single example that I carried throughout as you suggest, I found it most useful to generate one or more examples as I looked at each graph (e.g. for the danger-zone graphs, in order: judging / software testing, politics, forecasting / medical diagnosis).

Regarding the end of slavery: I think you make good points, and they've made me update towards thinking that materialistic Morris-style models are slightly less important, and cultural models slightly more important, than I previously thought.

I'd be very interested to hear what anti-slavery arguments were used by the first English abolitionists and the medieval Catholic Church (religion? equality? natural rights? utilitarian?).

Which, evidently, doesn't prevent the usual narrative from being valid in other places, that is, countries in which slavery was still well accepted finding themselves forced, first militarily, then technologically, and finally economically, to adapt or perish.

I think there's also another way for the materialistic and idealistic accounts to both be true in different places: Morris' argument is specifically about slavery existing when wage incentives are weak, and perhaps this holds in places like ancient Egypt and the Roman Empire, but had stopped holding in proto-industrial places like 16th-18th century western Europe. However I'm not aware of what specific factor would drive this.

One piece of evidence on whether economics or culture is more important would be comparing how many cases there are where slavery existed/ended in places without cultural contact but with similar economic conditions and institutions, to how many cases there are of slavery existing/ending in places with cultural contact but different economic conditions/institutions.

Thank you for this very in-depth comment. I will reply to your points in separate comments, starting with:

According him, the end of the feudal system in England, and its turning into a modern nation-state, involved among other things the closing off and appropriation, by nobles as a reward from the kingdom, of the former common farmlands they farmed on, as well as the confiscation of the lands owned by the Catholic Church, which for all practical purposes also served as common farmlands. This resulted in a huge mass of landless farmers with no access to land, or only very diminished access, who in turn decades later became the proletarians for the newly developing industries. If that's accurate, then it may be the case that the Industrial Revolution wouldn't have happened had all those poor not have existed, since the very first industries wouldn't have been attractive compared to condition non-forcibly-starved farmers had.

This is very interesting and something I haven't seen before. Based on some quick searching, this seems to be referring to the Inclosure Acts (which were significant, affecting 1/6th of English land) and perhaps specifically this one, while the Catholic Church land confiscation was the 1500s one. My priors on this having a major effect are somewhat skeptical because:

  1. The general shape of English historical GDP/capita is a slight post-plague rise, followed by nothing much until a gradual rise in the 1700s and then takeoff in the 1800s. Likewise, skimming through this, there seem to be no drastic changes in wealth inequality around the time of the Inclosure Acts, though the share of wealth held by the top 10% rises slightly in the late 1700s and personal estates (note: specifically excludes real estate) of farmers and yeomen drop slightly around 1700 before rebounding. Any pattern of more poor farmers must evade these statistics, either by being small enough, or by not being captured in these crude overall stats (which is very possible, especially if the losses for one set of farmers were balanced by gains for another).
  2. Other sources I've read support the idea that farmers in general prefer industrial jobs. It's not just Steven Pinker either; Vaclav Smil's Energy and Civilization (my review) has this passage:

Moreover, the drudgery of field labor in the open is seldom preferable even to unskilled industrial work in a factory. In general, typical factory tasks require lower energy expenditures than does common farm work, and in a surprisingly short time after the beginning of mass urban industrial employment the duration of factory work became reasonably regulated 

It's probably the case that it's easier to recruit landless farmers into industrial jobs, and I can imagine plausible models where farmers resist moving to cities, especially for uncertainty-avoidance / risk-aversion reasons. However, the effect of this, especially in the long term, seems limited by things like population growth in (already populous) cities, people having to move off their family farms anyways due to primogeniture, and people generally being pretty good at exploiting available opportunities. An exception might be if early industrialization was tenable only under a strict labor availability threshold that was met only because of the mass of landless farmers created by the English acts.
