Sammy Martin. Philosophy and Physics BSc, AI MSc at Edinburgh, starting a PhD at King's College London. Interested in ethics, general philosophy and AI Safety.


Mathematical Models of Progress?
Answer by SDM, Feb 16, 2021

I made an attempt to model intelligence explosion dynamics in this post, by attempting to make the very oversimplified exponential-returns-to-exponentially-increasing-intelligence model used by Bostrom and Yudkowsky slightly less oversimplified.

This post tries to build on a simplified mathematical model of takeoff which was first put forward by Eliezer Yudkowsky and then refined by Bostrom in Superintelligence, modifying it to account for the different assumptions behind continuous, fast progress as opposed to discontinuous progress. As far as I can tell, few people have touched these sorts of simple models since the early 2010s, and no-one has tried to formalize how newer notions of continuous takeoff fit into them. I find that it is surprisingly easy to accommodate continuous progress and that the results are intuitive and fit with what has already been said qualitatively about continuous progress.

The page includes Python code for the model.

This post doesn't capture all the views of takeoff - in particular it doesn't capture the non-hyperbolic faster growth mode scenario, where marginal intelligence improvements are exponentially increasingly difficult and therefore we get a (continuous or discontinuous switch to a) new exponential growth mode rather than runaway hyperbolic growth.

But I think that by modifying the f(I) function that determines how RSI capability varies with intelligence we can incorporate such views.

(In the context of the exponential model given in the post that would correspond to an f(I) function where 

which would result in a continuous (determined by size of d) switch to a single faster exponential growth mode)
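As a rough illustration of what such a switch between growth modes looks like, here is a minimal Python sketch. The functional forms are assumptions made purely for illustration, not the formula from the linked post: it assumes growth of the form dI/dt = c*I + f(I)*I^2, with a hypothetical f(I) equal to a sigmoid in I divided by I. The names `c` and `I_agi` and the sigmoid shape are hypothetical; only f(I) and the sharpness parameter d are named in the text above.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def f(I, d, I_agi):
    # Hypothetical RSI-capability function (an assumption, not from the post):
    # a sigmoid switch around I_agi, divided by I so that f(I) * I**2 is linear in I.
    return sigmoid(d * (I - I_agi)) / I

def growth_rate(I, c, d, I_agi):
    """Per-unit growth rate (dI/dt) / I under the assumed model."""
    dI_dt = c * I + f(I, d, I_agi) * I ** 2
    return dI_dt / I

# Illustrative values only: baseline rate c = 0.1, switch centred on I_agi = 10.
for d in (0.5, 5.0):
    rates = [round(growth_rate(I, c=0.1, d=d, I_agi=10.0), 3) for I in (1, 8, 10, 12, 20)]
    print(f"d = {d}: growth rates at I = 1, 8, 10, 12, 20 -> {rates}")
```

Under these assumptions the growth rate moves from roughly c well below the threshold to roughly c + 1 well above it, so the system ends up in one faster exponential mode rather than growing hyperbolically, and a larger d makes the transition more abrupt.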

But I think the model still roughly captures the intuition behind scenarios that involve either a continuous or a discontinuous step to an intelligence explosion.

The Meaning That Immortality Gives to Life

Modern literature about immortality is written primarily by authors who expect to die, and their grapes are accordingly sour. 

This is still just as true as when this essay was written, I think - even the Culture had its human citizens mostly choosing to die after a time... to the extent that I eventually decided: if you want something done properly, do it yourself.

But there are exceptions - the best example of published popular fiction that has immortality as a basic fact of life is the Commonwealth Saga by Peter F Hamilton and the later Void Trilogy (the first couple of books were out in 2007).

The Commonwealth has effective immortality. A few downsides of it are even noticeable (their culture and politics are a bit more stagnant than we might like), but there's never any doubt at all that it's worth it, and it's barely commented on in the story:

In truth, I suspect that if people were immortal, they would not think overmuch about the meaning that immortality gives to life. 

(Incidentally, the latter-day Void Trilogy Commonwealth is probably the closest a work of published fiction has come to depicting a true eudaimonic utopia that lacks the problems of the Culture.)

I wonder if there's been any harder-to-detect shift in how immortality is portrayed in fiction since 2007? Is it still as rare now as then to depict it as a bad thing?

Covid 2/11: As Expected

The UK vaccine rollout is considered a success, and by the standards of other results, it is indeed a success. This interview explains how they did it, which was essentially ‘make deals with companies and pay them money in exchange for doses of vaccines.’

A piece of this story you may find interesting (as an example of a government minister making a decision based on object-level physical considerations): multiple reports say Matt Hancock, the UK's Health Secretary, made the decision to insist on over-ordering vaccines because he saw the movie Contagion and was shocked into viscerally realising how important a speedy rollout was.

It might just be a nice piece of PR, but even if that's the case it's still a good metaphor for how object-level physical considerations can intrude into government decision-making.

Review of Soft Takeoff Can Still Lead to DSA

I agree with your argument about likelihood of DSA being higher compared to previous accelerations, due to society not being able to speed up as fast as the technology. This is sorta what I had in mind with my original argument for DSA; I was thinking that leaks/spying/etc. would not speed up nearly as fast as the relevant AI tech speeds up.

Your post on 'against GDP as a metric' argues more forcefully for the same thing that I was arguing for, that 

'the economic doubling time' stops being so meaningful - technological progress speeds up abruptly but other kinds of progress that adapt to tech progress have more of a lag before the increased technological progress also affects them? 

So we're on the same page there that it's not likely that 'the economic doubling time' captures everything that's going on all that well, which leads to another problem - how do we predict what level of capability is necessary for a transformative AI to obtain a DSA (or reach the PONR for a DSA)?

I notice that in your post you don't propose an alternative metric to GDP, which is fair enough, since most of your arguments seem to lead to the conclusion that it's almost impossibly difficult to predict in advance what level of advantage over the rest of the world, and in which areas, is actually needed to conquer the world, since we seem to be able to analogize persuasion tools, or conquistador-analogues who had relatively small tech advantages, to the AGI situation.

I think that there is still a useful role for raw economic power measurements, in that they provide a sort of upper bound on how much capability difference is needed to conquer the world. If an AGI acquires resources equivalent to controlling >50% of the world's entire GDP, it can probably take over the world if it goes for the maximally brute force approach of just using direct military force. Presumably the PONR for that situation would be a while before then, but at least we know that an advantage of a certain size would be big enough given no assumptions about the effectiveness of unproven technologies of persuasion or manipulation or specific vulnerabilities in human civilization.

So we can use our estimate of how doubling time may increase, anchor on that gap and estimate down based on how soon we think the PONR is, or how many 'cheat' pathways that don't involve economic growth there are.

The whole idea of using brute economic advantage as an upper limit 'anchor' I got from Ajeya's post about using biological anchors to forecast what's required for TAI - if we could find a reasonable lower bound for the amount of advantage needed to attain DSA, we could do the same kind of estimated distribution between them. We would just need a lower limit - maybe there's a way of estimating it based on the upper limit of human ability, since we know no actually existing human has used persuasion to take over the world but, as you point out, they've come relatively close.

I realize that's not a great method, but is there any better alternative given that this is a situation we've never encountered before, for trying to predict what level of capability is necessary for DSA? Or perhaps you just think that anchoring your prior estimate based on economic power advantage as an upper bound is so misleading it's worse than having a completely ignorant prior. In that case, we might have to say that there are just so many unprecedented ways that a transformative AI could obtain a DSA that we can just have no idea in advance what capability is needed, which doesn't feel quite right to me.

Ten Causes of Mazedom

Finally got round to reading your sequence and it looks like we disagree a lot less than I thought, since your first three causes are exactly what I was arguing for in my reply:

This is probably the crux. I don't think we tend to go to higher simulacra levels now, compared to decades ago. I think it's always been quite prevalent, and has been roughly constant through history. While signalling explanations definitely tell us a lot about particular failings, they can't explain the reason things are worse now in certain ways, compared to before. The difference isn't because of the perennial problem of pervasive signalling. It has more to do with economic stagnation and not enough state capacity. These flaws mean useful action gets replaced by useless action, and allow more room for wasteful signalling.

As one point in favour of this model, I think it's worth noting that the historical comparisons aren't ever to us actually succeeding at dealing with pandemics in the past, but to things like "WWII-style" efforts - i.e. thinking that if we could just do x as well as we once did y then things would have been a lot better.

This implies that if you made an institution analogous to e.g. the weapons researchers of WW2 and the governments that funded them, or NASA in the 1960s, without copy-pasting 1940s/1960s society wholesale, the outcome would have been better. To me that suggests it's institution design that's the culprit, not this more ethereal value drift or increase in overall simulacra levels.

I think you'd agree with most of that, except that you see a much more significant causal role for cultural factors like increased fragility and social atomisation. There is pretty solid evidence that both are real problems, and Jon Haidt presents the best case for taking them seriously, although it's not as definitive as you make out (e.g. suicide rates are basically a random walk). Your explanation for how they lead to institutional problems is reasonable, but I wonder if they are even needed as explanations when your first three causes are so strong and obvious.

Essentially I see your big list like this:

Main Drivers:

  • Cause 1: More Real Need For Large Organizations (includes decreasing low hanging fruit)
  • Cause 2: Laws and Regulations Favor Large Organizations
  • Cause 3: Less Disruption of Existing Organizations
  • Cause 5: Rent Seeking is More Widespread and Seen as Legitimate

Real but more minor:

  • Cause 4: Increased Demand for Illusion of Safety and Security
  • Cause 8: Atomization and the Delegitimization of Human Social Needs
  • Cause 7: Ignorance
  • Cause 9: Educational System
  • Cause 10: Vicious Cycle

No idea but should look into:

Cause 6: Big Data, Machine Learning and Internet Economics

Essentially my view is that if you directly addressed the main drivers with large legal or institutional changes the other causes of mazedom wouldn't fight back.

I believe that the 'obvious legible institutional risks first' view is in line with what others who've written on this problem like Tyler Cowen or Sam Bowman think, but it's a fairly minor disagreement since most of your proposed fixes are on the institutional side of things anyway.

Also, the preface is very important - these are some of the only trends that seem to be going the wrong way consistently in developed countries for a while now, and they're exactly the forces you'd expect to be hardest to resist.

The world is better for people than it was back then. There are many things that have improved. This is not one of them.

Review of Soft Takeoff Can Still Lead to DSA

Currently the most plausible doom scenario in my mind is maybe a version of Paul’s Type II failure. (If this is surprising to you, reread it while asking yourself what terms like “correlated automation failure” are euphemisms for.) 

This is interesting, and I'd like to see you expand on this. Incidentally I agree with the statement, but I can imagine both more and less explosive, catastrophic versions of 'correlated automation failure'. On the one hand it makes me think of things like transportation and electricity going haywire, on the other it could fit a scenario where a collection of powerful AI systems simultaneously intentionally wipe out humanity.

Clock-time leads shrink automatically as the pace of innovation speeds up, because if everyone is innovating 10x faster, then you need 10x as many hoarded ideas to have an N-year lead. 

What if, as a general fact, some kinds of progress (the technological kinds more closely correlated with AI) are just much more susceptible to speed-up? I.e., what if 'the economic doubling time' stops being so meaningful - technological progress speeds up abruptly but other kinds of progress that adapt to tech progress have more of a lag before the increased technological progress also affects them? In that case, if the parts of overall progress that affect the likelihood of leaks, theft and spying aren't sped up by as much as the rate of actual technology progress, the likelihood of DSA could rise to be quite high compared to previous accelerations, where the speed-up was gradual enough to allow society to 'speed up' in the same way.

In other words - it becomes easier to hoard more and more ideas if the ability to hoard ideas is roughly constant but the pace of progress increases. Since a lot of these 'technologies' for facilitating leaks and spying are more in the social realm, this seems plausible.
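The arithmetic behind the quoted claim about clock-time leads can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and "ideas" is treated as a single countable quantity, which is an assumption the quoted passage makes only loosely.

```python
def clock_time_lead_years(hoarded_ideas, ideas_per_year):
    """Lead measured in years: how long rivals need to generate the hoarded ideas."""
    return hoarded_ideas / ideas_per_year

def ideas_needed_for_lead(lead_years, ideas_per_year):
    """Hoarded ideas required to hold a given clock-time lead at a given pace."""
    return lead_years * ideas_per_year

# Hypothetical numbers: 20 hoarded ideas, baseline pace of 10 ideas per year.
baseline_pace = 10.0
fast_pace = 10 * baseline_pace  # everyone innovating 10x faster

print(clock_time_lead_years(20, baseline_pace))  # lead at the baseline pace
print(clock_time_lead_years(20, fast_pace))      # the same hoard at 10x pace
print(ideas_needed_for_lead(2.0, fast_pace))     # hoard needed to keep a 2-year lead
```

The point above is about the numerator: if the social 'technologies' behind leaks and spying do not speed up along with the pace of progress, the size of the hoard can grow with the pace instead of staying fixed.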

But if you need to generate more ideas, this might just mean that if you have a very large initial lead, you can turn it into a DSA, which you still seem to agree with:

  • Even if takeoff takes several years it could be unevenly distributed such that (for example) 30% of the strategically relevant research progress happens in a single corporation. I think 30% of the strategically relevant research happening in a single corporation at beginning of a multi-year takeoff would probably be enough for DSA.

Fourth Wave Covid Toy Modeling

I meant 'based on what you've said about Zvi's model', i.e. Nostalgebraist says Zvi says Rt never goes below 1 - if you look at the plot he produced, Rt is always above 1 given Zvi's assumptions, which the London data falsified.

Fourth Wave Covid Toy Modeling

  • It seems better to first propose a model we know can match past data, and then add a tuning term/effect for "pandemic fatigue" for future prediction.

To get a sense of scale, here is one of the plots from my notebook:

The colored points show historical data on R vs. the 6-period average, with color indicating the date.

Thanks for actually plotting historical Rt vs infection rates!

Whereas, it seems more natural to take (3) as evidence that (1) was wrong.

In my own comment, I also identified the control system model of any kind of proportionality of Rt to infections as a problem. Based on my own observations of behaviour and government response, the MNM hypothesis seems more likely (governments hitting the panic button as imminent death approaches, i.e. hospitals begin to be overwhelmed) than a response that ramps up proportionate to recent infections. I think that explains the tight oscillations.

I'd say the dominant contributor to control systems is something like a step function at a particular level near where hospitals are overwhelmed, and individual responses proportionate to exact levels of infection are a lesser part of it.

You could maybe operationalize this by looking at past hospitalization rates, fitting a logistic curve to them at the 'overwhelmed' threshold and seeing if that predicts Rt. I think it would do pretty well.
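A minimal sketch of what that could look like in Python, with every number a made-up placeholder rather than fitted data. It models Rt as a logistic function of hospital load that switches from an 'open' value to a 'locked-down' value around the overwhelmed threshold. The function name, parameter names and values are all hypothetical.

```python
import math

def rt_from_hospital_load(load, threshold, steepness, rt_open, rt_locked):
    """Rt as a logistic step in hospital load (load as a fraction of capacity).

    Below the threshold Rt stays near rt_open; above it the control system
    'hits the panic button' and Rt drops towards rt_locked.
    """
    x = steepness * (load - threshold)
    # Numerically stable logistic: fraction of the full panic response applied.
    if x >= 0:
        response = 1.0 / (1.0 + math.exp(-x))
    else:
        z = math.exp(x)
        response = z / (1.0 + z)
    return rt_open + (rt_locked - rt_open) * response

# Placeholder parameters (assumptions, not estimates from data).
params = dict(threshold=0.9, steepness=50.0, rt_open=1.3, rt_locked=0.8)
for load in (0.5, 0.85, 0.9, 0.95, 1.2):
    print(f"load = {load:.2f} -> Rt = {rt_from_hospital_load(load, **params):.2f}")
```

An actual test of the idea would estimate the threshold and steepness from past hospitalization data and then check the predicted Rt against observed Rt. A very large steepness recovers the pure step-function version.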

This tight control was a surprise and is hard to reproduce in a model, but if our model doesn't reproduce it, we will go on being surprised by the same thing that surprised us before.

My own predictions are essentially based on continuing to expect the 'tight control' to continue somehow, i.e. flattening out cases or declining a bit at a very high level after a large swing upwards.

It looks like Rt is currently just below 1 in London (the subsequent couple of days' data seem to confirm this) - which would outright falsify any model that claims Rt never goes below 1 for any amount of infection with the new variant, given our control system response, which, according to your graph, the infections exponential model does predict.

If you ran this model on the past, what would it predict? Based on what you've said, Rt never goes below one, so there would be a huge first wave with a rapid rise up to partial herd immunity over weeks, based on your diagram. That's the exact same predictive error that was made last year.

I note - outside view - that this is very similar to the predictive mistake made last February/March with old Covid-19 - many around here were practically certain we were bound for an immediate (in a month or two) enormous herd immunity overshoot.

Eight claims about multi-agent AGI safety

Humans have skills and motivations (such as deception, manipulation and power-hungriness) which would be dangerous in AGIs. It seems plausible that the development of many of these traits was driven by competition with other humans, and that AGIs trained to answer questions or do other limited-scope tasks would be safer and less goal-directed. I briefly make this argument here.

Note that he claims that this may be true even if single/single alignment is solved, and all AGIs involved are aligned to their respective users.

It strikes me as interesting that much of the existing work that's been done on multiagent training, such as it is, focusses on just examining the behaviour of artificial agents in social dilemmas. The thinking seems to be - and this was also suggested in ARCHES - that it's useful just for exploratory purposes to try to characterise how and whether RL agents cooperate in social dilemmas, what mechanism designs and what agent designs promote what types of cooperation, and if there are any general trends in terms of what kinds of multiagent failures RL tends to fall into.

For example, it's generally known that regular RL tends to fail to cooperate in social dilemmas, 'Unfortunately, selfish MARL agents typically fail when faced with social dilemmas'. From ARCHES:

One approach to this research area is to continually examine social dilemmas through the lens of whatever is the leading AI development paradigm in a given year or decade, and attempt to classify interesting behaviors as they emerge. This approach might be viewed as analogous to developing “transparency for multi-agent systems”: first develop interesting multi-agent systems, and then try to understand them.

There seems to be an implicit assumption here that something very important and unique to multiagent situations would be uncovered - by analogy to things like the flash crash. It's not clear to me that we've examined the intersection of RL and social dilemmas enough to notice this if it were true, and I think that's the major justification for working on this area.

Fourth Wave Covid Toy Modeling

One thing that you didn't account for - the method of directly scaling the Rt by the multiple on the R0 (which seems to be around 1.55), is only a rough estimate of how much the Rt will increase by when the effective Rt is lowered in a particular situation. It could be almost arbitrarily wrong - intuitively, if the hairdressers are closed, that prevents 100% of transmission in hairdressers no matter how much higher the R0 of the virus is.

For this reason, the actual epidemiological models (there aren't any for the US for the new variant, only some for the UK), have some more complicated way of predicting the effect of control measures. This from Imperial College:

We quantified the transmission advantage of the VOC relative to non-VOC lineages in two ways: as an additive increase in R that ranged between 0.4 and 0.7, and alternatively as a multiplicative increase in R that ranged between a 50% and 75% advantage. We were not able to distinguish between these two approaches in goodness-of-fit, and either is plausible mechanistically. A multiplicative transmission advantage would be expected if transmissibility had increased in all settings and individuals, while an additive advantage might reflect increases in transmissibility in specific subpopulations or contexts.

The multiplicative 'increased transmissibility' estimate will therefore tend to underestimate the effect of control measures. The actual paper did some complicated Bayesian regression to try and figure out which model of Rt change worked best, and couldn't figure it out.

Measures like ventilation, physical distancing when you do decide to meet up, and mask use will be more multiplicative in how the new variant diminishes their effect. The parts of the behaviour response that involve people just not deciding to meet up or do things in the first place, and anything involving mandatory closures of schools, bars etc. will be less multiplicative.


I believe this is borne out in the early data. Lockdown 1 in the UK took Rt down to 0.6. The naive 'multiplicative' estimate would say that's sufficient for the new variant, Rt=0.93. The second lockdown took Rt down to 0.8, which would be totally insufficient. You'd need Rt for the old variant of covid down to 0.64 on the naive multiplicative estimate - almost what was achieved in March. I have a hard time believing it was anywhere near that low in the Tier 4 regions around Christmas.
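The naive multiplicative arithmetic in that paragraph can be reproduced in a few lines of Python. The 1.55 multiplier and the lockdown Rt values are the ones quoted in this comment; the function names are just illustrative.

```python
MULTIPLIER = 1.55  # rough multiple on R0 for the new variant, as quoted in this comment

def new_variant_rt(old_variant_rt, multiplier=MULTIPLIER):
    """Naive multiplicative estimate: scale the old-variant Rt by the multiplier."""
    return old_variant_rt * multiplier

def old_variant_rt_needed(target_rt=1.0, multiplier=MULTIPLIER):
    """Old-variant Rt that a set of measures must reach for the new variant to hit target_rt."""
    return target_rt / multiplier

for label, old_rt in [("Lockdown 1", 0.6), ("Lockdown 2", 0.8)]:
    print(f"{label}: old-variant Rt {old_rt} -> new-variant Rt {new_variant_rt(old_rt):.2f}")

print(f"Old-variant Rt needed for new-variant Rt = 1: {old_variant_rt_needed():.3f}")
```

On this estimate only measures about as strict as the first lockdown keep the new variant below 1, which is why the observed levelling off under Tier 4 counts as evidence against a purely multiplicative effect.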

But the data that's come in so far seems to indicate that Tier 4 + Schools closed has either levelled off or caused slow declines in infections in those regions where they were applied.

First, the random infection survey - London and South East are in decline and East of England has levelled off (page 3). The UK's symptom study, which uses a totally different methodology, confirms some levelling off and declines in those regions - page 6. It's early days, but clearly Rt is very near 1, and likely below 1 in London. The Financial Times cottoned on to this a few days late but no-one else seems to have noticed.

I think this indicates a bunch of things - mainly that infections caused by the new variant can and will be stabilized or even reduced by lockdown measures which people are willing to obey. It's not impossible if it's already happening.


To start, let’s also ignore phase shifts like overloading hospitals, and ignore fatigue on the hopes that vaccines coming soon will cancel it out, although there’s an argument that in practice some people do the opposite.

I agree with ignoring fatigue, but ignoring phase shifts? If it were me I'd model the entire control system response as a phase shift with the level for the switch in reactions set near the hospital overwhelm level - at least on the policy side, there seems to be an abrupt reaction specifically to the hospital overloading question. The British government pushed the panic button a few days ago in response to that and called a full national lockdown. I'd say the dominant contributor to control systems is something like a step function at a particular level near where hospitals are overwhelmed, and individual responses proportionate to exact levels of infection are a lesser part of it.

I think the model of the control system as a continuous response is wrong, and a phased all-or-nothing response for the government side of things, plus taking into account non-multiplicative effects on the Rt, would produce overall very different results - namely that a colossal overshoot of herd immunity in a mere few weeks is probably not happening. I note - outside view - that this is very similar to the predictive mistake made last February/March with old Covid-19 - many around here were practically certain we were bound for an immediate (in a month or two) enormous herd immunity overshoot.
