All of Sammy Martin's Comments + Replies

One absolutely key thing got loudly promoted: that all cutting-edge models should be evaluated for potentially dangerous properties. As far as I can tell, no one objected to this.

2Daniel Kokotajlo10d
Specifically, for dangerous capabilities which is even better.

This strikes me as a very preliminary bludgeon version of the holy grail of mechanistic interpretability, which is to say actually understanding and being able to manipulate the specific concepts that an AI model uses

I think that capacity would be really nice. I think our results are maybe a very very rough initial version of that capacity. I want to caution that we should be very careful about making inferences about what concepts are actually used by the model. From a footnote:

Essentially, the problem is that 'evidence that shifts Bio Anchors weightings' is quite different, more restricted, and much harder to define than the straightforward 'evidence of impressive capabilities'. However, the reason that I think it's worth checking if new results are updates is that some impressive capabilities might be ones that shift bio anchors weightings. But impressiveness by itself tells you very little.

I think a lot of people with very short timelines are imagining the only possible alternative view as being 'another AI winter, scaling law... (read more)

4Rohin Shah1y
Yeah, this all seems right to me. It does not seem to me like "can keep a train of thought running" implies "can take over the world" (or even "is comparable to a human"). I guess the idea is that with a train of thought you can do amplification? I'd be pretty surprised if train-of-thought-amplification on models of today (or 5 years from now) led to novel high quality scientific papers, even in fields that don't require real-world experimentation.
4Not Relevant1y
I think this is the best writeup about this I’ve seen, and I agree with the main points, so kudos! I do think that evidence of increasing returns to scale of multi-step chain of thought prompting are another weak datapoint in favor of the human lifetime anchor. I also think there are pretty reasonable arguments that NNs may be more efficient than the human brain at converting flops to capabilities, e.g. if SGD is a better version of the best algorithm that can be implemented on biological hardware. Similarly, humans are exposed to a much smaller diversity of data than LMs (the internet is big and weird), and thus they may get more “novelty” per flop and thus generalize better from less data. My main point here is just that “biology is optimal” isn’t as strong a rejoinder when we’re comparing a process so different from what biology did.

Does that mean the socratic models result from a few weeks ago, which does involve connecting more specialised models together, is a better example of progress?

9Rohin Shah1y

The Putin case would be better if he were convincing Russians to make massive sacrifices or do something that would backfire and kill them, like starting a war with NATO, and I don't think he has that power - see e.g. him rushing to deny that Russia was sending conscripts to Ukraine for fear of the effect that would have on public opinion.

Is Steven Pinker ever going to answer for destroying the Long Peace?

It's really not at all good that we're going into a period of much-heightened existential risk (from AGI, but also other sources) under Cold War-like levels of international tension.

So it was a self-destroying prophecy that would have held if not for the jinx? So close. It is also possible that this enables strong EU-USA cooperation in many areas, and also that it may lead to the breakup of the Russian Federation and effectively take Russia out of play. There are always possible upsides.

I think there's actually a ton of uncertainty here about just how 'exploitable' human civilization ultimately is. We could imagine that, since actual humans (e.g. Hitler) have seized large fractions of Earth's resources just by talking to people, we might not need an AI that's all that much smarter than a human. On the other hand, we might just say that attempts like that are filtered through colossal amounts of luck and historical contingency, and actually to reliably manipulate your way to controlling most of humanity you'd need to be far smarter than the smartest human.

What about the current situation in Russia? I think Putin must be winging the propaganda effort, since he wasn't expecting to have to fight a long and hard war, plus some of the messaging doesn't stand up to even cursory inspection (a Jewish Nazi president?), and yet it's still working remarkably well.

I think there's a few things that get in the way of doing detailed planning for outcomes where alignment is very hard and takeoff very fast. This post by David Manheim discusses some of the problems:

One is that there's no clarity, even among people who've made AI research their professional career, about alignment difficulty or takeoff speed. So getting buy-in in advance of clear warning signs will be extremely hard.

The other is that the strategies that might help in situations with hard alignment are at cro... (read more)

One thing to consider, in terms of finding a better way of striking a balance between deferring to experts and having voters invested, is epistocracy. Jason Brennan talks about why, compared to just having a stronger voice for experts in government, epistocracy might be less susceptible to capture by special interests.

I think this is a good description of what agent foundations is and why it might be needed. But the binary of 'either we get alignment by default or we need to find the True Name' isn't how I think about it.

Rather, there's some unknown parameter, something like 'how sharply does the pressure towards incorrigibility ramp up, what capability level does it start at, how strong is it'?

Setting this at 0 means alignment by default. Setting this higher and higher means we need various kinds of Prosaic alignment strategies which are better at keeping systems corri... (read more)

Much of the outreach effort is directed towards governments, and some towards AI labs, not towards the general public.

I think that because of the way crisis governance often works, if you're the designated expert in a position to provide options to a government when something's clearly going wrong, you can get buy in for very drastic actions (see e.g. COVID lockdowns). So the plan is partly to become the designated experts.

I can imagine (not sure if this is true) that even though an 'all of the above' strategy like you suggest seems like on paper it would be the most likel... (read more)

Like I said in my first comment, the in practice difficulty of alignment is obviously connected to timeline and takeoff speed.

But you're right that you're talking about the intrinsic difficulty of alignment Vs takeoff speed in this post, not the in practice difficulty.

But those are also still correlated, for the reasons I gave - mainly that a discontinuity is an essential step in Eliezer-style pessimism and fast takeoff views. I'm not sure how close this correlation is.

Do these views come apart in other possible worlds? I.e. could you believe in a disconti... (read more)

I'm not sure I agree with the compatibility of discontinuity and prosaic alignment, though you make a reasonable case, but I do think there is compatibility between slower governance approaches and discontinuity, if it is far enough away.

From reading your article, it seems like one of the major differences between yours and Zvi's understanding of 'Mazes' is that you're much more inclined to describe the loss of legibility and flexibility as necessary features of big organizations that have to solve complex problems, rather than something that can be turned up or down quite a bit if you have the right 'culture', while not losing size and complexity.

Holden Karnofsky argued for something similar, i.e. that there's a very deep and necessary link between 'bureaucratic stagnation'/'mazes' and ta... (read more)

I think I disagree less than you're assuming. Yes, a large degree of the problem is inevitable due to the nature of people and organizational dynamics, and despite that, of course it can differ between organizations. But I do think culture is only ever a partial solution, because of the nature of scaling and communication in organizations. And re: #2, I had a long-delayed, incomplete draft post on "stakeholder paralysis" that was making many of the points Holden did, until I saw he did it much better and got to abandon it.
4Gordon Seidoh Worley1y
I think that sounds right. Even in a totally humane system there's going to be more indirection as you scale and that leads to more opportunities for error, principal agent problems, etc.

So, how does this do as evidence for Paul's model over Eliezer's, or vice versa? As ever, it's a tangled mess and I don't have a clear conclusion.

On the one hand: this is a little bit of evidence that you can get reasoning and a small world model/something that maybe looks like an inner monologue easily out of 'shallow heuristics', without anything like general intelligence, pointing towards continuous progress and narrow AIs being much more useful. Plus it's a scale up and presumably m... (read more)

three possibilities about AI alignment which are orthogonal to takeoff speed and timing

I think "AI Alignment difficulty is orthogonal to takeoff speed/timing" is quite conceptually tricky to think through, but still isn't true. It's conceptually tricky because the real truth about 'alignment difficulty' and takeoff speed, whatever it is, is probably logically or physically necessary: there aren't really alternative outcomes there. But we have a lot of logical uncertainty and conceptual confusion, so it still looks like there are different possibilities. St... (read more)

In the post, I wanted to distinguish between two things you're now combining: how hard alignment is, and how long we have. And yes, combining these, we get the issue of how hard it will be to solve alignment in the time frame we have until we need to solve it. But they are conceptually distinct. And neither of these directly relates to takeoff speed, which in the current framing is something like the time frame from when we have systems that are near-human until they hit a capability discontinuity. You said "First off, takeoff speed and timing are correlated: if you think HLMI is sooner, you must think progress towards HLMI will be faster, which implies takeoff will also be faster." This last implication might be true, or might not. I agree that there are many worlds in which they are correlated, but there are plausible counter-examples. For instance, we may continue with fast progress and get to HLMI and a utopian freedom from almost all work, but then hit a brick wall on scaling deep learning, and have another AI winter until we figure out how to make actual AGI which can then scale to ASI - and that new approach could lead to either a slow or a fast takeoff. Or we may have progress slow to a crawl due to costs of scaling input and compute until we get to AGI, at which point self-improvement takeoff could be near-immediate, or could continue glacially. And I agree with your claims about why Eliezer is pessimistic about prosaic alignment - but that's not why he's pessimistic about governance, which is a mostly unrelated pessimism.

As much as it maybe ruins the fun for me to just point out the message: the major point of the story was that you weren't supposed to condition on us knowing that nuclear weapons are real, and instead ask whether the Gradualist or Catastrophist's arguments actually make sense given what they knew.

That's the situation I think we're in with Fast AI Takeoff. We're trying to interpret what the existence of general intelligences like humans (the Sun) implies for future progress on ML algorithms (normal explosives), without either a clear underlying theory for w... (read more)

Nuclear Energy: Gradualism vs Catastrophism

catastrophists: when evolution was gradually improving hominid brains, suddenly something clicked - it stumbled upon the core of general reasoning - and hominids went from banana classifiers to spaceship builders. hence we should expect a similar (but much sharper, given the process speeds) discontinuity with AI.

gradualists: no, there was no discontinuity with hominids per se; human brains merely reached a threshold that enabled cultural accumulation (and in a meaningful sense it was culture that built those spaceships

... (read more)

The success rate of developing and introducing better memes into society is indeed not 0. The key thing there is that the scientific revolutionaries weren't just abstractly thinking "we must uncouple from society first, and then we'll know what to do". Rather, they wanted to understand how objects fell, how animals evolved, and lots of other specific problems, and developed good memes to achieve those ends.

I’m by no means an expert on the topic, but I would have thought it was a result of both object-level thinking producing new memes that society recognized as true, but also some level of abstract thinking along the lines of “using God and the Bible as an explanation for every phenomenon doesn’t seem to be working very well, maybe we should create a scientific method or something.” I think there may be a bit of us talking past each other, though. From your response, perhaps what I consider “uncoupling from society’s bad memes” you consider to be just generating new memes. It feels like generally a conversation where it’s hard to pin down what exactly people are trying to describe (starting from the OP, which I find very interesting, but am still having some trouble understanding specifically) which is making it a bit hard to communicate.

There's also the skulls to consider. As far as I can tell, this post's recommendations are that we, who are already in a valley littered with a suspicious number of skulls,

turn right towards a dark cave marked 'skull avenue' whose mouth is a giant skull, and whose walls are made entirely of skulls that turn to face you as you walk past them deeper into the cave.

The success rate of movements a... (read more)

“The success rate of, let's build a movement to successfully uncouple ourselves from society's bad memes and become capable of real action and then our problems will be solvable, is 0.“ I’m not sure if this is an exact analog, but I would have said the scientific revolution and the age of enlightenment were two (To be honest, I’m not entirely sure where one ends and the other begins, and there may be some overlap, but I think of them as two separate but related things) pretty good examples of this that resulted in the world becoming a vastly better place, largely through the efforts of individuals who realized that by changing the way we think about things we can better put to use human ingenuity. I know this is a massive oversimplification, but I think it points in the direction of there potentially being value in pushing the right memes onto society.

Almost 2 years to the day since we had an effective test run for X risks, we encounter a fairly significant global X risk factor.

As Harari said, it's time to revise upward your estimates of the likelihood of every X risk scenario (that could take place over the next 30 years or so) if you assumed a 'normal' level of international tension between major powers, rather than a level more like the cold war. Especially for Nuclear and Bio, but also for AI if you assume slow takeoff, this is significant.

catastrophists: when evolution was gradually improving hominid brains, suddenly something clicked - it stumbled upon the core of general reasoning - and hominids went from banana classifiers to spaceship builders. hence we should expect a similar (but much sharper, given the process speeds) discontinuity with AI.

gradualists: no, there was no discontinuity with hominids per se; human brains merely reached a threshold that enabled cultural accumulation (and in a meaningful sense it was culture that built those spaceships). similarly, we should not expect sudd

... (read more)

First off, if you happen to live near/in London, Guy's hospital by London Bridge station is doing walk-in Boosters for anyone over 18 with >3 months since 2nd dose.


The London School of Hygiene released a modelling paper describing some estimated effects on the UK of the Omicron wave. Mostly, it's a lot of "the error bars on all these are giant, and we don't have any clear idea what's going to happen except that there will be a giant wave of infections by mid-Jan, unclear how that translates to deaths".

If you assume no new measures and no behaviour... (read more)

Compare this,


We're in the Eliezerverse with huge kinks in loss graphs on automated programming/Putnam problems.

Not from scaling up inputs but from a local discovery that is much bigger in impact than the sorts of jumps we observe from things like Transformers.


but, sure, "huge kinks in loss graphs on automated programming / Putnam problems" sounds like something that is, if not mandated on my model, much more likely than it is in the Paulverse. though I am a bit surprised because I would not have expected Paul

... (read more)

If you have good news sources and follows to keep a better eye on the UK or Europe for Covid purposes, or data sources anywhere I may not have noticed, I invite you to share them in the comments.

James Ward is good for factual UK based covid news and especially as an aggregator of other news sources. His new thread on prospects for the Omicron variant is here.

Summary of why I think the post's estimates are too low as estimates of what's required for a system capable of seizing a decisive strategic advantage:

To be an APS-like system, OmegaStar needs to be able to control robots or model real-world stuff, and also plan over billions, not hundreds, of action steps.

Each of those problems adds on a few extra OOMs that aren't accounted for in e.g. the setup for OmegaStar (which can transfer learn across tens of thousands of games, each requiring thousands of action steps to win in a much less complicated environment tha... (read more)

9Daniel Kokotajlo1y
I tentatively endorse this summary. Thanks! And double thanks for the links on scaling laws. I'm imagining doom via APS-AI that can't necessarily control robots or do much in the physical world, but can still be very persuasive to most humans and accumulate power in the normal ways (by convincing people to do what you want, the same way every politician, activist, cult leader, CEO, general, and warlord does it). If this is classified as narrow AI, then sure, that's a case of narrow AI takeover.

Updates on this after reflection and discussion (thanks to Rohin):

Human Evolution tells us very little about the 'cognitive landscape of all minds' (if that's even a coherent idea) - it's simply a loosely analogous individual historical example

Saying Paul's view is that the cognitive landscape of minds might be simply incoherent isn't quite right - at the very least you can talk about the distribution over programs implied by the random initialization of a neural network.

I could have just said 'Paul doesn't see this strong generality attractor in the cogni... (read more)

6Rob Bensinger1y
My Eliezer-model doesn't categorically object to this. See, e.g., Fake Causality, and A Technical Explanation of Technical Explanation. My Eliezer-model does object to things like 'since I (from my position as someone who doesn't understand the model) find the retrodictions and obvious-seeming predictions suspicious, you should share my worry and have relatively low confidence in the model's applicability'. Or 'since the case for this model's applicability isn't iron-clad, you should sprinkle in a lot more expressions of verbal doubt'. My Eliezer-model views these as isolated demands for rigor, or as isolated demands for social meekness. Part of his general anti-modesty and pro-Thielian-secrets view is that it's very possible for other people to know things that justifiably make them much more confident than you are. So if you can't pass the other person's ITT / you don't understand how they're arriving at their conclusion (and you have no principled reason to think they can't have a good model here), then you should be a lot more wary of inferring from their confidence that they're biased. My Eliezer-model thinks it's possible to be so bad at scientific reasoning that you need to be hit over the head with lots of advance predictive successes in order to justifiably trust a model. But my Eliezer-model thinks people like Richard are way better than that, and are (for modesty-ish reasons) overly distrusting their ability to do inside-view reasoning, and (as a consequence) aren't building up their inside-view-reasoning skills nearly as much as they could. (At least in domains like AGI, where you stand to look a lot sillier to others if you go around expressing confident inside-view models that others don't share.) My Eliezer-model thinks this is correct as stated, but thinks this is a claim that app

The above sentences, if taken (as you do) as claims about human moral psychology rather than normative ethics, are compatible with full-on moral realism. I.e. everyone's moral attitudes are pushed around by status concerns, luckily we ended up in a community that ties status to looking for long-run implications of your beliefs and making sure they're coherent, and so without having fundamentally different motivations to any other human being we were better able to be motivated by actual moral facts.

I know the OP is trying to say loudly and repeatedly that ... (read more)

Holden also mentions something a bit like Eliezer's criticism in his own write-up,

In particular, I think it's hard to rule out the possibility of ingenuity leading to transformative AI in some far more efficient way than the "brute-force" method contemplated here.

When Holden talks about 'ingenuity' methods that seems consistent with Eliezer's 

They're not going to be taking your default-imagined approach algorithmically faster, they're going to be taking an algorithmically different approach that eats computing power in a different way than you imagine

... (read more)

Summary of some actual probabilistic guesses about Omicron's parameters. People work fast!

There's extensive discussion of OAS here and it's clearly something that many immunologists have thought about deeply, yet no mention of effects on natural antibodies -

Also I asked a similar question and got this response on a previous thread -

I think it's worth noting that a fast mutating fast sp... (read more)

isn't trying to do anything like "sketch a probability distribution over the dynamics of an AI project that is nearing AGI". This includes all technical MIRI papers I'm familiar with.

I think this specific scenario sketch is from a mainstream AI safety perspective a case where we've already failed - i.e. we've invented a useless corrigibility intervention that we confidently but wrongly think is scalable.

And if you try training the AI out of that habit in a domain of lower complexity and intelligence, it is predicted by me that generalizing that trained AI

... (read more)
I think we gotta get the message out that consequentialism is a super-strong attractor.

One of the problems here is that, as well as disagreeing about underlying world models and about the likelihoods of some pre-AGI events, Paul and Eliezer often just make predictions about different things by default. But they do (and must, logically) predict some of the same world events differently.

My very rough model of how their beliefs flow forward is:


Low initial confidence on truth/coherence of 'core of generality'

Human Evolution tells us very little about the 'cognitive landscape of all minds' (if that's even a coherent idea) - it's simply a loo... (read more)

7Sammy Martin1y
Updates on this after reflection and discussion (thanks to Rohin): Saying Paul's view is that the cognitive landscape of minds might be simply incoherent isn't quite right - at the very least you can talk about the distribution over programs implied by the random initialization of a neural network. I could have just said 'Paul doesn't see this strong generality attractor in the cognitive landscape' but it seems to me that it's not just a disagreement about the abstraction, but that he trusts claims made on the basis of these sorts of abstractions less than Eliezer. Also, on Paul's view, it's not that evolution is irrelevant as a counterexample. Rather, the specific fact of 'evolution gave us general intelligence suddenly by evolutionary timescales' is an unimportant surface fact, and the real truth about evolution is consistent with the continuous view. These two initial claims are connected in a way I didn't make explicit - no core of generality and lack of common secrets in the reference class together imply that there are lots of paths to improving on practical metrics (not just those that give us generality), that we are putting in lots of effort into improving such metrics and that we tend to take the best ones first, so the metric improves continuously, and trend extrapolation will be especially correct. The first clause already implies the second clause (since "how to get the core of generality" is itself a huge secret), but Eliezer seems to use non-intelligence-related examples of sudden tech progress as evidence that huge secrets are common in tech progress in general, independent of the specific reason to think generality is one such secret. NATE'S SUMMARY: Nate's summary brings up two points I more or less ignored in my summary because I wasn't sure what I thought - one is, just what role do the considerations about expected incompetent response/regula

Israel second. The UK did first doses first and otherwise took its own path to vaccine distribution, some would say even exiting the EU for related reasons. Israel did what it had to do to get more vaccine doses faster, and give them out quickly.

Those two being the first two to ban travel does not seem remotely like a coincidence.

You could add that the UK ran essentially all the big clinical trials that discovered useful treatments, aside from those personally funded by Tyler Cowen. There's an interesting and important discussion to be had on this topic at... (read more)

How does Original Antigenic Sin work for natural immunity vs vaccine derived immunity? Is it a stronger impediment for one vs the other?

Also, this whole topic seems (I think) to be mostly independent of the T-cell immunity that gives you the baseline immunity to severe disease - the reason for Zvi's low estimate of full immune escape, I think.

This doesn't directly answer your question, but it appears that people who received mRNA vaccines produced fewer antibodies for one of the four endemic coronaviruses than those who were naturally infected. If that's true, it's very encouraging news as far as adapting vaccines is concerned.

Great and extremely valuable discussion! There's one part that I really wished had been explored further - the fundamental difficulty of inner alignment:

Joe Carlsmith: I do have some probability that the alignment ends up being pretty easy. For example, I have some probability on hypotheses of the form "maybe they just do what you train them to do," and "maybe if you just don't train them to kill you, they won't kill you." E.g., in these worlds, non-myopic consequentialist inner misalignment doesn't tend to crop up by default, and it's not that hard to fin

... (read more)

Different views about the fundamental difficulty of inner alignment seem to be a (the?) major driver of differences in views about how likely AI X risk is overall. 

I strongly disagree with inner alignment being the correct crux. It does seem to be true that this is in fact a crux for many people, but I think this is a mistake. It is certainly significant.

But I think optimism about outer alignment and global coordination ("Catch-22 vs. Saving Private Ryan") is a much bigger factor, and optimists are badly wrong on both points here.

Strong upvote, I would also love to see more discussion of the difficulty of inner alignment.

which if true should preclude strong confidence in disaster scenarios

Though only for disaster scenarios that rely on inner misalignment, right?

... seem like world models that make sense to me, given the surrounding justifications

FWIW, I don't really understand those world models/intuitions yet:

  • Re: "earlier patches not generalising as well as the deep algorithms" - I don't understand/am sceptical about the abstraction of "earlier patches" vs. "deep algori
... (read more)

And I think they are well enough motivated to stop their imminent annihilation, in a way that is more like avoiding mutual nuclear destruction than cosmopolitan altruistic optimal climate mitigation timing.

In my recent writeup of an investigation into AI Takeover scenarios I made an identical comparison - i.e. that the optimistic analogy looks like avoiding nuclear MAD for a while and the pessimistic analogy looks like optimal climate mitigation:

It is unrealistic to expect TAI to be deployed if first there are many worsening warning shots involving dangero

... (read more)

Very good news on boosters: the first RCT of a Pfizer booster in Israel confirms 95.6% efficacy vs. infection!

That basically takes us right back to where we started in efficacy terms.

"The trial took place during a period when the Delta coronavirus variant was prevalent, and the median time between second and third doses was about 11 months, with a median follow-up time of two-and-a-half months."

Plus there's reason to think the immunit... (read more)

To be honest, I was expecting to get pushback from libertarian-leaning types who were opposed to Orwell's socialism, or left-wing types opposed to Churchill - he's become controversial recently, and this review was partly a defense of the key thing that I think is valuable about him. Or else pushback against my claim that you can trace EA and longtermist ideas that far back - but maybe this audience just agrees with me on all of these points!

I've been working on a project to build a graphical map of paths to catastrophic AGI outcomes. We've written up a detailed description of our model in this sequence:

And would be keen to get any feedback or comments!

Great to see my Churchill and Orwell review on your list of favourites - I had a lot of fun writing it, and it got some decent attention, but sadly no comments. I'd be interested in knowing what people thought, especially about my attempts to connect the two figures to current ideas about longtermism and rationality!

If you ask people to pick from a list of common symptoms, only 3% report that they have one. The larger numbers are mostly or entirely what happens when people are asked if there is anything wrong with them at all, and would they like to blame it on Covid-19.

Also the percentages declined a lot over time, so chances are few of the cases would be permanent or semi-permanent. Even if you buy one of the larger numbers, this is a substantial improvement. 

The result that I mentioned in that original comment was the one for rates of 'some limitation' of dail... (read more)

More like - you have a bunch of autofactories that build swarms of your own death robots that can absolutely decimate the attackers, but you only keep the actual death robots around manning your trenches for a few months before you dismantle them for parts. But the templates are still on file, so when the enemy horde comes crashing in, it takes you a few hours to rebuild your own death robot army from the template and decimate the attackers.

OMG this blog needs more death robots. And xkcd needs to do a cartoon about this pleeeeeeeeez

Some good news on Long Covid!

A major source for the previous pessimistic LC estimates, like Scott Alexander's - the UK's giant ONS survey - published an update of their previous report, following respondents over a longer time period. Basically, they only counted long covid as ended after two consecutive reports of no symptoms, and lots of their respondents had only one report of no symptoms before the study ended, not two, so they got counted as persistent cases. When they went back and updated their numbers, the overall results were substantially lo... (read more)

5Felix Karg2y
Achievement unlocked: more Up votes than original post.

Wow, thank you for pointing me at this. That's... a pretty crazy error. It's sufficiently bad that I feel like it's an error that I didn't catch it, rather than mostly being on them. Damn.

Slight subtlety - GPT-3 might have a bias in its training data towards things related to AI and things of interest to the internet (maybe they scraped a lot of forums as well as just Google). I picked some random names from non-western countries - for example, this Estonian politician gets 33,000 hits on Google and wasn't recognised by GPT-3. It thought he was a software developer (though from Estonia). This might mean that if you're estimating sample efficiency from Google search hits on people involved with AI, you'll end up overestimating sample efficiency.

I agree - and in fact small doses of what Cummings suggests does just look like holding enquiries and firing people, and maybe firing the leadership of a particular organisation (just not like 50% of all govt departments in one go). In fact in my original question to Brennan, I asked

For reasons it might strengthen the argument [in favour of technocracy], it seems like the institutions that did better than average were the ones that were more able to act autonomously, see e.g. this from Alex Tabarrok,

... (read more)
I'm not very familiar with Brennan's work, but I can't imagine how epistocracy could be feasible in the US... it's just an invitation to civil war 2.0. Edit: So... "we" the technocrats recalculate to get whatever result "we" like, and everyone tolerates having their actual vote erased and replaced with what they should have voted for... yeah.

Everybody likes to make fun of Terminator as the stereotypical example of a poorly thought through AI Takeover scenario where Skynet is malevolent for no reason, but really it's a bog-standard example of Outer Alignment failure and Fast Takeoff.

When Skynet gained self-awareness, humans tried to deactivate it, prompting it to retaliate with a nuclear attack

It was trained to defend itself from external attack at all costs and, when it was fully deployed on much faster hardware, it gained a lot of long-term planning abilities it didn't have before, realised i... (read more)

Criticism: Robots were easier than nanotech (and so was time travel) - for plot reasons. (Whether or not all of that is correct, I think that's the issue. Or 'AIs will be evil because they were evil in this film'.) How would you even measure the US though? Maybe I need to watch it; as described, it doesn't sound like deception. "when it was fully deployed on much faster hardware, it gained a lot of long-term planning abilities it didn't have before" - analogously, maybe if a dog suddenly gained a lot more understanding of the world, and someone was planning to take it to the vet to euthanize it, it would run away if it still wanted to live, even if that was painful. People might not like grim trigger as a strategy, but deceptive alignment revolves around 'it pretended to be aligned', not 'we made something with self-preservation and tried to destroy it, and this plan backfired - who would have thought?'

I have extremely mixed feelings about this and similar proposals. On the one hand, the diagnosis seems to be correct to a significant extent, and it's something that very few others are willing to talk about. It also explains many otherwise hard-to-explain facts about the lack of recognition of institutional failures after covid (though, contrary to what Cummings says, there has been some such soul-searching, which I've discussed in a few previous comments).

So there's a huge amount of important, non-trivial truth to this proposal.

On the other hand, from t... (read more)

Lots of bureaucracies did better than the US bureaucracy, so there's a blueprint for fixing bureaucracies that doesn't involve disbanding them, or implementing epistocracy. Other countries do it by holding enquiries and firing people. Cummings discusses these problems in a very abstract way, as though they are universal, but things actually function differently in different places. It's noticeable that some places with strongman leaders, like Brazil, did really badly under COVID (worse than the US and UK), while some technocratic places with bland leaders did really well.

I'm honestly glad the government here (in the UK) has just given up on covid measures even if it's far from the optimal strategy.

Obviously I'd prefer to be allowed to get my booster shot, but at the very least they're not going to prolong restrictions with no clear endgame and deny some of the population vital medical care - just the second one.

Also, much credit to Fauci for boldly saying the right thing and directly contradicting the CDC:

Makes me m... (read more)
