All of Roko's Comments + Replies

Yes, and also using radio with good encryption to communicate quickly.

It certainly seems that a mastery of tank warfare would have helped a lot. But the British experience with tanks shows that there was a huge amount of resistance within the military to new forms of warfare. Britain only had tanks because Winston Churchill made it his priority to support them.

New weapon systems are not impressive at first. The old ways are typically a local optimum. So the real question here is how to leave that local optimum!

That's a good point. I will clarify. I mean [a] - you win, the enemy surrenders.

Do outcomes "Russia/France surrenders, France/Russia stays neutral or white-peaces" count?

I'm struggling to see why fun books would make any difference. Germany didn't lose because it ran out of light reading material.

As for troop morale and so on, I don't think that was a decisive element as by the time it started to matter, defeat was already overdetermined.

In other words, I think Germany would have lost WWI even with infinite morale.

If it pays out in advance it isn't insurance.

A contract that relies on a probability to calculate payments is also a serious theoretical headache. If you are a Bayesian, there's no objective probability to use since probabilities are subjective things that only exist relative to a state of partial ignorance about the world. If you are a frequentist there's no dataset to use.

There's another issue.

As the threat of extinction gets higher and also closer in time, it can easily be the case that there's no possible payment that people ought to rationally accept…
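One way to make this concrete: under any bounded utility function, the compensation required to accept a death risk p diverges as p grows, and past some threshold no finite payment works at all. A minimal sketch, assuming an exponential bounded utility and illustrative numbers (the utility form, the wealth level, and the scale parameter are all my assumptions, not claims about real preferences):

```python
import math

def utility(wealth, scale=1e6):
    # Bounded utility of wealth: approaches 1 as wealth grows.
    # (Illustrative assumption, not a model of anyone's real preferences.)
    return 1.0 - math.exp(-wealth / scale)

def min_acceptable_payment(p_death, wealth=1e5, scale=1e6):
    # Smallest payment M such that accepting the gamble
    # (die with probability p_death, else keep wealth + M) is no worse
    # than declining, with death assigned utility 0.
    # Condition: (1 - p_death) * U(wealth + M) >= U(wealth).
    target = utility(wealth, scale) / (1.0 - p_death)
    if target >= 1.0:
        # Bounded utility caps the upside: no finite payment suffices.
        return None
    # Invert U to recover the required payment.
    return -scale * math.log(1.0 - target) - wealth

for p in (0.01, 0.1, 0.5, 0.9, 0.95):
    print(p, min_acceptable_payment(p))
```

With these numbers, the required payment grows rapidly with p and becomes undefined (None) around p = 0.95: at that point the promised money cannot compensate for the risk at any price, which is the situation the comment above is gesturing at.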

Even in a traditional accounting sense, I'm not aware of any term that could capture the probable existential effects of research, but I understand what @So8res is trying to pursue in this post, and I agree with it. Still, I think "apocalypse insurance" is not the proper term here.

I think IAS 19 (actuarial gains or losses) and IAS 26 (retirement benefits) are closer to the idea, though these theoretical accounting approaches apply to employees of a company. But these could be tweaked into another form of accounting theory (on another form …

Yes, utility of money is currently fairly well bounded. Liability insurance is a proxy for imposing risks on people, and like most proxies comes apart in extreme cases. However: would you accept a 16% risk of death within 10 years in exchange for an increased chance of living 1000+ years? Assume that your quality of life for those 1000+ years would be in the upper few percentiles of current healthy life. How much increased chance of achieving that would you need to accept that risk? That seems closer to a direct trade of the risks and possible rewards involved, though it still misses something. One problem is that it still treats the cost of risk to humanity as being simply the linear sum of the risks acceptable to each individual currently in it, and I don't think that's quite right.
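The trade in the question above can be made explicit as a break-even calculation in raw expected life-years. A quick sketch, where every input (50 baseline remaining years, 5 expected years lived if death occurs within the 10-year window, 1000 years on success) is my illustrative assumption, and quality-of-life weighting is deliberately ignored:

```python
def breakeven_success_prob(p_death=0.16, baseline_years=50.0,
                           long_life_years=1000.0, years_if_die=5.0):
    # Success probability p at which expected life-years with the gamble
    # equal the baseline. All inputs are illustrative assumptions.
    # EV(accept) = p_death * years_if_die
    #            + (1 - p_death) * (p * long_life + (1 - p) * baseline)
    surviving = 1.0 - p_death
    # Solve EV(accept) = baseline_years for p:
    p = (baseline_years - p_death * years_if_die
         - surviving * baseline_years) / (surviving * (long_life_years - baseline_years))
    return p

print(breakeven_success_prob())  # roughly 0.009
```

Under these assumptions, even a ~1% chance of the 1000-year outcome breaks even with a 16% death risk in raw life-years, which illustrates the comment's point that summing linear individual trades may badly misprice the collective risk.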

I think you get a nerdy-novel society and a loss of WWI, for the same reasons it was lost in our timeline.

I think it's definitely possible that it increases defection rates and/or decreases morale among the officers, or that it completely bounces off most of the troops or increases defection rates there. Especially because you can't test it on officers and measure effectiveness in the environment of long trench wars, where nihilism ran rampant, because that environment wouldn't exist until it was far too late to use it as a testing environment. But propaganda and war recruitment were generally pretty inferior to what exists today; e.g. the world's best psychologist was Sigmund Freud, and behavioral economics was ~a century away. They were far worse than most people today at writing really good books that are easy to read and that anyone could enjoy, and the contemporary advances in propaganda that they did have resulted in massive and unprecedented scaling of nationalism and war capabilities, even though what they had at the time was vastly less effective than what we're used to today.
I don't really see how you lose; you have a cultural renaissance, an economic boom, and a coordination takeoff in your pocket, and you have substantial degrees of freedom to convert it into German Nationalism that's an order of magnitude memetically stronger than the original WW1.  The risk comes from Britain and France getting their own cultural renaissance, and that's actually a pretty easy fix; just insult the French and British every single time you write something, and that will probably be enough.

Sure you can bring decision theory knowledge. All I'm disallowing is something like bringing back exact plans for a nuke.

Well, it turned out that attacking on The Western Front in WWI was basically impossible. The front barely moved over 4 years, and that was with far more opposing soldiers over a much wider front.

So the best strategy for Germany would have been to dig in really deep and just wait for France to exhaust itself.

At least that's my take as something of an amateur.

This is based on the assumption that defense is much easier than offense. That is not true; in fact, in WWI the attacker's and defender's losses were usually close (for example, ~140k vs ~160k KIA at Verdun).
I like the reasoning on the front, but I disagree. The reason I don't think it holds is that the Western Front as we understand it is what happened after the British Expeditionary Force managed to disrupt the German offensive into France, and the defenses that were deployed were based on the field conditions as they existed. What I am proposing is that the initial invasion go directly into the teeth of the untested defenses which were built for the imagined future war (imagined over a period of 40 years or so before actual war broke out). I reason these defenses contained all of the mistaken assumptions which the field armies made and learned from in the opening months of the war in our history, but built in, with no time or flexibility to correct them in the face of a general invasion. Even if Britain eventually enters the war, I strongly expect there would be no surprise attack by the expeditionary force during Germany's initial invasion, and so I predict the Germans take Paris.

That being said, my reasoning does work in reverse and so supports your proposed plan: if we are able to persuade Germany of the historically proven defenses and update them about the true logistical burden, they absolutely could greet the French with Western Front-grade defenses on their side of the border. This provides more than enough time to subjugate Russia before mobilization, or perhaps drive them to surrender outright with confirmation that their chief ally is useless. The less aggressive option with France makes the British and US entries into the war even less likely, I'd wager.

Frankly, conquering France isn't even a real win condition; it was just what I expected because that's where the invasion went historically. This makes the whole affair look simpler: Germany and Austria-Hungary are able to prosecute a war on just the Russian and Balkan fronts, it stops being a world war and reduces to a large European war, and they get to exploit the territorial gains going forward.

But the British could have entered the war anyway. After all, British war goals were to maintain a balance of power in Europe, and they didn't want France and Russia to fall and Germany to become too strong.

OK, but if I am roleplaying the German side, I might choose to still start WWI but just not attack through Belgium. I will hold the Western Front with France and attack Russia.

Indeed you might - in fact I suggested attacking through the French border directly in the other question where we aid Germany/Austria rather than try to prevent the war. The idea of defending against France is an interesting one - the invasion plans called for knocking out France first and Russia second based on the speed with which they expected each country to mobilize, and Russia is much slower to conquer just based on how far everyone has to walk. Do you estimate choosing to face an invasion from France would be worth whatever they gain from Russia, in the thinking of German command? I genuinely don't know anything about Germany's plans for Russia post invasion in the WW1 case, so I cannot tell.

True. I may in fact have been somewhat underconfident here.

Answer by Roko, Nov 24, 2023

I think violence helps unaligned AI more than it helps aligned AI.

If the research all goes underground it will slow it down but it will also make it basically guaranteed that there's a competitive, uncoordinated transition to superintelligence.

Well, Altman is back in charge now… I don't think I'm being overconfident.

"Estimate overconfidence" implies that estimate can be zero!

It seems that I was mostly right in the specifics, there was a lot of resistance to getting rid of Altman and he is back (for now)

I didn't make anything up. Altman is now back in charge BTW.

Well the board are in negotiations to have him back

"A source close to Altman says the board had agreed in principle to resign and to allow Altman and Brockman to return, but has since waffled — missing a key 5PM PT deadline by which many OpenAI staffers were set to resign. If Altman decides to leave and start a new company, those staffers would assuredly go with him."

Ariel Kwiatkowski, 17d:
Huh, whaddayaknow, turns out Altman was in the end pushed back, the new interim CEO is someone who is pretty safety-focused, and you were entirely wrong.   Normalize waiting for more details before dropping confident hot takes.

"A source close to Altman" means "Altman", and I'm pretty sure that he is not a very trustworthy party at the moment.

I think there's a pretty big mistake here - the value of not getting flu is a lot more than $200.

At a $5M value of life, each day is worth about $200, so 7 days of almost complete incapacitation is -$1400.

I would certainly pay $1400 upfront to make a bad flu just instantly stop.
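To sanity-check the arithmetic above, here's a quick back-of-envelope sketch (the $5M value-of-life figure is from the comment; spreading it over roughly 70 remaining years is my illustrative assumption):

```python
# Back-of-envelope check of the flu-cost arithmetic above.
VSL = 5_000_000           # value of a statistical life, $ (figure from the comment)
remaining_days = 70 * 365  # assumed remaining lifespan in days

value_per_day = VSL / remaining_days
flu_cost = 7 * value_per_day  # ~a week of near-total incapacitation

print(f"value per day ~ ${value_per_day:.0f}")  # close to the ~$200 quoted
print(f"7-day flu    ~ ${flu_cost:.0f}")        # close to the -$1400 quoted
```

The exact per-day figure (~$196) lands close to the rounded $200 used in the comment, so the ~$1400 flu cost follows directly.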

dath ilan is currently getting along pretty well without AGI

I hate to have to say it, but you are generalizing from fictional evidence

Dath ilan doesn't actually exist. It's a fantasy journey in Eliezer's head. Nobody has ever subjected it to the rigors of experimentation and attempts at falsification.

The world around us does exist. And things are not going well! We had a global pandemic that was probably caused by government labs that do research into pandemics, and then covered up by scientists who are supposed to tell us the truth about pandemics. THA…

Max H, 1mo:
I'm not updating about what's actually likely to happen on Earth based on dath ilan. It seems uncontroversially true that a world where the median IQ was 140 or whatever would look radically different (and better) than the world we currently live in. We do not in fact, live in such a world. But taking a hypothetical premise and then extrapolating what else would be different if the premise were true, is a generally useful tool for building understanding and pumping on intuitions in philosophy, mathematics, science, and forecasting. If you say "but the premise is false!!111!" you're missing the point.

Yes, and I believe that the invention and spread of firearms was key to this as they reduce the skill dependence of warfare, reducing the advantage that a dedicated warband has over a sedentary population.

That, but getting your army from mostly melee to mostly ranged, and solving your operational problems, helps a lot too.

What happened in the 1200s is that the Mongols had a few exceptionally good leaders.

It's consistent with the overhang model that a new phase needs ingredients A, B, C, ... X, Y, Z. When you only have A, ... X it doesn't work. Then Y and Z come, it all falls into place and there's a rapid and disruptive change. In this case maybe Y and Z were good leaders or something. I don't want to take too strong a position on this, as given my research it seems there is still debate among specialists about what exactly the key ingredients were.

Most people, ultimately, do not care about something that abstract and will be happy living in their own little Truman Show realities that are customized to their preferences.

Personally I find The World to be dull and constraining, full of things you can't do because someone might get offended or some lost-purposes system might zap you. Did you fill in your taxes yet!? Did you offend someone with that thoughtcrime?! Plus, there are the practical downsides like ill health and so on.

I'd be quite happy to never see 99.9999999% of humanity ever again, to simpl…

Could an alien observer have identified Genghis Khan's and the Mongols' future prospects?

Well, probably not to that level of specificity, but I think the general idea of empires consuming vulnerable lands and smaller groups would have been obvious

Well, sometimes they can, because sometimes the impending consumption of the resource is sort of obvious. Imagine a room that's gradually filling with a thin layer of petrol on the floor, with a bunch of kids playing with matches in it.

M. Y. Zuo, 1mo:
Could an alien observer have identified Genghis Khan's and the Mongols' future prospects when he was a teenager? I'm not quite sure.

One possible way to kill humans

I suspect that drones + poison may be surprisingly effective. You only need one small-ish facility to make a powerful poison or bioweapon that drones can spread everywhere or just sneak into the water supply. Once 90% of humans are dead, the remainder can be mopped up.

Way harder to be able to keep things running once we're gone.

Yes, this is a great example of Exploratory engineering

This post is way too long. Forget clown attacks, we desperately need LLMs that can protect us from verbosity attacks.

I wrote two shorter versions, one on the AI governance/US China side of things, and an even shorter summary of the overall situation. I really regret making this post as long as EY's List of Lethalities. I thought that the Cyborgism post was also long and got tons of reads because it was worth reading, so it would be fine, but it didn't go like that. I still think the situation is an emergency.

You can just create personalized environments to your preferences. Assuming that you have power/money in the post-singularity world.

Assuming your preferences don't involve other people or the world

a technical problem, around figuring out how to build an AGI that does what the builder wants

How does a solution to the above solve the coordination/governance problem?

Carl Feynman, 1mo:
I think the theory is something like the following: We build the guaranteed trustworthy AI, and ask it to prevent the creation of unaligned AI, and it comes up with the necessary governance structures, and the persuasion and force needed to implement them.   I’m not sure this is a certain argument.  Some political actions are simply impossible to accomplish ethically, and therefore unavailable to a “good” actor even given superhuman abilities.

Ah, I see. Yeah, that's a reasonable worry. Any ideas on how someone in those orgs could incentivize such behavior whilst discouraging poorly thought out pivotal acts?

the fact that we are having this conversation simply underscores how dangerous this is and how unprepared we are.

This is the future of the universe we're talking about. It shouldn't be a footnote!

researchers at big labs will not be forced to program an ASI to do bad things against the researchers' own will

Well these systems aren't programmed. Researchers work on architecture and engineering, goal content is down to the RLHF that is applied and the wishes of the user(s), and the wishes of the user(s) are determined by market forces, user preferences, etc. And user preferences may themselves be influenced by other AI systems.

Closed source models can have RLHF and be delivered via an API, but open source models will not be far behind at any given p…

Max H, 1mo:
I don't dispute any of that, but I also don't think RLHF is a workable method for building or aligning a powerful AGI. Zooming out, my original point was that there are two problems humanity is facing, quite different in character but both very difficult:

  • a coordination / governance problem, around deciding when to build AGI and who gets to build it
  • a technical problem, around figuring out how to build an AGI that does what the builder wants at all

My view is that we are currently on track to solve neither of those problems. But if you actually consider what the world in which we sufficiently-completely solve even one of them looks like, it seems like either is sufficient for a relatively high probability of a relatively good outcome, compared to where we are now. Both possible worlds are probably weird hypotheticals which shouldn't have an impact on what our actual strategy in the world we actually live in should be, which is of course to pursue solutions to both problems simultaneously with as much vigor as possible. But it still seems worth keeping in mind that if even one thing works out sufficiently well, we probably won't be totally doomed.

AI researchers would be the ones in control

No. You have simplistic and incorrect beliefs about control.

If there are a bunch of companies (Deepmind, Anthropic, Meta, OpenAI, ...) and a bunch of regulation efforts and politicians who all get inputs, then the AI researchers will have very little control authority, as little perhaps as the physicists had over the use of the H-bomb.

Where does the control really reside in this system?

Who made the decision to almost launch a nuclear torpedo in the Cuban Missile Crisis?

Max H, 1mo:
In the Manhattan project, there was no disagreement between the physicists, the politicians / generals, and the actual laborers who built the bomb, on what they wanted the bomb to do. They were all aligned around trying to build an object that would create the most powerful explosion possible.

As for who had control over the launch button, of course the physicists didn't have that, and never expected to. But they also weren't forced to work on the bomb; they did so voluntarily and knowing they wouldn't be the ones who got any say in whether and how it would be used.

Another difference between an atomic bomb and AI is that the bomb itself had no say in how it was used. Once a superintelligence is turned on, control of the system rests entirely with the superintelligence and not with any humans. I strongly expect that researchers at big labs will not be forced to program an ASI to do bad things against the researchers' own will, and I trust them not to do so voluntarily. (Again, all in the probably-counterfactual world where they know and understand all the consequences of their own actions.)

I would question the idea of "control" being pivotal.

Even if every AI is controllable, there's still the possibility of humans telling those AIs to do bad things, thereby destabilizing the world and throwing it into an equilibrium where there are no more humans.

Global compliance is the sine qua non of regulatory approaches, and there is no evidence that the political will to make it happen lies within our possible futures, unless some catastrophic but survivable casus belli happens to wake the population up.

Part of why I am posting this is in case that happens, so people are clear what side I am on.

Who is going to implement CEV or some other pivotal act?

Ah, I see. Yeah, that's a reasonable worry. Any ideas on how someone in those orgs could incentivize such behavior whilst discouraging poorly thought out pivotal acts? I would be OK with a future where e.g. OAI gets 90-99% of the cosmic endowment as long as the rest of us get a chunk, or get the chance to safely grow to the point where we have a shot at the vast scraps OAI leaves behind.

Well, the AI technical safety work that's appropriate for neural networks is about 5-6 years old, if we go back before 2017 I don't think any relevant work was done

Conversely, if we had a complete technical solution, I don't see why we necessarily need that much governance competence.

As I said in the article, technically controllable ASIs are the equivalent of an invasive species which will displace humans from Earth politically, economically and militarily.

Max H, 1mo:
And I'm saying that, assuming all the technical problems are solved, AI researchers would be the ones in control, and I (mostly) trust them to just not do things like build an AI that acts like an invasive species, or argues for its own rights, or build something that actually deserves such rights. Maybe some random sociologists on Twitter will call for giving AIs rights, but in the counterfactual world where AI researchers have fine control of their own creations, I expect no one in a position to make decisions on the matter to give such calls any weight. Even in the world we actually live in, I expect such calls to have little consequence. I do think some of the things you describe are reasonably likely to happen, but the people responsible for making them happen will do so unintentionally, with opinion columnists, government regulations, etc. playing little or no role in the causal process.
So you don't think a pivotal act exists? Or, more ambitiously, you don't think a sovereign implementing CEV would result in a good enough world?

virtue signalling is generally used for insincerity

Virtue signalling can be sincere.


"If the world were unified around the priority of minimizing global catastrophic risk, I think that we could reduce risk significantly further by implementing a global, long-lasting, and effectively enforced pause on frontier AI development—including a moratorium on the development and production of some types of computing hardware"

This really needs to be shouted from the rooftops. In the public sphere, people will hear "responsible scaling policy" as "It's maximally safe to keep pushing ahead with AI" rather than "We are taking on huge risks because politicians can't be bothered to coordinate".


This really needs to be shouted from the rooftops.

I disagree. I think it's important that we shout from the rooftops that the existential risk from AI is real, but I disagree that we should shout from the rooftops that a sufficiently good pause would solve it (even though I agree with Paul that it is true). I talk about this in this comment.

Historically, I think that a lot of causes have been hurt by a sort of purity-testing where scientists are forced to endorse the most extreme policy, even if it's not the best policy, on the idea that it would solve …

The problem with a naive implementation of RSPs is that we're trying to build a safety case for a disaster that we fundamentally don't understand and where we haven't even produced a single disaster example or simulation.

To be more specific, we don't know exactly which bundles of AI capabilities and deployments will eventually result in a negative outcome for humans. Worse, we're not even trying to answer that question - nobody has run an "end of the world simulator" and as far as I am aware there are no plans to do that.

Without such a model it's very diff…

You need a method of touching grass so that researchers have some idea of whether or not they're making progress on the real issues.

We already can't make MNIST digit recognizers secure against adversarial attacks. We don't know how to prevent prompt injection. Convnets are vulnerable to adversarial attacks. RL agents that play Go at superhuman levels are vulnerable to simple strategies that exploit gaps in their cognition.

No, there's plenty of evidence that we can't make ML systems robust.

What is lacking is "concrete" evidence that that will result in blood and dead bodies.

None of those things are examples of misalignment except arguably prompt injection, which seems like it's being solved by OpenAI with ordinary engineering.

This is where I'd like to insert a meme with some text like "did you even read the post?" You:

  • Make a bunch of claims that you fail to support, like at all
  • Generally go in for being inflammatory by saying "it's not a priority in any meaningful value system" i.e. "if you value this then your system of meaning in the world is in fact shit and not meaningful"
  • Pull the classic "what I'm saying is THE truth and whatever comes (the downvotes) will be a product of people's denial of THE truth", which means anyone who responds you'll likely just point to and say someth…
I somewhat agree with this, but I think it's an uncharitable framing of the point, since "virtue signalling" is generally used to imply insincerity. My impression is that the vegans I've spoken with are mostly acting sincerely based on their moral premises, but those are not ones I share. If you sincerely believe that a vast atrocity is taking place that society is ignoring, then a strident emotional reaction is understandable.

you only start handing out status points after someone has successfully demonstrated the security failure

Maybe you're right, we may need to deploy an AI system that demonstrates the potential to kill tens of millions of people before anyone really takes AI risk seriously. The AI equivalent of Trinity.

It's not just about "being taken seriously", although that's a nice bonus - it's also about getting shared understanding about what makes programs secure vs. insecure. You need a method of touching grass so that researchers have some idea of whether or not they're making progress on the real issues.

O O, 2mo:
To me the security mindset seems inapplicable because in computer science, programs are rigid systems with narrow targets. AI is not very rigid, and the target, i.e. an aligned mind, is not necessarily narrow.

Do you just like not believe that AI systems will ever become superhumanly strong? That once you really crank up the power (via hardware and/or software progress), you'll end up with something that could kill you?

Read what I wrote above: current systems are safe because they're weak, not safe because they're inherently safe.

Security mindset isn't necessary for weak systems because weak systems are not dangerous.

This is exactly what I am arguing against. I do not believe that the security mindset doesn't work because AI is weak; I believe that the security mindset fails for deeper reasons than that, and an increase in capabilities doesn't mean that the security mindset looks better. (Indeed, it may actually look worse: see the attempted optimization-daemon break of an AI, where making capabilities go up by increasing the dimensions of the AI made the phenomenon start going away, or all of SGD's corrections.)

Edit: I also have issues with the way LW applies the security mindset, and I'll quote my comment from there on why a lot of LW implementations of security mindset fail:

I believe the security mindset is inappropriate for AI

I think that's because AI today feels like a software project akin to building a website. If it works, that's nice, but if it doesn't work it's no big deal.

Weak systems have safe failures because they are weak, not because they are safe. If you piss off a kitten, it will not kill you. If you piss off an adult tiger...

The optimistic assumptions laid out in this post don't have to fail in every possible case for us to be in mortal danger. They only have to fail in one set of circumstances that someone …

I disagree, and I think there are deeper reasons why most computer security analogies do not work for ML/AI alignment. The biggest reasons are the following:

1. The thing that LW people call security mindset is non-standard. Under the computer security definition, you only start handing out points for discovering potential failures when they can actually be demonstrated, and virtually no proposed failures that I am aware of have been demonstrated successfully, except for goal misgeneralization and specification gaming, and even there the AIs were toy. In contrast, the notion that inner-misaligned models/optimization daemons would appear in modern AI systems has been tested twice: in one case, DaemonicSigil was able to get a gradient hacker/optimization daemon to appear, but it was extremely toy, and when it was shown in a more realistic case, the optimization-daemon phenomenon went away, or was going away. See Iceman's comment for more details on why LW Security Mindset != Computer Security Mindset.

2. ML people can do things that would not work under a security mindset or in rocket engineering, like randomly doubling model size or data, or swapping one model for another. These would be big no-nos in rocket engineering or computer security, because a rocket would literally explode if you doubled its fuel randomly in-flight, and reordering the steps of a password-security routine would make it output nonsense at best or destroy the security at worst.

There are enough results like this that I'm now skeptical of applying the security-mindset frame to AI safety, beyond inner alignment being very likely by default due to SGD's corrective properties.