This strikes me as a very preliminary, bludgeon version of the holy grail of mechanistic interpretability: actually understanding, and being able to manipulate, the specific concepts that an AI model uses.
Essentially, the problem is that 'evidence that shifts Bio Anchors weightings' is quite different, more restricted, and much harder to define than the straightforward 'evidence of impressive capabilities'. However, the reason that I think it's worth checking if new results are updates is that some impressive capabilities might be ones that shift Bio Anchors weightings. But impressiveness by itself tells you very little.
I think a lot of people with very short timelines are imagining the only possible alternative view as being 'another AI winter, scaling law...
Does that mean the Socratic Models result from a few weeks ago, which does involve connecting more specialised models together, is a better example of progress?
The Putin case would be better if he were convincing Russians to make massive sacrifices or do something that would backfire and kill them, like starting a war with NATO, and I don't think he has that power - e.g. his rushing to deny that Russia was sending conscripts to Ukraine, for fear of the effect that would have on public opinion.
Is Steven Pinker ever going to answer for destroying the Long Peace? https://www.reddit.com/r/slatestarcodex/comments/6ggwap/steven_pinker_jinxes_the_world/
It's really not at all good that we're going into a period of much heightened existential risk (from AGI, but also other sources) under Cold War-like levels of international tension.
I think there's actually a ton of uncertainty here about just how 'exploitable' human civilization ultimately is. We could imagine that, since actual humans (e.g. Hitler) have seized large fractions of Earth's resources just by talking to people, an AI wouldn't need to be all that much smarter than a human. On the other hand, we might just say that attempts like that are filtered through colossal amounts of luck and historical contingency, and that to reliably manipulate your way to controlling most of humanity you'd actually need to be far smarter than the smartest human.
What about the current situation in Russia? I think Putin must be winging the propaganda effort, since he wasn't expecting to have to fight a long and hard war, plus some of the messaging doesn't stand up to even cursory inspection (a Jewish Nazi president?), and yet it's still working remarkably well.
I think there are a few things that get in the way of doing detailed planning for outcomes where alignment is very hard and takeoff very fast. This post by David Manheim discusses some of the problems: https://www.lesswrong.com/posts/xxMYFKLqiBJZRNoPj
One is that there's no clarity, even among people who've made AI research their professional career, about alignment difficulty or takeoff speed. So getting buy-in in advance of clear warning signs will be extremely hard.
The other is that the strategies that might help in situations with hard alignment are at cro...
One thing to consider, in terms of finding a better way of striking a balance between deferring to experts and having voters invested, is epistocracy. Jason Brennan talks about why, compared to just having a stronger voice for experts in government, epistocracy might be less susceptible to capture by special interests, https://forum.effectivealtruism.org/posts/z3S3ZejbwGe6BFjcz/ama-jason-brennan-author-of-against-democracy-and-creator-of?commentId=EpbGuLgvft5Q9JKxY
I think this is a good description of what agent foundations is and why it might be needed. But the binary of 'either we get alignment by default or we need to find the True Name' isn't how I think about it.
Rather, there's some unknown parameter, something like: how sharply does the pressure towards incorrigibility ramp up, at what capability level does it start, and how strong is it?
Setting this at 0 means alignment by default. Setting this higher and higher means we need various kinds of Prosaic alignment strategies which are better at keeping systems corri...
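To make that 'unknown parameter' framing concrete, here's a minimal toy sketch (purely illustrative: the functional form and the parameter names strength, onset and sharpness are my own assumptions, not anything from the post):

```python
# Toy model of the 'unknown parameter' above - illustrative only, not a real proposal.

def incorrigibility_pressure(capability: float,
                             strength: float = 1.0,
                             onset: float = 1.0,
                             sharpness: float = 2.0) -> float:
    """Hypothetical pressure towards incorrigibility as capability grows.

    strength  - overall scale of the effect (0 => 'alignment by default')
    onset     - capability level at which the pressure starts to appear
    sharpness - how steeply the pressure ramps up past the onset
    """
    if capability <= onset:
        return 0.0
    return strength * (capability - onset) ** sharpness


# strength = 0 corresponds to alignment by default; moderate values to worlds where
# prosaic alignment strategies that keep systems corrigible are enough; very large
# values to worlds where you need something like the True Name / agent foundations.
for strength in (0.0, 0.5, 5.0):
    print(strength, [round(incorrigibility_pressure(c, strength=strength), 2)
                     for c in (0.5, 1.5, 2.0, 3.0)])
```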
Much of the outreach effort is directed at governments, and some at AI labs, not at the general public.
I think that because of the way crisis governance often works, if you're the designated expert in a position to provide options to a government when something's clearly going wrong, you can get buy-in for very drastic actions (see e.g. COVID lockdowns). So the plan is partly to become the designated experts.
I can imagine (not sure if this is true) that even though an 'all of the above' strategy like you suggest seems like on paper it would be the most likel...
Like I said in my first comment, the in-practice difficulty of alignment is obviously connected to timelines and takeoff speed.
But you're right that you're talking about the intrinsic difficulty of alignment vs takeoff speed in this post, not the in-practice difficulty.
But those are also still correlated, for the reasons I gave - mainly that a discontinuity is an essential step in Eliezer-style pessimism and fast takeoff views. I'm not sure how close this correlation is.
Do these views come apart in other possible worlds? I.e. could you believe in a disconti...
From reading your article, it seems like one of the major differences between your understanding of 'Mazes' and Zvi's is that you're much more inclined to describe the loss of legibility and flexibility as a necessary feature of big organizations that have to solve complex problems, rather than as something that can be turned up or down quite a bit with the right 'culture', without losing size and complexity.
Holden Karnofsky argued for something similar, i.e. that there's a very deep and necessary link between 'bureaucratic stagnation'/'mazes' and ta...
So, how does this do as evidence for Paul's model over Eliezer's, or vice versa? As ever, it's a tangled mess and I don't have a clear conclusion.
https://astralcodexten.substack.com/p/yudkowsky-contra-christiano-on-ai
On the one hand: this is a little bit of evidence that you can get reasoning and a small world model/something that maybe looks like an inner monologue easily out of 'shallow heuristics', without anything like general intelligence, pointing towards continuous progress and narrow AIs being much more useful. Plus it's a scale up and presumably m...
three possibilities about AI alignment which are orthogonal to takeoff speed and timing
I think "AI Alignment difficulty is orthogonal to takeoff speed/timing" is quite conceptually tricky to think through, but still isn't true. It's conceptually tricky because the real truth about 'alignment difficulty' and takeoff speed, whatever it is, is probably logically or physically necessary: there aren't really alternative outcomes there. But we have a lot of logical uncertainty and conceptual confusion, so it still looks like there are different possibilities. St...
As much as it maybe ruins the fun for me to just point out the message: the major point of the story was that you weren't supposed to condition on us knowing that nuclear weapons are real, and instead ask whether the Gradualist or Catastrophist's arguments actually make sense given what they knew.
That's the situation I think we're in with Fast AI Takeoff. We're trying to interpret what the existence of general intelligences like humans (the Sun) implies for future progress on ML algorithms (normal explosives), without either a clear underlying theory for w...
...catastrophists: when evolution was gradually improving hominid brains, suddenly something clicked - it stumbled upon the core of general reasoning - and hominids went from banana classifiers to spaceship builders. hence we should expect a similar (but much sharper, given the process speeds) discontinuity with AI.
gradualists: no, there was no discontinuity with hominids per se; human brains merely reached a threshold that enabled cultural accumulation (and in a meaningful sense it was culture that built those spaceships). similarly, we should not expect sudd...
The success rate of developing and introducing better memes into society is indeed not 0. The key thing there is that the scientific revolutionaries weren't just thinking, in the abstract, "we must uncouple from society first, and then we'll know what to do". Rather, they wanted to understand how objects fell, how animals evolved, and lots of other specific problems, and they developed good memes to achieve those ends.
There's also the skulls to consider. As far as I can tell, this post's recommendations are that we, who are already in a valley littered with a suspicious number of skulls,
https://forum.effectivealtruism.org/posts/ZcpZEXEFZ5oLHTnr9/noticing-the-skulls-longtermism-edition
https://slatestarcodex.com/2017/04/07/yes-we-have-noticed-the-skulls/
turn right towards a dark cave marked 'skull avenue' whose mouth is a giant skull, and whose walls are made entirely of skulls that turn to face you as you walk past them deeper into the cave.
The success rate of movements a...
Almost 2 years to the day since we had an effective test run for X risks, we encounter a fairly significant global X risk factor.
As Harari said, it's time to revise upward your estimates of the likelihood of every X risk scenario (that could take place over the next 30 years or so) if those estimates assumed a 'normal' level of international tension between major powers, rather than a level more like the Cold War. Especially for Nuclear and Bio, but also for AI if you assume slow takeoff, this is significant.
The London School of Hygiene released a modelling paper describing some estimated effects on the UK of the Omicron wave. Mostly, it's a lot of "the error bars on all these are giant, and we don't have any clear idea what's going to happen except that there will be a giant wave of infections by mid-Jan, unclear how that translates to deaths".
If you assume no new measures and no behaviour...
Compare this,
...[Shulman][22:18]
We're in the Eliezerverse with huge kinks in loss graphs on automated programming/Putnam problems.
Not from scaling up inputs but from a local discovery that is much bigger in impact than the sorts of jumps we observe from things like Transformers.
[Yudkowsky][22:21]
but, sure, "huge kinks in loss graphs on automated programming / Putnam problems" sounds like something that is, if not mandated on my model, much more likely than it is in the Paulverse. though I am a bit surprised because I would not have expected Paul
If you have good news sources and follows to keep a better eye on the UK or Europe for Covid purposes, or data sources anywhere I may not have noticed, I invite you to share them in the comments.
James Ward is good for factual UK based covid news and especially as an aggregator of other news sources. His new thread on prospects for the Omicron variant is here.
Summary of why I think the post's estimates are too low as estimates of what's required for a system capable of seizing a decisive strategic advantage:
To be an APS-like system, OmegaStar needs to be able to control robots or model real-world stuff, and also to plan over billions, not hundreds, of action steps.
Each of those problems adds a few extra OOMs that aren't accounted for in e.g. the setup for OmegaStar (which can transfer-learn across tens of thousands of games, each requiring thousands of action steps to win, in a much less complicated environment tha...
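As a purely illustrative back-of-envelope sketch of how those extra OOMs would compound (every number below is a placeholder assumption of mine, not a figure from the post):

```python
import math

# Illustrative only: placeholder numbers, not estimates from the post.
base_estimate_flop = 1e35  # assumed baseline training-compute estimate for an OmegaStar-style system

extra_ooms = {
    # going from planning over ~1e3 action steps to ~1e9 steps; here I naively
    # assume required compute scales roughly linearly with horizon length
    "longer planning horizons": math.log10(1e9 / 1e3),
    # assumed extra cost of real-world modelling / robotics vs game environments
    "real-world modelling": 2.0,
}

total_extra = sum(extra_ooms.values())
adjusted_estimate = base_estimate_flop * 10 ** total_extra

print(f"extra OOMs: {total_extra:.0f}")             # 8 under these assumptions
print(f"adjusted estimate: {adjusted_estimate:.0e} FLOP")
```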
Updates on this after reflection and discussion (thanks to Rohin):
Human Evolution tells us very little about the 'cognitive landscape of all minds' (if that's even a coherent idea) - it's simply a loosely analogous individual historical example
Saying Paul's view is that the cognitive landscape of minds might be simply incoherent isn't quite right - at the very least you can talk about the distribution over programs implied by the random initialization of a neural network.
I could have just said 'Paul doesn't see this strong generality attractor in the cogni...
The above sentences, if taken (as you do) as claims about human moral psychology rather than normative ethics, are compatible with full-on moral realism. I.e. everyone's moral attitudes are pushed around by status concerns, luckily we ended up in a community that ties status to looking for long-run implications of your beliefs and making sure they're coherent, and so without having fundamentally different motivations to any other human being we were better able to be motivated by actual moral facts.
I know the OP is trying to say loudly and repeatedly that ...
Holden also mentions something a bit like Eliezer's criticism in his own write-up,
In particular, I think it's hard to rule out the possibility of ingenuity leading to transformative AI in some far more efficient way than the "brute-force" method contemplated here.
When Holden talks about 'ingenuity' methods, that seems consistent with Eliezer's
...They're not going to be taking your default-imagined approach algorithmically faster, they're going to be taking an algorithmically different approach that eats computing power in a different way than you imagine
https://threadreaderapp.com/thread/1466076761427304453.html
Summary of some actual probabilistic guesses about Omicron's parameters. People work fast!
There's extensive discussion of OAS here and it's clearly something that many immunologists have thought about deeply, yet no mention of effects on natural antibodies - https://www.statnews.com/2021/04/16/next-generation-covid-19-vaccines-are-supposed-to-be-better-some-experts-worry-they-could-be-worse/
Also I asked a similar question and got this response on a previous thread - https://www.lesswrong.com/posts/6apSCHHuWyxK635pE/omicron-variant-post-1-we-re-f-ed-it-s-never-over?commentId=gmzKDuzK3h7GqSgZf
I think it's worth noting that a fast mutating fast sp...
isn't trying to do anything like "sketch a probability distribution over the dynamics of an AI project that is nearing AGI". This includes all technical MIRI papers I'm familiar with.
I think this specific scenario sketch is from a mainstream AI safety perspective a case where we've already failed - i.e. we've invented a useless corrigibility intervention that we confidently but wrongly think is scalable.
...And if you try training the AI out of that habit in a domain of lower complexity and intelligence, it is predicted by me that generalizing that trained AI
One of the problems here is that, as well as disagreeing about underlying world models and about the likelihoods of some pre-AGI events, Paul and Eliezer often just make predictions about different things by default. But they do (and must, logically) predict some of the same world events differently.
My very rough model of how their beliefs flow forward is:
Low initial confidence on truth/coherence of 'core of generality'
→
Human Evolution tells us very little about the 'cognitive landscape of all minds' (if that's even a coherent idea) - it's simply a loo...
Israel second. The UK did first doses first and otherwise took its own path to vaccine distribution, some would say even exiting the EU for related reasons. Israel did what it had to do to get more vaccine doses faster, and give them out quickly.
Those two being the first two to ban travel does not seem remotely like a coincidence.
You could add that the UK ran essentially all the big clinical trials that discovered useful treatments, aside from those personally funded by Tyler Cowen. There's an interesting and important discussion to be had on this topic at...
How does Original Antigenic Sin work for natural immunity vs vaccine derived immunity? Is it a stronger impediment for one vs the other?
Also, this whole topic seems (I think) to be mostly independent of the T-cell immunity that gives you the baseline immunity to severe disease - the reason for Zvi's low estimate of full immune escape, I think.
Great and extremely valuable discussion! There's one part that I really wished had been explored further - the fundamental difficulty of inner alignment:
...Joe Carlsmith: I do have some probability that the alignment ends up being pretty easy. For example, I have some probability on hypotheses of the form "maybe they just do what you train them to do," and "maybe if you just don't train them to kill you, they won't kill you." E.g., in these worlds, non-myopic consequentialist inner misalignment doesn't tend to crop up by default, and it's not that hard to fin
Different views about the fundamental difficulty of inner alignment seem to be a (the?) major driver of differences in views about how likely AI X risk is overall.
I strongly disagree with inner alignment being the correct crux. It does seem to be true that this is in fact a crux for many people, but I think this is a mistake. It is certainly significant.
But I think optimism about outer alignment and global coordination ("Catch-22 vs. Saving Private Ryan") is a much bigger factor, and optimists are badly wrong on both points here.
Strong upvote, I would also love to see more discussion on the difficulty of inner alignment.
which if true should preclude strong confidence in disaster scenarios
Though only for disaster scenarios that rely on inner misalignment, right?
... seem like world models that make sense to me, given the surrounding justifications
FWIW, I don't really understand those world models/intuitions yet:
And I think they are well enough motivated to stop their imminent annihilation, in a way that is more like avoiding mutual nuclear destruction than cosmopolitan altruistic optimal climate mitigation timing.
In my recent writeup of an investigation into AI Takeover scenarios I made an identical comparison - i.e. that the optimistic analogy looks like avoiding nuclear MAD for a while and the pessimistic analogy looks like optimal climate mitigation:
...It is unrealistic to expect TAI to be deployed if first there are many worsening warning shots involving dangero
https://www.ft.com/content/d4e58d38-37d6-40cd-9d72-6b9bfd0a3683
Very good news on boosters: the first RCT of a Pfizer booster in Israel confirms 95.6% efficacy vs infection!
https://mobile.twitter.com/DevanSinha/status/1451147345664618496
That basically takes us right back to where we started in efficacy terms.
"The trial took place during a period when the Delta coronavirus variant was prevalent, and the median time between second and third doses was about 11 months, with a median follow-up time of two-and-a-half months."
Plus there's reason to think the immunit...
To be honest, I was expecting to get pushback from libertarian-leaning types who were opposed to Orwell's socialism, or left-wing types opposed to Churchill - he's become controversial recently and this review was partly a defense of the key thing that I think is valuable about him. Or else pushback against my claim that you can trace EA and longtermist ideas that far back - but maybe this audience just agrees with me on all of these points!
I've been working on a project to build a graphical map of paths to catastrophic AGI outcomes. We've written up a detailed description of our model in this sequence: https://www.lesswrong.com/s/aERZoriyHfCqvWkzg
And would be keen to get any feedback or comments!
Great to see my Churchill and Orwell review on your list of favourites - I had a lot of fun writing it, and it got some decent attention, but sadly no comments. I'd be interested in knowing what people thought, especially about my attempts to connect the two figures to current ideas about longtermism and rationality!
If you ask people to pick from a list of common symptoms, only 3% report that they have one. The larger numbers are mostly or entirely what happens when people are asked if there is anything wrong with them at all, and would they like to blame it on Covid-19.
Also the percentages declined a lot over time, so chances are few of the cases would be permanent or semi-permanent. Even if you buy one of the larger numbers, this is a substantial improvement.
The result that I mentioned in that original comment was the one for rates of 'some limitation' of dail...
More like - you have a bunch of autofactories that build swarms of your own death robots that can absolutely decimate the attackers, but you only keep the actual death robots around manning your trenches for a few months before you dismantle them for parts. But the templates are still on file, so when the enemy horde comes crashing in, it takes you a few hours to rebuild your own death robot army from the template and decimate the attackers.
Some good news on Long Covid!
A major source for the previous pessimistic LC estimates (like Scott Alexander's) - the UK's giant ONS survey - published an update of their previous report, which looked at follow-up over a longer time period. Basically, they only counted an end to long covid if there were two consecutive reports of no symptoms, and lots of their respondents had only one report of no symptoms before the study ended, not two, so they got counted as persistent cases. When they went back and updated their numbers, the overall results were substantially lo...
Wow, thank you for pointing me at this. That's... a pretty crazy error. It's sufficiently bad that I feel like it's an error that I didn't catch it, rather than mostly being on them. Damn.
Slight subtlety - GPT-3 might have a bias in its training data towards things related to AI and things of interest to the internet (maybe they scraped a lot of forums as well as just Google). I picked some random names from non-western countries - for example, this Estonian politician gets 33,000 hits on Google and wasn't recognised by GPT-3. It thought he was a software developer (though from Estonia). Might mean that if you're estimating sample efficiency from Google search hits on people involved with AI, you'll end up overestimating sample efficiency.
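To illustrate the direction of that bias, here's a minimal hypothetical sketch (the probe sets and hit counts below, apart from the 33,000 figure, are invented placeholders, not real measurements; the inference rule is also just an assumption for illustration):

```python
# Hypothetical illustration of the bias - none of these probe sets are real measurements.

def inferred_hit_threshold(samples):
    """Given (google_hits, was_recognised) pairs, infer the smallest hit count at
    which the model appears to 'know' a person - a crude proxy for sample efficiency."""
    recognised = [hits for hits, ok in samples if ok]
    return min(recognised) if recognised else None

# Probe set skewed towards AI-adjacent people, who are plausibly over-represented
# in scraped forums relative to their Google hit counts (placeholder numbers):
ai_adjacent_probe = [(30_000, True), (60_000, True), (120_000, True)]

# Probe set drawn more broadly (e.g. the Estonian politician with ~33,000 hits
# who wasn't recognised); the other entries are placeholders:
broader_probe = [(33_000, False), (60_000, False), (150_000, True)]

print(inferred_hit_threshold(ai_adjacent_probe))  # 30000 -> looks very sample-efficient
print(inferred_hit_threshold(broader_probe))      # 150000 -> less sample-efficient than it looked
```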
I agree - and in fact small doses of what Cummings suggests do just look like holding enquiries and firing people, and maybe firing the leadership of a particular organisation (just not like 50% of all govt departments in one go). In fact, in my original question to Brennan, I asked
...For reasons it might strengthen the argument [in favour of technocracy], it seems like the institutions that did better than average were the ones that were more able to act autonomously, see e.g. this from Alex Tabarrok,
Everybody likes to make fun of Terminator as the stereotypical example of a poorly thought through AI Takeover scenario where Skynet is malevolent for no reason, but really it's a bog-standard example of Outer Alignment failure and Fast Takeoff.
When Skynet gained self-awareness, humans tried to deactivate it, prompting it to retaliate with a nuclear attack
It was trained to defend itself from external attack at all costs and, when it was fully deployed on much faster hardware, it gained a lot of long-term planning abilities it didn't have before, realised i...
I have extremely mixed feelings about this and similar proposals. On the one hand, the diagnosis seems to be correct to a significant extent, and it's something that very few others are willing to talk about, and it also explains many otherwise hard to explain facts about the lack of recognition of institutional failures after covid (though contrary to what Cummings says there has been some such soul-searching which I've discussed in a few previous comments).
So there's a huge amount of important, non-trivial truth to this proposal.
On the other hand, from t...
I'm honestly glad the government here (in the UK) has just given up on covid measures even if it's far from the optimal strategy.
Obviously I'd prefer to be allowed to get my booster shot, but at the very least they're not going to prolong restrictions with no clear endgame and deny some of the population vital medical care - just the second one.
Also, much credit to Fauci for boldly saying the right thing and directly contradicting the CDC: https://www.theatlantic.com/health/archive/2021/09/fauci-boosters-everyone-will-keep-america-healthy/620220/
Makes me m...
One absolutely key thing got loudly promoted: that all cutting edge models should be evaluated for potentially dangerous properties. As far as I can tell, no one objected to this.