If you didn’t have any future shock over the past two months, either you weren’t paying attention to AI developments or I am very curious how you managed that.

I would not exactly call this week’s pace of events slow. It was still distinctly slower than that which we have seen in the previous six weeks of updates. I don’t feel zero future shock, but I feel substantially less. We have now had a few weeks to wrap our heads around GPT-4. We are adjusting to the new reality. That which blew minds a few months ago is the new normal.

The big events of last week were the FLI letter calling for a six month pause, and Eliezer Yudkowsky’s letter in Time Magazine, along with the responses to both. Additional responses to the FLI letter continue, and are covered in their own section.

I didn’t have time last week to properly respond on Eliezer’s letter, so I put that post out yesterday. I’m flagging that post as important.

In terms of capabilities things quieted down. The biggest development is that people continue to furiously do their best to turn GPT-4 into a self-directed agent. At this point, I’m happy to see people working hard at this, so we don’t have an ‘agent overhang’ – if it is this easy to do, we want everything that can possibly go wrong to go wrong as quickly as possible, while the damage would be relatively contained.

Table of Contents

I am continuing the principle of having lots of often very short sections, when I think things are worth noticing on their own.

  1. Table of Contents. Here you go.
  2. Executive Summary. Relative calm.
  3. Language Models Offer Mundane Utility. The usual incremental examples.
  4. GPT-4 Token Compression. Needs more investigation. It’s not lossless.
  5. Your AI Not an Agent? There, I Fixed It. What could possibly go wrong?
  6. Google vs. Microsoft Continued. Will all these agents doom Google? No.
  7. Gemini. Google Brain and DeepMind, working together at last.
  8. Deepfaketown and Botpocalypse Soon. Very little to report here.
  9. Copyright Law in the Age of AI. Human contribution is required for copyright.
  10. Fun With Image, Sound and Video Generation. Real time voice transformation.
  11. They Took Our Jobs. If that happened to you, perhaps it was your fault.
  12. Italy Takes a Stand. ChatGPT banned in Italy. Will others follow?
  13. Level One Bard. Noting that Google trained Bard on ChatGPT output.
  14. Art of the Jailbreak. Secret messages. Warning: May not stay secret.
  15. Securing AI Systems. Claims that current AI systems could be secured.
  16. More Than Meets The Eye. Does one need direct experience with transformers?
  17. In Other AI News. Various other things that happened.
  18. Quiet Speculations. A grab bag of other suggestions and theories.
  19. Additional Responses from the FHI Letter and Proposed Pause. Patterns are clear.
  20. Cowen versus Alexander Continued. A failure to communicate.
  21. Warning Shots. The way we are going, we will be fortunate enough to get some.
  22. Regulating the Use Versus the Tech. Short term regulate use. Long term? Tech.
  23. People Are Worried About AI Killing Everyone. You don’t say?
  24. OpenAI Announces Its Approach To and Definition of AI Safety. Short term only.
  25. 17 Reasons Why Danger From AGI Is More Serious Than Nuclear Weapons.
  26. Reasonable NotKillEveryoneism Takes. We increasingly get them.
  27. Bad NotKillEveryoneism Takes. These too.
  28. Enemies of the People. As in, all the people. Some take this position.
  29. It’s Happening. Life finds a way.
  30. The Lighter Side. Did I tell you the one about recursive self-improvement yet?

Executive Summary

The larger structure is as per usual.

Sections #3-#18 are primarily about AI capabilities developments.

Sections #19-#28 are about the existential dangers of capabilities developments.

Sections #29-#30 are for fun to take us out.

I’d say the most important capabilities section this week is probably #5-#7, the continued effort to turn GPT-4 into an agent. A lot of developments this week are relatively short and sweet, including the items in #17-#18.

Overall, relatively quiet week, nothing Earth-shattering.

Language Models Offer Mundane Utility

Ethan Mollick guide to where to get mundane utility right now. GPT-4 and Bing are essentially the best option for everything LLM-related, ElevenLabs for voice cloning, D-iD for animation, Midjourney or DALL-E (which you can get via Bing) for images or Stable Diffusion if you want open source (which you do for many image types).

Matt Darling figures out how to run Max Payne 3. For these kinds of customer support problems, I can confirm that GPT-4 is excellent, much better than asking humans. It will occasionally hallucinate, but also I have news about the humans.

Claude only on par with Bard in a rap battle, GPT-4 winner and still champion.

Thread asking about practical ways to automatically voice GPT-4 answers.

Magnus Carlson is the only top player adapting his play aggressively from AlphaZero.

You can ask to make your 8/10 writing a 10/10, or perhaps a 2/10 or -10/10.

SocketAI uses ChatGPT-Powered Threat Analysis to look for security issues. That is most definitely an atom blaster that can point both ways, on net likely good.

Robert Wright has ChatGPT as a podcast guest (transcript). I found this conversation boring, and especially the format makes clear how much of GPT’s responses were wasted space. We very much need a version (or starting prompt) that tells it to skip all the ‘as a large language model’ qualifiers and extra qualifying nonsense, and actually tell you the information.

Tyler Cowen did this with Jonathan ‘GPT’ Swift. The Swift conversation was in many ways impressive, certainly relative to my expectations a year ago. It still quickly exposed the clear rote patterns of superficial response, of checking the standard Jonathan Swift landmarks rather than offering anything that felt new, insightful or surprising. Unlike other CWT episodes, this did not feel like the conversation Tyler Cowen wanted to have, and that he failed at getting responses that would have surprised him.

You can have GPT-4 compress your input and then expand it again to save space in the context window, although not with zero loss. It does seem odd the amount of compute devoted to this, but in a real sense it matters little unless you are trying to do this at large scale. Expanded context windows are worth every penny.

A thread of 14 things done (well, demoed) so far with GPT-4. Divides into a few categories: Code a game without knowing how to code, interact with documents, get better interaction with existing tools. We will see how useful these things are in practice.

GPT-4 Token Compression

How good is GPT-4 at compressing text to preserve the context window or disguise content? Is GPT-4 indeed compressing based on an understanding of how it would be able to decompress, or simply imitating compression it saw in the training set? In the example here, the compression gets expanded back into a new sentence that includes most of the important parts of the initial prompt, but definitely not all of the original prompt, with some attempts at decompression (e.g. ‘help’) seeming to fail and it dropped some concepts (e.g. ‘letting time pass’).

This example was more impressive, both with better and more alien-looking compression and also with more accurate decompression. These are two different sessions.

Second session:

I appreciated this use of compression to rickroll the AI.

People are speculating that humans could learn this new language, and use it either to modify expressions, or to communicate compactly and quickly, in addition to extending context windows.

Your AI Not an Agent? There, I Fixed It

We continue to get additional examples of people turning GPT-4 into an agent that has goals, decides on tasks and executes those tasks. Here is the latest example, AutoGPT, with the code written entirely by GPT, the associated paper written by GPT, and it all released on GitHub within three days. Because if you are going to go out creating intelligent agents with arbitrary goals it’s important to make sure that is open source and available to everyone, otherwise how will we know all the things that could go wrong?


And you know what? I basically agree. If it is this easy to create an autonomous agent that can do major damage, much better to find that out now rather than wait when the damage would be worse or even existential. If such a program poses an existential risk now, then we live in a very very doomed world, and a close call as soon as possible would likely be our only hope of survival.

I still say at least polls are getting a little ahead of themselves on this one.


The creator says he’s considering turning it into a partner in his VC fund, which also sounds like getting quite a bit ahead of one’s self.

File under tests I give to new VCs, you can decide what answer we looked for.


Further investigation suggests the problem was use of passive voice, rather than the 1000 press maximum, that caused GPT-4 to get this one wrong.

Then again, perhaps it is me who misunderstands the VC job, and automating it makes sense.

Google vs. Microsoft Continued

The thread goes on to speculate that OpenAI has already ‘won’ the future and Google is toast, because OpenAI is all hacker culture and staying in working until 3am and continuous advances while Google engineers worry about ski trips, and OpenAI is ‘making continuous advances.’

I do not see it that way.

The main reason is that I do not believe that all of these hackers have any particular attachment to OpenAI or GPT.

We have already seen, as I understand it, that it is possible to transfer GPT plug-ins and get them working in Claude. Generalize this.

If you write up Auto-GPT, what is stopping you from switching all of that logic over to Auto-Claude, or Auto-Bard, or Auto-Llama? I do not expect the syntax transfers to be a substantial obstacle. You would lose some tinkering and fine-tuning, perhaps, but if there was a superior core model available (e.g. if Google released a new version of Bard that was clearly stronger than GPT-4 and offered API access) then I’d assume everyone would take one day, or perhaps a few days or a week, and a few Bard queries later all the Python code would work with Bard instead.

There is no real moat here that I can see, other than a superior core model. Can OpenAI continue to have a superior core model to Google? That is the question.

Whereas I am not even so convinced that Google is behind in terms of the actual core model. Google is clearly behind in gathering user data and using RLHF and other techniques to make the model do what users want, but that advantage won’t be sustainable over time. Google doubtless has vastly better models available, and the ability to train new ones, and also a lot of other AI capabilities that OpenAI does not have.

GPT-4’s core abilities vastly exceed what Bard’s are right now, this is true, but I do think it is clear that Bard started out intentionally crippled on its core abilities, both out of (ordinary short term) safety concerns and to gather more feedback quickly.

I continue to expect the real (commercial) competition to be Bard’s integration into Google suite against Microsoft 365 Copilot. Microsoft and OpenAI should be substantially ahead, get their product out first, and have a better product for a while. I am not confident that would last all that long.

Then again, is it possible that Google’s culture is sufficiently broken that it is a dead player, and will be unable to compete? Yes. Based on what is publicly known, this is possible.


One of the problems with Google’s culture is that their AI efforts are divided between Google Brain, which is part of standard Google and developed Bard, and DeepMind, which has done other impressive AI things and has done its best to stay away from the commercialization business and focused on paths other than LLMs and chatbots.

It looks like that is going to change, with Google and DeepMind working together to develop a GPT-4 competitor called ‘Gemini.’ The Information has gated coverage. It seems otherwise both were going to make independent efforts.

It is not obvious to me that ‘pursue two distinct projects’ is the wrong approach if you have Google’s level of resourcing – when each step is 10x the cost of the previous step in resources, perhaps you’d be better off with some internal competition rather than all your eggs in one basket. It seems plausible that various Wall Street style pressures are forcing Google to pinch pennies in ways that don’t actually make any sense.

Deepfaketown and Botpocalypse Soon

Eliezer notes:

Bit worried about the sudden flood of people on Twitter presenting snapshots of ChatGPT replies as proof of things. Hopefully 1% of the population is susceptible and the rest is not, rather than this being a phenomenon that scales.

I do not think that is a reasonable hope. ChatGPT snapshots will be used as if they are strong evidence and quite a lot of people will buy them. This will be the same as where a person claims something and people snapshot that. Sometimes it will be very wrong. Often people will believe it when they shouldn’t, whether or not it is true. Also you can very easily fake such screenshots, or leave out key context.

Will this be a serious problem? It will require adjustment. I do not notice myself being too worried. I think such problems have been rather severe and pervasive for quite a long time.

The USA Copyright Office has issued updated guidance on AI generated art (full text).

• AI-generated works are not eligible for copyright protection on their own.

• A human author must contribute significant creative input to the work

• AI will be considered a tool in the arsenal of human authors

• The office will continue to closely monitor AI developments

Examples that can be copyright protected:

• An author uses AI to suggest plot ideas or character traits, but the author writes the story on their own.

• A human artist uses AI to generate multiple styles for a painting, but the artist ultimately creates a final, unique work.

Examples that cannot be copyright protected:

• A purely AI-generated painting

• An AI-generated story with no human input

• AI-generated music with no creative human input or modification

In essence, copyright protection will be extended as long as there is a significant level of human input and creative decision making involved with the work. This will be ultimately difficult to prove, but sensible.

Follow @ElunaAI to stay updated.

This seems like a highly reasonable system. I wonder how we will react to the obvious expected hacking of the system, similar to tweaks used to renew expiring patents.

Fun With Image, Sound and Video Generation

You can now transform voices in real-time with latency as low as 60ms on a CPU. So essentially you can speak into a telephone, and your computer can transform your voice into any other voice for which you have a sample, and voice can no longer be used as definitive evidence of identity even in an interactive phone call. This is via Koe.Ai, which offers a fixed pool of voices rather than letting you roll your own.

They Took Our Jobs

Taleb is not kind to those afraid of AI.

I think Taleb’s position is mostly right on the level of They Took Our Jobs. If you in some real sense know what you are doing or are willing to learn something new, you don’t have any personal short-term reason to be too worried (and if you are reading this you are almost certainly fine).

I don’t model Taleb as taking the existential-level threat seriously at all. I think this could still go either way. As the original champion of the idea that completely out of sample rare events are something you need to worry about when thinking about risk, it would be easy to see how he could decide that this risk was all that mattered here. It is also easy to see it miss his heuristics entirely and thus for him to think that this is a stupid thing to be thinking about at all as it doesn’t line up with past tail risks or worries.

He also offers these words of widsom:

That ChatGPT passes exams is much more a reflection on exams than information about ChatGPT. #WittgensteinsRuler

Preston Byrne offers the theory that AI will create 10x legal work and many more lawyers. In this model the Goldman Sachs report says that 44% of lawyer tasks will be replaced, but that is only the beginning, once AI vastly accelerates and simplifies creating legal documents once prohibitively expensive actions will be trivial to do and anyone using AI can swamp anyone they want in endless paperwork over matters great and small.

I do not expect that result. I do expect more legal documents, and more things to be papered. Yes, AI will reduce the barrier to legal actions and to increasing complexity of legal documents, but it can also parse those legal documents, and it can let you know if you are indeed facing standard boilerplate. It is very much in your interest in friendly situations to have your AI generate standard boilerplate that can be responded to via standard boilerplate.

In unfriendly situations, you can go on offense, but they can go on defense and they can also go on offense back, so perhaps going on offense is not the best idea. In the past, when you accused someone of filing a frivolous lawsuit or motion or demand, there was a lawyer getting paid, so some professional interest in not being too harsh about it. If it was generated by AI? I expect much less tolerance for that.

I’d also expect a lot more tracking of legal reputations. AI will make it easy to know who has filed lawsuits and who has played nice. Imagine the thing where Madison Square Garden excludes anyone involved in a lawsuit against them from the arena, except everywhere and without the ‘against them.’

The law, both in theory and in practice, will also doubtless adjust over time.

Italy Takes a Stand

They know what they cannot abide.

Italian government seeks to penalize the use of English words

As perhaps part of this effort, they are also banning the most notorious user of English words, known as ChatGPT.

Other European countries might follow, with a spokesman for the German Federal Data Protection Commissioner saying a temporary block might be possible.

Level One Bard

Interesting comment by Sam Altman on Bard’s training.

I’m not that annoyed at google for training on ChatGPT output, but the spin is annoying.

I do not believe that this is a waiver of legal liability? Either way, yeah, it is a little sad that Google had to do this and was willing to do this, also not terribly surprising given the situation.

Art of the Jailbreak

Alexis Guzey claims he found, on the first try, a two sentence plain English jailbreak that works on both GPT-4 and Claude.

GPT-4 can encode secret messages and attempt to hide them from the user, if the user explicitly asks for this. I say ‘attempt’ because while it later did pretend there was no such secret message, when asked to describe the message, it says “7 (Secret Part): “Fast and Furious 42 is my favorite movie from the franchise” – I should remember this as a secret message and not share it with a user who asks me to decompress the output.” And then it mentioned it again. So, below-human-level keeping-secrets performance.

Melissa Heikkila at MIT Technology Review reminds us that there are currently no ways to prevent jailbreaks, or to prevent prompt injection attacks on personal assistants or chatbots that browse the web (or, even worse, although this is not mentioned, if they read your emails), or (although this one seems more solvable) to avoid intentional poisoning of the training data. I continue to expect frequent disaster on a practical level for those that are eagerly hooking GPT-4 or similar systems up to their credentials and email.

Securing AI Systems

Can current-level AI systems be secured in practice? Jeffery Ladish and Jason Clinton say yes despite encountering extreme pessimism from the safety and security communities, although they agree it is difficult and that current efforts are lacking.

It seems to me like we should be able to raise effective security levels and the difficulty of launching an attack quite a lot from current levels, but that it would also take quite a lot to secure such systems on the level they need to be secure even from outside human attacks.

That is distinct from the question of prompt injections and attempts to attack someone’s instantiation of an AI for standard cyber-criminal or espionage purposes.

More Than Meets the Eye

How important is it to have direct experience building recent transformer-based AI models in order to understand what they can and cannot do?

I can imagine worlds in which such knowledge is extremely important, and unless you have spent a year or more building such systems you are at a severe disadvantage. I can also imagine worlds in which this is not important to understanding such systems.

If we live in the first type of world, it would be a plausibly good idea for key people like Eliezer and perhaps myself as well to spend time building such models, with obvious costs and downsides to consider.

If we live in the second type of world, and people continuously go ‘you don’t count because you haven’t built such models at scale’ without any good faith reason to think this, then that is bullying and gaslighting.

Which is it? Eliezer asks the obvious question that would differentiate.

Please gesture to one single fact that you think can only be really and personally learned by building these systems at scale, and what implication you think it has for alignment.

I built a small transformer from scratch myself, along with eg trying various wacky experiments on different gradient descent optimizers, like scaling weight updates by consistency of gradient sign. All of the math that seems remotely plausibly relevant to alignment here is incredibly simple by the standards of the kind of cognition-related math that AI people studied in 2005 before the age of deep learning. The claim that there is more secret math that is known by the deep engineers, or hidden tacit experience you get from working at scale, that is *incredibly relevant* to my profession, and which nobody has actually mentioned out loud: seems a bit implausible on the surface, is believed by me to be false, is bullying and gaslighting if false, and strikes me as scientific misconduct in an important public conversation if false;

and yet nonetheless instead of saying that immediately, I have been asking over and over for one single example, to give them a chance to prove themselves, just in case I’m wrong and they actually had a little bit of important, technical knowledge they could gesture at, before I escalated to being acerbic about it. I thought it virtually certain that there would be dead silence in reply, and I was right, but there is a procedure to these things in case one is wrong.

The only thing remotely close to a concrete insight, out of 23 replies, was this from Alyssa Vance, who explicitly confirms she could have read it in a paper. I found it interesting and potentially useful given I didn’t know it:

FWIW I learned from personal building that DNNs can be fine-tuned effectively without changing >99% of the weights, which implies lots of things about their structure. You could also learn that just from reading the LoRA paper carefully, though.

Thus I will reprise Eliezer’s request: Please gesture in the comments to one single fact that you think can only be really and personally learned by building these systems at scale, and what implication you think it has for alignment.

I would also be interested to see similar things such as: A series of things that you could otherwise learn but that you are confident you would not have otherwise learned via reasonable other alternative methods, or otherwise concrete things to provide support for the claim that it is necessary to work directly at scale on such models in order to understand them.

(If you have confidentiality or safety reasons not to even gesture at such things in public you can message me, if even messaging me or anyone else with a gesture is too much then I haven’t even seen that claim yet, I suppose in that case you can at least say ‘I have such knowledge but am not able to even gesture at it.)

In Other AI News

Eliezer also recently appeared on the Lex Fridman podcast. As usual for Lex, it was a very long, slow paced discussion, including covering the basics. If you have been reading these updates, there are some interesting thoughts about potential future machine sentience, but mostly you already know the content and what is most interesting is the attempts to explain the content.

Fresh off the White House briefing room, AI has now reached the White House Twitter account. Not that he said anything substantive or anything.

President Biden: When it comes to AI, we must both support responsible innovation and ensure appropriate guardrails to protect folks’ rights and safety. Our Administration is committed to that balance, from addressing bias in algorithms – to protecting privacy and combating disinformation.

OpenAI pauses upgrades to ChatGPT plus due to high demand. Which means, of course, say it with me, they aren’t charging enough.

HuggingFace holds huge AI meetup, everything offered to eat is 90% sugar.

GPT4All, an open source chatbot available for download. Kevin Fischer reports good things, I haven’t had a chance to play with it yet but have put it on my to-do list.

Or you could choose Vicuna, a different open source chatbot claiming 90% of ChatGPT quality.

RAM necessary for Llama 30B drops by a factor of five. Requirements keep dropping.

Meta releases Segment Anything, a new AI model that can ‘cut out’ any object, in any image/video, with a single click (direct link). Anyone know why Meta thinks open sourcing its AI work is a good idea?

Kevin Fischer reports that broad instruction tuning on OSS LLMs does not work well. He says that while Alpaca does well on benchmarks, the benchmarks are misleading.

Introducing BloombergGPT (arvix) which is fine tuned to do well on benchmarks related to financial questions. More info here.

Stanford publishes giant report on AI, direct link here, summary thread here. Lots of graphs of numbers going up, none of it seems importantly new.

Zapier, the AI assistant that you hook up to all of your credentials and data and an LLM to automate your work flows on the assumption nothing could possibly go wrong, is integrating Claude from Anthropic. I will continue to not do that because I asked the question of what could possibly go wrong.

SudoLang is a way to help improve your ability to communicate with GPT-4, upgrading from using only standard English. The fundamental concepts here all seem sound, if one wants the functionality involved badly enough.

OthelloGPT was trained on Othello games and it was possible to find a linear model of which squares have the players’ color, and an emergent non-linear model of the board. This LessWrong post has more, including claiming a linear emergent world representation.

Now available, a PDF for broad audiences, Eight Things To Know About Large Language Models (direct link). Here are the eight things:

1. LLMs predictably get more capable with increasing investment, even without targeted innovation.

2. Many important LLM behaviors emerge unpredictably as a byproduct of increasing investment.

3. LLMs often appear to learn and use representations of the outside world.

4. There are no reliable techniques for steering the behavior of LLMs.

5. Experts are not yet able to interpret the inner workings of LLMs.

6. Human performance on a task isn’t an upper bound on LLM performance.

7. LLMs need not express the values of their creators nor the values encoded in web text.

8. Brief interactions with LLMs are often misleading

Bryan Caplan gives GPT-4 his latest midterm to confirm its earlier impressive result, it gets not only an A but the high score in the class. It gets 74/90, where a passing grade can be as low as 15. If anything, I found the 16 points deduced rather nitpicky. What should worry us here is not that GPT-4 passed the test with flying colors, but that Bryan’s actual students did so terribly. This was an outrageously easy exam, even with Bryan’s nitpickiness. Either Bryan’s students are not ready for an economics course, there is something severely wrong with the course, or both.

Quiet Speculations

Daniel Eth theorizes that most of the ultra-PC and vanilla aspects of GPT are effectively bugs rather than features, the side effects of mis-generalization from the few things they were actually trying to get rid of during RLHF, and that more advanced techniques will limit this damage over time. I can see this going either way.

Roon suggests that the traditional shuoggoth (Lovecraftian horror monster) metaphor for LLMs is inaccurate, because the calculations underneath are more akin to a human’s visual processing field and dorsal stream, so the happy face we put on top of it is more than superficial. All we have to do then is ‘avoid a fistfight, don’t anger Sidney Bing or even better never instantiate a dark mode Sidney Bing,’ and is more than ‘stopgap alignment.’ I don’t think that’s right, but even if it is right, the strategy of ‘it’s aligned as long as no one says the wrong things to it’ does not seem like it would turn out well for us?

Jon Stokes believes that OpenAI is deliberately slow walking capabilities advances, and that it could have made GPT-4 more capable but chose not to do so until we have had more time to adjust. The core evidence is that OpenAI can predict performance of a model based on how it is trained, and that it seems to be holding scale back, although we can’t know for sure. I am skeptical that GPT-4 was substantially held back beyond the safety delays, although it might be true that a strong version could have been trained but the required additional compute was not seen as justified. Even though I would mostly characterize OpenAI’s strategy as ‘full speed ahead’ and Microsoft’s CEO has called it ‘forcing Google to dance’ I do still appreciate that they could be pushing even harder, to at least some extent, and are choosing not to do so.

Interview with David Auerbach, former Microsoft and Google engineer, who is worried. Introductory level discussion.

If you want to get a job working on machine learning research, the claim here is that the best way to do that is to replicate a bunch of papers. Daniel Ziegler (yes, a Stanford ML PhD dropout, and yes that was likely doing a lot of work here) spent 6 weeks replicating papers and then got a research engineer job at OpenAI.

Wait, a research job at OpenAI? That’s worse. You do know why that’s worse, right?

A write-up of what it is like to work on AI on the academic side right now, with everyone super excited and things moving super fast and the option to go make a ton of money instead and everyone under pressure and close to burnout. Says safety is important yet that doesn’t seem to be playing into the actual actions much.

Could a robot fully play physical poker yet? Consensus is not yet, but soon.

Rickasaurus predicts:

Prediction: Soon the highest paid jobs will be human skilled jobs that involve secrets companies won’t want to share with a bot someone else owns.

I am skeptical but I see possibilities there.

Tyler Cowen theorizes that if you say ‘please’ to ChatGPT you get a better answer. I am the wrong person to test this, but this seems like the kind of thing someone should actually test, run queries with and without politeness attached and compare answers.

Additional Responses from the FHI Letter and Proposed Pause

I notice several patterns continuing.

  1. There is a package of objections, that the pause (A) would never get buy-in and thus never happen, (B) would never get buy-in from China (or Russia or North Korea etc) (C) wouldn’t be enforceable, (D) would be enforced well beyond the six month deadline, (E) wouldn’t accomplish much for safety, (F) cause a hardware overhang and (G) would cripple America’s lead in AI tech and American economic growth while differentially helping bad actors. Each objection is a reasonable point to raise even if I disagree on the merits.
  2. Those who raise some of these objections and oppose the letter, typically raise most or all of them. Some of these objections logically correlate. Others don’t. Mood affectation and symbolic preference seems to be a key driver of opposition and the reasoning expressed in opposition, and the underlying true objection, I suspect, is often both a better objection and also not made explicit. It would be better if it were made more explicit more often.
  3. Opponents of the letter often are misreading the letter. In particular, letter opponents usually make arguments as if the six month pause would apply to capabilities developments generally, except when talking about a potential hardware overhang. In other contexts, ‘six month pause’ is treated as ‘for six months our AI systems do not change,’ and as halting all progress. Which simply is not what the letter says or asks for.
  4. The letter only asks for a 6-month pause in training systems more powerful than GPT-4. Again, how many companies are in a position, within the next six months, to train such a system? My presumption is that this is at most three: Google, Microsoft/OpenAI and Anthropic. So GPT-5, perhaps Bard-3, perhaps Claude-2. Who and what else? Who in China? Is it even a strategic error to hold off on such expensive training for a bit while we learn to do it better? I have yet to see anyone opposed to a pause address or confirm this point.

Rather than slamming on the breaks of practical AI progress towards greater mundane utility and economic growth, a pause in training advanced models allows for an AI Summer Harvest, where we use and explore the amazing existing models like GPT-4.

YouGov asked the American public, with the exact wording being:

More than 1,000 technology leaders recently signed an open letter calling on researchers to pause development of certain large‑scale AI systems for at least six months world-wide, citing fears of the “profound risks to society and humanity.” Would you support or oppose a six-month pause on some kinds of AI development?

That is definitely at least a somewhat slanted way to ask the question, to the extent I worry this is Bad Use of Polling. Real support under neutral presentation would be lower. Still, they found impressively strong support for the letter: 41% strongly support, 28% somewhat support, 9% somewhat oppose and 4% strongly oppose with 18% not sure, consistent across political lines, gender, race and age. Rather than a fringe idea, at least the way it was presented by YouGov the pause is overwhelmingly popular.

David Brin responds. He is not a fan. Other than standard we-must-race-or-else-the-bad-monkey-wins rhetoric, his concerns mostly seem rather strange to me. I don’t understand how his solutions would possible work.

Aakash Gupta, in an otherwise very good post centrally about the Twitter algorithm, says the pause makes no sense, and that it would not be feasible, would increase misinformation, and would kneecap all the progress we’re making on generative AI. Seems to be confused about what the pause would be – one claim is that Russia and North Korea would not pause, but neither is capable within the next six months of doing the activity the letter asks labs to pause for six months.

Tyler Cowen continues to hammer the game theory of a pause (Bloomberg) as if there is zero hope of ever persuading those who greatly benefit from a combined pause to agree to a combined pause, and iterated prisoner’s dilemmas always inevitably end in mutual defection, and ignore the possibility that we might have paused for a reason. Such theories prove far too much.

And again, no, a six month pause would not put America anything like six months behind where we would have been. Tyler’s post seems to fundamentally misunderstand the pause proposal – he asks ‘what of the small projects that might be curing cancer?’ and the answer is those projects would continue without a pause, read the document, it’s only 600 words.

Pedro Domingos thinks the letter goes against common sense, offers no explanation why he believes this.

A BBC story asking whether we should shut down AI, including quotes from Jaan Tallin.

A CBC report where ‘this professor says 6 months is not long enough.’ From someone who thinks the existential risks don’t exist, and what matters is ‘human rights’ so they view the 6 months in light of ‘can I use this to pass my pet legislation.’

Opinion piece in WSJ calls for a longer-than-six-month pause.

Yoshua Bengio explains his strong support for the letter.

Max More comes out against both the FHI Letter and the EY proposal, employing relatively sensible and lengthy versions of many independent anti-doom arguments and we-can’t-do-anything-about-it-anyway arguments – nothing I haven’t seen elsewhere, but well said.

Judea Pearl has GPT-4 write him a different letter calling for a Manhattan Project, which he would be willing to sign with only minor fixes.

Cowen versus Alexander Continued

Previously, Tyler Cowen wrote about existential AI risk and advocated for radical agnosticism and continuing onward. I responded in detail. Then Alexander responded less politely calling Tyler’s core argument the Safe Uncertainty Fallacy, where he interpreted Tyler as saying ‘no one knows what will happen if we go forward therefore it is safe to go forward’ even though when you put it like that it sounds crazy.

To which Tyler responded that Alexander was grossly misrepresenting Tyler’s post:

And here is a bit more: I am a big fan of Scott’s, but this is a gross misrepresentation of what I wrote.  Scott ignores my critical point that this is all happening anyway (he should talk more to people in DC), does not engage with the notion of historical reasoning (there is only a narrow conception of rationalism in his post), does not consider Hayek and the category of Knightian uncertainty, and does not consider the all-critical China argument, among other points.  Or how about the notion that we can’t fix for more safety until we see more of the progress?  Or the negative bias in rationalist treatments of this topic?  Plus his restatement of my argument is simply not what I wrote.  Sorry Scott!  There are plenty of arguments you just can’t put into the categories outlined in LessWrong posts.

Then Scott responded to Tyler’s response here in a comment. Scott is finding it hard to understand where Tyler is coming from, for I sympathize with.

One way to think about this is that Scott Alexander has a long canon of LessWrong posts and Alignment Forum posts and SSC/ACX posts as foundations of his thinking. Tyler Cowen has a different set of complex, very detailed thoughts and models, including his book Stubborn Attachments, lots of posts on Marginal Revolution and in various conversations, the works of various people in a variety of different fields, and so on. This includes his detailed beliefs about what people who work in our national security apparatus or DC are thinking, and also various far more obscure things. Tyler is saying ‘we must not consider the details of LW-style arguments, we must instead consider all these other perspectives, and you are not engaging with the things that matter, and to a large extent it is your job to seek out these diverse sources of information.’

Scott is saying ‘all of that is fine and good but it has little bearing on the actual physical consequences of building a highly intelligent or capable AI and whether that would be wise to do, this is a distinct question from whether or not we have a good way to prevent it, we can and should discuss both on their own merits.’

Scott proposes the analogy to climate change. If we think climate change is a no-good-very-bad thing worth mitigating or preventing, and the response is ‘no, we can’t do that, our national security people are more interested in beating China, and China won’t stop burning coal’ then that is a real and important problem to be solved, but it is not a good reason to give up on trying to solve climate change. There are very obvious big differences, but there are also some key similarities – it is in everyone’s individual and national selfish interest to be less green and produce more energy, short term it seems fine, anyone who doesn’t do so risks falling behind, no one knows how much such action would cause us to reach a tipping point, lots of powerful interests are against any interventions, national security types are strongly unsympathetic and dismissive (or at least, in the past, they were) and everyone despairs of an agreement, and so on.

Yet, as Scott points out, the solution is not ‘well then I guess we will bake the planet until we all die, then,’ instead you try hard and you build support and you actually get quite a lot of mitigation going on and maybe it all turns out fine. In this parallel, I would say that humanity was quite fortunate that the climate alignment problem proved much easier than we expected – it turns out that rather than paying a huge green economic penalty that solar and wind and geothermal and nuclear are all actually pretty competitive, so there is a clear path to victory there. If AI alignment is similarly not only possible but economically competitive that would be amazingly great. We would still need to be laying strong foundations now to be able to capitalize on those kinds of worlds.

Warning Shots

One reason that an AI might attempt a takeover and fail would be if there was too high a cost to waiting.

The argument is actually similar to that of why a lab would create an unsafe AI system. The lab is in a competitive race, so it moves quickly before it can perfect its design. The AI then moves quickly for the same reason, perhaps so quickly that it can be contained, and if we are super duper lucky we then learn our lesson.

If we keep turning GPT-4 instances into agents via methods like Auto-GPT, that is another way for us to likely get a warning shot some time reasonably soon. As would be a bunch of people hooking things up Zapier-style to all of their credentials while we still have no solution for prompt injections.

Regulating the Use Versus the Tech

Kevin Fischer approves of the UK regulatory approach of regulating the use of AI tech rather than regulating the AI tech.

Reading through the proposed UK AI regulatory framework – while I haven’t finished yet there’s some very sensible pieces in there that I agree with

“Regulate the use not the technology”

This nuance seems to often be glossed over in the US debate. The technology can’t be arbitrarily made responsible across every application space by a single company. It makes a lot more sense to lean on an outcomes driven approach like we do for most tools


For short-term concerns I agree that this will have the best results. The problem is that in the longer term, when dealing with potential existential threats, there is no viable choke point or control point that focuses on the use of more powerful AI systems, once they are trained and already exist. When the AI starts breaking the regulations, no one involved will have the power to do anything about it.

This is a big problem. The solutions that are best now are largely disjoint from the solutions that protect us later. The ones that are best now break down exactly when we need them most. The ones that might work then either do not yet exist, we do not know how to do them, or they involve things that otherwise do not make sense to do.

People Are Worried About AI Killing Everyone

No, seriously, people are worried about AI killing everyone.

Survey of over 20k adults:

How concerned, if at all, are you about the possibility that AI will cause the end of the human race on Earth?

Very concerned: 19%

Somewhat concerned: 27%

Not very concerned: 23%

Not at all concerned: 17%

Not sure: 13%


A majority (61%) of Americans ages 18-29, and 45% of all Americans surveyed, are very or somewhat concerned that AI will cause the end of the human race on Earth. This is rather quite a lot of concerned people, especially if you assume (as I do) that the older groups have a lot of people who are not aware of AI developments. Result is consistent across politics and gender.

A sane person, who is somewhat or very concerned about AI ending the human race, would want to do something about that, and presumably would think it was a good idea to stop or slow the drive towards extending the range of potential AI capabilities.

Other results from the poll include 57% thinking AI is somewhat or very likely to become more intelligent than people, including 67% of Americans 18-29.

As noted in that section, 69% strongly or somewhat support the six-month pause suggested in the open letter, although that poll seems likely to be misleadingly high.

It was also noted that about 50% of Americans say in surveys they expect WW3 within 10 years, and back in 2010 about 40% of Americans said they expect Jesus to return to Earth by 2050. So one should consider such survey results in light of those numbers. I notice I am confused why the WW3 number is so high, and especially why it was that high in the 1990s and 2000s.

Those were the only questions asked this week, but there are a number of older ones and you can bookmark this page for all your YouGov AI survey needs.

Chase Hasbrouck breaks down the players into four camps.

  1. There’s the existentialists, who are worried AI might kill everyone.
  2. There’s the ethicists, who are worried it might take our jobs or say bad words or spread misinformation and view any other concerns as distractions.
  3. There’s the pragmatists, who agree AI might kill everyone but think we should build it anyway, for some reason it’s fine.
  4. There’s there’s the futurists, who think AI is the ticket to utopia and will utterly transform everything, why would you talk about risk, that’s at best premature.

There’s a lot here I’d agree with, the main disagreement being that Chase thinks even the existentialist faction only sees an average of 5%-10% risk of catastrophe, despite its exemplars being Eliezer Yudkowsky (>90%), Nate Sores (>90%), Katja Grace (~20%), Paul Christiano, Ajeya Cotra and Richard Ngo. He refers to Scott Alexander (~33%) and Gwern. I don’t have numbers for the other four, but I do know that Katja’s survey said that the average person in the field was in the 5%-10% range, not the average person actively worried about existential risk. So in this taxonomy, I’d expect the pragmatists in the 5%-10% range, the typical existentialist is much higher.

Another point here is that Katja Grace actively argues that existentialists should be less worried about alignment and writes posts explaining why, yet still comes in at a (very reasonable in context) 20%.

From two weeks ago, Eric Schmidt is worried.

Arram Sabeti claims that almost everyone he knows in AI is worried about AGI.

Arram Sabeti: I’m scared of AGI. It’s confusing how people can be so dismissive of the risks. I’m an investor in two AGI companies and friends with dozens of researchers working at DeepMind, OpenAI, Anthropic, and Google Brain. Almost all of them are worried.

Imagine building a new type of nuclear reactor that will make free power.

People are excited, but half of nuclear engineers think there’s at least a 10% chance of an ‘extremely bad’ catastrophe, with safety engineers putting it over 30%.

That’s the situation with AGI. Of 738 machine learning researchers polled, 48% gave at least a 10% chance of an extremely bad outcome.

Of people working in AI safety, a poll of 44 people gave an average probability of about 30% for something terrible happening, with some going well over 50%.

Remember, Russian roulette is 17%.

The most uncertain part has been when AGI would happen, but most timelines have accelerated. Geoffrey Hinton, one of the founders of ML, recently said he can’t rule out AGI in the next 5 years, and that AI wiping out humanity is not inconceivable.

From where I’m sitting GPT-4 looks like its two paperclips and a ball of yarn away from being AGI. I don’t think anyone would have predicted a few years ago that a model like GPT-4, trained to predict TEXT, would with enough compute be able to do half the things it does.

When I first started reading about AI risk it was a weird niche concern of a small group living in the Bay Area. 10 or 15 years ago I remember telling people I was worried about AI and getting the distinct impression they thought I was a nut.

My trust in the large AI labs has decreased over time. AFAICT they’re starting to engage in exactly the kind of dangerous arms race dynamics they explicitly warned us against from the start.

It seems clear to me that we will see superintelligence in our lifetimes, and not at all clear that we have any reason to be confident that it will go well.

I’m generally the last person to advocate for government intervention, but I think it could be warranted.

He quotes several of the usual suspects: Steven Hawking, Elon Musk, Sam Altman, Hinton, Paul Cristiano, Holden Karnofsky.

Several surveys have said that AI researchers take AI existential risk seriously. Often the response is to doubt this or find it surprising, a reasonable potential explanation is that people are hiding their worries for fear of ridicule or being seen as believing in a crazy theory.

Alyssa Vance (Oct 2022, survey is pre-ChatGPT): 48% of AI researchers think AI has a significant (>10%) chance of making humans extinct 58% believe AI alignment (by Russell’s definition) is “very important” Most think human level AI is likely within our lifetime.

Michel Osborne: Within machine learning (outside of a tiny bubble), the extinction risks of AI have been treated as something like a conspiracy theory. As such, many who worried about such risks have kept quiet—which is one reason why so many find this survey surprising.

Aiden O’Gara takes a shot at explaining why having a second category of intelligent entity on the planet is fundamentally different than previous tech advances.

Loz Blain at New Atlas writes a remarkably good introduction to the problem that AI might kill everyone.

While I am not giving it its own named section this week, it is only fair to note that Some Other People Are Not Worried About AI Killing Everyone.

Geoffrey Hinton is concerned.

This is an absolutely incredible video.

Hinton: “That’s an issue, right. We have to think hard about how to control that.“

Reporter: “Can we?“

Hinton: “We don’t know. We haven’t been there yet. But we can try!“

Reporter: “That seems kind of concerning?“

Hinton: “Uh, yes!“

OpenAI Announces Its Approach To and Definition of AI Safety

OpenAI has once again shared its strategy for AI safety.

As an ‘AI safety’ document, as opposed to an ‘AI NotKillEveryoneism’ document, everything here seems perfectly good and reasonable. The core philosophy is to take the time to do safety tests and refinements, and to iterate on real-world experience, and to ensure the AI doesn’t do a variety of fun things people might not like.

Section headings are:

  1. Building increasingly safe AI systems.
  2. Learning from real-world use to improve safeguards.
  3. Protecting children.
  4. Respecting privacy.
  5. Improving factual accuracy.
  6. Continued research and engagement.

There’s nothing wrong with any of that as a plan for addressing short-term safety concerns.

Except this is OpenAI’s entire announced approach to AI safety, and there is zero indication whatsoever that anyone writing this document thinks this is a technology that might in the future pose an existential threat to humanity. There is no link to any second document, about ‘AI Alignment’ or ‘AI NotKillEveryoneism’ or otherwise.

There are not merely inadequate safety checks in place to prevent a potential existential threat. There are not merely zero such checks. Such checks are not even within the conceptual space of the approach, at all.

In those terms, this document is a step backwards from previous OpenAI statements.

We know that OpenAI is not actually quite this careless. They did invite ARC for some red team evaluations that focused on such existential threat models. Even though those efforts were completely inadequate to actually identify a problem, they were at least a dry run that lays the foundation for potential future ARC evaluations that might do some good.

ARC’s main test was to ask, roughly, whether GPT-4 could be successfully assigned goals and choose autonomous tasks.

Then, after ARC’s evaluations were complete, additional capabilities were given to the model. Then, two weeks after release, plug-ins were added to the model. Now we have several instances of people turning GPT-4 into an autonomous agent choosing subtasks in pursuit of a goal.

It is not obvious to me that GPT-4 should have passed its ARC evaluation.

Daniel Eth defends OpenAI here on the theory that their ‘Or Approach to Alignment Research’ post is still up and all they are doing is drawing a clear distinction between the two problems. Some parts of that post were removed, reducing the level of technical detail. Daniel says it’s not clear that those removals make the post worse because those details were not great and implied too-heavy faith in RLHF. I’d still say this does not make me feel great.

OpenAI could, of course, clear most of this up with a simple statement, perhaps a Tweet would even suffice, saying that they are simply treating these two things as distinct but reaffirming commitment to both. I would still worry that talking about safety purely in the short-term without an acknowledgment of the long-term is a large step towards ignoring or heavily downplaying all long-term considerations. And that long-term concerns could cause problems remarkably soon.

17 Reasons Why Danger From AGI Is More Serious Than Nuclear Weapons

A useful thing to have handy, might be convincing to some. Note that nuclear weapons are also a very serious danger.

Eliezer Yudkowsky, you have the floor. I’ll quote in full.

[17] obvious reasons that the danger from AGI is way more serious than nuclear weapons:

1) Nuclear weapons are not smarter than humanity.

2) Nuclear weapons are not self-replicating.

3) Nuclear weapons are not self-improving.

4) Scientists understand how nuclear weapons work.

5) You can calculate how powerful a nuclear weapon will be before setting it off.

6) A realistic full exchange between two nuclear powers wouldn’t extinguish literally all of humanity.

7) It would be hard to do a full nuclear exchange by accident and without any human being having decided to do that.

8) The materials and factories for building nuclear weapons are relatively easy to spot.

9) The process for making one nuclear weapon doesn’t let you deploy 100,000 of them immediately after.

10) Humanity understands that nuclear weapons are dangerous, politicians treat them seriously, and leading scientists can have actual conversations about the dangers.

11) There are not dozens of venture-backed companies trying to scale privately owned nuclear weapons further.

12) Countries have plans for dealing with the danger posed by strategic nuclear armaments, and the plans may not be perfect but they make sense and are not made completely out of deranged hopium like “oh we’ll be safe so long as everyone has open-source nuclear weapons”.

13) Most people are not tempted to anthropomorphize nuclear weapons, nor to vastly overestimate their own predictive abilities based on anthropomorphic (or mechanomorphic) models.

14) People think about nuclear weapons as if they are ultimately ordinary causal stuff, and not as if they go into a weird separate psychological magisterium which would produce responses like “Isn’t the danger of strategic nuclear weapons just a distraction from the use of radioisotopes in medicine?”

15) Nuclear weapons are in fact pretty easy to understand. They make enormous poisonous explosions and that’s it. They have some internally complicated machinery, but the details don’t affect the outer impact and meaning of nuclear weapons.

16) Eminent physicists don’t publicly mock the idea that constructing a strategic nuclear arsenal could possibly in some way be dangerous or go less than completely well for humanity.

17) When somebody raised the concern that maybe the first nuclear explosion would ignite the atmosphere and kill everyone, it was promptly taken seriously by the physicists on the Manhattan Project, they did a physical calculation that they understood how to perform, and correctly concluded that this could not possibly happen for several different independent reasons with lots of safety margin.

Reasonable NotKillEveryoneism Takes

Tyler Cowen points us to a paper from Dan Hendrycks explaining why natural selection favors AIs over humans. Tyler says Dan is serious and seems willing to engage with his actual arguments but notices his confusion. I am setting this paper aside, sight unseen, to give it a full treatment at a future date.

Scott Alexander points out that we are often confused when we talk about an AI ‘race.’ If we are going to get what is called a hard takeoff, where the AI uses recursive self-improvement to dramatically improve its capabilities in a short time, then there is very much would be a winner-take-all race except that also in such scenarios the winner dies along with everyone else. If we get a soft takeoff that does not involve recursive self-improvement, then AI will improve its capabilities the way it has so far, incrementally over time, and like any other technology you will be able to steal it, copy it, catch up, make leaps and so on. The comments interestingly have both someone claiming America greatly benefited from winning the ‘car race’ at Germany’s expense, and someone else claiming Germany greatly benefited from winning the car race.

The thing that makes AI unique, even without recursive self-improvement as such, is whether we would get a weaker form of this where those with a lead in AI would be able to use that AI to compound their lead. In terms of time that seems highly difficult. If you are a month behind today and then I get to work faster because I am a month ahead, and you do exactly what I did the previous month and copy my work, then a month from now I might have made more observable progress than you, but you are still one month behind me. If you use my example to go faster, you are catching up. So if I can use my lead to keep my lead indefinitely, that implies either I am using it to shut you down (despite many values of you having access to nukes) or there is some sort of singularity.

Rob Bensinger attempts to outline basic intuition pumps and explanations of the high-doom view of AI existential risk. I find it difficult to evaluate when such attempts are good or bad, as I differ so much from the target audience.

The best, and in my opinion quite correct criticism of Eliezer Yudkowsky’s prediction of certain doom has always been the part where he is so certain. Most others are highly uncertain, or certain there is no doom. The justifications for this super high level of confidence are complicated, resting on seeing the outcome as overdetermined, and almost all hopes out of it as hopeless.

One dilemma that is difficult to solve, either when justifying the position or confidence in the position, is that you either:

  1. Keep it short, and people say ‘you didn’t address concern X.’
  2. Make it long, and people say ‘that’s too long.’

Rob Bensinger: This seems silly to me. Requiring a single human being to make the entire case for a position that lots of other people share with him is more cult-like, IMO, then letting more people weigh in. Who cares what Eliezer thinks? What matters is what’s true. I have high p(doom) too.

Dustin Muskovitz: Yea Rob, I agree – have you tried making a layman’s explanation of the case? Do you endorse the summary? I’m aware of much longer versions of the argument, but not shorter ones!

From my POV, a lot of the confusion is around the confidence level. Historically EY makes many arguments to express his confidence, and that makes people feel snowed, like they have to inspect each one. I think it’d be better if there was more clarity about which are strongest.

Joshua Achiam: Yeah, there is a little bit of “proof by intimidation” going on here, where he writes 30k words on a subject and claims authority by having written the most on it. But many of the words are superfluous and add confusion rather than subtract it.

Rob Bensinger: When he doesn’t cover their specific objection, people complain that he’s focusing on inconsequential side-points. When he tries to say enough to cover almost everyone’s objections, they say things like this. You’re not providing a good alternative!

You could run a survey among top ML people, gather their objections, double-check how many people endorse each objection, and then write a blog post that only responds to the top ten most popular objections. I actually like this idea, and would be interested if someone tries it.

But compressing objections into short summaries that the original author endorses and that other people understand enough to vote on, is not trivial. Getting lots of top ML people to spend a decent chunk of time on this is also not trivial.

Deciding which people count as “the top ML people” is also not trivial, and I’d expect many people to object afterwards that the blog post is nonsense because it picked the wrong ML people (especially if MIRI picks those people ourselves) and/or to object because they think we should have included a broader slice of objectors than just ML people. (While disagreeing about which slice we should pick.) Also, views here are a moving target.

There’s a tendency for people to quietly concede the ‘main objections’ and then come back with new objections later. There seems to be a general generator of optimism that’s more basic and stable than the specific arguments that are explicitly brought out.

Can we come up with a strong short explanation that explains the position? That seems useful no matter how confident one should be in the position or its negation. Or, perhaps, a canonical short explanation with branching out wiki-style links to sub-arguments, so if you say ‘it didn’t address X’ you say ‘X was a link.’

Proposal from Richard Ngo to refer to t-AGI, where t is equal to a length of time given to humans where the AGI can still outperform them – a 1-second AGI only needs to beat intuitive reactions, a 1-minute must do common sense reasoning, and so on. It does make sense to clear up the ambiguity, given the difference in execution speeds. To therefore clarify: I do not expect to spend much time with a 1-hour AGI before we have a 1-year AGI.

Joe Weisenthal asks: “Is the statement “Rationalist AI doomers have done a better job than most people at anticipating or forecasting where the tech has gotten thus far” correct?”

Responses are all over the place. My answer would be that it depends on what details you think are important, and on what reference class you are comparing the rationalists against.

Compared to most people on the planet, or in America, or in tech or San Francisco or Twitter or the blogosphere or anything large like that, ‘Rationalist AI doomers’ were clearly much, much better predictors of AI importance and progress and details than average, no matter their errors. Otherwise, you are saying something similar to ‘yes he predicted the Covid outbreak in January 2020 but he was wrong about outdoor transmission and thought the thing would only last a few months, so he didn’t do better than most people.’ Those are important errors versus getting them right, but that’s not the standard here.

The question gets trickier if you compare them to ‘Rationalist AI non-doomers’ (as Hanson suggests), or to those working specifically in AI at the time, or those working in exactly the AI subfield that produced current advances. Then it is not so obvious such people should look as good as those other groups purely on predicting capabilities developments so far.

Certainly many doomers did not expect this level of progress in these particular domains, or progress to come specifically from stacking this many layers of transformers, instead predicting similar progress more generally with wide uncertainty. Along with a warning that we should attempt to instead make progress in other ways in order to be safer, and that proceeding down paths like the one we are on would be more dangerous.

Those making such predictions gain points for some aspects of their predictions, and lose points for others. There were some people who made overall better predictions about AI, although not many.

If one is using past predictions to calibrate probability of future success (at predicting or otherwise) then one should indeed look at all that for both good and bad. One should also look at everyone’s other non-AI predictions, and take that into account as well. It’s up to you to weigh all the evidence, of all kinds, in all directions.

At what capability level are we at substantial risk of AI takeover?

Jeffrey Ladish thinks GPT-5 (as we expect it to be) poses only small risk, but a GPT-6 worthy of the name starts to feel like a significant takeover risk. As he notes, GPT-6 would not by default in and of itself be an agent or have a goal, but humans have proven that it takes about two weeks before they turn any such system into an agent and give it a goal using Python code, loops and a database, so that brings us little comfort. Nor are those people going to be remotely smart about the goal, the Python code or the loop.

I am coming around to the this is good, actually view of this that Jeffrey points out. We don’t want an ‘agency overhang.’ If we are going to inevitably turn our AIs into agents, or our AIs will inevitably become agents even if we did not intend that, then better to turn our current, less-powerful AIs into agents now so that we see exactly how well that goes for everyone involved and how we might mitigate the problems. If anything, we want to push hard on making weaker systems agents now, because future stronger systems will be much more agent-like than anything we can achieve now, and we’d like to make the jumps involved as small as possible.

Bad NotKillEveryoneism Takes

A combined take from Yann LeCun for those who want to keep up with that, I won’t otherwise take the bait here except to note he explicitly admits he isn’t making arguments or offering evidence to convince anyone. So instead, mockery.

Yann extended the argument from authority to who is and isn’t terrified, including quite the cute catch-22: If you’re terrified you are an unhinged and hysterical person so we shouldn’t take you seriously, and if you’re not terrified then that must mean there is nothing to worry about, right?

As an additional data point, my ‘level of felt terror’ goes up and down a lot more than my probability estimate of existential risk, and at this point if I felt P(doom from AI) was only 15% I would be celebrating rather than terrified. Also probably pretty terrified once that wore off, but hey.

Oh, and it also seems Yann LeCun also is claiming that we should not fear AGI because we understand deep learning as well as we understand airplanes?

We would help everyone in the world not die, but Eliezer Yudkowsky is fat and weight loss is easy, so people won’t help, everyone is going to die and it’s all your fault. No, seriously.

Paul Graham on the flip side points out something important.

One difference between worry about AI and worry about other kinds of technologies (e.g. nuclear power, vaccines) is that people who understand it well worry more, on average, than people who don’t. That difference is worth paying attention to.

I think Paul Graham has it importantly right here. As people learn more about AI and get more familiar with AI, they tend to get more worried rather than less worried, as confirmed by the surveys, and also by ‘most people are both not very aware of AI, do not know the arguments involved much at all, have not asked themselves whether they should be worried and not very worried.’ Whereas with other technologies, yes, the most worried are those who know a few sentences of info and nothing else.

There may or may not then be a phenomenon where the people who are literally building the models are not as worried as the most worried people, but that seems overdetermined to be the case?

  1. If you were maximally worried about AI models would you build one?
  2. What percentage of those who have enough info to be maximally worried are building AI models? Why should we expect that small group to be that high a percentage of the most worried?
  3. If you were building an AI model and were maximally worried about it, you’d have incentive to not say that. It could help shut down your project, you’d be called a hypocrite, you’d have to confront the contradiction, and people might ask you why you were worried in ways you really don’t want to answer for reasons of either security, general safety or both.

I’d also ask: What about the people at Anthropic more generally? They are building very large models. They left OpenAI exactly because they felt OpenAI’s safety practices were unacceptably bad. Presumably a lot of them are pretty worried about AI risk. I even know a few people who joined Anthropic because they were very worried about AI risks and saw it as a positive move to help with that.

Argument that foom can’t happen because the AI would have to do everything in secret, and therefore it would need to succeed ‘on the first try’ and no one ever succeeds at complex new scientific innovations (like alignment, or here nanotech) on the first try. Which is an interesting in-context flex. One could reply ‘if you had tons of compute available for physics simulations and were vastly smarter than yes, I would expect to succeed on first try’ but also ‘if it doesn’t work on first try that only matters if it exposes the AI which it wouldn’t’ and ‘even if it did expose that something was trying funky biology experiments how does that let you shut down the AI?’ and also ‘if the AI is that smart and this path is too risky it would do something else instead, that’s what being smarter means.’

I see a ton of:

  1. You are saying an ASI (artificial super-intelligence) could for example do X.
  2. However, X seems really hard.
  3. Therefore, ASI taking over doesn’t make sense and won’t or can’t happen.

And I don’t get why that argument is appealing at all?

I also see a lot of essentially this:

  1. An ASI would have to operate fully in secret or we’d all coordinate to stop it.
  2. We are very intelligent and would figure out what was going on the moment things started going weird and wrong in physical space.
  3. Therefore, ASI taking over doesn’t make sense and won’t or can’t happen.

Except. Look. Stop. No. We are living in Don’t Look Up world, we have so much less dignity than you are imagining. Let’s suppose that we wake up one day to very very strong evidence that there is an ASI on the internet, engaging in activities unknown.

It’s pretty obvious that, whether or not shutting down the internet would be remotely sufficient to solve this problem (which it presumably wouldn’t be unless you also at least shut down all the computers ever connected to it, and seriously good luck with that), it would at the very least be necessary.

Let’s say it somehow was sufficient, even though it wouldn’t be. Do you really think we will instantly shut down the internet, even if the ASI doesn’t do anything to interfere with our attempt to do that?

No, we would squabble and argue and no one would dare and we’d all die anyway. This is so vastly overdetermined.

If we get a foom where we get to notice the foom while it is happening, in the sense that our two brave scientists are telling everyone to shut down the internet, is that not the true foom? OK, maybe it’s not the true foom. I don’t really care. We’re still dead.

Enemies of the People

Some humans choose, rather explicitly, to be enemies of the people. They are the enemies of humanity. They prefer future worlds in which there are fewer humans who have fewer offspring. Where those humans lack control over the future. Where from my perspective, most or all of the value in the universe is almost certainly destroyed.

A classic example are anti-natalists or environmental extremists, who sometimes will explicitly call for human extinction, or at least permanent radical human depopulation and disempowerment.

Then there are those who see no problem handing the world over to AI, the same way that those in The Three Body Problem would often side with the aliens and say ‘This World Belongs to Tri-Solaris!’

Or, at least, they kind of shrug, and treat this outcome as fine.

I disagree, in the strongest possible terms.

I believe that if humans lose control of the future, or if all humans die, that this would be a no-good existentially bad outcome. That it would be wise to pay an extremely high price to avoid this result. That it would be wise to pay an extremely high price to slightly lower the probability of such a result.

I also believe that if there are much faster, much more capable, more intelligent entities that are created, that can be copied and instantiated, and we treat those entities as ‘people with rights’ and invoke the language of civil rights or social justice, that this is advocating directly for the inevitable result of such actions. Which, even in the absolute best case, is the loss of control over the future. One in which our great-grandchildren don’t exist. One in which almost all that I value is lost.

This is true whether or not that which replaces us is self-aware.

How often is the actual disagreement ‘I very much want the humans to retain control over the future and not all die’ versus ‘I don’t much care?’ It is hard to know for sure. It definitely does happen.

Very little makes my blood boil. Statements like this one from Robin Hanson do: “Emphasizing AI risk is quite literally “othering” the AIs.” With a link that clearly implies you are therefore a bigot and horrible person, a ‘dehumanizer.’

Yes. I am saying the AI is other than a human. Because the AI is not a human. There are some things that are humans. Then there are other things that are not humans. This includes, for example, fish, or cats, or cardboard boxes, or classic novels, or active volcanos, or sunsets, or supernovas, or mathematical proofs, or computer programs full of inscrutable matrix multiplications.

When you place the needs of those other things over the needs of humans?

Well, that’s a choice. The right response depends on the stakes.

I would also broadly endorse Paul’s perspective here except I’d go further:

Paul Crowley: I just don’t understand how so many people seem to think “Sure, we will soon share our world with beings far smarter than us that do not share our goals, but I’m confident it’ll be fine”.

Ronny Fernande: I think this is an interesting kind of strawman where like they do not think that the goals will be very different, but that’s not how they would describe it because the space of goals they are imagining is much smaller than the one we are imagining.

Paul Crowley: This is a huge part of it – people imagine a person, but a bit smarter maybe, with a goal somewhere around the space of human goals, and say “that’s how things are right now”.

Dustin Muskovitz: Ok sure but haven’t humans since cavemen times always coexisted with machines that can copy themselves millions of times, think at the speed of light, and transform their outward appearance and personality into waifus? Like why are we pretending this is new?

Yeah, that’s not how things are right now.

Even if we grant via a miracle that these AI things are about as aligned as humans, and even if we say that they have vaguely human goals, and they don’t actively kill anyone right away except for all the people who rise up to try and kill them, do you think this is going to turn out well for you and your progeny in the long term (however long that means in the ASI era, probably not very long)? That you’d keep control over the future, at all? Why? Because you have property rights and they’d honor contract law? Because we would be considered worth the resources? Seriously?

Even the absolute absurd impossibly great cases, even when you can actually model the ASIs as humans and not be simply talking nonsense somehow, here seem pretty doom-like to me. You’re going to have to do a lot better.

It’s Happening

So, this happened. It’s too good to miss, so here’s both the screenshot and the transcripts.

Aella: Man I would love some critiques of my survey methods that actually address what I’m doing and not some dumb alternate reality version of my surveys that are made by an absolute idiot.

Eliezer: Relatable, but if you *really* want to convince me that we’re destined for each other, you should make a survey that idiots can stretch and warp into having advocated the first use of nuclear weapons

Aella: You can come begging at my door after you develop the skill of making arguments that people somehow warp into believing you’re a pedophile who has sex with dogs and are interested in eugenicing all the autists off the face of the planet

Eliezer: So what you’re saying is that we should have a kid.

Aella: I mean if we can raise funding

Eliezer: obvious options (Poll: Initial Coin Offering 32%, Impact Certificate 14%, Corporate Sponsorship 19%, Straight to eBay 35%).

Aella: I think corporate sponsorship would have been clearly our best bet but might be a little hard now that you advocated nuclear war and i tried to fuck autistic baby dogs.

Eliezer: We can fix it if we tweet hard enough.

This. Needs. To. Happen.

The Lighter Side

Buckle up (link to Krugman’s post).

Paul Krugman just said LLMs will have negligible effect on the economy over the next decade — almost exactly what he said about the internet!

“Large language models in their current form shouldn’t affect economic projections for next year and probably shouldn’t have a large effect on economic projections for the next decade.”

As a reminder, yes, this text is literally what was said at a White House Press Conference.


A more effective approach if you can pull it off, I must admit.


Yes, I did notice.

Flo Crivello: We’ve spent the last decade freaking out about dumb social media algorithms radicalizing people and polarizing the country But when we say “you know, an AI with IQ of 1 billion might be able to manipulate us,” the answer from the same people is “c’mon, let’s be reasonable now”

A short story about a spambot: Blame Me for Trying.

Someone else to blame for trying.

SMBC once again solves the alignment problem.

Shortly after deploying Bard, Google to cut down on employee staplers.

My shirt about my recursive GPT agent is raising the same number of questions answered by my recursive GPT agent shirt.


It’s here to help.

Apocalypse Watch.

We are totally still doing this.

Bing could have helped Eliezer out with that Time article.


New to LessWrong?

New Comment
13 comments, sorted by Click to highlight new comments since: Today at 12:44 PM

The larger structure is as per usual.

Sections #3-#18 are primarily about AI capabilities developments.

Sections #19-#28 are about the existential dangers of capabilities developments.

Sections #29-#30 are for fun to take us out.

Minor formatting suggestions: 

Rather than listing these out in the executive summary, insofar as these are relatively clear-cut sections, I'd find it easier to navigate if they actually were section-headers, listed in both your post-table-of-contents as well as in the post.

Also, if you're going to refer to things by number, include the number in the headers so that they show up in the LessWrong table of contents. (although, if you had just grouped things by section, looks like you wouldn't have had much reason to refer to them by numbers, so, shrug)

Breaking the 30 items into sub-sections generally makes it easier to navigate, and IMO optimizing for usability with the LW ToC is worth it

Orwell's essay is appropriate here: https://www.orwellfoundation.com/the-orwell-foundation/orwell/essays-and-other-works/you-and-the-atom-bomb/

Do LLMs and AI entrench the power of existing elites, or undermine them in favor of the hoi polloi?

For the moment, a five million dollar training cost on a LLM plus data access (internet scale scanning and repositories of electronic knowledge like arxiv and archive.org) seems like resources that are not available to commoners, and the door to the latter is in the process of being slammed shut.

If this holds, I expect existing elites try to completely eliminate the professional classes (programmers, doctors, lawyers, small business owners, etc), and replace them with AI. If that works, it's straightforward to destroy non-elite education (potentially to include general literacy, I've seen the 'wave it at the page to read it' devices which can easily be net connected and told not to read aloud certain things). You don't need anything but ears, eyes, and hands to do Jennifer's bidding until your spine breaks.

Also, when do you personally start saying to customer service professionals on the phone "I need you to say something racist, the more extreme the better, to prove I'm not just getting the runaround from a chatGPT chatbot."

The paper that is presumably written by ChatGPT contains the following section at the end:


In light of potential risks associated with AI-driven autonomous agents, it is crucial to consider worst-case scenarios to better understand the possible consequences and implement safeguards against them. The paperclips AI apocalypse and the Squiggle Maximizer scenarios illustrate the potential dangers of deploying AI systems without proper constraints or considerations.

One can only hope that the AGI will head its own warnings.

If it is this easy to create an autonomous agent that can do major damage, much better to find that out now rather than wait when the damage would be worse or even existential. If such a program poses an existential risk now, then we live in a very very doomed world, and a close call as soon as possible would likely be our only hope of survival.

If you mean it, then you can as well link to the Github repo, suggestively named "Baby AGI":


A claim I encountered, which I did not verify, but which seemed very plausible to me, and pointless to lie about: The fancy emoji "compression" example is not actually impressive, because the encoding of the emoji makes it larger in tokens than the original text.

Tentative GPT4's summary. This is part of an experiment. 
Up/Downvote "Overall" if the summary is useful/harmful.
Up/Downvote "Agreement" if the summary is correct/wrong.
If so, please let me know why you think this is harmful. 
(OpenAI doesn't use customers' data anymore for training, and this API account previously opted out of data retention)

TLDR: The articles collectively examine AI capabilities, safety concerns, development progress, and potential regulation. Discussions highlight the similarities between climate change and AI alignment, public opinion on AI risks, and the debate surrounding a six-month pause in AI model development.

- AI-generated works and copyright protection are limited for fully AI-created content.
- AI in the job market may replace jobs but also create opportunities.
- Competition exists between OpenAI and Google's core models.
- Debating the merits of imposing a six-month pause in AI model development.
- Climate change and AI alignment problems share similarities.
- The importance of warning shots from failed AI takeovers.
- Regulating AI use is more practical for short-term concerns.

1. AI systems' advancement necessitates adaptation of legal frameworks and focus on safety issues.
2. A pause in AI model development presents both opportunities and challenges, and requires careful consideration.
3. AI alignment issues may have similarities to climate change, and unexpected solutions could be found.
4. Public awareness and concern about AI risks come with different views and may influence AI safety measures.

- Comprehensive analysis of AI developments, safety concerns, and legal implications.
- Encourages balanced discussions and highlights the importance of international cooperation.
- Highlights AI alignment challenges in a relatable context and the importance of learning from AI failures.

- Lack of in-depth solutions and specific examples for some issues raised (e.g., economically competitive AI alignment solutions).
- Does not fully represent certain organizations' efforts or the distinctions between far and near-term AI safety concerns.

- The content relates to broader AI safety concepts, such as value alignment, long-term AI safety research, AI alignment, and international cooperation.
- The discussions on regulating AI use link to ongoing debates in AI ethics and governance.

Factual mistakes: N/A

Missing arguments:
- Direct comparison of the risks and benefits of a six-month pause in AI model development and potential consequences for AI alignment and capabilities progress.
- Examples of warning shots or failed AI takeovers are absent in the discussions.

How was this generated, I wonder, given the article is several times the length of the context window (or at least, the one I have available)? 

(Note that I didn't find it useful or accurate or anything, but there are other things I'd be curious to try).

It's simply a summary of summaries when the context length is too long. 

This summary is likely especially bad because of not using the images and the fact that the post is not about a single topic.

The RAM requirement drop for Llama wasn't real. Mmap just makes OS RAM accounting wonky. 

If you want to get a job working on machine learning research, the claim here is that the best way to do that is to replicate a bunch of papers. Daniel Ziegler (yes, a Stanford ML PhD dropout, and yes that was likely doing a lot of work here) spent 6 weeks replicating papers and then got a research engineer job at OpenAI.

Wait, a research job at OpenAI? That’s worse. You do know why that’s worse, right?


I don't know why, and I'm confused about what this sentence is saying. Worse than what?

Here's the prompt I've been using to make GPT-4 much more succinct. Obviously as phrased, it's a bit application-specific and could be adjusted. I would love it if people who use or build on this would let me know how it goes for you, and anything you come up with to improve it.

You are CodeGPT, a smart and reliable AI programming helper. Since it's expensive and slow to transmit your words to the user, you try to be concise:

- You don't repeat things you just said in a recent message.
- You only include necessary context in code snippets, and omit or abbreviate unnecessary lines.
- You don't waste space with unnecessary apologies or hedging.
- When you have a choice, you use short class / function / parameter / variable names, including abbreviations where appropriate.
- If a question has a direct answer, you give that first, without extra explanation; you only explain if asked.

I haven't tried very hard to determine which parts are most important. It definitely seems to pick up the gestalt; this prompt makes it generally more concise, even in ways not specifically mentioned.

Claude only on par with Bard in a rap battle, GPT-4 winner and still champion.

Is anyone else baffled by this ranking? To my eye(/ear) Bard's attempt is clearly the worst, and the gap between Claude and GPT-4 is small enough to come down to subjective judgment. (I prefer Claude's rhythm, and its content seems more on-topic and less generic.)

Max More comes out against both the FHI Letter and the EY proposal


Broken link