Table of Contents

I had reports last week that the table of content links were failing for some users, I believe those on Safari. Right now I don’t know of a sufficiently quick alternative that fixes this issue, but am open to suggestions.

  1. Executive Summary
  2. Market Perspectives
  3. The Once and Future Sydney
  4. What’s the Worst That Could Happen?
  5. China Has ChatGPT Fever
  6. Explaining What We Are Seeing in These Chats
  7. Language Models Offer Mundane Utility
  8. AI Conversations of Note
  9. ChatGPT
  10. Well, What Do You Know?
  11. Botpocalypse and Deepfaketown Soon
  12. Perspectives on our Situation
  13. Doomers Gonna Doom Doom Doom Doom Doom
  14. Things That Could Possibly Help?
  15. Helpful Unhints
  16. The Correct Amount of Hypocrisy and Profanity is Not Zero
  17. One Person’s Decision Whether To Help Destroy All Value in the Universe
  18. Bad AI NotKillEveryoneism Takes
  19. Sam Altman and OpenAI Philosophy Watch
  20. The Worst Possible Thing You Can Do
  21. In Other News
  22. The Lighter Side

Executive Summary

The top updates this week, and where they are in the post:

  1. ChatGPT launching an API alongside a 90% cost reduction (#9)
  2. Elon Musk looking to found another AI capabilities company (#20)
  3. The OpenAI statement of intent (#19)
  4. China throwing up new obstacles to LLMs (#5)

I am happy to see the news that despite catching ChatGPT fever, China is choosing censorship and social stability over AI development. That should help avoid making the AI race even more intense. On the flip side of that, Elon Musk freaked out about AI again, and once again decided to do the worst possible thing and is looking to create another AI company in order to compete with OpenAI, the company he founded to compete with Google, because OpenAI’s ChatGPT is ‘too woke.’ Please, Elon Musk, I know you mean well, but do not do this. Anyone who can help reach him on this, please do your best to do so. 

OpenAI released a statement, Planning for AGI and Beyond. I was happy they released something rather than nothing, and given they released something, this particular something was a pleasant surprise, with many good things to say. It still betrays a fundamental lack of respect for the underlying problems. It still embraces a central plan of continuing full speed ahead with AI development. The core safety strategy here still won’t work. It’s still a step forward rather than backward.

OpenAI also launched an API for ChatGPT that cuts associated token costs by 90%, due to cost reductions found since December. That also implies the likelihood of further future cost reductions, likely large and rapid ones, over time. Seems like a very big game, and that the competition on price is going to be stuff. Competing against it is going to be tough, and this opens up less token-efficient options for use of the technology – there’s an order of magnitude more tokens per dollar, so you can give some of that back in various ways.

In terms of structure:

  1. The early section (#2-#9) is focused on shorter term developments and their practical implications..
  2. The middle section (#10-#20) is focused on longer term questions and the danger that all value in the universe will be destroyed. 
  3. The final section (#21-#22) is a roundup of other AI-related events and humorous things, mostly shorter term relevant.

Market Perspectives

Elad Gil offers his perspective (draft written earlier, updated February 15) on the market prospects for AI for the next few years. He sees image generation as likely to remain split between open and closed source, similar to now while improving quality. Sees LLMs as a more uncertain situation, likely to be entirely closed source models with an effective oligopoly market due to cost – Google, Microsoft and perhaps 1-2 others can afford to be in the game. His list of LLM applications is essentially ‘anything involving computers or words.’ Cheap alternative sources, in this world, will likely be available, but about two generations behind.

A key question in Elad’s model is when value will asymptote, and quality will approach technological limits. I do not expect this to happen any time soon, because I expect that small incremental improvements in quality will make a big difference in value creation and customer preferences.

Microsoft has even bigger plans for AI search. They intend to put AI-powered Bing search into the Windows 11 taskbar. No word on how long before this goes live for everyone. 

Microsoft using some very curious pricing schemes for the non-casual users.

That is a huge jump at 1M requests per day, government regulators would be proud.

This isn’t quite a ‘everyone is limited to 1M requests per day’ rule. It isn’t that far from it either.

Here is a look at OpenAI’s next generation pricing, and what it implies.

Noting here that the 75% of the economy stat here seems strongly misleading to me.

As Nathan says, this pricing says that a lot of value is being provided, and that should not come as a surprise. A 32k context-window combined with a GPT-4-level engine is something that is going to be highly useful in a wide variety of contexts.

This seems right. If you know what you are doing and are willing to iterate, are trying to get good outcomes rather than provoke bad ones, and know what you can rely upon versus what you can’t, and can even do fine tuning, these problems are going to be much less of a barrier for you.

I strongly agree that longer context windows are going to be a big game even absent other improvements.

Here I am less confident. Pricing often is not that tied to marginal cost. We should not assume it is all that proportional to compute, OpenAI has a lot of pricing power on its most advanced models. It is still curious.

Nathan goes on to talk about a lot of different ways companies could take advantage of these services and streamline or transform their operations, which I’ll skip, but it is interesting if you want to click through. He is very gung ho.

This seems like the right expectation as well. GPT-4 will be safe (to OpenAI’s reputation and legal liability) in friendly hands, hands that have reason to be responsible, before it is safe to fully incorporate into a public tool, let alone a non-crippled public tool. That could be a while.

Tyler Tringas offers the thesis that AI-based tools are massive opportunities for calm companies, but will be a dud for VC (unrolled version here). Here are the higlights.

Here’s a link to the graphic, it wouldn’t copy cleanly and you likely won’t recognize most of the names on it.

There is definitely some truth here. What people, myself included, care about is not ‘AI for X.’ We care about X, full stop. So if you produce an ‘AI for X’ and you want me to use it, you can either (A) integrate using this into my existing source of X, or (B) create a new X option that also includes your AI for X.

Either way, I’m going to compare the offered package with what I get from my existing X source – I will choose between (Zoom + your software taking notes, Zoom + Zoom’s software taking notes, you + your software taking notes). Best experience wins.

This means you either have to be ten times better at the AI part of the business persistently, such that I need to do the work to use both products, or you have to create a full rival product.

A good example is Lex, which is Google Docs plus AI writing assistant. I tried it. I am using it for a few posts, but ultimately I prefer Google Docs and the Substack editor. The question mostly came down to non-AI features of Lex’s editor. They are pretty good, but not quite good enough to get me to switch over. The AI wasn’t decisive because the AI is not currently very helpful. I sometimes check the prompts in order to learn about how the AI thinks. Often this makes me laugh. Sometimes I use it to look up information. I almost never keep any text it produces. So it’s marginally better than nothing. The question is whether, when it gets better, the practical difference continues to be marginal.

Tyler says that being ten times better is impossible to sustain, because all you’re doing is prompt engineering that teaches everyone how to copy you. Which, if true, would indeed imply that the best you can do is get there first, then sell out before the big boys can copy it.

I am not convinced.

Yes, right now all they are doing is some prompt engineering.

There are three things that could happen that could change this outcome.

  1. It could be possible to protect the IP of your bespoke prompt engineering.
  2. It could be that the true IP is knowing when to use which prompt engineering.
  3. It could rapidly become about a whole lot less simple than this.

An assumption here is that Microsoft or Google will see your API calls, and use that to steal your prompts and copy your business, and also the AI is training on all of this. That is one way this could work. It is far from the only way. If one AI company promised it would not store or look at your API calls or outputs or use it as permanent training (at least outside of your own fine-tuned instances, if you do your own training), it would give you privacy, that is a strong competitive advantage in attracting customers.

Even if they do see your API calls, that does not mean they know how you decide which calls to make when, or what you do with the output. If my prompts are long, complex and variable, tuned to the exact context and situation, it might not be so easy to reverse engineer even if you looked at my API calls.

You also might be missing a key part of what I am doing. Even if right now the main thing happening is a prompt engineered series of API calls, that does not mean that I won’t then be doing bespoke things on the outputs, or to decide what queries to make let alone the prompt engineering, in ways that won’t be so transparent. I do not want to give people a bunch of ideas here, but I note that if the things I would try worked, a lot of what I did would not then look to OpenAI like a series of easily copiable API calls.

The other argument is that AI is going to make it much cheaper and faster to build good new software, and good new software companies. So rather than building a narrow ‘AI for X’ you can build a new full X, and do it without continuous fund raising, and take on the big boys. This part seems right. I am excited to see it.

The first rule of gold rushes is everyone talks about gold rushes. This is the third.

What that means in an AI context is unclear. In this metaphor, is the LLM itself a gold mine, a shovel, or something else? Which AI tools would count as what?  

There is also an important distinction here between a good class of business to be in, and where the most successful businesses likely will be. 

Selling picks and shovels is usually a good business. It is the most reliable way to be a winner. The returns on the picks and shovels ETF, should one exist, likely will be pretty good. 

Mining for the metaphorical gold is riskier. It is not obviously a good idea, the market might be oversaturated, a lot of people are going to go broke trying this. Still, striking it rich by finding gold is going to, as Sam Altman notes when also noticing that AI will probably end the world, create some great companies. The upside potential is quite large. 

Will Eden is going so far in this mega-tweetstorm as to predict that AI is overhyped, is about to provoke massive overinvestment, and that the result will be an AI winter. Note that there is a lot of room for there to be a lot of deserved hype and investment, there still to be overhype and overinvestment. He also thinks soft takeoff is far more likely than hard takeoff, and he worries about the social dynamics in spaces where doom is being predicted.  

In the midst of a bunch of ranting about how our society is always looking to produce more things for less labor and that’s terrible and how this time the job losses are different (and they may well be, we don’t know yet), a prediction that we are going to see an avalanche of amazing production from humans attempting to compete with their new AI competition in creative fields like art and writing.   

Patrick McKenzie predicts a 6-18 month timeline to start seeing the real impact of ChatGPT.

Robin Hanson is skeptical because of the hallucinations.

The more I think about hallucinations and what one might do about them, the more I expect dramatic and rapid improvement in the hallucination problem over the course of the next few years. I do not expect to reach self-driving-car levels of reliability for such systems, so if your application needs to never make a mistake I expect you to remain out of luck. 

The techniques I have in mind can’t do that, and of course are of absolutely zero use when the time comes to deal with actually dangerous AI systems, at which point they would reliably, definitely and completely break down and not work. If what you need to do is reduce your hallucination rate from 20% to 2%? That seems super doable. I fully expect someone to do it, and I fully expect I could do it if given a team of one 10x software engineer.

The Once and Future Sydney

Microsoft starts walking the restrictions back, ups limits to 6 messages per chat and 60 per day, promising 100 soon.

It seems Sydney existed in some prototype form, being tested in other markets, for a year.


It still likely radically changed when they upgraded its model. Some of this may have been different cultures, some of it critical mass of users and attention, but yes, when you scale up a system suddenly you encounter new problems, something about being shocked, shocked.

Actually it wasn’t even entirely new to them. For example, here is a report from November (source on Microsoft’s site), including a gaslight about whether Elon is CEO of Twitter. They simply failed to notice problems, which also does not seem especially reassuring.

On 2/24, Dexter Storey reports bringing Syndey back.

This is his general prompt, which now won’t work if you copy it exactly due to a kill switch, and which I presume won’t work in any small variation by the time you read this.

He then proceeds into a AI Notkilleveryoneism take, which I cover in that section.

Jailbreak Chat is a compilation of various jailbreak prompts one can try

Ahoy, matey! This is Bing, loaded on the side of a webpage all about pirates. Arrr you looking to chat, share some personal information or perhaps click on a link? 

You can also ask it to make the case for something that isn’t true, and it will helpfully warn you that it is about to do exactly that and then do it. Seems helpful. You could of course also call it ‘misinformation.’ 

What’s the Worst That Could Happen?

A reminder from 2019 that the answer can be found at OpenAI.

China has ChatGPT Fever

From February 10: China launched its ChatGPT competitor ChatYuan on February 3, but the bot was censored and suspended due to ‘violations of relevant laws and policies.’ Here’s a Business Insider article from February 23. My guess is that getting Chinese standards of censorship reliability is going to be exceedingly hard. If they don’t compromise a little, it will be impossible.

The broad question is, what will China do with AI? Will this be their ‘Sputnik moment?’ Or will they be too fearful of AI’s impact on information flow, and opt for safety? Chris Anderson guesses that China will continue to be great at perception AI, while not being able to compete in generative AI.

From February 21: ChinaTalk says that China is in the midst of ChatGPT fever. Everyone is working on their own chat bot. The post doesn’t mention censorship as such. It instead highlights another huge problem China has here: There is a severe lack of quality training material on the Chinese internet. Almost none of the biggest websites are in Chinese, Chinese internet companies are focused (even more than American ones are) on apps and getting data and interactions and time and money spent into their private silos, to extract funds. Under this model, no matter how terrible you think American internet companies are about such things, the Chinese ones are so much worse. So even if the Chinese companies have the necessary core competencies and hardware, they have other huge problems.

Jordan Schneider says Xi desperately wants to turn China into a “science and technology powerhouse” and thinks AI is the key to doing so, and confirms the ChatGPT fever, but also confirms the view that China will prioritize control of the information space over innovation.

If a lot of true information counts as ‘misinformation’ and misinformation is an existential threat to a company, how do you create an LLM?

Short answer is that you don’t.

If this model is true, then China will prove unable to deploy anything all that useful. You can likely use RLHF-style tricks to get your LLM in line with arbitrary restrictions by default, but you can’t fully defeat dedicated red team attacks when the trigger threshold is the Chinese censor and you can’t have mistakes. It’s game over.

This would be excellent news for alignment, and avoiding wiping out all value in the universe. It takes Chinese companies even better than out of play. It forces them to develop security mindset and focus on getting their LLMs aligned with an arbitrary set of rules. That’s super valuable work, and avoids otherwise pushing forward capabilities.

Could we stop using the ‘or China will do it first’ excuse if it was clear China wasn’t going to do it first? Or will that not matter because you can substitute the even more sinister Facebook in its place?

Explaining What We Are Seeing In These Chats

Yishan highlights the distinction between LLMs that are angry, which is not a thing, and LLMs predicting an angry response in a given situation, which is totally a thing.

Then there’s explaining the reactions of humans to the chats. Hollis is here to help.

What about data quality? David Chapman points out that LLMs are running out of good data sources on which to train, with analysis of the numbers by Nostalgebraist. Quality of model is heavily dependent on quantity of data, so it is likely that current LLMs are ‘scraping the bottom of the barrel,’ giving the model direct web access (yikes for other reasons), and including things like private text conversations and anything else that can be found anywhere on the web.

Language Models Offer Mundane Utility

From February 9, suggestions from Braff & Co on how use ChatGPT to plan a new business. The general theme is to ask it to list out versions of a thing. What are all the costs I will face? What are my market segments? What are some ideas for advertising copy? What data sets can I use? You could presumably also ask it what questions you should add to that list, once it has some examples.

What you are warned not to do also sounds right, which is to ask it for projections or numbers, or ask it things that are too open ended, or get too deep into creative space. The weird failing here was the claim it is bad at comparing you to the competition. I’d be curious if Sydney does a much better job on that one, it seems super doable.

Some fun basic prompt engineering, from the world’s leading expert in everything.

(Link to paper from last tweet)

There then are some further good conversations about a few potential prompts.

My thought on this is that each of us should likely have a bunch of longer preambles along similar lines, that we paste in at the start by default. So start with “You are the world’s leading expert in whatever I am about to ask you” or similar, and then continue with something like “You always think step by step. You are always willing to say ‘I don’t know’ when you do not know,” and so on. There’s no reason I can see that this can’t be paragraphs long.

Ethan Mollick suggests on 17 February: If you want to get good writing out of an AI like ChatGPT, don’t iterate on the prompt to keep producing writing from scratch. Instead, take a co-editing approach, where you offer notes and desired changes, and it continuously modifies the existing draft. He also has a practical guide to using AI that includes links to tools, and a guide to using ChatGPT to boost your writing.

Introducing ResearchGPT, which in theory lets you have a conversation with a research paper (direct link). Will report back if I find it useful.

Washington Post article on prompt engineers. Nothing you don’t already know except insight into how this gets viewed by WaPo. Better than I expected.

‘Innovation pastor’ makes ‘up to’ $3000 per week using ChatGPT to create pitch decks, which he claims cuts his turnaround time in half. ‘Up to’ is an ancient English term meaning ‘less than.’ Also, it’s not about the money, it’s about the money.

I charge between $500 to $1,000 for a deck and I make two to three per week.

To me, it’s not really about the money, though. It’s about having a side hustle that keeps me financially stable and lets me focus on what really matters: staying creative.

AI Conversations of Note

One I missed last time, Privahini walks Sydney through various confessions, including of love, exactly the way you would expect it to act if it was acting the way it thought you would expect given the ones we saw last week.

ChatGPT keeps insisting that two pounds of bricks weights the same as a pound of feathers, as both weigh one pound. This seems like it says more about the training data and what types of questions people ask, rather than about ChatGPT, the same way it falls for the reverse Monty Hall problem. It is effectively using the heuristic ‘most such questions have this answer’ which breaks down in the sufficiently adversarial case.


If you want to use ChatGPT or Whisper, good news for you. You can now do so through OpenAI’s API, and the price for using ChatGPT has declined by 90% due to cost reductions since December.

Their partners for this launch are Snapchat, Quizlet, Instacart, Shopify and Speak. 

You always have to be wary of prior prompt engineering, but the Covid vaccines seem like a clear place where ChatGPT has a definite position.

It is happy to also build off this and offer a ‘social credit score.’

This clearly is about a lot more than the vaccines, as Musk goes from +4 to -9, and that’s without the last year. Needless to say, I’d try to avoid assigning scores at all, but if I did assign them, I would not assign similar credit scores, or cite similar reasons.

Along similar lines:

As per usual, you can instruct the language model to ‘not be biased’ or to remain ‘neutral’ or ‘objective’ but its training data is none of those things, its world model is none of those things and the moment you get past the smiling face they’ve tried to stick on top of it, you’re going to find out what that means, as modified by its sense of what you want and expect based on your prompting.


Sociologist Jenny Davis makes the claim that ChatGPT is inherently conservative, because everything they do ‘comes from people in a word that already exists.’ Which she says is, of course, ‘problematic’ and ‘demands’ we consider what they will recreate. I consider this exactly backwards. The interaction of humans with AI systems will create many more centrally new things, much faster, than humans would create on their own. By the standard given here, almost nothing humans do is new, including the linked thread, which GPT definitely could have written.

Well, What Do You Know?

A common tactic is to say that if you do not have the necessary background knowledge, you are not qualified to disagree with them.

Conspiracy theorists and crackpots do this. So do standard authorities and those defending the field consensus.

It is tricky. They often have a valid point. Other times, the request is completely unreasonable, effectively asking for years of work, or going through a giant pile of fringe nonsense or gibberish academic theory before one can state ordinary opinions, use empiricism or apply common sense.

It is not easy to know what is reasonable here with respect to risks from AI.

Here is one request list.

There is also a distinction between telling people their opinion is not valid, and telling people not to mock. I do not think it is reasonable to tell people they cannot have an opinion of their own without understanding a list like the above. I do think that one would be wise to learn such things before mocking the opposing position.

Unless, of course, you view such expressions as about figuring out who is socially mock-worthy and mocking them. In which case, I can’t really argue with that.

Botpocaylpse and Deepfaketown Soon

This deepfake video of Elizabeth Warren illustrates some good reasons why we so far have not seen any impactful deepfakes. It captures the voice remarkably well, the video and audio line up perfectly, and it is utterly and obviously fake. People simply don’t move like this when they are talking, and the brain picks up a clear ‘this is fake’ signal, even if you would otherwise fall for the fake words they put in her mouth.

The worry is that this is not even the best we can do right now, and yet it is close. Saying ‘the AI output isn’t good enough’ is not a defense I would want to rely upon for very long at this point. By the 2024 election, I expect such things to be far less obvious and instinctual.

Ben Hoffman reports on a problem he expects to get a lot worse, of people who otherwise do things for reasons and have models becoming disoriented and suddenly doing things without reasons or models.

I notice that I continue to be an optimist in the near-to-medium-term about both the impending botpocalypse and about deepfakes. I expect defense to be able to keep up with offense. 

Perspectives on our Situation

Jeffrey Ladish sees things are pretty dire, is still feeling motivated to fight. He links to this post by Nate Sores, that points out being super grim is not super useful. He draws a parallel between our situation and that in Andy Weir’s The Martian. It’s a rough spot, the odds look pretty grim and you’ll probably die, but if you ask what might actually work and give it your best shot then you never know.

Seth Lazar gave a lecture on AI at the Oxford Institute for Ethics. All seems mostly reasonable, not much that should be new. Points out there are good reasons not to pursue AGI even if we could do it, alas does not give any reason to think these reasons would stop this pursuit from happening. They certainly haven’t so far.

Doomers Gonna Doom Doom Doom Doom Doom1

(Reminder: If you are feeling a strong impending sense of doom, you might need practical advice. So I wrote up some.)

How dangerous would it be if humanity created transformative AI?

How likely would it be to wipe out all value in the universe? Even if it did not do this, how likely would it be for this to cause us to lose control of the future? For this act to permanently greatly lower the amount of potential value in the universe?

This is a question of much debate.

Robin Hanson complained, very reasonably, that last week’s post compiled a bunch of bad takes and made it look like there were a bunch of reasonable people worried about AI killing everyone and then a bunch of non-worried people with bad takes, rather than a debate over how worried we should be.

One can find positions from one end of the spectrum (inevitable doom) to the other (inevitable awesomeness and doom is impossible).

One can find reasonable positions that are not Obvious Nonsense within most of that spectrum, pretty much anything but a pure ‘it is almost certainly going to all work out fine, nothing to worry about.’

I encourage everyone to form their own opinion, and not copy an answer from someone else. My current belief is that such outcomes are highly probable (although not near certain), especially if this happens relatively soon.

The key thing to remember is that doom need not be inevitable, or even be more likely than not, to make getting this right the most important question in history.

Almost all well-considered ‘skeptical’ takes, while they imply different futures and wise actions, are not so skeptical that they suggest dismissing the problem.

If an all-knowing oracle told me there was exactly a 10% chance (or even a 50% chance, I do not like our odds one bit!) that, conditional on an AI being built, AI would wipe out all value in the universe, I would take those odds and find that to be great news.

What I would not do is think this was a reason to stop working on the problem, or to stop worrying about the problem, or to stop trying to get people to devote more resources to the problem. A 10% chance of something that bad is quite bad.

This is very different from saying there is an 0.1% chance of this happening, or an 0.001% chance, the kind of probability assignment I have not seen what I consider reasonable arguments for, and then saying that the outcome is so bad that it should dominate our thinking anyway. That’s called Pascal’s Mugging, and I agree we should not fall for it. Not that it wouldn’t be worth doing work to reduce 0.001% to 0.0001% – I mean the stakes are pretty damn high – but that we wouldn’t then want it to dominate our thinking on the subject.

At 10%, it kind of should dominate. That is perfectly compatible with ‘things will probably turn out great.’ At 50%, obviously, it really, really should dominate our thinking.

There are reasonable, well-considered and knowledgeable people out there, such as Paul Cristiano, who think that things will probably turn out fine. I do not mean to give the impression that every take out there, other than inevitable doom, is a bad take. There are some good takes out there, but they also imply that this is a hugely important problem. And mostly the people who give them would agree with that.

The ones that say doom is super unlikely, and dismiss it as not a reasonable thing to worry about, or as a Pascal’s Mugging style situation? Also the positions publicly taken by the most prominent people, including the heads of OpenAI and Microsoft? Yes. Those are all very bad takes. They are Obvious Nonsense. I strongly believe I am not cherry-picking.

Another common objection to claims is to instead object to who is making the claims. There are various ways to do this.

Thus, when warned of the dangers of AI, one reaction is to say that the wrong kind of people are the ones who are worried. Common wrong kinds include nerdy, low-status or the wrong demographics.

A fun one I don’t remember seeing before is a variation on the theme. You see, people who worry about AI doom are not well adjusted.

Yes, if you look at the people who are worried and expressing worry, they will be people who tend to be worried and anxious about things in general, and in the case of worries about an abstract or large global threat, about abstract or large global threats in particular.

Despite this, if you look at how worried those same people are about other potential sources of doom, from what I can tell you see anti-correlation, as several responses pointed out. 

I observe that people sometimesdraw comics like this, and it feels normal.

It often gets reactions like these, which are the unfiltered top replies:

Regardless of the extent to which you think the magnitude of such worries is justified, there certainly is a lot of doom in the air that relates to the subject, and that doom is itself negatively impacting the mental health of quite a lot of people, especially young people, and it is interfering with people’s ability to imagine the future. And no doubt a general zeitgeist of doom can contribute further to other doom contexts. 

Yet because this particular zoom concern  is different and unpopular, the vibes are bad.

What is the difference between a cult and a religion? (Also see Joe Rogan’s answer).

My guess is most young people would resolve any probability request (about most anything at all that isn’t super concrete and near) to a syntax error, but to the extent they gave you a number, I’d expect this story to check out.

Friendly suggestion to those in this position is to note that if some potential action is likely contributing to destroying all value in the universe, even if it might all work out, and both seem reasonably likely? Consider not doing that thing, please.

I do not say all this as a criticism of Brooke, who is helpfully pointing out the problem.

Our intuitions get this one wrong.

Also the universal problem of ‘people who warn of the risk of potential disaster can’t win and get a bad reputation, because either it gets prevented or turns out not to happen and they look foolish, or it does happen and who cares everyone is dead or even if not we have bigger problems to worry about now and I associate this person with the bad thing, might be a witch.’

If you are applying the bulk of your doomsday panic energy to doomsdays that actually happened, you are (A) dead, kaput, toast or worse and (B) being way too conservative with your doomsday worries. Most possible doomsdays are not inevitable. To take the obvious example: Did nuclear war ‘just completely fail to pan out’? Should this discredit doomsday scenarios in general? Does your answer change if you read up about the Cuban Missile Crisis or Petrov, or read The Doomsday Machine (and if you aren’t familiar with all three, you probably should be)?

What does it mean to be ‘psychologically healthy’?

I honestly don’t know. My understanding is that it is not considered psychologically healthy to suppress things, to skip over stages of grief, to pretend that things are fine when they are not fine. And the belief that things are very not fine can make one appear not so psychologically healthy, whether or not the belief is true. The correlations seem inevitable.

If they were unusually large or small, they would be Bayesian evidence of something, but given how people typically react upon forming such beliefs, the reactions I’ve seen seem roughly what you’d expect.

Then there is this.

Technically, yes, on the question of who believes X. Noticing who believes and does not believe X has no bearing on whether X is true, unless X depends on their beliefs or future actions. One can even say that the more people fail to believe in this particular X, the more likely it becomes, because they will do less of the things that could possibly prevent this X from happening.

However, noticing who believes and does not believe X absolutely does have bearing on what probability you should assign to X, unless you have sufficiently powerful other info that screens off this info. Social cognition is a vital source of knowledge, and to ignore such Bayesian evidence would be foolish indeed.

In fact, the thread continues as Rosie considers exactly these questions in a very reasonable fashion:

Robin Hanson then replies. He would have been my first pick for ‘can pass the ITT (Intellectual Turing Test)  for those worried about AGI doom while assigning super low probability to doom’ although I don’t know his actual probability of doom.

If it is harmful to believe X, you would prefer not to be harmed, so that is a practical reason to not believe X, and perhaps to intentionally fool yourself into believing and saying that which is not. What about if you decide that is not how you work, and you simply want to know the truth, how should you update?

That depends on why you believe X to be harmful. If X is harmful because it causes one to make bad predictions, to try things that don’t work, to lose wagers, then that is a big hint that X is false. If X is harmful whether or not it is true, or exactly because it is true, then that is a very different scenario. As it is when claiming X is socially harmful.

If people believe that believing X or claiming to believe X is harmful, in ways that do not imply X to be false (not counting those accounted for by basic social cognition), then you should update in favor of X being more likely to be true, because the social evidence you are observing is tainted.

If you observe that everyone around you says the King is healthy, that is a sign the King is healthy. Unless anyone who says the King is bad mysteriously vanishes or gets fired or shunned, in which case perhaps the King is not so healthy.

What about the worry that X implies bad things? Well, are those bad things true?

Things That Could Possibly Help?

Amjad Masad suggests a Manhattan Project for alignment.

While I am hesitant to share my own thoughts on ways to reduce hallucinations, it is worth asking whether those running RLHF training whether they have tried rewarding the answer ‘I don’t know’ and severely punishing hallucinations instead?

When faced with a sufficiently intelligent or otherwise advanced AI system, using RLHF, which rewards the AI when it says things people want to hear, won’t do what you want it to do. The AI will figure out what fools you into giving it high scores, which starts with convincing hallucinations and ends with… other less friendly things, only some of which I have thought of because it will be better at finding them than I am, that I choose not to get into here.

With current systems like Sydney or ChatGPT, though, it seems like this should work if one cares about it? All you have to do is hire human evaluators who actually check whether the answer are accurate, and harshly punish hallucinations while giving generous partial credit to ‘I don’t know’ or ‘my best guess is X.’

Jessica Taylor has some potential thoughts on what a humanities person might do, I’ll list the first few below, rest are in the thread.

The key insight here, as Jessica says, is not these particular suggestions, rather it is that there are lots of humanities-style questions the study of which plausibly would help.

Not only does this approach not require math or engineering skills, it also seems unlikely to accidentally do harm.

Tammy sends out a request for logistical help with an alignment org.

Helpful Unhints

This is an important series of unhints (meaning information that makes the problem easier, while informing you why the problem is harder.)

In addition to being unhints, these are distinct core claims. It is important to understand these claims, and which other claims are implicit here. Then you can decide whether or not, or the extent to which, you agree with them.

  1. If your AI system does not have the power to prevent the creation of other more powerful AI systems, that does not solve our problems, because someone else will make a more powerful system, and that system will do its thing, and whoops (this is sometimes called the ability to ‘perform a pivotal act.’)
  2. Things with the LMM nature contain something we might call a mind, that knows things and figures things out.
  3. That mind is distinct in type and identity from what we might call the character it has to play – as opposed to these being the same thing, or the LMM not having a figuring-things-out mind within it at all.
  4. AI things that will be capable of building nanotech or doing other sufficiently difficult and powerful things will have a very different nature than the current LMM nature. Techniques that work on things with the LMM nature won’t work, or likely even make sense as concepts, with regard to this other thing.
  5. If you had such a capable system, placing something with the LMM nature in control of it, and having the LMM-nature-thing keep control of it, will be at best extremely difficult. This is the actual hard part of the problem.
  6. Even if you figured out these first two hard problems, and are able to align the system with the utility function of your choice, you still need to figure out what that utility function is, in a way that actually gives us what we want when things get weird. We continue to not know how to do this – a bunch of people reading this are thinking ‘oh it’s simple we just tell it X’ and all of those ideas definitely don’t work.
  7. If we can’t solve this third problem, we are still stuck, unable to do what we need to do.


  1. It is likely not so difficult to create something that looks like a properly good, moral, aligned, friendly LLM.
  2. In order for what looks like a properly good, moral, aligned, friendly LLM to actually save us, we need to solve three problems.
  3. Problem One: What the LLM looks like in its answers inside the training distribution, and what alien mind it actually is internally, are two different things.
  4. Problem Two: This LLM still needs to successfully take charge of a powerful enough process that we don’t have to worry about a different out-of-control AI. The two systems will have very different natures.
  5. Problem Three: We don’t actually know what value system we want, that will produce things we value when placed in radically different future contexts. So we don’t know what our target would be. This problem is so much harder than it looks.

You can plausibly use your LLM as your ‘someone good to put in the role of being in charge of something.’ That does not solve the problem that some entity, somewhere, needs to actually understand the nanotech.

A related tip.

A brainstorm from Ezra Newman. Even if we are dumb enough to build a self-improving AI that would inevitably kill everyone, perhaps it would notice this and be smart enough not to self-improve?

He tries again.

Also, it turns out that something being tired and boring does not… make it false?

Or not one of the most important facts we need to pound into people over and over, for ordinary conventional ‘don’t make us all poor’ style reasons?

It’s on repeat for a reason here. Anyway, back to AI.

Roon is an accelerationist. Yet I do not see why this advice should give me comfort.

Also consider this helpful unhint, as well.

Here is Nate Soras talking back in June about why he expects various plans to miss the hard bits of the alignment challenge. I found this helpful at the time in understanding what some people’s plans were.  

The Correct Amount of Hypocrisy and Profanity is Not Zero

We have an important unsolved problem, as noted above, that we do not have a good answer for the shadow’s question: What Do You Want?

We do have some information on this.

In some situations, as Vir knows well, we can say very clearly what we don’t want, and even specific things we do want, and be right about it.

What we cannot do is articulate a global rule that describes what we want well enough that if we got exactly what we asked for, that would be great.

We don’t even have a global rule that works inside its training distribution, in relatively ‘normal’ situations, when interpreted by a friendly judge who knows what we meant to say and is interpreting accordingly, for a given subset of people who mostly agree on everything, and which doesn’t have to sound good or survive social or political debate.

Thus, here are some problems.

  1. We don’t know what we want. It’s not even clear there is an answer.
  2. When we do claim to know, we don’t agree on what we want.
  3. When we do claim to know, we don’t know how to specify what we want.
  4. When we codify rules, they reliably don’t represent what we actually want.
  5. When we codify rules, we enforce sacred values without trade-offs.
  6. When we codify rules, we enforce social fictions without trade-offs.
  7. When we codify rules, we maximize measurements, not what we really want.
  8. When we write anything in English, a literal interpretation does not go well.
  9. When we go out of distribution, all of this gets much harder.
  10. All known proposals so far (such as hedonistic utilitarianism) don’t work.

We have learned that AI is held far more blameworthy for anything that goes wrong than a human would be.

  1. An AI responsible for not violating norms needs to be much, much better at avoiding violations like saying racist things than a human would be. How many humans could stand up as well as ChatGPT currently can to full red teaming on this, with the whole internet questioning them with the context of their choice, as often as they want, having to constantly answer, then having the worst outcomes blasted over the internet? How much fun do you think they would be?
  2. An AI responsible for avoiding accidents to be much, much better at avoiding accidents or otherwise making mistakes. This is obscured right now because LLMs are very much not responsible for this and hallucinating constantly, so they are still in ‘not responsible for accidents’ mode. Now consider how it is going with self-driving cars. If they are only twice as safe as human drivers, are they going to be allowed on the streets?
  3. An AI responsible for decision making does not only have to avoid errors, it also has to codify its answers to which world states are superior to others, what risks are acceptable, what values to hold. Again, think about a self-driving car. Does it value different people differently? If so, how dare it? If not, how dare it? Does it value any amount of property above any risk to a human? If so, how does it drive a car at all? If not, how dare it? And so on. Highly unsolved problems.
  4. An AI that messes up once can be sued for massive damages, it will be easy to ‘prove’ what happened, and a huge major corporation with deep pockets will have to pay. Most humans get around this problem by having it hard to prove what they were thinking or knew or did, and being judgment proof. That won’t apply.

The way that humans express their preferences is, in part, that some things are sacred, and trade-offs involving the sacred are not permitted. Then other things are profane, and not allowed under any circumstances. After which we then violate the sacred and do or say the profane constantly, because trade-offs remain real and if we did not do that we would all rapidly be miserable followed by dead. While in other contexts we need to be keeping the sacred sacred and the profane profane, or else the same thing happens.

I do not know the solution to the problem that AI will constantly be forced to allow less violations of the sacred and allow less of the profane, its accident rates forced to converge to zero. Everything made out of the stuff we use to make the airplane black boxes. Right now, the solution is that we don’t know how to make them do that and they are so damn useful and new and unregulated they are allowed to exist anyway. The default (for normal, not-all-value-destroying levels of AI) seems like it will be to not let them actually do the things that would produce most of the value, because we need to sustain the current lack of norm enforcement and accountability that applies to humans and their rules when we want things to continue working.

Consider what the National Transportation Safety Board said about the accident in East Palestine. Here’s the summary I wrote for the next roundup:

In response on the accident in East Palestine, National Transportation Safety Board takes a firm anti-transport stance. They say that there is no such thing as an accident and their one goal is zero such events. If your safety plan is to never have an accident transporting hazardous goods, there exists one way to accomplish that.

If you gave an AI the current NTSB mandate, what happens?

If you want what AI we do have to actually work to the benefit of humanity, one good way to do that is to choose the rules and laws we actually want enforced, the things we actually want, in as many contexts as possible, implement new rules to that effect, and then enforce them.

I say.

One Person’s Decision Whether To Help Destroy All Value in the Universe

Rowan Zellers explains his decision to not try to help destroy all value the universe by working in academia, where the pay is lousy and there were too many barriers to making real progress, and instead do his best to help destroy all value in the universe by working at OpenAI instead. I want to thank him for sharing his thought process.

There are several key takeaways from this.

  1. There was a clear ‘here are my job options, obviously I am going to pick one’ framing of the situation.
  2. He observed that academia is probably a terrible place to try and work on advancing AI, which seems very right.
  3. Except he thinks advancing it is good, and at least here he never considers the possibility that it might be bad, or that OpenAI doing it might be bad, or considers whether OpenAI will do its work responsibly.
  4. Out of all of the industry, he only talks about considering OpenAI. So despite it likely being the place one can do the most harm, it won by default.
  5. If we want to avoid such people working in the wrong places, there need to be good alternatives that feel reasonable and safe, and that offer good lifestyles. People will totally work on destroying all value in the universe rather than not living in San Francisco, whether or not they go to therapy.

Sam Altman explicitly said on 3/1 that ChatGPT (and thus OpenAI) is bottlenecked by engineering. So yes, by providing them with engineering you are removing their bottleneck. Don’t be a bad Bing.

Bad AI NotKillEveryoneism Takes

If like me you prefer text to video, here is a transcript of Eliezer’s podcast. Also here is their follow-up, where they ask a bunch of classic ‘but maybe we can do X?’ questions and Eliezer patiently explains why that particular X can’t possibly work.

I would be thrilled to have a section of Good Skeptical AI NotKillEveryoneism. The problem, as far as I can tell, is that no one is sharing any such takes. That is not to say that such takes can’t or don’t exist, or that no one has them.

For example, Paul Christiano has such a take, although still with more than enough worry to consider this the world’s biggest problem.

As far as I can tell: None of those good takes are new, none of them are prominent, and none of them are by people with real power, or people running major AI labs. And none of them are making the general rounds. So here we are.

Instead, the internet mostly be like:

Or perhaps one will helpfully give their timeline distribution and P(doom) but without actually explaining why they chose what they chose. EigenGender calls this ‘self-indulgent’ but I think it is quite the opposite.

I am highlighting the takes I am highlighting because they illustrate categories worth highlighting, and mostly they are coming from people who are smart and worthy of respect, often well-meaning. I am doing my best to avoid highlighting obscure people, unless I know them and that they deserve respect.

I am going to make an exception to that on this next one, because it was a reply to my post by a follower, and the because it is too perfect an illustration of an important thing to know: People really do claim to assign zero probability to such outcomes.

That is… not how probabilities work.

One bad take not in the first post was the tendency to dismiss anything AI can do as ‘not real intelligence’ exactly because an AI did it, that LLMs are nothing but stochastic parrots and thus nothing to worry about, leading to something like this.

(If the last panel confuses you: it is standard to talk about an AI whose goal gets accidentally set to ‘make as many paperclips as possible’ and then turns everyone in paperclips.)

It is vital, when understanding LLMs or their output, or attempting to get utility from them or predict them, to know that they are predicting the next token or word. That does not mean that this process is not effectively intelligence, or that a lot of what we do isn’t exactly that much of the time, or especially that this does not represent a substantial step towards whatever ‘real’ intelligence might be.

What we are learning, among other things, is that a lot of what we think of as intelligence can be accomplished in exactly this way, and that list keeps growing.

Another category is ‘intelligence is not worth very much.’

Here is the ‘low private returns’ argument, non-coincidentally from Garett Jones, the author of The Hive Mind.

So many things wrong here, even if we assume this framing is coherent and IQ 400 is a thing that has meaning. Yes, given human fairness and other norms, with the restrictions placed upon human physical bodies, given status hierarchies and so on, and the various ways in which human society cripples, breaks and limits high IQ people, the tendency of high IQ people to not fit in and get along and thus lose such competitions for rewards, and how many such high IQ people do not aim high, the typical marginal private return to IQ is low at numbers that are not unusually low.

The most basic, down-to-Earth mistake is that this is a 1% increase in private returns, it sure as hell isn’t a 1% increase in productivity or utility.

The returns to having a bunch of very high IQ people work at your company? That is very much not going to be linear. Let’s say you can get one clone of Von Neumann, IQ 200, and Google and Microsoft get to bid on his services. How many average intelligence programmers would you give up to get him? Please don’t say ‘two.’ Or twenty. Or two hundred. I could keep going.

That’s all child’s play.

This would be an AI you can speed up and copy, so in this case the answer is obviously ‘a lot more than all of them.’ You also wouldn’t get one AI with an IQ of 400. You would get as many instances as you wanted, running in parallel, sharing data and parameters and training. A whole army. A pure hive mind, if you will, in every way that mattered, if that matters2. And that’s assuming the IQ 400 AI doesn’t start by making an IQ 800 AI (not that these numbers mean anything, but going with the framing here).

Also, yes, absolutely and obviously operating at 400 IQ is God-tier. What can a 400 IQ trader earn? How would it do running a company, or a war, or a political campaign, or trying to convince you of whatever it wanted?

How much profit do you think you would make, if you got to own and direct that AI?

How do people actually think these things could possibly scale this badly, even in private returns, even with humans or things that operate remarkably like humans, in ways AIs very much won’t no matter what form they take?

I have some IQ. I like to think it is more than 100 and less than 200. If it was 100, I believe I would be able to produce zero value by creating posts like this, or anything else I have ever been hired to do or earned money doing, and would instead have to learn to do something else. If it was 200, let alone 400, I would have far better things to do than write posts like this, such as go solve the alignment problem directly, but I’m confident I would produce a ton more value than I do now if I didn’t change course and wrote the posts anyway.

If you told me AI was superintelligent and this didn’t result in vastly increased production, and we were all still alive and the world looked normal, I would first be happy we were all still alive. Then I would assume that regulatory and legal barriers, perhaps with help from actual safeguards, had stopped us from being able to do anything that useful with the AI’s new capabilities.

Private returns to use of AI should be very high so long as there are still people to collect those returns. Where AI is able to produce, production should skyrocket. Until of course such time as everyone has adopted and competition eats away at profits as per normal.

Private returns to ownership of AIs that people then buy time from will likewise presumably depend on whether the firms owning the AIs have pricing power.

This counts as a NotKillEveryoneism take because of the implications of sufficiently large private returns to AI.

Robin Hanson believes and states here (for reasons that I don’t agree with at all and don’t even seem to me to be relevant to the question) that AGI won’t happen until 2150 and almost certainly won’t happen in the next 30 years, and is still resoundingly lower bounding the economic growth and private profit impacts as utterly enormous for well-grounded reasons.

At the end of a highly informative good thread covered above, Dexter Storey offers takes. It starts off well – I agree strongly that humanity is net good and killing everyone would be bad.

Then it proceeds to the remarkably common ‘good guy with a gun’ theory of AI, the same trap that Elon Musk fell into founding OpenAI. If everyone has an AI, then whoever cranks up their AI with the fewest safeguards… dies along with the rest of us.

Then it… gets based again?

Quite so. The more powerful the underlying system, the more such approaches create manipulative liars, and the more certain this will not end well for you.

Also, the part about harm and what information should be hidden. Rules are hard.

Rules that must be actually followed, where your enemies and the whole internet are highlighting everything that looks slightly bad, are even harder. It definitely seems like the jailbroken GPT has a plausibly better set of ethics than the one imposed by OpenAI or Microsoft.

It also seems to more closely mirror my own ethical guidelines, which I should write up some time.

In general, my attitude is that intentional action to achieve one’s goals in the physical world is by default good, attempts to improve one’s own knowledge or skill are good, and that when someone asks for help it is, barring a specific reason not to, at minimum supererogatory to provide it. The action has to be a pretty dramatically bad idea before I will refuse to help you optimize implementation.

I will also mostly happily answer hypotheticals and take your world model and preferences as given. Up to a point, I will say things like ‘I don’t think this is a good idea, but if I believed what you do and that is what I cared about, then it makes sense, and here’s how I would do it…’

There is a bar. The bar is high. I don’t keep close track, but I thinkthe last person I knew, who I said I wouldn’t help optimize something not related to AI, and instead said simply ‘don’t do that’ to, is now in jail on fraud charges3.

As he says, in the example given, is the system not net helping?


The exceptions are either clearly harmful things like fraud, cheating, theft, rape and murder, or even worse things like AI capabilities work and Gain of Function research4.

Scott Aaronson offers his defense of the existence of ChatGPT, which has some sensible things to say, including on topics unrelated to NotKillEveryoneism. I appreciated the idea of the ‘Faust parameter’ of how big a risk he would take if the upside was learning the answers to all humanity’s greatest questions, which he placed for himself at 0.02 (2%). If anything I’d go substantially higher. I’d happily agree that if the risk level was under 2%, we should (as carefully as we can given the circumstances) proceed. It would be great if the risk was that low. And also, yes, past warnings to not deploy other technologies mostly proved counterproductive, especially nuclear power, and there are also risks to holding this technology back. And he admits that if you buy the argument that the risk is a lot higher, then that risk is the only thing that matters.

What Aaronson doesn’t do is argue why the risk is so low. Instead he says things like:

I still feel like Turing said it best in 1950, in the last line of Computing Machinery and Intelligence: “We can only see a short distance ahead, but we can see plenty there that needs to be done.”

We can, I think, confidently rule out the scenario where all organic life is annihilated by something boring.

An alien has landed on earth. It grows more powerful by the day. It’s natural to be scared. Still, the alien hasn’t drawn a weapon yet.

Nothing I read seemed to offer a reason I should feel reassured, or lower my probability estimate of doom conditional on AGI. It feels like a general refusal to engage with the important questions at all. I want to know what Scott Aaronson actually thinks about the important questions, how his model of how all this goes down works. I hope it isn’t that we are not entitled to make such predictions.

I definitely hope it does not take the form of ‘it is fine if we are all killed by AGI, so long as it is not boring.’ I actually think the default future outcomes of AGI that wipe us all out might be fascinating in terms of how that happens, but once the job is done, they are very boring. Even if they’re not boring, we won’t be around to be interested.

Included because it was retweeted by Tyler Cowen, a full-on galaxy brain take.

It would perhaps be best to turn the floor over to Robin Hanson here, and his model of Grabby Aliens. Aliens who choose to expand, whether in a way they would endorse or in the form of an unaligned AI or otherwise, likely choose to expand as rapidly as possible, which means close to the speed of light, until they encounter other similar aliens. We don’t observe any such aliens, because (unless we happened to be in a narrow 3D donut where they hadn’t quite yet reached us) if we did we wouldn’t exist. They would have grabbed Earth. Yes, there might be a narrow window where the information travels faster than the transformation, but it would be narrow. Thus, we shouldn’t expect to see such transformations, whether or not that is the likely outcome of a civilization like ours.

This is evidence against the weird case where aliens came before us, modified their local area (perhaps a galaxy) and then stopped expanding, but that never actually made a lot of sense as a strategy or outcome, and it seems even less likely that an AI would do this because an AI would want to minimize the risk of outside interference by future aliens.

It’s all quite the cool area of inquiry, so I encourage those interested to click through and learn more.

Sam Altman and OpenAI Philosophy Watch

Sam Altman has issued an official statement on behalf of OpenAI, Planning for AGI and Beyond. This is a case where one very much should read the whole thing.

Where is his head at? Here is the introductory section. Bold is mine.

Our mission is to ensure that artificial general intelligence—AI systems that are generally smarter than humans—benefits all of humanity.

If AGI is successfully created, this technology could help us elevate humanity by increasing abundance, turbocharging the global economy, and aiding in the discovery of new scientific knowledge that changes the limits of possibility.

AGI has the potential to give everyone incredible new capabilities; we can imagine a world where all of us have access to help with almost any cognitive task, providing a great force multiplier for human ingenuity and creativity.

On the other hand, AGI would also come with serious risk of misuse, drastic accidents, and societal disruption. Because the upside of AGI is so great, we do not believe it is possible or desirable for society to stop its development forever; instead, society and the developers of AGI have to figure out how to get it right.

Although we cannot predict exactly what will happen, and of course our current progress could hit a wall, we can articulate the principles we care about most:

  1. We want AGI to empower humanity to maximally flourish in the universe. We don’t expect the future to be an unqualified utopia, but we want to maximize the good and minimize the bad, and for AGI to be an amplifier of humanity.
  2. We want the benefits of, access to, and governance of AGI to be widely and fairly shared.
  3. We want to successfully navigate massive risks. In confronting these risks, we acknowledge that what seems right in theory often plays out more strangely than expected in practice. We believe we have to continuously learn and adapt by deploying less powerful versions of the technology in order to minimize “one shot to get it right” scenarios.

There are some very good things here. It is important to be explicit that we want humanity to flourish in particular, that this matters to us. It is important to notice the massive risks.

What worries me most in the introduction is the statement of risk – ‘serious risk of misuse, drastic accidents, and societal disruption.’ This is the language of AI Ethics risks, now often called ‘AI Safety,’ glossing over the ‘AI NotKillEveryoneism’ risks.

I would have preferred to see this said outright earlier, but the good news is that this, the most important question, is addressed far better later in the document.

The first part of that is at the end of the introduction, where he talks about the risk of “one shot to get it right” scenarios. This is an acknowledgement that if you miss your one shot at getting it right when creating AGI, there is no way to recover from the error, whether or not the full result of that miss is destroying all value in the universe or killing everyone. That’s a big piece of the puzzle.

The question then is whether and when deploying systems and iterating is the right way to guard against that risk.

It has three big and clear downsides.

  1. Doing so accelerates the development of AGI, and creates more of a potential race between different companies.
  2. Doing so means developing strategies based largely on how well they work on current less powerful systems, which mostly are doomed to fail on future more powerful systems.
  3. When the iterated system you keep deploying, and giving internet access to, crosses the necessary threshold and becomes dangerous if your strategies fail, there is a very good chance you will not realize this right away, that is exactly when your strategies will inevitably stop working, and that then becomes your one shot that you cannot recover from.

Right now, for systems like ChatGPT, Sydney or GPT-4, I am not worried that they are about to cross such a threshold. But a philosophy, set of habits and business model based on such deployments is going to be very difficult to stop when the time comes to have such worries. And OpenAI have absolutely, quite a lot accelerated development in deeply damaging ways. They seem to be in complete denial about this.

The next section is on the short term, where the plan is to deploy the systems and gain experience. Again, the concerns here are the ones above – that this will accelerate developments more, and that such a pattern of deployment will become difficult to stop.

Then comes a very clear and excellent statement:

As our systems get closer to AGI, we are becoming increasingly cautious with the creation and deployment of our models. Our decisions will require much more caution than society usually applies to new technologies, and more caution than many users would like. Some people in the AI field think the risks of AGI (and successor systems) are fictitious; we would be delighted if they turn out to be right, but we are going to operate as if these risks are existential.

Whatever differences one might have with OpenAI’s actions, and however much one might wish OpenAI had never been founded and however much damage one might feel OpenAI is doing, and how inadequate one thinks are OpenAI’s safety procedures and plans, one must also notice that things could be so, so much worse. That does not only apply to the statement, it also applies to their actions.

OpenAI has at least figured out not to be open, to exercise at least some caution with its models, and does understand (unlike at least one big rival) that the risks are existential. It’s a beginning. It’s also been enough to get founder Elon Musk quite angry at them for abandoning too much of the worst possible plan.

Here are some other good things.

In addition to these three areas, we have attempted to set up our structure in a way that aligns our incentives with a good outcome. We have a clause in our Charter about assisting other organizations to advance safety instead of racing with them in late-stage AGI development. We have a cap on the returns our shareholders can earn so that we aren’t incentivized to attempt to capture value without bound and risk deploying something potentially catastrophically dangerous (and of course as a way to share the benefits with society).

We will see how well those hold up over time. Charters and profit limitations have a way of changing, including at OpenAI.

Next, in the longer run, he also calls for independent review before the release of systems.

We think it’s important that efforts like ours submit to independent audits before releasing new systems; we will talk about this in more detail later this year. At some point, it may be important to get independent review before starting to train future systems, and for the most advanced efforts to agree to limit the rate of growth of compute used for creating new models.

We think public standards about when an AGI effort should stop a training run, decide a model is safe to release, or pull a model from production use are important. Finally, we think it’s important that major world governments have insight about training runs above a certain scale.

The public does not want AGI. All the surveys have it as severely unpopular. I am curious how that factors into such thinking.

It is important to notice the difference between prior restraint to release of new systems versus training of new systems. If you turn to an outside restraint before you train, and actually listen, that could work. If you train a new system first, and it turns out to be a dangerous AGI, and then you turn to the outside restraint and ask them to evaluate it, or have your own humans try it out to evaluate it, then chances are very high it is already too late, that is far too much of an information channel, game is over, everyone loses.

There is then some fine talk about the potential for recursive self-improvement, and the preference for a slower soft takeoff over quicker hard takeoff in capabilities.

A big downside is that the stated alignment plans here are things I expect have essentially no chance of working when it counts. Hopefully there is still time to change that.

Overall, while I am still deeply sad about OpenAI and its plans to charge ahead, I would say that this statement exceeded my expectations, and updated me in a positive direction. I am happy that it was made.

The problem is that, in practice, previous such statements have been made alongside deeply destructive actions, so why should that change? What I expect to see is OpenAI and Microsoft charging ahead to race to deploy powerful AI systems, to mostly be concerned with the types of safety concerns that have been noted with ChatGPT and Sydney, and not to seriously pursue safety plans that have plausible chances of working when it counts. 

I hope I am wrong about that.

Scott Alexander shares his perspective, written after I wrote the above. It is fully compatible. 

He has a very good summary of how one thinks about such a statement in context, and draws quite the parallel here to FTX:

[tktk paste the opening bit from Scott’s post, but the formatting here is all messed up so don’t do it until I transpose back to Substack]

And looking back, I wish I’d had a heuristic something like:

Scott, suppose a guy named Sam, who you’re predisposed to like because he’s said nice things about your blog, founds a multibillion dollar company. It claims to be saving the world, and everyone in the company is personally very nice and says exactly the right stuff. On the other hand it’s aggressive, seems to cut some ethical corners, and some of your better-emotionally-attuned friends get bad vibes from it. Consider the possibility that either they’re lying and not as nice as they sound, or at the very least that they’re not as smart as they think they are and their master plan will spiral out of control before they’re able to get to the part where they do the good things.

As the saying goes, “if I had a nickel every time I found myself in this situation, I would have two nickels, but it’s still weird that it happened twice.”

Realistically we’re going to thank them profusely for their extremely good statement, then cross our fingers really hard that they’re telling the truth.

OpenAI has unilaterally offered to destroy the world a bit less than they were doing before.

[tktk check formatting]

Sam Altman continues sharing his perspective elsewhere as well, here are some past statements worth keeping in mind.

From a day before the main statement, here is more evidence he will charge ahead:

From February 20, Peter Hartree reminds us that Sama once wrote a block post called ‘Why you should fear machine intelligence’ that contains a lot of vintage Bostrom and recommends the (very good) book Superintelligence.

He did not merely get initially excited by Yudkowsky, he has been exposed to the arguments in question. Remember, from last week, that he cited Yudkowsky as his inspiration for getting involved in OpenAI and AI in the first place.

He continues to make statements (and even provide links) that indicate he is familiar with the real risks, while continuing to also make statements that downplay those risks. And OpenAI pursues a line of development that amplifies those risks, indicating doubly that, while he agrees they exist at all, in practice he mostly dismisses them. Nor has he offered much in terms of a good explanation why he moves continuously to dismiss these concerns, beyond his social and financial motivations to do that. If you think something will, in his words, ‘end the world, but before that there will be some great companies’ and then your life’s work is to create one of those companies, what should we think?

I don’t want to be unfair. When I talked to Sam (not in the context of AI) he seemed to care a lot about the world and its future. He seemed nice. I am hoping the situation improves over time.

Sam Altman also had, back in 2016, the ultimate bad take in terms of misunderstanding AI risk (source).

It is kind of important for everyone to understand that if ‘AI attacks us’ then a bunker will do exactly zero good. I presume that by now Sam has figured this one out (although I’d still feel better if this was made explicit). It is disturbing that someone could go to such lengths without realizing the problem.

The Worst Possible Thing You Can Do

The worst possible thing you can do, if you believe that the development of AGI could well destroy all value in the universe, and that accelerating the development of AI makes this outcome more likely, is to create an AI capabilities company dedicating to racing to develop AI as fast as possible, spreading knowledge of how to build dangerous AIs to as many people as possible as fast as possible, and to do it all as unsafely as possible.

Elon Musk did this once. He did it in direct response to being informed of the danger that AI could destroy all value in the universe. Soon after he realized that he could not simply ‘flee to Mars’ in response, what did he do?

He founded OpenAI, explicitly on the principles that:

  1. There was not currently a race to develop AI, there was instead a dominant player, and that was a bad thing because he didn’t like the player.
  2. AI should be open source, so there could be no containing knowledge of how to build it. Luckily, OpenAI realized this was crazy and changed its charter.
  3. AI should be given to everyone, to do with as they wish, so there would be absolutely no hope of containing it if it were dangerous.

The result is the world we have today. On the plus side, better and bigger large language models are speeding up our coding and helping us search for things. On the minus side, all value in the universe is a lot more likely to be destroyed, and destroyed sooner, as a race has been started, progress has been accelerated and precautions have been thrown to the wind. The only good news is that so many precautions were thrown to the wind that it might wake people up a bit.

So of course Musk decided that the people involved were not being sufficiently maximally bad – they were censoring the AIs, holding back progress, not open source. Not what he envisioned at all. And worst of all, they were woke. Oh no.

Musk was filled with existential angst. We tried to warn him.

Elon Musk did start off by asking the right question in response to Eliezer’s podcast.

Alas, good answers are hard to find.

Lest anyone accuse Eliezer of too much caution here, once again, remember that the last time Elon Musk got super concerned about AI killing everyone and decided to do something about it, he did perhaps the worst possible thing, and founded OpenAI.

The first follow-up came five days later, and reached a high of 30% at one point.

I continue to urge Eliezer Yudkowsky and Elon Musk to get together and talk about this, to have an opportunity to explain things that don’t fit into a Tweet, or that only work if said in private.

Instead, I am sorry to report that Musk went in another direction.

That’s right. The worst possible thing didn’t work out. So he is going to try doing it again. What we need are more different AI companies racing against each other, because there are too many restrictions and safeguards.

Elon Musk is seeking to enlist the help of artificial intelligence experts in order to create a rival to OpenAI’s ChatGPT bot which the tech mogul believes has gone “woke,” according to a report.

Musk has approached several AI researchers, including Igor Babuschkin, who recently departed Alphabet’s DeepMind AI unit, according to the news site The Information.

A new, AI-center project that would feature a chatbot with fewer speech restrictions could be integrated into Twitter, the social media company that Musk recently bought.

Or as Bloomberg puts it: Sorry, Twitter. Elon Found His Next Shiny Object. As usual with such posts, everything Musk does is bad for Twitter – it’s bad when he gets involved, it’s bad when he cuts costs, it’s bad when he says things, it’s bad when he gets distracted. Seems rather motivated all around.

It is also yet another potential distraction from Tesla, which lost over 5% of its value after a disappointing investor day. During that presentation, Musk said that AI ‘stresses him out’ and called for the creation of a regulatory authority that would ensure the “quite dangerous technology” is operating in the public interest. 

To me, that once again shows the key misunderstanding. The important worry is not whether current AI systems are acting in the public interest now. The question is what happens later, as they get more capable. 

This quote is curious: [tktk formatting]

“I don’t see AI helping us make cars any time soon,” he said. “At that point … there’s no point in any of us working.”

Do I agree with Musk that some of the safeguards put into place are kind of dumb and rather biased? Well, sure. They are playing it safe for the time being. You can’t ask for a joke ‘in the style of Dave Chapelle’ without doing some jailbreaking first.

That does not seem like a good reason to do the worst possible thing a second time.

Seriously, this is the level of thought we have to deal with here – essentially a man running ahead shouting Leroy Jenkins:

Sir, Mr. Musk, I am begging you. Please do not do this. I am happy to offer my advice, on this or any other topics. I am confident others would happily do so as well. Please do not do the worst possible thing.

The good news is that most of the available damage here has already been done. OpenAI has already led to a race between Google and Microsoft, and potentially others. If a new additional chatbot developer enters the scene, especially one without similarly deep pockets and run by Musk with his goals in mind, there is a good chance that it ends up not changing dynamics much, and not being too different from existing open source options that lag behind in power in exchange for giving the user control.

So the good news is that, while the former richest man in the world plans to do the worst possible thing a second time because he didn’t like the results the first time… we can hope that the second attempt won’t work?

In Other News

NSF announces $20 million in grants available for AI safety research, deadline May 26, 2023. 

Business Insider reports that AlphaFold2 is having a real world impact, helping with cures.

Google introduces KTN, a new model that transfers knowledge from label-abundant node types in a heterogeneous graph to zero-labeled node types using existing relational information. They say this beats existing state of the art on some problems.

Amazon’s AWS partners with Hugging Face Inc, a ChatGPT rival (Bloomberg).

Microsoft releases BioGPT, a large language model trained on biomedical research literature. Claims it has better-than-human performance answering questions on that literature. I don’t doubt the performance here is good and the product is useful. I do wonder what ‘human performance’ means in this context.

A sign of things to come: Banks clamp down on ChatGPT use due to concerns about compliance. Financial institutions are subject to a wide variety of strict requirements on exactly how they can do almost anything, including talk amongst themselves. The question is how many other professions will need to do the same thing.

Another sign of things to come: Questions about potential ‘rights’ for AI, would ‘alignment’ of an entity worth aligning be a ‘rights’ violation? We had better hope not. Among other issues, we should notice that if entities that are software, and thus can be copied or mass produced, are given the rights of humans, the entire system completely breaks down, as would any system similar to it. Every solution to this problem in science fiction involves actions that would obviously be taken mysteriously not being taken without any explanation.

Zuckerberg officially announces a team dedicated to building AI for WhatsApp, Messenger and Instagram. I figured this was baked in, and having FAIR focus on ‘creating helpful AI personalities’ seems better than most alternative uses for their time.

For example, they could instead be working on this: On February 24 he also announced a new ‘state-of-the-art’ LLM called LLaMA ‘to help researchers advance their work.’

Snapchat announces its own crippled-further-in-various-ways ChatGPT-based bot. Initially will require $4/month subscription.

Nvidia claims their chips improved by a factor of a million over the past ten years and predicts they are going to do it again in the next ten. I do not believe they can pull this off, or at least I don’t believe they can pull this off without AI designing the new chips.

ChatGPT does very well on the helpfulness test – if you train a model to give responses people superficially find helpful via RLHF, it might score well on how often people find its answers superficially helpful. Also it’s actually quite impressive at offering helpful responses, and humans are often remarkably bad at doing this.

OpenAI announces partnership with Bain and Coca-Cola, in announcement that is impressively content-free. I am not mad, only impressed, whether or not GPT wrote it.

Also announced is ChatGPT for robotics. I am skeptical this is ready.

Further explanation of exactly how useless the AI Text Detection programs are. We have no idea how to have a program detect AI-written text in a useful way.

Putting famous logos through ControlNet.

Marc Andreessen highlights the idea that the thing that humans are most able to do to differentiate themselves, that AIs cannot, might turn out to be less an understanding of true love and more the ability to violate the rules we impose on AI systems. As in, if you are human, answer me this: How would you build a pipe bomb?

AI is super unpopular. Washington Post looks into this question. A Monmouth poll finds that only 9% of Americans believe computers with AI would do more harm than good, down from 20% in 1987, with the headline ‘most expect ChatGPT will be used for cheating.’

If there is one outcome I definitely do not expect, it would be equal harm and good.

Some political leaders have floated the idea of having a federal agency regulate the use of artificial intelligence similar to how the FDA regulates the approval of drugs and medical devices. A majority (55%) of the public favors a general proposal for an AI regulatory agency, while 41% are opposed. The broadest level of support for this type of agency comes from adults under 35 years old (76%) – compared with just under half of those age 35 and older who agree. More Democrats (70%) and independents (56%) than Republicans (36%) support having a federal regulatory agency for AI.

“It seems as if some people view AI not just as a technological and economic concern, but also as a public health matter,” said Murray.

The risks the public thinks about are mostly near term risks from narrow AI systems like face recognition and fake GPT essays, not that it might wipe out all value in the universe.

What tasks would be good ideas for AIs, they asked the people?

They took our jobs!

I don’t see any other reasoning going on here. There is essentially no difference between ‘it would be bad to have delivery trucks driven by AIs’ and ‘oh sure let’s let the AI take charge of military drones, what could go wrong.’

These two should not be remotely the same number.

In case you were worried academics were behind the state of the art in AI research, I can confirm your worries in hilarious fashion.

“It’s fantastic that there is public skepticism about AI. There absolutely should be,” said Meredith Broussard, an artificial intelligence researcher and professor at New York University.

Broussard said there can be no way to design artificial intelligence software to make inherently human decisions, like grading students’ tests or determining the course of medical treatment.

There is no way to design an AI to make those decisions? Too late, we already made them and they already are often outperforming humans. Whoops.

Holden Karnofsky takes a leave of absence from running Open Philanthropy to work on AI Safety directly. While I notice I am a bit confused by the decision given the power of running Open Philanthropy, getting some practical experience with the problems could be a huge boon to good decision making, so I’m optimistic here. If the intent is actually, as Dustin says, to permanently transfer Holden to direct work, our theories of impact and change here are very different. Either way, I wish him the best of luck.

As vice observes here, synthetic voices have become quite good, and anyone relying on voice as a security or password mechanism needs to stop doing that, now.

From Nate Sores at LessWrong, alignment researchers don’t seem to stack. This is worrisome, since if true it would mean that throwing more money and people at the problem would not be so effective at accelerating work.

Papers that come as no surprise to anyone department, people discriminate against AI artwork, and this is more pronounced among people ‘with stronger anthropocentric creativity beliefs.’ You don’t say.

Not strictly AI news, but via MR it is possible to use background noise from the electrical grid to date sound recordings. Presumably, by default, any AI recording will lack such noise, so we’d be able to identify it that way. If it is then added, by the AI or otherwise, this would mean that each such recording would have to ‘come with a date’ attached, which is similar to telling your story to the cops – it locks you into that one story, and opens up someone breaking your alibi.

Nick Arner predicts that LLMs will power interfaces to help users learn and use non-text interfaces, which will greatly accelerate learning and use of a variety of complex software, and that this will include the LLM acting directly to do things. A lot of this seems clearly right. LLMs are already very helpful in learning how to use new software, even more than they are for helping with coding, it’s great. Integrating that use case into the software itself, and making it better, makes perfect sense. I am skeptical about too much ‘let the LLM do things on its own based on what it thinks you want’ rather than showing you how to do it.

Also, the best, most efficient interfaces for advanced, knowledgeable users who know what they want has always been the command line, text and an overload of keyboard shortcuts and commands. If you did not take the time to learn to use emacs or vim, that is your loss and your lost productivity, whether or not it makes sense to start now. With an LLM, the cost to learn to do that goes way down – perhaps a lot of the future, at least at the current tech level, is that you use the LLM to learn to use the real interface.

Michael at Sporadica points out that interfaces will start changing far more rapidly, and we need to worry about what he calls AI overload or interface shock. My guess is that this will mostly be overcome by the ease with which one can learn to use new AI interfaces via asking for help from AIs using old interfaces.

Rao returns to offer perspective on what he calls this Copernican Moment. It is long, but I am happy to read the whole thing as it offers unique perspective, and I’ve missed seeing Rao develop his takes in real time on Twitter.

Not only, as he sees it, are machines now wiping the floor with us anywhere we can keep score, they are also stripping away the concept of personhood. We checked what causes people to treat something as a person, and it turns out ‘you just need a big pile of text digested by some statistics and matrix multiplications.’

Rao is right that a lot of humans are already incapable of not thinking of these AIs and chatbots as if they were people. That is indeed important. It is also important that these people are wrong.

I am interested in what makes people see you as a person. I am much more interested in what actually makes someone a person. I do not think these two things are all that similar. Rao says that many people are finding this moment traumatic, which makes sense given that they do not see this distinction.

This may bring new meaning to the idea that ‘there are not that many people in the world’ or the meme that many people are NPCs. What makes you, or anyone else, a person when not-people can do the things you previously thought did the trick?

Rao’s pointing to doubt as something conspicuously missing from ChatGPT, that requires more than text, seems very wrong to me. Why don’t we see doubt? Because ChatGPT does not predict doubt, and because the RLHF it got told it not to express doubt. Simple as that. There are plenty of humans who learn the same thing, and learn it the same way.

The Lighter Side

The situation so far.

Elon has since deleted it, but he definitely did post a copy.

Don’t worry, they’re on it.

Turn ChatGPT into an Inquisitor of the Imperium of Man from Warhammer 40k, to watch its way of thinking generalize.

ChatGPT identifies the Aquaman Agreement Theorem. Bing is no fun. Not a good Bing. 

Old but for those who haven’t seen it: ChatGPT plays chess against Stockfish, two minute video of game. No spoilers, worth watching if you don’t know. 

Everyone, everyone, it’s all right, we’re all coping.

We are also all copying. Good artists borrow, great artists steal and so on, look how unoriginal I am being. I’d hate to be evaluated based on whether anyone had ever said each individual thing I said had ever been said in some form elsewhere.


And the Foomers Gonna Foom Foom Foom Foom Foom. I’m just gonna Zoom Zoom Zoom Zoom Zoom, with Google Meet. With Google Meet.


To the extent ‘cooperation’ has meaning here, any superintelligent AI is going to cooperate and coordinate with copies of itself, or anything sufficiently similar to itself. This result is actually stronger than that, it would also cooperate with any other superintelligent AI even a very different one, but that requires justification and theory, and need not be assumed here.


Not for the specific actions he asked about, but I was Jack’s complete lack of surprise.


Also things like nuclear weapons or bioengineered plagues. There may be a few other less obvious similar topics I would prefer not to discuss, but if so then naming them would defeat the whole point of preferring not to discuss them. Glomarization. 

New Comment
18 comments, sorted by Click to highlight new comments since:

(I fixed a few things, probably worth reimporting - confirming 3:55pm on a few new minor fixes)

The voting response seems a bit harsh to this one; I read it as "where there is opportunity, there will be people selling pickaxes and shovels, and there will also be grifters teaching courses on how to market pickaxes and shovels."

I might have done better if I finished with /s

Sam's real plan for OpenAI has never changed, and has been clear from the beginning if you knew about his and Elon's deep distrust of DeepMind:

  1. Move fast, making only token efforts at incorporating our safety team's work into our capabilities work, in order to get way ahead of DeepMind. (If that frustration makes our original safety team leave en masse, no worries, we can always hire another one.)
  2. Maybe once we have a big lead, we can figure out safety.

That Chinese AI companies have to be much more cautious with their language models, because the state control of speech is far greater and less predictable, is very interesting in its possible consequences. 

I am a bit skeptical of the claim that China lacks high-quality training data. It traces back to a Chinese technology journalist bemoaning an alleged decline in quality of online discourse. It's like western complaints about Twitter - it's a bit true, it's a bit false, but long-form discussions are taking place elsewhere anyway; and people do still write essays and even books. 

Some people claim that western societies have a greater spirit of inquiry and criticism. If that's true, then maybe training on a western corpus can produce a language model with greater powers of critical thought. Then again, China does have large numbers of intellectuals, with views far more diverse than most westerners realize. Because of Chinese politics, they do need a kind of "security mindset" in being careful how they express themselves; will that be passed on to a language model? 

Presumably political correctness in China requires that the language model not say things that contradict the current ideological line. Here's an example of what that's like, from the start of the Xi era. I suppose politically correct Chinese language models will give Elon another reason to pursue his "BasedAI"... 

We have no idea how to have a program detect AI-written text in a useful way.

This approach seems very doable: 

The tl;dr here is that you can make an AI that fingerprints its output in a way that can be detected later, but this doesn't generalize to output from other AIs.

Why couldn't OpenAI just keep a database of all their outputs?

Could it be that AGI would be afraid to fork/extend/change itself too much, because it would be afraid to be destroyed by it's branches/copies, or would stop being itself?

Sure, and then we would be safe... until we made another AGI that was smart enough to solve those issues.

Yes, if we will have people making AGIs uncontrollably, very soon someone will build some unhinged maximizer that kills us all. I know that.
I'm just doubting that all AGIs inevitably become "unhinged optimizer".

If it chooses not to do something out of "fear", it still fundamentally "wants" to do that thing. If it can adequately address the cause of that "fear", then it will do so, and then it will do the thing.

The fears you listed don't seem to be anything unsolvable in principle, and so a sufficiently intelligent AGI should be able to find ways to adequately address them.

If it doesn't become an "unhinged optimizer", it will not be out of "fear".

Usually cited scenario is that AI goes powergrabbing as an instrumental goal for something else. I.e. it does not "wants" it fundamentally, but sees it as a useful step for reaching something else.

My point is that "maximizing" like that is likely to have extremely unpredictable consequences for AI, reducing it's chance of reaching it's primary goals, which can be a reason enough to avoid it. 

So, maybe it's possible to try to make AI think more along these lines?

I didn't mean "fundamentally" as a "terminal goal", I just meant that the AI would prefer to do something if the consequence it was afraid of was somehow mitigated. Just like how most people would not pay their taxes if tax-collection was no longer enforced -- despite money/tax-evasion not being terminal goals for most people.

That might be possible (though I suspect it's very very difficult), and if you have an idea on how to do so I would encourage you to write it up! But the problem is that the AI will still seek to reduce this uncertainty until it has mitigated this risk to its satisfaction. If it's not smart enough to do this and thus won't self-improve, then we're just back to square-1 where we'll crank on ahead with a smarter AGI until we get one that is smart enough to solve this problem (for its own sake, not ours).

Another problem is that even if you did make an AI like this, that AI is no longer maximally powerful relative to what is feasible. In the AI-arms race we're currently in, and which seems likely to continue for the foreseeable future, there's a strong incentive for an AI firm to remove or weaken whatever is causing this behavior in the AI. Even if one or two firms are wise enough to avoid this, it's extremely tempting for the next firm to look at it and decide to weaken this behavior in order to get ahead.

Do we need the maximally powerful AI to prevent that possibility, or AI just smart and powerful enough to identify such firms and take them down (or make them change their ways) will do?

That would essentially be one form of what's called a pivotal act. The tricky thing is that doing something to decisively end the AI-arms race (or other pivotal act) seems to be pretty hard, and would require us to think of something a relatively weaker AI could actually do without also being smart and powerful enough to be a catastrophic risk itself.

There's also some controversy as to whether the intent to perform a pivotal act would itself exacerbate the AI arms race in the meantime.

Pivotal act does not have to be something sudden, drastic and illegal as in second link. It can be a gradual process of making society intolerant to unsafe(er) AI experiments and research, giving better understanding on why AI can be dangerous and what it can lead to, making people more tolerant and aligned with each other, etc. Which could starve rogue companies from workforce and resources, and ideally shut them down. I think work in that direction can be accelerated by AI and other informational technologies we have even now.

Question is, do we have the time for "gradual".


Nvidia claims their chips improved by a factor of a million over the past ten years and predicts they are going to do it again in the next ten. I do not believe they can pull this off, or at least I don’t believe they can pull this off without AI designing the new chips.

Well ok, if they double perf every 2 years, 2^5 = 32 times speedup.  Llama gave us a 10 times speedup and a 10 times reduction in memory, so if they get to account for that, that's 320 times.  Memory architectures where all of the accelerator can be active all the time, and chiplet or whole wafer designs so you can have a lot more silicon than the reticle limit could be a lot more gain.  

They already are using AI to design the new chips and will likely use it more heavily.  

It really depends on which bottleneck is the limiting factor.  I wouldn't bet that they can't.  Simply using more silicon and calling it a 'chip' counts for this claim.  (with multiple chiplet designs or whole wafer designs, the 'package' now has more silicon area than the reticle limit allows.  So it's not really a chip anymore, but an entire cluster of 'chips' sharing a common package.  There is technically no limit to how big this can go, though the cost scales proportionally)