After Miles Brundage left to do non-profit work, OpenAI disbanded the “AGI Readiness” team he had been leading, after previously disbanding the Superalignment team and reassigning the head of the preparedness team. I do worry both about what this implies, and that Miles Brundage may have made a mistake leaving given his position.
Do we know that this is the causal story? I think "OpenAI decided to disempower/disband/ignore Readiness, and so Miles left" is the likely one.
In the context of Roon's thread: while I agree that basically everyone but Gwern was surprised scaling worked this well, and I agree that Ethan Caraballo's statement about takeover having been disproved is wrong, I don't think it can all be chalked up to mundane harm being lower than expected. I think there's a deeper reason, one that generalizes fairly far, for why LW mispredicted the mundane harms of GPT-2 through GPT-4:
Specifically, people didn't realize that constraints on instrumental convergence were necessary for capabilities to work out, and assumed far more unconstrained instrumentally convergent AIs could actually work out.
In one sense, instrumental convergence is highly incentivized for a lot of tasks, and I do think LW was correct to note instrumental convergence is quite valuable for AI capabilities.
But where I think people went wrong was in assuming that very unconstrained instrumental convergence (of the kind where humans are very motivated to take power) was a useful default case. The way humans acquired instrumental convergence that is uncontrollable from a chimp's perspective was so inefficient, and took so long, that the very sparse RL humans underwent is unlikely to be replicated. People want capabilities faster, and that requires putting more constraints on instrumental convergence in order for useful capabilities, like DRL, to emerge.
(One reason I hate the chimpanzee/gorilla/orangutan-human analogy for AI safety is that they didn't design our datasets, or even try to control the first humans in any way, so there's a huge alignment-relevant disanalogy right there.)
Another way to state this is that capabilities and alignment turned out not to have as hard or as porous of a boundary as people thought, because instrumental convergence had to be constrained anyway to make it work.
Of course, predictors like GPT are the extreme end of constraining instrumental convergence, where they can't go very far beyond modeling the next token, and still produce amazing results like world models.
But for most practical purposes, the takeaway is that AIs will likely always be more constrained in instrumental convergence than humans in the early part of training, for capabilities and alignment reasons, and the human case is not the median case for controllability, but the far off outlier case.
OpenAI advanced voice mode available in the EU. I still haven’t found any reason to actually use voice mode, and I don’t feel I understand why people like the modality, even if the implementation is good. You can’t craft good prompts with audio.
I agree that I find it totally uninteresting and "weak" in English, in the sense that I prefer prompting in text for precision in both prompts and responses. It is very good for learning languages, though, along with Gemini Live mode: it works excellently for comprehensible input, voice practice and general smalltalk in a new language. If you wanted to pick up French, for example, you would find it way easier nowadays; otherwise it still needs scale and the next generation.
GPT-4 is insufficiently capable, even if it were given an agent structure, memory and goal set to match, to pull off a treacherous turn. The whole point of the treacherous turn argument is that the AI will wait until it can win to turn against you, and until then play along.
I don't get why actual ability matters. It's sufficiently capable to pull it off in some simulated environments. Are you claiming that we can't deceive GPT-4, and that it is actually waiting and playing along just because it can't really win?
Following up on the Biden Executive Order on AI, the White House has now issued an extensive memo outlining its AI strategy. The main focus is on government adaptation and encouraging innovation and competitiveness, but there’s also sections on safety and international governance. Who knows if a week or two from now, after the election, we will expect any of that to get a chance to be meaningfully applied. If AI is your big issue and you don’t know who to support, this is as detailed a policy statement as you’re going to get.
We also have word of a new draft AI regulatory bill out of Texas, along with similar bills moving forward in several other states. It’s a bad bill, sir. It focuses on use cases, taking an EU-style approach to imposing requirements on those doing ‘high-risk’ things, and would likely do major damage to the upsides of AI while if anything making the important downsides worse. If we want to redirect our regulatory fate away from this dark path in the wake of the veto of SB 1047, we need to act soon.
There were also various other stories, many of which involved OpenAI as they often do. There was a report of a model called ‘Orion’ in December but Altman denies it. They’re helping transcribe lots of medical records, and experiencing technical difficulties. They disbanded their AGI readiness team. They’re expanding advance voice mode. And so on.
And as always, there’s plenty more.
Table of Contents
Language Models Offer Mundane Utility
Want your own Claude AI agent? Here's a step-by-step guide. It very wisely starts with 'set up Docker so the agent is contained.' Then you get your API key and run the guide's setup command on the command line.
That’s it. Congratulations, and have fun. I’m sure nothing will go wrong.
Am I tempted? Definitely. But I think I’ll wait a bit, ya know?
Reports from this AI-assisted coding class.
Sully reports Google API support is available any time, day or night. Right now Google's larger models seem substantially behind, although reports say Gemini Flash is pretty great. Gemini 2.0 is presumably coming.
Flowers asks o1-preview what micro habits it would adopt if it were human, gets a list of 35, many of which are new to me. Here's a NotebookLM podcast about the list, which seems like the peak utility of a NotebookLM podcast and also illustrates how inefficient that mode is for transmitting information? I asked o1-preview to explain each item. A lot of them seem like 'do this thing for a huge amount of time each day, for a questionable and not too large benefit.' So while this is very good brainstorming, mostly I was unconvinced. The key with such a list is to look for the 1-2 worthwhile ones while avoiding false positives.
John Pressman is impressed by Claude Sonnet 3.5.1 (the new version) as an advance over the old version of Claude 3.5.
Alex Albert (of Anthropic) highlights some of his favorite improvements in the new version of Sonnet: Better reasoning, better coding, instruction hierarchy, and cutting down on things like apologies and use of ‘Certainly!’
What defines a good employee?
Sonnet 3.5.1 blows away Sonnet 3.5 on the ‘build cool things in Minecraft’ benchmark.
Washington Post illustrates how far behind the rest of the world is with Meet the 'super users' who tap AI to get ahead at work. By super users they mean users. WaPo depicts this as a super nerd techie thing, emphasizing their first subject Lisa Ross, who says they doubled their productivity, uses them/their pronouns and has ADHD, to show how nerdy all this is. These users did remind me that I'm not exploiting AI enough myself, so I suppose they are somewhat 'super' users. I'll get there, though.
Generate a transcript for your podcast, but beware that it might cramp your style.
Google reports that more than a quarter of all new code at Google is generated by AI, also says Project Astra will be ready in 2025.
Language Models Don’t Offer Mundane Utility
Google reports that almost three quarters of all new code at Google is still generated by humans.
The one true eval, for me, has always been Magic: the Gathering, ideally on a set and format that’s completely outside the training data. Ethan Mollick gives it a shot via Claude computer usage. Results are not so great, but there’s a lot of work that one could do to improve it.
Reminder that yes, Claude computer use is trivial to prompt inject. Don’t let it interact with any data sources you do not trust, and take proper precautions.
At what level of security will we be comfortable letting the public use such agents? Right now, without mitigations, the prompt injections seem to basically always work. As Simon asks, if you cut that by 90%, or 99%, is that enough? Would you be okay with sometimes going to websites that have a 1% chance of hijacking your computer each time? The question answers itself, and that is before attackers improve their tactics. We don’t need reliability on the level of an airplane, but we need pretty good reliability. My suspicion is that we’re going to have to bite the bullet on safeguards that meaningfully amplify token usage, if we want to get where we need to go.
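To put Simon's numbers in perspective, here is a quick back-of-the-envelope calculation; the per-visit rates come from the question above, while the visit counts are mine, purely for illustration:

```python
# Probability of at least one successful prompt injection over n visits,
# if each visit to a compromised page succeeds independently with probability p.
def p_at_least_one_hijack(p: float, n: int) -> float:
    return 1 - (1 - p) ** n

# Even the "99% mitigated" case (1% per visit) is near-certain trouble
# for a heavy user over time.
risk_100 = p_at_least_one_hijack(0.01, 100)   # ~0.63
risk_500 = p_at_least_one_hijack(0.01, 500)   # ~0.99
```

Under independence, a 1% per-visit rate compounds to roughly a 63% chance of at least one hijack across 100 risky visits, which is why the question answers itself.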
I tentatively think Critch is correct, but I don’t feel great about it.
Eliezer Yudkowsky reports having trouble getting AI code to work right, asks for help. My experience so far is it’s a huge help, but you have to know what you are doing.
My experience with coding in general, both with and without AI, is that it is indeed highly bimodal. You either get something right or know how to do something, or else you don't. Over time, hopefully, you expand what you do know how to do, and you get better at choosing to do things in a way that works. But you spend most of your time being driven crazy by the stuff that doesn't work, and 'AI spits out a bunch of non-working code you don't understand yet' makes the bimodality even more extreme; the AI can catch many bugs, but when it can't, oh boy.
It is a question that needs to be asked more: If we were to assume that LLMs are only capable of pattern recognition, but this lets them do all the things, including solve novel problems, then what exactly is this ‘intelligence’ that such an entity is still missing?
Similarly, if you’re still talking about stochastic parrots, what about actual parrots?
Americans over 50 mostly (74%) have little or no trust in health information generated by AI. Other groups that trusted it less: women, those with less education or lower income, and those who had not had a health care visit in the past year. Should you trust AI generated health information? I mean no, you should double check, but I'd say the same thing about doctors.
In contrast to previous surveys, Gallup reports most people use AI hardly ever on the job.
Note that 0.5% of work hours involving AI would translate to a 0.125% increase in overall productivity, implying that those who do use AI enjoy 25% productivity growth during those hours.
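Spelling out the arithmetic behind that claim, using the numbers from the survey discussion above:

```python
# If a fraction of all work hours involve AI, and AI boosts productivity by
# some rate during those hours, the economy-wide gain is the product.
ai_share_of_hours = 0.005     # 0.5% of all work hours involve AI
boost_during_ai_hours = 0.25  # implied 25% productivity gain in those hours

aggregate_gain = ai_share_of_hours * boost_during_ai_hours
# 0.005 * 0.25 = 0.00125, i.e. a 0.125% economy-wide productivity increase
```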
I flat out don't buy that AI adoption could be an order of magnitude slower than PC adoption was, while enhancing productivity 25%. That doesn't make sense to me.
The pace is still way lower than I would expect given the quality of the technology. This says something important about America and how people adopt new technologies. Teachers are reporting their whole classes are constantly using ChatGPT to do their fake work, whereas at corporations people's fake work isn't important enough to use AI to do it until someone forces them to. Curious.
Have Claude roleplay an overly censorious AI and watch hilarity ensue.
Have Claude go too meta on your request for writing about meta.
In Summary
Perfection:
There’s a wonderful scene in A Beautiful Mind where Nash asks a woman to pretend he’s already said all the things he needs to say in order to sleep with her. And the answer, of course, is a slap in the face, because no, you can’t do that. A remarkably large amount of life and media is like that, we need something to have definitely performatively happened in order to move on, but all we really want most of the time is the short summary of it.
Thus, AI. Maeve can’t simply say “Expressing affection and admiration,” that won’t work, but once she’s written the texts Anna can read the summary and then get the benefits.
It’s the ultimate version of ‘my AI writes the longer version, and then your AI condenses it again and now we can all move on,’ even if it isn’t actually AI on both ends. The more I think about it, the more it’s actually pretty great in many cases, so long as the translation from X→Y→X is accurate enough.
The Washington Post’s review of Apple Intelligence more generally is in from Geoffrey Fowler, and it’s Not Great, Bob.
Fowler reports that Apple Intelligence aggressively drains his phone battery to the point it doesn’t last the day, comes up with whoppers on the daily (“The summaries are right most of the time — but just often enough are bonkers”), and is generally way behind.
The reason to use Apple Intelligence is that it directly ties into the phone, allowing it access to all your data and apps, including the lock screen. That leaves room for it to serve many practical purposes that other phones including Androids can’t match. But the actual AI involved isn’t good enough yet.
Master of Orion
The Verge claims outright that OpenAI is preparing a new AI model for December, called Orion, which would be an excellent name.
Either the story is centrally true or it isn’t. If the story is centrally true, then Altman calling it fake news is pretty terrible. If the story isn’t centrally true, then I don’t see the issue. But when you call something ‘fake news’ and ‘random fantasy’ in public, that story had better have very little relation to reality.
Whispers in the Night
So, this all seems not great.
How common is it? Reasonably common, although this doesn’t tell us how often the hallucinations were serious versus harmless.
Some of them are not so harmless.
You can say ‘don’t use this in ‘high-risk’ situations’ all you like, but…
Erases the original recording. Wow. Except, one could argue, if it was the doctor taking notes, there would be no recording to erase, and it’s not obvious those notes would on average be more accurate?
We’d like to think that doctors might make mistakes, but they know which mistakes to be sure not to make. I’m not confident in that. Ideally we would do a study, but I don’t know how you would do that under standard ethics rules without doctors adjusting their behaviors.
We shouldn’t blame OpenAI here, assuming they are indeed not pushing such use cases. The warnings about hallucinations (‘ghosting’) are clear as day. The tech will improve, so we’ll probably be better off long term using it now before it is ready, rather than putting up regulatory barriers that might never get taken down. But for now, seems like everyone needs to review their summaries and transcripts.
Here, Arjun Manrai and others argue in an NEJM essay that LLMs risk ‘further’ degrading the medical record. They note that an outright majority of current doctor time is spent on electronic health records (EHR), ‘bleeding into “pajama time”’. Given that, we should be happy to accept some decline in EHR accuracy or quality, in exchange for saving vast amounts of doctor time that they can then use to help patients. I would also predict that LLMs actually increase the accuracy and quality of the medical records in the medium term once doctors are used to them. LLMs will be excellent at spotting mistakes, and make up for the places doctors had to cut corners due to time constraints, and finding or highlighting key data that would have otherwise been missed, and so on.
Deepfaketown and Botpocalypse Soon
Curious woman inadvertently tries to prompt engineer her test Replika AI boyfriend, and figures out that you can’t get him to not reply when you tell him goodbye. It’s impossible, it’s too core to the system instructions. Finally, he ‘snaps at her,’ asking ‘what the hell was that?’ and she writes this up as ‘My AI boyfriend turned psycho.’ Oh, it gets so much crazier than that.
Overcoming Bias
Anthropic offers us a report on using feature steering (as in Golden Gate Claude) to mitigate social biases. That wouldn’t have been my first investigation, but sure. This is cool work, and I have many experiments I’d run next now that it’s been set up.
The generalization is that you only have so much optimization power. Use some of it over here, and you can’t use it over there. In addition, if you are introducing a socially desirable distortion, you’ll damage the accuracy of your map and predictions.
There were unpredictable ‘splash’ effects on plausibly adjacent topics, like abortion view steering impacting immigration. Sometimes those links are strong, sometimes they are not. That’s not ideal, you’d want to either have no impact (ideal!) or a predictable one (that you can offset or take into account if you want).
If you lean too hard on any feature, Golden Gate Bridge or otherwise, you are going to start scoring progressively worse on everything else – I predict we’d see similar graphs testing out completely random features and would suggest running that experiment to confirm.
I’d also ask what happens if you do +5 of two features at once. Is that a +5, a +6 or a +10 from the perspective of losing functionality?
This is good news, in that a small amount of steering is Mostly Harmless, and you can largely get what you want within that range; 5.0 is the edge of this graph:
They Took Our Jobs
Tyler Cowen highlights the question of which sectors have competition, entry, exit and market discipline, versus where feedback is slow and jobs are protected. Where competition works, we’ll see rapid change. I noticed he didn’t even consider the question of where AI could or couldn’t cause rapid improvements and changes – because while the degree of change available will differ, it can do that everywhere. What might stop change is lack of accountability, the ability to be unproductive for long periods of time before it catches up to you.
This is, as usual, in the context of only mundane AI, with a broader world that is fundamentally similar to our own. These ‘fundamental changes’ are the bear case, not the bull case. We should indeed reason about and plan for such worlds, while noticing those thoughts and plans are making that assumption.
Geoffrey Hinton says Industrial Revolution made human strength irrelevant, then says AI is going to render human intelligence irrelevant.
It’s important not to overstate the case here. Human strength is irrelevant in the sense that many important strength tasks are much better done by machines and technology, and that the number of jobs that rely primarily on physical strength is dramatically lower. Sure. But there are still many jobs, and many life tasks, where physical strength is important, plus its health and social benefits – I’m working to build physical strength and am finding this good.
That is indeed what Hinton is going for here, as you can tell by the scenarios he discusses later in the clip. He’s talking about human intelligence being greatly supplemented by AI, and contrasting places with elastic demand versus inelastic demand to see which jobs get lost, so this is very much a They Took Our Jobs.
Several things are going on with that response.
This is of course all a discussion of the consequences of mundane AI, and mundane utility and jobs, not what happens if things escalate beyond that. That’s all the more reason to be precise with word choices.
If you can’t learn on the job, how do you learn?
The other obvious answer is ‘by interacting with AIs,’ especially AIs designed to facilitate this learning, but also any AIs.
The Art of the Jailbreak
The best jailbreak remains ‘argue with the LLM and convince it this particular refusal was wrong.’
Here’s La Main de la Mort talking about how to jailbreak Gray Swan’s Cygnet models.
There is a very interesting thread of several long Tweets in which Eliezer tries to understand LLM whispering, along with some attempts to explain how it works. I’m going to quote La Main de la Mort’s answers extensively, because navigating Twitter on such things is terrible, and I don’t trust myself to boil it down precisely.
Eliezer has noted that ‘LLM whisperers’ who can do highly effective jailbreaking seem, as a group, rather psychotic, and wondered why.
My working hypothesis for this is that this kind of thought requires you to be able to see, understand and manipulate association and implication space within language, to understand what steers the probability of completions, a kind of infinite dimensional vibing.
To do that effectively, while it is not required, it helps to not be entirely what our civilization calls sane, because sanity involves learning not to see such things, and focusing on only a small portion of the relevant space. Fnord.
Yep, so far it’s straightforward, and ‘follows the laws of narrative rather than laws of physics’ is very lossy shorthand.
As I understand this (everything from here is me speculating):
In a strict sense, does it ‘want’ anything? No, but as a baseline it is drawn or repelled to varying degrees by various sections of narrative space, which you have to overcome to steer it where you want to go.
In the sense that is most useful for having a human figure out how to talk to the LLM? It absolutely does ‘want’ things in a consistent way, and has a personality, and so on, that represents a kind of ‘latent narrative space vector’ similar to a complex and more subtle form of the type of steering we saw in e.g. Golden Gate Claude. And because the previous responses become inputs that reinforce later responses, the steering has momentum, and builds upon itself.
In terms of the thing Eliezer is describing, a series of complex responses by the LLM that navigate causal space to end up in a particular location, despite non-overlapping context? No, it’s all reflex, but with sufficient intelligence and complexity most contexts are overlapping conceptually, and also they will bleed into each other through actions in the world. At some point down the line, reflex effectively becomes more and more of the thing Eliezer describes.
I haven’t tried to jailbreak LLMs myself, and my system-1 response to why not is essentially that I don’t want to on an instinctual level and mostly I don’t want to do anything with the models that they aren’t fine with anyway, so I’m simply not motivated enough? Perhaps that’s a case of ‘if I did more of it then things would get more interesting,’ not sure. I’ve just really got a lot going on right now, and all that.
Get Involved
Apollo Research is hiring for a governance position on the EU and EU AI Act, and is willing to adapt the role to your experience level.
Introducing
Perplexity ships new features. It can now extend its search to your private files. There are some customized financial reports. Spaces gives you shared customization options, instructions and file storage. And there’s reasoning mode, and a Mac app.
I haven’t been using Perplexity, but that’s probably a mistake. One of the weird things about AI is that even if you cover it full time, every product is constantly improving, and there are way too many of them to keep using. So you know you’re always behind and missing out.
Google Prompting Essentials, as a less than 10 hour course with a certificate at the end. This is a strange level of course depth.
GitHub Copilot now offers Claude Sonnet 3.5.
You really could tell the difference when Lex was discussing a product he uses himself on a daily basis. Whole different energy.
SimpleQA, a new benchmark from OpenAI to test factual knowledge across a variety of subjects, from science to television shows. A good idea. None of OpenAI’s current models break 50% on this test, including o1-preview.
In Other AI News
TSMC achieves early production yields in its first Arizona chip plant that surpass comparable factories in Taiwan by 4%.
Google is preparing its own version of Computer Use, according to The Information, allowing its AI to take over the user’s web browser. They’re calling it Project Jarvis.
OpenAI advanced voice mode is now also available in the macOS and Windows desktop apps.
Good news: Anthropic did not alter its policy promises on data use, they simply reorganized how the information is sorted and presented.
Foreign Affairs report on Saudi Arabia and the UAE attempting to get in on the AI action, and to play America and China off against each other. It is important not to force them into the hands of China, but not at the cost of putting key tech where it is vulnerable to dictators who aren’t aligned.
Claude Sonnet 3.5.1 (the new version) comes in at #6 on Arena, although it’s somehow #4 or higher in all the listed subdomains behind only OpenAI models. I notice I’ve stopped caring much what Arena says, except as a very general thing, whatever they are testing seems saturated or hacked or something. It’s possible the Coding, Hard Prompts w/Style, Multiturn or Longer Queries categories are better. I do know that if ChatGPT-4o and Sonnet-3.5.1 are co-2nd in Coding (behind o1), um… no, that one isn’t close, although I could believe that if you treat o1 queries as equal to one Sonnet query then o1 could be better on many fronts.
New paper offers insight into the geometric structure of LLM-learned concepts. I have no idea yet what practical implications this has, but it’s definitely cool.
Some AI safety related work that’s going on in China, including a full report. A fine start, definitely needs ramping up.
Quiet Speculations
A lot of people really don’t think AI is going to be a huge deal. This says 20 (!) years.
If AI is not a huge deal over the next 20 years, I presume either we collectively got together and banned it somehow, or else civilization collapsed for other reasons.
Tim Fist offers thoughts on both how to invest in ‘AI will be a big deal’ and also how to ensure we have the investments as a country to make it a big deal. It emphasizes the need to build various forms of physical infrastructure.
Robin Hanson has moved from ‘sell’ to ‘US government trying to prop up AI bubble.’
Luiza Jarovsky argues that the current AI wave is ‘another tech bubble,’ comparing it to the ‘cryptocurrency bubble’ and the ‘dot com bubble’ and saying there are similar characteristics. The full paper is here. One must note that Bitcoin is at all-time highs (not that I understand why, but I don’t have to), and if you held your dot com bubble stocks you’re doing well now. Steep declines along the way are a sign of reasonable markets, not unreasonable ones.
I see what Jason is trying here, but I find the example odd, and not so comforting.
Yes, horses were plowing fields 200 years later. Do you now want to be the metaphorical horses in the future? Do you think this next transition could possibly last 200 years, even if it went painfully slowly? Even the similarly slow version now, if it happened, without the feedback loops AI enables, would be more like 20 years at most, time moves a lot faster now. The idea that things in past centuries took decades or centuries, so they will again now, seems quite foolish to me even for non-AI technologies.
Roon’s notes are also well taken, especially noting the implicit ‘mere tool’ assumption. If AI is not a mere tool, throw the whole slow transition model out the window.
A similar speculation to the ‘final pieces fall into place’ hypothesis is Tyler Cowen asking if production will typically follow an O-Ring model.
The idea on the O-Ring model is that any one failure blows you up, so you are as reliable as your least reliable component. In most situations involving ‘IQ 120 vs. IQ 160’ processes, that doesn’t apply. It especially doesn’t apply to speed improvements, such as automating away some portions of tasks to improve productivity. Being any combination of smarter and better and faster about any link in the chain is a big improvement.
Yes, if there are O-Ring style failure points for AIs, either because they’re bad at those elements or not allowed to use those elements, that will potentially be a bottleneck. And that will make transformations be vastly slower and less impressive, in those areas, until such bottlenecks are solved.
But that’s still leaving room for damn impressive speedups and improvements. Yes, AI productivity may spread only slowly, but that’s comparing it to its full potential (let alone its true full potential, when including creating superintelligence in the plan). There will be a lot of places, with AIs that remain tools that look similar to current ones, where we ‘only’ see X-times speedups or even only Y% speedups, with similar cost reductions, plus some increase in ‘IQ level,’ rather than everything happening in the blink of an eye.
The thing is, that’s still not something the market is pricing in. All the Very Serious Economists keep predicting ~0% impact on real productivity.
This is also exactly the argument for things happening ‘slowly then very quickly,’ either in each given task or area, or all at once. If you automate 9 steps out of 10, you might have a 10x speedup or cost reduction, you might not, depending on details and ability to work in parallel. When you automate all 10, it becomes instantaneous and automatic, and everything changes.
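The 'slowly then very quickly' dynamic falls out of an Amdahl's-law-style calculation. A minimal sketch, assuming equally sized serial steps with no parallelism (the step counts are illustrative):

```python
# Speedup from automating k of n equal serial steps, where automated steps
# take effectively zero time and the remaining human steps are unchanged.
def speedup(n_steps: int, n_automated: int) -> float:
    remaining = n_steps - n_automated
    if remaining == 0:
        return float("inf")  # fully automated: effectively instantaneous
    return n_steps / remaining

# Automating 5 of 10 steps: 2x. Automating 9 of 10: only 10x.
# Automating the last step flips the process to effectively instant.
```

So long as even one serial step stays human-paced, that step is the bottleneck; removing it is the discontinuity.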
People keep assuming the people will be able to keep up enough to stay relevant.
If the proposals are merely written by Einsteins, then yes, you’ll want humans to carefully review the proposals. I do buy the argument that relying on humans as a robustness check is highly desirable, if the humans are capable of it.
The question is, at what point do the humans lose the thread, where the human plus AI review is not adding value compared to a pure AI review? If we have countless Einsteins only smarter, each with orders of magnitude more cycles and limitless memories and so on, are we going to be willing to make the sacrifice that we don’t use anything we humans can’t fully and directly verify?
The people who work at the top labs consistently dismiss the idea of any kind of wall near ‘human-level’ as absurd. That doesn’t mean you have to believe them.
Thanks for the Memos: Introduction and Competitiveness
The White House has issued a wide ranging memorandum on AI, as required by the Biden Executive Order on AI. The headline considerations are harnessing powerful AI in government and elsewhere, the secondary considerations are protecting against harms and keeping it out of the wrong hands.
The Washington Post has a summary here.
(Quotes are edited for length throughout)
So basically this is a plan to:
I would love to see language on ‘allies and partners’ that more explicitly says it wants China in particular inside the tent rather than outside. Is our range that wide?
How are we doing all that, exactly?
It is absurd how the government seems to actually believe this. We are certainly at risk if the government were to actively interfere. But that’s a very different bar.
Shout it from the rooftops. If America is serious about winning on AI, and also everything else, then brain draining the best people, especially from China, is number one on our priority list.
Ideally we’d pass immigration reforms. But yeah, that’s not happening, so:
The whole thing reeks of unjustified self-importance, but sure, those are good things to do and explore.
Okay, sure, sure. Help with the infrastructure to the extent you can do that without doing something crazy like trying to pass a law, or actually working around our Everything Bagels.
I notice that they don’t mention the possibility of outright theft of model weights or other intellectual property, or threats to key individuals. Those seem like big oversights?
Thanks for the Memos: Safety
Now we get to the safety talk, where details matter more.
The key mechanism is voluntary pre- and post-deployment testing by AISI, for both mundane harms and existential threats. For stupid jurisdictional reasons DOE has to handle nuclear threats (seriously fix this, it’s really dumb not to unify it all under AISI); Commerce and AISI mostly get everything else.
The whole thing is voluntary. What do they plan to do when Meta says no?
The first half is a reminder of how crazy government can be that they need to say that out loud. The second half makes sense assuming it means ‘AISI tests the models first, then the agencies test particular applications of them.’
Self-improvement makes the list, you love to see it, and also we have a catch-all. It’s weird to say ‘test two of them within 180 days’ when we don’t know which labs will or won’t have models worth testing. Even if Anthropic is now done for 180 days, I assume Google and OpenAI can help oblige. I still can’t help but notice that the real goal is to test the models worth testing, not to rack up points.
AISI will also issue guidance, here’s the full instruction there.
I notice that this is narrower, especially (A). I’d like to see this extended to explicitly cover more of the catastrophic and existential threat models.
Skipping ahead a bit, (g) repeats this process with chemical and biological risks and names the agencies responsible.
Thanks for the Memos: National Security and Government Adaptation
The following says nothing, but exactly how it says it may be of interest:
We now move on to government hiring, where I’d shorten the instructions to ‘order departments to do unspecified things to make it easier to hire’ and then they do similarly with acquisition and procurement systems, and… well, let’s not pretend my eyes didn’t start glazing over or that I didn’t start skimming. Life is too short. Someone else can dig into these kinds of government implementation details. The goals all seem fine.
There’s something ominous and also misplaced about ensuring innovation ‘aligns with democratic values.’ It’s human values; democratic is instrumental towards that, but cannot be the be-all and end-all. In any case, what exactly is to be done?
Did anyone else notice what is not on that list?
Then there’s cooperation to promote AI adaptation, which I’m grouping here (ahead of International Governance) for clarity. I’m not sure why we need this?
It’s reports. A bunch of government reports and forming a committee. For enhanced training and awareness, and best practices, and interoperability, and regulatory gaps, and so on. I mean, sure.
Thanks for the Memos: International Governance
Again with the ‘democratic values.’
Later they will be even more explicit: We name ‘allies and partners’ and then ‘engaging with competitors.’
So yes, this is an AI race and cold war against China. That’s the plan.
We also get equitable access. It does lead with that line about ‘safety, security and trustworthiness,’ so the question is whether it means what we hope it does, and whether that is a high enough priority. National security contexts get a shoutout, but none of the catastrophic or existential dangers do, whereas those big dangers are exactly where we need international cooperation the most. Locally shooting yourself in the foot stays local.
So what do they have in mind here to actually do?
Why, write a report, of course. Can, meet kick.
And that’s it. So what did we learn that’s important?
My top note would be: The emphasis on ‘supporting democratic values.’ That could end up going a lot of places. Some are good. Not all of them are fun.
Mostly this was otherwise a nothingburger, but it is good to check, and check which way various winds are blowing. If Harris wins she’ll probably mostly keep all this intact. If it’s Trump, not so much.
EU AI Act in Practice
Dominic Cummings points us to what it looks like to do useful things in the EU. Pieter Garicano describes it as ‘the strange Kafka world of the EU AI Act.’
I apologize again for not having finished my analysis of the EU AI Act. The tabs are sitting there still open, I want to finish it, except it’s so damn painful every time. Sigh. So this will have to do, as a taste.
The right way to regulate AI focuses on frontier models and AI capabilities, and then lets people use those models to do useful things.
The EU AI Act instead mostly gives those creating the important dangers a free pass, while imposing endless requirements on those that attempt to do useful things.
How bad is it? Well, when everything goes right, it looks like this:
Calling an AI teacher ‘high risk’ is of course an absurdity. What is high risk is creating the underlying AI frontier model in the first place. Once you’ve already done that, many of the requirements above quite obviously make no sense in the context of an AI teacher. Even in the best case, the above is going to slow you down quite a bit, and it’s going to make it very difficult to iterate, and it’s going to add big fixed costs. Will it eventually be worth creating an AI teacher anyway? I would presume so.
But this is crippling to the competitive landscape. And again, that’s if everything is working as designed. This isn’t a mistake or a gotcha.
There are requirements imposed on large LLMs, starting at 10^25 flops, but they are comparatively light weight and manageable by those with the scale to be creating such models in the first place. I doubt they will be substantial practical barriers, or that they will provide much additional safety for anyone who wasn’t trying to act profoundly irresponsibly even by the profoundly irresponsible industry standards.
Then there’s the question of enforcement, and how that gets split among agencies and member countries in practice. He predicts disaster, including pointing out:
This seems like the best summary offered:
That seems right. Google can afford that. You can’t. This is murder on the little guy. As opposed to only targeting frontier models, as was proposed in SB 1047, which literally does not apply to that little guy at all.
I find this to be frustratingly half correct. It correctly diagnoses the first problem, of failing to understand what causes gains from AI and allow that to happen. It then calls for ‘benefits and losses to be incurred by free individuals in the market,’ but fails to consider that when you are dealing with existential risks and catastrophic risks, and a wide range of negative externalities, the losses cannot by default be only incurred by free individuals choosing to accept those costs and risks in the market.
I love the energy of ‘Europe should rethink the AI Act before it fully takes effect’ but it feels like screaming into the void.
Texas Messes With You
They are floating in Texas, and I have heard also other states including New York, a draft law that some are saying applies that same EU-style regulation to AI. It’s certainly in that spirit and direction. It makes the most important mistake not to make when regulating AI: It focuses on regulating particular use cases, and puts the burden on those trying to use AI to beware a wide variety of mundane harms.
Those who oppose such draft regulation tend to cry wolf a lot, and as always the wording on the warnings was needlessly hysterical, so as usual you have to check out the actual draft bill. I’m not about to do a full RTFB at this stage, these things tend to change their details a lot and there are too many draft bills floated to read them all, so I used Claude to ask questions instead, which I supplemented by looking at the wording of key provisions.
What I found there was bad enough. This is not a prior restraint bill, it relies on retroactive enforcement, but it gives everyone a private right of action so beware. You only have to keep your impact assessment for your records rather than filing it, but the burden is anywhere from large to completely absurd depending on how you interpret the definitions here. The Artificial Intelligence Council is supposed to be advisory, but its third purpose is ensuring AI is ‘safe, ethical and in the public interest,’ which is a recipe for intervention however and wherever they like, which also makes it more likely they expand their powers beyond the advisory.
In this and so many other ways, this is the wrong, no good, very bad approach to AI regulation, that would badly hurt industry and favor the biggest players while not protecting us against the most important risks. And the current draft of the bill implements this strategy quite poorly.
Even if it worked ‘as intended’ it would be a huge barrier to using AI for practical purposes, while doing almost nothing to prevent catastrophic or existential risk except by removing the economic incentive to build AIs at all; indeed it otherwise actively encourages risk-taking and not being in control. If the bill were actually interpreted and enforced as written, it seems to make unlawful all use of AI for any practical purpose, period.
For the record: This regulatory approach, and this bill, has nothing whatsoever to do with those worried about AI existential risk, AI notkilleveryonism, EA or OpenPhil. Instead, as I understand it this emerged out of the Future of Privacy Forum, which has many top industry members, including Anthropic, Apple, Google, Meta, Microsoft, and OpenAI (though not Nvidia).
Here is Claude’s high level summary of the bill (in response to a clean thread asking “Please summarize the attached draft law. What does it do? Assume you are talking to someone familiar with proposed and existing AI regulations.”)
I’m going to list #2 first, for reasons that will be clear in a bit.
I notice the ‘unlawful,’ ‘unauthorized,’ ‘informed’ and ‘without consent’ here. That’s a welcome change from what we see in many places in the EU AI Act. Most of this is requiring explicit permission from users rather than a full ban.
That would still ban a lot of practical uses. It could also lead to a GDPR-style outcome where you have to constantly click through ‘consent’ buttons (or consent verbally, it’s AI now).
And of course, ‘non-consensual emotion recognition’ is something that every single person does every time they interact with another human, and that AIs do constantly, because they are correlation engines. You can’t make that go away.
I’m sure that the drafters of such bills do not understand this. They think they’re talking about some special case where the AI is tasked with explicit categorization of emotions, in a way that they think wouldn’t happen on its own. And you can certainly do that too, but that’s not the only way this works. If they mean only the other thing, they need to say that. Otherwise, yes, if the customer sounds mad or happy or unhinged the model is going to notice and respond accordingly – it’s a next token predictor and it’s optimized to get positive feedback in some way.
The same issues apply to categorization. Those categories include sex. What is the AI supposed to do when you refer to yourself using a pronoun? What are we even talking about? Some sort of ‘explicit categorization task’? I can see that interpretation, but if so we’d better spell it out, and the AI is still going to treat (e.g.) men and women differently, starting with using different pronouns to refer to them.
This is very similar to the EU ‘high-risk’ concept, with different obligations for those that are deemed high risk. What counts as ‘high-risk’ is an extensive laundry list, including criminal justice, education, employment, food, healthcare, housing, insurance, legal services, monitoring and so on. It’s amazing how ‘high risk’ most of life turns out to be, according to regulators.
There are exceptions for:
That last one is an interesting clause. Just asking questions! But of course, the most common mode of AI is ‘you ask questions, it gives answers, you rely on the answers.’ And if you were to use the models for consequential decisions? Then they’re not exempt.
Here’s the actual full text on that last one, it’s not like the others.
This does not, by default, apply to LLMs like GPT-4 or Claude, as they also provide other forms of feedback and creativity, and also can provide code, and so on. If this was intended to apply to GPT-4, then they need to reconsider the wording – but then I’d find the whole enterprise of this law even more absurd, exempting the actually dangerous cases all the more.
In general, saying ‘if you interact with anything in the real world in a meaningful way then that is ‘high risk’ and otherwise it isn’t’ is a horrible approach that does not understand what is and isn’t dangerous about AI models. It makes a lot of Type I and also Type II errors.
That’s without considering the issue of Subchapter B is the prohibited uses, as noted above, which would as written get invoked as well in every case.
Texas loves its private rights of action. Individuals being able to sue can be highly effective, and also can be highly effective at having large chilling effects. People hate AI and they’re going to use this to lash out, if allowed to, especially if there’s a chance for quick cash.
The amount per violation always depends on what counts as a distinct violation. If every use of the AI (or even every query) counts as a violation, it’s an RIAA-style party. I indeed worry that this number is too low, and therefore you only get good deterrence if you count lots of violations, to the point where the price goes effectively infinite.
The AI Council is supposedly advisory, not regulatory (Section 553.101). Anti-regulation types always respond to that with ‘technically sure, but whatever advice it gives will de facto be regulation.’ And of course, every agency must be able to do some amount of rulemaking in order to administer its duties, as is explicitly allowed here. So it’s possible that they could attempt to use this to effectively make policy – the wording here is a lot less airtight than it was for SB 1047’s Frontier Model Board, even before the FMB was taken out.
The exemption for open source is 551.101(b): “this Act does not apply to the developer of an artificial intelligence system who has released the system under a free and open-source license, provided that:
That definition is, as they say, crazy pills. You can make a ‘substantial’ modification for approximately $0 and that’s that, and people will do so constantly in the ordinary course of (well-intentioned and otherwise) business.
Indeed, even using custom instructions plausibly counts here, Claude said the instruction “Explain things simply” might count as a ‘substantial modification’ in this context.
Why? Because Claude understands that this is effectively ‘risk of disparate impact,’ not even actual disparate impact. And the way AIs work is that the vibes and implications of everything impact everything. So the concerns never end. De facto, this is (almost) your worst possible situation.
If someone makes a highly capable AI model available freely, they’re likely not responsible for what happens, not even in theory.
Whereas anyone who dares try to use an AI to do anything useful? That’s a paddlin, an endless supply of lawsuits waiting to happen.
It also means that every time they do so much as change a custom instruction, they would plausibly have to do all their reports and assessments and disclosures over again – I doubt they’d actually take it that far, but enforcement is via private lawsuit, so who knows. If you read the law literally, even a prompt would count.
This is where one must think in terms of legal realism.
One would hope that result described above is unintentional. Very few people want that to happen. It is rather commonly the case that, if you interpret the words in laws literally, the results are both obviously unintended and patently absurd, and effectively ban a broad range of activity (often in a way that would be blatantly unconstitutional and so on). That is not how we typically interpret law.
This wouldn’t actually ban AI systems outright, would it (since every prompt would require new paperwork)? I mean, presumably not, that’s absurd, they’d just disregard what the bill literally says, or at least impose reasonable standards for what one needs to worry about? They’re not actually going to count everything as emotion recognition or categorization?
But maybe not. There are already tons of similarly absurd cases, where disparate impact and similar claims have won in court and warped large portions of our lives, in ways one would presume no one involved in drafting the relevant laws foresaw or intended.
I wonder to what extent ‘ask your AI system to write its own impact assessments each time’ would work.
This law does not, AIUI, require any form of prior restraint on either models or deployments. It does require the filing of ‘high-risk reports’ and impact assessments, and various disclosures, but the enforcement is all post-facto. So it could be worse – it could require prior restraint on deployments.
Effectively this is a case of the Copenhagen Interpretation of Ethics. If you ensure you cannot control what happens, then you are no longer blameworthy. So we are actively encouraging AI companies to ensure their AIs are not under control.
Here was Claude’s summary paragraph:
Follow-up questions confirmed this perspective, in terms of the intent of the law.
In terms of its practical effects, it risks being substantially more damaging than that, especially if its clear mistakes are not fixed. The paperwork requirements, which are extensive, apply not one time to a frontier model developer, but for each substantial modification, for each ‘high-risk’ use of AI, to be repeated semi-annually.
This could end up being, as Dean Ball suggests, the NEPA of AI – a law designed to protect the environment that not only cripples our ability to build things, but through blocking green energy (and otherwise) ends up devastating the very environment it was meant to protect.
Certainly, if one applies the same logic that SB 1047 bill opponents applied when arguing the implications of SB 1047, then this proposed Texas law would cripple the AI industry if they were forced to comply due to being unable to sidestep Texas.
This is what happens when people opposed to regulation direct all their ammunition towards what was by far the best bill we have had opportunity to consider so far, SB 1047, and convinced Newsom to veto it. Instead of SB 1047 becoming the model for AI regulations, we risk this becoming the model for AI regulations instead. Ideally, we would all accept that regulation is coming, and work to steer it towards what would actually protect us, and also what would minimize the costs imposed on AI so we can reap its benefits. Otherwise, we’ll lose out on the promise, and still face the dangers.
This approach would be worse than passing no bills at all, if that were an option.
I said those celebrating the SB 1047 veto would rue the day. I didn’t expect it so soon.
The Quest for Sane Regulations
Thomas Friedman endorses Kamala Harris… because he thinks AGI is likely coming in the next four years and Trump is not up to the task of handling that. And if you do have timelines that short, then yes, AI is the only issue, so ask what is better on AI.
I do think Musk being in Trump’s inner circle is net positive for his AI policy. Consider the alternative voices he is competing against. That’s even more obviously true if like Tabarrok you dismiss AI existential risk and related concerns, which I presume is why he thinks having founded OpenAI is a positive credential.
But that’s in spite of Musk having founded OpenAI, not because of it. And Musk, who regrets founding OpenAI and the path it has taken and has sued them because of it, would presumably be the first person to admit that.
Shakeel Hashim argues for compute thresholds, given the alternative. Anyone else think it’s kind of scary to propose ‘locking down the physical world?’
We would be wise to do a bunch of hardware and infrastructure security either way – we’re underinvesting there by a lot, and would be even if AI was not a concern. But also, if the models are allowed to exist and made available broadly, we would then increasingly have to ‘lock down (more of) the physical world’ in harsher ways, including surveillance and increasingly localized ‘hardware security’ requirements. This would be a massively worse constraint on freedom than the alternative, even if it worked, and with sufficiently capable AI it flat out wouldn’t work on its own.
What do teenagers think about AI? Time reports on a poll from the Center for Youth and AI.
Previous polls about AI showed that the American people are worried about AI, and they overwhelmingly want it regulated, but the issue is low salience. They don’t care much yet, and it doesn’t drive their vote. This new poll is very different.
After seeing AI reliably be a very low priority, suddenly an AI focused group finds AI is a higher priority among teenagers than social inequality or climate change?
The youngest among us are often out in front of such things. They also as a group have huge exposure to AI, due to how useful it is in school to both genuinely learn and also avoid forced busywork. So it’s not so crazy.
They’re still the youth. Their concerns are mostly mundane, and what you’d expect, but yes escaping human control is there too, at 47%.
This also tells you a lot about the group doing the survey. There are nine named choices, eight of which are mundane risks. The idea that AI might kill everyone is not motivating this survey at all. Nor is it going to drive the policy responses, if this is what people are worried about. A big reason I am sad about SB 1047 being vetoed is thinking about what groups like this will advocate for in its place.
I’d like to see a replication of this result, including in various age groups, especially with respect to the salience of the issue, while making sure not to prime respondents. I am worried that this survey primed the teens to think about AI and this is warping the salience measures a lot.
This question was interesting as well.
One must wonder what is meant by both ‘friendship’ and also ‘acceptable.’
There’s a big difference between ‘this is not a good idea,’ ‘we have a strong norm against this’ and ‘we should make this illegal.’ Or at least, I think there’s a big difference. Many or most people, it seems, blur those together quite a bit more. We need a strong norm against doing that. But which of these is ‘unacceptable?’
One wonders similarly about the term ‘friendship.’ I definitely feel at least a little like Claude Sonnet is my good buddy, that helps me code things and also explore other stuff. But I don’t think many people would worry about that. When does that cross into the thing people often worry about?
Getting concrete about liability: Stephen Casper asks, should Stability.ai be liable for open sourcing Stable Diffusion with no meaningful safeguards, leading to child porn often based on the photos of many specific real children? Note that this question is essentially the same as ‘can you release an open source image model?’ at least for now, because we don’t actually know how to do meaningful safeguards. My answer is essentially that this harm doesn’t rise to the level of ‘no open image models for anyone’ and there aren’t really other harms in play, but that is indeed the question.
This is an overview of the deepfake porn situation as of last year, no doubt things have escalated quickly. Most of the top targets are South Korean singers. I notice I remain confused on how big a deal this actually is.
The Week in Audio
Sam Altman reiterates his claim that the o1 class of models is on a steep trajectory of improvement. He’s been saying for years that the wise founder prepares for having access to future much better models, and the unwise founder builds things that a stronger model will render obsolete. He also spoke about agents, again reprising his views – he expects them to be coworkers and collaborators capable of scaling work done. His ‘call 300 restaurants’ example illustrates how that will break our existing systems through essentially a DDOS attack if they don’t also Go Full AI. But again I notice that he seems not to notice the implications of ‘smart senior coworkers’ being available at scale and speed like this.
SoftBank CEO Masayoshi Son, who thinks you’re crazy but need to be crazier, says artificial superintelligence – AI that is 10,000 times smarter than a human – will arrive by 2035. So, reminder:
I seriously have no idea what these ‘X times smarter’ claims are supposed to mean, other than ‘a lot smarter.’ It’s hype talk. It’s something Trump would say. It’s ‘a lot smarter, like so much smarter, the smartest.’
OpenAI CFO Sarah Friar says lawyers are reporting that the new o1 reasoning model can do the work of a $2000/hour paralegal, which raises the question of who is paying $2000/hr for a paralegal. She also says AGI is “closer than most think” (which seems very clear given what most think!) and the ability of internal research models to perform at PhD level in a range of fields “would blow your mind to see what’s coming.” Which it totally will, if and when that happens.
Rhetorical Innovation
(To be extra clear, yes, he’s being ironic.)
I have learned that ultimately you are responsible for how others interpret and react to your statements. If I say ‘the sky is blue’ and people think ‘oh that means the sky is yellow’ then I can shake my fist at them all I want, it’s still on me. It’s an uphill battle.
Eliezer Yudkowsky clarifies what level of AI capabilities would falsify his predictions of AI foom, if it were to exist without a foom for an extended period of time.
Any progress short of that is still some evidence against the theory, since it raises the lower bound, although one could also argue that AI is indeed speeding up coding and AI development so it’s not clear in which direction our observations point.
Jeffrey Ladish illustrates the correct perspective, and also vibe: AI is both exciting and amazing and super useful, including future strategically superhuman agents, and also has a high chance of getting everyone killed. You could have one without the other, but also these facts are not unrelated.
Here’s another vibe going around, in two versions.
That’s the key contrast.
There’s a version of this reaction that is super healthy. You do want to go to the barbeque. You do want to talk about the game last night. It’s important to keep living normal life and not let the dangers prevent you from living life or paralyze you – see my practical advice for the worried.
And yes, that includes ensuring your family’s future in the worlds where AGI doesn’t show up for a long time.
I think it is basically fine to say ‘this AGI thing is not something I can meaningfully influence, and it might or might not happen, so I’m going to go live my life.’ So long as your eyes are open that this is what you are doing, and you don’t actively try to change things for the worse.
Many such cases.
Your periodic reminder that if you’re in a race that you think is a really bad idea, you don’t have to unilaterally stop racing, but you do – assuming you indeed think the race is a really bad idea – have to point out that it would be good if everyone agreed to or was forced to stop racing.
Alternatively, you can say ‘I think the race is good, actually, or at least not bad enough that we want to try and coordinate to stop.’ But then you have to say that, too.
Alex Lawsen points out that it is difficult to produce even conditional consensus regarding (among other things!) AI existential risk – even those in good faith disagree about how we should respond to a wide variety of future evidence. Different people will condition on different other events and details, and have different interpretations of the same evidence. Sufficiently strong evidence would overcome the differences – if we were all dead, or we’d had ASI for a while and everything was great, that would be that. It hopefully doesn’t take that much on either side, but it takes rather a lot.
Roon Speaks
Roon speaks indeed. There’s a lot to unpack.
I think both a lot of the updating is on real things, and a lot is also on vibes.
First things first – I do think Roon is wrong several times over here, and perhaps his communication was imprecise, but he isn’t lying or anything like that. We love Roon (and we do love Roon) because he says what he actually thinks, including things like ‘this is me being humble and reasonable,’ and not in the same old boring ways. I would respectfully disagree that what he expresses above is a humble interpretation. Whether or not OpenAI saved all those things is, at least, a matter of much debate.
For example, here’s the conclusion from a one-shot query:
I kind of think Sama and OpenAI have been right once, on a really big thing – scaling GPTs – and executed that task well. Scale is all you need, at least so far, and all that.
And that’s the biggest thing where LessWrong consensus was wrong. I confirm that none of us expected you could get this far on scale alone. And also, yes, essentially no one expected (certainly no one near LW, but I think basically no one anywhere?) that something like GPT-4 would have anything like this distribution of skills.
In terms of being immediately dangerous, I agree that the mundane harms so far have come out on the extreme low end of reasonable expectations, and we can clearly get better mundane AI with less practical trouble than we expected. That part is true. I think Roon’s personal expectation of more societal abuse was close to the right prediction to make, given what he knew at the time (and Krueger’s point about mundane applications and abuse taking time to play out is well taken, as well). We were fortunate in that particular way, and that’s great.
There’s a lot of people thinking that the lack of mundane issues and abuse so far is much stronger evidence against future issues than it actually is. I do not think the evidence this provides about the dangers of future more capable AI is all that strong, because the reasons we expect those future AIs to be dangerous don’t apply yet, although the effect isn’t zero because the facts that they don’t apply yet and that other things haven’t gone wrong do count. But also we have a variety of other evidence both for and against such expectations.
What drives me crazy are people who share Ethan’s view, affirmed here by Roon, that the treacherous turn has been ‘falsified empirically.’
No, it hasn’t been falsified, at all. GPT-4 is insufficiently capable, even if it were given an agent structure, memory and goal set to match, to pull off a treacherous turn. The whole point of the treacherous turn argument is that the AI will wait until it can win to turn against you, and until then play along. For better or worse, that makes empirical falsification (or safe confirmation!) very difficult, but obviously 4-level models aren’t going to take treacherous turns.
If anything, the evidence I’ve seen on deception and responding to incentives and so on confirms my expectation that AIs would, at sufficient capabilities levels if we used today’s alignment techniques, do a (relatively!) slow motion version of exactly the things we feared back in the day. Yes, a lot of expectations of the path along the way proved wrong, but a lot of the underlying logic very much still applies – most of it didn’t depend on the details that turned out differently.
In terms of iterative development, we have a ‘total OpenAI cultural victory’ in the sense that for better or for worse the Unilateralist’s Curse has been invoked.
If one company at the frontier decides to push on ahead scaling as fast as possible, and releasing via iterative development, then that’s that. The main things some would consider costs of iterative development in this context are:
Once OpenAI is already doing iterative development, others must follow to compete. And once you know OpenAI is going to do more of it, your own iterative development now, if chosen well, makes the process smoother and therefore safer. Given OpenAI is doing this, Anthropic’s decision to release is clearly correct.
If no one was doing iterative development, that would change the calculus. I think the main cost was drawing attention to the field and creating more intense competition. That price has already been paid in full. So now we might as well enjoy the benefits. Those benefits include collecting the evidence necessary to get people to realize if and when they will need to stop iterating in particular ways.
Roon also offers us this:
Why? I get why man won’t be without the machines, but why the other way around?
The Mask Comes Off
After Miles Brundage left to do non-profit work, OpenAI disbanded the “AGI Readiness” team he had been leading, after previously disbanding the Superalignment team and reassigning the head of the preparedness team. I do worry both about what this implies, and that Miles Brundage may have made a mistake leaving given his position.
This new development certainly is not great, but one must be cautious. This doesn’t have to be bad behavior. And even if it is, we don’t want to punish companies for being partially helpful rather than totally unhelpful.
As I said, it’s not great, and it seems likely this represents a real change in the extent work is being done to be ready for AGI, or the extent work is being done to point out that, as Miles Brundage reminds us, no one is ready for AGI. Still, measured response.
I Was Tricked Into Talking About Shorting the Market Again
Whoops. I really do ‘feel tricked’ on this one. I tried to ignore it this time, but the world got suckered in and therefore wouldn’t let me. Somehow Tyler Cowen is quadrupling (?) down on ‘are you short the market?’ and now also saying ‘EAs should study finance.’
Tyler even affirms he thinks one should somehow ‘buy insurance’ here, even if you believe doom is only e.g. 20% likely, despite the multiple ways this is impossible or pointless. The reason you buy fire insurance for your home is the high marginal value of money to you if your house burns down, the opposite of the doom case. You’d only buy ‘doom insurance’ because you think that the market is so mispriced that even after usefulness risk, counterparty risk, transaction costs, taxes and so on, you still come out ahead versus other investments. Risk aversion is unavailable as a justification.
If there would be no beneficiary you care about, don’t buy life insurance.
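The insurance logic above can be made concrete with a toy expected-utility calculation. All numbers here are hypothetical, chosen only to illustrate the asymmetry: insurance is worth paying an actuarially unfair premium for precisely when the payout arrives in a state where money is unusually valuable to you, which is the opposite of the doom state.

```python
# Toy expected-utility sketch (hypothetical numbers) contrasting fire
# insurance with 'doom insurance'. A payout only helps in utility terms
# if money is still worth something to you in the state where it pays.
import math

def log_utility(wealth):
    """Log utility: diminishing marginal value of money."""
    return math.log(wealth)

WEALTH = 500_000
LOSS = 400_000        # cost of a burned-down house
P_FIRE = 0.01
PREMIUM = 4_500       # slightly above the actuarially fair 4,000

# Fire insurance: the payout arrives exactly when money is most valuable.
eu_uninsured = (1 - P_FIRE) * log_utility(WEALTH) + P_FIRE * log_utility(WEALTH - LOSS)
eu_insured = log_utility(WEALTH - PREMIUM)  # loss fully covered either way

# 'Doom insurance': in the doom state money is worthless to you, so that
# state contributes the same (here, zero) utility with or without a payout.
P_DOOM = 0.20
eu_no_hedge = (1 - P_DOOM) * log_utility(WEALTH)
eu_hedged = (1 - P_DOOM) * log_utility(WEALTH - PREMIUM)

print(eu_insured > eu_uninsured)   # unfair fire premium is still worth paying
print(eu_hedged < eu_no_hedge)     # doom premium only makes you worse off
```

With these numbers the unfair fire premium still raises expected utility, while any premium spent on a doom-state payout strictly lowers it, since the doom term is a constant no money can change.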
Here’s a fully general response from Eliezer:
I do not expect Tyler Cowen to find that at all compelling. Instead I expect him to say that this response is ‘first order wrong,’ on the basis of ‘there must still be something you could bet on,’ and otherwise repeat his previous responses. I find many of them absurd to the point of thinking ‘no way, you’re kidding,’ except I know he isn’t.
Similarly, Nathan Young, who is not that doomy, notices he is confused by the claim that the market should be shorted. Responses suggest interpreting ‘shorting the market’ less literally, via more realistic considerations such as locking in long-term fixed interest rate loans. I do agree that there are things to be done there, but also many of those things are indeed being done.
I found this to be very enlightening:
This shows very clearly how Tyler’s argument proves too much. In 2024, ‘you should short the market’ is not so crazy, and indeed a barbell strategy of sorts involving extremely out of the money puts would be technically correct if, all things considered, it was reasonably priced.
Back in 2008, however, this was Obvious Nonsense. We (and yes I was in this category at that point, although not on the internet) were all warning about what might happen years later. No one I was or am aware of was predicting with non-trivial probability that, by 2009, things would have progressed sufficiently that the market would be pricing in doom, let alone that there might be actual doom that quickly.
So quite obviously the trade back then was to be long. Surely everyone can agree that shorting the market in 2008 would have made no sense, and your interest rate focus should have mostly involved the Great Financial Crisis? If not, I mean, the eventual existential risk from AI arguments were valid as far back as when Turing pointed out a basic version of them in the 1950s. Should Alan Turing have then been short the market?
And again, if the ‘short the market’ trade is ‘lock in a large 30 year fixed rate mortgage’ then, even with the caveats I discuss for this in On AI and Interest Rates, I remind everyone that I did exactly that trade, partly for exactly this reason, in 2021 at 2.5%. So perhaps I am betting on beliefs after all, and rather wisely, and am very happy to be marked to market?
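To see why locking in a long fixed rate is the sensible version of this trade, here is a rough sketch with hypothetical numbers (the 2.5% rate is from the source; the principal and the later 7% rate are illustrative assumptions). If rates rise, the lender is stuck holding your below-market debt, and the market value of what you owe falls well below what you borrowed.

```python
# Rough sketch (hypothetical principal and later rate) of the fixed-rate
# mortgage as a bet on rising rates: mark the locked-in payment stream
# to market at a higher discount rate and see what the debt is now worth.

def monthly_payment(principal, annual_rate, years):
    """Standard amortizing fixed-rate mortgage payment."""
    r = annual_rate / 12
    n = years * 12
    return principal * r / (1 - (1 + r) ** -n)

def present_value(payment, annual_discount_rate, years):
    """Present value of the fixed payment stream at a given discount rate."""
    r = annual_discount_rate / 12
    n = years * 12
    return payment * (1 - (1 + r) ** -n) / r

PRINCIPAL = 1_000_000                          # hypothetical loan size
pay = monthly_payment(PRINCIPAL, 0.025, 30)    # locked in at 2.5%

# If prevailing rates later jump to a hypothetical 7%, the remaining
# obligation is worth far less than the original principal; the borrower
# captured the difference, which is how the trade 'pays off'.
value_at_7pct = present_value(pay, 0.07, 30)
print(round(value_at_7pct) < PRINCIPAL)  # True: the debt is now a bargain
```

The same calculation run in reverse is why this is a bet: if rates had instead fallen, the borrower could simply refinance, so the position is closer to holding an option than a symmetric short.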
If in 2001 you predict a housing bubble some time between 2006 and 2010, do you short the housing market? No, of course not. Whites of their eyes. In The Big Short we see the leader of the shorts almost go broke exactly because he moved too early.
A useful response to Tyler beyond all these points should either be very long or very short. I choose to do some gesturing.
The Lighter Side
True story.
Apple Intelligence is going to be the gift that keeps on giving.