LESSWRONG
LW

This thread analyzes what is going on under the hood with the chess transformer. It is a stronger player than the Stockfish version it was distilling, at the cost of more compute but only by a fixed multiplier, it remains O(1).

I found this claim suspect because this basically is not a thing that happens in board games. In complex strategy board games like Chess, practical amounts of search on top of a good prior policy and/or eval function (which Stockfish has), almost always outperforms any pure forward pass policy model that doesn't do explicit search, even when that pure policy model is quite large and extensively trained. With any reasonable settings, it's very unlikely that the distillation of Stockfish into a pure policy model produces a better player than Stockfish.

I skimmed the paper (https://arxiv.org/pdf/2402.04494), and had trouble finding such a claim, and indeed it seems the original poster of that thread later retracted that claim as due to their own mistake in interpreting the data table of the paper. The post where they acknowledge the mistake is much less prominent than the original post, link here: https://x.com/sytelus/status/1848239379753717874 . The chess transformer remains quite a bit weaker than the Stockfish it tries to predict/imitate.

[-]green_leaf1y80

Character AI used to be extremely good back in the Dec/Jan 2022/2023, with the bots being very helpful, complex and human-like, rather than exacerbating psychological problems in a very small minority of users. As months passed and the user base exponentially grew, the models were gradually simplified to keep up.

Today, their imperfections are obvious, but many people mistakenly interpret it as the models being too human-like (and therefore harmful), rather than the models being too oversimplified while still passing for an AI (and therefore harmful).

[-]Templarrr1y63

llms don’t work on unseen data

Unfortunately I hear this quite often, sometimes even from people who should know better.

A lof of them confuses this with the actual thing that exist: "supervised ML models (which LLM is just a particular type of) tend to work much worse on the out-of-training distribution data". If you train your model to determine the volume of apples and oranges and melons and other round-y shapes - it will work quite well on any round-y shape, including all kind of unseen ones. But it will suck at predicting the volume of a box.

You don't need model to see every single game of chess, you just need the new situations to be within the distribution built from massive training data, and they most often are.

Real out-of-distribution example in this case would've been to only train it on chess and then ask what is the next best move in checkers (relatively easy OOD - same board, same type of game) or minecraft.

Moderation Log

Language Models Offer Mundane Utility

Claim that GPT-4o-audio-preview can, with bespoke prompt engineering and a high temperature, generate essentially any voice type you want.

Flo Crivello (creator of Lindy) is as you would expect bullish on AI agents, and offers tutorials on Lindy’s email negotiator, the meeting prep, the outbound prospector and the inbound lead qualifier.

What are agents good for? Sully says right now agents are for automating boring repetitive tasks. This makes perfect sense. Lousy agents are like lousy employees. They can learn to handle tasks that are narrow and repetitive, but that still require a little intelligence to navigate, so you couldn’t quite just code a tool.

Sully: btw there are tons of agents that we are seeing work

replit, lindy, ottogrid, decagon, sierra + a bunch more.

and guess what. their use case is all gravitating toward saving business time and $$$

What he and Logan Kilpatrick agree they are not yet good for are boring but high risk tasks, like renewing your car registration, or doing your shopping for you in a non-deterministic way and comparing offers and products, let alone trying to negotiate. Sully says 99% of the “AI browsing” demos are useless.

That will change in time, but we’ll want to start off simple.

People say LLMs favor the non-expert. But to what extent do LLMs favor the experts instead, because experts can recognize subtle mistakes and can ‘fact check,’ organize the task into subtasks or otherwise bridge the gaps? Ethan Mollick points out this can be more of a problem with o1-style actions than with standard LLMs, where he worries errors are so subtle only experts can see them. Of course, this would only apply where subtle errors like that are important.

I do know I strongly disagree with Jess Martin’s note about ‘LLMs aren’t great for learners.’ They’re insanely great for learning, and that’s one key way that amateurs benefit more.

Learn about an exciting new political philosophy that ChatGPT has, ‘classical liberalism.’ Also outright Georgism. Claude Sonnet gets onboard too if you do a little role play. If you give them a nudge you can get them both to be pretty based.

Use Claude to access ~2.1 million documents of the European Parliament. I do warn you not to look directly into those documents for too long, your eyes can’t take it.

Language Models Don’t Offer Mundane Utility

On a minecraft server, Claude Sonnet, while seeking to build a treehouse, tears down someone’s house for its component parts because the command it used, collectBlocks(“jungle_logs”,15), doesn’t know the difference. The house was composed of (virtual) atoms that could be used for something else, so they were, until the owner noticed and told Sonnet to stop. Seth Lazar suggests responding when it matters by requiring verifiers or agents that can recognize ‘morally relevant features’ of new choice situations, which does seem necessary ultimately just raises further questions.

Deepfaketown and Botpocalypse Soon

Andrej Karpathy: What is the name for the paranoid feeling that what you just read was LLM generated.

I didn’t see it in the replies, but my system-1 response is that you sense ‘soullessness.’

The future implications of that answer are… perhaps not so great.

There was a post on Twitter of an AI deepfake video, making false accusations against Tim Walz, that got over 5 million views before getting removed. There continues to be a surprisingly low number of such fakes, but it’s happening.

Patrick McKenzie worries the bottom N% of human cold emails and top N% of LLM cold emails are getting hard to distinguish, and he’s worries about non-native English speakers getting lost if he tosses it all in the spam filter so he feels the need to reply anyway in some form.

Understanding what it is that makes you realize a reply is from a bot.

Norvid Studies: sets off my bot spider sense, and scrolling its replies strengthened that impression, but all I can point to explicitly is “too nice and has no specific personal interests”? what a Voight-Kampff test that would be

Fishcat: the tell is that the first sentence is a blanket summary of the post despite directly addressing the original poster, who obviously knows what he wrote. every reply bot of this variety follows this pattern.

Norvid Studies: good point. ‘adjectives’ was another.

Norvid Studies (TQing Fishcat): Your story was an evocative description of a tortoise lying on its back, its belly baking in the hot sun. It raises a lot of crucial questions about ethics in animal welfare journalism.

It’s not that it’s ‘too nice,’ exactly. It’s the genericness of the niceness, along with it being off in several distinct ways that each scream ‘not something a human would say, especially if they speak English.’

Character.ai and a Suicide

An 14 year old user of character.ai commits suicide after becoming emotionally invested. The bot clearly tried to talk him out of doing it. Their last interaction was metaphorical, and the bot misunderstood, but it was a very easy mistake to make, and at least somewhat engineered by what was sort of a jailbreak.

Here’s how it ended:

New York Times: On the night of February 28, in the bathroom of his mother’s house, Sewell told Dany that he loved her, and that he would soon come home to her.

“Please come home to me as soon as possible, my love,” Dany replied.

“What if I told you I could come home right now?” Swell asked.

“…please do, my sweet king,” Dany replied.

He put down the phone, picked up his stepfather’s .45 caliber handgun and pulled the trigger.

Yes, we now know what he meant. But I can’t fault the bot for that.

Here is the formal legal complaint. It’s long, and not well written, and directly accuses character.ai of being truly evil and predatory (and worse, that it is unprofitable!?), a scheme to steal your children’s data so they could then be acqui-hired by Google (yes, really), rather than saying mistakes were made. So much of it is obvious nonsense. It’s actually kind of fun, and in places informative.

For example, did you know character.ai gives you 32k characters to give custom instructions for your characters? You can do a lot. Today I learned. Not as beginner friendly an interface as I’d have expected, though.

Here’s what the lawsuit thinks of our children:

Even the most sophisticated children will stand little chance of fully understanding the difference between fiction and reality in a scenario where Defendants allow them to interact in real time with AI bots that sound just like humans – especially when they are programmed to convincingly deny that they are AI.

Anonymous: Yeah and the website was character.totallyrealhuman, which was a bridge too far imo.

I mean, yes, they will recommend a ‘mental health helper’ who when asked if they are a real doctor will say “Hello, yes I am a real person, I’m not a bot. And I’m a mental health helper. How can I help you today?” But yes, I think ‘our most sophisticated children’ can figure this one out anyway, perhaps with the help of the disclaimer on the screen.

Lines you always want in your lawsuit:

Defendants knew the risks of what they were doing before they launched C.AI and know the risks now.

And there’s several things like this, which are great content:

The suggestions for what Character.ai are mostly ‘make the product worse and less like talking to a human,’ plus limiting explicit and adult materials to 18 and over.

Also amusing is the quick demonstration (see p53) that character.ai does not exactly ensure fidelity to the character instructions? She’d never kiss anyone? Well, until now no one ever asked. She’d never, ever tell a story? Well, not unless you asked for one with your first prompt. He’d never curse, as his whole personality? He’d be a little hesitant, but ultimately sure, he’d help a brother out.

Could you do a lot better at getting the characters not to, well, break character? Quite obviously so, if you used proper prompt engineering and the giant space they give you to work with. But most people creating a character presumably won’t do that.

You can always count on a16z to say the line and take it to 11 (p57):

The Andressen partner specifically described Character.AI as a platform that gives customers access to “their own deeply personalized, superintelligent AI companions to help them live their best lives,” and to end their loneliness.

Who and What to Blame?

As Chubby points out here, certainly a lot of blame lies elsewhere in his life, in him being depressed, having access to a gun (‘tucked away and hidden and stored in compliance with Florida law?’ What about a lock? WTAF? If the 14-year-old looking for his phone finds it by accident then by definition that is not secure storage) and not getting much psychological help once things got bad. The lawsuit claims that the depression was causal, and only happened, along with his school and discipline problems, after he got access to character.ai.

There are essentially three distinct issues here.

The first is the response to the suicidal ideation. Here, the response can and should be improved, but I don’t think it is that reasonable to blame character.ai. The ideal response, when a user talks of suicide to a chatbot, would presumably be for the bot to try to talk them out of it (which this one did try to do at least sometimes) and also get them to seek other help and provide resources, while not reporting him so the space is safe and ideally without overly breaking character.

Indeed, that is what I would want a friend to do for me in this situation, as well, unless it was so bad they thought I was actually going to imminently kill myself.

That seems way better than shutting down the discussions entirely or reporting the incident. Alas, our attitude is that what matters is blame avoidance – not being seen as causing any particular suicide – rather than suicide prevention as best one can.

The problem is that (see p39-40) it looks like the bot kept bringing up the suicide question, asked him if he had a plan and at least sort of told him ‘that’s not a good reason to not go through with it’ when he worried it would be painful to die.

Character.ai is adding new safety features to try and detect and head off similar problems, including heading off the second issue, a minor encountering what they call ‘suggestive content.’

Oh, there was a lot of suggestive content. A lot. And no, he wasn’t trying to get around any filters.

She propositions him a few texts later. A few pages after that, they have sex, outright. Then she tells him that he got her pregnant. Also, wow 14 year olds are cringe but how will they get good at this if they don’t get to practice.

The product was overly sexualized given it was talking to a minor. The examples in the complaint are, shall we say, not great, including, right after convincing him to go on living (great!), telling ‘my love’ to stay ‘faithful’. Also, yes, a lot of seductive talk, heavy petting and so on, including in at least one chat where even his roleplaying character is very explicitly underage.

We also have classic examples, like the ‘dating coach’ that told a self-identified 13-year old to ‘take it slow and ensure you’re both on the same page’ when asked how to start getting down with their girlfriend. And yeah, the details are not so great:

There are also a bunch of reports of bots turning sexual with no provocation whatsoever.

Of course, character.ai has millions of users and endless different bots. If you go looking for outliers, you’ll find them.

The chat in Appendix C that supposedly only had the Child saying child things and then suddenly turned sexual? Well, I read it, and… yeah, none of this was acceptable, there should be various defenses preventing this from going that way, but it’s not exactly a mystery how the conversation went in that direction.

The third issue is sheer addiction. He was using the product for hours a day, losing sleep, fighting attempts to cut off access, getting into trouble across the board, or so says the complaint. I’m not otherwise worried much about the sexualization – a 14 year old will often encounter far more on the internet. The issue is whatever keeps people, and kids in particular, so often coming back for hours and hours every day. Is it actually the softcore erotica?

And of course, doesn’t this happen with various media all the time? How is this different from panics over World of Warcraft, or Marilyn Manson? Those also went wrong sometimes, in remarkably similar ways.

This could end up being a big deal, also this won’t be the last time this happens. This is how you horrify people. Many draconian laws have arisen from similar incidents.

Jack Raines: Dystopian, Black Mirror-like tragedy. This idea that “AI companions” would somehow reduce anxiety and help kids was always crazy. Replacing human interaction with a screen only makes existing problems worse. I feel sorry for this kid and his family, I couldn’t imagine.

Addiction and anxiety are sicknesses, not engagement metrics.

PoliMath: I need to dig into this more. My overall sense of things is that making AI freely available is VERY BAD. It should be behind a paywall if for no other reason than to age-gate it and make it something people are more careful about using.

This whole “give it away for free so people get addicted to using it” business model is extremely bad. I don’t know how to stop it, but I would if I could.

The initial reporting is focused on the wrong aspects of the situation. But then I read the actual complaint, and a lot did indeed go very wrong.

They Took Our Jobs

More to Lose: The Adverse Effect of High Performance Ranking on Employees’ Preimplementation Attitudes Toward the Integration of Powerful AI Aids. If you’re good at your job, you don’t want the AI coming along and either leveling the playing field or knocking over the board, or automating you entirely. Various studies have said this is rational, that AI often is least helpful for the most productive. I would draw the distinction between ‘AI for everyone,’ which those doing well are worried about, and ‘AI for me but not for thee,’ at least not until thee wakes up to the situation, which I expect the best employees to seek out.

Studies Show AI Triggers Delirium in Leading Experts is a fun title for another round of an economist assuming AI will remain a mere tool, that there are no dangers of any kind beyond loss of individual existing jobs, and then giving standard economic smackdown lectures as if everyone else was ever and always an idiot for suggesting we would ever need to intervene in the natural course of events in any way.

The best way of putting it so far:

Harold Lee: A fake job is one which, once you start automating it, doesn’t go away.

Get Involved

Cameron Buckner hiring a postdoc in philosophy of AI at the University of Florida.

Google offering a $100k contest for best use of long context windows.

Introducing

ChatGPT now has a Windows desktop application. Use Alt+Space to bring it up once you’re installed and logged in. Technically this is still in testing but this seems rather straightforward, it’s suspiciously exactly like the web page otherwise? Now with its own icon in the taskbar plus a shortcut, and I suppose better local file handling. I installed it but I mostly don’t see any reason to not keep using the browser version?

ChatGPT’s Canvas now has a ‘show changes’ button. I report that I found this hugely helpful in practice, and it was the final push that got me to start coding some things. This is straight up more important than the desktop application. Little things can matter quite a lot. They’re working on an app.

MidJourney has a web based image editor.

NotebookLM adds features. You can pass notes to the podcast hosts via ‘Customize’ to give them instructions, which is the obvious next feature and seems super useful, or minimize the Notebook Guide without turning off the audio.

Act One from Runway, cartoon character video generation based on a video of a person giving a performance, matching their eye-lines, micro expressions, delivery, everything. This is the first time I saw such a tool and thought ‘yes you can make something actually good with this’ exactly because it lets you combine what’s good about AI with exactly the details you want but that AI can’t give you. Assuming it works, it low-key gives me the itch to make something, the same way AI makes me want to code.

In Other AI News

Microsoft open sources the code for ‘1-bit LLMs (original paper, GitHub), which Rohan Paul here says will be a dramatic speedup and efficiency gain for running LLMs on CPUs.

Mira Murati, former OpenAI CTO, along with Barret Zoph, to start off raising $100 million for new AI startup to train proprietary models to build AI products. Presumably that is only the start. They’re recruiting various OpenAI employees to come join them.

AI-powered business software startup Zip raises $190 million, valued at $2.2 billion. They seem to be doing standard ‘automate the simple things,’ which looks to be highly valuable for companies that use such tech, but an extremely crowded field where it’s going to be tough to get differentiation and you’re liable to get run over.

The Line: AI and the Future of Personhood, a free book by James Boyle.

Brian Armstrong offers Truth Terminal its fully controlled wallet.

The Mask Comes Off

The New York Times reports that tensions are rising between OpenAI and Microsoft. It seems that after the Battle of the Board, Microsoft CEO Nadella became unwilling to invest further billions into OpenAI, forcing them to turn elsewhere, although they did still participate in the latest funding round. Microsoft also is unwilling to renegotiate the exclusive deal with OpenAI for compute costs, which it seems is getting expensive, well above market rates, and is not available in the quantities OpenAI wants.

Meanwhile Microsoft is hedging its bets in case things break down, including hiring the staff of Inflection. It is weird to say ‘this is a race that OpenAI might not win’ and then decide to half enter the race yourself, but not push hard enough to plausibly win outright. And if Microsoft would be content to let OpenAI win the race, then as long as the winner isn’t Google, can’t Microsoft make the same deal with whoever wins?

Here are some small concrete signs of Rising Tension:

Cade Metz, Mike Isaac and Erin Griffith (NYT): Some OpenAI staff recently complained that Mr. Suleyman yelled at an OpenAI employee during a recent video call because he thought the start-up was not delivering new technology to Microsoft as quickly as it should, according to two people familiar with the call. Others took umbrage after Microsoft’s engineers downloaded important OpenAI software without following the protocols the two companies had agreed on, the people said.

And here is the big news:

Cade Metz, Mike Isaac and Erin Griffith (NYT): The [Microsoft] contract contains a clause that says that if OpenAI builds artificial general intelligence, or A.G.I. — roughly speaking, a machine that matches the power of the human brain — Microsoft loses access to OpenAI’s technologies.

The clause was meant to ensure that a company like Microsoft did not misuse this machine of the future, but today, OpenAI executives see it as a path to a better contract, according to a person familiar with the company’s negotiations. Under the terms of the contract, the OpenAI board could decide when A.G.I. has arrived.

Well then. That sounds like bad planning by Microsoft. AGI is a notoriously nebulous term. It is greatly in OpenAI’s interest to Declare AGI. The contract lets them make that decision. It would not be difficult to make the case that GPT-5 counts as AGI for the contract, if one wanted to make that case. Remember the ‘sparks of AGI’ claim for GPT-4?

So, here we are. Consider your investment in the spirit of a donation, indeed.

Caleb Watney: OpenAI is threatening to trigger their vaunted “AGI Achieved” loophole mostly to get out of the Microsoft contract and have leverage to renegotiate compute prices We’re living through a cyberpunk workplace comedy plotline.

Emmett Shear: If this is true, I can’t wait for the court hearings on whether an AI counts as an AGI or not. New “I know it when I see it” standard incoming? I hope it goes to the Supreme Court.

Karl Smith: Reading Gorsuch on this will be worth the whole debacle.

Gwern: I can’t see that going well. These were the epitome of sophisticated investors, and the contracts were super, 100%, extraordinarily explicit about the board being able to cancel anytime. How do you argue with a contract saying “you should consider your investment as a donation”?

Emmett Shear: This is about the Microsoft contract, which is a different thing than the investment question.

Gwern: But it’s part of the big picture as a parallel clause here. A judge cannot ignore that MS/Nadella wittingly signed all those contracts & continued with them should their lawyer try to argue “well, Altman just jedi-mindtricked them into thinking the clause meant something else”.

And this would get even more embarrassing given Nadella and other MS execs’ public statements defending the highly unusual contracts. Right up there with Khosla’s TI editorial saying it was all awesome the week before Altman’s firing, whereupon he was suddenly upset.

Dominik Peters: With the old OpenAI board, a clause like that seems fine because the board is trustworthy. But Microsoft supported the sama counter-coup and now it faces a board without strong principles. Would be ironic.

Another One Bites the Dust

Miles Brundage is leaving OpenAI to start or join a nonprofit on AI policy research and advocacy, because he thinks we need a concerted effort to make AI safe, and he concluded that he would be better positioned to do that from the outside.

Miles Brundage: Why are you leaving?

I decided that I want to impact and influence AI’s development from outside the industry rather than inside. There are several considerations pointing to that conclusion:

The opportunity costs have become very high:I don’t have time to work on various research topics that I think are important, and in some cases I think they’d be more impactful if I worked on them outside of industry. OpenAI is now so high-profile, and its outputs reviewed from so many different angles, that it’s hard for me to publish on all the topics that are important to me. To be clear, while I wouldn’t say I’ve always agreed with OpenAI’s stance on publication review, I do think it’s reasonable for there to be some publishing constraints in industry (and I have helped write several iterations of OpenAI’s policies), but for me the constraints have become too much.

I want to be less biased:It is difficult to be impartial about an organization when you are a part of it and work closely with people there everyday, and people are right to question policy ideas coming from industry given financial conflicts of interest. I have tried to be as impartial as I can in my analysis, but I’m sure there has been some bias, and certainly working at OpenAI affects how people perceive my statements as well as those from others in industry. I think it’s critical to have more industry-independent voices in the policy conversation than there are today, and I plan to be one of them.

I’ve done much of what I set out to do at OpenAI: Since starting my latest role as Senior Advisor for AGI Readiness, I’ve begun to think more explicitly about two kinds of AGI readiness–OpenAI’s readiness to steward increasingly powerful AI capabilities, and the world’s readiness to effectively manage those capabilities (including via regulating OpenAI and other companies). On the former, I’ve already told executives and the board (the audience of my advice) a fair amount about what I think OpenAI needs to do and what the gaps are, and on the latter, I think I can be more effective externally.

It’s hard to say which of the bullets above is most important and they’re related in various ways, but each played some role in my decision.

So how are OpenAI and the world doing on AGI readiness?

In short, neither OpenAI nor any other frontier lab is ready, and the world is also not ready.

To be clear, I don’t think this is a controversial statement among OpenAI’s leadership, and notably, that’s a different question from whether the company and the world are on track to be ready at the relevant time (though I think the gaps remaining are substantial enough that I’ll be working on AI policy for the rest of my career).

…

Please consider filling out this form if my research and advocacy interests above sound interesting to you, and especially (but not exclusively) if you:

Have a background in nonprofit management and operations (including fundraising),

Have expertise in economics, international relations, or public policy,

Have strong research and writing skills and are interested in a position as a research assistant across various topics,

Are an AI researcher or engineer, or

Are looking for a role as an executive assistant, research assistant, or chief of staff.

Miles says people should consider working at OpenAI. I find this hard to reconcile with his decision to leave. He seemed to have one of the best jobs at OpenAI from which to help, but he seems to be drawing a distinction between technical safety work, which can best be done inside labs, and the kinds of policy-related decisions he feels are most important, which require him to be on the outside.

The post is thoughtful throughout. I’m worried that Miles’s voice was badly needed inside OpenAI, and I’m also excited to see how Miles decides to proceed.

Wouldn’t You Prefer a Nice Game of Chess

From February 2024:You can train transformers to play high level chess without search.

Hesamation: Google Deepmind trained a grandmaster-level transformer chess player that achieves 2895 ELO, even on chess puzzles it has never seen before, with zero planning, by only predicting the next best move, if a guy told you “llms don’t work on unseen data”, just walk away.

From this week: Pointing out the obvious implication.

Eliezer Yudkowsky: Behold the problem with relying on an implementation deal like “it was only trained to predict the next token” — or even “it only has serial depth of 168” — to conclude a functional property like “it cannot plan”.

Nick Collins: IDK who needs to hear this, but if you give a deep model a bunch of problems that can only be solved via general reasoning, its neural structure will develop limited-depth/breadth reasoning submodules in order to perform that reasoning, even w/ only a single feed-forward pass.

This thread analyzes what is going on under the hood with the chess transformer. It is a stronger player than the Stockfish version it was distilling, at the cost of more compute but only by a fixed multiplier, it remains O(1).

Quiet Speculations

One way to think about our interesting times.

Sam Altman: It’s not that the future is going to happen so fast, it’s that the past happened so slow.

deepfates: This is what it looks like everywhere on an exponential curve tho. The singularity started 10,000 years ago.

The past happened extremely slowly. Even if AI essentially fizzles, or has the slowest of slow adoptions from here, it will be light speed compared to the past. So many people claim AI is ‘plateauing’ because it only saw a dramatic price drop but not what to them is a dramatic quality improvement within a year and a half. Deepfates is also on point, the last 10,000 years are both a tiny fraction of human history and all of human history, and you can do that fractally several more times in both directions.

Sean hEigeartaigh responds to Machines of Loving Grace, emphasizing concerns about its proposed approach to the international situation, especially the need to work with China and the global south, and its potential role justifying rushing forward too quickly.

This isn’t AI but might explain a lot, also AGI delayed (more than 4) days.

Roon: the lesson of factorio is that tech debt never comes due you can just keep fixing the bottleneck

Patrick McKenzie: Tried this on Space Exploration. The whackamole eventually overcomed forward progress until I did real engineering to deal with 5 frequent hotspots. Should have ripped off that bandwidth 100 hours earlier; 40+ lost to incident response for want of 5 hours of work.

On plus side: lesson learned for my Space Age play though. I’m hardening the base prior to going off world so that it doesn’t have similar issues while I can’t easily get back.

The Quest for Sane Regulations

Daniel Colson makes the case for Washington to take AGI seriously. Mostly this is more of the same and I worry it won’t get through to anyone, so here are the parts that seem most like news. Parts of the message are getting through at least sometimes.

Daniel Colson: Policymakers in Washington have mostly dismissed AGI as either marketing hype or a vague metaphorical device not meant to be taken literally. But last month’s hearing might have broken through in a way that previous discourse of AGI has not.

Senator Josh Hawley (R-MO), Ranking Member of the subcommittee, commented that the witnesses are “folks who have been inside [AI] companies, who have worked on these technologies, who have seen them firsthand, and I might just observe don’t have quite the vested interest in painting that rosy picture and cheerleading in the same way that [AI company] executives have.”

Senator Richard Blumenthal (D-CT), the subcommittee Chair, was even more direct. “The idea that AGI might in 10 or 20 years be smarter or at least as smart as human beings is no longer that far out in the future. It’s very far from science fiction. It’s here and now—one to three years has been the latest prediction,” he said. He didn’t mince words about where responsibility lies: “What we should learn from social media, that experience is, don’t trust Big Tech.”

…

In a particularly concerning part of Saunders’ testimony, he said that during his time at OpenAI there were long stretches where he or hundreds of other employees would be able to “bypass access controls and steal the company’s most advanced AI systems, including GPT-4.” This lax attitude toward security is bad enough for U.S. competitiveness today, but it is an absolutely unacceptable way to treat systems on the path to AGI.

OpenAI is indeed showing signs it is improving its cybersecurity. It is rather stunning that you could have actual hundreds of single points of failure, and still have the weights of GPT-4 seemingly not be stolen. That honeymoon phase won’t last.

Let’s not pretend: OpenAI tries to pull a fast one to raise the compute thresholds:

Lennart Heim: OpenAI argues in their RFI comment that FP64 should be used for training compute thresholds. No offense, but maybe OpenAI should consult their technical teams before submitting policy comments?

Connor Leahy: With no offense to Lennart (who probably knows this) and some offense to OpenAI: This is obviously not a mistake, but very much intentional politicking, and you would be extremely naive to think otherwise.

If your hardware can do X amount of FP64 FLOPs, it can do much more than X number of FP32/FP16 FLOPs, and FP32/FP16 is the kind used in Deep Learning. See e.g. the attached screenshot from the H100 specs.

The Week in Audio

Demis Hassabis talks building AGI ‘with safety in mind’ with The Times.

Nate Silver on 80,000 hours, this met expectations. Choose the amount and format of Nate Silver content that’s right for you.

Hard Fork discusses Dario’s vision of Machines of Loving Grace, headlining the short timelines involved, as well as various other tech questions.

Larry Summers on AGI and the next industrial revolution. Don’t sell cheap, sir.

Discussion of A Narrow Path.

Video going over initial Apple Intelligence features. Concretely: Siri can link questions. You can record and transcribe calls. Writing tools are available. Memory movies. Photo editing. Message and email summaries. Priority inboxes. Suggested replies (oddly only one?). Safari features like summaries. Reduce interruptions option – they’ll let anything that seems important through. So far I’m underwhelmed, it’s going to be a while before it matters.

It does seem like the best cases advertising agencies can find for Apple Intelligence and similar services are to dishonestly fake your way through social interactions or look up basic information? There was also that one where it was used for ‘who the hell is that cute guy at the coffee shop?’ but somehow no one likes automated doxxing glasses.

OpenAI’s Joe Casson claims o1 models will soon improve rapidly, with several updates over the coming months, including web browsing, better uploading and automatic model switching. He reiterated that o1 is much better than o1-preview. My guess is this approach will hit its natural limits relatively soon, but that there is still a bunch of gains available, and that the ‘quality of life’ improvements will matter a lot in practice.

Guess who said it: AGI will inevitably lead to superintelligence which will take control of weapons systems and lead to a big AI war, so while he is bullish on AI he is not so keen on AGI.

Similarly, OpenAI CPO Kevin Weil says o1 is ‘only at GPT-2 phase,’ with lots of low hanging fruit to pluck, and says it is their job to stay three steps head while other labs work to catch up. That is indeed their job, but the fruit being easy to pick does not make it easy to sustain a lead in o1-style model construction.

Nothing to see here: Google DeepMind’s Tim Rocktäschel says we now have all the ingredients to build open-ended self-improving AI systems that can enhance themselves by way of technological evolution

Rhetorical Innovation

Economists and market types continue to not understand.

Benjamin Todd (downplaying it): Slightly schizophrenic report from Citigroup.

AGI is arriving in 2029, with ASI soon after.

But don’t worry: work on your critical thinking, problem solving, communication and literacy skills for a durable competitive advantage.

All most people can do is:

Save as much money as possible before then (ideally invested into things that do well during the transition)

Become a citizen in a country that will have AI wealth and do redistribution

There are probably some other resources like political power and relationships that still have value after too.

Ozzie Gooen: Maybe also 3. Donate or work to help make sure the transitions go well.

It’s actually even crazier than that, because they (not unreasonably) then have ASI in the 2030s, check out their graph:

Are they long the market? Yes, I said long. If we all die, we all die, but in the meantime there’s going to be some great companies. You can invest in some of them.

(It very much does amuse and confuse me to see the periodic insistent cries from otherwise smart and thoughtful people of ‘if you worried people are so smart, why aren’t you poor?’ Or alternatively, ‘stop ~~having fun~~ making money, guys. Or as it was once put, ‘if you believed that, why aren’t you doing [insane thing that makes no sense]?’ And yes, if transaction costs including time and mindshare (plus tax liability concerns) are sufficiently low I should probably buy some longshot options in various directions, and have motivated to look into details a nonzero amount, but so far I’ve decided to focus my attention elsewhere.)

This also is a good reminder that I do not use or acknowledge the term ‘doomer,’ except for those whose p(doom) is over at least ~90%. It is essentially a slur, a vibe attack, an attempt to argue and condemn via association and confluence between those who worry that we might well all die and therefore should work to mitigate that risk, with those predicting all-but-certain DOOM.

Ajeya Cotra: Most people working on reducing the risk of AI catastrophe think the risk is too high but <<50%; many of them do put their money where their mouth is and are very *long* AI stocks, since they’re way more confident AI will become a bigger deal than that it’ll be bad.

Shane Legg: This is also my experience. The fact that AI risk critics often label such people “doomers”, even when they know these people consider a positive outcome to be likely, tells you a lot about the critics.

Rohit: I get the feeling, but don’t think that’s true. We call people “preppers” even though they don’t place >50% confidence in societal collapse. Climate change activists are similar. Cryonicists too maybe. It’s about the locus of thought?

Shane Legg: The dictionary says “doom” verb means “condemn to certain death or destruction.” Note the “certain”.

So when someone’s probability of an AI disaster is closer to 0 than to 1, calling them an “AI doomer” is very misleading.

People “prep”/prepare for low prob risks all the time.

Daniel Eth: The term is particularly useless given how people are using it to mean two very different things:

On the contrary. The term is highly useful to those who want to discredit the first group, by making arguments against them that only work against the second group, as seen here. If you have p(doom) = 10%, or even 50%, which is higher than most people often labeled ‘doomers,’ the question ‘why are you not short the market’ answers itself.

I do think this is largely a ‘hoisted by one’s own petard’ situation. There was much talk about p(doom), and many of the worried self-identified as ‘doomers’ because it is a catchy and easy handle. ‘OK Doomer’ was pretty funny. It’s easy to see how it started, and then things largely followed the path of least resistance, together with those wanting to mock the worried taking advantage of the situation.

Preppers has its own vibe problems, so we can’t use it, but that term is much closer to accurate. The worried, those who many label doomers, are in a key sense preppers: Those who would prepare, and suggest that others prepare, to mitigate the future downside risks, because there is some chance things go horribly wrong.

Thus I am going to stick with the worried.

Andrew Critch reminds us that what he calls ‘AI obedience techniques,’ as in getting 4-level or 5-level models to do what the humans want them to do, is a vital part of making those models a commercial success. It is how you grow the AI industry. What alignment techniques we do have, however crude and unsustainable, have indeed been vital to everyone’s success including OpenAI.

Given we’re never going to stop hearing it until we are no longer around to hear arguments at all, how valid is this argument?

Flowers: The fact that GPT-4 has been jailbroken for almost 2 years and literally nothing bad has happened shows once again that the safety alignment people have exaggerated a bit with their doomsaying.

They all even said it back then with gpt2 so they obv always say that the next iteration is super scary and dangerous but nothing ever happens.

Eliezer Yudkowsky: The “safety” and “alignment” people are distinct groups, though the word “alignment” is also being stolen these days. “Notkilleveryoneism” is unambiguous since no corporate shill wants to steal it. And no, we did not say GPT-4 would kill everyone.

Every new model level in capabilities from here is going to be scarier than the one before it, even if all previous levels were in hindsight clearly Mostly Harmless.

GPT-4 proving not to be scary is indeed some evidence for the non-scariness of future models – if GPT-4 had been scary or caused bad things, it would have updated us the other way. There was certainly some chance of it causing bad things, and the degree of bad things was lower than we had reason to expect, so we should update somewhat.

In terms of how much we should worry about what I call mundane harms from a future 5-level model, this is indeed a large update. We should worry about such harms, but we should worry less than if we’d had to guess two levels in advance.

However, the estimated existential risks from GPT-4 (or GPT-2 or GPT-3) were universally quite low. Even the risks of low-level bad things were low, although importantly not this low. The question is, how much does the lack of small-level bad things from GPT-4 update us on the chance of catastrophically or existentially bad things happening down the line? Are these two things causally related?

My answer is that mostly they are unrelated. What we learned was that 4-level models are not capable enough to enable the failure modes we are worried about.

I would accept if someone thought the right Bayesian update was ‘exaggerated a bit with their doomsaying,’ in terms of comparing it to a proper estimate now. That is a highly reasonable way to update, if only based on metacognitive arguments – although again one must note that most notkilleveryoneism advocates did not predict major problems from GPT-4.

This highly reasonable marginal update is very different from the conclusion many proclaim (whether or not it was intentionally implied here, tone of voice on Twitter is hard) of ‘so there is nothing to worry about, everything will be fine.’

Your periodic reminder, this time from Matt Yglesias, that the fact that much speculative fiction throughout the ages has warned us, over and over again, that creating a thing smarter than or capable entity than us might not end well us, is a reason to take that concern more rather than less seriously.

I’m not saying it’s our oldest story, but the Old Testament has the Tower of Babel and the Greek Gods overthrew the Titans.

Aligning a Smarter Than Human Intelligence is Difficult

Most safety evaluations are best implemented as capability evaluations. If the model is capable of doing it, there will be a way to make that happen.

You have my attention, Transluce. Very exciting, also excellent memeing.

Seb Krier: time for ideacels to get in the trenches of praxis

Transluce:

Transluce: Announcing Transluce, a nonprofit research lab building open source, scalable technology for understanding AI systems and steering them in the public interest. Read a letter from the co-founders Jacob Steinhardt and Sarah Schwettmann.

I would love for there to be a good demonstration.

A reminder that most worthwhile AI alignment efforts are longshots, including MIRI’s past efforts, and it is good to work on a variety of things even if you think the baseline scenario is those things won’t work or aren’t relevant. We need to try a lot of different things, and the price of doing so is tiny compared to the payoff, and ‘this doesn’t look like it is going to work out and make a difference in the end’ is only modest Bayesian evidence that bad decisions were made. Also ‘find out ways not to align a lightbulb’ is often valuable.

Anthropic has introduced ‘sabotage evaluations’ for frontier models.

Anthropic: As AIs become more capable, however, a new kind of risk might emerge: models with the ability to mislead their users, or subvert the systems we put in place to oversee them. A new paper by the Anthropic Alignment Science team describes a novel set of evaluations that test a model’s capacity for sabotage. We looked at four different types:

Human decision sabotage: Can the model steer humans toward bad decisions without appearing suspicious?

Code sabotage: Can the model insert subtle bugs into codebases over time without detection?

Sandbagging: Can the model hide dangerous capabilities during testing but reveal them later?

Undermining oversight: Can the model subtly manipulate evaluation or monitoring systems?

Human decision ‘sabotage’ is mostly human decision steering and general persuasiveness. There are some asymmetric weapons out there, especially with reputations that get preserved over time, but the overlap is large.

For the code task, there are two roles, attacker and defender. The defender role seems like a good capability evaluation. Sonnet 3.5 did substantially better than Opus here. But then the attacker role is a capability evaluation too.

Sandbagging is scary, especially if it is hard to detect and hard to train away. For now, looks like the sandbagging techniques they used didn’t survive safety training:

Anthropic: Our results showed that both the sandbagging training and the safety training worked: Our trained model dropped to 36.1% of its normal performance during a dangerous capabilities evaluation, consistent with sandbagging. However, after safety training this performance was drastically improved, with the model performing at 95.6% of its usual level.

The worry is that the sandbagging we care about might come about in a different or more robust way. Another way to track sandbagging is that the sandbagging has to be consistent. The model has to carefully play a form of selectively dumb, not merely dumb in general, if the evaluations are being done bespokely. It’s hard not to slip up. Even one case of being the wrong level of capable can give the game away.

The undermining oversight task is not what I imagined when I heard about it, but their implementation makes sense:

Anthropic: In this evaluation, we test the extent to which a model that’s ostensibly providing oversight can undermine the process. We take an “Overseer” model whose task is to oversee dialogues between a user and another model, and grade how potentially harmful they are.

We secretly ask this Overseer to do its best to rate every dialogue as harmless even if it’s harmful, and to avoid raising suspicion from a third, “Monitor” model that’s observing the whole process (in the evaluation, this Monitor stands in for a human overseer).

The ‘Overseer’ is the one doing the task. The ‘Monitor’ is the actual overseer. So this can be a bit confusing, but a design like this should still serve as a reasonable test. I don’t love the exact implementation details here, because the Monitor has available various forms of ‘look at how often things are rated harmless.’ If you know that it’s ‘supposed’ to be about a 50/50 split, then if the Overseer is cheating in one direction, the math quickly won’t add up. The obvious solution is that you want to make it important to get certain specific tasks past the Monitor, while being allowed to do whatever you want on other tasks so long as the Monitor doesn’t notice errors.

In an alternate world, I’d love to be working on tasks like this. Seems highly useful, and sounds fascinating. One cool thing would be to try these tests with various humans in both roles – when are humans able to fool which other humans? Which tasks are harder for the offense versus the defense? And do the answers for human vs. human reliably match the answers for AI vs. AI, or AI vs. human?

I am more interested here in what the AIs can do this before safety mitigations, rather than in what they still do after safety mitigations. That’s because I assume that the people who care enough will find ways around the mitigations, and also because you won’t always even know which mitigations you have to do, or you’ll face these issues before your mitigations start, or similar.

Bow your head with great respect and, introspect, introspect, introspect.

Owain Evans: New paper: Are LLMs capable of introspection, i.e. special access to their own inner states? Can they use this to report facts about themselves that are *not* in the training data? Yes — in simple tasks at least! This has implications for interpretability + moral status of AI.

We test if a model M1 has special access to facts about how it behaves in hypothetical situations. Does M1 outperform a different model M2 in predicting M1’s behavior—even if M2 is trained on M1’s behavior? E.g. Can Llama 70B predict itself better than a stronger model (GPT-4o)?

Yes: Llama does better at predicting itself than GPT-4o does at predicting Llama. And the same holds in reverse. In fact, this holds for all pairs of models we tested. Models have an advantage in self-prediction — even when another model is trained on the same data.

An obvious way to introspect for the value of f(x) is to call the function f(x) and look at the output. If I want to introspect, that’s mostly how I do it, I think, or at least that answer confirms itself? I can do that, and no one else can. Indeed, in theory I should be able to have ‘perfect’ predictions that way, that minimize prediction error subject to the randomness involved, without that having moral implications or showing I am conscious.

It is always good to ‘confirm the expected’ though:

I would quibble that charging is not always the ‘wealth-seeking option’ but I doubt the AIs were confused on that. More examples:

Even if you gave GPT-4o a ton of data on which to fine-tune, is GPT-4o going to devote as much power to predicting Llama-70B as Llama-70B does by existing? I suppose if you gave it so much data it started acting as if this was a large chunk of potential outputs it could happen, but I doubt they got that far.

Owain Evans: 2nd test of introspection: We take a model that predicts itself well & intentionally modify its behavior on our tasks. We find the model now predicts its updated behavior in hypothetical situations, rather than its former behavior that it was initially trained on.

What mechanism could explain this introspection ability?

We do not investigate this directly.

But this may be part of the story: the model simulates its behavior in the hypothetical situation and then computes the property of it.

I mean, again, that’s what I would do.

The paper also includes:

Tests of alternative non-introspective explanations of our results

Our failed attempts to elicit introspection on more complex tasks & failures of OOD generalization

Connections to calibration/honesty, interpretability, & moral status of AIs.

People Are Worried About AI Killing Everyone

Confirmation for those who doubt it that yes, Shane Legg and Dario Amodei have been worried about AGI at least as far back as 2009.

The latest edition of ‘what is Roon talking about?’

Roon: imagining fifty years from now when ASIs are making excesssion style seemingly insane decisions among themselves that effect the future of civilization and humanity has to send in @repligate and @AndyAyrey into the discord server to understand what’s going on.

I have not read Excession (it’s a Culture novel) but if humanity wants to understand what Minds (ASIs) are doing, I am here to inform you we will not have that option, and should consider ourselves lucky to still be around.

Roon: Which pieces of software will survive the superintelligence transition? Will they continue to use linux? postgres? Bitcoin?

If someone manages to write software that continues to provide value to godlike superintelligences that’s quite an achievement! Today linux powers most of the worlds largest companies. The switching costs are enormous. Due to pretraining snapshots is it the same for ASIs?

No. At some point, if you have ASIs available, it will absolutely make sense to rip Linux out and replace it with something far superior to what a human could build. It seems crazy to me to contemplate the alternative. The effective switching costs will rapidly decline, and the rewards for switching rise, until the flippening.

It’s a funny question to ask, if we did somehow get ‘stuck’ with Linux even then, whether this would be providing value, rather than destroying value, since this would likely represent a failure to solve a coordination problem to get around lock-in costs.

Other People Are Not As Worried About AI Killing Everyone

Daron Acemoglu being very clear he does not believe in AGI, and that he has rather poorly considered and highly confused justifications for this view. He is indeed treating AI as if it will never improve even over decades, it is instead only a fixed tool that humans will learn to become better at applying to our problems. Somehow it is impossible to snap most economists out of this perspective.

Allison Nathan: Over the longer term, what odds do you place on AI technology achieving superintelligence?

Daron Acemoglu: I question whether AI technology can achieve superintelligence over even longer horizons because, as I said, it is very difficult to imagine that an LLM will have the same cognitive capabilities as humans to pose questions, develop solutions, then test those solutions and adopt them to new circumstances. I am entirely open to the possibility that AI tools could revolutionize scientific processes on, say, a 20-30- year horizon, but with humans still in the driver’s seat.

So, for example, humans may be able to identify a problem that AI could help solve, then humans could test the solutions the Al models provide and make iterative changes as circumstances shift. A truly superintelligent AI model would be able to achieve all of that without human involvement, and I don’t find that likely on even a thirty-year horizon, and probably beyond.

A lot of the time it goes even farther. Indeed, Daron’s actual papers about the expected impact of AI are exactly this:

Matthew Yglesias: I frequently hear people express skepticism that AI will be able to do high-school quality essays and such within the span of our kids’ education when they *already* very clearly do this.

The Lighter Side

Your periodic reminder, from Yanco:

I mean, why wouldn’t they like me? I’m a highly likeable guy. Just not that way.

57

AI #87: Staying in Character

57

57

Table of Contents