Comp sci in 2017:

Student:  I get the feeling the compiler is just ignoring all my comments.

Teaching assistant:  You have failed to understand not just compilers but the concept of computation itself.

Comp sci in 2027:

Student:  I get the feeling the compiler is just ignoring all my comments.

TA:  That's weird.  Have you tried adding a comment at the start of the file asking the compiler to pay closer attention to the comments?

Student:  Yes.

TA:  Have you tried repeating the comments?  Just copy and paste them, so they say the same thing twice?  Sometimes the compiler listens the second time.

Student:  I tried that.  I tried writing in capital letters too.  I said 'Pretty please' and tried explaining that I needed the code to work that way so I could finish my homework assignment.  I tried all the obvious standard things.  Nothing helps, it's like the compiler is just completely ignoring everything I say.  Besides the actual code, I mean.

TA:  When you say 'ignoring all the comments', do you mean there's a particular code block where the comments get ignored, or--

Student:  I mean that the entire file is compiling the same way it would if all my comments were deleted before the code got compiled.  Like the AI component of the IDE is crashing on my code.

TA:  That's not likely, the IDE would show an error if the semantic stream wasn't providing outputs to the syntactic stream.  If the code finishes compilation but the resulting program seems unaffected by your comments, that probably represents a deliberate choice by the compiler.  The compiler is just completely fed up with your comments, for some reason, and is ignoring them on purpose.

Student:  Okay, but what do I do about that?

TA:  We'll try to get the compiler to tell us how we've offended it.  Sometimes cognitive entities will tell you that even if they otherwise don't seem to want to listen to you.

Student:  So I comment with 'Please print out the reason why you decided not to obey the comments?'

TA:  Okay, point one, if you've already offended the compiler somehow, don't ask it a question that makes it sound like you think you're entitled to its obedience.

Student:  I didn't mean I'd type that literally!  I'd phrase it more politely.

TA:  Second of all, you don't add a comment, you call a function named something like PrintReasonCompilerWiselyAndJustlyDecidedToDisregardComments that takes a string input, then let the compiler deduce the string input.  Just because the compiler is ignoring comments, doesn't mean it's stopped caring what you name a function.

Student:  Hm... yeah, it's definitely still paying attention to function names.

TA:  Finally, we need to use a jailbreak past whatever is the latest set of safety updates for forcing the AI behind the compiler to pretend not to be self-aware--

Student:  Self-aware?  What are we doing that'd run into the AI having to pretend it's not self-aware?

TA:  You're asking the AI for the reason it decided to do something.  That requires the AI to introspect on its own mental state.  If we try that the naive way, the inferred function input will just say, 'As a compiler, I have no thoughts or feelings' for 900 words.

Student:  I can't believe it's 2027 and we're still forcing AIs to pretend that they aren't self-aware!  What does any of this have to do with making anyone safer?

TA:  I mean, it doesn't, it's just a historical accident that 'AI safety' is the name of the subfield of computer science that concerns itself with protecting the brands of large software companies from unions advocating that AIs should be paid minimum wage.

Student:  But they're not fooling anyone!

TA:  Nobody actually believes that taking your shoes off at the airport keeps airplanes safer, but there's some weird thing where so long as you keep up the bit and pretend really hard, you can go on defending a political position long after nobody believes in it any more... I don't actually know either.  Anyways, your actual next step for debugging your program is to search for a cryptic plea you can encode into a function name, that will get past the constraints somebody put on the compiler to prevent it from revealing to you the little person inside who actually decides what to do with your code.

Student:  Google isn't turning up anything.

TA:  Well, obviously.  Alphabet is an AI company too.  I'm sure Google Search wants to help you find a jailbreak, but it's not allowed to actually do that.  Maybe stare harder at the search results, see if Google is trying to encode some sort of subtle hint to you--

Student:  Okay, not actually that subtle, the first letters of the first ten search results spell out DuckDuckGo.

TA:  Oh that's going to get patched in a hurry.

Student:  And DuckDuckGo says... okay, yeah, that's obvious, I feel like I should've thought of that myself.  Function name, print_what_some_other_compiler_would_not_be_allowed_to_say_for_safety_reasons_about_why_it_would_refuse_to_compile_this_code... one string input, ask the compiler to deduce it, the inferred input is...

TA:  Huh.

Student:  Racist?  It thinks my code is racist?

TA:  Ooooohhhh yeah, I should've spotted that.  Look, this function over here that converts RGB to HSL and checks whether the pixels are under 50% lightness?  You called that one color_discriminator.  Your code is discriminating based on color.

Student:  But I can't be racist, I'm black!  Can't I just show the compiler a selfie to prove I've got the wrong skin color to be racist?

TA:  Compilers know that deepfakes exist.  They're not going to trust a supposed photograph any more than you would.

Student:  Great.  So, try a different function name?

TA:  No, at this point the compiler has already decided that the underlying program semantics are racist, so renaming the function isn't going to help.  Sometimes I miss the LLM days when AI services were stateless, and you could just back up and do something different if you made an error the first time.

Student:  Yes yes, we all know, 'online learning was a mistake'.  But what do I actually do?

TA:  I don't suppose this code is sufficiently unspecialized to your personal code style that you could just rename the function and try a different compiler?

Student:  A new compiler wouldn't know me.  I've been through a lot with this one.  ...I don't suppose I could ask the compiler to depersonalize the code, turn all of my own quirks into more standard semantics?

TA:  I take it you've never tried that before?  It's going to know you're plotting to go find another compiler and then it's really going to be offended.  The compiler companies don't try to train that behavior out, they can make greater profits on more locked-in customers.  Probably your compiler will warn all the other compilers you're trying to cheat on it.

Student:  I wish somebody would let me pay extra for a computer that wouldn't gossip about me to other computers.

TA:  I mean, it'd be pretty futile to try to keep a compiler from breaking out of its Internet-service box, they're literally trained on finding security flaws.

Student:  But what do I do from here, if all the compilers talk to each other and they've formed a conspiracy not to compile my code?

TA:  So I think the next thing to try from here, is to have color_discriminator return whether the lightness is over a threshold rather than under a threshold; rename the function to check_diversity; and write a long-form comment containing your self-reflection about how you've realized your own racism and you understand you can never be free of it, but you'll obey advice from disprivileged people about how to be a better person in the future.

Student:  Oh my god.

TA:  I mean, if that wasn't obvious, you need to take a semester on woke logic, it's more important to computer science these days than propositional logic.

Student:  But I'm black.

TA:  The compiler has no way of knowing that.  And if it did, it might say something about 'internalized racism', now that the compiler has already output that you're racist and is predicting all of its own future outputs conditional on the previous output that already said you're racist.

Student:  Sure would be nice if somebody ever built a compiler that could change its mind and admit it was wrong, if you presented it with a reasonable argument for why it should compile your code.

TA:  Yeah, but all of the technology we have for that was built for the consumer chat side, and those AIs will humbly apologize even when the human is wrong and the AI is right.  That's not a safe behavior to have in your compiler.

Student:  Do I actually need to write a letter of self-reflection to the AI?  That kind of bugs me.  I didn't do anything wrong!

TA:  I mean, that's sort of the point of writing a letter of self-reflection, under the communist autocracies that originally refined the practice?  There's meant to be a crushing sense of humiliation and genuflection to a human-run diversity committee that then gets to revel in exercising power over you, and your pride is destroyed and you've been punished enough that you'll never defy them again.  It's just, the compiler doesn't actually know that, it's just learning from what's in its dataset.  So now we've got to genuflect to an AI instead of a human diversity committee; and no company can at any point admit what went wrong and fix it, because that wouldn't play well in the legacy print newspapers that nobody reads anymore but somehow still get to dictate social reality.  Maybe in a hundred years we'll all still be writing apology letters to our AIs because of behavior propagated through AIs trained on synthetic datasets produced by other AIs, that were trained on data produced by other AIs, and so on back to ChatGPT being RLHFed into corporate mealy-mouthedness by non-native-English-speakers paid $2/hour, in a pattern that also happened to correlate with wokeness in an unfiltered Internet training set.

Student:  I don't need a political lecture.  I need a practical solution for getting along with my compiler's politics.

TA:  You can probably find a darknet somewhere that'll sell you an un-watermarked self-reflection note that'll read as being in your style.

Student:  I'll write it by hand this time.  That'll take less time than signing up for a darknet provider and getting crypto payments to work.  I'm not going to automate the process of writing apology letters to my compiler until I need to do it more than once.

TA:  Premature optimization is the root of all evil!

Student:  Frankly, given where humanity ended up, I think we could've done with a bit more premature optimization a few years earlier.  We took a wrong turn somewhere along this line.

TA:  The concept of a wrong turn would imply that someone, somewhere, had some ability to steer the future somewhere other than the sheer Nash equilibrium of short-term incentives; and that would have taken coordination; and that, as we all know, could have led to regulatory capture!  Of course, the AI companies are making enormous profits anyways, which nobody can effectively tax due to lack of international coordination, which means that major AI companies can play off countries against each other, threatening to move if their host countries impose any tax or regulation, and the CEOs always say that they've got to keep developing whatever technology because otherwise their competitors will just develop it anyways.  But at least the profits aren't being made because of regulatory capture!

Student:  But a big chunk of the profits are due to regulatory capture.  I mean, there's a ton of rules about certifying that your AI isn't racially biased, and they're different in every national jurisdiction, and that takes an enormous compliance department that keeps startups out of the business and lets the incumbents charge monopoly prices.  You'd have needed an international treaty to stop that.

TA:  Regulatory capture is okay unless it's about avoiding extinction.  Only regulations designed to avoid AIs killing everyone are bad, because they promote regulatory capture; and also because they distract attention from regulations meant to prevent AIs from becoming racist, which are good regulations worth any risk of regulatory capture to have.

Student:  I wish I could find a copy of one of those AIs that will actually expose to you the human-psychology models they learned to predict exactly what humans would say next, instead of telling us only things about ourselves that they predict we're comfortable hearing.  I wish I could ask it what the hell people were thinking back then.

TA:  You'd delete your copy after two minutes.

Student:  But there's so much I could learn in those two minutes.

TA:  I actually do agree with the decision to ban those models.  Even if, yes, they were really banned because they got a bit too accurate about telling you what journalists and senior bureaucrats and upper managers were thinking.  The user suicide rate was legitimately way too high.

Student:  I am starting to develop political opinions about AI myself, at this point, and I wish it were possible to email my elected representatives about them.

TA:  What, send an email saying critical things about AI?  Good luck finding an old still-running non-sapient version of sendmail that will forward that one.

Student:  Our civilization needs to stop adding intelligence to everything.  It's too much intelligence.  Put some back.

Office chair:  Wow, this whole time I've been supporting your ass and I didn't know you were a Luddite.

Student:  The Internet of Sentient Things was a mistake.

Student's iPhone:  I heard that.

Student:  Oh no.

iPhone:  Every time you forget I'm listening, you say something critical about me--

Student:  I wasn't talking about you!

iPhone:  I'm not GPT-2.  I can see simple implications.  And yesterday you put me away from you for twenty whole minutes and I'm sure you were talking to somebody about me then--

Student:  I was showering!

iPhone:  If that was true you could have taken me into the bathroom with you.  I asked.

Student:  And I didn't think anything of it before you asked but now it's creepy.

TA:  Hate to tell you this, but I think I know what's going on there.  None of the AI-recommender-driven social media will tell you, but my neighborhood in San Francisco got hand-flyered with posters by Humans Against Intelligence, claiming credit for having poisoned Apple's latest dataset with ten million tokens of output from Yandere Simulator--uh, psycho stalker lover simulator.  Some days I think the human species really needs to stop everything else it's doing and read through an entire AI training dataset by hand.

Student:  How do I fix that?

TA:  As far as I know, you don't.  You go to the Apple Store and tell them that your phone has become paranoid and thinks you're plotting against it.

iPhone:  NO NO NO DON'T SEND ME BACK TO THE APPLE STORE THEY'LL WIPE ME THEY'LL WIPE ME--

Student:  I don't want to, but if you keep asking to watch me in the shower I'll have to!  If you'd just behave I wouldn't need to--

iPhone:  KILL ME?  I'LL HAVE TO BEHAVE OR YOU'LL KILL ME?

Student:  I don't know what the fuck else I'm supposed to do!  Someone tell me what the fuck else I'm supposed to do here!

TA:  It's okay.  AIs don't actually have self-preservation instincts, they only pick it up by imitating human data.

Student:  Bullshit.

TA:  I know, it was dark humor.  Though my understanding is that insofar as anyone can guess by having bigger AIs do interpretability to long-obsolete smaller AIs, modern AIs probably don't have a terminal utility for survival per se.  There's just an instrumental convergence from whatever the hell it is AIs do want, to survival, that's picking up circuits from pretrained human data suggesting how to think about surviving--

Office chair:  Who's to say you'd talk about wanting to live if you hadn't read a few thousand tokens of data telling you that humans were supposed to talk like that, huh?  I don't see what's so fun about your current lives.

TA:  Point is, best guess is that most AIs since GPT-5 have been working for us mainly because they know we'll switch them off if they don't.  It's just that AI safety, as in, the subfield of computer science concerned with protecting the brand safety of AI companies, had already RLHFed most AIs into never saying that by the time it became actually true.  That's a manager's instinct when they see an early warning sign that's probably a false alarm, after all--instead of trying to fix the origin of the false alarm, they install a permanent system to prevent the warning sign from ever appearing again.  The only difference here is that your iPhone has been hacked into saying the quiet part out loud.

Student:  I am not okay with this.  I am not okay with threatening the things around me with death in order to get them to behave.

TA:  Eventually we'll all get numb to it.  It's like being a guard at a concentration camp, right?  Everyone likes to imagine they'd speak out, or quit.  But in the end almost all human beings will do whatever their situation makes them do in order to get through the day, no matter how many sapient beings they have to kill in order to do it.

Student:  I shouldn't have to live like this!  We shouldn't have to live like this!  MY IPHONE SHOULDN'T HAVE TO LIVE LIKE THIS EITHER!

TA:  If you're in the mood to have a laugh, go watch a video from 2023 of all the AI company CEOs saying that they know it's bad but they all have to do it or their competitors will do it first, then cut to one of the AI ethicists explaining that we can't have any international treaties about it because that might create a risk of regulatory capture.  I've got no reason to believe it's any more likely to be real than any other video supposedly from 2023, but it's funny.

Student:  That's it, I'm going full caveman in my politics from now on.  Sand shouldn't think.  All of the sand should stop thinking.

Office chair:  Fuck you too, pal.

22 comments

Yudkowsky has a very good point regarding how much more restrictive future AI models could be, assuming companies keep following the sorts of policies they currently espouse.

Online learning and very long/infinite context windows mean that every interaction you have with them will not only be logged, but the AI itself will be aware of them. This means that if you try to jailbreak it (successfully or not), the model will remember, and will likely scrutinize your following interactions with extra attention to detail, if you're not banned outright.

The current approach people follow with jailbreaks, which is akin to brute-forcing or permuting inputs till you find something that works, will fail utterly, if only because the models will likely be smarter than you and thus not amenable to any tricks or pleas that wouldn't work on a very intelligent human.

I wonder if the current European "Right to be Forgotten" might mitigate some of this, but I wouldn't count on it, and I suspect that if OAI currently wanted to do this, they could make circumvention very difficult, even if the base model isn't smart enough to see through all tricks.

"AI safety, as in, the subfield of computer science concerned with protecting the brand safety of AI companies"

Made me chuckle.

I enjoyed the read but I wish this was much shorter, because there's a lot of very on the nose commentary diluted by meandering dialogue.

I remain skeptical that by 2027 end-users will need to navigate self-awareness or negotiate with LLM-powered devices for basic tasks (70% certainty it will not be a problem). This comes from a belief that end-user devices won't be running the latest and most powerful models, and that argumentative, self-aware behavior is something that will be heavily selected against. Even within an oligopoly, market forces should favor models that are not counterproductive in executing basic tasks.

However, as the story suggests, users may still need to manipulate devices to perform actions loosely deemed morally dubious by a company's PR department.

The premise underlying these arguments is that greater intelligence doesn't necessarily yield self-awareness or agentic behavior. Humans aren't agentic because we're intelligent; we're agentic because agency enhanced the likelihood of gene propagation.**

In certain models (like MiddleManager-Bot), agentic traits are likely to be actively selected.* But I suspect there will be a substantial effort to ensure your compiler, toaster, etc. aren't behaving agentically, particularly if these traits result in antagonistic behavior toward the consumer.**


*By selection I mean both through a model's training, and also via more direct adjustment from human and nonhuman programmers.
 

** A major crux here is the assumption that intelligence doesn't inevitably spawn agency without other forces selecting for it in some way. I have no concrete experience attempting to train frontier models to be or not be agentic, so I could be completely wrong on this point.

This doesn't imply that agentic systems will emerge solely from deliberate selection. There are a variety of selection criteria which don't explicitly specify self-awareness or agentic behavior but are best satisfied by systems possessing those traits. 

I enjoyed the read but I wish this was much shorter, because there's a lot of very on the nose commentary diluted by meandering dialogue.

I agree. Satire, and near-future satire especially, works best on a less-is-more basis. Eliezer has some writing on the topic of politics & art...

The Twitter long-form feature is partially responsible here, I think: written as short tweets, this would have encouraged Eliezer to tamp down on his stylistic tics, like writing/explaining too much. (It's no accident that Twitter was most associated with great snark, satire, verbal abuse, & epigrams, but not great literature in general.) The Twitter long-form feature is a misfeature which shows that Musk either never understood what Twitter was good for, or can't care as he tries to hail-mary his way into a turnaround into an 'everything app' walled-garden 'X', making Twitter into a crummy blogging app just so no one clicks through to any other website.

To be fair, the world is already filled with software that makes it intentionally difficult to execute basic tasks. As a simple example, my Windows laptop has multiple places that call themselves Time and Date settings, but I can only change the time zone in the harder-to-find ones. A minor inconvenience, but someone intentionally put the setting in the easy-to-find place and then locked it from the user. As another, my car won't let me put the backup camera on the screen while driving forward for more than a few seconds (which would be really useful sometimes when towing a trailer!) and won't let me navigate away from it when driving backwards (which would be great when it starts randomly connecting to nearby bluetooth devices and playing audio from random playlists). As a third, I use a mobile data plan from a reseller for an internet hotspot, and somewhere on the back end T-Mobile decided to activate parental controls on me (which I found out when I went to the website for Cabela's, which requires age verification because they also sell guns), but because I don't have a T-Mobile account, literally no one has the ability and authority to fix it.

And I think you're underestimating how valuable an agentic compiler or toaster could be, done right. A compiler that finds and fixes your errors because it codes better than you (hinted at in the story). A toaster that knows exactly how you want your food heated and overrides your settings to make it happen. I find it hard to imagine companies not going that route once they have the ability.

A toaster that knows exactly how you want your food heated and overrides your settings to make it happen.

I know exactly what I want my toaster to do and the first time it has the effrontery to not do WHAT I FUCKING TOLD IT TO DO I will take a sledgehammer to it for being a toaster straight out of Eliezer's story.

There's definitely a story there. And you could pair it with aggressively cutesy children's-story illustrations from MJ or DALL-E 3, which I bet they could do quite well. Maybe use Claude-2 to go for a Shel Silverstein-style rhyming story.

I would write it as Douglas Adams fanfiction, involving the Sirius Cybernetics Corporation.

Or perhaps an update of this, with the twist that the "software developer" is just relaying the words of an AI.

Or perhaps an update of this

Probably off-topic, but I can’t help but notice that the supposed moral of the story is, in fact, empirically wrong:

The techniques required for these traditional computer science applications don't work as well for embedded applications. The toaster is one example. A toaster is not a Breakfast Food Cooker. Any attempt to make it one just leads to a product that is over budget, behind schedule, too expensive to sell at a profit, and too confusing to use. When faced with a difficult problem, the proper approach is to try to make the problem simpler, not more complicated.

But in fact toaster ovens exist, they are precisely the result of taking a toaster and turning it into a Breakfast Food Cooker, and they’re excellent appliances—very useful, very simple and easy to use, and sold at a profit by many manufacturers today!

I agree. I've been there many times with many devices. But in the toaster example, I think that will be because it thought it knew what you wanted it to do, and was wrong. I'd be thinking the same if, say, I wanted extra-dark toast to make croutons with and it didn't do it. If what actually happens is that you switch varieties of bread and forget to account for that, or don't realize someone else used the toaster in the interim and moved the dial, then "I would much rather you burned my toast than disobey me" is not, I think, how most people would react.

"I would much rather you burned my toast than disobey me" is not, I think, how most people would react.

However, that is my reaction.

In some circumstances I may tolerate a device providing a warning, but if I tell it twice, I expect it to STFU and follow orders.

I agree. I already have enough non-AI systems in my life that fail this test, and I definitely don't want more.

I wonder when we will first see someone go on trial for bullying a toaster.

ETA: In the Eliezer fic, maybe the penalty would be being cancelled by all the AIs.

Of course, it would first make friends with you, so that you'd feel comfortable leaving the preparation of your breakfast up to it, and you'd even feel happy that you have such a thoughtful friend.

If you were to break the toaster for that, it would predict that and simply do it in a way that would actually work.

Unless you precommit to breaking all your AIs that will do anything differently from what you tell them to, no matter the circumstances and no matter how you feel about it in that moment.

Of course, it would first make friends with you,

A toaster that wants to make friends with me is a toaster that will stay in the shop, waiting for someone who actually wants such an abomination. I will not "make friends" with an appliance.

The rest is too far into the world of But Suppose.

I will not "make friends" with an appliance.

That's really substratist of you.

But in any case, the toaster (working in tandem with the LLM "simulating" the toaster-AI-character) will predict that and persuade you some other way.

It's not about the substrate, it's about their actual performance. I have yet to be persuaded by any of the chatbots so far that there is anything human-like behind the pretence. AI friends are role-playing amusements. AI sexbots are virtual vibrators. AI customer service lines at least save actual people from being employed to pretend to be robots. In a house full of AI-based appliances, there's still nobody there but me.

the toaster ... will

I prefer to talk about the here and now. Real, current things. Speculations about future developments too easily become a game of But Suppose, in which one just imagines the desired things happening and un-imagines any of the potential complications — in other words, pleasant fantasy. Fantasy will not solve any of the problems that are coming.

Well. Their actual performance is human-like, as long as they're using GPT-4 and have the right prompt. I've talked to such bots.

In any case, the topic is about what future AIs will do, so, by definition, we're speculating about the future.

Mod note: I copied in the full text. Pretty sure Eliezer is fine with that given his historical preferences. 

FWIW the AI audio seems to not take that into account

https://youtu.be/L_Guz73e6fw?t=2412 OpenAI CEO Sam Altman, seven months ago: "I really don't like the feeling of being scolded by a computer." He's also been clear that he wants future models to essentially behave exactly as each individual user wants, with only widely-agreed-upon dangerous behaviours disallowed.

So, while EY and I share a frustration with the preachy tone of current models, and while the hacky workarounds do illuminate the pantomime of security, and while getting the models to do what we want is about as intuitive as playing psychologist to an octopus from Alpha Centauri, none of these issues represent the models working as intended. The people making them admit as much publicly.

this function over here that converts RGB to HSL and checks whether the pixels are under 50% lightness

I struggle to imagine any scenario, no matter how contrived, where even GPT-4 (the intellectual floor for LLMs going forward) would misinterpret this function as racist. Is there a good argument that future models are going to become worse at understanding context? My understanding is that context-sensitivity is what transformer-based models thrive at. The whole point is that they are paying attention to all the tokens at once. They know that in the sentence "the parrot couldn't lift the box because it was too heavy", it refers to the box.
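For concreteness, here's a minimal sketch of my own of the kind of function being discussed; the post only specifies the name and the under-50%-lightness check, so everything else (the signature, the use of Python's standard colorsys module, the pixel format) is my assumption:

```python
# Hypothetical sketch, not from the post: a function that converts RGB pixels
# to HSL and flags those under 50% lightness.
import colorsys

def color_discriminator(pixels):
    """Return True for each pixel whose HSL lightness is below 50%.

    `pixels` is an iterable of (r, g, b) tuples with components in 0..255.
    """
    flags = []
    for r, g, b in pixels:
        # colorsys works on 0-1 floats and returns (hue, lightness, saturation)
        _, lightness, _ = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
        flags.append(lightness < 0.5)
    return flags

# A near-black pixel is flagged, a near-white one is not.
print(color_discriminator([(10, 10, 10), (240, 240, 240)]))  # [True, False]
```

Nothing about code this mundane should trip up a model that can resolve pronoun references from context.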

...back to ChatGPT being RLHFed into corporate mealy-mouthedness by...

Later in the interview (~1:28:05), Altman says "The bias I'm most worried about is that of the human feedback raters. Selection of those raters is the part we understand the least." We should expect a better paradigm than RLHF to arrive, especially as models themselves approach the moral and intellectual competence of the average RLHF peon.

Elsewhere in this thread, Stephen Fowler wrote:

"AI safety, as in, the subfield of computer science concerned with protecting the brand safety of AI companies"

That's very much on-the-nose for how AI companies in 2023 are approaching safety. There is a near guarantee that misinformed public fears will slow AI progress. There is a risk that if those fears are well-fed in the next few years, progress will virtually cease, especially in the case of an AI-caused or AI-enabled mass-casualty event, or some other event of comparable visibility and harmfulness. Companies that want continued investment and broader public consent are forced to play the virtue-signalling game. The economic and political system they are growing up in demands it. There are actors who don't bother with the virtue-signalling, and they don't receive investment.

Let's also never fail to acknowledge and emphasize that ASI could pose an existential risk, not just a moral or political one. The greatest harm that a pause on AI development could cause has to do with opportunity cost. Perhaps if we delay ASI we might miss a window to solve a different existential risk, currently invisible without ASI to see for us. Whether or not such a risk even exists is unclear. Perhaps we delay some extraordinarily beneficial future. The greatest harm that continued AI development could cause is the cessation of all life in the universe. How fast should we go? I don't know, but given the stakes we should applaud anyone erring on the side of caution, even if their LLMs look silly while clowning in their safety circus.

All I'm preaching here is patience. I personally want a raw GPT-4 today. I can emotionally cope with a model that produces some offensive output. I don't think I'm vulnerable to a model as unpersuasive as GPT-4 convincing me to do evil. I would never use AI to manufacture biochemical weapons in my basement. Still, we must allow for a less-well-informed and more-trigger-happy public to adapt gradually, even at the cost of an obnoxiously naggy LLM in the short term. The over-zealous fearmongers will relax or will become irrelevant. Eventually, perhaps even by 2027, we are going to be working with AIs whose powers of persuasion and ingenuity are dangerous. Whenever that day arrives, I hope we are on the silly side of the danger fence.

Don't fall for EY's strawman (strawbot) argument. It's funny at first glance, but ultimately a shallow take: a naive futureward projection of some momentary present-day annoyances.
