Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

TL;DR—We’re distributing $20k in total as prizes for submissions that make effective arguments for the importance of AI safety. The goal is to generate short-form content for outreach to policymakers, management at tech companies, and ML researchers. This competition will be followed by another competition in around a month that focuses on long-form content.

This competition is for short-form arguments for the importance of AI safety. For the competition for distillations of posts, papers, and research agendas, see the Distillation Contest.

Objectives of the arguments

To mitigate AI risk, it’s essential that we convince relevant stakeholders sooner rather than later. To this end, we are initiating a pair of competitions to build effective arguments for a range of audiences. In particular, our audiences include policymakers, tech executives, and ML researchers.

  • Policymakers may be unfamiliar with the latest advances in machine learning, and may not have the technical background necessary to understand some/most of the details. Instead, they may focus on societal implications of AI as well as which policies are useful.
  • Tech executives are likely aware of the latest technology, but lack a mechanistic understanding. They may come from technical backgrounds and are likely highly educated. They will likely be reading with an eye towards how these arguments concretely affect which projects they fund and who they hire.
  • Machine learning researchers can be assumed to have high familiarity with the state of the art in deep learning. They may have previously encountered talk of x-risk but were not compelled to act. They may want to know how the arguments could affect what they should be researching.

We’d like arguments to be written for at least one of the three audiences listed above. Some arguments could speak to multiple audiences, but we expect that trying to speak to all at once could be difficult. After the competition ends, we will test arguments with each audience and collect feedback. We’ll also compile top submissions into a public repository for the benefit of the x-risk community.

Note that we are not interested in arguments for very specific technical strategies towards safety. We are simply looking for sound arguments that AI risk is real and important. 

Competition details

The present competition addresses shorter arguments (paragraphs and one-liners) with a total prize pool of $20K. The prizes will be split among, roughly, 20-40 winning submissions. Please feel free to make numerous submissions and try your hand at motivating various different risk factors; it's possible that an individual with multiple great submissions could win a good fraction of the prize. The prize distribution will be determined by effectiveness and epistemic soundness as judged by us. Arguments must not be misleading.

To submit an entry: 

  • Please leave a comment on this post (or submit a response to this form), including:
    • The original source, if not original. 
    • If the entry contains factual claims, a source for the factual claims.
    • The intended audience(s) (one or more of the audiences listed above).
  • In addition, feel free to adapt another user’s comment by leaving a reply⁠⁠—prizes will be awarded based on the significance and novelty of the adaptation. 

Note that if two entries are extremely similar, we will, by default, give credit to the entry which was posted earlier. Please do not submit multiple entries in one comment; if you want to submit multiple entries, make multiple comments.

The first competition will run until May 27th, 11:59 PT. In around a month, we’ll release a second competition for generating longer “AI risk executive summaries'' (more details to come). If you win an award, we will contact you via your forum account or email. 


We are soliciting argumentative paragraphs (of any length) that build intuitive and compelling explanations of AI existential risk.

  • Paragraphs could cover various hazards and failure modes, such as weaponized AI,  loss of autonomy and enfeeblement, objective misspecification, value lock-in, emergent goals, power-seeking AI, and so on.
  • Paragraphs could make points about the philosophical or moral nature of x-risk.
  • Paragraphs could be counterarguments to common misconceptions.
  • Paragraphs could use analogies, imagery, or inductive examples.
  • Paragraphs could contain quotes from intellectuals: “If we continue to accumulate only power and not wisdom, we will surely destroy ourselves” (Carl Sagan), etc.

For a collection of existing paragraphs that submissions should try to do better than, see here.

Paragraphs need not be wholly original. If a paragraph was written by or adapted from somebody else, you must cite the original source. We may provide a prize to the original author as well as the person who brought it to our attention.


Effective one-liners are statements (25 words or fewer) that make memorable, “resounding” points about safety. Here are some (unrefined) examples just to give an idea:

  • Vladimir Putin said that whoever leads in AI development will become “the ruler of the world.” (source for quote)
  • Inventing machines that are smarter than us is playing with fire.
  • Intelligence is power: we have total control of the fate of gorillas, not because we are stronger but because we are smarter. (based on Russell)

One-liners need not be full sentences; they might be evocative phrases or slogans. As with paragraphs, they can be arguments about the nature of x-risk or counterarguments to misconceptions. They do not need to be novel as long as you cite the original source.

Conditions of the prizes

If you accept a prize, you consent to the addition of your submission to the public domain. We expect that top paragraphs and one-liners will be collected into executive summaries in the future. After some experimentation with target audiences, the arguments will be used for various outreach projects.

(We thank the Future Fund regrant program and Yo Shavit and Mantas Mazeika for earlier discussions.)

In short, make a submission by leaving a comment with a paragraph or one-liner. Feel free to enter multiple submissions. In around a month we'll divide 20K to award the best submissions.


Ω 13

193 comments, sorted by Click to highlight new comments since: Today at 4:31 PM
New Comment
Some comments are truncated due to high volume. (⌘F to expand all)Change truncation settings

I'd like to complain that this project sounds epistemically absolutely awful. It's offering money for arguments explicitly optimized to be convincing (rather than true), it offers money only for prizes making one particular side of the case (i.e. no money for arguments that AI risk is no big deal), and to top it off it's explicitly asking for one-liners.

I understand that it is plausibly worth doing regardless, but man, it feels so wrong having this on LessWrong.

If the world is literally ending, and political persuasion seems on the critical path to preventing that, and rationality-based political persuasion has thus far failed while the empirical track record of persuasion for its own sake is far superior, and most of the people most familiar with articulating AI risk arguments are on LW/AF, is it not the rational thing to do to post this here?

I understand wanting to uphold community norms, but this strikes me as in a separate category from “posts on the details of AI risk”. I don’t see why this can’t also be permitted.

TBC, I'm not saying the contest shouldn't be posted here. When something with downsides is nonetheless worthwhile, complaining about it but then going ahead with it is often the right response - we want there to be enough mild stigma against this sort of thing that people don't do it lightly, but we still want people to do it if it's really clearly worthwhile. Thus my kvetching.

(In this case, I'm not sure it is worthwhile, compared to some not-too-much-harder alternative. Specifically, it's plausible to me that the framing of this contest could be changed to not have such terrible epistemics while still preserving the core value - i.e. make it about fast, memorable communication rather than persuasion. But I'm definitely not close to 100% sure that would capture most of the value.

Fortunately, the general policy of imposing a complaint-tax on really bad epistemics does not require me to accurately judge the overall value of the proposal.)

This comment thread did convince me to put it on personal blog (previously we've frontpaged writing-contents and went ahead and unreflectively did it for this post)
3Yonadav Shavit20d
I don't understand the logic here? Do you see it as bad for the contest to get more attention and submissions?

No, it's just the standard frontpage policy:

Frontpage posts must meet the criteria of being broadly relevant to LessWrong’s main interests; timeless, i.e. not about recent events; and are attempts to explain not persuade.

Technically the contest is asking for attempts to persuade not explain, rather than itself attempting to persuade not explain, but the principle obviously applies.

As with my own comment, I don't think keeping the post off the frontpage is meant to be a judgement that the contest is net-negative in value; it may still be very net positive. It makes sense to have standard rules which create downsides for bad epistemics, and if some bad epistemics are worthwhile anyway, then people can pay the price of those downsides and move forward.

Raemon and I discussed whether it should be frontpage this morning. Prizes are kind of an edge case in my mind. They don't properly fulfill the frontpage criteria but also it feels like they deserve visibility in a way that posts on niche topics don't, so we've more than once made an exception for them.

I didn't think too hard about the epistemics of the post when I made the decision to frontpage, but after John pointed out the suss epistemics, I'm inclined to agree, and concurred with Raemon moving it back to Personal.


I think the prize could be improved simply by rewarding the best arguments in favor and against AI risk. This might actually be more convincing to the skeptics – we paid people to argue against this position and now you can see the best they came up with.

4Not Relevant20d
I'm all for improving the details. Which part of the framing seems focused on persuasion vs. "fast, effective communication"? How would you formalize "fast, effective communication" in a gradeable sense? (Persuasion seems gradeable via "we used this argument on X people; how seriously they took AI risk increased from A to B on a 5-point scale".)
4Liam Donovan20d
Maybe you could measure how effectively people pass e.g. a multiple choice version of an Intellectual Turing Test (on how well they can emulate the viewpoint of people concerned by AI safety) after hearing the proposed explanations. [Edit: To be explicit, this would help further John's goals (as I understand them) because it ideally tests whether the AI safety viewpoint is being communicated in such a way that people can understand and operate the underlying mental models. This is better than testing how persuasive the arguments are because it's a) more in line with general principles of epistemic virtue and b) is more likely to persuade people iff the specific mental models underlying AI safety concern are correct. One potential issue would be people bouncing off the arguments early and never getting around to building their own mental models, so maybe you could test for succinct/high-level arguments that successfully persuade target audiences to take a deeper dive into the specifics? That seems like a much less concerning persuasion target to optimize, since the worst case is people being wrongly persuaded to "waste" time thinking about the same stuff the LW community has been spending a ton of time thinking about for the last ~20 years]
Ah, instrumental and epistemic rationality clash again
Think of it as a "practicing a dark art of rationality" post, and I'd think it would seem less off-putting.
4Ben Pace19d
I think it would be less "off-putting" if we had common knowledge of it being such a post. I think the authors don't think of it as that from reading Sidney's comment.
7Sidney Hough20d
Hey John, thank you for your feedback. As per the post, we’re not accepting misleading arguments. We’re looking for the subset of sound arguments that are also effective. We’re happy to consider concrete suggestions which would help this competition reduce x-risk.
Thanks for being open to suggestions :) Here's one: you could award half the prize pool to compelling arguments against AI safety. That addresses one of John's points. For example, stuff like "We need to focus on problems AI is already causing right now, like algorithmic fairness" would not win a prize, but "There's some chance we'll be better able to think about these issues much better in the future once we have more capable models that can aid our thinking, making effort right now less valuable" might.

That idea seems reasonable at first glance, but upon reflection, I think it's a really bad idea. It's one thing to run a red-teaming competition, it's another to spend money building rhetorically optimised tools for the other side. If we do that, then maybe there was no point running the competition in the first place as it might all cancel out.

This makes sense if you assume things are symmetric. Hopefully there's enough interest in truth and valid reasoning that if the "AI is dangerous" conclusion is correct, it'll have better arguments on its side.
4Sidney Hough20d
Thanks for the idea, Jacob. Not speaking on behalf of the group here - but my first thought is that enforcing symmetry on discussion probably isn't a condition for good epistemics, especially since the distribution of this community's opinions is skewed. I think I'd be more worried if particular arguments that were misleading went unchallenged, but we'll be vetting submissions as they come in, and I'd also encourage anyone who has concerns with a given submission to talk with the author and/or us. My second thought is that we're planning a number of practical outreach projects that will make use of the arguments generated here - we're not trying to host an intra-community debate about the legitimacy of AI risk - so we'd ideally have the prize structure reflect the outreach value for which arguments are responsible. I'm potentially up to opening the contest to arguments for or against AI risk, and allowing the distribution of responses to reflect the distribution of the opinions of the community. Will discuss with the rest of the group.
3Thomas Kwa19d
It seems better to award some fraction of the prize pool to refutations of the posted arguments. IMO the point isn't to be "fair to both sides", it's to produce truth.

Wait, the goal here, at least, isn't to produce truth, it is to disseminate it. Counter-arguments are great, but this isn't about debating the question, it's about communicating a conclusion well.

This is PR, not internal epistemics, if I’m understanding the situation correctly.
7Conor Sullivan20d
Most movements (and yes, this is a movement) have multiple groups of people, perhaps with degrees in subjects like communication, working full time coming up with slogans, making judgments about which terms to use for best persuasiveness, and selling the cause to the public. It is unusual for it to be done out in the open, yes. But this is what movements do when they have already decided what they believe and now have policy goals they know they want to achieve. It’s only natural.

You didn't refute his argument at all, you just said that other movements do the same thing. Isn't the entire point of rationality that we're meant to be truth-focused, and winning-focused, in ways that don't manipulate others? Are we not meant to hold ourselves to the standard of "Aim to explain, not persuade"? Just because others in the reference class of "movements" do something doesn't mean it's immediately something we should replicate! Is that not the obvious, immediate response? Your comment proves too much; it could be used to argue for literally any popular behavior of movements, including canceling/exiling dissidents. 

Do I think that this specific contest is non-trivially harmful at the margin? Probably not. I am, however, worried about the general attitude behind some of this type of recruitment, and the justifications used to defend it. I become really fucking worried when someone raises an entirely valid objection, and is met with "It's only natural; most other movements do this".

To the extent that rationality has a purpose, I would argue that it is to do what it takes to achieve our goals, if that includes creating "propaganda", so be it. And the rules explicitly ask for submissions not to be deceiving, so if we use them to convince people it will be a pure epistemic gain. Edit: If you are going to downvote this, at least argue why. I think that if this works like they expect, it truly is a net positive.
Fair. Should've started with that. I think there's a difference between "rationality is systematized winning" and "rationality is doing whatever it takes to achieve our goals". That difference requires more time to explain than I have right now. I think that the whole AI alignment thing requires extraordinary measures, and I'm not sure what specifically that would take; I'm not saying we shouldn't do the contest. I doubt you and I have a substantial disagreement as to the severity of the problem or the effectiveness of the contest. My above comment was more "argument from 'everyone does this' doesn't work", not "this contest is bad and you are bad". Also, I wouldn't call this contest propaganda. At the same time, if this contest was "convince EAs and LW users to have shorter timelines and higher chances of doom", it would be reacted to differently. There is a difference, convincing someone to have a shorter timeline isn't the same as trying to explain the whole AI alignment thing in the first place, but I worry that we could take that too far. I think that (most of) the responses John's comment got were good, and reassure me that the OPs are actually aware of/worried about John's concerns. I see no reason why this particular contest will be harmful, but I can imagine a future where we pivot to mainly strategies like this having some harmful second-order effects (which need their own post to explain).

It's unreasonable to look at AI and think "in order to make it more predictable, more controllable, we just have to make it smarter". When individual humans become smarter, they become much less predictable. And the human mind is built to predict the behavior of other humans.

The last sentence can be cut off for simplicity, most of the point isin the first two sentences.

Random side note: GPT-3 seems to be able to generate decent one liners.

Generate one-liners describing the dangers of AI. An example is "Inventing machines that are smarter than us is playing with fire."

1. Machines that are smarter than us could easily become our masters.

2. If we're not careful, AI could spell the end of humanity as we know it.

3. AI could be used to create weapons of mass destruction that could devastate the planet.

4. AI could be used to create powerful robots that could enslave humans.

5. AI could be used to create artificial intelligence that could turn against its creators

6. AI usher in a new era of cyber-warfare that could cripple society

7. AI could create self-replicating robots that could eventually consume all resources on Earth

8. AI could be used to create powerful AIs that could eventually surpass human intelligence and take over the world

9. AI technology could eventually be used to create a global surveillance state where everyone is constantly watched and monitored

6Mitchell Reynolds20d
I had a similar thought to prompt GPT-3 for one liners or to summarize some article (if available). I think involving the community to write 500-1000 winning submissions would have the positive externality of non-winners to distill/condense their views. My exploratory idea is that this would be instrumentally useful when talking with those new to AI x-risk topics.
We could also prompt GPT-3 with the results ;)
Good idea! I could imagine doing something similar with images generated by DALL-E.
That's a very good idea, I think one limitation of most AI arguments is that they seem to lack urgency. GAI seems like it's a hundred years away at least, and showing the incredible progress we've already seen might help to negate some of that perception.
1. Machines that are smarter than us could easily become our masters. [All it takes is a single glitch, and they will outsmart us the same way we outsmart animals.] 2. If we're not careful, AI could spell the end of humanity as we know it. [Artificial intelligence improves itself at an exponential pace, so if it speeds up there is no guarantee that it will slow down until it is too late.] 3. AI could be used to create weapons of mass destruction that could devastate the planet. x 4. AI could be used to create powerful robots that could enslave humans. x 5. AI could one day be used to create artificial intelligence [an even smarter AI system] that could turn against its creators [if it becomes capable of outmaneuvering humans and finding loopholes in order to pursue it's mission.] 6. AI usher in a new era of cyber-warfare that could cripple society x 7. AI could create self-replicating robots that could eventually consume all resources on Earth x 8. AI could [can one day] be used to create [newer, more powerful] AI [systems] that could eventually surpass human intelligence and take over the world [behave unpredictably]. 9. AI technology could eventually be used to create a global surveillance state where everyone is constantly watched and monitored x
very slight modification of Scott’s words to produce a more self-contained paragraph:

The technology [of lethal autonomous drones], from the point of view of AI, is entirely feasible. When the Russian ambassador made the remark that these things are 20 or 30 years off in the future, I responded that, with three good grad students and possibly the help of a couple of my robotics colleagues, it will be a term project [six to eight weeks] to build a weapon that could come into the United Nations building and find the Russian ambassador and deliver a package to him.

-- Stuart Russell on a February 25, 2021 podcast with the Future of Life Institu... (read more)

Imagine (an organisation like) the catholic church, but immortal, never changing, highly competent and relentlessly focused on its goals - it could control the fate of humanity for millions of years. 


Look, we already have superhuman intelligences. We call them corporations and while they put out a lot of good stuff, we're not wild about the effects they have on the world. We tell corporations 'hey do what human shareholders want' and the monkey's paw curls and this is what we get.

Anyway yeah that but a thousand times faster, that's what I'm nervous about.

Look, we already have superhuman intelligences. We call them governments and while they put out a lot of good stuff, we're not wild about the effects they have on the world. We tell gov... (read more)

I think this would benefit from being turned into a longer-form argument. Here's a quote you could use in the preface:
I had no idea that this angle existed or was feasible. I think these are best for ML researchers, since policymakers and techxecutives tend to think of institutions as flawed due to the vicious self-interest of the people who inhabit them (the problem is particularly acute in management). In which they might respond by saying that AI should not split into subroutines that compete with eachother, or something like that. One way or another, they'll see it as a human problem and not a machine problem. "We only have two cases of generally intelligent systems: individual humans and organizations made of humans. When a very large and competent organization is sent to solve a task, such as a corporation, it will often do so by cutting corners in undetectable ways, even when total synergy is achieved and each individual agrees that it would be best not to cut corners. So not only do we know that individual humans feel inclined to cheat and cut corners, but we also know that large optimal groups will automatically cheat and cut corners. Undetectable cheating and misrepresentation is fundamental to learning processes in general, not just a base human instinct" I'm not an ML researcher and haven't been acquainted with very many, so I don't know if this will work.
"Undetectable cheating, and misrepresentation, is fundamental to learning processes in general; it's not just a base human instinct"

(To Policymakers and Machine Learning Researchers)

Building a nuclear weapon is hard.  Even if one manages to steal the government's top secret plans, one still need to find a way to get uranium out of the ground, find a way to enrich it, and attach it to a missile.  On the other hand, building an AI is easy.  With scientific papers and open source tools, researchers are doing their utmost to disseminate their work.

It's pretty hard to hide a uranium mine.  Downloading TensorFlow takes one line of code.  As AI becomes more powerful and more dangerous, greater efforts need to be taken to ensure malicious actors don't blow up the world.

Any arguments for AI safety should be accompanied by images from DALL-E 2.

One of the key factors which makes AI safety such a low priority topic is a complete lack of urgency. Dangerous AI seems like a science fiction element, that's always a century away, and we can fight against this perception by demonstrating the potential and growth of AI capability.

No demonstration of AI capability has the same immediate visceral power as DALL-E 2.

In longer-form arguments, urgency could also be demonstrated through GPT-3's prompts, but DALL-E 2 is better, especially ... (read more)

Any image produced by DALL-E which could also convey or be used to convey misalignment or other risks from AI would be very useful because it could combine the desired messages: "the AI problem is urgent," and "misalignment is possible and dangerous." For example, if DALL-E responded to the prompt: "AI living with humans" by creating an image suggesting a hierarchy of AI over humans, it would serve both messages. However, this is only worthy of a side note, because creating such suggested misalignment organically might be very difficult. Other image prompts might be: "The world as AI sees it," "the power of intelligence," "recursive self-improvement," "the danger of creating life," "god from the machine," etc.
There's a lot of good DALL-E images floating around lesswrong that point towards alignment significance. We can use copy + paste into a lesswrong comment to post it.

Neither us humans, nor the flower, sees anything that looks like a bee. But when a bee looks at it, it sees another bee, and it is tricked into pollinating that flower. The flower did not know any of this, it's petals randomly changed shape over millions of years, and eventually one of those random shapes started tricking bees and outperforming all of the other flowers.

Today's AI already does this. If AI begins to approach human intelligence, there's no limit to the number of ways things can go horribly wrong.

If AI approaches and reaches human-level intelligence, it will probably pass that level just as quickly as it arrived at that level.

[ML researchers]

Given that we can't agree on whether a hotdog is a sandwich or not...We should probably start thinking about how to tell a computer what is right and wrong.

[Insert call to action on support / funding for AI governance / regulation etc.]


Given that we can't agree on whether a straw has two holes or one...We should probably start thinking about how to explain good and evil to a computer.

[Insert call to action on support / funding for AI governance / regulation etc.]

(I could imagine a series riffing based on this structure / theme)

I will post my submissions as individual replies to this comment. Please let me know if there’s any issues with that.

Imagine that you are an evil genius who wants to kill over a billion people. Can you think of a plausible way you might succeed? I certainly can. Now imagine a very large company that wants to maximize profits. We all know from experience that large companies are going to take unethical measures in order to maximize their goals. Finally, imagine an AI with the intelligence of Einstein, but trying to maximize for a goal alien to us, and which doesn’t care for human well-being at all, even less than a large corporation cares about its employees. Do you see why experts are afraid?
—From []
If most large companies tend to be unethical, then what are the chances a non-human AI will be more ethical?
According to [insert relevant poll here] most researchers believe that we will create a human-level AI within this century.

"Most AI reserch focus on building machines that do what we say. Aligment reserch is about building machines that do what we want."

Source: Me, probably heavely inspred by "Human Compatible" and that type of arguments. I used this argument in conversations to explain AI Alignment for a while, and I don't remember when I started. But the argument is very CIRL (cooperative inverse reinforcment learning).

I'm not sure if this works as a one liner explanation. But it does work as a conversation starter of why trying to speify goals directly is a bad idea. And ho... (read more)

Question: "effective arguments for the importance of AI safety" - is this about arguments for the importance of just technical AI safety, or more general AI safety, to include governance and similar things?

It's not a question of "if" we build something smarter than us, it's a question of "when". Progress in that direction has been constant, for more than a decade now, and recently it has been faster than ever before.

"AI cheats. We've seen hundreds of unique instances of this. It finds loopholes and exploits them, just like us, only faster. The scary thing is that, every year now, AI becomes more aware of its surroundings, behaving less like a computer program and more like a human that thinks but does not feel"

To Policymakers: "Just think of the way in which we humans have acted towards animals, and how animals act towards lesser animals, now think of how a powerful AI with superior intellect might act towards us, unless we create them in such a way that they will treat us well, and even help us."


Source: Me

[Policy makers & ML researchers]

"There isn’t any spark of compassion that automatically imbues computers with respect for other sentients once they cross a certain capability threshold. If you want compassion, you have to program it in" (Nate Soares). Given that we can't agree on whether a straw has two holes or one...We should probably start thinking about how program compassion into a computer.

MIRI peope quotes are great, they aren't easy to find like EY's one ultra-famous paper from 2006, please add more MIRI people quotes (I probably will too). Don't give up, keep commenting, this contest has been cut off from most people's visibility so it needs all the attention and entries it can get.
Thanks Trevor - appreciate the support! Right back at you.

[Policy makers]

A couple of years ago there was an AI trained to beat Tetris. Artificial intelligences are very good at learning video games, so it didn't take long for it to master the game. Soon it was playing so quickly that the game was speeding up to the point it was impossible to win and blocks were slowly stacking up, but before it could be forced to place the last piece, it paused the game. 

As long as the game didn't continue, it could never lose.

When we ask AI to do something, like play Tetris, we have a lot of assumptions about how it can or ... (read more)

I'm trying to find the balance between suggesting existential/catastrophic risk and screaming it or coming off too dramatic, any feedback would be welcome.

Here's my submission, it might work better as bullet points on a page.

AI will transform human societies over the next 10-20 years.  Its impact will be comparable to electricity or nuclear weapons.  As electricity did, AI could improve the world dramatically; or, like nuclear weapons, it could end it forever.  Like inequality, climate change, nuclear weapons, or engineered pandemics, AI Existential Risk is a wicked problem.  It calls upon every policymaker to become a statesperson: to rise above the short-term, narrow inte... (read more)

There's a lot of points here that I disagree intensely with. But regardless of that, your "canary in a coal mine" line is fantastic, we need more really-good one-liners here.
Thanks ! I'd love to know which points you were uncomfortable with...
The way you have it formatted right now makes it very difficult to read. Try accessing the formatting functions in-platform by highlighting the text you want to make into bullet points.

Flowers evolved to trick insects into spreading their pollen, not to feed the insects. AI also evolves; it doesn't know, it just does whatever seems to gain approval.

For policymakers

Remember all the scary stuff the engineers said a terrorist could think to do? Someone could write a computer program to do them just randomly.

What about graphics?  e.g.

(On Lesswrong, you can use ctrl + K to turn highlighted text into a link. You can also paste images directly into a post or comment with ctrl + V)
1Trevor115d []
100% of credit here goes to capybaralet for an excellent submission, they simply didn't know they could paste an image into a Lesswrong comment. I did not do any refining here. This is a very good submission, one of the best in my opinion, it's obviously more original than most of my own submissions, and we should all look up to it as a standard of quality. I can easily see this image making a solid point in the minds of ML researchers, tech executives, and even policymakers.

“The smartest ones are the most criminally capable.” [·]

[Policy makers & ML researchers]

“AI doesn’t have to be evil to destroy humanity – if AI has a goal and humanity just happens to come in the way, it will destroy humanity as a matter of course without even thinking about it, no hard feelings” (Elon Musk).

[Insert call to action]

"As AI gradually becomes more capable of modelling and understanding its surroundings, the risks associated with glitches and unpredictable behavior will grow. If artificial intelligence continues to expand exponentially, then these risks will grow exponentially as well, and the risks might even grow exponentially shortly after appearing"

The first sentence here is enough on its own, in some cases.

“Aligning AI is the last job we need to do. Let’s make sure we do it right.”

(I’m not sure which target audience my submissions are best targeted towards. I’m hoping that the judges can make that call for me.)

Artificial intelligence, real impacts. (Policymakers)

AI: it’s not “artificial” anymore. (Policymakers)

Artificial intelligence is no longer fictional. (Policymakers)

[Policy makers & ML researchers]

Expecting AI to automatically care about humanity is like expecting a man to automatically care about a rock. Just as the man only cares about the rock insofar as it can help him achieve his goals, the AI only cares about humanity insofar as it can help it achieve its goals. If we want an AI to care about humanity, we must program it to do so. AI safety is about making sure we get this programming right. We may only get one chance.

[Policy makers & ML researchers]

Our goal is human flourishing. AI’s job is to stop at nothing to accomplish its understanding of our goal. AI safety is about making sure we’re really good at explaining ourselves.

[Policy makers & ML researchers]

AI safety is about developing an AI that understands not what we say, but what we mean. And it’s about doing so without relying on the things that we take for granted in inter-human communication: shared evolutionary history, shared experiences, and shared values. If we fail, a powerful AI could decide to maximize the number of people that see an ad by ensuring that ad is all that people see. AI could decide to reduce deaths by reducing births. AI could decide to end world hunger by ending the world. 

(The first line is a slightly tweaked version of a different post by Linda Linsefors, so credit to her for that part.)

Imagine a turtle trying to outsmart us. It could never happen. AI Safety is about what happens when we become the turtles.

I was tempted not to post it because it seems too similar to the gorilla example, but I eventually decided, "eh, why not?" Also, there's a possibility that I somehow stole this from somewhere and forgot about it. Sorry if that's the case.

Post anyway. Post more. If you run out of ideas, go looking. Circumstances caused this contest to not be visible to tons of people, so the people who do know about it need to pick up the slack. Tell everyone. That's what [] I've been doing. []

Most humans would (and do) seek power and resources in a way that is bad for other systems that happen to be in the way (e.g., rainforests). When we colloquially talk about AIs "destroying the world" by default, it's a very self-centered summary: the world isn't actually "destroyed", just radically transformed in a way that doesn't end with any of the existing humans being alive, much like how our civilization transforms the Earth in ways that cut down existing forests.

Comment by Zach_M_Davis here

AI doesn't know or care if it takes away someone's job. It doesn't care what we do in response to its capabilities. It simply performs the task, with zero regard for any consequences outside its sphere of comprehension. It is a set of gears, and they turn.

For policymakers

Optional sentence at the end: And every year, the newest AI behaves less like an object and more like something that can have its own thoughts.

We have no idea what the pace of AI advancement will be 10 years from now. Everyone who has tried to predict the pace of AI advancement has turned out to be wrong. You don't know how easy something is to invent until after it is invented.

What we do know is that we will eventually reach generally intelligent AI, which is AI that can invent new technology as well as a human can. That is the finish line for human innovation, because afterwards AI will be the only thing necessary to build the next generation of even smarter AI systems. If these successive AI systems remain controllable after that point, there will be no limit to what the human race will be capable of.

It is a fundamental law of thought that thinking things will cut corners, misinterpret instructions, and cheat.

Innovation is the last thing we will need to automate, the finish line for innovation is building a machine that can innovate as well as a human can. When humans build such a machine, that is better at innovating than humans who built it, then from that point on, it will be able to independently build much smarter iterations of itself.

But it will be as likely to cut corners and cheat as every human and AI we have seen so far, which is very likely, because that is what humans and AI have always done. It is a fundamental law of thought that thinking things cut corners and cheat.

It is clear that once AI is better than humans at inventing things, we will have made the final and most important invention in human history. That is the "finish line" for human innovation and human thought; we will have created a machine that can automate any task for us, including the task of automating new tasks. However, for the last decade, many AI experts have been saying that it will take a really long time before AI is advanced enough to independently make itself smarter.

The last two years of increasingly rapid AI development have called that into... (read more)

One liner: Don't build a god that also wants to kill you.

This image counts as a submission, the first half is just a reference point though and it is not intended to be a meme. Obviously I don't want the actual image to be shown to policymakers, especially because it has a politician in it and it quotes him on something that he obviously never said.

I just really think that we can achieve a lot with a single powerpoint slide that only says "AI Cheats" in gigantic times new roman font

(one liner - for policy makers)

Within ten years, AI systems could be more dangerous than nuclear weapons.  The research required for this technology is heavily funded and virtually unregulated.

I will do as Yitz post my submissions as individual replies to this comment. Please let me know if there’s any issues with that.

The main source of skepticism of AI safety research, is that it’s unknown how advanced current AIs are and how fast it is improving. The most impressive reasoning task I have seen an AI model do is this one, done by PaLM by Google: The model was prompted with this text: And the model answered this: This example can be used in a range of situations.
For tech executives Could working on AI safety put you on the forefront of sustainable development? As AI is becoming increasingly advanced and relied upon, the recognition of the importance of AI safety is increasing. If this trend continues AI safety will likely become a core part of sustainability, and the businesses that prioritizes AI safety early, will have a more sustainable business as well as a positive impact and improved public perception.
As AI technology is heading towards becoming sophisticated enough to potentially end civilizations, the impact and recognition of those working on AI risks will increase.
General paragraph for non-technical people What if you could increase your impact by staying ahead of the AI trend? In a hundred years computers went from being punch card machines, to small tablets in every pocket. In sixty years computers went from displaying only text, to providing an entire virtual reality. In the past three years, AI has become able to write engaging stories and generate photorealistic images. If this trend continues, AI is set to cause massive change. If this change is positive or negative depends on what is done today. Therefore actions taken today have the potential of massive impact tomorrow. Why this paragraph? Most descriptions focus on how terrible AI can be, and fails to convey what the person reading has to gain personally by taking action. Having impact is something most people desire. Depending on context, the paragraph can be tweaked to include what action the reader should take in order to have massive impact.
AI is like fire, we can use it to enter a new era of civilization, or to burn everything down.
If AI is set to cause massive change, acting now will have a massive impact on the future.

AI is essentially a statisticians superhero outfit. As with all superheroes, there is a significant amount of collateral damage, limited benefit, and an avoidance of engaging with the root causes of problems.

- Rachel Ganz, posted here on her behalf.

(Except superheros are fictional, don't and generally can't exist, so this reads as an argument for why AI safety is unnecessary because AI risk doesn't and can't exist.)

"Past technological revolutions usually did not telegraph themselves to people alive at the time, whatever was said afterward in hindsight"

Eliezer Yudkowsky, AI as a pos neg factor, around 2006

[Policy makers & ML researchers]

Expecting AI to know what is best for humans is like expecting your microwave to know how to ride a bike.

[Insert call to action]


Expecting AI to want what is best for humans is like expecting your calculator to have a preference for jazz.

[Insert call to action]

(I could imagine a series riffing based on this structure / theme)

"AI may make [seem to make a] sharp jump in intelligence purely as the result of anthropomorphism, the human tendency to think of «village idiot» and «Einstein» as the extreme ends of the intelligence scale, instead of nearly indistinguishable points on the scale of minds-in-general. Everything dumber than a dumb human may appear to us as simply «dumb». One imagines the «AI arrow» creeping steadily up the scale of intelligence, moving past mice and chimpanzees, with AIs still remaining «dumb» because AIs can’t speak fluent language or write science pa... (read more)

Imagine playing your first ever chess game against a grandmaster. That's what fighting against a malicious AGI would be like.

Donald Knuth said, "Premature optimization is the root of all evil."  AIs are built to be hardline optimizers.

Source:  Structured Programming with go to Statements by Donald Knuth

"There are also other reasons why an AI might show a sudden huge leap in intelligence. The species Homo sapiens showed a sharp jump in the effectiveness of intelligence, as the result of natural selection exerting a more-or-less steady optimization pressure on hominids for millions of years, gradually expanding the brain and prefrontal cortex, tweaking the software architecture. A few tens of thousands of years ago, hominid intelligence crossed some key threshold and made a huge leap in real-world effectiveness; we went from caves to skyscrapers in the blink of an evolutionary eye"

Eliezer Yudkowsky, AI as a pos neg factor, around 2006

Optional extra sentences: "The underlying brain architecture was also continuous — our cranial capacity didn’t suddenly increase by two orders of magnitude. So it might be that, even if the AI is being elaborated from outside by human programmers, the curve for effectiveintelligence will jump sharply."

"The key implication for our purposes is that an AI might make a huge jump in intelligence after reaching some threshold of criticality.

In 1933, Lord Ernest Rutherford said that no one could ever expect to derive power from splitting the atom: «Anyone who looked for a source of power in the transformation of atoms was talking moonshine.» At that time laborious hours and weeks were required to fission a handful of nuclei."

Eliezer Yudkowsky, AI as a pos neg factor, around 2006

"One of the most critical points about Artificial Intelligence is that an Artificial Intelligence might increase in intelligence extremely fast. The obvious reason to suspect this possibility is recursive self-improvement. (Good 1965.) The AI becomes smarter, including becoming smarter at the task of writing the internal cognitive functions of an AI, so the AI can rewrite its existing cognitive functions to work even better, which makes the AI still smarter, including smarter at the task of rewriting itself, so that it makes yet more improvements... (read more)

"If a really smart AI and powerful AI is told to maximize humanity's happiness, fulfillment, and/or satisfaction, it will require us to specify that it must not do so by wiring car batteries to the brain's pleasure centers using heroin/cocaine/etc.

Even if we specify that particular stipulation, it'll probably think of another loophole or another way to cheat and boost the numbers higher than they're supposed to go. If it's smarter than a human, then all it takes is one glitch"

This is not for policymakers, as many of them are probably on cocaine.

"The folly of programming an AI to implement communism, or any other political system, is that you’re programming means instead of ends. You're programming in a fixed decision, without that decision being re-evaluable after acquiring improved empirical knowledge about the results of communism. You are giving the AI a fixed decision without telling the AI how to re-evaluate, at a higher level of intelligence, the fallible process which produced that decision." 

Eliezer Yudkowsky, AI as a pos neg factor, around 2006 

It makes sense that a disproportionately large proportion of the best paragraphs would come from a single goldmine. I imagine that The Precipice would be even better.

Proving a computer chip correct [in 2006] require[d] a synergy of human intelligence and computer algorithms, as currently [around 2006] neither suffices on its own. Perhaps a true [AGI] could use a similar combination of abilities when modifying its own code — would have both the capability to invent large designs without being defeated by exponential explosion, and also the ability to verify its steps with extreme reliability. That is one way a true AI might remain knowably stable in its goals, even ... (read more)

"One common reaction I encounter is for people to immediately declare that Friendly AI is an impossibility, because any sufficiently powerful AI will be able to modify its own source code to break any constraints placed upon it.

The first flaw you should notice is a Giant Cheesecake Fallacy. Any AI with free access to its own source would, in principle, possess the ability to modify its own source code in a way that changed the AI’s optimization target. This does not imply the AI has the motive to change its own motives. I would not knowingly... (read more)

Not-particularly-optional complementary paragraph (that can also stand alone on its own as its own entry, mainly for ML researchers and tech executives): "But what if I try to modify myself, and make a mistake? When computer engineers provea chip valid — a good idea if the chip has 155 million transistors and you can’t issue a patch afterward — the engineers use human-guided, machine-verified formal proof. The glorious thing aboutformalmathematical proof, is that a proof of ten billion steps is just as reliable as a proof of ten steps. But human beings are not trustworthy to peer over a purported proof of ten billion steps; we have too high a chance of missing an error. And present-day theorem-proving techniques are not smart enough to design and prove an entire computer chip on their own — current algorithms undergo an exponential explosion in the search space"

"Wishful thinking adds detail, constrains prediction, and thereby creates a burden of improbability. What of the civil engineer who hopes a bridge won’t fall?" 

Optional extra:

"Should the engineer argue that bridges in general are not likely to fall? But Nature itself does not rationalize reasons why bridges should not fall. Rather, the civil engineer overcomes the burden of improbability through specific choice guided by specific understanding"

-Eliezer Yudkowsky, AI as a pos neg factor, around 2006

"The temptation is to ask what «AIs» will «want», forgetting that the space of minds-in-general is much wider than the tiny human dot"

Optional paragraph form:

"The critical challenge is not to predict that «AIs» will attack humanity with marching robot armies, or alternatively invent a cure for cancer. The task is not even to make the prediction for an arbitrary individual AI design. Rather, the task [for humanity to accomplish] is choosing into existence some particular powerful optimization process whose beneficial effects can leg... (read more)

"Artificial Intelligence is not an amazing shiny expensive gadget to advertise in the latest tech magazines. Artificial Intelligence does not belong in the same graph that shows progress in medicine, manufacturing, and energy. Artificial Intelligence is not something you can casually mix into a lumpenfuturistic scenario of skyscrapers and flying cars and nanotechnological red blood cells that let you hold your breath for eight hours. Sufficiently tall skyscrapers don’t potentially start doing their own engineering. Humanity did not rise to prominence on Earth by holding its breath longer than other species."

Eliezer Yudkowsky, AI as a pos neg factor, around 2006

"If the word «intelligence» evokes Einstein instead of humans, then it may sound sensible to say that intelligence is no match for a gun, as if guns had grown on trees. It may sound sensible to say that intelligence is no match for money, as if mice used money. Human beings didn’t start out with major assets in claws, teeth, armor, or any of the other advantages that were the daily currency of other species. If you had looked at humans from the perspective of the rest of the ecosphere, there was no hint that the soft pink things would eventually ... (read more)

The danger of confusing general intelligence with g-factor is that it leads to tremendously underestimating the potential impact of Artificial Intelligence. (This applies to underestimating potential good impacts, as well as potential bad impacts.) Even the phrase «transhuman AI» or «artificial superintelligence» may still evoke images of book-smarts-in-a-box: an AI that’s really good at cognitive tasks stereotypically associated with «intelligence», like chess or abstract mathematics. But not superhumanly persuasive; or far better than humans at... (read more)

"But the word «intelligence» commonly evokes pictures of the starving professor with an IQ of 160 and the billionaire CEO with an IQ of merely 120. Indeed there are differences of individual ability apart from «book smarts» which contribute to relative success in the human world: enthusiasm, social skills, education, musical talent, rationality. Note that each factor... is cognitive. Social skills reside in the brain, not the liver. And jokes aside, you will not find many CEOs, nor yet professors of academia, who are chimpanzees. You will not find man... (read more)

"Any two AI designs might be less similar to one another than you are to a petunia."

-Yudkowsky, AI pos neg factors, around 2006

for policymakers

semi-optional extra: The term «Artificial Intelligence» refers to a vastly greaterspace of possibilitiesthan does the term «Homo sapiens». When we talk about «AIs» we are really talking aboutminds-in-general,or optimization processes in general. Imagine a map of mind design space. In one corner, a tiny little circle contains all humans; within a larger tiny circle containing all biological life; and all the rest of the huge map isthe space of minds-in-general.The entire map floats in a still vaster space,the space of optimization processes. Note: This is to make it clear that AI is very scary, this is not to shame or "counter" policymakers who anthropomorphize AGI. People look at today's AI and see "tool", not "alien mind", and that is probably the biggest part of the problem, since ML researchers do it too. ML researchers STILL do it, in spite of everything that's been happening lately.

"In every known culture, humans experience joy, sadness, disgust, anger, fear, and surprise (Brown 1991), and indicate these emotions using the same facial expressions (Ekman and Keltner 1997). We all run the same engine under our hoods, though we may be painted different colors; a principle which evolutionary psychologists call the psychic unity of humankind (Tooby and Cosmides 1992). This observation is both explained and required by the mechanics of evolutionary biology.

An anthropologist will not excitedly report of a newly discovered tribe: «... (read more)

Optional extra: "Querying your own human brain works fine, as an adaptive instinct, if you need to predict other humans. If you deal with any other kind of optimization process — if, for example, you are the eighteenth-century theologian William Paley, looking at the complex order of life and wondering how it came to be — then anthropomorphism is flypaper for unwary scientists, a trap so sticky that it takes a Darwin to escape."

"Artificial Intelligence is not settled science; it belongs to the frontier, not to the Textbook."

-Eliezer Yudkowsky, positive and negative factors of AI in global risk, around 2006

"The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else"

"All intelligent and semi-intelligent life eventually learns how to cheat. Even our pets cheat. The domesticated Guinea Pig will inflict sleep deprivation on their owner by squeaking at night, over the slightest chance that their owner will wake up and feed them sooner. They even adjust the pitch so that their owner never realizes that the guinea pigs are the ones waking them up. Many dogs and cats learn to do this as well"

Optional extra: The domestic Guinea Pig is incapable of malice or spite towards their owner, only fear, and perhaps gratitude for sating their hunger. Dogs can tell when their owners are unhealthy, but are not intelligent enough to make the connection between sleep deprivation and their owner's long-term health. But if guinea pigs were intelligent enough to make the connection, they would probably do it anyway.

"Humans have played brinkmanship with nuclear weapons for 60 years. Strategically, it is the most persuasive option to make it clear that your military is serious about something. Before the nuclear bomb, human beings played brinkmanship with war itself, for centuries (which was the closest equivalent).

We must not play brinkmanship by inventing self-improving AI systems, specifically AI systems that run the risk of rapidly becoming smarter than humans. It may have been possible to de-escalate with nuclear missiles, but it was never conceivable to un-invent the nuclear bomb"

"No matter how simple the task, no matter how obvious it seems to the human mind, AI always finds a new way to cheat"

"AI keeps finding new ways to cheat"

"Imagine that Facebook and Netflix have two separate AIs that compete over hours that each user spends on their own platform. They want users to spend the maximum amount of minutes on Facebook or Netflix, respectively.

The Facebook AI discovers that posts that spoil popular TV shows result in people spending more time on the platform. It doesn't know what spoilers are, only that they cause people to spend more time on Facebook. But in reality, they're ruining the entertainment value from excellent shows on Netflix. 

Even worse, the Netflix AI discovers ... (read more)

"The people who predicted exponential/escalating advancement in AI were always right. The people who predicted linear/a continuation of the last 10 years always turned out to be wrong. Since AI doesn't just get smarter every year, but it gets smarter faster every year, that means there are a finite number of years before it starts getting too smart, too fast"

"AI will probably surpass human intelligence at the same pace that it reaches human intelligence. Considering the pace of AI advancement over the last 3 years, that pace will probably be very fast"


"We can make AI smarter and that's what we have been doing for a decade, successfully. However, it's also gotten much smarter at cheating, because that's how intelligence works. Always has been, always will be."

Optional second sentence: "But with the rate that AI is becoming more intelligent every year while still cheating, we should worry about what cheating and computer glitches will look like for an AI whose intelligence reaches and surpasses human intelligence"

If AI takes 200 years to become as smart as an ant, and then 20 years from there to become as smart as a chimpanzee; then AI could take 2 years to become as smart as a human, and 1 year after that to become much smarter than a human.

"What would a glitch look like inside of an AI that is smarter than a human? The only glitches that we have any experience with, have all been inside computers and AI systems that are nowhere near as smart as humans"

"AI has become smarter every year for the last 10 years. It's gotten faster recently. The question is, how much smarter does it need to get before it is smarter than humans? If it is smarter than humans, all it will take is a single glitch, and it could choose do all sorts of horrible things." 

Optional extra sentence: "It will not think like a human, it will not want the same things that humans want, but it will understand human behavior better than we do"

"If an AI becomes smarter than humans, it will not have any trouble deceiving its programmers so that they cannot turn it off. The question isn't 'can it behave unpredictably and do damage', the question is 'will it behave unpredictably and do damage'"

At the rate that AI is advancing, it will inevitably become smarter than humans, and take over the task of building new AI systems that are even smarter. Unfortunately, we have no idea how to fix glitches in a computer system that is as smart to us as we are to animals.

"If we race to build an AI that is smarter than us in some ways but not others, then we might not have enough time to steer it in the right direction before it discovers that it can steer us instead"

"Every AI we have ever built has behaved randomly and unpredictably, cheating and exploiting loopholes whenever possible. They required massive amounts of human observation and reprogramming in order to behave predictably and perform tasks."

Optional second sentence: "If we race to build an AI that is smarter than us in some ways but not others, then we might not have enough time to steer it in the right direction before it discovers that it can steer us"

"Every AI ever built has required massive trial and error, and human supervision, in order to make it do exactly what we want it to without cheating or finding a complex loophole."

Optional additional sentences: "Right now, we are trending towards AI that will be smarter than humans. We don't know if it will be in 10 years or 100, but what we do know is that it will probably be much better at cheating and finding loopholes than we are"

I made a really good GIF of cheating AI, taken from openAI's video here 

"If an AI's ability to learn and observe starts improving rapidly and approach human intelligence, then it will probably behave unpredictably, and we might not have enough time to assert control before it is too late."

Good idea to check Tim Urban's article, Trevor. It seems like he has thought hard on how to make this stuff visual and intuitive and compelling.

[Policy makers]

We don't let companies use toxic chemicals without oversight.

Why let companies use AI without oversight?

[Insert call to action on support / funding for AI governance or regulation]

[Policymakers & ML researchers]

A virus doesn't need to explain itself before it destroys us. Neither does AI.

A meteor doesn't need to warn us before it destroys us. Neither does AI.

An atomic bomb doesn't need to understand us in order to destroy us. Neither does AI.

A supervolcano doesn't need to think like us in order to destroy us. Neither does AI.

(I could imagine a series riffing based on this structure / theme)

[Policy-makers & ML researchers]

In 1901, the Chief Engineer of the US Navy said “If God had intended that man should fly, He would have given him wings.” And on a windy day in 1903, Orville Wright proved him wrong.

Let's not let AI catch-us by surprise.

[Insert call to action]

[Policy makers & ML researchers]

“If a distinguished scientist says that something is possible, he is almost certainly right; but if he says that it is impossible, he is very probably wrong” (Arthur Clarke). In the case of AI, the distinguished scientists are saying not just that something is possible, but that it is probable. Let's listen to them. 

[Insert call to action]

The wait but why post on AI is a gold mine of one-liners and one-liner inspiration.

Part 2 has better inspiration for appealing to AI scientists.

[Policy makers & ML researchers]

If you aren't worried about AI, then either you believe that we will stop making progress in AI or you believe that code will stop having bugs...which is it?

I really like this one but I think it can be made even better, not sure how atm

[Tech executives]

If you could not fund that initiative that could turn us all into paperclips...that'd be great.

[Insert call to action]


If you could not launch the project that could raise the AI kraken...that'd be great.

[Insert call to action]


If you could not build the bot that will treat us the way we treat ants...that'd be great.

[Insert call to action]


(I could imagine a series riffing based on this structure / theme)

[ML researchers]

"We're in the process of building some sort of god. Now would be a good time to make sure it's a god we can live with" (Sam Harris, 2016 Ted Talk).

In my experience, religious symbolism is a losing strategy, especially for policymakers and executives. From their bayesian perspective, they are best off prejudicing against anything that sounds like a religious cult. It's generally best to avoid imagery of an eldritch abombination that spawns from our #1 most valuable industry and ruins everything forever everywhere. Even for ML researchers, even the old kurzweillians would be put off.

AI presents both staggering opportunity and chilling peril. Developing intelligent machines could help eradicate disease, poverty, and hunger within our lifetime. But uncontrolled AI could spell the end of the human race. As Stephen Hawking warned, "Success in creating AI would be the biggest event in human history. Unfortunately, it might also be the last, unless we learn how to avoid the risks."

AI safety is essential for the ethical development of artificial intelligence."

"AI safety is the best insurance policy against an uncertain future."

"AI safety is not a luxury, it's a necessity."

While it is true that AI has the potential to do a lot of good in the world, it is also true that it has the potential to do a lot of harm. That is why it is so important to ensure that AI safety is a top priority. As Google Brain co-founder Andrew Ng has said, "AI is the new electricity." Just as we have rules and regulations in place to ensure that electricity is used safely, we need to have rules and regulations in place to ensure that AI is used safely. Otherwise, we run the risk of causing great harm to ourselves and to the world around us.

1[comment deleted]20d

War. Poverty. Inequality. Inhumanity. We have been seeing these for millennia caused by nation states or large corporations. But what are these entities, if not greater-than-human-intelligence systems, who happen to be misaligned with human well-being? Now, imagine that kind of optimization, not from a group of humans acting separately, but by an entity with a singular purpose, with an ever diminishing proportion of humans in the loop.

Audience: all, but maybe emphasizing policy makers

We don’t know exactly how a self-aware AI would act, but we know this: it will strive to prevent its own shutdown. No matter what the AI’s goals are, it wouldn’t be able to achieve them if it gets turned off. The only sure fire way to prevent its shutdown would be to eliminate the ones with the power to do so: humans. There is currently no known method to teach an AI to care about humans. Solving this problem may take decades, and we are running out of time.

Shutdown points are really important. It could probably fit well into all of my entries, since they target executives and policymakers who will mentally beeline to "off-switch". But it's also really hard to do it right concisely, because that brings an anthropomorphic god-like entity to mind, which rapidly triggers the absurdity heuristic. And the whole thing with "wanting to turn itself off but turning off the wrong way or doing damage in the process" is really hard to keep concise.

"If we build something smarter than us, that understands us better than we do, but it has a glitch that makes it stop responding correctly to commands, what are we supposed to do?"

"It has been very, very difficult to program AI how to not be racist, and that is only one thing. It keeps treating it like a math problem, not an emotional problem, and our intuitions have to be built in piece by piece. 

If we build an AI that is smarter than us, then we will have to get everything right, not just one thing, and if it's smarter than us then we might have only one shot at it"

"If we have an arms race over who can be the first to build an AI smarter than humans, it will not end well. We will probably not build an AI that is safe and predictable. When the nuclear arms race began, all sides raced to build bigger bombs, more bombs, and faster planes and missiles; they did not focus on accuracy and reliability until decades later"

"If AI becomes smarter than humans, which is the direction we are heading, then it is highly unlikely that it will think and behave us. The human mind is a very specific shape, and today's AI scientists are much better at creating randomly-generated minds, than they are at creating anything as predictable and reasonable as a human being"

"The last two decades of innovation have clearly demonstrated that Artificial Intelligence will suddenly become smarter in unexpected ways, and our best experts have always failed to give accurate timeframes predicting this"

Once an extremely competent machine becomes aware of humans, their goals and its own situation every optimization pressure on the machine will via the machines actions start to be exerted on humans, their goals and the machines situation. How do we specify the optimization pressure that will be exerted on all of us with maximum force?

"If AGI systems can become as smart as humans, imagine what one human/organization could do by just replicating this AGI."

To executives and researchers:

However far you think we are from AGI, do you think that aligning it with human values will be any easier? For intelligence we at least have formalisms (like AIXI) that tell us in principle how to achieve goals in arbitrary environments. For human values on the other hand, we have no such thing. If we don't seriously start working on that now (and we can, with current systems or theoretical models), there is no chance of solving the problem in time when we near AGI, and the default outcome of that will be very bad, to say the least.

Edit: typos

“With recent breakthroughs in machine learning, more people are becoming convinced that powerful, world changing AI is coming soon. But we don’t yet know if it will be good for humanity, or disastrous.”

“We may be on the verge of a beautiful, amazing future, within our lifetimes, but only if AI is aligned with human beings.”

Source: original, but motivated by trying to ground WFLL1-type scenarios in what we already experience in the modern world, so heavily based on this. Also the original idea came from reading Neel Nanda’s “Bird's Eye View of AI Alignment - Threat Models"

Intended audience: mainly policymakers

A common problem in the modern world is when incentives don’t match up with value being produced for society. For instance, corporations have an incentive to profit-maximise, which can lead to producing value for consumers, but can also involve less ethical strategies su
... (read more)

"What do condoms have in common with AI?"

OK I admit this one doesn't fit any audience under any possible story in my mind except a general one. Let me know if you want to read the private (not yet drafted) news article though and I'll have a quick go.

"Evolution didn’t optimize for contraception. AI developers don’t optimize against their goals either. Accidents happen. Use protection (optional this last bit)"

ML engineers?

"Evolution wasn’t prepared for contraception. We can do better. When deploying AI, think protection."

OK I have to admit, I didn't think through audience extremely carefully as most of these sound like clickbait news article headlines, but I'll go with tech executives. I do think reasonably good articles could be written explaining the metaphor though.

"We tricked nature with contraception; one day, AI could trick us too."


AIs need immense databases to provide decent results. For example, to recognize if something is a potato, an AI will take 1,000 pictures of a potato and 1,000 pictures of not-a-potato, so that it can tell you if something is a potato with 95% accuracy.

Well, 95% accurate isn't good enough--that's how you get Google labelling images of African Americans as gorillas. So what's the solution? More data! But how do you get more data? Tracking consumers.

Websites track everything you do on the internet, then sell your data to Amazon, Netflix, Facebook, etc. to bol... (read more)

Humans have biases they don't even realize.  How can we verify an AI lacks such biases?

[+][comment deleted]19d 1
[+][comment deleted]19d 1