All of otto.barten's Comments + Replies

Thanks for writing the post! I strongly agree that there should be more research into how solvable the alignment problem, control problem, and related problems are. I haven't studied uncontrollability research (e.g., Yampolskiy's) in detail, but if technical uncontrollability were firmly established, it seems to me that this would significantly change the whole AI xrisk space, and later the societal debate and potentially our trajectory, so it seems very important.

I would also like to see more research into the nontechnical side of alignment: how aggregatable... (read more)

Thanks for your kind remarks. Yes, we would need to shift focus to restricting corporate-AI scaling altogether: particularly, restricting data piracy, compute that is toxic to the environment, and model misuses (three dimensions through which AI corporations consolidate market power). I am working with other communities (including digital creatives, environmentalists and military veterans) on litigation and lobbying actions to restrict those dimensions of AI power-consolidation. I hope this post clarifies to others in AI Safety why there is no line of retreat: AI development will need to be restricted.

Yes. Consider too that these considerations come on top of the question whether AGI would be long-term safe (if AGI cannot be controlled to be long-term safe to humans, then we do not need to answer the more fine-grained questions about e.g. whether human values are aggregatable). Even if, hypothetically, long-term AGI safety were possible:

* You would still have to deal with limits on modelling, and consistently acting on, preferences expressed by the billions of boundedly-rational humans from their (perceived) context.
* The system would have to avoid consistently representing the preferences of malevolent, parasitic or short-term human actors who want to misuse/co-opt it through any attack vectors they can find.
* You would have to deal with the fact that the preferences of many possible future humans, and of non-human living beings, will not automatically be represented in a system that AI corporations by default have built to represent currently living humans only (preferably, those who pay).

Here are also excerpts from Roman Yampolskiy's 2021 paper relevant to aggregating democratically solicited preferences and human values:

Me and @Roman_Yampolskiy published a piece on AI xrisk in a Chinese academic newspaper:

We were approached after our piece in Time and asked to write for them (we also gave quotes for another provincial newspaper). I have the impression (I've also lived and worked in China) that leading Chinese decision makers and intellectuals (or perhaps their children) read Western news sources like Time, NYTimes, Economist, etc. AI xrisk is currently probably mostly unknown in China, and if stumbled upon people... (read more)

I agree that raising awareness about AI xrisk is really important. Many people have already done this: Nick Bostrom, Elon Musk, Stephen Hawking, Sam Harris, Tristan Harris, Stuart Russell, Gary Marcus, Roman Yampolskiy (I coauthored one piece with him in Time), and Eliezer Yudkowsky.

I think a sensible place to start is to measure how well they did using surveys. That's what we've done here:

More comms research from us is coming up, and I know a few others are doing the same now.

2 · Seth Herd · 1mo
I took a look at your paper, and I think it's great! My PhD was in cognitive psychology where they're pretty focused on study design, so even though I haven't done a bunch of empirical work, I do have ideas about it. No real critique of your methodology, but I did have some vague ideas about expanding it to address the potential for polarization.

You could pick corporations as an example of coordinated humans, but also e.g. Genghis Khan's hordes. And they did actually take over. If you do want to pick corporations, look e.g. at East India companies that also took over parts of the world.

Funny, I had exactly the same thought and was just considering writing a short post on it. So I agree and I do think it's a very relevant model update. Some people probably already updated before. I also agree though with your second point about Auto-GPT and similar peripherals. So it looks like we're in a not-too-fast take-off with humans pretty solidly in the loop for now?

As long as there's no autonomous self-improvement of the core model, maybe an existential event could look like this: GPT-X gets trained and released, open sourcers build unsafe periphe... (read more)

As co-author of one of the mentioned pieces, I'd say it's really great to see the AGI xrisk message mainstreaming. It doesn't nearly go fast enough, though. Some (Hawking, Bostrom, Musk) have already spoken out about the topic for close to a decade. So far, that hasn't been enough to change common understanding. Those, such as myself, who hope that some form of coordination could save us, should give all they have to make this go faster. Additionally, those who think regulation could work should work on robust regulation proposals which are currently lacki... (read more)

It's an interesting letter. If I were an AI, I would find a heartfelt appeal from every human that doesn't want to die much more convincing, though.

Hereby: I don't want to die, AI.

I'd say your first assumption is off. We actually researched something related. We asked people the question: "List three events, in order of probability (from most to least probable), that you believe could potentially cause human extinction within the next 100 years". I would say that if your assumption were correct, they would list "robot takeover" or something similar as part of that top 3. However, >90% don't mention AI, robots, or anything similar. Instead, they typically say things like climate change, asteroid strike, or pandemic. So based o... (read more)

2 · Lone Pine · 3mo
Ah wow interesting. I assumed that most people have seen or know about either The Terminator, The Matrix, I, Robot, Ex Machina or M3GAN. Obviously people usually dismiss them as sci-fi, but I assumed most people were at least aware of them.

Thank you!

I see your point, but I think this is unavoidable. Also, I haven't heard of anyone who was stressing out much after receiving our information.

Personally, I was informed (or convinced, perhaps) a few years ago at a talk by Anders Sandberg of FHI. That did cause stress and negative feelings for me at times, but it also allowed me to work on something I think is really meaningful. I never for a moment regretted being informed. How many people do you know who say, "I wish I hadn't been informed about climate change back in the nineties"? For me, zero. I do kn... (read more)

2 · Lone Pine · 3mo
Here's how we could reframe the issue in a more positive way: first, we recognize that people are already broadly aware of AI x-risk (but not by that name). I think most people have an idea that 'robots' could 'gain sentience' and take over the world, and have some prior probability ranging from 'it's sci-fi' to 'I just hope that happens after I'm dead'. Therefore, what people need to be informed of is this: there is a community of intellectuals and computer people who are working on the problem, we have our own jargon, here is the progress we have made and here is what we think society should do. Success could be measured by the question "should we fund research to make AI systems safer?"

AI safety researcher Roman Yampolskiy did research into this question and came to the conclusion that AI cannot be controlled or aligned. What do you think of his work?

Thank you for writing this post! I agree completely, which is perhaps unsurprising given my position stated back in 2020. Essentially, I think we should apply the precautionary principle for existentially risky technologies: do not build unless safety is proven.

A few words on where that position has brought me since then.

First, I concluded back then that there was little support for this position in rationalist or EA circles. I concluded as you did, that this had mostly to do with what people wanted (subjective techno-futurist desires), and less with what ... (read more)

Thanks for the post and especially for the peer-reviewed paper! Without disregarding the great non-peer-reviewed work that many others are doing, I do think it is really important to get the most important points peer-reviewed as well, preferably stated as explicitly as possible (e.g. also mentioning human extinction, timelines, lower-bound estimates, etc.). Thanks as well for spelling out your lower-bound probabilities; I think we should have this discussion more often, more structurally, and more widely (also with people outside of the AI xrisk community). I guess... (read more)

This is what we are doing with the Existential Risk Observatory. I agree with many of the things you're saying.

I think it's helpful to debunk a few myths:

- No one has communicated AI xrisk to the public debate yet. In reality, Elon Musk, Nick Bostrom, Stephen Hawking, Sam Harris, Stuart Russell, Toby Ord and recently William MacAskill have all sought publicity with this message. There are op-eds in the NY Times, Economist articles, YouTube videos and Ted talks with millions of views, a CNN item, at least a dozen books (including for a general audience), an... (read more)

Thank you very much for this response!

I've made an edit and removed the specific regulation proposal. I think it's more accurate to just state that it needs to be robust, do as little harm as possible, and that we don't know yet what it should look like precisely.

I agree that it's drastic and clumsy. It's not an actual proposal, but a lower bound of what would likely work. More research into this is urgently needed.

Aren't you afraid that people could easily circumvent the regulation you mention? This would require every researcher and hacker, everywhere, forever, to comply. Also, many researchers are probably unaware that their models could start self-improving. Also, I'd say the security safeguards that you mention amount to AI Safety, which is of course currently an unsolved problem.

But honestly, I'm interested in regulation proposals that would be sufficiently robust while minimizing damage. If you have those, I'm all ears.

Thanks for the suggestion! Not sure we are going to have time for this, as it doesn't align completely with informing the public, but someone should clearly do this. Also great you're teaching this already to your students!

Perhaps all the more reason for great people to start doing it?

(4): I think regulation should get much more thought than this. I don't think you can defend the point that regulation would have 0% probability of working. It really depends on how many people are how scared. And that's something we could quite possibly change, if we would actually try (LW and EA haven't tried).

In terms of implementation: I agree that software/research regulation might not work. But hardware regulation seems much more robust to me. Data regulation might also be an option. As a lower bound: globally ban hardware development beyond 1990 lev... (read more)

I think you have to specify which policy you mean. First, let's for now focus on regulation that's really aiming to stop AGI, at least until safety is proven (if possible), not on regulation that's only focusing on slowing down (incremental progress).

I see roughly three options: software/research, hardware, and data. All of these options would likely need to be global to be effective (that's complicating things, but perhaps a few powerful states can enforce regulation on others - not necessarily unrealistic).

Most people who talk about AGI regulation seem t... (read more)

First, if there were a widely known argument about the dangers of AI, on which most public intellectuals agreed.

This is exactly what we have piloted at the Existential Risk Observatory, a Dutch nonprofit founded last year. I'd say we're fairly successful so far. Our aim is to reduce human extinction risk (especially from AGI) by informing the public debate. Concretely, what we've done in the past year in the Netherlands is (I'm including the detailed description so others can copy our approach - I think they should):

  1. We have set up a good-looking website, fo
... (read more)

Richard, thanks for your reply. Just for reference, I think this goes under argument 5, right?

It's a powerful argument, but I think it's not watertight. I would counter it as follows:

  1. As stated above, I think the aim should be an ideally global treaty where no country is allowed to go beyond a certain point of research. The countries should then enforce the treaty on all research institutes/companies within their borders. You're right that in this case, a criminal or terrorist group will have an edge. But seeing how hard it currently is for legally allowed a
... (read more)

Minimum hardware leads to maximum security. As a lab or a regulatory body, one can increase safety of AI prototypes by reducing the hardware or amount of data researchers have access to.

My response to counterargument 3 is summarized in this plot, for reference:

Basically, this would only be an issue if postponement cannot be done until risks are sufficiently low, and if take-off would be slow without postponement intervention.

Interesting line of thought. I don't know who and how, but I still think we should already think about whether it would be a good idea in principle.

Can I restate your idea as 'we have a certain amount of convinced manpower, we should use it for the best purpose, which is AI safety'? I like the way of thinking, but I still think we should use some of them for looking into postponement. Arguments:

- The vast majority of people are unable to contribute meaningfully to AI safety research. Of course, all these people could theoretically do whatever makes the most money and... (read more)

The idea that most people who can't do technical AI alignment are therefore able to do effective work in public policy or motivating public change seems unsupported by anything you've said. And a key problem with "raising awareness" as a method of risk reduction is that it's rife with infohazard concerns. For example, if we're really worried about a country seizing a decisive strategic advantage via AGI, that indicates that countries should be much more motivated to pursue AGI.  And I don't think that within the realm of international agreements and pursuit of AI regulation, postponement is neglected, at least relative to tractability, and policy for AI regulation is certainly an area of active research. 

Thanks for that comment! I didn't know Bill McKibben, but I read up on his 2019 book 'Falter: Has the Human Game Begun to Play Itself Out?' I'll post a review as a post later. I appreciate your description of what the scene was like back in the 90s or so, that's really insightful. Also interesting to read about nanotech, I never knew these concerns were historically so coupled.

But having read McKibben's book, I still can't find others on my side of the debate. McKibben is indeed the first one I know who both recognizes AGI danger, and does not believe in a... (read more)

Thanks for your insights Adam. If every AGI researcher is in some sense for halting AGI research, I'd like to get more confirmation on that. What are their arguments? Would they also work for non-AGI researchers?

I can imagine the combination of Daniel's point 1 and 2 stops AGI researchers from speaking out on this. But for non-AGI researchers, why not explore something that looks difficult, but may have existential benefits?

I agree and thanks for bringing some nuance in the debate. I think that would be a useful path to explore.

I'm imagining an international treaty, national laws, and enforcement from police. That's a serious proposal.

So we’d have all major military nations agreeing to a ban on artificial intelligence research, while all of them simultaneously acknowledge that AI research is key to their military edge? And then trusting each other not to carry out such research in secret? While policing anybody who crosses some undefinable line about what constitutes banned AI research? That sounds both intractable and like a policing nightmare to me, one that would have no end in sight. If poorly executed, it could be both repressive and ineffective. So I would like to know what a plan to permanently and effectively repress a whole wing of scientific inquiry on a global scale would look like.

The most tractable way seems like it would be to treat it like an illegal biological weapons program. That might be a model. The difference is that people generally interested in the study of bacteria and viruses still have many other outlets. Also, bioweapons haven’t been a crucial element in any nation’s arsenal. They don’t have a positive purpose. None of this applies to AI. So I see it as having some important differences from a bioweapons program.

Would we be willing to launch such an intrusive program of global policing, with all the attendant risks of permanent infringement of human rights, and risk setting up a system that both fails to achieve its intended purpose and sucks to live under? Would such a system actually reduce the chance of unsafe AGI long term? Or, as you’ve pointed out, would it risk creating a climate of urgency, secrecy, and distrust among nations and among scientists? I’d welcome work to investigate such plans, but it doesn’t seem on its face to be an obviously great solution.

I think a small group of thoughtful, committed, citizens can change the world. Indeed, it is the only thing that ever has.

1 · Kevin Lacker · 3y
You can change the world, sure, but not by making a heartfelt appeal to the United Nations. You have to be thoughtful, which means you pick tactics with some chance of success. Appealing to stop AI work is out of the political realm of possibility right now.

I appreciate the effort you took in writing a detailed response. There's one thing you say in which I'm particularly interested, for personal reasons. You say 'I've been in or near this debate since the 1990s'. That suggests there are many people with my opinion. Who? I would honestly love to know, because frankly it feels lonely. All people I've met, so far without a single exception, are either not afraid of AI existential risk at all, or believe in a tech fix and are against regulation. I don't believe in the tech fix, because as an engineer, I've seen ... (read more)

Nick Bostrom comes to mind as at least having a similar approach. And it's not like he's without allies, even in places like Less Wrong.

... and, Jeez, back when I was paying more attention, it seemed like some kind of regulation, or at least some kind of organized restriction, was the first thing a lot of people would suggest when they learned about the risks. Especially people who weren't "into" the technology itself.

I was hanging around the Foresight Institute. People in that orbit were split about 50-50 between worrying most about AI and worrying most about nanotech... but the two issues weren't all that different when it came to broad precautionary strategies. The prevailing theory was roughly that the two came as a package anyway; if you got hardcore AI, it would invent nanotech, and if you got nanotech, it would give you enough computing power to brute-force AI. Sometimes "nanotech" was even taken as shorthand for "AI, nanotech, and anything else that could get really hairy"... vaguely what people would now follow Bostrom and call "X-risk". So you might find some kindred spirits by looking in old "nanotech" discussions.

There always seemed to be plenty of people who'd take various regulate-and-delay positions in bull sessions like this one, both online and offline, with differing degrees of consistency or commitment. I can't remember names; it's been ages.

The whole "outside" world also seemed very pro-regulation. It felt like about every 15 minutes, you'd see an op-ed in the "outside world" press, or even a book, advocating for a "simple precautionary approach", where "we" would hold off either as you propose, until some safety criteria were met, or even permanently. There were, and I think still are, people who think you can just permanently outlaw something like AGI, and that will somehow actually make it never happen. This really scared me.

I think the word "relinquishment" came from Bill McKibben, who, as I recall, was, and for all I know may still

The key point that I think you’re missing here is that evaluating whether such a policy “should” be implemented necessarily depends on how it would be implemented. We could in theory try to kill all AI researchers (or just go whole hog and try to kill all software engineers, better safe than sorry /s). But then of course we need to think about the side effects of such a program, ya know, like them running and hiding in other countries and dedicating their lives to fighting back against the countries that are hunting them. Or whatever. That’s just one example, and I use it because it might be the only tractable way to stop this form of tech progress: literally wiping out the knowledge base. I do not endorse this idea, by the way. I’m just trying to show that your reaction to “should we” depends hugely on “how.”

That goes under Daniel's point 2 I guess?

Not to speak for Dagon, but I think point 2 as you write it is way, way too narrow and optimistic. Saying "it would be rather difficult to get useful regulation" is sort of like saying "it would be rather difficult to invent time travel".

I mean, yes, it would be incredibly hard, way beyond "rather difficult", and maybe into "flat-out impossible", to get any given government to put useful regulations in place... assuming anybody could present a workable approach to begin with.

It's not a matter of going to a government and making an argument. For one thing,... (read more)

I'm not in favour of nuclear war either :)

Why do you think we need to find out 1 before trying? I would say, if it is indeed a good idea to postpone, then we can just start trying to postpone. Why would we need to know beforehand how effective that will be? Can't we find that out by trial and error if needed? Worst case, we would be postponing less. That is of course, as long as the flavor of postponement does not have serious negative side effects.

Or rephrased, why do these brakes need to be carefully calibrated?

Presuming that this is a serious topic, then we need to understand what the world would look like if we could put the brakes on technology. Right now, we can’t. What would it look like if we as a civilization were really trying hard to stop a certain branch of research? Would we like that state of affairs?

Thanks for your comments, these are interesting points. I agree that these are hard questions and that it's not clear that policymakers will be good at answering them. However, I don't think AI researchers themselves are any better, which you seem to imply. I've worked as an engineer myself and I've seen that when engineers or scientists are close to their own topic, their judgement of any risks/downsides of this topic does not become more reliable, but less. AGI safety researchers will be convinced about AGI risk, but I'm afraid their judgement of their o... (read more)

Thanks for your thoughts. Of course we don't know whether AGI will harm or help us. However I'm making the judgement that the harm could plausibly be so big (existential), that it outweighs the help (reduction in suffering for the time until safe AGI, and perhaps reduction of other existential risks). You seem to be on board with this, is that right?

Why exactly do you think interference would fail? How certain are you? I'm acknowledging it would be hard, but not sure how optimist/pessimist to be on this.

I think it's misleading to use terms like "us".  AGI will harm some humans, almost no matter what.  AGI MAY harm all humans, current and future.  It may also vastly increase flourishing of humans (and other mind types, including AGI itself).  This is true of most significant technologies, but even more so with AGI, which is likely to have much broader impact than most things. I think that, exactly because of breadth of impact, it's going to be pursued by many people/organizations, with different goals and different kinds of influence that you or I might exert over them.  That diversity makes the pursuit of AGI very resilient, and unlikely to be significantly slowed by our actions.   

I have kind of a strong opinion in favor of policy intervention because I don't think it's optional. I think it's necessary. My main argument is as follows:

I think we have two options to reduce AI extinction risk:

1) Fixing it technically and ethically (I'll call the combination of both working out the 'tech fix'). Don't delay.

2) Delay until we can work out 1. After the delay, AGI may or may not still be carried out, depending mainly on the outcome of 1.

If option 1 does not work, of which there is a reasonable chance (it hasn't worked so far and we're not n... (read more)

2 · Daniel Kokotajlo · 3y
Want to have a video chat about this? I'd love to. :)

I wouldn't say less rational, but more bipartisan, yes. But you're right I guess that European politics is less important in this case. Also don't forget Chinese politics, which has entirely different dynamics of course.

I think you have a good point as well that wonkery, think tankery, and lobbying are also promising options. I think they, and starting a movement, should be on a little list of policy intervention options. I think each will have its own merits and issues. But still, we should have a group of people actually starting to work on this, whatever the optimal path turns out to be.

It's funny, I heard that opinion a number of times before, mostly from Americans. Maybe it has to do with your bipartisan flavor of democracy. I think Americans are also much more skeptical of states in general. You tend to look to companies for solving problems, Europeans tend to look to states (generalized). In The Netherlands we have a host of parties, and although there are still a lot of pointless debates, I wouldn't say it's nearly as bad as what you describe. I can't imagine e.g. climate change solved without state intervention (the situation here i... (read more)

5 · Daniel Kokotajlo · 3y
Perhaps American politics is indeed less rational than European politics, I wouldn't know. But American politics is more important for influencing AI since the big AI companies are American. Besides, if you want to get governments involved, raising public awareness is only one way to do that, and not the best way IMO. I think it's much more effective to do wonkery / think tankery / lobbying / etc. Public movements are only necessary when you have massive organized opposition that needs to be overcome by sheer weight of public opinion. When you don't have massive organized opposition, and heads are still cool, and there's still a chance of just straightforwardly convincing people of the merits of your case... best not to risk ruining that lucky situation!

Don't get me wrong, I think institutes like FHI are doing very useful research. I think there should be a lot more of them, at many different universities. I just think what's missing in the whole X-risk scene is a way to take things out of this still fairly marginal scene and into the mainstream. As long as the mainstream is not convinced that this is an actual problem, efforts are always enormously going to lag mainstream AI efforts, with predictable results.

Maybe. But I actually currently think that the longer these issues stay out of the mainstream, the better. Mainstream political discourse is so corrupted; when something becomes politicized, that means it's harder for anything to be done about it and a LOT harder for the truth to win out. You don't see nuanced, balancing-risks-and-benefits solutions come out of politicized debates. Instead you see two one-sided, extreme agendas bashing on each other and then occasionally one of them wins.

(That said, now that I put it that way, maybe that's what we want for AI risk--but only if we get to dictate the content of one of the extreme agendas and only if we are likely to win. Those are two very big ifs.)

I know their work and I'm pretty sure there's no list of how to convince governments and corporations that AI risk is an actual thing. PhDs are not the kind of people inclined to take any concrete action, I think.

2 · Daniel Kokotajlo · 3y
I disagree. I would be surprised if they haven't brainstormed such a list at least once. And just because you don't see them doing any concrete action doesn't mean they aren't--they just might not be doing anything super public yet.

I agree and I think books such as Superintelligence have definitely decreased the x-risk chance. I think 'convincing governments and corporations that this is a real risk' would be a great step forward. What I haven't seen anywhere, is a coherent list of options how to achieve that, preferably ranked by impact. A protest might be up there, but probably there are better ways. I think making that list would be a great first step. Can't we do that here somewhere?

4 · Daniel Kokotajlo · 3y
I think there are various people working on it, the AI policy people at Future of Humanity Institute for example, maybe people at CSET. I recommend you read their stuff and maybe try to talk to them.

That makes sense and I think it's important that this point gets made. I'm particularly interested by the political movement that you refer to. Could you explain this concept in more detail? Is there anything like such a political movement already being built at the moment? If not, how would you see this starting?

3 · Daniel Kokotajlo · 3y
I don't consider this my area of expertise; I think it's very easy to do more harm than good by starting political movements. However, it seems likely to me that in order for the future to go well various governments and corporations will need to become convinced that AI risk is real, and maybe an awareness-raising campaign is the best way to do this. That's what I had in mind. In some sense that's what many people have been doing already, e.g. by writing books like Superintelligence. However, maybe eventually we'd need to get more political, e.g. by organizing a protest or something. Idk. Like I said, this could easily backfire.

I think actually 1+1 = ? is not really an easy enough goal, since it's not 100% sure that the answer is 2. Getting to 100% certainty (including what I actually meant with that question) could still be nontrivial. But let's say the goal is 'delete filename.txt'? Could be the trick is in the language..

Thanks again for your reply. I see your point that the world is complicated and a utility maximizer would be dangerous, even if the maximization is supposedly trivial. However, I don't see how an achievable goal has the same problem. If my AI finds the answer of 2 before a meteor hits it, I would say it has solidly landed at 100% and stops doing anything. Your argument would be true if it decides to rule out all possible risks first, before actually starting to look for the answer of the question, which would otherwise quickly be found. But since ruling ou... (read more)

Thanks for your insights. I don't really understand 'setting [easy] goals is an unsolved problem'. If you set a goal, "tell me what 1+1 is", isn't that possible? And once completed ("2!"), the AI would stop self-improving, right?

I think this may contribute to just a tiny piece of the puzzle, however, because there will always be someone setting a complex or, worse, non-achievable goal ("make the world a happy place!"), and boom there you have your existential risk again. But in a hypothetical situation where you have your AGI in the lab, no-one else has, ... (read more)

Charlie Steiner · 3y
Suppose I get hit by a meteor before I can hear your "2" - will you then have failed to tell me what 1+1 is? If so, suddenly this simple goal implies being able to save the audience from meteors. Or suppose your screen has a difficult-to-detect short circuit - your expected utility would be higher if you could check your screen and repair it if necessary. Because a utility maximizer treats a 0.09% improvement over a 99.9% baseline just as seriously as it treats a 90% improvement over a 0% baseline, it doesn't see these small improvements as trivial, or in any way not worth its best effort. If your goal actually has some chance of failure, and there are capabilities that might help mitigate that failure, it will incentivize capability gain. And because the real world is complicated, this seems like it's true for basically all goals that care about the state of the world.

If we have a reinforcement learner rather than a utility maximizer with a pre-specified model of the world, this story is a bit different, because of course there will be no meteors in the training data. Now, you might think that this means that the RL agent cannot care about meteors, but this is actually somewhat undefined behavior, because the AI still gets to see observations of the world. If it is vanilla RL with no "curiosity," it won't ever start to care about the world until the world actually affects its reward (which for meteors, will take much too long to matter, but does become important when the reward is more informative about the real world), but if it's more along the lines of DeepMind's game-playing agents, then it will try to find out about the world, which will increase its rate of approaching optimal play.

There are definitely ideas in the literature that relate to this problem, particularly trying to formalize the notion that the AI shouldn't "try too hard" on easy goals. I think these attempts mostly fall under two umbrellas - other-izers (that is, not maximizers) and impact r
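The expected-utility argument above can be sketched in a few lines of Python. This is a toy illustration with made-up probabilities, not a model of any real agent: the point is only that an argmax over expected utility prefers a tiny probability gain on an already-high baseline just as decisively as a large gain on a low one.

```python
# Toy sketch (hypothetical numbers): an expected-utility maximizer ranks
# actions purely by expected utility, so even a tiny gain in success
# probability is strictly preferred, no matter how good the baseline is.
def expected_utility(p_success, utility=1.0):
    return p_success * utility

baseline = expected_utility(0.999)                 # "just answer 2"
with_meteor_defense = expected_utility(0.9999)     # also guard the audience
assert with_meteor_defense > baseline              # tiny gain, still preferred

# argmax ignores how "good enough" the baseline already is
actions = {
    "answer immediately": 0.999,
    "first build meteor defenses": 0.9999,
}
best = max(actions, key=lambda a: expected_utility(actions[a]))
print(best)  # → first build meteor defenses
```

Nothing in the argmax step discounts small improvements, which is the sense in which "easy" goals still incentivize capability gain.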

Thanks Charlie! :)

They are asking for only one proposal, so I will have to choose one and am planning to work out that one. So I'm mostly asking about which idea you find most interesting, rather than about which one is the strongest proposal now - that will be worked out. But thanks a lot for your feedback so far - that helps!

AGI is unnecessary for an intelligence explosion

Many arguments state that an AGI would be required for an intelligence explosion. However, it seems to me that the critical point for achieving this explosion is that an AI can self-improve. Which skills are needed for that? If we have a hardware overhang, it probably comes down to the type of skills an AI researcher uses: reading papers, combining insights, doing computer experiments until new insights emerge, writing papers about them. Perhaps an AI PhD can weigh in on the actual skills needed. I'm ho... (read more)

Technically, tiling the entire universe with paperclips or tiny smiling faces would probably count as modern art...

Tune AGI intelligence by easy goals

If an AGI is provided an easily solvable utility function ("fetch a coffee"), it will lack the incentive to self-improve indefinitely. The fetch-a-coffee-AGI will only need to become as smart as a hypothetical simple-minded waiter. By choosing how easy the utility function is, we could therefore tune the intelligence level an AGI reaches through self-improvement. The only way to achieve an indefinite intelligence explosion (until e.g. material boundaries) would be to program a utility function ma... (read more)
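The proposal above can be caricatured with a toy model. This is a hedged sketch with invented numbers: it contrasts a satisficer, which stops self-improving once its estimated success probability crosses a threshold (an "easy enough" goal), with a maximizer, which keeps improving until nothing is left to gain. It says nothing about whether real goals can be specified this way, which is exactly the objection raised in the reply below.

```python
# Hedged toy model: count self-improvement steps for a satisficer vs. a
# maximizer. threshold=None models a pure maximizer (improves while p < 1).
def improvement_steps(p0, gain_per_step, threshold=None, max_steps=1000):
    """Return how many self-improvement steps the agent takes."""
    p, steps = p0, 0
    while steps < max_steps:
        if threshold is not None and p >= threshold:
            break  # satisficer: goal is "easy enough", stop improving
        if p >= 1.0:
            break  # even a maximizer has nothing left to gain
        p = min(1.0, p + gain_per_step)
        steps += 1
    return steps

# An "easy" goal with a satisficing threshold halts after a few steps...
assert improvement_steps(0.90, 0.01, threshold=0.95) == 5
# ...while the maximizer keeps improving until success probability saturates.
assert improvement_steps(0.90, 0.01, threshold=None) == 10
```

In this toy picture, "tuning intelligence" is just choosing the threshold; the hard, unsolved part is that real-world goals don't come with a clean success probability to threshold on.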

Charlie Steiner · 3y
The hard part is that the real world is complicated and setting goals that truly have no incentive for self-improvement or gaining power is an unsolved problem. Relevant Rob Miles video []. One could use artificial environments that are less complicated, and of course we do, but it seems like this leaves some important problems unsolved.

This is the exact topic I'm thinking a lot about, thanks for the link! I've written my own essay for a general audience, but it seems ineffective. I knew about the Wait But Why blog post, but there must be better approaches possible. What I find hard to understand is that there have been multiple best-selling books on the topic, yet still no general alarm has been raised and the topic is not discussed in e.g. politics. I would be interested in why this paradox exists, and also in how to fix it.

Is there any more information about reaching out to a general ... (read more)
