Rather, conflicts arise when various individuals and populations (justifiably or not) perceive that they are in zero-sum games for limited resources. The solution for this is not “philosophical progress” as much as being able to move out of the zero-sum setting by finding “win win” resolutions for conflict or growing the overall pie instead of arguing how to split it.
I think many of today's wars are at least as much about ideology (like nationalism, liberalism, communism, religion) as about limited resources. I note that Russia and Ukraine both have belo...
One way that things could go wrong, not addressed by this playbook: AI may differentially accelerate intellectual progress in a wrong direction, or in other words create opportunities for humanity to make serious mistakes (by accelerating technological progress) faster than wisdom to make right choices (philosophical progress). Specific to the issue of misalignment, suppose we get aligned human-level-ish AI, but it is significantly better at speeding up AI capabilities research than the kinds of intellectual progress needed to continue to minimize misalign...
I guess part of the problem is that the people who are currently most receptive to my message are already deeply enmeshed in other x-risk work, and I don't know how to reach others for whom the message might be helpful (such as academic philosophers just starting to think about AI?). If on reflection you think it would be worth spending some of your time on this, one particularly useful thing might be to do some sort of outreach/field-building, like writing a post or paper describing the problem, presenting it at conferences, and otherwise attracting more ...
Here's a link to the part of the interview where that quote came from: https://youtu.be/GyFkWb903aU?t=4739 (No opinion on whether you're missing redeeming context; I still need to process Nesov's and your comments.)
Even at 10% p(doom), which I consider to be unreasonably low, it would probably be worth delaying a few years.
Someone with 10% p(doom) may worry that if they got into a coalition with others to delay AI, they can't control the delay precisely, and it could easily become more than a few years. Maybe it would be better not to take that risk, from their perspective.
And lots of people have p(doom)<10%. Scott Aaronson just gave 2% for example, and he's probably taken AI risk more seriously than most (currently working on AI safety at OpenAI), so prob...
Why is 1 important? It seems like something we can defer discussion of until after (if ever) alignment is solved, no?
If aging were solved or looked like it would be solved within the next few decades, it would make efforts to stop or slow down AI development less problematic, both practically and ethically. I think some AI accelerationists might be motivated directly by the prospect of dying/deterioration from old age, and/or view lack of interest/progress on that front as a sign of human inadequacy/stagnation (contributing to their antipathy towards humans)....
Setting that aside, it reads to me like the frame-clash happening here is (loosely) between “50% extinction, 50% not-extinction” and “50% extinction, 50% utopia”
Yeah, I think this is a factor. Paul talked a lot about "1/trillion kindness" as the reason for non-extinction, but 1/trillion kindness seems to directly imply a small utopia where existing humans get to live out long and happy lives (even better/longer lives than without AI) so it seemed to me like he was (maybe unintentionally) giving the reader a frame of “50% extinction, 50% small utopia”, while still writing other things under the “50% extinction, 50% not-extinction” frame himself.
I do explicitly flag the loss of control over the future in that same sentence.
In your initial comment you talked a lot about AI respecting the preferences of weak agents (using 1/trillion of its resources) which implies handing back control of a lot of resources to humans, which from the selfish or scope insensitive perspective of typical humans probably seems almost as good as not losing that control in the first place.
...I don’t think the much worse outcomes are closely related to unaligned AI so I don’t think they seem super relevant to my comment or
I regret mentioning "lie-to-children" as it seems a distraction from my main point. (I was trying to introspect/explain why I didn't feel as motivated to express disagreement with the OP as you, not intending to advocate or endorse anyone going into "the business of telling lies-told-to-children to adults".)
My main point is that I think "misaligned AI has a 50% chance of killing everyone" isn't alarming enough, given what I think happens in the remaining 50% of worlds, versus what a typical person is likely to infer from this statement, especially after se...
I'm worried that people, after reading your top-level comment, will become too little worried about misaligned AI (from their selfish perspective), because it seems like you're suggesting (conditional on misaligned AI) 50% chance of death and 50% alive and well for a long time (due to 1/trillion kindness), which might not seem so bad compared to keeping AI development on hold indefinitely which potentially implies a high probability of death from old age.
I feel like "misaligned AI kills everyone because it doesn't care at all" can be a reasonable lie-to-ch...
If a misaligned AI had 1/trillion "protecting the preferences of whatever weak agents happen to exist in the world", why couldn't it also have 1/trillion other vaguely human-like preferences, such as "enjoy watching the suffering of one's enemies" or "enjoy exercising arbitrary power over others"?
From a purely selfish perspective, I think I might prefer that a misaligned AI kills everyone, and take my chances with continuations of myself (my copies/simulations) elsewhere in the multiverse, rather than face whatever the sum-of-desires of the misaligned AI decides to do with humanity. (With the usual caveat that I'm very philosophically confused about how to think about all of this.)
It seems like just 4 months ago you still endorsed your second power-seeking paper:
This paper is both published in a top-tier conference and, unlike the previous paper, actually has a shot of being applicable to realistic agents and training processes. Therefore, compared to the original[1] optimal policy paper, I think this paper is better for communicating concerns about power-seeking to the broader ML world.
Why are you now "fantasizing" about retracting it?
...I think a healthy alignment community would have rebuked me for that line of research, but s
Is it just me or is it nuts that a statement this obvious could have gone outside the Overton window, and is now worth celebrating when it finally (re?)enters?
How is it possible to build a superintelligence at acceptable risk while this kind of thing can happen? What if there are other truths important to safely building a superintelligence, that nobody (or very few) acknowledges because they are outside the Overton window?
Now that AI x-risk is finally in the Overton window, what's your vote for the most important and obviously true statement that is still...
Note that this paper already used "Language Agents" to mean something else. See link below for other possible terms. I will keep using "Language Agents" in this comment/thread (unless the OP decides to change their terminology).
I added the tag Chain-of-Thought Alignment, since there's a bunch of related discussion on LW under that tag. I'm not very familiar with this discussion myself, and have some questions below that may or may not already have good answers.
How competent will Language Agents be at strategy/planning, compared to humans and other AI approa...
Related to this, it occurs to me that a version of my Hacking the CEV for Fun and Profit might come true unintentionally, if for example a Friendly AI was successfully built to implement the CEV of every sentient being who currently exists or can be resurrected or reconstructed, and it turns out that the vast majority consists of AIs that were temporarily instantiated during ML training runs.
This seems a reasonable consideration, but doesn't change my desire to experiment with having the new feature, since there are potential benefits that could outweigh the downside that you describe. (Not sure if you meant to indicate an overall disagreement, or just want to point out this additional consideration.) And if the downside turns out to be a significant issue, it could be ameliorated by clarifying that "I plan to reply later" should be interpreted not as a commitment but just as an indication of current state of mind.
and also the goal of alignment is not to browbeat AIs into doing stuff we like that they'd rather not do; it's to build them de-novo to care about valuable stuff
This was my answer to Robin Hanson when he analogized alignment to enslavement, but it then occurred to me that for many likely approaches to alignment (namely those based on ML training) it's not so clear which of these two categories they fall into. Quoting a FB comment of mine:
We're probably not actually going to create an aligned AI from scratch but by a process of ML "training", which actua...
There is also a somewhat unfounded narrative of reward being the thing that gets pursued, leading to expectation of wireheading or numbers-go-up maximization. A design like this would work to maximize reward, but gradient descent probably finds other designs that only happen to do well in pursuing reward on the training distribution. For such alternative designs, reward is brain damage and not at all an optimization target, something to be avoided or directed in specific ways so as to make beneficial changes to the model, according to the model.
Apart from ...
Good point! For the record, insofar as we attempt to build aligned AIs by doing the moral equivalent of "breeding a slave-race", I'm pretty uneasy about it. (Whereas insofar as it's more the moral equivalent of "a child's values maturing", I have fewer moral qualms. This is a separate claim from whether I actually expect that you can solve alignment that way.) And I agree that the morality of various methods for shaping AI-people is unclear. Also, I've edited the post (to add an "at least according to my ideals" clause) to acknowledge the point that others might be more comfortable with attempting to align AI-people via means that I'd consider morally dubious.
Thanks for this. I was just wondering how your views have updated in light of recent events.
Like you I also think that things are going better than my median prediction, but paradoxically I've been feeling even more pessimistic lately. Reflecting on this, I think my p(doom) has gone up instead of down, because some of the good futures where a lot of my probability mass for non-doom was concentrated have also disappeared, which seems to outweigh the especially bad futures going away and makes me overall more pessimistic.
These especially good futures were 1...
So long as property rights are respected, humans will continue to have a comparative advantage in something, and whatever that is we will be much richer in a world with hyper-competitive AGI than we are today.
I don't think this is right? Consider the following toy example. Suppose there's a human who doesn't own anything except his own labor. He consumes 1 unit of raw materials (RM) per day to survive and can use his labor to turn 1 unit of RM into 1 paperclip or 2 staples per hour. Then someone invents an AI that takes 1 unit of RM to build, 1 unit of ...
Not sure I understand. Please explain more? Also do you have a concrete suggestion or change you'd like to see?
In a previous comment you talked about the importance of "the problem of solving the bargaining/cooperation/mutual-governance problem that AI-enhanced companies (and/or countries) will be facing". I wonder if you've written more about this problem anywhere, and why you didn't mention it again in the comment that I'm replying to.
My own thinking about 'the ~50% extinction probability I’m expecting from multi-polar interaction-level effects coming some years after we get individually “safe” AGI systems up and running' is that if we've got "safe" AGIs, we coul...
If this feature is in part meant to address the problems of 1) threads often ending without people knowing why and 2) people feeling bad about receiving certain kinds of criticism or about certain critics because it's costly to both respond and not respond, I would suggest adding the following reactions:
Maybe too hard but it might be nice to have somewhere you can go to see all the comments you've reacted "I plan to respond later" to that you haven't yet responded to.
the various failure modes that ChatGPT has are a concrete demonstration both about the general difficulty of aligning AI and some of the specific issues more specifically
By this logic, wouldn't Microsoft be even more praiseworthy, because Bing Chat / Sydney was even more misaligned, and the way it was released (i.e. clearly prioritizing profit and bragging rights above safety) made AI x-risk even more obvious to people?
the ~50% extinction probability I’m expecting from multi-polar interaction-level effects coming some years after we get individually “safe” AGI systems up and running (“safe” in the sense that they obey their creators and users; see again my Multipolar Failure post above for why that’s not enough for humanity to survive as a species).
Do you have a success story for how humanity can avoid this outcome? For example what set of technical and/or social problems do you think need to be solved? (I skimmed some of your past posts and didn't find an obvious pla...
Do you have a success story for how humanity can avoid this outcome? For example what set of technical and/or social problems do you think need to be solved? (I skimmed some of your past posts and didn't find an obvious place where you talked about this.)
I do not, but thanks for asking. To give a best efforts response nonetheless:
David Dalrymple's Open Agency Architecture is probably the best I've seen in terms of a comprehensive statement of what's needed technically, but it would need to be combined with global regulations limiting compute expendit...
I agree with Eliezer that acausal trade/extortion between humans and AIs probably doesn't work, but I'm pretty worried about what happens after AI is developed, whether aligned or unaligned/misaligned, because then the "acausal trade/extortion between humans and AIs probably doesn't work" argument would no longer apply.
I think fully understanding the issue requires solving some philosophical problems that we probably won't solve in the near future (unless with help of superintelligence), so it contributes to me wanting to:
...preserve and improve the collect
For areas where we don’t have empirical feedback-loops (like many philosophical topics), I imagine that the “baseline solution” for getting help from AIs is to teach them to imitate our reasoning. Either just by literally writing the words that it predicts that we would write (but faster), or by having it generate arguments that we would think look good. (Potentially recursively, cf. amplification, debate, etc.)
This seems like the default road that we're walking down, but can ML learn everything that is important to learn? I questioned this in Some Th...
I also think this is interesting, but whenever I see a proposal like this I like to ask, does it work on philosophical topics, where we don't have a list of true and false statements that we can be very sure about, and we also don't have a clear understanding of what kinds of arguments or sentences count as good arguments and what kinds count as manipulation? There could be deception tactics specific to philosophy or certain philosophical topics, which can't be found by training on other topics (and you can't train directly on philosophy because of the above i...
For example, making numerous copies of itself to work in parallel would again raise the dangers of independently varying goals.
The AI could design a system such that any copies made of itself are deleted after a short period of time (or after completing an assigned task) and no copies of copies are made. This should work well enough to ensure that the goals of all of the copies as a whole never vary far from its own goals, at least for the purpose of researching a more permanent alignment solution. It's not 100% risk-free of course, but seems safe enoug...
Probability that humanity has somehow irreversibly messed up our future within 10 years of building powerful AI: 46%
What's a short phrase that captures this? I've been using "AI-related x-risk" or just "AI x-risk" or "AI risk" but it sounds like you might disagree with using some or all of these phrases for this purpose (since most of this 46% isn't "from AI" in your perspective)?
(BTW it seems that we're not as far apart as I thought. My own number for this is 80-90% and I thought yours was closer to 20% than 50%.)
While those concerns are still relevant, the much more likely path is simply that people will try their hardest to make the LLM into an agent as soon as possible, because agents with the ability to carry out long-term goals are much more useful.
Did this come as a surprise to you, and if so I'm curious why? This seemed to me like the most obvious thing that people would try to do.
Lastly, AIs may soon be sentient, and people will torture them because people like doing that.
How do we know they're not already capable of having morally relevant experienc...
I’d at least want to see a second established user asking for it before I considered prioritizing it more.
I doubt you'll ever see this, because when you're an established / high status member, ignoring other people feels pretty natural and right, and few people ignore you so you don't notice any problems. I made the request back when I had lower status on this forum. I got ignored by others way more than I do now, and ignored others way less than I do now. (I had higher motivation to "prove" myself to my critics and the audience.)
If I hadn't written dow...
Wei Dai had a comment below about how important it is to know whether there’s any criticism or not, but mostly I don’t care about this either because my prior is just that it’s bad whether or not there’s criticism. In other words, I think the only good approach here is to focus on farming the rare good stuff and ignoring the bad stuff (except for the stuff that ends up way overrated, like (IMO) Babble or Simulators, which I think should be called out directly).
But how do you find the rare good stuff amidst all the bad stuff? I tend to do it with a combi...
Looks like I was right to suspect that Germany wasn't really going to keep their nuclear plants open. From CNN:
Germany’s final three nuclear power plants close their doors on Saturday, marking the end of the country’s nuclear era that has spanned more than six decades.
What's the German word for seeing a rare glimmer of rationality being snuffed out after all?
I think a problem that my proposal tries to solve, and this one doesn't, is that some authors seem easily triggered by some commenters, and apparently would prefer not to see their comments at all. (Personally if I was running a discussion site I might not try so hard to accommodate such authors, but apparently they include some authors that the LW team really wants to keep or attract.)
I feel fine doing this because I feel comfortable just ignoring him after he’s said those initial things, when a normal/common social script would consider that somewhat rude. But this requires a significant amount of backbone.
I still wish that LW would try my idea for solving this (and related) problem(s), but it doesn't seem like that's ever going to happen. (I've tried to remind LW admins about my feature request over the years, but don't think I've ever seen an admin say why it's not worth trying.) As an alternative, I've seen people suggest that it...
I support exposing the number of upvotes/downvotes. (I wrote a userscript for GW to always show the total number of votes, which allows me to infer this somewhat.) However that doesn't address the bulk of my concerns, which I've laid out in more detail in this comment. In connection with karma, I've observed that sometimes a post is initially upvoted a lot, until someone posts a good critique, which then causes the karma of the post to plummet. This makes me think that the karma could be very misleading (even with upvotes/downvotes exposed) if the critique had been banned or disincentivized.
And if there is an important critique to be made I’d expect it to be something that more than the few banned users would think of and decide to post a comment on.
This may be true in some cases, but not all. My experience here comes from cryptography where it often takes hundreds of person-hours to find a flaw in a new idea (which can sometimes be completely fatal), and UDT, where I found a couple of issues in my own initial idea only after several months/years of thinking (hence going to UDT1.1 and UDT2). I think if you ban a few users who might have th...
(Tangentially) If users are allowed to ban other users from commenting on their posts, how can I tell when the lack of criticism in the comments of some post means that nobody wanted to criticize it (which is a very useful signal that I would want to update on), or that the author has banned some or all of their most prominent/frequent critics? In addition, I think many users may be misled by lack of criticism if they're simply not aware of the second possibility or have forgotten it. (I think I knew it but it hasn't entered my conscious awareness for a w...
On the substance I’m skeptical of the more general anti-change sentiment—I think that technological progress has been one of the most important drivers of improving human conditions, and procedurally I value a liberal society where people are free to build and sell technologies as long as they comply with the law.
I'm pretty conflicted but a large part of me wants to bite this bullet, and say that a more deliberate approach to technological change would be good overall, even when applied to both the past and present/future. Because:
COVID and climate change are actually easy problems that only became serious or highly costly because of humanity's irrationality and lack of coordination.
I don't think I understand, what's the reason to expect that the "acausal economy" will look like a bunch of acausal norms, as opposed to, say, each civilization first figuring out what its ultimate values are, how to encode them into a utility function, then merging with every other civilization's utility function? (Not saying that I know it will be the latter, just that I don't know how to tell at this point.)
Also, given that I think AI risk is very high for human civilization, and there being no reason to suspect that we're not a typical pre-AGI civiliz...
To your first question, I'm not sure which particular "the reason" would be most helpful to convey. (To contrast: what's "the reason" that physically dispersed human societies have laws? Answer: there's a confluence of reasons.). However, I'll try to point out some things that might be helpful to attend to.
First, committing to a policy that merges your utility function with someone else's is quite a vulnerable maneuver, with a lot of boundary-setting aspects. For instance, will you merge utility functions multiplicatively (as in Nas...
What does merging utility functions look like and are you sure it's not going to look the same as global free trade? It's arguable that trade is just a way of breaking down and modularizing a big multifaceted problem over a lot of subagent task specialists (and there's no avoiding having subagents, due to the light speed limit)
That’s the path the world seems to be on at the moment. It might end well and it might not, but it seems like we are on track for a heck of a roll of the dice.
I agree with almost everything you've written in this post, but you must have some additional inside information about how the world got to this state, having been on the board of OpenAI for several years, and presumably knowing many key decision makers. Presumably this wasn't the path you hoped that OpenAI would lead the world onto when you decided to get involved? Maybe you can't share specific ...
We have a lot of experience and knowledge of building systems that are broadly beneficial and safe, while operating in the human capabilities regime.
What? A major reason we're in the current mess is that we don't know how to do this. For example we don't seem to know how to build a corporation (or more broadly an economy) such that its most powerful leaders don't act like Hollywood villains (race for AI to make a competitor 'dance')? Even our "AGI safety" organizations don't behave safely (e.g., racing for capabilities, handing them over to others, e.g....
Looking forward to your next post, but in the meantime:
My first thought upon hearing about Microsoft deploying a GPT derivative was (as I told a few others in private chat) "I guess they must have fixed the 'making up facts' problem." My thinking was that a big corporation like Microsoft that mostly sells to businesses would want to maintain a reputation for only deploying reliable products. I honestly don't know how to adjust my model of the world to account for whatever happened here... except to be generically more pessimistic?
But it seems increasingly plausible that AIs will not have explicit utility functions, so that doesn’t seem much better than saying humans could merge their utility functions.
There are a couple of ways to extend the argument:
If someone cares a lot about a strictly zero-sum resource, like land, how do you convince them to 'move out of the zero-sum setting by finding "win win" resolutions'? Like what do you think Ukraine or its allies should have done to reduce the risk of war before Russia invaded? Or what should Taiwan or its allies do now?
Also to bring this thread back to the original topic, what kinds of interventions do you think your position suggests with regard to AI?