Wonderful that you’re working on this! I’m with AI Safety Impact Markets, and I suspect that we will need a system like this eventually. We haven’t received much feedback to that effect yet, so I haven’t prioritized it, but there are at least two applications for it (for investors and (one day, speculatively) for impact buyers/retrofunders). We’re currently addressing it with a bonding curve auction of sorts, which incentivizes donors to come in early, so that they’re not incentivized to wait each other out. The incentive structures are differen...
My take – perhaps a bit naive given acausal stuff, other grabby aliens, etc. – is that a conflict needs at least two parties, and humans are too weak and uncoordinated to be much of an adversary. Hence I’m not so worried about monopolar takeoffs. Not sure, though. Maybe I should be more worried about those too.
I expect that if you make a superintelligence, it won’t need humans to tell it the best bargaining math it can use.
I’m not a fan of idealizing superintelligences. 10+ years ago that was the only way to infer any hard information about worst-case scenarios. Assume perfect play from all sides, and you end up with a fairly narrow game tree that you can reason about. But now it’s a pretty good guess that superintelligences will be more advanced successors of GPT-4 and such. That tells us a lot about the sort of training regimes through which they might learn bar...
Sorry for glossing over some of these. E.g., I’m not sure if you consider ems to be “scientifically implausible technologies.” I don’t, but I bet there are people who could make smart arguments for why they are far off.
Reason 5 is actually a reason to prioritize some s-risk interventions. I explain why in the “tractability” footnote.
No, just a value-neutral financial instrument such as escrow. If two people can fight or trade, but they can’t trade because they don’t trust each other, they’ll fight. That loses out on the gains from trade, and one of them ends up dead. But once you invent escrow, there’s suddenly, in many cases, an option to do the trade after all, and both can live!
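To make this concrete, here’s a toy payoff sketch in Python (all numbers invented for illustration, not from any actual model): without escrow, offering to trade is exploitable, so fighting is the safe choice for both; with escrow, the exploitability disappears and mutual trade becomes the obvious choice.

```python
# Toy payoff sketch (invented numbers) of why escrow unlocks trade.
# Without escrow, offering to trade while the other side defects is
# catastrophic, so both play it safe and fight. Escrow protects the
# party who offers to trade, so mutual trade becomes the safe choice.

def payoffs(a, b, escrow):
    """Payoffs (for a, for b) given actions 'trade' or 'fight'."""
    if a == "trade" and b == "trade":
        return (3, 3)      # both capture the gains from trade
    if a == "fight" and b == "fight":
        return (-2, -2)    # costly conflict; one of them may end up dead
    # One offers to trade, the other fights:
    exploited, exploiter = ((0, 0) if escrow  # escrow returns the goods
                            else (-5, 4))     # the trusting party gets burned
    return (exploited, exploiter) if a == "trade" else (exploiter, exploited)

for escrow in (False, True):
    print(f"escrow={escrow}")
    for a in ("trade", "fight"):
        for b in ("trade", "fight"):
            print(f"  {a} vs {b}: {payoffs(a, b, escrow)}")
```

Without escrow, fighting dominates and both land on (−2, −2); with escrow, trading dominates and both get the (3, 3) gains from trade – the “both can live” outcome.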
I’ve thought a bunch about acausal stuff in the context of evidential cooperation in large worlds, but while I think that that’s super important in and of itself (e.g., it could solve ethics), I’d be hard-pressed to think of ways in which it could influence thinking about s-risks. I’d rather think about the perfectly straightforward causal conflict stuff that has played out a thousand times throughout history and is not speculative at all – except applied to AI conflict.
But more importantly, it sounds like you’re contradicting my “tractability” footnot...
I'm confused about what you're saying, and curious. I would predict that this attitude toward suicide would indeed correlate with being open to discussing S-risks. Are you saying you have counter-data, or are you saying you don't have samples that would provide data either way?
I was just agreeing. :-3 In mainstream ML circles there is probably a taboo around talking about AI maybe doing harm or AI maybe ending up uncontrollable etc. Breaking that taboo was, imo, a good thing because it allowed us to become aware of the dangers AI could pose. Similarly, breaking ...
I’d prefer to keep these things separate, i.e. (1) your moral preference that “a single human death is worse than trillions of years of the worst possible suffering by trillions of people” and (2) that there is a policy-level incentive problem that implies that we shouldn’t talk about s-risks because that might cause a powerful idiot to take unilateral action to increase x-risk.
I take it that statement 1 is a very rare preference. I, for one, would hate for it to be applied to me. I would gladly trade any health state that has a DALY disability weight >...
The example I was thinking of is this one. (There’s a similar thread here.) So in this case it’s the first option – they don’t think they’ll prefer death. But my “forever” was an extrapolation. It’s been almost three years since I read the comment.
I’m the ECL type of intersubjective moral antirealist. So in my mind, whether they really want what they want is none of my business, but what that says about what is desirable as a general policy for people we can’t ask is a largely empirical question that hasn’t been answered yet. :-3
That sounds promising actually… It has become acceptable over the past decade to suggest that some things ought not to be open-sourced. Maybe it can become acceptable to argue for DRM for certain things too. Since we don’t yet have brain scanning technology, I’d also be interested in an inverse cryonics organization that has all the expertise to really really really make sure that your brain and maybe a lot of your social media activity and whatnot really gets destroyed after your death. (Perhaps even some sort of mechanism by which suicide and complete s...
Yeah, that’s a known problem. I don’t quite remember what the go-to solutions were that people discussed. I think creating an s-risk is expensive, so negating the surrogate goal could also be something that is almost as expensive… But I imagine an AI would also have to be a good satisficer for this to work, or it would still run into the problem of conflicting priorities. I remember Caspar Oesterheld (one of the folks who originated the idea) worrying about an AI creating an infinite series of surrogate goals to protect the previous surrogate goal. It’s not a deployment-ready solution in my mind, just an example of a promising research direction.
In the tractability footnote above I make the case that it should at least be vastly easier than influencing the utility functions of all AIs to make alignment succeed.
Interesting take!
Friend circles of mine – which, I should note, don’t to my knowledge overlap with the s-risks-from-AI researchers I know – do treat suicide as a perfectly legitimate thing you can do after deliberation, like abortion or gender-affirming surgery. So there’s no particular taboo there. Hence, maybe, why I also don’t recoil from considering that the future might be vastly worse than the present.
But it seems to me like a rationalist virtue not to categorically recoil from certain considerations.
Could you explain the self-fulfilling prophe...
Thx! Yep, your edit basically captures most of what I would reply. If alignment turns out to be so hard that we can’t get any semblance of human values encoded at all, then I’d also guess that hell is quite unlikely. But there are caveats, e.g., if there is a nonobvious inner alignment failure, we could get a system that technically doesn’t care about any semblance of human values but doesn’t make that apparent because ostensibly optimizing for human values appears useful to it at the time. That could still cause hell, perhaps even with a higher-than-normal probability.
Thanks for linking that interesting post! (Haven’t finished it yet though.) Your claim is a weak one though, right? Only that you don’t expect the entire lightcone of the future to be filled with worst-case hell, or no less than 95% of it? There are a bunch of different definitions of s-risk, but what I’m worried about definitely starts at a much smaller scale. Going by the definitions in that paper (p. 3 or 391), maybe the “astronomical suffering outcome” or the “net suffering outcome.”
Interesting take! Obviously that’s different for me and many others, but you’re not alone with that. I even know someone who would be ready to cook in a lava lake forever if it implies continuing to exist. I think that’s also in line with the DALY disability weights, but only because they artificially scale them to the 0–1 interval.
So I imagine you’d never make such a deal as shortening your life by three hours in exchange for not experiencing one hour of the worst pain or other suffering you’ve experienced?
Yeah, very much agreed. :-/
in particular, an aligned AI sells more of its lightcone to get baby-eating aliens to eat their babies less, and in general a properly aligned AI will try its hardest to ensure what we care about (including reducing suffering) is satisfied, so alignment is convergent to both.
Those are some good properties, I think… Not quite sure in the end.
But your alignment procedure is indirect, so we don’t quite know today what the result will be, right? Then the question whether we’ll end up ...
Some promising interventions against s-risks that I’m aware of are:
Interpretability research is probably i...
I don’t see how any of these actually help reduce s-risk. Like, if we know some bargaining solutions lead to everyone being terrible and others lead to everyone being super happy, so what? It’s not like we can tremendously influence the bargaining solution our AI & those it meets settle on after reflection.
Thx! I’ll probably drop the “more heavily” for stylistic reasons, but otherwise that sounds good to me!
I suppose my shooting range metaphor falls short here. Maybe alignment is like teaching a kid to be an ace race car driver, and s-risks are accidents on normal roads. There it also depends on the details whether the ace race car driver will drive safely on normal roads.
Oh, true! Digital sentience is also an important point! A bit of an intuition pump is that if you consider a certain animal to be sentient (at least with some probability), then an em of that animal’s brain may be sentient with a similar probability. If an AI is powerful enough to run such ems, the question is no longer whether digital sentience is possible but why an AI would run such an em.
The Maslow hierarchy is reversed for me, i.e., I’d rather be dead/disempowered than tortured, but that’s just a personal thing. In the end it’s more important what the acausal moral compromise says, I think.
Good point. I can still change it. What title would you vote for? I spent a lot of time vacillating between titles and don’t have a strong opinion. These were the options that I considered:
I agree with what Lukas linked. But there are also various versions of the Waluigi Effect, so that alignment, if done wrong, may increase s-risk. Well, and I say in various answers and in the post proper that I’m vastly more optimistic about reducing s-risk than about having to resort to anything that would increase x-risk.
Yeah… When it comes to the skill overlap, having alignment research aided by future pre-takeoff AIs seems dangerous. Having s-risk research aided that way seems less problematic to me. That might make it accessible (now or in a year) for people who have struggled with alignment research. I also wonder whether there is maybe still more time for game-theoretic research in s-risks than there is in alignment. The s-risk-related problems might be easier, so they can perhaps still be solved in time. (NNTR, just thinking out loud.)
Oooh, good point! I’ve certainly observed that in myself in other areas.
Like, “No one is talking about something obvious? Then it must be forbidden to talk about and I should shut up too!” Well, no one is freaking out in that example, but if someone were, it would enhance the effect.
Here are some ways to learn more: “Coordination Challenges for Preventing AI Conflict,” “Cooperation, Conflict, and Transformative Artificial Intelligence: A Research Agenda,” and Avoiding the Worst (and s-risks.org).
Too unknown. Finally there’s the obvious reason that people just don’t know enough about s-risks. That seems quite likely to me.
Too unpopular. Maybe people are motivated by what topics are in vogue in their friend circles, and s-risks are not?
Personal fit. Surely, some people have tried working on s-risks in different roles for some substantial period of time but haven’t found an angle from which they can contribute given their particular skills.
Too unlikely. I’ve heard three versions of this concern. One is that s-risks are unlikely. I simply don’t think they are, as explained above in the post proper. The second version is that s-risk is 1/10th as likely as extinction, hence less likely, hence not a priority. The third version is that it’s just psychologically hard to be motivated to work on something that is not the mode of the probability distribution over how the future will turn out (given such clusters as s-risks, extinction, and business as usual). So even if s-risks are much worse and only slightly less likely than extinction, they’re still hard for people to work on.
There have been countless discussions of takeoff speeds. The slower the takeoff and the closer the arms race, the greater the risk of a multipolar takeoff. Most of you probably have some intuition of what the risk of a multipolar takeoff is. S-risk is probably just 1/10th of that – wild guess. So I’m afraid that the risk is quite macroscopic.
The second version ignores the expected value. I acknowledge that expected value calculus has its limitations, but if we use it at all, and we clearly do, a lot, then there’s no reason to ignore its implications ...
That sounds to me like, “Don’t talk about gun violence in public or you’ll enable people who want to overthrow the whole US constitution.” Directionally correct but entirely disproportionate. Just consider that non-negative utilitarians might hypothetically try to kill everyone to replace them with beings with greater capacity for happiness, but we’re not self-censoring any talk of happiness as a result. I find this concern to be greatly exaggerated.
In fact, moral cooperativeness is at the core of why I think work on s-risks is a much stronger option than ...
NNTs. Some might argue that “naive negative utilitarians that take ideas seriously” (NNTs) want to destroy the world, so that any admissions that s-risks are morally important in expectation should happen only behind closed doors and only among trusted parties.
I want to argue with the Litany of Gendlin here, but what work on s-risks really looks like in the end is writing open-source game theory simulations and writing papers. All dry academic stuff that makes it easy to block out thoughts of suffering itself. Just give it a try! (E.g., at a CLR fellowship.)
I don’t know if that’s the case, but s-risks can be reframed:
Too sad. Some people think that maybe working on s-risks is unpopular because suffering is too emotionally draining to think about, so people prefer to ignore it.
Another version of this concern is that sad topics are not in vogue with the rich tech founders who bankroll our think tanks; that they’re selected to be the sort of people who are excited about incredible moonshots rather than prudent risk management. If these people hear about averting suffering, reducing risks, etc. too often from EA circles, they’ll become uninterested in EA-aligned thinking and think tanks.
Egoism, presentism, or substratism. The worst s-risks will probably not befall us (humans presently alive) or biological beings at all. Extinction, if it happens, will. Maybe death or the promise of utopia has a stronger intuitive appeal to people if they themselves have a risk/chance of experiencing it?
I don’t think you have to be a negative utilitarian to care about s-risks. S-risks are about suffering, but people can be concerned about suffering among other values. Classical utilitarianism is about minimizing suffering and maximizing happiness. One does not exclude the other. Neither does concern for suffering exclude self-preservation, caring for one’s family, wanting to uphold traditions, or making one’s ancestors proud. All values are sometimes in conflict, but that is no cause to throw out concern for suffering in particular.
My vague ...
NUs. Some people may think that you have to be a negative utilitarian to care about s-risks. They are not negative utilitarians, so they steer clear of the topic.
Agreed. Also here’s the poem that goes with that comment:
...Do not go gentle into that good night,
Old age should burn and rave at close of day;
Rage, rage against the dying of the light.

Though wise men at their end know dark is right,
Because their words had forked no lightning they
Do not go gentle into that good night.

Good men, the last wave by, crying how bright
Their frail deeds might have danced in a green bay,
Rage, rage against the dying of the light.

Wild men who caught and sang the sun in flight,
And learn, too late, they grieved it on its way,
Do
Hi! I suppose that question mostly goes to Adam? The importer is fixed, so I’m not doing any of this anymore. What I did was to extrapolate to the current, incomplete week n using the slope from week n-2 to n-1. Then I set the percent increase to 0 because the week is actually the current week already.
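For concreteness, here’s a minimal sketch of that extrapolation with made-up weekly case counts (the variable names are mine, not the importer’s):

```python
# Illustrative sketch of the extrapolation described above (made-up numbers).
# The incomplete current week n is estimated by applying the slope between
# week n-2 and week n-1 to week n-1; the percent increase is then set to 0
# because the extrapolated week already is the current week.

weekly_cases = [140, 180, 90]   # weeks n-2, n-1, and the partial week n

slope = weekly_cases[1] - weekly_cases[0]    # change from week n-2 to n-1
estimated_week_n = weekly_cases[1] + slope   # replaces the partial count
percent_increase = 0.0                       # week n is already "now"

print(estimated_week_n, percent_increase)    # 220 0.0
```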
But the differences from week to week will, in most cases, be minor, so I don’t think it’s important to get this exactly right. There are so many other uncertain factors that go into the model that I don’t recommend investing too much time into this one.
My friend Rai said in a Telegram chat:
...So my thinking is something like: If you just throw money at FOSS ML libraries, I expect you'd mostly shorten the time between "some researcher writes a paper" and "that model is used in a billion real-world projects". I think you'd make whatever AI tech exists be more broadly distributed. I don't think that would directly make stronger AI arrive faster, because I think it would mostly just give people lots of easy-to-use boxes, like when CNNs got popularized, it became quite trivial to train any sort of visual classifi
Good. There’s a site for Switzerland like that too. I extrapolated from that in a similar manner. :-)
Thanks for the analysis!
Some of these points seem like attempts at being conservative (I imagine most people will prefer to err in the direction of caution when advising others), while other points just look like the project is not well-maintained anymore.
Do you have a feeling for what a good adjustment factor would be that one could apply to the result to compensate for the first kind of problem, i.e., everything related to others’ vaccination status? (I’ll just enter the local data manually while the importer is broken.)
(I’m hesitant to go with a budget that would im...
That’s a great report and exercise! Thank you!
I hopelessly anchored on almost everything in the report, so my estimate is far from independent. I roughly followed Nate’s approach (though his, afaik, came about without anchoring), and my final probability is ~50% (+/- 5% when I play around with the factors). But it might’ve turned out differently if I’d had one espresso more or less in the morning.
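To illustrate the kind of arithmetic I mean – a minimal sketch of multiplying conditional premise probabilities, with placeholder values that are not my actual estimates:

```python
# Multiplying conditional premise probabilities (placeholder values, not my
# actual estimates). Nudging individual factors by a few points moves the
# product by roughly the +/- 5% mentioned above.
from math import prod

factors = [0.9, 0.85, 0.9, 0.85, 0.9]   # P(premise_i | earlier premises)
print(round(prod(factors), 2))          # -> 0.53
```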
50% is lower than what I expected – another AI winter would lead to a delay and might invalidate the scaling hypothesis, so that the cumulative probability should ...
Oh, got it! Thanks!
I thought you’d be fundraising to offer refund compensation to others to make their fundraisers more likely to succeed. But if the project developer themself puts up the compensation, it’s probably also an important signal or selection effect in the game-theoretic setup.
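To spell out the incentive structure I have in mind, here’s a toy sketch with invented numbers, in the spirit of dominant assurance contracts rather than any specific implementation:

```python
# Toy sketch (invented numbers) of how a refund bonus changes a potential
# contributor's calculus. In the two cases that matter most, pledging beats
# waiting: if the fundraiser fails, you get your money back plus a bonus;
# if your pledge is pivotal, the project you value gets funded. (Free-riding
# on a fundraiser that succeeds without you is the case the bonus alone
# doesn't address.)

VALUE_IF_FUNDED = 10   # contributor's value from the funded project
PLEDGE = 5             # size of the pledge
REFUND_BONUS = 1       # compensation paid out if the fundraiser fails

cases = {
    "fundraiser fails anyway": {"pledge": REFUND_BONUS,             "wait": 0},
    "your pledge is pivotal":  {"pledge": VALUE_IF_FUNDED - PLEDGE, "wait": 0},
}

for case, payoff in cases.items():
    print(f"{case}: {payoff}")
```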
Yeah, courts decide that in the end. Howey Test: money: yes; common enterprise: yes; expectation of ...