All of Tamsin Leake's Comments + Replies

If you start out with CDT, then the thing you converge to is Son of CDT rather than FDT.
(that arbital page takes a huge amount of time to load for me for some reason, but it does load eventually)

And I could totally see the thing that kills us {being built with} or {happening to crystallize with} CDT rather than FDT.

We have to actually implement/align-the-AI-to the correct decision theory.

2Adele Lopez1mo
Point taken about CDT not converging to FDT. I don't buy that an uncontrolled AI is likely to be CDT-ish though. I expect the agentic part of AIs to learn from examples of human decision making, and there are enough pieces of FDT like voting and virtue in human intuition that I think it will pick up on it by default. (The same isn't true for human values, since here I expect optimization pressure to rip apart the random scraps of human value it starts out with into unrecognizable form. But a piece of a good decision theory is beneficial on reflection, and so will remain in some form.)
3Pi Rogers1mo
I think this is only true if we are giving the AI a formal goal to explicitly maximize, rather than training the AI haphazardly and giving it a clusterfuck of shards. It seems plausible that our FAI would be formal-goal aligned, but it seems like UAI would be more like us unaligned humans—a clusterfuck of shards. Formal-goal AI needs the decision theory "programmed into" its formal goal, but clusterfuck-shard AI will come up with decision theory on its own after it ascends to superintelligence and makes itself coherent. It seems likely that such a UAI would end up implementing LDT, or at least something that allows for acausal trade across the Everett branches.
2Wei Dai1mo
(ETA: Sorry, upon reviewing the whole thread, I think I misinterpreted your comment and thus the following reply is probably off point.) I think the best way to end up with an AI that has the correct decision theory is to make sure the AI can competently reason philosophically about decision theory and is motivated to follow the conclusions of such reasoning. In other words, it doesn't judge a candidate successor decision theory by its current decision theory (CDT changing into Son-of-CDT), but by "doing philosophy", just like humans do. Because given the slow pace of progress in decision theory, what are the chances that we correctly solve all of the relevant problems before AI takes off?

By thinking about each other's source code, FAI and Clippy will be able to cooperate acausally like Alice and Bob, each turning their future lightcone into 10% utopia, 90% paperclips. Therefore, we get utopia either way! :D

So even if we lose we win, but even if we win we lose. The amount of utopiastuff is exactly conserved, and launching unaligned AI causes timelines-where-we-win to have less utopia by exactly as much as our timeline has more utopia.

The amount of utopiastuff we get isn't just proportional to how much we solve alignment, it's actually ba... (read more)

5Pi Rogers1mo
Yes, amount of utopiastuff across all worlds remains constant, or possibly even decreases! But I don't think amount-of-utopiastuff is the thing I want to maximize. I'd love to live in a universe that's 10% utopia and 90% paperclips! I much prefer that to a 90% chance of extinction and a 10% chance of full-utopia. It's like insurance. Expected money goes down, but expected utility goes up. Decision theory does not imply that we get to have nice things, but (I think) it does imply that we get to hedge our insane all-or-nothing gambles for nice things, and redistribute the nice things across more worlds.
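
A minimal toy version of the insurance point (my numbers, and assuming a concave utility over how much utopia your branch gets):

```python
import math

# Two lotteries over "fraction of the universe that ends up utopia" in your branch:
# gamble: 10% chance we win everything, 90% chance extinction (nothing).
# hedge:  via the acausal trade, every branch gets 10% utopia for sure.
gamble = [(0.10, 1.0), (0.90, 0.0)]
hedge = [(1.00, 0.1)]

def expected(lottery, f=lambda x: x):
    return sum(p * f(x) for p, x in lottery)

utility = math.sqrt  # any concave utility (diminishing returns) works here

print(expected(gamble), expected(hedge))                    # 0.1 vs 0.1   -> same expected utopiastuff
print(expected(gamble, utility), expected(hedge, utility))  # 0.1 vs ~0.32 -> hedging wins on expected utility
```

Expected stuff is conserved, but a concave utility makes the spread-out version better, which is the insurance intuition.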

I think it's exceedingly unlikely (<1%) that we robustly prevent anyone from {making an AI that kills everyone} without an aligned sovereign.

I continue to think that, in worlds where we robustly survive, money is largely going to be obsolete. The terminal values of the kind of (handshake of) utility functions we can expect probably aren't maximized by maintaining current allocations of wealth and the institutions-that-care-about-that-wealth. The use for money/investment/resources is making sure we get utopia in the first place, by slowing capabilities and solving alignment (and thus also plausibly purchasing shares of the LDT utility function handshake), not being rich in u... (read more)

2tailcalled2mo
What if we survive without building a global utility maximizer god?

‘high level machine intelligence’ (HLMI) and ‘full automation of labor’ (FAOL)

I continue to believe that predicting things like that is not particularly useful for predicting when AI will achieve decisive strategic advantage and/or kill literally everyone. AI could totally kill literally everyone without us ever getting to observe HLMI or FAOL first, and I think development in HLMI / FAOL does not say much about how close we are to AI that kills literally everyone.

Both are possible. For theoretical examples, see the stamp collector for consequentialist AI and AIXI for reward-maximizing AI.

What kind of AI are the AIs we have now? Neither; they're not particularly strong maximizers. (If they were, we'd be dead; it's not that difficult to turn a powerful reward maximizer into a world-ending AI.)

If the former, I think this makes alignment much easier. As long as you can reasonably represent “do not kill everyone”, you can make this a goal of the AI, and then it will literally care about not killing everyone, it won’t

... (read more)
1RedFishBlueFish2mo
I think this goes to Matthew Barnett’s recent article arguing that actually, yes, we do. And regardless, I don’t think this point is a big part of Eliezer’s argument. https://www.lesswrong.com/posts/i5kijcjFJD6bn7dwq/evaluating-the-historical-value-misspecification-argument

Yeah, so I think this is the crux of it. My point is that if we find some training approach that leads to a model that cares about the world itself rather than hacking some reward function, that’s a sign that we can in fact guide the model in important ways, and there’s a good chance this includes being able to tell it not to kill everyone.

This is just a way of saying “we don’t know what AGI would do”. I don’t think this point pushes us toward x-risk any more than it pushes us toward not-x-risk.

The first one. Alice fundamentally can't fully model Bob because Bob's brain is as large as Alice's, so she can't fit it all inside her own brain without simply becoming Bob.

I remember a character in Asimov's books saying something to the effect of

It took me 10 years to realize I had those powers of telepathy, and 10 more years to realize that other people don't have them.

and that quote has really stuck with me, and keeps striking me as true about many mindthings (object-level beliefs, ontologies, ways-to-use-one's-brain, etc).

For so many complicated problems (including technical problems), "what is the correct answer?" is not-as-difficult to figure out as "okay, now that I have the correct answer: how the hell do other peo... (read more)

1quetzal_rainbow2mo
I should note that it's not entirely known whether quining is applicable for minds.
2NicholasKross2mo
I relate to this quite a bit ;-;
4NicholasKross2mo
Is this "fundamentally" as in "because you, the reader, are also a bounded human, like them"? Or "fundamentally" as in (something more fundamental than that)?
2NicholasKross2mo
If timelines weren't so short, brain-computer-based telepathy would unironically be a big help for alignment. (If a group had the money/talent to "hedge" on longer timelines by allocating some resources to that... well, instead of a hivemind, they first need to run through the relatively-lower-hanging fruit. Actually, maybe they should work on delaying capabilities research, or funding more hardcore alignment themselves, or...)
8Viliam2mo
Somewhat related: What Universal Human Experiences Are You Missing Without Realizing It? (and its spinoff: Status-Regulating Emotions)

I don't think it's a binary; they could still pay less attention!

(plausibly there's a bazillion things constantly trying to grab their attention, so they won't "lock on" if we avoid bringing AI to their attention too much)

4the gears to ascension2mo
to clarify: governments have already put some of their agentic capability towards figuring out the most powerful ways to use ai, and there is plenty of documentation already as to what those are. the documentation is the fuel, and it has already caught on "being used to design war devices" fire. the question is how do they respond. it's not likely they'll respond well, regardless, of course. I'm more worried about pause regulation itself changing the landscape in a way that causes net acceleration, rather than advocacy for it independent of the enactment of the regulation, which I expect to do relatively little. individual human words mean little next to the might of "hey chatgpt," suddenly being a thing that exists.

You might want to read this post (it's also on lesswrong but the images are broken there)

(to be clear: this is more an amusing suggestion than a serious belief)

By "vaguely like dath ilan" I mean the parts that made them be the kind of society that can restructure in this way when faced with AI risk. Like, even before AI risk, they were already very different from us.

4JBlack2mo
Ah, I see! Yeah, I have pretty much no idea. I vaguely suspect that humans are not inherently well-suited to coordination in that sense, and that it would take an unusual cultural situation to achieve it. We never got anywhere close at any point in our history. It also seems likely that the window to achieve it could be fairly short. There seems to be a lot of widespread mathematical sophistication required as described, and I don't think that naturally arises long before AI. On the other hand, maybe some earlier paths of history could and normally should have put some useful social technology and traditions in place that would be built on later in many places and ways, but for some reason that didn't happen for us. Some early unlikely accident predisposed us to our sorts of societies instead. Our sample size of 1 is difficult to generalize from. I would put my credence median well below 1:1, but any distribution I have would be very broad, spanning orders of magnitude of likelihood and the overall credence something like 10%. Most of that would be "our early history was actually weird".

I'm pretty sure we just need one resimulation to save everyone; once we have located an exact copy of our history, it's cheap to pluck out anyone (including people dead 100 or 1000 years ago). It's a one-time cost.

Lossy resurrection is better than nothing but it doesn't feel as "real" to me. If you resurrect a dead me, I expect that she says "I'm glad I exist! But — at least as per my ontology and values — you shouldn't quite think of me as the same person as the original. We're probly quite different, internally, and thus behaviorally as well, when ran ov... (read more)

4Thane Ruthenis2mo
No argument on that. I don't find it particularly surprising that {have lost a loved one they wanna resurrect} ∩ {take the singularity and the possibility of resurrection seriously} ∩ {would mention this} is empty, though:

  • "Resurrection is information-theoretically possible" is a longer leap than "believes an unconditional pro-humanity utopia is possible", which is itself a bigger leap than just "takes singularity seriously". E.g., there's a standard-ish counter-argument to "resurrection is possible" which naively assumes a combinatorial explosion of possible human minds consistent with a given behavior. Thinking past it requires some additional less-common insights.
  • "Would mention this" is downgraded by it being an extremely weakness/vulnerability-revealing motivation. Much more so than just "I want an awesome future".
  • "Would mention this" is downgraded by... You know how people who want immortality get bombarded with pop-culture platitudes about accepting death? Well, as per above, immortality is dramatically more plausible-sounding than resurrection, and it's not as vulnerable-to-mention a motivation. Yet talking about it is still not a great idea in "respectable" company. Goes double for resurrection.

(Let's call the dead person "rescuee" and the person who wants to resurrect them "rescuer".)

The procedure you describe is what I call "lossy resurrection". What I'm talking about looks like: you resimulate the entire history of the past-lightcone on a quantum computer, right up until the present, and then either:

  • You have a quantum algorithm for "finding" which branch has the right person (and you select that timeline and discard the rest) (requires that such a quantum algorithm exists)
  • Each branch embeds a copy of the rescuer, and whichever branch looks
... (read more)
4Thane Ruthenis2mo
Yeah, I don't know about this one either. Even if possible, it might be incredibly wasteful, in terms of how much negentropy (= future prosperity for new people) we'll need to burn in order to rescue one person. And then the more we rescue, the less value we get out of that as well, since burning negentropy will reduce their extended lifespans too. So we'd need to assign greater (dramatically greater?) value to extending the life of someone who'd previously existed, compared to letting a new person live for the same length of time. "Lossy resurrection" seems like a more negentropy-efficient way of handling that, by the same tokens as acausal norms likely being a better way to handle acausal trade than low-level simulations and babble-and-prune not being the most efficient way of doing general-purpose search. Like, the full-history resimulation will surely still not allow you to narrow things down to one branch. You'd get an equivalence class of them, each of them consistent with all available information. Which, in turn, would correspond to a probability distribution over the rescuee's mind; not a unique pick. Given that, it seems plausible that there's some method by which we can get to the same end result – constrain the PD over the rescuee's mind by as much as the data available to us can let us – without actually running the full simulation. Depends on what the space of human minds looks like, I suppose. Whether it's actually much lower-dimensional than a naive analysis of possible brain-states suggests.

Take our human civilization, at the point in time at which we invented fire. Now, compute forward all possible future timelines, each right up until the point where it's at risk of building superintelligent AI for the first time. Now, filter for only timelines which either look vaguely like earth or look vaguely like dath ilan.

What's the ratio between the number of such worlds that look vaguely like earth vs look vaguely like dath ilan? 100:1 earths:dath-ilans ? 1,000,000:1 ? 1:1 ?

2JBlack2mo
Even in the fiction, I think dath ilan didn't look vaguely like dath ilan until after it was at risk of building superintelligent AI for the first time. They completely restructured their society and erased their history to avert the risk.

Typical user of outside-view epistemics

(actually clipped from this YourMovieSucks video)

tbh I kinda gave up on reaching people who think like this :/

My heuristic is that they have too many brainworms to be particularly helpful to the critical parts of worldsaving, and it feels like it'd be unpleasant and not-great-norms to have a part of my brain specialized in "manipulating people with biases/brainworms".

7quetzal_rainbow2mo
I don't think that reframing is manipulation? In my model, reframing between various settings is a necessary part of general intelligence - you set a problem and switch between frameworks until you find one where the solution-search-path is "smooth". The same with communication - you build various models of your companion until you find the shortest-inference path.
7Thane Ruthenis2mo
I meant when interfacing with governments/other organizations/etc., and plausibly at later stages, when the project may require "normal" software engineers/specialists in distributed computations/lower-level employees or subcontractors. I agree that people who don't take the matter seriously aren't going to be particularly helpful during higher-level research stages. I don't think this is really manipulation? You're communicating an accurate understanding of the situation to them, in a manner they can parse. You're optimizing for accuracy, not for their taking specific actions that they wouldn't have taken if they understood the situation (as manipulators do). If anything, using niche jargon would be manipulation, or willful miscommunication: inasmuch as you'd be trying to convey accurate information to them in a way you know they will misinterpret (even if you're not actively optimizing for misinterpretation).

Alright, I think I've figured out what my disagreement with this post is.

A field of research pursues the general endeavor of finding out things there are to know about a topic. It consists of building an accurate map of the world, of how-things-work, in general.

A solution to alignment is less like a field of research and more like a single engineering project. A difficult one, for sure! But ultimately, still a single engineering project, for which it is not necessary to know all the facts about the field, but only the facts that are useful.

And small groups... (read more)

6Thane Ruthenis2mo
Suggestion: if you're using the framing of alignment-as-a-major-engineering-project, you can re-frame "exfohazards" as "trade secrets". That should work to make people who'd ordinarily think that the very idea of exfohazards is preposterous[1] take you seriously.

1. ^ As in: "Aren't you trying to grab too much status by suggesting you're smart enough to figure out something dangerous? Know your station!"

Being embedded in a fake reality and fooled into believing it's true would be against many people's preferences.

Strongly agree; I have an old, short post about this. See also Contact with reality.

Some people might (under reflection) be locally-caring entities, but most people's preferences are about what the reality actually contains and they (even under reflection) wouldn't want to, for example, press a button that causes them to mistakenly believe that everything is fine.

I'm kinda bewildered at how I've never observed someone say "I want to build aligned superintelligence in order to resurrect a loved one". I guess the intersection of the sets of people who {have lost a loved one they wanna resurrect}, {take the singularity and the possibility of resurrection seriously}, and {would mention this} is… the empty set??

(I have met one person who is glad that alignment would also get them this, but I don't think it's their core motivation, even emotionally. Same for me.)

4Thane Ruthenis2mo
Do you have any (toy) math arguing that it's information-theoretically possible? I currently consider it plausible that yeah, actually, for any person X who still exists in cultural memory (let alone living memory, let alone if they lived recently enough to leave a digital footprint), the set of theoretically-possible psychologically-human minds whose behavior would be consistent with X's recorded behavior is small enough that none of the combinatorial-explosion arguments apply, so you can just generate all of them and thereby effectively resurrect X. But you sound more certain than that. What's the reasoning?

Hence, the policy should have an escape clause: You should feel free to talk about the potential exfohazard if your knowledge of it isn't exclusively caused by other alignment researchers telling you of it. That is, if you already knew of the potential exfohazard, or if your own research later led you to discover it.

In an ideal world, it's good to relax this clause in some way, from a binary to a spectrum. For example: if someone tells me of a hazard that I'm confident I would've discovered on my own one week later, then they only get to dictate me not... (read more)

Pretty sure that's what the "telling you of it" part fixes. Alice is the person who told you of Alice's hazards, so your knowledge is exclusively caused by Alice, and Alice is the person whose model dictates whether you can share them.

6Cleo Nardo2mo
yep, if that's OP's suggestion then I endorse the policy. (But I think it'd be covered by the more general policy of "Don't share information someone tells you if they wouldn't want you to".) But my impression is that OP is suggesting the stronger policy I described?

(Epistemic status: Not quite sure)

Realityfluid must normalize for utility functions to work (see 1, 2). But this is a property of the map, not the territory.

Normalizing realityfluid is a way to point to an actual (countably) infinite territory using a finite (conserved-mass) map object.
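
A minimal sketch of what I mean, with toy numbers of my own: give branch n of a countably infinite collection of branches the realityfluid

```latex
\mu(n) = 2^{-(n+1)}, \qquad \sum_{n=0}^{\infty} \mu(n) = 1,
\qquad \mathbb{E}[U] = \sum_{n=0}^{\infty} \mu(n)\,U(n).
```

The map object has finite (unit) total mass, so expected utilities converge for bounded U, even though the territory it points at is infinite.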

1Tetraspace2mo
I replied on discord that I feel there's maybe something more formalisable that's like:

  • reality runs on math because, and is the same thing as, there's a generalised-state-transition function
  • because reality has a notion of what happens next, realityfluid has to give you a notion of what happens next, i.e. it normalises
  • the idea of a realityfluid that doesn't normalise only comes to mind at all because you learned about R^n first in elementary school instead of S^n

which I do not claim confidently because I haven't actually generated that formalisation, and am posting here because maybe there will be another Lesswronger's eyes on it that's like "ah, but...".
2Dagon2mo
Many mechanisms of aggregation literally normalize random elements. Simple addition of several independent evenly-distributed values (say, dice) yields an approximately normal distribution (aka bell curve). And yes, human experience is all map - the actual state of the universe is imperceptible.
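
A quick illustrative sketch of that aggregation effect (plain central-limit behavior, nothing specific to realityfluid):

```python
import random
from collections import Counter

def dice_sum_counts(n_dice, sides=6, trials=100_000):
    """Empirical distribution of the sum of n_dice fair dice."""
    return Counter(
        sum(random.randint(1, sides) for _ in range(n_dice))
        for _ in range(trials)
    )

# One die is flat; summing several dice concentrates mass into a bell shape.
for n in (1, 2, 6):
    counts = dice_sum_counts(n)
    print(f"--- {n} dice ---")
    for total in sorted(counts):
        bar = "#" * round(300 * counts[total] / sum(counts.values()))
        print(f"{total:3d} {bar}")
```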

Seems right. In addition, if there was some person out there waiting to make a new AI org, it's not like they're waiting for the major orgs to shut down to compete.

Shutting down the current orgs does not fully solve the problem, but it surely helps a lot.

I still do not agree with your position, but thanks to this post I think I at least understand it better than I did before. I think my core disagreements are:

Here is the catch: AGI components interacting to maintain and replicate themselves are artificial. Their physical substrate is distinct from our organic human substrate.

That needn't be the case. If all of the other arguments in this post were to hold, any AI or AI-coalition (whether aligned to us or not) which has taken over the world could simply notice "oh no, if I keep going I'll be overtaken b... (read more)

1Remmelt2mo
Thanks for your thoughts.

If artificial general intelligence moves to a completely non-artificial substrate at many nested levels of configuration (meaning in this case, a substrate configured like us from the proteins to the cells), then it would not be artificial anymore. I am talking about wetware like us, not something made out of standardised components. So these new wetware-based configurations definitely would also not have the general capacities you might think they would have. It's definitely not a copy of the AGI's configurations. If they are standardised in their configuration (like hardware), the substrate-needs convergence argument above definitely still applies.

The argument is about how general artificial intelligence, as defined, would converge if they continue to exist. I can see how that was not clear from the excerpt, because I did not move over this sentence: "This is about the introduction of self-sufficient learning machinery, and of all modified versions thereof over time, into the world we humans live in."

I get where you are coming from. Next to the speed of the design, maybe look at the *comprehensiveness* of the 'design'. Something you could consider spending more time thinking about is how natural selection works through the span of all physical interactions between (parts of) the organism and their connected surroundings. And top-down design does not. For example, Eliezer brought up before how top-down design of an 'eye' wouldn't have the retina sit back behind all that fleshy stuff that distorts light. A camera was designed much faster by humans. However, does a camera self-heal when it breaks like our eye does? Does a camera clean itself? And so on – to many fine-grained functional features of the eye.

Yesterday, Anders Sandberg had a deep productive conversation about this with my mentor. What is missing in your description is that the unidimensionality and simple direct causality of low-level error correction method

My current belief is that you do make some update upon observing that you exist, you just don't update as much as if we were somehow able to survive and observe unaligned AI taking over. I do agree that "no update at all because you can't see the counterfactual" is wrong, but anthropics is still somewhat filtering your evidence; you should update less.

(I don't have my full reasoning for {why I came to this conclusion} fully loaded rn, but I could probably do so if needed. Also, I only skimmed your post, sorry. I have a post on updating under anthropics with actual math I'm working on, but unsure when I'll get around to finishing it.)

I really like this! (here's mine)

A few questions:

  • The first time AI reaches STEM+ capabilities (if that ever happens), it will disempower humanity within three months

    So this is asking for P(fast takeoff and unaligned | STEM+)? It feels weird that it's asking for both. Unless you count aligned-AI-takeover as "disempowering" humanity. Asking for either P(fast takeoff | STEM+) or P(fast takeoff | unaligned and STEM+) would make more sense, I think (see the decomposition below this list).

  • Do you count aligned-AI-takeover (where an aligned AI takes over everything and creates an at-least-okay u

... (read more)
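
To spell out the worry in the first bullet (my own decomposition, not the survey's):

```latex
P(\text{fast takeoff} \wedge \text{unaligned} \mid \text{STEM+})
  = P(\text{fast takeoff} \mid \text{unaligned},\, \text{STEM+})
  \cdot P(\text{unaligned} \mid \text{STEM+})
```

so a single number for the conjunction bundles a takeoff-speed question together with an alignment-outcome question.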

Okay yeah this is a pretty fair response actually. I think I still disagree with the core point (that AI aligned to current people-likely-to-get-AI-aligned-to-them would be extremely bad) but I definitely see where you're coming from.

Do you actually believe extinction is preferable to rolling the dice on the expected utility (according to your own values) of what happens if one of the current AI org people launches AI aligned to themself?

Even if, in worlds where we get an AI aligned to a set of values that you would like, that AI then acausally pays AI-ali... (read more)

1Thane Ruthenis2mo
... I've not actually evaluated this question much, because I am more concerned about the "all the bananas we're on track to find are gonna be radioactive, it doesn't matter who finds one" thing. Inasmuch as I am concerned about the banana's owners, my concerns lie in worlds in which we raise the awareness of AGI Ruin enough to get governments' attention to it, and to get them to somehow minimize race dynamics, but then fail to sell them on the whole "universal utopia" thing, and end up in some xenophobic/kafkaesque dystopia from which even death isn't escape. I frown at sociopolitical strategies that entirely fail to even consider that; it seems like a very EA-brand style of naivete. But it doesn't really seem like a particularly likely failure mode, at this point. Here, I'm mostly trying to illuminate the perspectives of people who are starting from the place of "but who gets the banana?", and the places where AGI-ruin advocates often seem to fail at communicating with them. (And what the policies that would sound convincing to them might sound like.)

I guess no, on balance, taking the acausal-negotiation part into account. (I'd need to actually do some toy math to figure out the definitive answer.) That said, I'm quite concerned about certain people's notorious power-hunger tendencies. Such preferences can shake out into a utopia containing vast masses of people to lord over, which would be pretty hellish. But I guess the worst excesses of that would be something they could be acausally negotiated out of, such that life in such a world would still be preferable to non-existence. (Yes, I'm given to understand there's been a lot of pro-humanity messaging from all current major-AI-lab leaders. But... Well, we all know how treacherous turns work, so that's all ~zero evidence. And my priors on the interiors of such people aren't optimistic. That's the other reason I haven't researched the matter: I don't expect to update in a positive direction no matter what I fin

I'm a big fan of Rob Bensinger's "AI Views Snapshot" document idea. I recommend people fill their own before anchoring on anyone else's.

Here's mine at the moment:

I thought the "C" in CEV stood for "coherent" in the sense that it had been reconciled over all people (or over whatever set of preference-possessing entities you were taking into acount). Otherwise wouldn't it just be "EV"?

I mean I guess, sure, if "CEV" means over-all-people then I just mean "EV" here.
Just "EV" is enough for the "basic challenge" of alignment as described on AGI Ruin.

So are you saying that it would literally have an internal function that represented "how good" it thought every possible state of the world was, and then solve an (app

... (read more)

Humans aren't all required to converge to the same volition, there's no particularly defensible way of resolving any real differences

CEV-ing just one person is enough for the "basic challenge" of alignment as described on AGI Ruin.

and even finding any given person's individual volition may be arbitrarily path-dependent

Sure, but it doesn't need to be path-independent, it just needs to have pretty good expected value over possible paths.

Whether something is a utopia or a dystopia is a matter of opinion. Some people's "utopias" may be worse than deat

... (read more)

That's fair (though given the current distribution of people likely to launch the AI, I'm somewhat optimistic that we won't get such a dystopia) — but the people getting confused about that question aren't asking it because they have such concerns, they're usually (in my experience) asking it because they're confused way upstream of that

I disagree. I think they're concerned about the right thing for the right reasons, and the attempt to swap-in a different (if legitimate, and arguably more important) problem instead of addressing their concerns is where a ... (read more)

6faul_sname2mo
What observations are backing this belief? Have you seen approaches that share some key characteristics with expected utility maximization approaches which have worked in real-world situations, and where you expect that the characteristics that made it work in the situation you observed will transfer? If so, would you be willing to elaborate? On the flip side, are there any observations you could make in the future that would convince you that expected utility maximization will not be a good model to describe the kind of AI likely to take over the world?
3jbash2mo
I thought the "C" in CEV stood for "coherent" in the sense that it had been reconciled over all people (or over whatever set of preference-possessing entities you were taking into acount). Otherwise wouldn't it just be "EV"? So are you saying that it would literally have an internal function that represented "how good" it thought every possible state of the world was, and then solve an (approximate) optimization problem directly in terms of maximizing that function? That doesn't seem to me like a problem you could solve even with a Jupiter brain and perfect software.

We have at least one prototype for a fully formalized implementation of CEV, yes: mine.

(I'd argue that for the correct answer to "what are values?" to be "just do CEV", we shouldn't need a specific plan for CEV; we should just need good confidence that something like CEV can be implemented.)

I've heard some describe my recent posts as "overconfident".

I think I used to calibrate how confident I sound based on how much I expect the people reading/listening-to me to agree with what I'm saying, kinda out of "politeness" for their beliefs; and I think I also used to calibrate my confidence based on how much they match with the apparent consensus, to avoid seeming strange.

I think I've done a good job learning over time to instead report my actual inside-view, including how confident I feel about it.

There's already an immense amount of outside-view d... (read more)

A short comic I made to illustrate what I call "outside-view double-counting".

(resized to not ruin how it shows on lesswrong, full-scale version here)

Due to my timelines being this short, I'm hopeful that convincing just "the current crop of major-AI-Lab CEOs" might actually be enough to buy us the bulk of time that something like this could buy.

I'm not sure what you mean? I'm just describing what those concepts are and how I think they fit together in the territory, not prescribing anything.

Oh, egoism is totally coherent. I'm just saying that your values can be egoist, or they can be cosmopolitan, or a mixture of the two. But (a version of) cosmopolitanism is part of the contents of a person's values, not a standalone objective thing.

2TAG2mo
How does that help in practice?

CEV of humanity is certainly desirable! If you CEV me, I in turn implement some kind of CEV-of-humanity in a way that doesn't particularly privilege myself. But that's downstream of one's values and of decision theory.

Your goal as an agent is to maximize your utility function — and it just so happens that your utility function, as you endorse it in CEV, consists of maximizing everyone's CEV in some way.

Think not "cosmopolitanism vs my-utility-function" but "cosmopolitanism, as entailed by my utility function".

(see also my post surprise! you want what you want)

2TAG2mo
If egoism is incoherent, and altruism coherent, I suppose that would follow... but it's a big if. Where is it proven?

I don't view ASI as substantially different than an upload economy.

I'm very confused about why you think that. Unlike an economy, an aligned ASI is an agent. Its utility function can be something that looks at the kind of economy you describe, and goes "huh, actually, extreme inequality seems not great, what if everyone got a reasonable amount of resources instead".

It's like you don't think the people who get CEV'd would ever notice Moloch; their reflection processes would just go "oh yeah whatever this is fine keep the economy going".

Most worlds where... (read more)

1jacob_cannell3mo
You ignored most of my explanation so I'll reiterate a bit differently. But first taboo the ASI fantasy.

  • any good post-AGI future is one with uploading - humans will want this
  • uploads will be very similar to AI, and become moreso as they transcend
  • the resulting upload economy is one of many agents with different values
  • the organizational structure of any pareto optimal multi-agent system is necessarily market-like
  • it is a provable fact that wealth/power inequality is a consequent requisite side effect

Unlikely, but it also doesn't matter, as what alignment actually means is that the resulting ASI must approximate pareto optimality with respect to various stakeholder utility functions, which requires that:

  • it uses stakeholders' own beliefs to evaluate the utility of actions
  • it must redistribute stakeholder power (i.e. wealth) toward agents with better predictive beliefs over time (in a fashion that looks like internal bayesian updating)

In other words, the internal structure of the optimal ASI is nigh indistinguishable from an optimal market. Additionally, the powerful AI systems which are actually created are far more likely to be ones which precommit to honoring their creator stakeholder wealth distribution. In fact - that is part of what alignment actually means.

I strongly doubt that an aligned superintelligence couldn't upload everyone-who-wants-to-be-uploaded cheaply (given nanotech), but if it couldn't, I doubt its criterion for who gets to be uploaded first will be "whoever happens to have money" rather than a more-likely-to-maximize-utility criterion such as "whoever needs it most right now".

This is why I think wealth allocations are likely to not be relevant; giving stuff to people who had wealth pre-singularity is just not particularly utility-maximizing.

2jacob_cannell3mo
I don't view ASI as substantially different than an upload economy. There are strong theoretical reasons why (relatively extreme) inequality is necessary for pareto efficiency, and pareto efficiency is the very thing which creates utility (see critch's recent argument for example, but there were strong reasons to have similar beliefs long before). The distribution of contributions towards the future is extremely heavy tailed: most contribute almost nothing, a select few contribute enormously. Future systems must effectively trade with the present to get created at all: this is just as true for corporations as it is for future complex AI systems (which will be very similar to corporations). Furthermore, uploads will be able to create copies of themselves proportional to their wealth, so wealth and measure become fungible/indistinguishable. This is already true to some extent today - the distribution of genetic ancestry is one of high inequality, the distribution of upload descendancy will be far more unequal and on accelerated timescales.

This is a bizarre, disastrously misguided socialist political fantasy. The optimal allocation of future resources over current humans will necessarily take the form of something like a historical backpropagated shapley value distribution: future utility allocated proportionally to counterfactual importance in creating said future utility. Well functioning capitalist economies already do this absent externalities; the function of good governance is to internalize all externalities.
  • Say, I'm too old to expect aligned AI to give me eternal life (or aligned AI simply might not mean eternal life/bliss for me, for whichever reason; maybe as it's still better to start with newborns more efficiently made into bliss-enjoying automatons or whatever utopia entails), so for me individually, the intermediate years before superintelligence are the relevant ones, so I might rationally want to earn money by working on enriching myself, whatever the (un-)alignment impact of it

I expect that the set of people who:

  • Expect to have died of old age
... (read more)
3FlorianH3mo
It would be extremely small if we'd be talking about binaries/pure certainty. If in reality everything is uncertain, and in particular (as I think) everyone individually has a tiny probability of changing the outcome, everyone ends up free-riding. This is true for the commoner[1] who's using ChatGPT or whichever cheapest & fastest AI tool he finds to succeed in his work, therefore supporting the AI race and "Take actions which impact what superintelligence is built". It may also be true for CEOs of many AI companies. Yes, their dystopia-probability-impact is larger, but equally, their own career, status, power - and future position within the potential new society, see jacob_cannell's comment - hinge more strongly on their actions.

(Imperfect illustrative analogy: Climate change may kill a hundred million people or so, yet the being called human will tend to fly around the world, heating it up. Would anyone be willing to "sacrifice" a hundred million people for her trip to Bali? I have some hope they wouldn't. But she'll not avoid the holiday if her probability of avoiding disastrous climate change is tiny anyway. And if instead of her holiday, her entire career, fame, power depended on her continuing to pollute, even if she was a global-scale polluter, she'd likely enough not stop emitting for the sake of changing that.) I think we clearly must acknowledge this type of public-good/free-rider dynamics in the AI domain.

***

Agree with a lot in this, but w/o changing my interpretation much: Yes, humans are good at rationalizing their bad actions indeed. But they're especially good at it when it's in their egoistic interest to continue the bad thing. So the commoner and the AI CEO alike might well rationalize 'for complicated reasons it's fine for the world if we (one way or another) heat up the AI race a bit' in irrational ways - really as they might rightly see it in their own material interest to continue doing so, and want to make the

Time for another regular (non-google-docs-style) comment.

This is a regular comment.

example side-comment without quote

test

example side-comment

what you describe wouldn't be super-great, but it would address the challenge of alignment:

all I am looking at is prospective results, all I want is that we have justifiable cause to believe of a pivotally useful AGI 'this will not kill literally everyone'. Anybody telling you I'm asking for stricter 'alignment' than this has failed at reading comprehension. The big ask from AGI alignment, the basic challenge I am saying is too difficult, is to obtain by any strategy whatsoever a significant chance of there being any survivors.

that last point is plausible for some, but for most i expect that we're far from the pareto frontier and there are large positive-sum gains to be made through cooperation (assuming they implement a decision theory that allows such cooperation).

agreed overall.

if one's goal is to minimize the harm per animal conditional on it existing, and one believes that ASI is within reach, the correct focus would seem to be to ignore alignment and focus on capabilities

IMO aligned AI reduces suffering even more than unaligned AI because it'll pay alien civilizations (eg baby eaters) to not do things that we'd consider large scale suffering (in exchange for some of our lightcone), so even people closer to the negative utilitarian side should want to solve alignment.

ratfic (as i'm using the term here) typically showcases characters applying lesswrong rationality well. lesswrong rationality is typically defined as ultimately instrumental to winning.

That sounds rather tautological.

Assuming ratfic represents LessWrong-style rationality well and assuming LW-style rationality is a good approximation of truly useful instrumental reasoning, then the claim should hold. There’s room for error in both assumptions.

LW-rationality (and ratfic by extension) aspires to be instrumental to winning in the real world; whether it in fact does so is an empirical question.
