hmm, like i think there's a reasonable sense of "coherence" such that it plausibly doesn't typically increase with capabilities. i think the survey respondents here are talking about something meaningful and i probably agree with most of their judgments about that thing. for example, with that notion of coherence, i probably agree with "Google (the company) is less coherent now than it was when it had <10 employees" (and this is so even though Google is more capable now than it was when it had 10 employees)
this "coherence" is sth like "not being a hot mess" or "making internal tradeoffs efficiently" or "being well-orchestrated". in this sense, "incoherence" is getting at the following things:
with this notion, i think there are many naturally-occurring cases of someone becoming more capable but less "coherent". e.g. maybe i read a textbook and surface-level-learn some new definitions and theorems and i can now solve the problems in the textbook, but the mathematical understanding i just gained is less integrated with the rest of my understanding than usual for me given that i've only surface-level-learned this stuff (and let's assume surface-level-learning this didn't let me integrate other existing stuff better) — like, maybe i mostly don't see how this theorem relates to other theorems, and wouldn't be able to easily recognize contexts in which it could be useful, and wouldn't be able to prove it, and it doesn't yet really make intuitive sense to me that it has to be true — so now i'm better at math but in a sense less coherent. e.g. maybe i get into acrobatics but don't integrate that interest with the rest of my life much. e.g. maybe as an infant it was easy to see me as mostly orchestrating my like 5 possible actions well toward like being fed when hungry and sleeping when sleepy, but it's less clear how to see me now as orchestrating most of my parts well toward something.[1]
now there is the following response to this:
my thoughts on this response:
i'm somewhat orchestrated toward understanding AI stuff better or getting AGI banned for a very long time or something but i'm probably leaving value massively on the table all over the place, i think in a sense much more than i was as an infant. (and also, this isn't "my terminal goal".) ↩︎
related: https://www.lesswrong.com/posts/nkeYxjdrWBJvwbnTr/an-advent-of-thought ↩︎
the closest thing to this grand optimizer claim that imo makes sense is: it is generic to have values; it is generic to have opinions on what things should be like. this seems sufficient for a basic case for AI risk, as follows: if you're next to an anthill and you're more capable than the ant colony, then it is generic that the ants' thoughts about what things should be like will not matter for long. (with AI, humanity is the ant colony.) ↩︎
i haven't even skimmed the anthropic paper and i have a high prior that they are being bad at philosophy but also: i think there is plausibly a real mistake LW-ers are making around coherence too, downstream of a conflation of two different notions, as i outline here: https://www.lesswrong.com/posts/jL7uDE5oH4HddYq4u/raemon-s-shortform?commentId=WBk9a7TEA5Benjzsu
with like my guess being that: you are saying something straightforwardly true given one notion here but they are making claims given the other notion at least in some cases, though also they might be conflating the two and you might be conflating the two. one could argue that it is fine to "conflate" the two because they are really equivalent, but i think that's probably false (but non-obviously)
I find it interesting and unfortunate that there aren't more economically left-wing thinkers influenced by Yudkowsky/LW thinking about AGI. It seems like a very natural combination given e.g. "Marx subsequently developed an influential theory of history—often called historical materialism—centred around the idea that forms of society rise and fall as they further and then impede the development of human productive power.". It seems likely that LW being very pro-capitalism has meaningfully contributed to the lack of these sorts of people.[1]
I guess ACS carries sth like this vibe. But (unlike ACS) it also seems natural to apply this sort of view of history to AI except also thinking that fooming will be fast.[2]
Relatedly, I wonder if I should be "following the money" more when thinking about AI risk. In particular, instead of saying that "AI researchers/companies" will disempower humanity, maybe it would be appropriate to instead or additionally say "(AI )capitalists and capital and capitalism". My current guess is that while it is appropriate to place a bunch of blame on these, it's also true that e.g. Soviet or Chinese systems [wouldn't be]/aren't doing better, so I've mostly avoided saying this so far. That said, my guess is that if the world were much more like Europe, we would be dying with significantly more dignity, in part due to Europe getting some hyperparameters of governance+society+culture+life more right due to blind luck, but also actually in part due to getting some hyperparameters right because of good reasoning that was basically tracking something logically connected to AI risk (though so far not significantly explicitly tracking AI risk), e.g. via humanism. Another example of a case where I wonder if I should follow the money more is: to what extent should I think of Constellation being wrong/confused/thoughtless/slop-producing on AGI risk in ways xyz as "really being largely about" OpenPhil/Moskovitz/[some sort of outside view impression on AI risk that maybe controls these] being wrong/confused/thoughtless/slop-liking on AGI risk in ways x'y'z'?
I've been meaning to spend at least a few weeks thinking these sorts of questions through carefully, but I haven't gotten around to that yet. I should maybe seek out some interesting [left-Hegelians]/Marxists/communists/socialists to talk to and try to understand how they'd think about these things.
Under this view, political/economic systems that produce less growth but don’t create the incentives for unbounded competition are preferred. Sadly, for Molochian reasons this seems hard to pull off.
Imo one interesting angle of attack on this question is: it seems plausible/likely that an individual human could develop for a very long time without committing suicide with AI or otherwise (imo unlike humanity as it is currently organized); we should be able to understand what differences between a human and society are responsible for this — like, my guess is that there is a small set of properties here that could be identified; we could try to then figure out what the easiest way is to make humanity have these properties.
By saying this, I don't mean to imply that LW is incorrect/bad to be very pro-capitalism. Whether it is bad is mostly a matter of whether it is incorrect, and whether it is incorrect is an open question to me. ↩︎
I guess this post of mine is the closest thing that quickly comes to mind when I try to think of something carrying that vibe, but it's still really quite far. ↩︎
There are people who think (imo correctly) that there will be at least one vastly superhuman AI in the next 100 years by default and (imo incorrectly) that proceeding along the AI path does not lead to human extinction or disempowerment by default. My anecdotal impression is that a significant fraction (maybe most) of such people think (imo incorrectly) that letting Anthropic/Claude do recursive self-improvement and be a forever-sovereign would probably go really well for humanity. The point of this note is to make the following proposal and request: if you ever let an AI self-improve, or more generally if you have AIs creating successor AIs, or even more generally if you let the AI world develop and outpace humans in some other way, or if you try to run some process where boxed AIs are supposed to create an initial ASI sovereign, or if you try to have AIs "solve alignment"[1] (in one of the ways already listed, or in some other way), or if you are an AI (or human mind upload) involved in some such scheme,[2] try to make it so the following property is upheld:
I think this is probably a bad term that should be deprecated ↩︎
well, at least if the year is and we're not dealing with a foom of extremely philosophically competent and careful mind uploads or whatever, firstly, you shouldn't be running a foom (except for the grand human foom we're already in). secondly, please think more. thirdly, please try to shut down all other AGI attempts and also your lab and maybe yourself, idk in which order. but fourthly, ... ↩︎
This will plausibly require staying ahead of humanity in capabilities in this galaxy forever, so this will be extremely capable AI. So, when I say the galaxy is AGI-free, I don't mean that artificial generally intelligent systems are not present in the galaxy. I mean that these AIs are supposed to have no involvement in human life except for enforcing an AI ban. ↩︎
or like at least "their values" ↩︎
and assuming we aren't currently massively overestimating the amount of resources accessible to Earth-originating creatures ↩︎
or maybe we do some joint control thing about which this is technically false but about which it is still pretty fair to say that each person got more of a say than if they merely controlled of all the resources ↩︎
an intuition pump: as an individual human, it seems possible to keep carefully developing for a long time without accidentally killing oneself; we just need to make society have analogues of whatever properties/structures make this possible in an individual human ↩︎
Btw, a pro tip for weathering the storm of crazymessactivitythoughtdevelopmenthistory: be the (generator of the) storm. I.e., continue acting and thinking and developing as humanity. Also, pulling ourselves up by our own bootstraps is based imo. Wanting to have a mommy AI think for us is pretty cringe imo. ↩︎
Among currently accessible RSI processes, there is one exception: it is in fact fine to have normal human development continue. ↩︎
Ok, really humans (should) probably importantly have lives and values together, so it would be more correct to say: there is a particular infinite contribution to human life/valuing waiting to grow out of each person. Or: when a person is lost, an important aspect of God is lost. But the simpler picture is fine for making my current point. ↩︎
I think it's good to think of FIAT stuff as a special case of applying some usual understanding-machinery (like, abductive and inductive machinery) in value-laden cases. It's the special case where one implicitly or explicitly abducts to (one having) goals. Here is an example ethical story where the same thing shows up in various ways such that it'd imo be sorta contrived to analyze it in terms of goals being adopted:
(Acknowledgment. A guiding idea here is from a chat with Tom Everitt.)
(Acknowledgment'. A guiding frustration here is that imo people posting on LessWrong think way too much in terms of goals.)
e.g. "a rational being must always regard himself as lawgiving in a kingdom of ends possible through freedom of the will, whether as a member or as sovereign" ↩︎
on my inside view, the ordering of foomers by some sort of intuitive goodness[1] is [a very careful humanity] > [the best/carefulmost human] > [a random philosophy professor] > [a random human] > [an octopus/chimpanzee civilization somehow conditioned on becoming wise enough in time not to kill itself with AI] > [an individual octopus/chimpanzee] > claude[2], with a meaningful loss in goodness on each step (except maybe the first step, if the best human can be trusted to just create a situation where humanity can proceed together very carefully, instead of fooming very far alone), and meaningful variance inside each category[3]. my intuitive feeling is that each step from one guy to the next in this sequence is a real tragedy.[4]
but i'm meaningfully unsure about what level of goodness this sequence decreases down to — like, i mean, maybe even the last foomers have some chance of being at least a bit good. one central reason is that maybe there's a decent chance that e.g. an advanced octopus civilization would maintain a vast nature preserve for us retarded plant-humans if they get to a certain intelligence level without already having killed us, which would be like at least a bit good (i'm not sure if you mean to consider this sort of thing a "good future"). this feels logically significantly correlated with whether it is plausible that an octopus civilization maintains some sort of deep privileging of existing/[physically encountered] beings, over possible beings they could easily create (and they will be able to easily create very many other beings once they are advanced enough). like, if they do privilege existing beings, then it's not crazy they'd be nice to physically encountered humans. if they don't privilege existing beings and if resources are finite, then since there is an extremely extremely vast space of (human-level) possible beings, it'd be pretty crazy for them to let humans in particular use a significant amount of resources, as opposed to giving the same resources to some other more interesting/valuable/whatever beings (like, it'd be pretty crazy for them to give significant resources to us particular humans, and also it'd be pretty crazy for them to give significant resources to beings that are significantly human-like, except insofar as directly caused by [[octopuses or arbitrary beings] being a bit human-like]). in slogan form: "we're fucked to the extent that it is common to not end up with 'strongly person/plant-affecting+respecting views'", and so then there's a question of how common this is, which i'm somewhat confused about. i think it's probably extremely common among minds in general and probably still common among social species, unfortunately. but maybe there's like a 1% fraction of individuals from social species who are enduringly nice, idk. (one reason for hope: to a certain kind of guy, probably including some humans, this observation that others who are very utilitarian would totally kill you (+ related observations) itself provides a good argument for having person/plant-affecting views.)
(i've been imagining a hypothetical where humans already happen to be living in the universe with octopuses. if we are imagining a hypothetical where humans don't exist in the universe with octopuses at all, then this reason for the sequence to be bounded below by something not completely meaningless goes away.)
(i feel quite confused about many things here)
whose relationship to more concrete things like the (expected) utility assignment i'd effectively use when evaluating lotteries or p("good future") isn't clear to me; this "intuitive goodness" is supposed to track sth like how many ethical questions are answered correctly or in how many aspects what's going on in the world is correct ↩︎
and humanity in practice is probably roughly equivalent to claude in of worlds (though not equivalent in expected value), because we will sadly probably kill ourselves with a claude-tier guy ↩︎
e.g., even the best human might go somewhat crazy or make major mistakes along lots of paths. there's just very many choices to be made in the future. if we have the imo reasonably natural view that there is one sequence of correct choices, then i think it's very likely that very many choices will be made incorrectly. i also think it's plausible this process isn't naturally going to end (though if resources run out, then it ends in this universe in practice), i.e. that there will just always be more important choices later ↩︎
in practice, we should maybe go for some amount of fooming of the best/carefulmost human urgently because maybe it's too hard to make humanity careful. but it's also plausible that making a human foom is much more difficult than making humanity careful. anyway, i hope that the best human fooming looks like quickly figuring out how to restore genuine power-sharing with the rest of humanity while somehow making development more thought-guided (in particular, making it so terrorists, e.g. AI researchers, can't just kill everyone) ↩︎
I disagree somewhat, but—whatever the facts about programs—at least it is not appropriate to claim "not only do most programs which make a mind upload device also kill humanity, it's an issue with the space of programs themselves, not with the way we generate distributions over those programs." That is not true.
Hmm, I think that yes, us probably being killed by a program that makes a mind upload device is (if true) an issue with the way we generated a distribution over those programs. But also, it might be fine to say it's an issue with the space of programs (with an implicit uniform prior on programs up to some length or an implicit length prior) itself.
Like, in the example of two equal gas containers connected by a currently open sliding door, it is fair/correct to say, at least as a first explanation: "it's an issue with the space of gas particle configurations itself that you won't be able to close the door with of the particles on the left side". This is despite the fact that one could in principle be sliding the door in a very precise way so as to leave of the particles on the left side (like, one could in principle be drawing the post-closing microstate from some much better distribution than the naive uniform prior over usual microstates). My claim is that the discussion so far leaves open whether the AI mind upload thing is analogous to this example.
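(A concrete aside on the gas example, in case the orders of magnitude help: below is a minimal sketch of the counting that this sort of "first explanation" rests on. The specific numbers are illustrative assumptions of mine, not anything from the discussion above: 1000 distinguishable particles, each equally likely to end up on either side when the door closes, and "imbalanced" meaning at least 90% of them on the left.)

```python
# A back-of-the-envelope check of the "space of gas particle configurations" point above.
# Assumptions (mine, for illustration): n distinguishable particles, each independently
# equally likely to end up on either side when the door closes; we ask what fraction of
# these 2**n equally likely microstates put at least a fraction p of the particles on
# the left.
from math import comb, log10

def log10_fraction_at_least(n: int, p: float) -> float:
    """log10 of the fraction of the 2**n left/right assignments in which
    at least (roughly) p*n of the n particles sit on the left side."""
    k_min = round(p * n)  # nearest-integer threshold; fine for this illustration
    favorable = sum(comb(n, k) for k in range(k_min, n + 1))
    return log10(favorable) - n * log10(2)

if __name__ == "__main__":
    # Toy "gas" of 1000 particles, asking for at least 90% of them on the left:
    print(log10_fraction_at_least(1000, 0.9))  # roughly -161, i.e. ~1 in 10**161 microstates
```

The log10 of the qualifying fraction scales roughly linearly with the particle count, so for a macroscopic gas the exponent is itself astronomical, which is what makes "it's an issue with the space of configurations itself" a reasonable zeroth-order thing to say about this example.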
It is at least not true "in principle" and perhaps it is not true for more substantial reasons (depending on the task you want and its alignment tax, psychology becomes more or less important in explaining the difficulty, as I gave examples for). On this, we perhaps agree?
I'm open to [the claim about program-space itself being not human-friendly] not turning out to be a good/correct zeroth-order explanation for why a practical mind-upload-device-making AI would kill humanity (even if the program-space claim is true and the practical claim is true). I just don't think the discussion above this comment so far provides good arguments on this question in either direction.
Of course: whether a particular AI kills humanity [if we condition on that AI somehow doing stuff resulting in there being a mind upload device[1]] depends (at least in principle) on what sort of AI it is. Similarly, of course: if we have some AI-generating process (such as "have such and such labs race to create some sort of AGI"), then whether [conditioning that process on a mind upload device being created by an AGI makes p(humans get killed) high] depends (at least in principle) on what sort of AI-generating process it is.
Still, when trying to figure out what probabilities to assign to these sorts of claims for particular AIs or particular AI-generating processes, it can imo be very informative to (among other things) think about whether most programs one could run such that mind upload devices exist 1 month after running them are such that running them kills humanity.
In fact, despite the observation that the AI/[AI-generating process] design matters in principle, it is still even a priori plausible that "if you take a uniformly random python program of length such that running it leads to a mind upload device existing, running it is extremely likely to lead to humans being killed" is basically a correct zeroth-order explanation for why if a particular AI creates a mind upload device, humans die. (Whether it is in fact a correct zeroth-order explanation for AI stuff going poorly for humanity is a complicated question, and I don't feel like I have a strong yes/no position on this[2], but I don't think your piece really addresses this question well.) To give an example where this sort of thing works out: even when you're a particular guy closing a particular kind of sliding opening between two gas containers, "only extremely few configurations of gas particles have of the particles on one side" is basically a solid zeroth-order explanation for why you in particular will fail to close that particular opening with of the particles on one side, even though in principle you could have installed some devices which track gas particles and move the opening up and down extremely rapidly while "closing" it so as to prevent passage in one direction but not the other and closed it with of gas particles on one side.
That said, I think it is also a priori plausible that the AI case is not analogous to this example — i.e., it is a priori plausible that in the AI case, "most programs leading to mind uploads existing kill humanity" is not a correct zeroth-order explanation for why the particular attempts to have an AI create mind uploads we might get would go poorly for humanity. My point is that establishing this calls for better arguments than "it's at least in principle possible for an AI/[AI-generating process] to have more probability mass on mind-upload-creating plans which do not kill humanity".
Like, imo, "most programs which make a mind upload device also kill humanity" is (if true) an interesting and somewhat compelling first claim to make in a discussion of AI risk, to which the claim "but one can at least in principle have a distribution on programs such that most programs which make mind uploads do not also kill humans" alone is not a comparably interesting or compelling response.
some speculation about one thing here that might be weird to "normal people":
i think you’re right that the sohl-dickstein post+survey also conflates different notions, and i might even have added more notions into the mix with my list of questions trying to get at some notion(s)[1]
a monograph untangling this coherence mess some more would be valuable. it could do the following things:
i didn’t re-read that post before writing my comment above ↩︎
the answers to some of these questions might depend on some partly “metaphysical” facts like whether math is genuinely infinite or whether technological maturity is a thing ↩︎
i think the optimistic conclusions are unlikely, but i wouldn’t want to pre-write that conclusion for the monograph, especially if i’m not writing it ↩︎