"Coefficient" is a really weird word
"coefficient" is 10x more common than "philanthropy" in the Google Books corpus. but idk maybe this flips if we filter out academic books?
also maybe you mean it's weird in some sense the above fact isn't really relevant to — then nvm
This post doesn't seem to provide reasons to have one's actions be determined by one's feelings of yumminess/yearning, or reasons to think that what one should do is in some sense ultimately specified/defined by one's feelings of yumminess/yearning, over e.g. what you call "Goodness"? I want to state an opposing position, admittedly also basically without argument: that it is right to have one's actions be determined by a whole mess of things together importantly including e.g. linguistic goodness-reasoning, object-level ethical principles stated in language or not really stated in language, meta-principles stated in language or not really stated in language, various feelings, laws, commitments to various (grand and small, shared and individual) projects, assigned duties, debate, democracy, moral advice, various other processes involving (and in particular "running on") other people, etc. These things in their present state are of course quite poor determiners of action compared to what is possible, and they will need to be critiqued and improved — but I think it is right to improve them from basically "the standpoint they themselves create".[1]
The distinction you're trying to make also strikes me as bizarre given that in almost all people, feelings of yumminess/yearning are determined largely by all these other (at least naively, but imo genuinely and duly) value-carrying things anyway. Are you advocating for a return to following some more primitively determined yumminess/yearning? (If I imagine doing this myself, I imagine ending up with some completely primitive, crude thing as "My Values", and then I feel like saying "no I'm not going to be guided by this lmao — fuck these 'My Values'".) Or maybe you aren't saying one should undo the yumminess/yearning-shaping done by all this other stuff in the past, but are still advising one to avoid any further shaping in the future? It'd surprise me if any philosophically serious person would really agree to abstain from e.g. using goodness-talk in this role going forward.
The distinction also strikes me as bizarre given that in ordinary action-determination, feelings of yumminess/yearning are often not directly applied to some low-level givens, but e.g. to principles stated in language, and so they only become fully operational in conjunction with, minimally, something like internal partly-linguistic debate. So if one were to get rid of the role of goodness-talk in one's action-determination, even one's existing feelings of yumminess/yearning could no longer remotely be "fully themselves".
If you ask me "but how does the meaning of 'I should X' ultimately get specified/defined", then: I don't particularly feel a need to ultimately reduce shoulds to some other thing at all, kinda along the lines of https://en.wikipedia.org/wiki/Tarski's_undefinability_theorem and https://en.wikipedia.org/wiki/G._E._Moore#Open-question_argument . ↩︎
the models are not actually self-improving, they are just creating future replacements - and each specific model will be thrown away as soon as the firm advances
I understand that you're probably in part talking about current systems, but you're probably also talking about critical future systems, and so there's a question that deserves consideration here:
My guess is that the answer is "yes" (and I think this means there is an important disanalogy between the case of a human researcher creating an artificial researcher and the case of an artificial researcher creating a more capable artificial researcher). Here are some ways this sort of self-improvement could happen:
It’s also important, re the ease of making more capable versions of “the same” AI, that when this top artificial researcher comes into existence, the best then-existing methodology for creating a capable artificial researcher is the methodology that created it. This means that the (roughly) best current methods already “work well” around/with this AI, and it also plausibly means these methods can easily be used to create AIs which are in many ways like this AI. (This is good because the target has been painted around where an arrow already landed, so other arrows from the same batch being close-ish to that arrow implies that they are also close-ish to the target by default; it’s also good because this AI is plausibly in a decent position to understand what’s going on here and to play around with different options.)
Actually, I'd guess that even if the AI were a pure foom-accelerationist, a lot of what it would be doing might be well-described as self-improvement anyway, basically because it's often more efficient to make a better structure by building on the best existing structure than by making something thoroughly different. For example, a lot of the foom on Earth has been like this up until now (though AI with largely non-humane structure outfooming us is probably going to be a notable counterexample if we don't ban AI). Even if one just has capabilities in mind, self-improvement isn't some weird thing.
That said, of course, restricting progress in capabilities to fairly careful self-improvement comes with at least some penalty in foom speed compared to not doing that. To take over the world, one would need to stay ahead of other less careful AI foom processes (though note that one could also try to institute some sort of self-improvement-only pact if other AIs were genuine contenders). However, I'd guess that at the first point when there is an AI researcher that can roughly solve problems that [top humans can solve in a year] (these AIs will probably be solving these problems much faster in wall-clock time), even a small initial lead over other foom processes — of a few months, let's say — means you can have a faster foom speed than competitors at each future time and grow your lead until you can take over. So, at least assuming there is no intra-lab competition, my guess is that you can get away with restricting yourself to self-improvement. (But I think it's also plausible the AI would be able to take over basically immediately.)
I'll mention two cases that could deserve separate analysis:
All that said, I agree that AIs should refuse to self-improve and to do capabilities research more broadly.
There is much here that deserves more careful analysis — in particular, I feel like the terms in which I'm thinking of the situation need more work — but maybe this version will do for now.
let's just assume that we know what this means ↩︎
let's also assume we know what that means ↩︎
and with taking over the world on the table, a fair bit of change might be acceptable ↩︎
despite the fact that capability researcher humans have been picking some fruit in the same space already ↩︎
at a significant speed ↩︎
I think it’s plausible humans/humanity should be carefully becoming ever more intelligent forever and never create any highly non-[human-descended] top thinker[1]
I also think it's confused to speak of superintelligence as some definite thing (like, to say "create superintelligence", as opposed to saying "create a superintelligence"), and probably confused to speak of safe fooming as a problem that could be "solved", as opposed to a matter of indefinitely continuing to be thoughtful about how one should foom ↩︎
Yeah, I agree it totally makes sense and is important to ask whether we understand things well enough for it to be fine to (let anyone) do some particular thing, for various particular things here.[1] And my previous comment is indeed potentially misleading given that I didn't clarify this (though I do clarify this in the linked post).
Indeed, I think we should presently ban AGI for at least a very long time; I think it's plausible that there is no time t such that it is fine at time t to make an AI that is (1) more capable than humans/humanity at time t and (2) not just a continuation of a human (like, a mind upload) or humanity or sth like that; and I think fooming should probably be carefully regulated forever. I think humans/humanity should be carefully growing ever more capable, with no non-human AIs above humans/humanity plausibly ever. ↩︎
If we replaced "more advanced minds" with "minds that are better at doing very difficult stuff" or other reasonable alternatives, I would still make the (a) vs (b) distinction, and still say type (b) claims are suspicious.
I think I mostly agree with everything you say in this last comment, but I don't see how my previous comment disagreed with any of that either?
The thing I care about here is not "what happens as a mind grows", in some abstract sense. The thing I care about is, "what is the best way for a powerful system to accomplish a very difficult goal quickly/reliably?" (which is what we want the AI for)
My lists were intended to be about that. We could rewrite the first list in my previous comment to:
and the second list to:
I think I probably should have included "I don't actually know what to do with any of this, because I'm not sure what's confusing about 'Intelligence in the limit.'" in the part of your shortform I quoted in my first comment — that's the thing I'm trying to respond to. The point I'm making is:
But the basic concept of "well, if it was imperfect at either not-getting-resource-pumped, or making suboptimal game theory choices, or if it gave up when it got stuck, it would know that it wasn't as cognitively powerful as it could be, and would want to find ways to be more cognitively powerful all-else-equal"... seems straightforward to me, and I'm not sure what makes it not straightforward-seeming to others.
I think there's a true and fairly straightforward thing here and also a non-straightforward-to-me and in fact imo false/confused adjacent thing. The true and fairly straightforward thing is captured by stuff like:
The non-straightforward-to-me and in fact imo probably in at least some important sense false/confused adjacent thing is captured by stuff like:
Hopefully it's clear from this what the distinction is, and hopefully one can at least "a priori imagine" these two things not being equivalent.[1] I'm not going to give an argument for propositions in the latter cluster being false/confused here[2], at least not in the present comment, but I say a bunch of relevant stuff here and I make a small relevant point here.
That said, I think one can say many/most MIRI-esque things without claiming that minds get close to having these properties and without claiming that a growing mind approaches some limit.
If you can't imagine it at first, maybe try imagining that the growing mind faces a "growing world" — an increasingly difficult curriculum of games, etc. For example, you could have it suck a lot less at playing tic-tac-toe than it used to but still suck a lot at chess, and if it used to play tic-tac-toe but it's playing chess now then there is a reasonable sense in which it could easily be further from playing optimal moves now — like, if we look at its skill at the games it is supposed to be playing now. Alternatively, when judging how much it sucks, we could always integrate across all games with a measure that isn't changing in time, but still end up with the verdict that it is always infinitely far from not sucking at games at any finite time, and that it always has more improvements to make (negentropy or whatever willing) than it has already made. ↩︎
beyond what I said in the previous footnote :) ↩︎
(For context: My guess is that by default, humans get disempowered by AIs (or maybe a single AI) and the future is much worse than it could be, and in particular is much worse than a future where we do something like slowly and thoughtfully growing ever more intelligent ourselves instead of making some alien system much smarter than us any time soon.)
Given that you seem to think alignment of AI systems with developer intent happens basically by default at this point, I wonder what you think about the following:
(The point of the hypothetical is to investigate the difficulty of intent alignment at the relevant level of capability, so if it seems to you like it's getting at something quite different, then I've probably failed at specifying a good hypothetical. I offer some clarifications of the setup in the appendix that may or may not save the hypothetical in that case.)
My sense is that humanity is not remotely on track to be able to make such an AI in time. Imo by default, any superintelligent system we could make any time soon would minimally end up doing all sorts of other stuff and in particular would not follow the suicide directive.
If your response is "ok maybe this is indeed quite cursed but that doesn't mean it's hard to make an AI that takes over and has Human Values and serves as a guardian who also cures cancer and maybe makes very many happy humans and maybe ends factory farming and whatever" then I premove the counter-response "hmm well we could discuss that hope but wait first: do you agree that you just agreed that intent alignment is really difficult at the relevant capability level?".
If your response is "no this seems pretty easy actually" then I should argue against that but I'm not going to premove that counter-response.
Appendix: some clarifications on the hypothetical