
> how many times did the explanation just "work out" for no apparent reason

From the examples later in your post, it seems like it might be clearer to say something more like "how many things need to hold about the circuit for the explanation to describe the circuit"? More precisely, I'm objecting to your "how many times" because it could plausibly mean "on how many inputs" which I don't think is what you mean, and I'm objecting to your "for no apparent reason" because I don't see what it would mean for an explanation to hold for a reason in this case.


@misc{radhakrishnan2023mechanism,
  title  = {Mechanism of feature learning in deep fully connected networks and kernel machines that recursively learn features},
  author = {Adityanarayanan Radhakrishnan and Daniel Beaglehole and Parthe Pandit and Mikhail Belkin},
  year   = {2023},
  url    = {https://arxiv.org/pdf/2212.13881.pdf}
}

Let $h_\ell(x)$ denote the activation vector in layer $\ell$ on input $x$, with the input layer being at index $0$, so $h_0(x) = x$. Let $W_\ell$ be the weight matrix after activation layer $\ell$, and let $f_\ell$ be the function that maps from the $\ell$th activation layer to the output (assume for now that the activations are ReLU). Then their *Deep Neural Feature Ansatz* says that $$W_\ell^\top W_\ell \propto \frac{1}{n} \sum_x \nabla f_\ell(h_\ell(x)) \, \nabla f_\ell(h_\ell(x))^\top,$$ where the sum is over the $n$ training inputs $x$ and the gradient is taken wrt $h_\ell(x)$.
(I'm somewhat confused here about them not mentioning the loss function at all — are they claiming this is reasonable for any reasonable loss function? Maybe just MSE? MSE seems to be the only loss function mentioned in the paper; I think they leave the loss unspecified in a bunch of places though.)

Letting $W_\ell = U \Sigma V^\top$ be an SVD of $W_\ell$, we note that this is equivalent to $$V \Sigma^2 V^\top \propto \frac{1}{n} \sum_x \nabla f_\ell(h_\ell(x)) \, \nabla f_\ell(h_\ell(x))^\top,$$ i.e., that the eigenvectors of the matrix on the RHS are the right singular vectors of $W_\ell$ (with eigenvalues the squared singular values). By the variational characterization of eigenvectors and eigenvalues (Courant–Fischer or whatever), this is the same as saying that the right singular vectors of $W_\ell$ are the sequence of orthonormal directions successively capturing the most mass of the matrix on the RHS. Plugging in the definition of that matrix, this is equivalent to saying that the right singular vectors are the sequence of highest-variance directions of the data set of gradients $\{\nabla f_\ell(h_\ell(x))\}_x$.
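(A quick numpy sanity check of this linear-algebra step, nothing from the paper and with made-up sizes: for a generic matrix $W$, the eigenvectors of $W^\top W$ are the right singular vectors of $W$, with eigenvalues the squared singular values.)

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((5, 7))  # a generic "weight matrix"

# SVD: rows of Vt are the right singular vectors of W
U, S, Vt = np.linalg.svd(W, full_matrices=False)

# eigendecomposition of W^T W; eigh returns ascending order, so flip to descending
evals, evecs = np.linalg.eigh(W.T @ W)
evals, evecs = evals[::-1], evecs[:, ::-1]

print(np.allclose(evals[:5], S**2))  # nonzero eigenvalues = squared singular values
# eigenvectors match right singular vectors up to sign (singular values are distinct here)
print(np.allclose(np.abs(np.sum(evecs[:, :5] * Vt.T, axis=0)), 1.0))
```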

(I have assumed here that the linearity is precise, whereas really it is approximate. It's probably true though that with some assumptions, the approximate initial statement implies an approximate conclusion too? Getting approx the same vecs out probably requires some assumption about gaps in singular values being big enough, because the vecs are unstable around equality. But if we're happy getting a sequence of orthogonal vectors that gets variances which are nearly optimal, we should also be fine without this kind of assumption. (This is guessing atm.))
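To illustrate the instability point, here's a toy example of my own (not from the paper): when two singular values are nearly equal, an arbitrarily small perturbation can rotate the corresponding right singular vectors by about 45 degrees, so recovering approximately the same vectors really does need some gap assumption.

```python
import numpy as np

# two nearly-equal singular values: the corresponding singular vectors are nearly degenerate
A = np.diag([1.0, 1.0 + 1e-9, 0.5])

# a tiny perturbation mixing the first two coordinates
eps = 1e-6
P = np.zeros((3, 3))
P[0, 1] = P[1, 0] = eps

_, _, Vt_A = np.linalg.svd(A)
_, _, Vt_B = np.linalg.svd(A + P)

# the top right singular vector rotates by ~45 degrees even though the perturbation is tiny
angle = np.degrees(np.arccos(abs(Vt_A[0] @ Vt_B[0])))
print(angle)  # roughly 45, not roughly 0
```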

Assuming there isn't an off-by-one error in the paper, we can pull some terms out of the RHS, maybe? This is because applying the chain rule to the Jacobians of the layer transitions gives $\nabla f_\ell(h_\ell(x)) = W_\ell^\top \nabla f_{\ell+1}(h_{\ell+1}(x))$, so $$\frac{1}{n} \sum_x \nabla f_\ell(h_\ell(x)) \, \nabla f_\ell(h_\ell(x))^\top = W_\ell^\top \left( \frac{1}{n} \sum_x \nabla f_{\ell+1}(h_{\ell+1}(x)) \, \nabla f_{\ell+1}(h_{\ell+1}(x))^\top \right) W_\ell.$$

Wait, so the claim is just $$W_\ell^\top W_\ell \propto W_\ell^\top \left( \frac{1}{n} \sum_x \nabla f_{\ell+1}(h_{\ell+1}(x)) \, \nabla f_{\ell+1}(h_{\ell+1}(x))^\top \right) W_\ell,$$ which, assuming $W_\ell$ is invertible, should be the same as $$\frac{1}{n} \sum_x \nabla f_{\ell+1}(h_{\ell+1}(x)) \, \nabla f_{\ell+1}(h_{\ell+1}(x))^\top \propto I.$$ But also, they claim that it is $\propto W_{\ell+1}^\top W_{\ell+1}$? Are they secretly approximating everything with identity matrices?? This doesn't seem to be the case from their Figure 2 though.

Oh oops, I guess I forgot about activation functions here! There should be extra diagonal terms for the Jacobians of the preactivation-to-activation maps in the chain rule above, i.e., it should really say $\nabla f_\ell(h_\ell(x)) = W_\ell^\top D_{\ell+1}(x) \, \nabla f_{\ell+1}(h_{\ell+1}(x))$, where $D_{\ell+1}(x)$ is the diagonal matrix of activation-function derivatives at the preactivations of layer $\ell+1$. We now instead get $$W_\ell^\top W_\ell \propto W_\ell^\top \left( \frac{1}{n} \sum_x D_{\ell+1}(x) \, \nabla f_{\ell+1}(h_{\ell+1}(x)) \, \nabla f_{\ell+1}(h_{\ell+1}(x))^\top D_{\ell+1}(x) \right) W_\ell.$$ This should be the same as $$\frac{1}{n} \sum_x D_{\ell+1}(x) \, \nabla f_{\ell+1}(h_{\ell+1}(x)) \, \nabla f_{\ell+1}(h_{\ell+1}(x))^\top D_{\ell+1}(x) \propto I,$$ which, with $z_{\ell+1}(x)$ denoting the preactivations in layer $\ell+1$ and $g_{\ell+1}$ denoting the function from these preactivations to the output, is the same as $$\frac{1}{n} \sum_x \nabla g_{\ell+1}(z_{\ell+1}(x)) \, \nabla g_{\ell+1}(z_{\ell+1}(x))^\top \propto I.$$ This last thing also totally works with activation functions other than ReLU — one can get this directly from the Jacobian calculation. I made the ReLU assumption earlier because I thought for a bit that one can get something further in that case; I no longer think this, but I won't go back and clean up the presentation atm.
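Here's a quick finite-difference check of this corrected chain-rule step, a sketch of my own in numpy (not code from the paper; the two-hidden-layer architecture, sizes, and names are made up, and the check is at $\ell = 1$, i.e., that the gradient of the layer-1-to-output map equals $W_1^\top D_2(x)$ times the gradient of the layer-2-to-output map):

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0.0)

# a tiny 2-hidden-layer ReLU MLP with scalar output (made-up sizes)
d0, d1, d2 = 6, 5, 4
W0 = rng.standard_normal((d1, d0))  # weight matrix after activation layer 0 (the input)
W1 = rng.standard_normal((d2, d1))  # weight matrix after activation layer 1
w2 = rng.standard_normal(d2)        # readout after activation layer 2

x = rng.standard_normal(d0)
h1 = relu(W0 @ x)            # activations of layer 1
z2 = W1 @ h1                 # preactivations of layer 2
D2 = (z2 > 0).astype(float)  # diagonal of the ReLU Jacobian at z2

# f_1: the function from layer-1 activations to the output; the grad of f_2 wrt h2 is just w2
f1 = lambda h: w2 @ relu(W1 @ h)
grad_chain = W1.T @ (D2 * w2)  # the claimed chain-rule expression for grad_{h1} f_1

# central finite differences for grad_{h1} f_1
step = 1e-6
grad_fd = np.array([(f1(h1 + step * e) - f1(h1 - step * e)) / (2 * step) for e in np.eye(d1)])

print(np.max(np.abs(grad_chain - grad_fd)))  # ~1e-9: the two agree
```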

Anyway, a takeaway is that the Deep Neural Feature Ansatz is equivalent to the (imo cleaner) ansatz that the set of gradients of the output wrt the pre-activations of any layer is close to being a tight frame (in other words, the gradients are in isotropic position; in other words still, the data matrix of the gradients is a constant times a semi-orthogonal matrix). (Note that the closeness one immediately gets isn't closeness of the gradient set to a tight frame; it's just closeness in the quantity defining the tightness of a frame. But I'd guess that if it matters, one can also conclude some kind of closeness of the vectors themselves from this (related).) This seems like a nicer fundamental condition because (1) we've intuitively canceled terms and (2) it now looks like a generic-ish condition, i.e., it looks less mysterious, though idk how to argue for this beyond some handwaving about genericness, about other stuff being independent, sth like that.

Proof of the tight frame claim from the previous condition: write $g_x$ for the gradient (wrt the preactivations) on input $x$. Note that $\frac{1}{n} \sum_x g_x g_x^\top \propto I$ clearly implies that the mass $\frac{1}{n} \sum_x (g_x \cdot v)^2$ is the same in any unit direction $v$, but also the mass being the same in any direction implies the above (because then, letting the SVD of the matrix $G$ with these gradients in its columns be $G = U \Sigma V^\top$, the constant-mass condition forces all eigenvalues of $G G^\top = U \Sigma \Sigma^\top U^\top$ to be equal, i.e. $\Sigma \Sigma^\top \propto I$, so $G G^\top \propto U U^\top = I$, where we used the fact that $U$ is orthogonal).

- Can one come up with some similar ansatz identity for the left singular vectors of $W_\ell$? One point of tension/interest here is that an ansatz identity for $W_\ell W_\ell^\top$ would constrain the left singular vectors of $W_\ell$ together with its singular values, but the singular values are constrained already by the deep neural feature ansatz. So if there were another identity for $W_\ell W_\ell^\top$ in terms of some gradients, we'd get a derived identity from equality between the singular values defined in terms of those gradients and the singular values defined in terms of the Deep Neural Feature Ansatz. Or actually, there probably won't be an interesting identity here, since given the cancellation above, it now feels like nothing about $W_\ell$ is really pinned down by the DNFA — the condition it reduces to is about gradients 'independent of $W_\ell$'. Of course, some $W_\ell$-dependence remains even in those gradients, because the preactivations at which further gradients get evaluated are somewhat $W_\ell$-dependent, so I guess it's not ruled out that the DNFA constrains something interesting about $W_\ell$? But anyway, all this seems to undermine the interestingness of the DNFA, as well as the chance of there being an interesting similar ansatz for the left singular vectors of $W_\ell$.
- Can one heuristically motivate that the preactivation gradients above should indeed be close to being in isotropic position? Can one use this reduction to provide simpler proofs of some of the propositions in the paper which say that the DNFA is exactly true in certain very toy cases?
- The authors claim that the DNFA is supposed to somehow elucidate feature learning (indeed, they claim it is a mechanism of feature learning?). I take 'feature learning' to mean something like: which neuronal functions (from the input) are created, or which functions are computed in a layer in some broader sense (maybe which things are made linearly readable?), or which directions in an activation space get amplified, or maybe, less precisely, just the process of some internal functions (from the input to internal activations) being learned, or something like that — which apparently happens in finite networks, in contrast to infinitely wide networks or NTK models or something like that which I haven't yet understood? I understand that their heuristic identity on the surface connects something about a weight matrix to something about gradients, but assuming I've not made some index-off-by-one error or something, it seems to probably not really be about that at all, since the weight matrix sorta cancels out — if it's true for one $W_\ell$, it would maybe also be true with any other matrix replacing it, so it doesn't really pin down $W_\ell$? (This might turn out to be false if the isotropy of preactivation gradients is only true for a very particular choice of $W_\ell$.) But like, ignoring that counter, I guess their point is that the directions which get stretched most by the weight matrix in a layer are the directions along which it would be best to move locally in that activation space to affect the output? (They don't explain it this way though — maybe I'm ignorant of some other meaning having been attributed to $W_\ell^\top W_\ell$ in previous literature or something.) But they say "Informally, this mechanism corresponds to the approach of progressively re-weighting features in proportion to the influence they have on the predictions.". I guess maybe this is an appropriate description of the math if they are talking about reweighting in the purely linear sense, and they take features in the input layer to be scaleless objects or something? (Like, if we take features in the input activation space to each have some associated scale, then the right singular vector identity no longer says that the most influential features get stretched the most.) I wish they were much more precise here, or, if there isn't a precise interesting philosophical thing to be deduced from their math, much more honest about that, and much less PR-y.
- So, in brief, instead of "informally, this mechanism corresponds to the approach of progressively re-weighting features in proportion to the influence they have on the predictions," it seems to me that what the math warrants would be sth more like "The weight matrix reweights stuff; after reweighting, the activation space is roughly isotropic wrt affecting the prediction (ansatz); so, the stuff that got the highest weight has most effect on the prediction now." I'm not that happy with this last statement either, but atm it seems much more appropriate than their claim.
- I guess if I'm not confused about something major here (plausibly I am), one could probably add 1000 experiments (e.g. checking that the isotropic version of the ansatz indeed equally holds in a bunch of models) and write a paper responding to them. If you're reading this and this seems interesting to you, feel free to do that — I'm also probably happy to talk to you about the paper.
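If someone does want to poke at this, here's roughly the harness I have in mind, a sketch of my own in plain numpy rather than anything from the paper (made-up sizes, and a randomly-initialized toy MLP in place of the real trained models one would actually want to check): it measures (a) how isotropic the preactivation gradients of a layer are and (b) the original right-singular-vector form of the ansatz at the input layer.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0.0)

# a small randomly-initialized ReLU MLP with scalar output (stand-in for a real model)
d0, d1, d2, n = 20, 30, 30, 5000
W0 = rng.standard_normal((d1, d0)) / np.sqrt(d0)
W1 = rng.standard_normal((d2, d1)) / np.sqrt(d1)
w2 = rng.standard_normal(d2) / np.sqrt(d2)

X = rng.standard_normal((n, d0))  # a stand-in "data set"

grads_z1, grads_x = [], []  # gradients wrt layer-1 preactivations and wrt the input
for x in X:
    z1 = W0 @ x
    z2 = W1 @ relu(z1)
    g_z1 = (z1 > 0) * (W1.T @ ((z2 > 0) * w2))  # chain rule by hand
    grads_z1.append(g_z1)
    grads_x.append(W0.T @ g_z1)
G_z1 = np.mean([np.outer(g, g) for g in grads_z1], axis=0)
G_x = np.mean([np.outer(g, g) for g in grads_x], axis=0)

# (a) isotropic version of the ansatz: is G_z1 close to a multiple of the identity?
ev = np.linalg.eigvalsh(G_z1)
print("min/max eigenvalue of G_z1:", ev.min() / ev.max())  # 1.0 would be exactly isotropic

# (b) original DNFA at the input layer: does the top eigenvector of G_x
#     match the top right singular vector of W0?
_, _, Vt = np.linalg.svd(W0)
evecs = np.linalg.eigh(G_x)[1][:, ::-1]  # eigenvectors, descending eigenvalue order
print("|cos| between top directions:", abs(Vt[0] @ evecs[:, 0]))  # 1.0 would be exact agreement
```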

indexing error in the first displaymath in Sec 2: it probably should say '', not ''

I think the world would probably be much better if everyone made a bunch more of their notes public. I intend to occasionally copy some personal notes on ML(?) papers into this thread. While I hope that the notes I end up selecting for posting here will be of interest to some people, and that people will sometimes comment with their thoughts on the same paper and on my thoughts (please do tell me how I'm wrong, etc.), I expect that the notes here will not be significantly more polished than typical notes I write for myself, and that my reasoning will be suboptimal; I also expect most of these notes won't really make sense unless you're also familiar with the paper — the notes will typically be companions to the paper, not substitutes.

I expect I'll sometimes be meaner in these notes than some norm somewhere dictates (in fact, I expect I'll sometimes be simultaneously mean and wrong/confused — exciting!), but I should clarify that I think almost all ML papers/posts/notes are trash, so me being mean to a particular paper is not necessarily evidence that I think it's worse than average. If anything, the papers I post notes about at least had something worth thinking/writing about, which seems like a good thing! In particular, they probably contained at least one interesting idea!

So, anyway: I'm warning you that the notes in this thread will be messy and not self-contained, and telling you that reading them might not be a good use of your time :)


I'd be very interested in a concrete construction of a (mathematical) universe in which, in some reasonable sense that remains to be made precise, two 'orthogonal pattern-universes' (preferably each containing 'agents' or 'sophisticated computational systems') live on 'the same fundamental substrate'. One of the many reasons I'm struggling to make this precise is that I want there to be some condition which meaningfully rules out trivial constructions in which the low-level specification of such a universe can be decomposed into a pair $(x_1, x_2)$ such that $x_1$ and $x_2$ are 'independent', everything in the first pattern-universe is a function only of $x_1$, and everything in the second pattern-universe is a function only of $x_2$. (Of course, I'd also be happy with an explanation of why this is a bad question :).)


I find [the use of square brackets to show the merge structure of [a linguistic entity that might otherwise be confusing to parse]] delightful :)


I'd be quite interested in elaboration on getting faster alignment researchers not being alignment-hard — it currently seems likely to me that a research community of unupgraded alignment researchers with a hundred years is capable of solving alignment (conditional on alignment being solvable). (And having faster general researchers, a goal that seems roughly equivalent, is surely alignment-hard (again, conditional on alignment being solvable), because we can then get the researchers to quickly do whatever it is that we could do — e.g., upgrading?)


I was just claiming that your description of pivotal acts / of people who support pivotal acts was incorrect, in a way that people who think pivotal acts are worth considering would consider very significant, and in a way that significantly reduces the force of your argument as applied to what people actually mean by pivotal acts — I don't see anything in your comment that responds to that claim. I would like the question of whether pivotal acts are a good idea, with this in mind, to be a separate discussion.

Now, in this separate discussion: I agree that executing a pivotal act with just a narrow, safe superintelligence is a difficult problem. That said, all paths to a state of safety from AGI that I can think of seem to contain difficult steps, so I think a more fine-grained analysis of the difficulty of various steps would be needed. I broadly agree with your description of the political character of pivotal acts, but I disagree with what you claim about associated race dynamics — it seems plausible to me that if pivotal acts became the main paradigm, then we'd have a world in which a majority of relevant people are willing to cooperate / do not want to race that much against others in the majority, and it'd mostly be a race between this group and e/acc types. I would also add, though, that the kinds of governance solutions/mechanisms I can think of that are sufficient to (for instance) make it impossible to perform distributed training runs on consumer devices also seem quite authoritarian.


In this comment, I will be assuming that you intended to talk of "pivotal acts" in the standard (distribution of) sense(s) people use the term — if your comment is better described as using a different definition of "pivotal act", including when "pivotal act" is used by the people in the dialogue you present, then my present comment applies less.

I think that this is a significant mischaracterization of what most (? or definitely at least a substantial fraction of) pivotal activists mean by "pivotal act" (in particular, I think this is a significant mischaracterization of what Yudkowsky has in mind). (I think the original post also uses the term "pivotal act" in a somewhat non-standard way in a similar direction, but to a much lesser degree.) Specifically, I think it is false that the primary kinds of plans this fraction of people have in mind when talking about pivotal acts involve creating a superintelligent nigh-omnipotent infallible FOOMed properly aligned ASI. Instead, the kind of person I have in mind is very interested in coming up with pivotal acts that do not use a general superintelligence, often looking for pivotal acts that use a narrow superintelligence (for instance, a narrow nanoengineer) (though this is also often considered very difficult by such people (which is one of the reasons they're often so doomy)). See, for instance, the discussion of pivotal acts in https://www.lesswrong.com/posts/7im8at9PmhbT4JHsW/ngo-and-yudkowsky-on-alignment-difficulty.


A few notes/questions about things that seem like errors in the paper (or maybe I'm confused — anyway, none of this invalidates any conclusions of the paper, but if I'm right or at least justifiably confused, then these do probably significantly hinder reading the paper; I'm partly posting this comment to possibly prevent some readers in the future from wasting a lot of time on the same issues):

1) The formula for here seems incorrect:

This is because W_i is a feature corresponding to the i'th coordinate of x (this is not evident from the screenshot, but it is evident from the rest of the paper), so surely what shows up in this formula should not be W_i, but instead the i'th row of the matrix which has columns W_i (this matrix is called W later). (If one believes that W_i is a feature, then one can see this is wrong already from the dimensions in the dot product not matching.)

2) Even though you say in the text at the beginning of Section 3 that the input features are independent, the first sentence below made me make a pragmatic inference that you are not assuming that the coordinates are independent for this particular claim about how the loss simplifies (in part because if you were assuming independence, you could replace the covariance claim with a weaker variance claim, since the 0 covariance part is implied by independence):

However, I think you do use the fact that the input features are independent in the proof of the claim (at least you say "because the x's are independent"):

Additionally, if you are in fact just using independence in the argument here and I'm not missing something, then I think that instead of saying you are using the moment-cumulants formula here, it would be much, much better to say that independence implies that any term with an unmatched index is 0 (see the small Monte Carlo sketch at the end of this comment). If you mean the moment-cumulants formula here https://en.wikipedia.org/wiki/Cumulant#Joint_cumulants , then (while I understand how to derive every equation of your argument in case the inputs are independent) I'm currently confused about how that's helpful at all, because one then still needs to analyze which terms of each cumulant are 0 (and how the various terms cancel for various choices of the matching pattern of indices), and this seems strictly more complicated than the problem before translating to cumulants, unless I'm missing something obvious.

3) I'm pretty sure this should say x_i^2 instead of x_i x_j, and as far as I can tell the LHS has nothing to do with the RHS:

(I think it should instead say sth like that the loss term is proportional to the squared difference between the true and predictor covariance.)
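Regarding the independence point in 2) above, here's the tiny Monte Carlo illustration promised there (my own sketch, assuming, as I take to be the setting, that the input coordinates are independent and zero-mean): moments with an unmatched index vanish, fully matched ones don't.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((10**6, 4))  # independent, zero-mean coordinates

# terms with an unmatched index vanish in expectation...
print(np.mean(x[:, 0] ** 2 * x[:, 1]))                  # ~0
print(np.mean(x[:, 0] * x[:, 1] * x[:, 2] * x[:, 3]))   # ~0
# ...while fully matched terms do not
print(np.mean(x[:, 0] ** 2 * x[:, 1] ** 2))             # ~1
```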

I think most of the quantitative claims in the current version of the above comment are false/nonsense/[using terms non-standardly]. (Caveat: I only skimmed the original post.)

If by 'cosine similarity' you mean what's usually meant, which I take to be the cosine of the angle between two vectors, then the cosine only depends on the directions of vectors, not their magnitudes. (Some parts of your comment look like you meant to say 'dot product'/'projection' when you said 'cosine similarity', but I don't think making this substitution everywhere makes things make sense overall either.)

For 0.3 in particular, the number of nearly-orthogonal vectors with at least that cosine similarity to a given vector $d$ is actually small. Assuming I calculated correctly, the number of unit vectors which are pairwise nearly orthogonal (say, with pairwise dot products at most $0.01$ in absolute value) and which each have at least that cosine with a given vector is at most 23 (the ambient dimension does not show up in this upper bound). I provide the calculation later in my comment.

This doesn't make sense. For alpha1 to be cos(theta1, d), you can't freely choose the magnitude of noise1.

## How many nearly-orthogonal vectors can you fit in a spherical cap?

**Proposition.** Let $d \in \mathbb{R}^n$ be a unit vector and let $\theta_1, \dots, \theta_m \in \mathbb{R}^n$ also be unit vectors such that they all sorta point in the $d$ direction, i.e., $\theta_i \cdot d \geq \delta$ for a constant $\delta > 0$ (I take you to have taken $\delta = 0.3$), and such that the $\theta_i$ are nearly orthogonal, i.e., $|\theta_i \cdot \theta_j| \leq \epsilon$ for all $i \neq j$, for another constant $\epsilon > 0$. Assume also that $\epsilon < \delta^2$. Then $m \leq \frac{2(1 - \delta^2)}{\delta^2 - \epsilon} + 1$.

**Proof.** We can decompose $\theta_i = \alpha_i d + \sqrt{1 - \alpha_i^2}\, u_i$, with $u_i$ a unit vector orthogonal to $d$; then $\alpha_i \geq \delta$. Given $\epsilon < \delta$, it's a 3d geometry exercise to show that pushing all vectors to the boundary of the spherical cap around $d$ can only decrease each pairwise dot product; doing this gives a new collection of unit vectors $v_i = \delta d + \sqrt{1 - \delta^2}\, u_i$, still with $\epsilon \geq v_i \cdot v_j = \delta^2 + (1 - \delta^2)\, u_i \cdot u_j$. This implies that $u_i \cdot u_j \leq -\frac{\delta^2 - \epsilon}{1 - \delta^2}$. Note that since $\epsilon < \delta^2$, the RHS is some negative constant. Consider $\left( \sum_i u_i \right)^2$. On the one hand, it has to be nonnegative. On the other hand, expanding it, we get that it's at most $m - \binom{m}{2} \frac{\delta^2 - \epsilon}{1 - \delta^2}$. From this, $0 \leq 1 - \frac{m - 1}{2} \cdot \frac{\delta^2 - \epsilon}{1 - \delta^2}$, whence $m \leq \frac{2(1 - \delta^2)}{\delta^2 - \epsilon} + 1$.

(acknowledgements: I learned this from some combination of Dmitry Vaintrob and https://mathoverflow.net/questions/24864/almost-orthogonal-vectors/24887#24887 )

For example, for $\delta = 0.3$ and $\epsilon = 0.01$, this gives $m \leq 23$.

(I believe this upper bound for the number of almost-orthogonal vectors is actually basically exactly met in sufficiently high dimensions — I can probably provide a proof (sketch) if anyone expresses interest.)

**Remark.** If $\epsilon > \delta^2$, then one starts to get exponentially many vectors in the dimension again, as one can see by picking a bunch of random vectors on the boundary of the spherical cap.
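A quick Monte Carlo illustration of the two regimes (my own sketch; rejection sampling on the cap boundary is of course not an optimal construction, so this only illustrates the flavor of the bound rather than its exact value):

```python
import numpy as np

rng = np.random.default_rng(0)

def greedy_cap_packing(n, delta, eps, tries=5000):
    """Greedily collect unit vectors v = delta*d + sqrt(1-delta^2)*u (u random and
    orthogonal to d) whose pairwise dot products all have absolute value <= eps."""
    d = np.zeros(n); d[0] = 1.0
    kept = []
    for _ in range(tries):
        u = rng.standard_normal(n)
        u[0] = 0.0
        u /= np.linalg.norm(u)
        v = delta * d + np.sqrt(1 - delta**2) * u
        if all(abs(v @ w) <= eps for w in kept):
            kept.append(v)
    return len(kept)

n, delta = 10000, 0.3
# eps < delta^2: random pairs on the cap boundary have dot products near delta^2 = 0.09,
# so essentially nothing past the first vector gets accepted (and the proposition caps
# the count at 23 no matter how cleverly the vectors are chosen)
print(greedy_cap_packing(n, delta, eps=0.01))
# eps > delta^2: the bound no longer applies, and one easily collects far more than 23
print(greedy_cap_packing(n, delta, eps=0.10))
```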

## What about the philosophical point? (low-quality section)

Ok, the math seems to have issues, but does the philosophical point stand up to scrutiny? Idk, maybe — I haven't really read the post to check relevant numbers or to extract all the pertinent bits to answer this well. It's possible it goes through with a significantly smaller $\delta$, or if the vectors weren't really that orthogonal, or something. (To give a better answer, the first thing I'd try to understand is whether this behavior is basically first-order — more precisely, is there some reasonable loss function on perturbations of the relevant activation space which captures perturbations being coding perturbations, and are all of these vectors first-order perturbations toward coding in this sense? If the answer is yes, then there just has to be such a vector $d$ — it'd just be the gradient of this loss.)