Beren Millidge has an essay arguing for the claim that a future in which humanity proceeds with increasing biological capabilities is scarier than a future in which we develop AGI. Here are some concluding sentences from his essay:
Ultimately, human intelligence amplification and the resulting biosingularity has a deeper and more intractable alignment problem than AI alignment, at least if we don’t just assume it away by asserting that humans and our transhuman creations just have some intrinsic and ineffable access to ‘human values’ that potential AIs lack.
The only potential positive as regards alignment of the biosingularity is that it will happen much later, most likely in the closing decades of the 21st century and around the end of the natural lifespans of my personal cohort. This gives significantly more time to prepare than AGI, which is likely coming much sooner, but the problem is much harder and requires huge advances in neuroscience and understanding of brain algorithms to even reach the level of control we have over today’s AI systems (which is likely far from sufficient).
I disagree with his thesis — I think that instead of creating AIs smarter than humans, it would be much better to proceed with increasing the capabilities of humans (for at least the next 100 years). [1] [2] Millidge's claim that the only potential positive of a biofoom is that it starts later seems clearly false. In this note, I will list other imo important (pro tanto) positives of a human foom. I agree with Millidge that there are also positives of the AGI path; [3] these won't be discussed in the present note. To assess which path is better overall, one could want to compare the positives of the human path to the positives of the AGI path, [4] but I will not do that here. The rest of this note is the list of positives. [5]
an argument in favor of the human path:
we can also give a similar argument for single humans:
said another way:
an analogy:
about humans, we know the following:
also:
self-improvement generally has many good properties over creating a new agent/mind from scratch [15]
i think we should ban AGI ↩︎
It is also a possibility that both options are bad. My view is that we should push forward with increasing human capabilities biologically and culturally/educationally. But I think this question deserves serious analysis, and there are certainly specific things here that one should be very careful about and regulate. However, this note will not be analyzing this question. ↩︎
That said, I think the analysis of the positives in his essay gets very many things wrong. ↩︎
But one also doesn't have to do that, to compare the two paths. One can also just "directly" think about what would happen along each path. ↩︎
They are not listed in order of importance. ↩︎
clearly this is a reason for these to be around for longer in wall clock time, but also it's a reason for them to be around until higher capability levels ↩︎
in fact it seems plausible that, at least if the mind design has to be done from your own bounded perspective, it will keep being better to self-improve forever. i think this is plausible on this individual future life coolness axis and also all-things-considered. ↩︎
ok, if you (imo incorrectly) believe in some sort of soul theory of personal identity then maybe to make this a fair example you would need to imagine the soul getting detached in all three examples, but then maybe you will think all of these are suicides... so maybe this isn't a good analogy for you... but hmm i guess maybe you should also in the same sense believe in there being a soul attached to each society though, and then it would be a good analogy actually ↩︎
ok, there's a meaningful amount of shared culture. even if you thought human minds and AI minds are "mostly cultural" and that subbing out representation/learning/etc algorithms/structures and radically changing learning contexts doesn't make it legit to say AGI will be a novel thing created largely in novel ways from scratch, it is still at least a really big step; it's still creating some totally new guy ↩︎
as Millidge says ↩︎
yes, there are sort of examples like climate change. but it is still much easier to imagine entities that are just programs that can be run on arbitrary computers being totally fine with or even preferring a very different environment. there is a large difference in degree here ↩︎
and we at least know humans have a propensity to carry out and be moved by this style of reasoning ↩︎
or three labs or whatever. In reality, I think only a single lab will matter, absent strict capability regulation. ↩︎
there is a major issue around diffuse effects — like, people and institutions currently not tracking [decreasing the lifespan of
this is also important to track when thinking about AIs making more capable AIs ↩︎
Hmmm, some good points. Clearly if I write something too complex this will take way too long and therefore, the cool voice is good.
Okay, yes humans can be cool but all humans cool?
Maybe some governments not cool? How does AI vs Bio affect if one big cool or many small cool? What if like homo deus we get separate Very Smart group of humans and one not so smart?
Human worse thing done worse than LLM worse thing done? Less control over range of expression? Moral mazes lead to psychopaths in control? Maybe non cool humans take control?
Yet, slow process good point. Coolness better chance if longer to remain cool.
Maybe democracy + debate cool? Totalitarianism not cool? Coolness is not group specific for AI or human? Coolness about how cool the decision process is? What does coolness attractor look like?
Cool.
What if like homo deus we get separate Very Smart group of humans and one not so smart?
I agree this would most likely be either somewhat bad or quite bad, probably quite bad (depending on the details), both causally and also indicatorily.
I'll restrict my discussion to reprogenetics, because I think that's the main way we get smarter humans. My responses would be "this seems quite unlikely in the short run (like, a few generations)" and "this seems pretty unlikely in the longer run, at least assuming it's in fact bad" and "there's a lot we can do to make those outcomes less likely (and less bad)".
Why unlikely in the short run? Some main reasons:
What can we do? Without going into detail on how, I'll just basically say "greatly increase equality of access through innovation and education, and research cultures to support those things".
Okay, I think the gradual point is a good one and also that it very much helps our institutions to be able to deal with increased intelligence.
I would be curious what you think about the idea of more permanent economic rifts and also the general economics of gene editing? Might it be smart to make it a public good instead?
Maybe there's something here about IQ already being heritable, and thus the point about a more permanent two-caste society with smart and stupid people being redundant, but somehow the economics of private gene editing over long periods of time still feels a bit off to me?
I would be curious what you think about the idea of more permanent economic rifts and also the general economics of gene editing?
As a matter of science and technology, reprogenetics should be inexpensive. I've analyzed this area quite a bit (though not focused specifically on eventual cost), see https://berkeleygenomics.org/articles/Methods_for_strong_human_germline_engineering.html . My fairly strong guess is that it's perfectly feasible to have strong reprogenetics that's pretty inexpensive (on the order of $5k to $25k for a pretty strongly genomically vectored zygote). From a tech and science perspective, I think I see multiple somewhat-disjunctive ways, each of which is pretty plausibly feasible, and each of which doesn't seem to have any inputs that can't be made inexpensive.
(As a comparison point, IVF is expensive--something like $8k to $20k--but my guess is that this is largely because of things like regulatory restrictions (needing an MD to supervise egg retrieval, even though NPs can do it well), drug price lock-in (the drugs are easy to manufacture, so are available cheaply on gray markets), and simply economic friction/overhang (CNY is cheaper basically by deciding to be cheaper and giving away some concierge-ness). None of this solves things for IVF today; I'm just saying, it's not expensive due to the science and tech costing $20k.)
Assuming that it can be technically inexpensive, that cuts our work out for us: make it be inexpensive, by
Might it be smart to make it a public good instead?
I definitely think that
Is that what you mean? I don't think we can rely on gvt and philanthropic funding to build out a widely-accessible set of clinics / other practical reprogenetics services, so if you meant nationalizing the industry, my guess is no, that would be bad to do.
I meant the basic economics way of defining a public good, not necessarily the distribution mechanism; electricity and water are public goods but they aren't necessarily determined by the government.
I've had the semi-ironic idea of setting up a "genetic lottery" if supply were capped, as it would redistribute things evenly (as long as people sign up evenly, which is not true).
Anyways, cool stuff, happy that someone is on top of this!
Okay, yes humans can be cool but all humans cool?
generally, humans are cool. in fact probably all current humans are intrinsically cool. a few are suffering very badly and say they would rather not exist, and in some cases their lives have been net negative so far. we should try to help these people. some humans are doing bad things to other humans and that's not cool. some humans are sufficiently bad to others that it would have been better if they were never born. such humans should be rehabilitated and/or contained, and conditions should be maintained/created in which this is disincentivized
Coolness is not group specific for AI or human?
not group specific in principle, but human life is pro tanto strongly cooler. but eg a mind uploaded human society would still be cool. continuing human life is very important. deep friendships with aliens should not be ruled out in principle, but should be approached with great caution. any claim that we should already care deeply about the possible lives of some not-specifically-chosen aliens that we might create, that we haven't yet created, and so that we have great reason to create them, is prima facie very unlikely. this universe probably only has negentropy for so many beings (if you try to dovetail all possible lives, you won't even get to running any human for a single step); we should think extremely carefully about which ones we create and befriend
What if like homo deus we get separate Very Smart group of humans and one not so smart?
Human worse thing done worse than LLM worse thing done? Less control over range of expression? Moral mazes lead to psychopaths in control? Maybe non cool humans take control?
i agree these are problems that would need to be handled on the human path
Moral mazes lead to psychopaths in control? Maybe non cool humans take control?
This is a significant worry--but my guess is that having lots more really smart people would make the problem get better in the long run. That stuff is already happening. Figuring out how to avoid it is a very difficult unsolved problem, which is thus likely to be heavily bottlenecked on ideas of various kinds (e.g. ideas for governance, for culture, for technology to implement good journalism, etc etc.).
Hmmm but what if human good not coupled with human wisdom? Maybe more intelligence more power seeking if not carefully implemented?
Probably better than doing the Big AI though.
Hmmm but what if human good not coupled with human wisdom? Maybe more intelligence more power seeking if not carefully implemented?
I think this is just not the case; I'd guess it's slightly the opposite on average, but in any case, I've never heard anyone make an argument for this based on science or statistics. (There could very well be good such arguments, curious to hear!)
Separately, I'd suggest that humanity is bottlenecked on good ideas--including ideas for how to have good values / behave well / accomplish good things / coordinate on good things / support other people in getting more good. A neutral/average-goodness human, but smart, would I think want to contribute to those problems, and be more able to do so.
Tomorrow I'll share a paper I remember seeing on the ability to do motivated reasoning and hold onto false views being higher for higher-IQ people (if it actually is statistically significant).
Also maybe the more important things to improve after a certain IQ might be openness and conscientiousness? Thoughts on that?
I do think that it actually is quite possible to do some gene editing on big 5 and ethics tbh but we just gotta actually do it.
Personality is a more difficult issue because
That said, yeah, I'm in favor of working out how to do it well. E.g. I'm interested in understanding and eventually measuring "wisdom" https://www.lesswrong.com/posts/fzKfzXWEBaENJXDGP/what-is-wisdom-1 .
I would agree that this is a weird incentive issue and that IQ is probably easier and less thorny than personality traits. With that being said here's a fun little thought on alternative ways of looking at intelligence:
Okay but why is IQ a lot more important than "personality"?
IQ is measured as g, based on correlational evidence about your ability to progress in education and work life. This is one frame to have on it. I think it collapses a lot of things about personality into a view that is based on a very specific frame from a psychometric perspective?
Okay, let's look at intelligence from another angle: the predictive processing or RL angle that's more about explore/exploit. How does that show up? How do we increase the intelligence of a predictive processing agent? How do the parameters of when to explore and when to exploit, and the time horizon of future rewards, come into it?
Openness here would be the proclivity to explore and look at new sources of information, whilst conscientiousness is about the time horizon of the discounting factor in reward learning. (Correlatively, but you could probably define new, better measures of this; the Big 5 traits are probably not the true names for these objectives.)
I think it is better for a society to be able to talk to each other and integrate information well, hence I think we should make openness higher from a collective intelligence perspective. I also think it is better if we imagine that we're playing longer-form games with each other, as that generally leads to more cooperative equilibria, and hence I think it would also be good if conscientiousness were higher.
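To make this mapping a bit more concrete, here is one way (my notation; just a sketch of the frame, not a claim about how these traits are actually implemented) to place the two traits in a standard RL picture:

$$\pi(a \mid s) \;\propto\; \exp\!\big(Q(s,a)/\tau\big), \qquad G_t \;=\; \sum_{k \ge 0} \gamma^k \, r_{t+k},$$

where a higher exploration temperature $\tau$ (or a higher $\varepsilon$ in $\varepsilon$-greedy action selection) plays the role being ascribed to openness, and a discount factor $\gamma$ closer to 1, i.e. a longer effective time horizon on the order of $1/(1-\gamma)$, plays the role being ascribed to conscientiousness.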
(The paper I saw didn't replicate btw, so I walk back the intelligence-makes-you-more-ignorant point.)
(Also here's a paper talking about the ability to be creative having a threshold effect around 120 iq with openness mattering more after that, there's a bunch more stuff like this if you search for it.)
(Also here's a paper talking about the ability to be creative having a threshold effect around 120 iq with openness mattering more after that, there's a bunch more stuff like this if you search for it.)
To speculate, it might be the case that effects like this one are at least to some extent due to modern society not being well-adapted to empowering very-high-g[1] people, and instead putting more emphasis on "no one being left behind"[2]. Like, maybe you actually need a proper supportive environment (that is relatively scarce in the modern world) to reap the gains from very high g, in most cases.
(Not confident about the size of the effect (though I'm sure it's at least somewhat true) or about the relevance for the study you're citing, especially after thinking it through a bit after writing this, but I'm leaving it for the sake of expanding the hypothesis space.)
But, if it's not that, then the threshold thing is interesting and weird.
I would hypothesise that it is more about the underlying ability to use the engine that is intelligence. If we use the classic Eliezer definition (I think it is in the Sequences at least) of intelligence as the ability to hit a target, then that is only half of the problem, because you have to choose a problem space as well.
Part of intelligence is probably choosing a good problem space but I think the information sampling and the general knowledge level of the people and institutions and general information sources around you is quite important to that sampling process. Hence if you're better at integrating diverse sources of information then you're likely better at making progress.
Finally, I think there's something about some weird sort of scientific version of frame control, where a lot of science is about asking the right question, and getting exposure to more ways of asking questions leads to better ways of asking questions.
So to use your intelligence you need to wield it well and wielding it well partly involves working on the right questions. But if you're not smart enough to solve the questions in the first place it doesn't really matter if you ask the right question.
Having found myself repeating the same points/claims in various conversations about what NN learning is like (especially around singular learning theory), I figured it's worth writing some of them down. My typical confidence in a claim below is like 95%[1]. I'm not claiming anything here is significantly novel. The claims/points:
(* but it looks to me like learning theory is unfortunately hard to make relevant to ai alignment[9])
these thoughts are sorta joint with Jake Mendel and Dmitry Vaintrob (though i'm making no claim about whether they'd endorse the claims). also thank u for discussions: Sam Eisenstat, Clem von Stengel, Lucius Bushnaq, Zach Furman, Alexander Gietelink Oldenziel, Kirke Joamets
with the important caveat that, especially for claims involving 'circuits'/'structures', I think it's plausible they are made in a frame which will soon be superseded or at least significantly improved/clarified/better-articulated, so it's a 95% given a frame which is probably silly ↩︎
train loss in very overparametrized cases is an exception. in this case it might be interesting to note that optima will also be off at infinity if you're using cross-entropy loss, https://arxiv.org/pdf/2006.06657 ↩︎
also, gradient descent is very far from doing optimal learning in some solomonoff sense — though it can be fruitful to try to draw analogies between the two — and it is also very far from being the best possible practical learning algorithm ↩︎
by it being a law of large numbers phenomenon, i mean sth like: there are a bunch of structures/circuits/pattern-completers that could be learned, and each one gets learned with a certain probability (or maybe a roughly given total number of these structures gets learned), and loss is roughly some aggregation of indicators for whether each structure gets learned — an aggregation to which the law of large numbers applies ↩︎
to say more: any concept/thinking-structure in general has to be invented somehow — there in some sense has to be a 'sensible path' to that concept — but any local learning process is much more limited than that still — now we're forced to have a path in some (naively seen) space of possible concepts/thinking-structures, which is a major restriction. eg you might find the right definition in mathematics by looking for a thing satisfying certain constraints (eg you might want the definition to fit into theorems characterizing something you want to characterize), and many such definitions will not be findable by doing sth like gradient descent on definitions ↩︎
ok, (given an architecture and a loss,) technically each point in the loss landscape will in fact have a different local neighborhood, so in some sense we know that the probability of getting to a point is a function of its neighborhood alone, but what i'm claiming is that it is not nicely/usefully a function of its neighborhood alone. to the extent that stuff about this probability can be nicely deduced from some aspect of the neighborhood, that's probably 'logically downstream' of that aspect of the neighborhood implying something about nice paths to the point. ↩︎
also note that the points one ends up at in LLM training are not local minima — LLMs aren't trained to convergence ↩︎
i think identifying and very clearly understanding any toy example where this shows up would plausibly be better than anything else published in interp this year. the leap complexity paper does something a bit like this but doesn't really do this ↩︎
i feel like i should clarify here though that i think basically all existing alignment research fails to relate much to ai alignment. but then i feel like i should further clarify that i think each particular thing sucks at relating to alignment after having thought about how that particular thing could help, not (directly) from some general vague sense of pessimism. i should also say that if i didn't think interp sucked at relating to alignment, i'd think learning theory sucks less at relating to alignment (ie, not less than interp but less than i currently think it does). but then i feel like i should further say that fortunately you can just think about whether learning theory relates to alignment directly yourself :) ↩︎
Simon-Pepin Lehalleur weighs in on the DevInterp Discord:
I think his overall position requires taking degeneracies seriously: he seems to be claiming that there is a lot of path dependency in weight space, but very little in function space 😄
In general his position seems broadly compatible with DevInterp:
This perspective certainly is incompatible with a naive SGD = Bayes = Watanabe's global SLT learning process, but I don't think anyone has (ever? for a long time?) made that claim for non-toy models.
It seems that the difference with DevInterp is that
Ok, no problem. You can also add the following:
I am sympathetic but also unsatisfied with a strong empiricist position about deep learning. It seems to me that it is based on a slightly misapplied physical, and specifically thermodynamical, intuition: namely, that we can just observe a neural network and see/easily guess what the relevant "thermodynamic variables" of the system are.
For ordinary 3d physical systems, we tend to know or easily discover those thermodynamic variables through simple interactions/observations. But a neural network is an extremely high-dimensional system which we can only "observe" through mathematical tools. The loss is clearly one such thermodynamic variable, but if we expect NNs to be in some sense stat mech systems, it can't be the only one (otherwise the learning process would be much more chaotic and unpredictable). One view of DevInterp is that we are "just" looking for those missing variables...
I'd be curious about hearing your intuition re " i'm further guessing that most structures basically have 'one way' to descend into them"
There are people who think (imo correctly) that there will be at least one vastly superhuman AI in the next 100 years by default and (imo incorrectly) that proceeding along the AI path does not lead to human extinction or disempowerment by default. My anecdotal impression is that a significant fraction (maybe most) of such people think (imo incorrectly) that letting Anthropic/Claude do recursive self-improvement and be a forever-sovereign would probably go really well for humanity. The point of this note is to make the following proposal and request: if you ever let an AI self-improve, or more generally if you have AIs creating successor AIs, or even more generally if you let the AI world develop and outpace humans in some other way, or if you try to run some process where boxed AIs are supposed to create an initial ASI sovereign, or if you try to have AIs "solve alignment" [1] (in one of the ways already listed, or in some other way), or if you are an AI (or human mind upload) involved in some such scheme, [2] try to make it so the following property is upheld:
I think this is probably a bad term that should be deprecated ↩︎
well, at least if the year is and we're not dealing with a foom of extremely philosophically competent and careful mind uploads or whatever, firstly, you shouldn't be running a foom (except for the grand human foom we're already in). secondly, please think more. thirdly, please try to shut down all other AGI attempts and also your lab and maybe yourself, idk in which order. but fourthly, ... ↩︎
This will plausibly require staying ahead of humanity in capabilities in this galaxy forever, so this will be extremely capable AI. So, when I say the galaxy is AGI-free, I don't mean that artificial generally intelligent systems are not present in the galaxy. I mean that these AIs are supposed to have no involvement in human life except for enforcing an AI ban. ↩︎
or like at least "their values" ↩︎
and assuming we aren't currently massively overestimating the amount of resources accessible to Earth-originating creatures ↩︎
or maybe we do some joint control thing about which this is technically false but about which it is still pretty fair to say that each person got more of a say than if they merely controlled of all the resources ↩︎
an intuition pump: as an individual human, it seems possible to keep carefully developing for a long time without accidentally killing oneself; we just need to make society have analogues of whatever properties/structures make this possible in an individual human ↩︎
Btw, a pro tip for weathering the storm of crazymessactivitythoughtdevelopmenthistory: be the (generator of the) storm. I.e., continue acting and thinking and developing as humanity. Also, pulling ourselves up by our own bootstraps is based imo. Wanting to have a mommy AI think for us is pretty cringe imo. ↩︎
Among currently accessible RSI processes, there is one exception: it is in fact fine to have normal human development continue. ↩︎
Ok, really humans (should) probably importantly have lives and values together, so it would be more correct to say: there is a particular infinite contribution to human life/valuing waiting to grow out of each person. Or: when a person is lost, an important aspect of God is lost. But the simpler picture is fine for making my current point. ↩︎
I'm imagining humanity fracturing into a million or billion different galaxies depending upon their exact level of desire for interacting with AI. I think the human value of the unity of humanity would be lost.
I think we need to buffer people from having to interact with AI if they don't want to. But I value having other humans around. So some thing in between everyone living in their perfect isolation and everyone being dragged kicking and screaming into the future is where I think we should aim.
Here's a question that came up in a discussion about what kind of future we should steer toward:
a couple points in response:
I guess one could imagine a future in which someone tiles the world with happy humans of the current year variety or something, but imo this is highly unlikely even conditional on the future being human-shaped, and also much worse than futures in which a wild variety of galaxy-human stuff is going on. Background context: imo we should probably be continuously growing more capable/intelligent ourselves for a very long time (and maybe forever), with the future being determined by us "from inside human life", as opposed to ever making an artificial system that is more capable than humanity and fairly separate/distinct from humanity that would "design human affairs from the outside" (really, I think we shouldn't be making [AIs more generally capable than individual humans] of any kind, except for ones that just are smarter versions of individual humans, for a long time (and maybe forever); see this for some of my thoughts on these topics). ↩︎
maybe we should pick a longer time here, to be comparing things which are more alike? ↩︎
I think this is probably true even if we condition the rollout on you coming to understand the world in the videos quite well. ↩︎
But if you disagree here, then I think I've already finished [the argument that the human far future is profoundly better] which I want to give to you, so you could stop reading here — the rest of this note just addresses a supposed complication you don't believe exists. ↩︎
much like you could grow up from a kid into a mathematician or a philosopher or an engineer or a composer, thinking in each case that the other paths would have been much worse ↩︎
Unlike you growing up in isolation, that galaxy-you's activities and judgment and growth path will be influenced by others; maybe it has even merged with others quite fully. But that's probably how things should be, anyway — we probably should grow up together; our ordinary valuing is already done together to a significant extent (like, for almost all individuals, the process determining (say) the actions of that individual already importantly involves various other individuals, and not just in a way that can easily be seen as non-ethical). ↩︎
There might be some stuff that's really difficult to make sense of here — it is imo plausible that the ethical cognition that a certain kind of all-seeing spacetime-block-chooser would need to have to make good choices is quite unlike any ethical cognition that exists (or maybe even could exist) in our universe. That said, we can imagine a more mundane spacetime-block-chooser, like a clone of you that gets to make a single life choice for you given ordinary information about the decision and that gets deleted after that; it is easier to imagine this clone having ethical cognition that leads to it making reasonably good decisions. ↩︎
I will be appropriating terminology from the Waluigi post. I hereby put forward the hypothesis that virtue ethics endorses an action iff it is what the better one of Luigi and Waluigi would do, where Luigi and Waluigi are the ones given by the posterior semiotic measure in the given situation, and "better" is defined according to what some [possibly vaguely specified] consequentialist theory thinks about the long-term expected effects of this particular Luigi vs the long-term effects of this particular Waluigi. One intuition here is that a vague specification could be more fine if we are not optimizing for it very hard, instead just obtaining a small amount of information from it per decision.
In this sense, virtue ethics literally equals continuously choosing actions as if coming from a good character. Furthermore, considering the new posterior semiotic measure after a decision, in this sense, virtue ethics is about cultivating a virtuous character in oneself. Virtue ethics is about rising to the occasion (i.e. the situation, the context). It's about constantly choosing the Luigi in oneself over the Waluigi in oneself (or maybe the Waluigi over the Luigi if we define "Luigi" as the more likely of the two and one has previously acted badly in similar cases or if the posterior semiotic measure is otherwise malign). I currently find this very funny, and, if even approximately correct, also quite cool.
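One way to write the hypothesis down a bit more explicitly (my notation; the ingredients are only vaguely specified, so this is a sketch rather than a precise definition):

$$\text{virtue ethics endorses } a \text{ in situation } s \;\iff\; a = \pi_{C^*}(s), \qquad C^* = \operatorname*{arg\,max}_{C \,\in\, \{\mathrm{Luigi}(\mu_s),\, \mathrm{Waluigi}(\mu_s)\}} V(C),$$

where $\mu_s$ is the posterior semiotic measure in situation $s$, $\mathrm{Luigi}(\mu_s)$ and $\mathrm{Waluigi}(\mu_s)$ are the two characters it picks out, $\pi_C(s)$ is the action character $C$ would take in $s$, and $V$ is the [possibly vaguely specified] consequentialist evaluation of the long-term expected effects of that character.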
Here are some issues/considerations/questions that I intend to think more about:
I designed a pro-human(ity)/anti-(non-human-)AI flag:
Feel free to suggest improvements to the flag. Here's latex to generate it:
% written mostly by o3 and o4-mini-high, given k's prompting
% an anti-AI flag. a HAL "eye" (?) is covered by a vitruvian man star
\documentclass[tikz]{standalone}
\usetikzlibrary{calc}
\usepackage{xcolor} % for \definecolor
\definecolor{UNBlue}{HTML}{5B92E5}
\begin{document}
\begin{tikzpicture}
%--------------------------------------------------------
% flag geometry
%--------------------------------------------------------
\def\flagW{6cm} % width -> 2 : 3 aspect
\def\flagH{4cm} % height
\def\eyeR {1.3cm} % HAL-eye radius
% light-blue background
\fill[UNBlue] (0,0) rectangle (\flagW,\flagH);
%--------------------------------------------------------
% concentric “HAL eye” (outer-most ring first)
%--------------------------------------------------------
\begin{scope}[shift={(\flagW/2,\flagH/2)}] % centre of the flag
\foreach \f/\c in {%
1.00/black,
.68/{red!50!black},
.43/{red!80!orange},
.1/orange,
.05/yellow}%
{%
\fill[fill=\c,draw=none] circle ({\f*\eyeR});
}
%── parameters ───────────────────────────────────────
\def\R{\eyeR} % distance from centre to triangle’s tip
\def\Alpha{10} % full apex angle (°)
%── compute half-angle & half-base once ─────────────
\pgfmathsetmacro\halfA{\Alpha/2}
\pgfmathsetlengthmacro\halfside{\R*tan(\halfA)}
%── loop over Vitruvian‐man angles ───────────────────
\foreach \Beta in {0,30,90,150,180,240,265,275,300} {%
% apex on the eye‐rim
\coordinate (A) at (\Beta:\R);
% base corners offset ±90°
\coordinate (B) at (\Beta+90:\halfside);
\coordinate (C) at (\Beta-90:\halfside);
% fill the spike
\path[fill=white,draw=none] (A) -- (B) -- (C) -- cycle;
}
\end{scope}
\end{tikzpicture}
\end{document}
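(If it's useful: since this uses the standalone document class, compiling the file with e.g. pdflatex should produce a single page cropped to the flag itself, which can then be converted to png or similar.)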
I like the concept; on the other hand the flag feels strongly fascist to me.
Ran it by the AIs and 2 out of 3 had "authoritarian" as their first descriptor responding to "What political alignment does the aesthetic of this flag evoke?" FWIW.
Hmm, thanks for telling me, I hadn't considered that. I think I didn't notice this in part because I've been thinking of the red-black circle as being "canceled out"/"negated" on the flag, as opposed to being "asserted". But this certainly wouldn't be obvious to someone just seeing the flag.
Kaarel:
TK:
K:
TK:
K:
TK:
K:
TK:
K:
TK:
so this version of entanglement with action is really a very weak criterion
Yeah, exactly, and hence the question: what are some counterexamples, ~concepts that clearly are not tied to action in any way? E.g., I could imagine metaphysical philosophizing to connect to action via contributing to a line of thinking that eventually produces a useful insight on how to do science or something. Is it about "being/remaining open to using it in new ways"?
I think I want to expand my notion of "tautological statements" to include statements like "In the HPMoR universe, X happens". You can also pick any empirical truth "X" and turn it into a tautological one by saying "In our universe, X". Though I agree it seems a bit weird.
I'm inclined to think that your generalized tautological statements are about something like "playing games according to ~rules in (~confined to) some mind-like system". This is in contrast to (canonically) empirical statements that involve throwing a referential bridge across the boundary of the system.
- I think sth is not meaningful if there's no connection between a belief to your main belief pool. So "a puffy is a flippo" is perhaps not meaningful to you because those concepts don't relate to anything else you know? (But that's a different kind of meaningful from what errors people mostly make.)
K:
- yea. tho then we could involve more sentences about puffies and flippos and start playing some game involving saying/thinking those sentences and then that could be fun/useful/whatever
[Thinking out loud.]
Intuitively, it does seem to me that if you start with a small set of elements isolated from the rest of your understanding, then they are meaningless, but then, as you grow this set of elements and add more relations/functions/rules/propositions with high implicative potential, this network becomes increasingly meaningful, even though it's completely disconnected from the rest of understanding and our lives except for playing this domain/subnetwork-specific game.
Is it (/does it seem) meaningful just because I could throw a bridge between it and the rest of my understanding? Well, one could build a computer with this game installed only (+ ofc bare minimum to make it work: OS and stuff) and I would still be inclined to think it meaningful, although perhaps I would be imposing, and the meaningfulness would be co-created by the eye/mind of the beholder.
This leads to the question: What criteria do we want our (explicated) notion of meaningfulness to satisfy?
[For completeness, the concept of meaningfulness may need to be splintered or even eliminated (/factored out in a way that doesn't leave anything clearly serving its role), though I think the latter rather unlikely.]
I think the world would probably be much better if everyone made a bunch more of their notes public. I intend to occasionally copy some personal notes on ML(?) papers into this thread. While I hope that the notes which I'll end up selecting for being posted here will be of interest to some people, and that people will sometimes comment with their thoughts on the same paper and on my thoughts (please do tell me how I'm wrong, etc.), I expect that the notes here will not be significantly more polished than typical notes I write for myself and my reasoning will be suboptimal; also, I expect most of these notes won't really make sense unless you're also familiar with the paper — the notes will typically be companions to the paper, not substitutes.
I expect I'll sometimes be meaner than some norm somewhere in these notes (in fact, I expect I'll sometimes be simultaneously mean and wrong/confused — exciting!), but I should just say to clarify that I think almost all ML papers/posts/notes are trash, so me being mean to a particular paper might not be evidence that I think it's worse than some average. If anything, the papers I post notes about had something worth thinking/writing about at all, which seems like a good thing! In particular, they probably contained at least one interesting idea!
So, anyway: I'm warning you that the notes in this thread will be messy and not self-contained, and telling you that reading them might not be a good use of your time :)
@misc{radhakrishnan2023mechanism,
  title = {Mechanism of feature learning in deep fully connected networks and kernel machines that recursively learn features},
  author = {Adityanarayanan Radhakrishnan and Daniel Beaglehole and Parthe Pandit and Mikhail Belkin},
  year = {2023},
  url = {https://arxiv.org/pdf/2212.13881.pdf}
}
Let $x_{\ell,i}$ denote the activation vector in layer $\ell$ on input $x_i$, with the input layer being at index $0$, so $x_{0,i} = x_i$. Let $W_\ell$ be the weight matrix after activation layer $\ell$. Let $f_\ell$ be the function that maps from the $\ell$th activation layer to the output. Then their Deep Neural Feature Ansatz says that $$W_\ell^\top W_\ell \;\propto\; \frac{1}{n}\sum_{i=1}^{n} \nabla f_\ell(x_{\ell,i})\, \nabla f_\ell(x_{\ell,i})^\top$$ (I'm somewhat confused here about them not mentioning the loss function at all — are they claiming this is reasonable for any reasonable loss function? Maybe just MSE? MSE seems to be the only loss function mentioned in the paper; I think they leave the loss unspecified in a bunch of places though.)
Letting $W_\ell = U_\ell \Sigma_\ell V_\ell^\top$ be an SVD of $W_\ell$, we note that this is equivalent to $$V_\ell \Sigma_\ell^2 V_\ell^\top \;\propto\; \frac{1}{n}\sum_{i=1}^{n} \nabla f_\ell(x_{\ell,i})\, \nabla f_\ell(x_{\ell,i})^\top,$$ i.e., that the eigenvectors of the matrix on the RHS are the right singular vectors $V_\ell$. By the variational characterization of eigenvectors and eigenvalues (Courant-Fischer or whatever), this is the same as saying that the right singular vectors of $W_\ell$ are the highest orthonormal directions for the matrix on the RHS. Plugging in the definition of the matrix on the RHS, this is equivalent to saying that the right singular vectors are the sequence of highest-variance directions of the data set of gradients $\{\nabla f_\ell(x_{\ell,i})\}_{i=1}^{n}$.
(I have assumed here that the linearity is precise, whereas really it is approximate. It's probably true though that with some assumptions, the approximate initial statement implies an approximate conclusion too? Getting approx the same vecs out probably requires some assumption about gaps in singular values being big enough, because the vecs are unstable around equality. But if we're happy getting a sequence of orthogonal vectors that gets variances which are nearly optimal, we should also be fine without this kind of assumption. (This is guessing atm.))
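Relatedly, here is a minimal sketch (mine, not from the paper; PyTorch, with made-up hyperparameters) of how one might eyeball how well the ansatz holds on a toy MLP, by comparing $W_\ell^\top W_\ell$ to the empirical average gradient outer product at each layer:

```python
# Sketch: compare W_l^T W_l to the average gradient outer product (AGOP) of the
# map from layer-l activations to the (scalar) output, on a toy regression task.
import torch
import torch.nn as nn

torch.manual_seed(0)

n, d = 2048, 20
X = torch.randn(n, d)
y = (X[:, 0] * X[:, 1] + X[:, 2]).unsqueeze(1)  # a simple low-dimensional target

model = nn.Sequential(
    nn.Linear(d, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(2000):
    opt.zero_grad()
    ((model(X) - y) ** 2).mean().backward()
    opt.step()

def agop(model, X, idx):
    """AGOP of the output wrt the activations entering module `idx` of the Sequential."""
    h = X
    for m in list(model)[:idx]:
        h = m(h)
    h = h.detach().requires_grad_(True)
    out = h
    for m in list(model)[idx:]:
        out = m(out)
    # the output is scalar per sample, so summing gives per-sample gradients wrt h
    g = torch.autograd.grad(out.sum(), h)[0]
    return g.T @ g / len(X)

for idx in [0, 2, 4]:  # indices of the nn.Linear modules
    W = list(model)[idx].weight.detach()
    nfm = W.T @ W  # "neural feature matrix"
    A = agop(model, X, idx)
    cos = (nfm * A).sum() / (nfm.norm() * A.norm())  # Frobenius cosine similarity
    print(f"linear module {idx}: cos(W^T W, AGOP) = {cos.item():.3f}")
```

If the ansatz holds reasonably well here, the printed cosine similarities should be close to 1; since the ansatz is only a proportionality claim, comparing directions rather than scales seems like the natural check.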
Assuming there isn't an off-by-one error in the paper, we can pull some term out of the RHS maybe? This is because applying the chain rule to the Jacobians of the transitions gives , so
Wait, so the claim is just which, assuming is invertible, should be the same as . But also, they claim that it is ? Are they secretly approximating everything with identity matrices?? This doesn't seem to be the case from their Figure 2 though.
Oh oops I guess I forgot about activation functions here! There should be extra diagonal terms for jacobians of preactivations->activations in , i.e., it should really say We now instead get This should be the same as which, with denoting preactivations in layer and denoting the function from these preactivations to the output, is the same as This last thing also totally works with activation functions other than ReLU — one can get this directly from the Jacobian calculation. I made the ReLU assumption earlier because I thought for a bit that one can get something further in that case; I no longer think this, but I won't go back and clean up the presentation atm.
Anyway, a takeaway is that the Deep Neural Feature Ansatz is equivalent to the (imo cleaner) ansatz that the set of gradients of the output wrt the pre-activations of any layer is close to being a tight frame (in other words, the gradients are in isotropic position; in other words still, the data matrix of the gradients is a constant times a semi-orthogonal matrix). (Note that the closeness one immediately gets isn't in to a tight frame, it's just in the quantity defining the tightness of a frame, but I'd guess that if it matters, one can also conclude some kind of closeness in from this (related).) This seems like a nicer fundamental condition because (1) we've intuitively canceled terms and (2) it now looks like a generic-ish condition, looks less mysterious, though idk how to argue for this beyond some handwaving about genericness, about other stuff being independent, sth like that.
proof of the tight frame claim from the previous condition: Note that clearly implies that the mass in any direction is the same, but also the mass being the same in any direction implies the above (because then, letting the SVD of the matrix with these gradients in its columns be , the above is , where we used the fact that ).
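For reference, the tight-frame condition being invoked above, in my own notation (with $g_i$ the gradient of the output wrt the layer-$\ell$ pre-activations on input $i$):

$$\sum_{i=1}^{n} g_i g_i^\top = c\, I \quad\Longleftrightarrow\quad G G^\top = c\, I \quad\text{for } G = (g_1 \,\cdots\, g_n),$$

i.e. $G/\sqrt{c}$ has orthonormal rows, which is the sense in which the data matrix of the gradients is a constant times a semi-orthogonal matrix; equivalently, $\sum_i \langle g_i, v \rangle^2 = c\,\lVert v \rVert^2$ for every direction $v$, which is the "same mass in every direction" property used in the proof above.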
indexing error in the first displaymath in Sec 2: it probably should say '', not ''
Suppose we are in a world where most top AI capabilities organizations are refraining from publishing their work (this could be the case because of safety concerns, or because of profit motives) + have strong infosec which prevents them from leaking insights about capabilities in other ways. In this world, it seems sort of plausible that the union of the capabilities insights of people at top labs would allow one to train significantly more capable models than the insights possessed by any single lab alone would allow one to train. In such a world, if the labs decide to cooperate once AGI is nigh, this could lead to a significantly faster increase in capabilities than one might have expected otherwise.
(I doubt this is a novel thought. I did not perform an extensive search of the AI strategy/governance literature before writing this.)
I'm updating my estimate of the return on investment into culture wars from being an epsilon fraction compared to canonical EA cause areas to epsilon+delta. This has to do with cases where AI locks in current values extrapolated "correctly" except with too much weight put on the practical (as opposed to the abstract) layer of current preferences. What follows is a somewhat more detailed status report on this change.
For me (and I'd guess for a large fraction of autistic altruistic multipliers), the general feels regarding [being a culture war combatant in one's professional capacity] seem to be that while the questions fought over have some importance, the welfare-produced-per-hour-worked from doing direct work is at least an order of magnitude smaller than the same quantity for any canonical cause area (also true for welfare/USD). I'm fairly certain one can reach this conclusion from direct object-level estimates, as I imagine e.g. OpenPhil has done, although I admit I haven't carried out such calculations with much care myself. Considering the incentives of various people involved should also support this being a lower welfare-per-hour-worked cause area (whether an argument along these lines gives substantive support to the conclusion that there is an order-of-magnitude difference appears less clear).
So anyway, until today part of my vague cloud of justification for these feels was that "and anyway, it's fine if this culture war stuff is fixed in 30 years, after we have dealt with surviving AGI". The small realization I had today was that maybe a significant fraction of the surviving worlds are those where something like corrigibility wasn't attainable but AI value extrapolation sort of worked out fine, i.e. with the values that got locked in being sort of fine, but the relative weights of object-level intuitions/preferences were kinda high compared to the weight on simplicity/[meta-level intuitions], like in particular maybe the AI training did some Bayesian-ethics-evidential-double-counting of object-level intuitions about 10^10 similar cases (I realize it's quite possible that this last clause won't make sense to many readers, but unfortunately I won't provide an explanation here; I intend to write about a few ideas on this picture of Bayesian ethics at some later time, but I want to read Beckstead's thesis first, which I haven't done yet; anyway the best I can offer is that I estimate a 75% chance of you understanding the rough idea I have in mind (which does not necessarily imply that the idea can actually be unfolded into a detailed picture that makes sense), conditional on understanding my writing in general and conditional on not having understood this clause yet, after reading Beckstead's thesis; also: woke: Bayesian ethics, bespoke: INFRABAYESIAN ETHICS, am I right folks).
So anyway, finally getting to the point of all this at the end of the tunnel, in such worlds we actually can't fix this stuff later on, because all the current opinions on culture war issues got locked in.
(One could argue that we can anyway be quite sure that this consideration matters little, because most expected value is not in such kinda-okay worlds, because even if these were 99 percent of the surviving worlds, assuming fun theory makes sense or simulated value-bearing minds are possible, there will be amazingly more value in each world where AGI worked out really well, as compared to a world tiled with Earth society 2030. But then again, this counterargument could be iffy to some, in sort of the same way in which fanaticism (in Bostrom's sense) or the St. Petersburg paradox feel iffy to some, or perhaps in another way. I won't be taking a further position on this at the moment.)
I proposed a method for detecting cheating in chess; cross-posting it here in the hopes of maybe getting better feedback than on reddit: https://www.reddit.com/r/chess/comments/xrs31z/a_proposal_for_an_experiment_well_data_analysis/