I'm an AGI safety / AI alignment researcher in Boston with a particular focus on brain algorithms. Research Fellow at Astera. See https://sjbyrnes.com/agi.html for a summary of my research and sorted list of writing. Physicist by training. Email: steven.byrnes@gmail.com. Leave me anonymous feedback here. I’m also at: RSS feed, X/Twitter, Bluesky, Substack, LinkedIn, and more at my website.
I think your question is kinda too vague to answer. (You’re asking for a comparison of two AI architectures, but what are they? I need more detail. Are we assuming that the two options are equally powerful & competent? If so, is that a good assumption? And is that power level “kinda like LLMs of today”, “superintelligence”, or something in between?)
…But maybe see my post Foom & Doom §2.9.1: If brain-like AGI is so dangerous, shouldn’t we just try to make AGIs via LLMs? for some possibly-related discussion.
(I say this all the time, but I think that [the thing you call “values”] is a closer match to the everyday usage of the word “desires” than the word “values”.)
I think we should distinguish three things: (A) societal norms that you have internalized, (B) societal norms that you have not internalized, (C) desires that you hold independent of [or even despite] societal norms.
For example:
Anyway, the OP says: “our shared concept of Goodness is comprised of whatever messages people spread about what other people should value. … which sure is a different thing from what people do value, when they introspect on what feels yummy.”
I think that’s kinda treating the dichotomy as (B) versus (C), while denying the existence of (A).
If that 12yo girl “introspects on what feels yummy”, her introspection will say “myself wearing a crop-top with giant sweatpants feels yummy”. This obviously has memetic origins but the girl is very deeply enthusiastic about it, and will be insulted if you tell her she only likes that because she’s copying memes.
By the way, this is unrelated to “feeling of deep loving connection”. The 12yo girl does not have a “feeling of deep loving connection” to the tiktok influencers, high schoolers, etc., who have planted the idea in her head that crop-tops and giant sweatpants look super chic and awesome. I think you’re wayyy overstating the importance of “feeling of deep loving connection” for the average person’s “values”, and correspondingly wayyy understating the importance of this kind of norm-following thing. I have a draft post with much more about the norm-following thing, should be out soon :)
Companies actually do this and (if done competently) the implementation costs are an infinitesimal fraction of the benefits. Big companies generally already have all the info in various databases, they just need to do the right spreadsheet calculation. A tiny 1-person company could practically do it with pen and paper, I think. You can ballpark things; it doesn’t have to be perfect to be wildly better than the status quo. One competent person can do it pretty well even for a giant firm with hundreds or thousands of employees (but they should hire my late father’s company instead, it’ll get done much better and faster!)
If there’s some activity that the company does that sucks up tons of money, like as in a substantial fraction of the company’s total annual operating costs, with no tracking of how that money is actually being spent, like it’s just a black hole, and poof the money is gone … then the wrong way to relate to that situation is “gee it would take a lot of work to figure out where all this money is going”, and the right way is “wtf we obviously need to figure out where all this money is going ASAP”. :) I don’t think that situation comes up. Companies already break down how money gets spent in their major cost centers.
Or if it’s a tiny fraction of the company’s total costs, then you can just come up with some ballpark heuristic and it will probably be fine. Like I asked my dad how they divvy up the CEO salary in this kind of system, and he immediately answered with an extremely simple-to-implement heuristic that made a ton of sense. (I won’t share what it is, I think it might be a trade secret.) And no it does not involve tracking exactly how the CEO spends their time.
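To make the “ballpark heuristic” idea concrete, here’s a minimal sketch in Python of one generic way to spread a shared cost across departments, using headcount share as the driver. The driver and all the numbers are my own illustrative assumptions, not the specific heuristic mentioned above.

```python
# Toy sketch: spread a shared cost (say, executive compensation) across
# departments using a simple driver, here headcount share. All numbers and
# the choice of driver are made up for illustration; the point is just that
# a rough proxy is fine when the shared cost is a small slice of total costs.

shared_cost = 2_000_000  # total cost to allocate, $/year (illustrative)

headcount = {            # department -> headcount (illustrative)
    "Manufacturing": 120,
    "Sales": 40,
    "R&D": 30,
    "Admin": 10,
}

total = sum(headcount.values())
allocation = {dept: shared_cost * n / total for dept, n in headcount.items()}

for dept, dollars in allocation.items():
    print(f"{dept}: ${dollars:,.0f}")
```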
Another data point: when I turned my Intro to Brain-Like AGI Safety blog post series into a PDF [via typst—I hired someone to do all the hard work of writing conversion scripts etc.], arXiv rejected it, so I put it on OSF instead. I’m reluctant to speculate on what arXiv didn’t like about it (they didn’t say). Some possibilities are: it seems out-of-place on arXiv in terms of formatting (e.g. single-column, not LaTeX), AND tone (casual, with some funny pictures), AND content (not too math-y, interdisciplinary in a weird way). Probably one or more of those three things. But whatever, OSF seems fine.
I don't see why it should be possible for something which knows the physical state of my brain to be able to efficiently compute the contents of it.
I think you meant "philosophically necessary" where you wrote "possible"? If so, agreed, that's also my take.
If an omniscient observer can extract the contents of a brain by assembling a causal model of it in un-encrypted phase space, why would it struggle to build the same causal model in encrypted phase space?
I don’t understand this part. “Causal model” is easy—if the computer is a Turing machine, then you have a causal model in terms of the head and the tape etc. You want “understanding” not “causal model”, right?
If a superintelligence were to embark on the project of “understanding” a brain, it would be awfully helpful to see the stream of sensory inputs and the motor outputs. Without encryption, you can do that: the environment is observable. Under homomorphic encryption without the key, the environmental simulation, and the brain’s interactions with it, look like random bits just like everything else. Likewise, it would be awfully helpful to be able to notice that the brain is in a similar state at times t₁ versus t₂, and/or the ways that they’re different. But under homomorphic encryption without the key, you can’t do that, I think. See what I mean?
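If it helps, here’s a toy illustration of that last point in Python, using ordinary semantically-secure symmetric encryption (the `cryptography` package’s Fernet), not actual homomorphic encryption: without the key, you can’t even tell that the same state showed up twice, because the two ciphertexts share no visible structure.

```python
# Toy illustration (ordinary symmetric encryption, NOT homomorphic encryption):
# without the key, an observer can't tell that the "brain state" at t1 and t2
# is identical; the two ciphertexts look like unrelated random bits.
from cryptography.fernet import Fernet

key = Fernet.generate_key()
f = Fernet(key)

state_t1 = b"brain state X"
state_t2 = b"brain state X"  # the exact same state, at a later time

c1 = f.encrypt(state_t1)
c2 = f.encrypt(state_t2)

print(c1 == c2)                        # False: ciphertexts look unrelated
print(f.decrypt(c1) == f.decrypt(c2))  # True: only the key-holder sees the match
```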
The headings “behavioral tools” and “transparency tools” both kinda assume that a mysterious AI has fallen out of the sky, and now you have to deal with it, as opposed to either thinking about, or intervening on, how the AI is trained or designed. (See Connor’s comment here.)
(Granted, you do mention “new paradigm”, but you seem to be envisioning that pretty narrowly as a transparency intervention.)
I think that’s an important omission. For example, it seems to leave out making inferences about Bob from the fact that Bob is human. That is informative even if I’ve never met Bob (no behavioral data) and can’t read his mind (no transparency). (Sorry if I’m misunderstanding.)
In one case, a pediatrician in Pennsylvania was getting ready to inoculate a little girl with a vaccine when she suddenly went into violent seizures. Had that pediatrician been working just a little faster, he would have injected that vaccine first. In that case, imagine if the mother had been looking on as her apparently perfectly healthy daughter was injected and then suddenly went into seizures. It would certainly have been understandable—from an emotional standpoint—if that mother was convinced the vaccine caused her daughter’s seizures. Only the accident of timing prevented that particular fallacy in this case. (source)
It’s good to know when you need to “go hard”, and to be able to do so if necessary, and to assess accurately whether it’s necessary. But it often isn’t necessary, and when it isn’t, then it’s really bad to be going hard all the time, for lots of reasons including not having time to mull over the big picture and notice new things. Like how Elon Musk built SpaceX to mitigate x-risk without it ever crossing his mind that interplanetary colonization wouldn’t actually help with x-risk from AI (and then pretty much everything Elon has done about AI x-risk from that point forward made the problem worse not better). See e.g. What should you change in response to an "emergency"? And AI risk, Please don't throw your mind away, Changing the world through slack & hobbies, etc. Oh also, pain is not the unit of effort.
I’ve never seen these abbreviations “mio., bio., trio.” before. I have always only seen M, B, T, e.g. $5B. Is it some regionalism or something?
I want to nitpick this particular point (I think the other arguments you bring up in that section are stronger).
For example, LLaMa 3.1 405B was trained on 15.6 trillion tokens of text data (≈ what a human could get through in 20,000 years of 24/7 reading). I’m not an ML training expert, but intuitively I’m skeptical that this is the kind of regime where we need to be thinking about what is hard versus easy to learn, or about what can be learned quickly versus slowly.
Instead, my guess is that, if [latent model A] is much easier and faster to learn than [latent model B], but [B] gives a slightly lower predictive loss than [A], then 15.6 trillion tokens of pretraining would be WAY more than enough for the model-in-training to initially learn [A] but then switch over to [B].
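(For what it’s worth, here’s the back-of-envelope behind the “20,000 years” parenthetical above; the reading rate is an assumed knob, set at roughly 1,500 tokens per minute so as to land near that figure.)

```python
# Back-of-envelope for the "20,000 years of 24/7 reading" parenthetical above.
# The reading rate is an assumed parameter; ~1,500 tokens/minute lands near
# that figure.
total_tokens = 15.6e12            # LLaMa 3.1 405B pretraining tokens
tokens_per_minute = 1_500         # assumed (fast) reading rate

minutes_per_year = 60 * 24 * 365
years = total_tokens / (tokens_per_minute * minutes_per_year)
print(f"≈ {years:,.0f} years of 24/7 reading")   # ≈ 20,000
```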