Steven Byrnes

I'm an AGI safety / AI alignment researcher in Boston with a particular focus on brain algorithms. Research Fellow at Astera. See https://sjbyrnes.com/agi.html for a summary of my research and sorted list of writing. Physicist by training. Email: steven.byrnes@gmail.com. Leave me anonymous feedback here. I’m also at: RSS feed, Twitter, Mastodon, Threads, Bluesky, GitHub, Wikipedia, Physics-StackExchange, LinkedIn

Sequences

Valence
Intro to Brain-Like-AGI Safety

Comments

I don’t think we disagree much if at all.

I think constructing a good theoretical framework is very hard, so people often do other things instead, and I think you’re using the word “legible” to point to some of those other things.

  • I’m emphasizing that those other things are less than completely useless as semi-processed ingredients that can go into the activity of “constructing a good theoretical framework”.
  • You’re emphasizing that those other things are not themselves the activity of “constructing a good theoretical framework”, and thus can distract from that activity, or give people a false sense of how much progress they’re making.

I think those are both true.

The pre-Darwin ecologists were not constructing a good theoretical framework. But they still made Darwin’s job easier, by extracting slightly-deeper patterns for him to explain with his much-deeper theory—concepts like “species” and “tree of life” and “life cycles” and “reproduction” etc. Those concepts were generally described by the wrong underlying gears before Darwin, but they were still contributions, in the sense that they compressed a lot of surface-level observations (Bird A is mating with Bird B, and then Bird B lays eggs, etc.) into a smaller number of things-to-be-explained. I think Darwin would have had a much tougher time if he was starting without the concepts of “finch”, “species”, “parents”, and so on.

By the same token, if we’re gonna use language as a datapoint for building a good underlying theoretical framework for the deep structure of knowledge and ideas, it’s hard to do that if we start from slightly-deep linguistic patterns (e.g. “morphosyntax”, “sister schemas”)… But it’s very much harder still to do that if we start with a mass of unstructured surface-level observations, like particular utterances.

I guess your perspective (based on here) is that, for the kinds of things you’re thinking about, people have not been successful even at the easy task of compressing a lot of surface-level observations into a smaller number of slightly-deeper patterns, let alone successful at the much harder task of coming up with a theoretical framework that can deeply explain those slightly-deeper patterns? And thus you want to wholesale jettison all the previous theorizing? On priors, I think that would be kinda odd. But maybe I’m overstating your radicalism. :)

Thanks!

One thing I would say is: if you have a (correct) theoretical framework, it should straightforwardly illuminate tons of diverse phenomena, but it’s very much harder to go backwards from the “tons of diverse phenomena” to the theoretical framework. E.g. any competent scientist who understands Evolution can apply it to explain patterns in finch beaks, but it took Charles Darwin to look at patterns in finch beaks and come up with the idea of Evolution.

Or in my own case, for example, I spent a day in 2021 looking into schizophrenia, but I didn’t know what to make of it, so I gave up. Then I tried again for a day in 2022, with a better theoretical framework under my belt, and this time I found that it slotted right into my then-current theoretical framework. And at the end of that day, I not only felt like I understood schizophrenia much better, but also my theoretical framework itself came out more enriched and detailed. And I iterated again in 2023, again simultaneously improving my understanding of schizophrenia and enriching my theoretical framework.

Anyway, if the “tons of diverse phenomena” are datapoints, and we’re in the middle of trying to come up with a theoretical framework that can hopefully illuminate all those datapoints, then clearly some of those datapoints are more useful than others (as brainstorming aids for developing the underlying theoretical framework), at any particular point in this process. The “schizophrenia” datapoint was totally unhelpful to me in 2021, but helpful to me in 2022. The “precession of Mercury” datapoint would not have helped Einstein when he was first brainstorming general relativity in 1907, but was presumably moderately helpful when he was thinking through the consequences of his prototype theory a few years later.

The particular phenomena / datapoints that are most useful for brainstorming the underlying theory (privileging the hypothesis), at any given point in the process, need not be the most famous and well-studied phenomena / datapoints. Einstein wrung much more insight out of the random-seeming datapoint “a uniform gravity field seems an awful lot like uniform acceleration” than out of any of the datapoints that would have been salient to a lesser gravity physicist, e.g. Newton’s laws or the shape of the galaxy or the Mercury precession. In my own case, there are random experimental neuroscience results (or everyday observations) that I see as profoundly revealing of deep truths, but which would not be particularly central or important from the perspective of other theoretical neuroscientists.

But, I don’t see why “legible phenomena” datapoints would be systematically worse than other datapoints. (Unless of course you’re also reading and internalizing crappy literature theorizing about those phenomena, and it’s filling your mind with garbage ideas that get in the way of constructing a better theory.) For example, the phenomenon “If I feel cold, then I might walk upstairs and put on a sweater” is “legible”, right? But if someone is in the very early stages of developing a theoretical framework related to goals and motivations, then they sure need to have examples like that in the front of their minds, right? (Or maybe you wouldn't call that example “legible”?)

Can you elaborate on why you think “studying the algorithms involved in grammatically parsing a sentence” is not “a good way to get at the core of how minds work”?

For my part, I’ve read a decent amount of pure linguistics (in addition to neuro-linguistics) over the past few years, and find it to be a fruitful source of intuitions and hypotheses that generalize way beyond language. (But I’m probably asking different questions than you.)

I wonder if you’re thinking of, like, the nuts-and-bolts of syntax of specific languages, whereas I’m thinking of broader / deeper theorizing (random example), maybe?

In Section 1 of this post I make an argument kinda similar to the one you’re attributing to Eliezer. That might or might not help you, I dunno, just wanted to share.

the goal remains to implement CEV or something like it, and optimize the universe according to the resulting utility function

I think you mean “the goal remains to ensure that CEV or something like it is eventually implemented, and the universe is thus optimized according to the resulting utility function”, right? I think Eliezer’s view has always been that we want a CEV-maximizing ASI to be eventually turned on, but if that happens, it wouldn’t matter which human turns it on. And then evidently Eliezer has pivoted over the decades from thinking that this is likeliest to happen if he tries to build such an ASI with his own hands, to no longer thinking that.

Answer by Steven Byrnes

A starting point is self-reports. If I truthfully say “I see my wristwatch”, then, somewhere in the chain of causation that eventually led to me uttering those words, there’s an actual watch, and photons are bouncing off it and entering my eyes then stimulating neurons etc.

So by the same token, if I say “your phenomenal consciousness is a salty yellow substance that smells like bananas and oozes out of your bellybutton”, and then you reply “no it isn’t!”, then let’s talk about how it is that you are so confident about that.

(I’m using “phenomenal consciousness” as an example, but ditto for “my sense of self / identity” or whatever else.)

So here, you uttered a reply (“No it isn’t!”). And we can assume that somewhere in the chain of causation is ‘phenomenal consciousness’ (whatever that is, if anything), and you were somehow introspecting upon it in order to get that information. You can’t know things in any other way—that’s the basic, hopefully-obvious point that I understand Eliezer was trying to make here.

Now, what’s a ‘chain of causation’, in the relevant sense? Let’s start with a passage from Age of Em:

The brain does not just happen to transform input signals into state changes and output signals; this transformation is the primary function of the brain, both to us and to the evolutionary processes that designed brains. The brain is designed to make this signal processing robust and efficient. Because of this, we expect the physical variables (technically, “degrees of freedom”) within the brain that encode signals and signal-relevant states, which transform these signals and states, and which transmit them elsewhere, to be overall rather physically isolated and disconnected from the other far more numerous unrelated physical degrees of freedom and processes in the brain. That is, changes in other aspects of the brain only rarely influence key brain parts that encode mental states and signals.

In other words, if your body temperature had been 0.1° colder, or if you were hanging upside down, or whatever, then the atoms in your brain would be configured differently in all kinds of ways … but you would still say “no it isn’t!” in response to my proposal that maybe your phenomenal consciousness is a salty yellow substance that oozes out of your bellybutton. And you would say it for the exact same reason.

This kind of thinking leads to the more general idea that the brain has inputs (e.g. photoreceptor cells), outputs (e.g. motoneurons … also, fun fact, the brain is a gland!), and algorithms connecting them. Those algorithms describe what Hanson’s “degrees of freedom” are doing from moment to moment, and why, and how. Whenever brains systematically do characteristically-brain-ish things—things like uttering grammatical sentences rather than moving mouth muscles randomly—then the explanation of that systematic pattern lies in the brain’s inputs, outputs, and/or algorithms. Yes, there’s randomness in what brains do, but whenever brains do characteristically-brainy-things reliably (e.g. disbelieve, and verbally deny, that your consciousness is a salty yellow substance that oozes out of your bellybutton), those things are evidently not the result of random fluctuations or whatever, but rather they follow from the properties of the algorithms and/or their inputs and outputs.

That doesn’t quite get us all the way to computationalist theories of consciousness or identity. Why not? Well, here are two ways I can think of to be non-computationalist within physicalism:

  • One could argue that consciousness & sense-of-identity etc. are just confused nonsense reifications of mental models with no referents at all, akin to “pure white” [because white is not pure, it’s a mix of wavelengths]. (Cf. “illusionism”.) I’m very sympathetic to this kind of view. And you could reasonably say “it’s not a computationalist theory of consciousness / identity, but rather a rejection of consciousness / identity altogether!” But I dunno, I think it’s still kinda computationalist in spirit, in the sense that one would presumably instead make the move of choosing to (re)define ‘consciousness’ and ‘sense-of-identity’ in such a way that those words point to things that actually exist at all (which is good), at the expense of being inconsistent with some of our intuitions about what those words are supposed to represent (which is bad). And when you make that move, those terms almost inevitably wind up pointing towards some aspect(s) of brain algorithms.
  • One could argue that we learn about consciousness & sense-of-identity via inputs to the brain algorithm rather than inherent properties of the algorithm itself—basically the idea that “I self-report about my phenomenal consciousness analogously to how I self-report about my wristwatch”, i.e. my brain perceives my consciousness & identity through some kind of sensory input channel, and maybe also my brain controls my consciousness & identity through some kind of motor or other output channel. If you believe something like that, then you could be physicalist but not a computationalist, I think. But I can’t think of any way to flesh out such a theory that’s remotely plausible.

I’m not a philosopher and am probably misusing technical terms in various ways. (If so, I’m open to corrections!)

(Note, I find these kinds of conversations to be very time-consuming and often not go anywhere, so I’ll read replies but am pretty unlikely to comment further. I hope this is helpful at all. I mostly didn’t read the previous conversation, so I’m sorry if I’m missing the point, answering the wrong question, etc.)

I went through and updated my 2022 “Intro to Brain-Like AGI Safety” series. If you already read it, no need to do so again, but in case you’re curious for details, I put changelogs at the bottom of each post. For a shorter summary of major changes, see this twitter thread, which I copy below (without the screenshots & links): 

I’ve learned a few things since writing “Intro to Brain-Like AGI safety” in 2022, so I went through and updated it! Each post has a changelog at the bottom if you’re curious. Most changes were in one of the following categories: (1/7)

REDISTRICTING! As I previously posted ↓, I booted the pallidum out of the “Learning Subsystem”. Now it’s the cortex, striatum, & cerebellum (defined expansively, including amygdala, hippocampus, lateral septum, etc.) (2/7)

LINKS! I wrote 60 posts since first finishing that series. Many of them elaborate and clarify things I hinted at in the series. So I tried to put in links where they seemed helpful. For example, I now link my “Valence” series in a bunch of places. (3/7)

NEUROSCIENCE! I corrected or deleted a bunch of speculative neuro hypotheses that turned out wrong. In some early cases, I can’t even remember wtf I was ever even thinking! Just for fun, here’s the evolution of one of my main diagrams since 2021: (4/7)

EXAMPLES! It never hurts to have more examples! So I added a few more. I also switched the main running example of Post 13 from “envy” to “drive to be liked / admired”, partly because I’m no longer even sure envy is related to social instincts at all (oops) (5/7)

LLMs! … …Just kidding! LLMania has exploded since 2022 but remains basically irrelevant to this series. I hope this series is enjoyed by some of the six remaining AI researchers on Earth who don’t work on LLMs. (I did mention LLMs in a few more places though ↓ ) (6/7)

If you’ve already read the series, no need to do so again, but I want to keep it up-to-date for new readers. Again, see the changelogs at the bottom of each post for details. I’m sure I missed things (and introduced new errors)—let me know if you see any!

This doesn't sound like an argument Yudkowsky would make

Yeah, I can’t immediately find the link but I recall that Eliezer had a tweet in the past few months along the lines of: If ASI wants to tile the universe with one thing, then it wipes out humanity. If ASI wants to tile the universe with sixteen things, then it also wipes out humanity.

My mental-model-of-Yudkowsky would bring up “tiny molecular squiggles” in particular for reasons a bit more analogous to the CoastRunners behavior (video)—if any one part of the motivational system is (what OP calls) decomposable etc., then the ASI would find the “best solution” to maximizing that part. And if numbers matter, then the “best solution” would presumably be many copies of some microscopic thing.

I use rationalist jargon when I judge that the benefits (of pointing to a particular thing) outweigh the costs (of putting off potential readers). And my opinion is that “epistemic status” doesn’t make the cut.

Basically, I think that if you write an “epistemic status” at the top of a blog post, and then delete the two words “epistemic status” while keeping everything else the same, it works just about as well. See for example the top of this post.

(this comment is partly self-plagiarized from here)

Before doing any project or entering any field, you need to catch up on existing intellectual discussion on the subject.

I think this is way too strong. There are only so many hours in a day, and they trade off between

  • (A) “try to understand the work / ideas of previous thinkers” and
  • (B) “just sit down and try to figure out the right answer”.

It’s nuts to assert that the “correct” tradeoff is to do (A) until there is absolutely no (A) left to possibly do, and that only then do you earn the right to start in on (B). People should do (A) and (B) in whatever ratio is most effective for figuring out the right answer. I often do (B), and I assume that I’m probably reinventing a wheel, but it’s not worth my time to go digging for it. And then maybe someone shares relevant prior work in the comments section. That’s awesome! Much appreciated! And nothing went wrong anywhere in this process! See also here.

A weaker statement would be “People in LW/EA commonly err in navigating this tradeoff, by doing too much (B) and not enough (A).” That weaker statement is certainly true in some cases. And the opposite is true in other cases. We can argue about particular examples, I suppose. I imagine that I have different examples in mind than you do.

~~

To be clear, I think your post has large kernels of truth and I’m happy you wrote it.
