Steven Byrnes

I'm an AGI safety / AI alignment researcher in Boston with a particular focus on brain algorithms. Research Fellow at Astera. See https://sjbyrnes.com/agi.html for a summary of my research and sorted list of writing. Physicist by training. Email: steven.byrnes@gmail.com. Leave me anonymous feedback here. I’m also at: RSS feed, X/Twitter, Bluesky, Substack, LinkedIn, and more at my website.

Sequences: Intuitive Self-Models · Valence · Intro to Brain-Like-AGI Safety

Comments (sorted by newest)
Your posts should be on arXiv
Steven Byrnes · 2h

Another data point: when I turned the Intro to Brain-Like-AGI Safety blog post series into a PDF [via typst—I hired someone to do all the hard work of writing conversion scripts etc.], arXiv rejected it, so I put it on OSF instead. I’m reluctant to speculate on what arXiv didn’t like about it (they didn’t say). Some possibilities are: it seems out-of-place on arXiv in terms of formatting (e.g. single-column, not LaTeX), AND tone (casual, with some funny pictures), AND content (not too math-y, interdisciplinary in a weird way). Probably one or more of those three things. But whatever, OSF seems fine.

Homomorphically encrypted consciousness and its implications
Steven Byrnes · 8d

I don't see why it should be possible for something which knows the physical state of my brain to be able to efficiently compute the contents of it.

I think you meant "philosophically necessary" where you wrote "possible"? If so, agreed, that's also my take.

If an omniscient observer can extract the contents of a brain by assembling a causal model of it in un-encrypted phase space, why would it struggle to build the same causal model in encrypted phase space?

I don’t understand this part. “Causal model” is easy—if the computer is a Turing machine, then you have a causal model in terms of the head and the tape etc. You want “understanding” not “causal model”, right?

If a superintelligence were to embark on the project of “understanding” a brain, it would be awfully helpful to see the stream of sensory inputs and the motor outputs. Without encryption, you can do that: the environment is observable. Under homomorphic encryption without the key, the environmental simulation, and the brain’s interactions with it, look like random bits just like everything else. Likewise, it would be awfully helpful to be able to notice that the brain is in a similar state at times t₁ versus t₂, and/or the ways that they’re different. But under homomorphic encryption without the key, you can’t do that, I think. See what I mean?
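To make the “looks like random bits” point concrete, here is a minimal Python sketch, using Paillier encryption as a stand-in (it’s only additively homomorphic, not fully homomorphic, and the toy key size and the “brain state” number are made up for illustration). Without the secret key, you can’t even tell that two ciphertexts encode the same state, although you can still compute on them:

```python
# Minimal sketch, not a real FHE scheme: toy Paillier encryption (additively
# homomorphic), with insecure made-up parameters, just to illustrate that
# ciphertexts of the same "brain state" look unrelated without the secret key.
import math, random

p, q = 293, 433                  # toy primes; real keys are thousands of bits
n = p * q
n2 = n * n
g = n + 1                        # standard generator choice
lam = math.lcm(p - 1, q - 1)     # secret key
mu = pow(lam, -1, n)             # works because L(g^lam mod n^2) = lam when g = n+1

def encrypt(m):
    while True:
        r = random.randrange(1, n)           # fresh randomness on every call
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (((pow(c, lam, n2) - 1) // n) * mu) % n

state_t1 = 42                    # stand-in for the brain's state at time t1
state_t2 = 42                    # the same state again at time t2

c1, c2 = encrypt(state_t1), encrypt(state_t2)
print(c1 == c2)                  # False: same state, unrelated-looking ciphertexts
print(decrypt(c1), decrypt(c2))  # 42 42, but only with the secret key
print(decrypt((c1 * c2) % n2))   # 84: computing on ciphertexts still works
```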

Video and transcript of talk on giving AIs safe motivations
Steven Byrnes · 10d

The headings “behavioral tools” and “transparency tools” both kinda assume that a mysterious AI has fallen out of the sky, and now you have to deal with it, as opposed to either thinking about, or intervening on, how the AI is trained or designed. (See Connor’s comment here.)

(Granted, you do mention “new paradigm”, but you seem to be envisioning that pretty narrowly as a transparency intervention.)

I think that’s an important omission. For example, it seems to leave out making inferences about Bob from the fact that Bob is human. That’s informative even if I’ve never met Bob (no behavioral data) and can’t read his mind (no transparency). (Sorry if I’m misunderstanding.)

Adele Lopez's Shortform
Steven Byrnes · 10d

In one case, a pediatrician in Pennsylvania was getting ready to inoculate a little girl with a vaccine when she suddenly went into violent seizures. Had that pediatrician been working just a little faster, he would have injected that vaccine first. In that case, imagine if the mother had been looking on as her apparently perfectly healthy daughter was injected and then suddenly went into seizures. It would certainly have been understandable—from an emotional standpoint—if that mother was convinced the vaccine caused her daughter’s seizures. Only the accident of timing prevented that particular fallacy in this case. (source)

Mo Putera's Shortform
Steven Byrnes · 11d

It’s good to know when you need to “go hard”, and to be able to do so if necessary, and to assess accurately whether it’s necessary. But it often isn’t necessary, and when it isn’t, then it’s really bad to be going hard all the time, for lots of reasons including not having time to mull over the big picture and notice new things. Like how Elon Musk built SpaceX to mitigate x-risk without it ever crossing his mind that interplanetary colonization wouldn’t actually help with x-risk from AI (and then pretty much everything Elon has done about AI x-risk from that point forward made the problem worse not better). See e.g. What should you change in response to an "emergency"? And AI risk, Please don't throw your mind away, Changing the world through slack & hobbies, etc. Oh also, pain is not the unit of effort.

Humanity Learned Almost Nothing From COVID-19
Steven Byrnes · 11d

I’ve never seen these abbreviations “mio., bio., trio.” before. I have only ever seen M, B, T, e.g. $5B. Is it some regionalism or something?

The IABIED statement is not literally true
Steven Byrnes · 12d

My take is that IABIED has basically a three-part disjunctive argument:

  • (A) There’s no alignment plan that even works on paper.
  • (B) Even if there were such a plan, people would fail to get it to work on the first critical try, even if they’re being careful. (Just as people can have a plan for a rocket engine that works on paper, and do simulations and small-scale tests and component tests etc., but it will still almost definitely blow up the first time in history that somebody does a full-scale test.)
  • (C) People will not be careful, but rather race, skip over tests and analysis that they could and should be doing, and do something pretty close to whatever yields the most powerful system for the least money in the least time.

I think your post is mostly addressing disjunct (A), except that step (6) has a story for disjunct (B). My mental model of Eliezer & Nate would say: first of all, even if you were correct about this being a solution to (A) & (B), everyone will still die because of (C). Second of all, your plan does not in fact solve (A). I think Eliezer & Nate would disagree most strongly with your step (3); see their answer to “Aren’t developers regularly making their AIs nice and safe and obedient?”. Third of all, your plan does not solve (B) either, because one human-level system is quite different from a civilization of them in lots of ways, and lots of new things can go wrong, e.g. the civilization might create a different and much more powerful egregiously misaligned ASI, just as actual humans seem likely to do right now. (See also other comments on (B).)

↑ That was my mental model of Eliezer & Nate. FWIW, my own take is: I agree with them on (B) & (C). And as for (A), I think LLMs won’t scale to AGI, and the different paradigm that will scale to AGI is even worse for step (3), i.e. existing concrete plans will lead to egregious misalignment.

[Intuitive self-models] 1. Preliminaries
Steven Byrnes · 14d

Thanks! My perspective for this kind of thing is: if there’s some phenomenon in psychology or neuroscience, I’m not usually in the situation where there are multiple incompatible hypotheses that would plausibly explain that phenomenon, and we’d like to know which of them is true. Rather, I’m usually in the situation where I have zero hypotheses that would plausibly explain the phenomenon, and I’m trying to get up to at least one.

There are so many constraints from what I (think I) know about neuroscience, and so many constraints from what I (think I) know about algorithms, and so many constraints from what I (think I) know about everyday life, that coming up with any hypothesis at all that can’t be easily refuted from an armchair is a huge challenge. And generally when I find even one such hypothesis, I wind up in the long term ultimately feeling like it’s almost definitely true, at least in the big picture. (Sometimes there are fine details that can’t be pinned down without further experiments.)

It’s interesting that my outlook here is so different from other people in science, who often (not always) feel like the default should be to have multiple hypotheses from the get-go for any given phenomenon. Why the difference? Part of it might be the kinds of questions that I’m interested in. But part of it, as above, is that I have lots of very strong opinions about the brain, which are super constraining and thus rule out almost everything. I think this is much more true for me than almost anyone else in neuroscience, including professionals. (Here’s one of many example constraints that I demand all my hypotheses satisfy.)

So anyway, the first goal is to get up to even one nuts-and-bolts hypothesis, which would explain the phenomenon, and which is specific enough that I can break it all the way down to algorithmic pseudocode, and then even further to how that pseudocode is implemented by the cortical microstructure and thalamus loops and so on, and that also isn’t immediately ruled out by what we already know from our armchairs and the existing literature. So that’s what I’m trying to do here, and it’s especially great when readers point out that nope, my hypothesis is in fact already in contradiction to known psychology or neuroscience, or to their everyday experience. And then I go back to the drawing board. :)

[Intuitive self-models] 3. The Homunculus
Steven Byrnes · 17d

You’re replying to Linda’s comment, which was mainly referring to a paragraph that I deleted shortly after posting this a year ago. The current relevant text (as in my other comment) is:

As above, the homunculus is definitionally the thing that carries “vitalistic force”, and that does the “wanting”, and that does any acts that we describe as “acts of free will”. Beyond that, I don’t have strong opinions. Is the homunculus the same as the whole “self”, or is the homunculus only one part of a broader “self”? No opinion. Different people probably conceptualize themselves rather differently anyway.

To me, this seems like something everyone should be able to relate to, apart from the PNSE thing in Post 6. For example, if your intuitions include the idea of willpower, then I think your intuitions have to also include some, umm, noun, that is exercising that willpower.

But you find it weird and unrelatable? Or was it a different part of the post that left you feeling puzzled when you read it? (If so, maybe I can reword that part.) Thanks.

I wasn't confused by Thermodynamics
Steven Byrnes · 19d

Cool! Oops, I probably just skimmed too fast and incorrectly pattern-matched you to other people I’ve talked to about this topic in the past.  :-P

5 · Steve Byrnes’s Shortform (Ω, 6y, 86 comments)

Wikitag Contributions:
Wanting vs Liking (2 years ago, +139/-26)
Waluigi Effect (2 years ago, +2087)

Posts:
28 · Excerpts from my neuroscience to-do list (25d, 1 comment)
90 · Optical rectennas are not a promising clean energy technology (2mo, 2 comments)
54 · Neuroscience of human sexual attraction triggers (3 hypotheses) (2mo, 6 comments)
363 · Four ways learning Econ makes people dumber re: future AI (Ω, 1mo, 49 comments)
99 · Inscrutability was always inevitable, right? (Q, 3mo, 33 comments)
58 · Perils of under- vs over-sculpting AGI desires (Ω, 3mo, 13 comments)
48 · Interview with Steven Byrnes on Brain-like AGI, Foom & Doom, and Solving Technical Alignment (3mo, 1 comment)
55 · Teaching kids to swim (3mo, 12 comments)
56 · “Behaviorist” RL reward functions lead to scheming (Ω, 3mo, 5 comments)
152 · Foom & Doom 2: Technical alignment is hard (Ω, 4mo, 65 comments)