Steven Byrnes

I'm an AGI safety / AI alignment researcher in Boston with a particular focus on brain algorithms. Research Fellow at Astera. See https://sjbyrnes.com/agi.html for a summary of my research and sorted list of writing. Physicist by training. Email: steven.byrnes@gmail.com. Leave me anonymous feedback here. I’m also at: RSS feed, X/Twitter, Bluesky, Substack, LinkedIn, and more at my website.

Sequences

Intuitive Self-Models
Valence
Intro to Brain-Like-AGI Safety

Comments (sorted by newest)

How human-like do safe AI motivations need to be?
Steven Byrnes · 15h

> current AIs need to understand at a very early stage what human concepts like “helpfulness,” “harmlessness,” and “honesty” mean. And while it is of course possible to know what these concepts mean without being motivated by them (cf “the genie knows but doesn’t care”), the presence of this level of human-like conceptual understanding at such an early stage of development makes it more likely that these human-like concepts end up structuring AI motivations as well. … AIs will plausibly have concepts like “helpfulness,” “harmlessness,” “honesty” much earlier in the process that leads to their final form … [emphasis added]

I want to nitpick this particular point (I think the other arguments you bring up in that section are stronger).

For example, Llama 3.1 405B was trained on 15.6 trillion tokens of text data (≈ what a human could get through in 20,000 years of 24/7 reading). I’m not an ML training expert, but intuitively I’m skeptical that this is the kind of regime where we need to be thinking about what is hard versus easy to learn, or about what can be learned quickly versus slowly.

Instead, my guess is that, if [latent model A] is much easier and faster to learn than [latent model B], but [B] gives a slightly lower predictive loss than [A], then 15.6 trillion tokens of pretraining would be WAY more than enough for the model-in-training to initially learn [A] but then switch over to [B].
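Here’s a minimal toy sketch (PyTorch) of the kind of switchover I have in mind; the task, numbers, and hyperparameters are all invented for illustration, and whether/when the switch actually happens will depend on them. An “easy” shortcut feature predicts the label 90% of the time, a “hard” XOR feature predicts it 100% of the time, and a shortcut-randomized probe set tracks which one the network is relying on over training.

```python
# Toy sketch (PyTorch): an "easy" 90%-predictive shortcut vs. a "hard" 100%-predictive
# XOR feature. The probe set has the shortcut randomized, so probe accuracy near 0.5
# means the net is leaning on the shortcut; near 1.0 means it has learned the XOR rule.
import torch
import torch.nn as nn

def make_data(n, shortcut_predictive=True):
    x_hard = torch.randint(0, 2, (n, 2)).float()   # two bits; true label = XOR of them
    y = (x_hard[:, 0] != x_hard[:, 1]).float()
    if shortcut_predictive:
        x_easy = y.clone()
        flip = torch.rand(n) < 0.10                # shortcut agrees with the label 90% of the time
        x_easy[flip] = 1.0 - x_easy[flip]
    else:
        x_easy = torch.randint(0, 2, (n,)).float() # shortcut carries no information
    return torch.cat([x_easy[:, None], x_hard], dim=1), y

torch.manual_seed(0)
x_train, y_train = make_data(50_000)
x_probe, y_probe = make_data(5_000, shortcut_predictive=False)

model = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(2001):
    idx = torch.randint(0, len(x_train), (256,))
    loss = loss_fn(model(x_train[idx]).squeeze(-1), y_train[idx])
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 200 == 0:
        with torch.no_grad():
            acc = ((model(x_probe).squeeze(-1) > 0) == y_probe.bool()).float().mean()
        print(f"step {step:5d}  train loss {loss.item():.3f}  shortcut-free accuracy {acc.item():.3f}")
```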

Social drives 1: “Sympathy Reward”, from compassion to dehumanization
Steven Byrnes · 16h

I think your question is kinda too vague to answer. (You’re asking for a comparison of two AI architectures, but what are they? I need more detail. Are we assuming that the two options are equally powerful & competent? If so, is that a good assumption? And is that power level “kinda like LLMs of today”, “superintelligence”, or something in between?)

…But maybe see my post “Foom & Doom §2.9.1: If brain-like AGI is so dangerous, shouldn’t we just try to make AGIs via LLMs?” for some possibly-related discussion.

Human Values ≠ Goodness
Steven Byrnes · 10d

(I say this all the time, but I think that [the thing you call “values”] is a closer match to the everyday usage of the word “desires” than the word “values”.)

I think we should distinguish three things: (A) societal norms that you have internalized, (B) societal norms that you have not internalized, (C) desires that you hold independent of [or even despite] societal norms.

For example:

  • A 12-year-old girl might feel very strongly that some style of dress is cool, and some other style is cringe. She internalized this from people she thinks of as good and important—older teens, her favorite celebrities, the kids she looks up to, etc. This is (A).
  • Meanwhile, her lame annoying parents tell her that kindness is a virtue, and she rolls her eyes. This is (B).
  • She has a certain way that she likes to arrange her pillows in bed at night before falling asleep. Very cozy. She has never told anyone about this, and has no idea how anyone else arranges their pillows. This is (C).

Anyway, the OP says: “our shared concept of Goodness is comprised of whatever messages people spread about what other people should value. … which sure is a different thing from what people do value, when they introspect on what feels yummy.”

I think that’s kinda treating the dichotomy as (B) versus (C), while denying the existence of (A).

If that 12yo girl “introspects on what feels yummy”, her introspection will say “myself wearing a crop-top with giant sweatpants feels yummy”. This obviously has memetic origins but the girl is very deeply enthusiastic about it, and will be insulted if you tell her she only likes that because she’s copying memes.

By the way, this is unrelated to “feeling of deep loving connection”. The 12yo girl does not have a “feeling of deep loving connection” to the tiktok influencers, high schoolers, etc., who have planted the idea in her head that crop-tops and giant sweatpants look super chic and awesome. I think you’re wayyy overstating the importance of “feeling of deep loving connection” for the average person’s “values”, and correspondingly wayyy understating the importance of this kind of norm-following thing. I have a draft post with much more about the norm-following thing, should be out soon :)

{M|Im|Am}oral Mazes - any large-scale counterexamples?
Steven Byrnes · 12d

Companies actually do this and (if done competently) the implementation costs are an infinitesimal fraction of the benefits. Big companies generally already have all the info in various databases; they just need to do the right spreadsheet calculation. A tiny 1-person company could practically do it with pen and paper, I think. You can ballpark things; it doesn’t have to be perfect to be wildly better than the status quo. One competent person can do it pretty well even for a giant firm with hundreds or thousands of employees (but they should hire my late father’s company instead; it’ll get done much better and faster!)

If there’s some activity that the company does that sucks up tons of money, like as in a substantial fraction of the company’s total annual operating costs, with no tracking of how that money is actually being spent, like it’s just a black hole, and poof the money is gone … then the wrong way to relate to that situation is “gee it would take a lot of work to figure out where all this money is going”, and the right way is “wtf we obviously need to figure out where all this money is going ASAP”. :)  I don’t think that situation comes up. Companies already break down how money gets spent in their major cost centers.

Or if it’s a tiny fraction of the company’s total costs, then you can just come up with some ballpark heuristic and it will probably be fine. Like I asked my dad how they divvy up the CEO salary in this kind of system, and he immediately answered with an extremely simple-to-implement heuristic that made a ton of sense. (I won’t share what it is, I think it might be a trade secret.) And no it does not involve tracking exactly how the CEO spends their time.
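To be concrete about “ballpark heuristic”: here’s a generic textbook-style sketch with made-up numbers (emphatically not the heuristic my dad had in mind, which I’m not sharing) that allocates a shared overhead pool, like executive salaries, across product lines in proportion to each line’s direct costs.

```python
# Generic illustration of ballpark overhead allocation (proportional to direct costs).
# Product names and dollar amounts are hypothetical.
direct_costs = {"Product A": 4_000_000, "Product B": 1_500_000, "Product C": 500_000}
overhead = 900_000   # some shared cost pool, e.g. executive salaries or HQ rent

total_direct = sum(direct_costs.values())
allocation = {name: overhead * cost / total_direct for name, cost in direct_costs.items()}

for name, share in allocation.items():
    print(f"{name}: direct ${direct_costs[name]:,}, allocated overhead ${share:,.0f}")
# Product A: direct $4,000,000, allocated overhead $600,000
# Product B: direct $1,500,000, allocated overhead $225,000
# Product C: direct $500,000, allocated overhead $75,000
```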

Your posts should be on arXiv
Steven Byrnes · 13d

Another data point: when I turned the Intro to Brain-Like AGI Safety blog post series into a PDF [via typst—I hired someone to do all the hard work of writing conversion scripts etc.], arXiv rejected it, so I put it on OSF instead. I’m reluctant to speculate on what arXiv didn’t like about it (they didn’t say). Some possibilities are: it seems out-of-place on arXiv in terms of formatting (e.g. single-column, not LaTeX), AND tone (casual, with some funny pictures), AND content (not too math-y, interdisciplinary in a weird way). Probably one or more of those three things. But whatever, OSF seems fine.

Homomorphically encrypted consciousness and its implications
Steven Byrnes · 21d

> I don't see why it should be possible for something which knows the physical state of my brain to be able to efficiently compute the contents of it.

I think you meant "philosophically necessary" where you wrote "possible"? If so, agreed, that's also my take.

> If an omniscient observer can extract the contents of a brain by assembling a causal model of it in un-encrypted phase space, why would it struggle to build the same causal model in encrypted phase space?

I don’t understand this part. “Causal model” is easy—if the computer is a Turing machine, then you have a causal model in terms of the head and the tape etc. You want “understanding” not “causal model”, right?

If a superintelligence were to embark on the project of “understanding” a brain, it would be awfully helpful to see the stream of sensory inputs and the motor outputs. Without encryption, you can do that: the environment is observable. Under homomorphic encryption without the key, the environmental simulation, and the brain’s interactions with it, look like random bits just like everything else. Likewise, it would be awfully helpful to be able to notice that the brain is in a similar state at times t₁ versus t₂, and/or the ways that they’re different. But under homomorphic encryption without the key, you can’t do that, I think. See what I mean?
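To make that last point concrete, here’s a toy sketch: textbook Paillier with absurdly small primes, nothing like a realistic homomorphic encryption scheme, just to show that encrypting the same “state” twice yields unrelated-looking ciphertexts (so without the key you can’t even tell that the state at t₁ equals the state at t₂), while computation on the ciphertexts still goes through.

```python
# Toy Paillier cryptosystem (additively homomorphic). Illustration only:
# toy-sized primes, no padding, not remotely secure or realistic.
import math
import secrets

def keygen():
    p, q = 1009, 1013                      # real schemes use ~2048-bit primes
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    g = n + 1                              # standard choice of generator
    mu = pow(lam, -1, n)                   # valid precisely because g = n + 1
    return (n, g), (lam, mu)

def encrypt(pk, m):
    n, g = pk
    while True:
        r = secrets.randbelow(n - 1) + 1   # fresh randomness for every encryption
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(pk, sk, c):
    n, _ = pk
    lam, mu = sk
    x = pow(c, lam, n * n)
    return (((x - 1) // n) * mu) % n

pk, sk = keygen()
c1, c2 = encrypt(pk, 42), encrypt(pk, 42)
print(c1 == c2)                                   # False: same state, unrelated-looking ciphertexts
print(decrypt(pk, sk, (c1 * c2) % (pk[0] ** 2)))  # 84: adding under encryption still works
```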

Video and transcript of talk on giving AIs safe motivations
Steven Byrnes · 22d

The headings “behavioral tools” and “transparency tools” both kinda assume that a mysterious AI has fallen out of the sky, and now you have to deal with it, as opposed to either thinking about, or intervening on, how the AI is trained or designed. (See Connor’s comment here.)

(Granted, you do mention “new paradigm”, but you seem to be envisioning that pretty narrowly as a transparency intervention.)

I think that’s an important omission. For example, it seems to leave out making inferences about Bob from the fact that Bob is human. That’s informative even if I’ve never met Bob (no behavioral data) and can’t read his mind (no transparency). (Sorry if I’m misunderstanding.)

Adele Lopez's Shortform
Steven Byrnes · 23d

> In one case, a pediatrician in Pennsylvania was getting ready to inoculate a little girl with a vaccine when she suddenly went into violent seizures. Had that pediatrician been working just a little faster, he would have injected that vaccine first. In that case, imagine if the mother had been looking on as her apparently perfectly healthy daughter was injected and then suddenly went into seizures. It would certainly have been understandable—from an emotional standpoint—if that mother was convinced the vaccine caused her daughter’s seizures. Only the accident of timing prevented that particular fallacy in this case. (source)

Mo Putera's Shortform
Steven Byrnes · 23d

It’s good to know when you need to “go hard”, and to be able to do so if necessary, and to assess accurately whether it’s necessary. But it often isn’t necessary, and when it isn’t, then it’s really bad to be going hard all the time, for lots of reasons including not having time to mull over the big picture and notice new things. Like how Elon Musk built SpaceX to mitigate x-risk without it ever crossing his mind that interplanetary colonization wouldn’t actually help with x-risk from AI (and then pretty much everything Elon has done about AI x-risk from that point forward made the problem worse not better). See e.g. What should you change in response to an "emergency"? And AI risk, Please don't throw your mind away, Changing the world through slack & hobbies, etc. Oh also, pain is not the unit of effort.

Humanity Learned Almost Nothing From COVID-19
Steven Byrnes · 24d

I’ve never seen these abbreviations “mio., bio. trio.” before. I’ve only ever seen M, B, T, e.g. $5B. Is it some regionalism or something?

Posts

Steve Byrnes’s Shortform (6y)
Social drives 2: “Approval Reward”, from norm-enforcement to status-seeking (10h)
Social drives 1: “Sympathy Reward”, from compassion to dehumanization (3d)
Excerpts from my neuroscience to-do list (1mo)
Optical rectennas are not a promising clean energy technology (2mo)
Neuroscience of human sexual attraction triggers (3 hypotheses) (3mo)
Four ways learning Econ makes people dumber re: future AI (1mo)
Inscrutability was always inevitable, right? (3mo)
Perils of under- vs over-sculpting AGI desires (3mo)
Interview with Steven Byrnes on Brain-like AGI, Foom & Doom, and Solving Technical Alignment (3mo)
Teaching kids to swim (4mo)

Wikitag Contributions

Wanting vs Liking (2 years ago, +139/-26)
Waluigi Effect (2 years ago, +2087)