Steven Byrnes

I'm an AGI safety / AI alignment researcher in Boston with a particular focus on brain algorithms. Research Fellow at Astera. See https://sjbyrnes.com/agi.html for a summary of my research and sorted list of writing. Physicist by training. Email: steven.byrnes@gmail.com. Leave me anonymous feedback here. I’m also at: RSS feed, X/Twitter, Bluesky, Substack, LinkedIn, and more at my website.

Sequences

Intuitive Self-Models
Valence
Intro to Brain-Like-AGI Safety

Shortform: steve2152's Shortform (5 karma, 6y, 83 comments, Ω)

Comments (sorted by newest)
AI #133: America Could Use More Energy
Steven Byrnes · 4d

> Either no one will know your term, or they will appropriate it, usually either watering it down to nothing or reversing it. The ‘euphemism treadmill’ is distinct but closely related.

I think the term you're looking for is "semantic bleaching"?

Nathan Young's Shortform
Steven Byrnes · 8d

The EA Forum has agree/disagree voting on posts, but I don’t spend enough time there to have an opinion about its effects.

Nathan Young's Shortform
Steven Byrnes · 8d

> Agree/disagree voting is already playing a kind of dangerous game by encouraging rounding the epistemic content of a comment into a single statement that it makes sense to assign a single truth-value to.

(I obviously haven’t thought about this as much as you. Very low confidence.)

I’m inclined to say that a strong part of human nature is to round the vibes that one feels towards a post into a single axis of “yay” versus “boo”, and then to feel a very strong urge to proclaim those vibes publicly.

And I think that people are doing that right now, via the karma vote.

I think an agree/disagree dial on posts would be an outlet (“tank”) to absorb those vibe expressions, and that this would shelter karma, allowing karma to retain its different role (“it’s good / bad that this exists”).

I agree that this whole thing (with people rounding everything into a 1D boo/yay vibes axis and then feeling an urge to publicly proclaim where they sit) is dumb, and if only we could all be autistic decouplers etc. But in the real world, I think the benefits of agree-voting (in helping prevent the dynamic where people with minority opinions get driven to negative karma and off the site) probably outweigh the cost (in having an agree / disagree tally on posts which is kinda dumb and meaningless).

[Intro to brain-like-AGI safety] 15. Conclusion: Open problems, how to help, AMA
Steven Byrnes · 8d

No comment on whether that hypothesis is true, but if it were, I maintain that it is not particularly relevant to AGI alignment. People can be patient in pursuit of good goals like helping a friend, and people can also be patient in pursuit of bad goals like torturing a victim. Likewise, people can act impulsively towards good goals and people can act impulsively towards bad goals.

[Intro to brain-like-AGI safety] 7. From hardcoded drives to foresighted plans: A worked example
Steven Byrnes · 11d

Thanks! I’m not 100% sure what you’re getting at; here are some possible comparisons:

  • “idle musing about eating prinsesstårta sometime in the future” VERSUS “plan to eat the prinsesstårta on my plate right now” → latter is preferred
  • “idle musing about eating prinsesstårta sometime in the future” VERSUS “idle musing about snuggling under a weighted blanket sometime in the future” → either might be preferred, depending on which has higher valence, which in turn depends on whether I’m hungry or tired etc.
  • “idle musing about eating prinsesstårta sometime in the future” VERSUS “plan to snuggle under a weighted blanket right now” → again, either might be preferred, but compared to the previous bullet point, the latter is likelier to win, because its immediacy makes it extra appealing.

I think this is consistent with experience, right?
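To make the comparisons in the list above concrete, here is a minimal toy sketch, assuming (purely for illustration) that candidate thoughts compete on valence and that a plan which can be executed right now gets an extra “immediacy bonus”; the function names, the bonus term, and all the numbers are invented assumptions, not anything from the post.

```python
# Toy sketch, not a real model: candidate thoughts compete on valence, and a plan
# that can be executed right now gets an extra "immediacy bonus". All numbers below
# are made up purely for illustration.

def effective_valence(base_valence: float, immediate: bool, immediacy_bonus: float = 0.5) -> float:
    """Valence used in the competition; immediately executable plans get a bonus."""
    return base_valence + (immediacy_bonus if immediate else 0.0)

def winning_thought(thoughts: dict[str, tuple[float, bool]]) -> str:
    """Return the candidate thought with the highest effective valence."""
    return max(thoughts, key=lambda name: effective_valence(*thoughts[name]))

# Each entry: (base valence, is it an immediately executable plan?)
thoughts = {
    "idle musing: prinsesstårta someday":    (0.6, False),
    "plan: eat prinsesstårta right now":     (0.6, True),   # same base valence, wins via immediacy
    "idle musing: weighted blanket someday": (0.5, False),  # base valence depends on hunger/tiredness
    "plan: snuggle under blanket right now": (0.5, True),   # immediacy makes this likelier to win
}

print(winning_thought(thoughts))  # -> "plan: eat prinsesstårta right now" with these made-up numbers
```

The only point of the sketch is the third bullet: immediacy can tip the comparison without changing the underlying valences.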

But maybe you’re instead talking about this comparison:

  • “idle musing about eating prinsesstårta sometime in the future” VERSUS “thinking about the fact that I am right now under a cozy weighted blanket” …

I think the latter thought here doesn’t have much positive valence. I think, when we say we “enjoy” being under a weighted blanket, the pleasure signal is more like “transient pleasure upon starting to be under the blanket, and transient displeasure upon stopping, but not really continuous pleasure during the process, or at least not so much pleasure that we just dwell on that feeling; instead, our mind starts wandering elsewhere (partly due to boredom).” Not many experiences are so pleasurable that we really meditate on them for an extended period, at least not without deliberate effort towards mindfulness. Right?

Or if I’m still misunderstanding, can you try again?

Foom & Doom 2: Technical alignment is hard
Steven Byrnes · 12d

I agree with pretty much everything you wrote.

Anecdote: I recall feeling a bit “meh” when I heard about the Foresight AI Pathways thing and the FLI Worldbuilding Contest thing.

But when I think about it more, I guess I’m happy that they’re doing those things.

Hmm, I’m trying to remember why I initially felt dismissive. I guess I expected that the resulting essays would be implausible or incoherent, and that nobody would pay attention anyway, and thus it wouldn’t really move the needle in the big picture. (I think those expectations were reasonable, and those are still my expectations. [I haven’t read the essays in enough detail to confirm.]) Maybe my feelings are more like frustration than dismissiveness—frustration that progress is so hard. Again, yeah, I guess it’s good that people are trying that kind of thing.

How Econ 101 makes us blinder on trade, morals, jobs with AI – and on marginal costs
Steven Byrnes · 12d

> What seems rather unquestionable is that the "win-win" actually DOESN'T by default automatically RULE OUT "exploitation" the way the term is used in common parlance!

If you go to someone on the street and say:

> I want to talk about an example of exploitation. Alice is exploiting Bob. Bob is desperate for Alice to continue exploiting him—he is going way out of his way to get Alice’s attention and literally beg her to continue exploiting him. Indeed, this whole setup was Bob’s idea in the first place. Initially, Alice had nothing to do with Bob, but then Bob flagged her down and talked her into entering this exploitative relationship. What do you think of that story?

I think that the someone-on-the-street would hear that story and say “wtf is wrong with you, you obviously have no idea what the word ‘exploitation’ means in common parlance!”. Right?

Back to your story:

> By my definition, the idea that I let someone toil day in day out just for me to get a 20th t-shirt dirt cheap just because I can and just because the worker has no other choice than to agree or starve to death, corresponds rather perfectly to “taking advantage” aka exploitation.

Firstly, you emphasize that I don’t need the 20th t-shirt. But consider that, if my friend’s kid is selling lemonade, I might buy some, even if I don’t need it, or even particularly want it, because I want to be nice and give them the sale. Not needing something but buying it anyway makes the purchase more kind, not less kind, if the seller really wants the sale. So why did you include that detail in your story? It’s a sign that you’re not thinking about the situation clearly.

Secondly, the whole story is phrased to sound like my initiative. If we change it so that I was happy with my wardrobe but then the exporter advertises on TV how great and cheap their t-shirts are, begging me to buy one, and then I say “sigh, OK well I guess that does seem like a nice t-shirt, I guess I’ll buy it”, then the whole tenor of the story really changes. Right? But the transaction is the same either way. The exporters and sweatshops obviously want me to buy the goods, and indeed they are specifically designing their goods to jump out at me when I walk by them in a store, so that I buy them instead of just passing by in the aisle. This is a sign that you’re setting up the story in a misleading way, trying to make it superficially pattern-match to a very different kind of situation.

[Intro to brain-like-AGI safety] 15. Conclusion: Open problems, how to help, AMA
Steven Byrnes · 12d

> also small nitpick

Oops, now fixed, thanks!

> you include dopamine but not serotonin

My opinion is that serotonin has a bunch of roles in the brain (different roles in different places), but none of those roles are particularly relevant to AGI alignment, and therefore I don’t want to talk about them.

You obviously have a different opinion, and you’re welcome to write about it!

Foom & Doom 2: Technical alignment is hard
Steven Byrnes · 12d

Thanks!

> Society can do way more than you seem to think to make sure ASI (more precisely: takeover-level AI) does not get built. It seems that you're combining "ASI is going to be insanely more powerful than not just LLMs, but than any AI anyone can currently imagine" with "we will only have today's policy levers to pull". I think that's only realistic if this AI will get built very suddenly and without an increase in xrisk awareness. Xrisk awareness could increase because of relatively gradual development or because of comms… In such a world, I'd be reasonably optimistic about regulating even low-flop ASI (and no, not via mass surveillance, probably).

I’m not sure what policy levers you have in mind; if you’re being coy in public you can also DM me.

If you’re thinking about regulating basic research, or regulating anything that could plausibly be branded as basic research, I would note that pandemics are pretty salient after COVID (one would assume), but IIUC it is currently and always has been legal to do gain-of-function research. (The current battle-front in the GoF wars IIUC is whether the government should fund it. Actually outlawing it would be very much harder. Outlawing it internationally would be harder still.) As another example, there was an international treaty against bioweapons but the USSR secretly made bioweapons anyway.

In the USA, climate change is a huge cause that is widely (if incorrectly) regarded as existential or nearly so (56% agree with “humanity is doomed”, 55% with “the things I value most will be destroyed”, source) but carbon taxes remain deeply unpopular, Trump is gratuitously canceling green energy projects, and even before Trump, green energy projects were subject to stifling regulations like environmental reviews.

Another thing: Suppose I announced in 2003: “By the time we can make AI that can pass the Turing Test, nail PhD-level exams in every field, display superhuman persuasion, find zero-days, etc., by THEN clearly lots of crazy new options will be in the Overton window”. …I think that would have sounded (to my 2003 audience) like a very sensible proclamation, and my listeners would have all agreed that this is obviously true.

But here we are. All those things have happened. But people are still generally treating AI as a normal technology, getting inured to each new AI accomplishment within days of it happening, or being oblivious, or lying, or saying and believing whatever nonsense most suits them, etc. And thus there’s still far more regulation on selling a sandwich than on training the world’s most powerful AI. (To be clear, I think people are generally correct to treat LLMs-in-particular as a normal technology, but I think they’re correct by coincidence.)

…Anyway, all this is a bit beside the point. “Whether we’re doomed or not” is fun to argue about, but less decision-relevant than “what should we do now”, and it seems that you and I are in agreement that comms, x-risk awareness, and gradual development are all generally good, on present margins.

> Although you do gesture at this, I think you're underappreciating the importance of what some have called goalcrafting (and I think many technical AI safety researchers underestimate this).

Yes, this is an extra reason that we’re even more doomed :-P

We need goals that are both (1) technically achievable to install in an AGI and (2) good for the world / future. I tend not to expect that we’ll do so great on technical alignment that we can focus on (2) without feeling very constrained by (1); rather I expect that (1) will only offer a limited option space (and of course I think where we’re at right now is that (1) is the empty set). But I guess we’ll see.

If we make so much progress on (1) that we can type in anything whatsoever in a text box and that’s definitely the AGI’s goals, then I guess I’d vote for Eliezer’s poetic CEV thing. Of course, the people with access to the text box may type something different instead, but that’s a problem regardless.

If we don’t make that much progress on (1), then goalcrafting becomes entangled with technical alignment, right?

Hmm, thinking about it more, I agree that it would be nice to build our general understanding of (2) in parallel with work on (1). E.g. can we do more to operationalize long reflection, or archipelago, or CEV, or nanny AI, etc.? Not sure how to productively “involve society” at this stage (what did you have in mind?), beyond my general feeling that very widely spreading the news that ASI could actually happen, and what that would really mean, is a very good thing.

Four ways learning Econ makes people dumber re: future AI
Steven Byrnes · 12d

> This is the idea that at some point in scaling up an organization you could lose efficiency due to needing more/better management, more communication (meetings) needed and longer communication processes, "bloat" in general. I'm not claiming it’s likely to happen with AI, just another possible reason for increasing marginal cost with scale.

Hmm, that would apply to an individual firm but not to a product category, right? If Firm 1 is producing so much [AGI component X] that they pile up bureaucracy and inefficiency, then Firms 2, 3, 4, and 5 will start producing [AGI component X] with less bureaucracy, and undercut Firm 1, right? If there’s an optimal firm size, the market can still be arbitrarily large via arbitrarily many independent firms of that optimal size.

(Unless Firm 1 has a key patent, or uses its market power to do anticompetitive stuff, etc. …although I don’t expect IP law or other such forces to hold internationally given the stakes of AGI.)

(Separately, I think AGI will drastically increase economies of scale, particularly related to coordination problems.)
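Here is a minimal numerical sketch of that free-entry point, assuming an invented U-shaped cost curve (a fixed cost plus a “bureaucracy” term that grows with firm size); the cost function and every number are made-up assumptions, just to illustrate the argument that an optimal firm size does not cap the size of the market.

```python
# Toy free-entry sketch; the cost function and numbers are invented for illustration.
# Each firm's average cost is U-shaped: fixed cost spread over output, plus a constant
# marginal cost, plus a "bureaucracy" term that grows with firm size.

def average_cost(q: float, fixed: float = 100.0, marginal: float = 1.0, congestion: float = 0.001) -> float:
    """Average cost per unit for a single firm producing q units."""
    return fixed / q + marginal + congestion * q

# Roughly find the firm size that minimizes average cost.
candidate_sizes = range(10, 2001, 10)
q_star = min(candidate_sizes, key=average_cost)
ac_star = average_cost(q_star)
print(f"optimal firm size ~{q_star} units, minimum average cost ~{ac_star:.2f}")

# Industry output can be 10x or 100x that size without unit costs exploding:
# just use 10 or 100 independent firms, each at the optimal size.
for n_firms in (1, 10, 100):
    print(f"{n_firms} firm(s) -> {n_firms * q_star} units at ~{ac_star:.2f} per unit")
```

The parentheticals above (patents, anticompetitive behavior, strong economies of scale) are exactly the conditions under which this toy free-entry picture stops applying.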

> I see how this could happen, but I'm not convinced this effect is actually happening. … I think my crux is that this isn't unique to economists.

It’s definitely true that non-economists are capable of dismissing AGI for bad reasons, even though this post is not mainly addressed to non-economists. I think the thing I said is a contributory factor for at least some economists, based on my experience and conversations, but not all economists, and maybe I’m just mistaken about where those people are coming from. Oh well, it’s probably not worth putting too much effort into arguing about Bulverism. Thanks for your input though.

Wikitag Contributions

Wanting vs Liking (2 years ago)
Wanting vs Liking (2 years ago, +139/-26)
Waluigi Effect (2 years ago, +2087)
Posts (sorted by new)

Optical rectennas are not a promising clean energy technology (87 karma, 3d, 1 comment)
Neuroscience of human sexual attraction triggers (3 hypotheses) (54 karma, 20d, 6 comments)
Four ways learning Econ makes people dumber re: future AI (296 karma, 24d, 31 comments, Ω)
Inscrutability was always inevitable, right? (99 karma, 1mo, 33 comments, Q)
Perils of under- vs over-sculpting AGI desires (58 karma, 1mo, 12 comments, Ω)
Interview with Steven Byrnes on Brain-like AGI, Foom & Doom, and Solving Technical Alignment (46 karma, 1mo, 1 comment)
Teaching kids to swim (55 karma, 2mo, 12 comments)
“Behaviorist” RL reward functions lead to scheming (56 karma, 2mo, 5 comments, Ω)
Foom & Doom 2: Technical alignment is hard (152 karma, 3mo, 60 comments, Ω)
Foom & Doom 1: “Brain in a box in a basement” (276 karma, 2mo, 120 comments, Ω)