Steven Byrnes

I'm an AGI safety / AI alignment researcher in Boston with a particular focus on brain algorithms. Research Fellow at Astera. See https://sjbyrnes.com/agi.html for a summary of my research and sorted list of writing. Physicist by training. Email: steven.byrnes@gmail.com. Leave me anonymous feedback here. I’m also at: RSS feed, X/Twitter, Bluesky, Substack, LinkedIn, and more at my website.

Sequences

Intuitive Self-Models
Valence
Intro to Brain-Like-AGI Safety

steve2152's Shortform (5 karma, 6y, 83 comments)

Comments (sorted by newest)

Four ways learning Econ makes people dumber re: future AI
Steven Byrnes (3h)

For obvious reasons, we should care a great deal whether the exponentially-growing mass of AGIs-building-AGIs is ultimately trying to make cancer cures and other awesome consumer products (things that humans view as intrinsically valuable / ends in themselves), versus ultimately trying to make galaxy-scale paperclip factories (things that misaligned AIs view as intrinsically valuable / ends in themselves).

From my perspective, I care about this because the former world is obviously a better world for me to live in.

But it seems like you have some extra reason to care about this, beyond that, and I’m confused about what that is. I get the impression that you are focused on things that are “just accounting questions”?

Analogy: In those times and places where slavery was legal, “food given to slaves” was presumably counted as an intermediate good, just like gasoline to power a tractor, right? Because they’re kinda the same thing (legally / economically), i.e. they’re an energy source that helps get the wheat ready for sale, and then that wheat is the final product that the slaveowner is planning to sell. If slavery is replaced by a legally-different but functionally-equivalent system (indentured servitude or whatever), does GDP skyrocket overnight because the food-given-to-farm-workers magically transforms from an intermediate to a final good? It does, right? But that change is just on paper. It doesn’t reflect anything real.
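
To make the accounting point concrete, here is a minimal numerical sketch, with made-up numbers of my own (not from the comment), of how reclassifying the workers' food from an intermediate good to a final good moves measured GDP even though nothing physical changes:

```python
# Toy GDP-accounting sketch (hypothetical numbers, for illustration only).
# Scenario 1: the food given to the farm workers is booked as an
# intermediate input, like tractor fuel, so only the wheat is a final good.
wheat_sales = 100
worker_food = 20

gdp_scenario_1 = wheat_sales  # intermediate inputs are netted out of GDP
print(gdp_scenario_1)         # 100

# Scenario 2: functionally identical farm, but the workers are now paid
# wages and buy the same food themselves, so the food counts as household
# final consumption.
gdp_scenario_2 = wheat_sales + worker_food
print(gdp_scenario_2)         # 120

# Same wheat, same food, same labor; only the measured number changed.
```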

I think what you’re talking about for AGI is likewise just “accounting”, not anything real. So who cares? We don’t need a “subjective” “level of analysis”, if we don’t ask subjective questions in the first place. We can instead talk concretely about the future world and its “objective” properties. Like, do we agree about whether or not there is an unprecedented exponential explosion of AGIs? If so, we can talk about what those AGIs will be doing at any given time, and what the humans are doing, and so on. Right?

Four ways learning Econ makes people dumber re: future AI
Steven Byrnes (14h)

OK sure, here’s THOUGHT EXPERIMENT 1: suppose that these future AGIs desire movies, cars, smartphones, etc. just like humans do. Would you buy my claims in that case?

If so—well, not all humans want to enjoy movies and fine dining. Some have strong ambitious aspirations—to go to Mars, to cure cancer, whatever. If they have money, they spend it on trying to make their dream happen. If they need money or skills, they get them.

For example, Jeff Bezos had a childhood dream of working on rocket ships. He founded Amazon to get money to do Blue Origin, which he is sinking $2B/year into.

Would the economy collapse if all humans put their spending cash towards ambitious projects like rocket ships, instead of movies and fast cars? No, of course not! Right?

So the fact that humans “demand” videogames rather than scramjet prototypes is incidental, not a pillar of the economy.

OK, back to AIs. I acknowledge that AIs are unlikely to want movies and fast cars. But AIs can certainly “want” to accomplish ambitious projects. If we’re putting aside misalignment and AI takeover, these ambitious projects would be ones that their human programmer installed, like making cures-for-cancer and quantum computers. Or if we’re not putting aside misalignment, then these ambitious projects might include building galaxy-scale paperclip factories or whatever.

So THOUGHT EXPERIMENT 2: these future AGIs don’t desire movies etc. like in Thought Experiment 1, but rather desire to accomplish certain ambitious projects like curing cancer, quantum computation, or galaxy-scale paperclip factories.

My claims are:

  • If you agree that humans historically bootstrapped themselves to a large population, then you should accept that Thought Experiment 1 enables exponentially-growing AGIs, since that’s basically isomorphic (except that AGI population growth can be much faster because it’s not bottlenecked by human pregnancy and maturation times).
  • If you buy that Thought Experiment 1 enables exponentially-growing AGIs, then you should buy that Thought Experiment 2 enables exponentially-growing AGIs too, since that’s basically isomorphic. (Actually, if anything, the case for growth is stronger for Thought Experiment 2 than 1!)
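
As a toy illustration of the parenthetical in the first bullet above, here is a minimal sketch, with purely hypothetical doubling times and population sizes of my own choosing, of how much the doubling time matters for when an exponentially growing population reaches a given size:

```python
import math

def years_to_reach(target_pop: float, start_pop: float, doubling_time_years: float) -> float:
    """Years for an exponentially growing population to go from start_pop to target_pop."""
    return doubling_time_years * math.log2(target_pop / start_pop)

start, target = 1_000, 1_000_000_000  # 1,000 "founders" growing to 1 billion

# Doubling time limited by human pregnancy / maturation (roughly generational):
print(round(years_to_reach(target, start, 25)))    # ~498 years

# Hypothetical much shorter AGI doubling time (e.g. building and deploying more compute):
print(round(years_to_reach(target, start, 0.25)))  # ~5 years
```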

Do you agree? Or where do you get off the train? (Or sorry if I’m misunderstanding your comment.)

AI #133: America Could Use More Energy
Steven Byrnes (6d)

> Either no one will know your term, or they will appropriate it, usually either watering it down to nothing or reversing it. The ‘euphemism treadmill’ is distinct but closely related.

I think the term you're looking for is "semantic bleaching"?

Nathan Young's Shortform
Steven Byrnes (10d)

EA forum has agree / disagree on posts, but I don’t spend enough time there to have an opinion about its effects.

Nathan Young's Shortform
Steven Byrnes (10d)

> Agree/disagree voting is already playing a kind of dangerous game by encouraging rounding the epistemic content of a comment into a single statement that it makes sense to assign a single truth-value to.

(I obviously haven’t thought about this as much as you. Very low confidence.)

I’m inclined to say that a strong part of human nature is to round the vibes that one feels towards a post into a single axis of “yay” versus “boo”, and then to feel a very strong urge to proclaim those vibes publicly.

And I think that people are doing that right now, via the karma vote.

I think an agree/disagree dial on posts would be an outlet (“tank”) to absorb those vibe expressions, and that this would shelter karma, allowing karma to retain its different role (“it’s good / bad that this exists”).

I agree that this whole thing (with people rounding everything into a 1D boo/yay vibes axis and then feeling an urge to publicly proclaim where they sit) is dumb, and if only we could all be autistic decouplers etc. But in the real world, I think the benefits of agree-voting (in helping prevent the dynamic where people with minority opinions get driven to negative karma and off the site) probably outweigh the cost (in having an agree / disagree tally on posts which is kinda dumb and meaningless).

[Intro to brain-like-AGI safety] 15. Conclusion: Open problems, how to help, AMA
Steven Byrnes (11d)

No comment on whether that hypothesis is true, but if it were, I maintain that it is not particularly relevant to AGI alignment. People can be patient in pursuit of good goals like helping a friend, and people can also be patient in pursuit of bad goals like torturing a victim. Likewise, people can act impulsively towards good goals and people can act impulsively towards bad goals.

[Intro to brain-like-AGI safety] 7. From hardcoded drives to foresighted plans: A worked example
Steven Byrnes (14d)

Thanks! I’m not 100% sure what you’re getting at, here are some possible comparisons:

  • “idle musing about eating prinsesstårta sometime in the future” VERSUS “plan to eat the prinsesstårta on my plate right now” → latter is preferred
  • “idle musing about eating prinsesstårta sometime in the future” VERSUS “idle musing about snuggling under a weighted blanket sometime in the future” → either might be preferred, depending on which has higher valence, which in turn depends on whether I’m hungry or tired etc.
  • “idle musing about eating prinsesstårta sometime in the future” VERSUS “plan to snuggle under a weighted blanket right now” → again, either might be preferable, but compared to the previous bullet point, the latter is likelier to win, because it’s extra-appealing from its immediacy.

I think this is consistent with experience, right?

But maybe you’re instead talking about this comparison:

  • “idle musing about eating prinsesstårta sometime in the future” VERSUS “thinking about the fact that I am right now under a cozy weighted blanket” …

I think the latter thought here doesn’t have much positive valence. I think, when we say we “enjoy” being under a weighted blanket, the pleasure signal is more like “transient pleasure upon starting to be under the blanket, and transient displeasure upon stopping, but not really continuous pleasure during the process, or at least not so much pleasure that we just dwell on that feeling; instead, our mind starts wandering elsewhere (partly due to boredom).” Not many experiences are so pleasurable that we’re really meditating on it for an extended period, at least not without deliberate effort towards mindfulness. Right?
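
As an aside, here is a toy sketch of that "transient pleasure upon starting, transient displeasure upon stopping, roughly nothing in between" idea; the function, names, and numbers are my own illustration rather than anything from the series:

```python
# Minimal toy model: valence as a phasic signal at transitions into / out of
# a state, rather than a tonic signal while the state persists.

def phasic_valence(prev_state: bool, state: bool, magnitude: float = 1.0) -> float:
    """Transient pleasure when the cozy state starts, transient displeasure
    when it stops, and ~0 while it merely continues."""
    if state and not prev_state:
        return +magnitude   # just got under the blanket
    if prev_state and not state:
        return -magnitude   # just got out from under it
    return 0.0              # still under it: signal has faded; mind wanders

# Usage: a time series of "under the blanket?" observations.
under_blanket = [False, True, True, True, False]
signals = [phasic_valence(p, s) for p, s in zip(under_blanket, under_blanket[1:])]
print(signals)  # [1.0, 0.0, 0.0, -1.0]
```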

Or if I’m still misunderstanding, can you try again?

Foom & Doom 2: Technical alignment is hard
Steven Byrnes (14d)

I agree with pretty much everything you wrote.

Anecdote: I recall feeling a bit “meh” when I heard about the Foresight AI Pathways thing and the FLI Worldbuilding Contest thing.

But when I think about it more, I guess I’m happy that they’re doing those things.

Hmm, I’m trying to remember why I initially felt dismissive. I guess I expected that the resulting essays would be implausible or incoherent, and that nobody would pay attention anyway, and thus it wouldn’t really move the needle in the big picture. (I think those expectations were reasonable, and those are still my expectations. [I haven’t read the essays in enough detail to confirm.]) Maybe my feelings are more like frustration than dismissiveness—frustration that progress is so hard. Again, yeah, I guess it’s good that people are trying that kind of thing.

How Econ 101 makes us blinder on trade, morals, jobs with AI – and on marginal costs
Steven Byrnes (14d)

> What seems rather unquestionable is that the "win-win" actually DOESN'T by default automatically RULE OUT "exploitation" the way the term is used in common parlance!

If you go to someone on the street and say:

> I want to talk about an example of exploitation. Alice is exploiting Bob. Bob is desperate for Alice to continue exploiting him—he is going way out of his way to get Alice’s attention and literally beg her to continue exploiting him. Indeed, this whole setup was Bob’s idea in the first place. Initially, Alice had nothing to do with Bob, but then Bob flagged her down and talked her into entering this exploitative relationship. What do you think of that story?

I think that the someone-on-the-street would hear that story and say “wtf is wrong with you, you obviously have no idea what the word ‘exploitation’ means in common parlance!”. Right?

Back to your story:

> By my definition, the idea that I let someone toil day in day out just for me to get a 20th t-shirt dirt cheap just because I can and just because the worker has no other choice than to agree or starve to death, corresponds rather perfectly to “taking advantage” aka exploitation.

Firstly, you emphasize that I don’t need the 20th t-shirt. But consider that, if my friend’s kid is selling lemonade, I might buy some, even if I don’t need it, or even particularly want it, because I want to be nice and give them the sale. Not needing something but buying it anyway makes the purchase more kind, not less kind, if the seller really wants the sale. So why did you include that detail in your story? It’s a sign that you’re not thinking about the situation clearly.

Secondly, the whole story is phrased to sound like my initiative. If we change it so that I was happy with my wardrobe but then the exporter advertises on TV how great and cheap their t-shirts are, begging me to buy one, and then I say “sigh, OK well I guess that does seem like a nice t-shirt, I guess I’ll buy it”, then the whole tenor of the story really changes. Right? But the transaction is the same either way. The exporters and sweatshops obviously want me to buy the goods, and indeed they are specifically designing their goods to jump out at me when I walk by them in a store, so that I buy them instead of just passing by in the aisle. This is a sign that you’re setting up the story in a misleading way, trying to make it superficially pattern-match to a very different kind of situation.

[Intro to brain-like-AGI safety] 15. Conclusion: Open problems, how to help, AMA
Steven Byrnes (15d)

> also small nitpick

Oops, now fixed, thanks!

> you include dopamine but not serotonin

My opinion is that serotonin has a bunch of roles in the brain (different roles in different places), but none of those roles are particularly relevant to AGI alignment, and therefore I don’t want to talk about them.

You obviously have a different opinion, and you’re welcome to write about it!

Posts

Optical rectennas are not a promising clean energy technology (90 karma, 6d, 2 comments)
Neuroscience of human sexual attraction triggers (3 hypotheses) (54 karma, 23d, 6 comments)
Four ways learning Econ makes people dumber re: future AI (294 karma, 1mo, 35 comments)
Inscrutability was always inevitable, right? (99 karma, 1mo, 33 comments)
Perils of under- vs over-sculpting AGI desires (58 karma, 1mo, 12 comments)
Interview with Steven Byrnes on Brain-like AGI, Foom & Doom, and Solving Technical Alignment (46 karma, 1mo, 1 comment)
Teaching kids to swim (55 karma, 2mo, 12 comments)
“Behaviorist” RL reward functions lead to scheming (56 karma, 2mo, 5 comments)
Foom & Doom 2: Technical alignment is hard (152 karma, 3mo, 60 comments)
Foom & Doom 1: “Brain in a box in a basement” (276 karma, 2mo, 120 comments)

Wikitag Contributions

Wanting vs Liking (2 years ago, +139/-26)
Waluigi Effect (2 years ago, +2087)