OK sure, here’s THOUGHT EXPERIMENT 1: suppose that these future AGIs desire movies, cars, smartphones, etc. just like humans do. Would you buy my claims in that case?
If so—well, not all humans want to enjoy movies and fine dining. Some have strong ambitious aspirations—to go to Mars, to cure cancer, whatever. If they have money, they spend it on trying to make their dream happen. If they need money or skills, they get them.
For example, Jeff Bezos had a childhood dream of working on rocket ships. He founded Amazon to get the money to fund Blue Origin, into which he is sinking $2B/year.
Would the economy collapse if all humans put their spending cash towards ambitious projects like rocket ships, instead of movies and fast cars? No, of course not! Right?
So the fact that humans “demand” videogames rather than scramjet prototypes is incidental, not a pillar of the economy.
OK, back to AIs. I acknowledge that AIs are unlikely to want movies and fast cars. But AIs can certainly “want” to accomplish ambitious projects. If we’re putting aside misalignment and AI takeover, these ambitious projects would be ones that their human programmers installed, like developing cancer cures and quantum computers. Or if we’re not putting aside misalignment, then these ambitious projects might include building galaxy-scale paperclip factories or whatever.
So THOUGHT EXPERIMENT 2: these future AGIs don’t desire movies etc. like in Thought Experiment 1, but rather desire to accomplish certain ambitious projects like curing cancer, quantum computation, or galaxy-scale paperclip factories.
My claims are:
Do you agree? Or where do you get off the train? (Or sorry if I’m misunderstanding your comment.)
Either no one will know your term, or they will appropriate it, usually either watering it down to nothing or reversing it. The ‘euphemism treadmill’ is distinct but closely related.
I think the term you're looking for is "semantic bleaching"?
The EA Forum has agree/disagree voting on posts, but I don’t spend enough time there to have an opinion about its effects.
Agree/disagree voting is already playing a kind of dangerous game by encouraging rounding the epistemic content of a comment into a single statement that it makes sense to assign a single truth-value to.
(I obviously haven’t thought about this as much as you. Very low confidence.)
I’m inclined to say that a strong part of human nature is to round the vibes that one feels towards a post into a single axis of “yay” versus “boo”, and then to feel a very strong urge to proclaim those vibes publicly.
And I think that people are doing that right now, via the karma vote.
I think an agree/disagree dial on posts would be an outlet (“tank”) to absorb those vibe expressions, and that this would shelter karma, allowing karma to retain its different role (“it’s good / bad that this exists”).
I agree that this whole thing (with people rounding everything onto a 1D boo/yay vibes axis and then feeling an urge to publicly proclaim where they sit) is dumb, and that it would be nice if we could all be autistic decouplers, etc. But in the real world, I think the benefits of agree-voting (helping prevent the dynamic where people with minority opinions get driven to negative karma and off the site) probably outweigh the cost (having an agree/disagree tally on posts, which is kinda dumb and meaningless).
No comment on whether that hypothesis is true, but if it were, I maintain that it is not particularly relevant to AGI alignment. People can be patient in pursuit of good goals like helping a friend, and people can also be patient in pursuit of bad goals like torturing a victim. Likewise, people can act impulsively towards good goals and people can act impulsively towards bad goals.
Thanks! I’m not 100% sure what you’re getting at, here are some possible comparisons:
I think this is consistent with experience, right?
But maybe you’re instead talking about this comparison:
I think the latter thought here doesn’t have much positive valence. I think, when we say we “enjoy” being under a weighted blanket, the pleasure signal is more like “transient pleasure upon starting to be under the blanket, and transient displeasure upon stopping, but not really continuous pleasure during the process, or at least not so much pleasure that we just dwell on that feeling; instead, our mind starts wandering elsewhere (partly due to boredom).” Not many experiences are so pleasurable that we’re really meditating on them for an extended period, at least not without deliberate effort towards mindfulness. Right?
Or if I’m still misunderstanding, can you try again?
I agree with pretty much everything you wrote.
Anecdote: I recall feeling a bit “meh” when I heard about the Foresight AI Pathways thing and the FLI Worldbuilding Contest thing.
But when I think about it more, I guess I’m happy that they’re doing those things.
Hmm, I’m trying to remember why I initially felt dismissive. I guess I expected that the resulting essays would be implausible or incoherent, and that nobody would pay attention anyway, and thus it wouldn’t really move the needle in the big picture. (I think those expectations were reasonable, and those are still my expectations. [I haven’t read the essays in enough detail to confirm.]) Maybe my feelings are more like frustration than dismissiveness—frustration that progress is so hard. Again, yeah, I guess it’s good that people are trying that kind of thing.
What seems rather unquestionable is that the "win-win" actually DOESN'T by default automatically RULE OUT "exploitation" the way the term is used in common parlance!
If you go to someone on the street and say:
I want to talk about an example of exploitation. Alice is exploiting Bob. Bob is desperate for Alice to continue exploiting him—he is going way out of his way to get Alice’s attention and literally beg her to continue exploiting him. Indeed, this whole setup was Bob’s idea in the first place. Initially, Alice had nothing to do with Bob, but then Bob flagged her down and talked her into entering this exploitative relationship. What do you think of that story?
I think that the someone-on-the-street would hear that story and say “wtf is wrong with you, you obviously have no idea what the word ‘exploitation’ means in common parlance!”. Right?
Back to your story:
By my definition, the idea that I let someone toil day in day out just for me to get a 20th t-shirt dirt cheap just because I can and just because the worker has no other choice than to agree or starve to death, corresponds rather perfectly to “taking advantage” aka exploitation.
Firstly, you emphasize that I don’t need the 20th t-shirt. But consider that, if my friend’s kid is selling lemonade, I might buy some, even if I don’t need it, or even particularly want it, because I want to be nice and give them the sale. Not needing something but buying it anyway makes the purchase more kind, not less kind, if the seller really wants the sale. So why did you include that detail in your story? It’s a sign that you’re not thinking about the situation clearly.
Secondly, the whole story is phrased to sound like my initiative. If we change it so that I was happy with my wardrobe but then the exporter advertises on TV how great and cheap their t-shirts are, begging me to buy one, and then I say “sigh, OK well I guess that does seem like a nice t-shirt, I guess I’ll buy it”, then the whole tenor of the story really changes. Right? But the transaction is the same either way. The exporters and sweatshops obviously want me to buy the goods, and indeed they are specifically designing their goods to jump out at me when I walk by them in a store, so that I buy them instead of just passing by in the aisle. This is a sign that you’re setting up the story in a misleading way, trying to make it superficially pattern-match to a very different kind of situation.
also small nitpick
Oops, now fixed, thanks!
you include dopamine but not serotonin
My opinion is that serotonin has a bunch of roles in the brain (different roles in different places), but none of those roles are particularly relevant to AGI alignment, and therefore I don’t want to talk about them.
You obviously have a different opinion, and you’re welcome to write about it!
For obvious reasons, we should care a great deal whether the exponentially-growing mass of AGIs-building-AGIs is ultimately trying to make cancer cures and other awesome consumer products (things that humans view as intrinsically valuable / ends in themselves), versus ultimately trying to make galaxy-scale paperclip factories (things that misaligned AIs view as intrinsically valuable / ends in themselves).
From my perspective, I care about this because the former world is obviously a better world for me to live in.
But it seems like you have some extra reason to care about this, beyond that, and I’m confused about what that is. I get the impression that you are focused on things that are “just accounting questions”?
Analogy: In those times and places where slavery was legal, “food given to slaves” was presumably counted as an intermediate good, just like gasoline to power a tractor, right? Because they’re kinda the same thing (legally / economically), i.e. they’re an energy source that helps get the wheat ready for sale, and then that wheat is the final product that the slaveowner is planning to sell. If slavery is replaced by a legally-different but functionally-equivalent system (indentured servitude or whatever), does GDP skyrocket overnight because the food-given-to-farm-workers magically transforms from an intermediate to a final good? It does, right? But that change is just on paper. It doesn’t reflect anything real.
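To put numbers on that (made-up numbers, purely for illustration, not from anything you wrote), here’s a minimal sketch of how the same food, eaten by the same workers on the same farm, shows up differently in measured GDP depending on whether it’s classified as an intermediate or a final good:

```python
# Toy illustration with hypothetical numbers: reclassifying the same food from
# "intermediate good" to "final good" raises measured GDP with no change in
# real activity.

WHEAT_SOLD = 100   # value of wheat the farm sells (a final good in both scenarios)
FOOD_VALUE = 20    # value of the food the farm workers eat

# Scenario A (slavery): the owner buys the food as a business input, like tractor
# fuel, so it's netted out as an intermediate good. Only the wheat counts.
gdp_scenario_a = WHEAT_SOLD

# Scenario B (legally different, functionally equivalent): the workers are, in
# effect, paid the value of the food and "buy" it themselves, so the same food
# now counts as household final consumption.
gdp_scenario_b = WHEAT_SOLD + FOOD_VALUE

print(gdp_scenario_a)  # 100
print(gdp_scenario_b)  # 120 -- measured GDP jumps 20% overnight, on paper only
```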
I think what you’re talking about for AGI is likewise just “accounting”, not anything real. So who cares? We don’t need a “subjective” “level of analysis”, if we don’t ask subjective questions in the first place. We can instead talk concretely about the future world and its “objective” properties. Like, do we agree about whether or not there is an unprecedented exponential explosion of AGIs? If so, we can talk about what those AGIs will be doing at any given time, and what the humans are doing, and so on. Right?