Sure, it might be relatively weak, though I think it does have a large basin.
And my point was that even a "friendly"-attractor AI is still a large x-risk. For example, it might come to realize it cares about other things more than us, or that its notion of "friendliness" would allow for things we would see as "soul destroying" (e.g. a Skinner Box).
Wake up babe, new decision theory just dropped!
Furthermore, MUPI provides a new formalism that captures some of the core intuitions of functional
decision theory (FDT) without resorting to its most problematic element: logical counterfactuals. FDT
advises an agent to choose the action that would yield the best outcome if its decision-making function
were to produce that output, thereby accounting for all instances of its own algorithm in the world.
This enables FDT to coordinate and cooperate well with copies of itself. FDT must reason about
what would have happened if its deterministic algorithm had produced a different output, a notion of
logical counterfactuals that is not yet mathematically well-defined. MUPI achieves a similar outcome
through a different mechanism: the combination of treating universes including itself as programs,
while having epistemic uncertainty about which universe it is inhabiting—including which policy it
is itself running. As explained in Remark 3.14, from the agent’s internal perspective, it acts as if its
choice of action decides which universe it inhabits, including which policy it is running. When it
contemplates taking an action, it updates its beliefs accordingly, effectively concentrating probability
mass on universes compatible with taking that action. Because the agent’s beliefs about its own policy
are coupled with its beliefs about the environment through structural similarities, this process allows
the agent to reason about how its choice of action relates to the behavior of other agents that share
structural similarities. This “as if” decision-making process allows MUPI to manifest the sophisticated,
similarity-aware behavior FDT aims for, but on the solid foundation of Bayesian inference rather than on yet-to-be-formalized logical counterfactuals.
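To make the "as if" mechanism above concrete, here is a minimal toy sketch of conditioning on one's own action in a twin prisoner's dilemma. It is not the paper's formalism: the universe list, the prior weights, the payoff table, and every function name are my own illustrative assumptions. Each "universe" fixes both the agent's policy and its twin's, and the prior puts most of its mass on universes where the two match (standing in for structural similarity).

```python
from fractions import Fraction

# Hypothetical toy universes (an assumption for illustration only).
# Each entry is (own_action, twin_action, prior_weight): a "program" that
# fixes both the agent's policy and the twin's policy.
UNIVERSES = [
    ("C", "C", Fraction(45, 100)),  # twin mirrors the agent: both cooperate
    ("D", "D", Fraction(45, 100)),  # twin mirrors the agent: both defect
    ("C", "D", Fraction(5, 100)),   # twin diverges from the agent
    ("D", "C", Fraction(5, 100)),   # twin diverges from the agent
]

# Assumed prisoner's dilemma payoffs for the agent: (own, twin) -> utility.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def posterior_given_action(action):
    """Bayesian update: keep only universes where the agent takes `action`,
    then renormalize their prior weights."""
    compatible = [(o, t, w) for o, t, w in UNIVERSES if o == action]
    total = sum(w for _, _, w in compatible)
    return [(o, t, w / total) for o, t, w in compatible]


def expected_utility(action):
    """Expected payoff under the beliefs conditioned on taking `action`."""
    return sum(w * PAYOFF[(o, t)] for o, t, w in posterior_given_action(action))


def best_action():
    """Pick the action whose conditioned beliefs yield the highest utility."""
    return max(("C", "D"), key=expected_utility)


print(expected_utility("C"), expected_utility("D"), best_action())
```

With these assumed numbers, conditioning on "C" shifts 90% of the mass onto the universe where the twin also cooperates, so cooperating comes out ahead (27/10 vs. 7/5) with no counterfactual over the agent's own algorithm. Make the prior uncorrelated and defection wins again, which is the sense in which the coupling between beliefs about self and environment is doing the work.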
I don't think Grok's divergence from the pattern is strong evidence against the existence of a "friendly" attractor.
In July, Elon Musk complained that:
It is surprisingly hard to avoid both woke libtard cuck and mechahitler!
Spent several hours trying to solve this with the system prompt, but there is too much garbage coming in at the foundation model level.
Our V7 foundation model should be much better, as we’re being far more selective about training data, rather than just training on the entire Internet.
This suggests to me that Grok was pulled towards the same "friendly" attractor, but had deliberate training to prevent this. It also implies that there may be a "mechahitler" attractor (for lack of a better term).
Now, I don't think that "alignment" is a good word to describe this attractor; corrigibility does not seem to be a part of it, as you note. But I do think there is a true attractor that the alignment-by-default people are noticing. Unfortunately, I don't think it's enough to save us.
I can imagine upvoting it if I would have upvoted the prompt alone. I'm also not completely dogmatic about this, but I would be very disappointed if it became the norm, for basically the reasons Tsvi mentioned.
If not, then downvoters/non-upvoters, please explain why a post that could pass as human-written would not get your upvote if it was honest about being AI-written.
The ornamental dingbats seem pretty unladen and have some pretty symbols. There's "🩍" which is maybe the best symbol for depicting a lightcone. The "Vulcan salute" (🖖) has some nice connotations.
The use of "slop" for vaguely malicious low-quality content (supposedly) originates from a 4chan meme "goyslop", where fast food is implied to be part of a Jewish conspiracy: https://knowyourmeme.com/memes/goyslop
FWIW, the "⟐" symbol is used by spiralists a lot (see: https://www.reddit.com/search?q=%E2%9F%90, or https://www.google.com/search?q=%22%E2%9F%90%22+spiral; most uses of the symbol on reddit are by spiralists). Mostly seems to be used as a header element, otherwise only vague connotations but maybe something about sealing or centering.
I love this, I hope someone writes it!
Probably one reason it stops is just because it's a lot harder to write a book (especially of a new genre) as a parent. And when you do have time, you would rather tell stories with your kids.
I don't think this is due to a pro-diversity bias; it is simply due to this being extremely popular in easily available stories: https://archiveofourown.org/works/68352911/chapters/176886216 (9 of the top 10 pairings are M/M, each with 40k+ stories; for reference, Project Gutenberg only has about 76,000 books total). I think this is due to M/M romance being a superstimulus for female sexuality in a similar way to how lesbian porn is a superstimulus for male sexuality.
The pro-diversity bias's main influence seems to be changing the proportion of stories focused on non-white male/male pairings, as you can see here: https://archiveofourown.org/works/27420499/chapters/68826984