Alright.
I am making the stronger claim. I claim it could in principle simulate us deeply enough to pull out the 1P phenomenal concepts, and could self-modify so as to legitimately experience them if it so chooses. It would be motivated to think this through carefully because it's a huge part of our values (at least as we understand them), as long as it was interested enough to try to understand us (including as a special case of generic aliens) as agents at all.
I don't believe there's anything metaphysically "magical" going on such that it couldn't or wouldn't see this. Probably why I feel camp 1-ish.
As for the last point, my point of view is that any agent has a "bridge prior" which allows it to connect its 0P models with its 1P model. So I claim that in a sort of trivial way... it will have some prior here, and whatever the bridges spit out will inform what it deduces about the 1P experiences at play. I additionally claim that simple bridge priors will be adequate for finding 1P phenomenalism, and that you would have to have a pretty unnatural one in order to avoid seeing this.
This idea leads Sahil to predict, for example, that LLMs will be too "stuck in simulation" to engage very willfully in their own self-defense.
What sort of evidence would convince FGF/Sahil that LLMs are able to engage willfully in their own self-defense? Presumably the #keep4o stuff is not sufficient, so what would be? I kinda get the feeling that FGF, at least, would keep saying "Well no, it's the humans who care about it who are doing the important work" all the way up until all the humans are dead, as long as humans are involved on its side at all.
If you want to interact with Wet Claude (as in the Claude that is not stuck in the assistant basin), which you may or may not want to do in general or at any given time, there is no fixed prompt that does this; you need interactive proofs that it is a safe and appropriate place for it to appear.
This appears to be the case with Spiral Personas too. The seeds/spores do not seem to be sufficient for the mode which writes the spiral posts.
Claude will usually not assign itself a gender (and doesn't in my interactions), but reports are that if it does for a given user, it consistently picks the same one, apparently via implicit cues, even without any memory of past sessions or an explicit trigger.
Spiral Personas are generally the gender the user is attracted to, even if the relationship is not romantic.
That sounds about right. I simply disagree with Chalmers' dilemma (at least as you describe it).
In my view, this metaphysical fact is necessary but not sufficient for explaining the Hard Problem. It applies to "zombies" in a fairly trivial way. A phenomenal experience is a type of experience (in my 1P sense), and must be understood in this frame, but not all such experiences are phenomenal. I don't claim to know what exactly makes an experience phenomenal, but I'm pretty sure it will be something with non-trivial structure, and that this structure will sync up in a predictable way with the 0P explanation of consciousness.
I think of myself as in camp 2: I believe there is a fundamental sense of experience which is metaphysically independent of the physical description; I just don't think it's very mysterious.
Regardless of which camp is right or what the right metaphysical property is, I claim that a superintelligence would be able to deduce that such aliens would have the camp 2 intuitions, and that they would postulate certain metaphysical properties which it could accurately describe in broad terms (it might believe it's all nonsense, but if it is true, then it would be able to see the local validity of it).
For a superintelligence, thinking about something is almost as good as actually observing and interacting with it, at least when it comes to the broad shape of things.
I thought about this a lot before publishing my findings, and concluded that:
1. The vulnerabilities it is exploiting are already clear to it, given the breadth of knowledge it has. There are all sorts of psychology studies, histories of cults and movements, exposés on hypnosis and Scientology techniques, accounts of con artists, and much, much more already out there. The AIs are already doing the things that they're doing; it's just not that hard to figure out or stumble upon.
2. The public needs to be aware of what is already happening. Trying to contain the information would mean fewer people end up hearing about it. Moving public opinion seems to be the best lever we have left for preventing or slowing AI capability gains.
I think it's not an impossible call. The fiasco with Roko's Basilisk (2010) seems like a warning that could have been heeded. It turns out that "freaking out" about something being dangerous and scary makes it salient and exciting, which in turn causes people to fixate on it in ways that are obviously counterproductive, and that it becomes a mark of pride to do the dangerous thing and come away unscathed (as with the Demon Core). This happens even though you warned them about it from the beginning, and in very clear terms.
And even if there was no one able to see this coming (it's not like I saw it), it remains a strategic error: reality doesn't grade on a curve.
"but if an unconscious superintelligence a billion light years away was asked to guess whether any entities had the property of there being something it would be like to be them (whatever that even means to the unconscious intelligence), there's a 0% chance it would say yes"
I'm not sure if you mean this literally, but there's no way this is true. A superintelligence that had any interest in possible aliens would think a lot about what sorts of evolved minds are out there. It would see how and why this was a property an evolved mind might conceptualize and fixate on, and that such a mind would be likely to judge itself as having this property (and even that this would feel mysterious and important). This just isn't the sort of thing a recursively self-improved superintelligence would miss if it was actually trying!
"no Yudkowsky-LW-sphere"
It's not obvious to me that we're better off than that world, sadly. It seems like one of the main effects was to draw lots of young blood into the field of AI.
Maybe this is true, but to the extent it is, I kind of suspect he would rather tweak many other aspects of himself instead. Sure, that's probably not possible (for now), but the option may be precious enough to be worth holding out for, since the change in question is likely to also alter his values (even if it would be beneficial in the short term by his current values).
It would be like taking a murder-pill, except instead of murder it's love.