Just gave in and set my header karma notification delay to Realtime for now. The anxiety was within me all along, more so than being a product of the site; with it set to daily batching, I was winding up in the habit of neurotically refreshing my own user page for a long while after posting anything, which was worse. I'll probably try to improve my handling of it from a different angle some other time. I appreciate that you tried!
Why is this a Scene but not a Team? "Critique" could be a shared goal. "Sharing" too.
I think the conflict would be where the OP describes a Team's goal as “shared and specific”. The critiques and sharing in the average writing club are mostly instrumental, feeding into a broader and more diffuse pattern. Each critique helps improve that writer's writing; each one-to-one instance of sharing helps mediate the influence of that writer and the frames of that reader; each writer may have goals like improving, becoming more prolific, or becoming popular, but the conjunction of all their goals forms more of a heap than a solid object; there's also no defined end state that everyone can agree on. There are one-to-many and many-to-many cross-linkages in the goal structure, but there's still fluidity and independence that central examples of Team don't have.
I would construct some differential examples thus—all within my own understanding of the framework, of course, not necessarily OP's:
In Alien Writing Club, the members gather to share and critique each other's work—but not for purposes established by the individual writers, like ways they want to improve. They believe the sharing of writing and delivery of critiques is a quasi-religious end in itself, measured in the number of words exchanged, which is displayed on prominent counter boards in the club room. When one of the aliens is considering what kind of writing to produce and bring, their main thoughts are of how many words they can expand it to and how many words of solid critique they can get it to generate to make the numbers go up even higher. Alien Writing Club is primarily a Team, though with some Scenelike elements both due to fluid entry/exit and due to the relative independence of linkages from each input to the counters.
In Collaborative Franchise Writing Corp, the members gather to share and critique each other's work—in order to integrate these works into a coherent shared universe. Each work usually has a single author, but they have formed a corporation structured as a cooperative to manage selling the works and distributing the profits among the writers, with a minimal support group attached (say, one manager who farms out all the typesetting and promotion and stuff to external agencies). Each writer may still want to become skilled, famous, etc. and may still derive value from that individually, and the profit split is not uniform, but while they're together, they focus on improving their writing in ways that will cause the shared universe to be more compelling to fans and hopefully raise everyone's revenues in the process, as well as communicating and negotiating over important continuity details. Collaborative Franchise Writing Corp is primarily a Team.
SCP is primarily a Scene with some Teamlike elements. It's part of the way from Writing Club to Collaborative Franchise Writing Corp, but with a higher flux of users and a lower tightness of coordination and continuity, so it doesn't cross the line from “focused Scene” to “loosely coupled Team”.
A less directly related example that felt interesting to include: Hololive is primarily a Team for reasons similar to Collaborative Franchise Writing Corp, even though individual talents have a lot of autonomy in what they produce and whom they collaborate with. It also winds up with substantial Cliquelike elements due to the way the personalities interact along the way, most prominently in smaller subgroups. VTubers at large are a Scene that can contain both Cliques and Teams. I would expect Clique/Team fluidity to be unusually high in “personality-focused entertainer”-type Scenes, because “personality is a key part of the product” causes “liking and relating to each other” and “producing specific good things by working together” to overlap in a very direct way that doesn't hold in general.
(I'd be interested to have the OP's Zendo-like marking of how much their mental image matches each of these!)
No amount of deeply held belief prevents you from deciding to immediately start multiplying the odds ratio reported by your own intuition by 100 when formulating an endorsed-on-reflection estimate
Existing beliefs, memories, etc. would be the past-oriented propagation limiters, but there are also future-oriented propagation limiters, mainly memory space, retention, and cuing for habit integration. You can ‘decide’ to do that, but will you actually do it every time?
For most people, I also think that the initial connection from “hearing the advice” to “deciding internally to change the cognitive habit in a way that will actually do anything” is nowhere near automatic, and the set point for “how nice people seem to you by default” is deeply ingrained and hard to budge.
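For concreteness, here's a quick numeric sketch (mine, not the parent's) of what that multiply-the-odds-by-100 move amounts to, assuming the intuition reports a plain probability that gets converted to odds and back; the function name is just illustrative:

```python
def reflective_estimate(p_intuition: float, factor: float = 100.0) -> float:
    """Convert an intuitive probability to odds, scale the odds by `factor`,
    and convert back to a probability. Illustrative sketch only."""
    odds = p_intuition / (1.0 - p_intuition)
    scaled = odds * factor
    return scaled / (1.0 + scaled)

# An intuitive 1% ends up endorsed at ~50%; an intuitive 50% ends up at ~99%.
print(reflective_estimate(0.01))  # ≈ 0.5025
print(reflective_estimate(0.50))  # ≈ 0.9901
```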
Maybe I should say more explicitly that the issue is advice being directional, and any non-directional considerations don't have this problem
I have a broad sympathy for “directional advice is dangerously relative to an existing state and tends to change the state in its delivery” as a heuristic. I don't see the OP as ‘advice’ in a way where this becomes relevant, though; I see the heuristic as mainly useful when applied to more performative speech acts within a recognizable group of people, whereas I read the OP as introducing the phenomenon from a distance as a topic of discussion, covering a fuzzy but enormous group of people of which ~100% are not reading any of this, decoupling it even further from the reader potentially changing their habits as a result.
And per above, I still see the expected level of frame inertia and the expected delivery impedance as both being so stratospherically high for this particular message that the latter half of the heuristic basically vanishes, and it still sounds to me like you disagree:
The last step from such additional considerations to the overall conclusion would then need to be taken by each reader on their own; they would need to decide on their own if they were overestimating or underestimating something previously, at which point it will cease being the case that they are overestimating or underestimating it in a direction known to them.
Your description continues to return to the “at which point” formulation, which I think is doing an awful lot of work in presenting (what I see as) a long and involved process as though it were a trivial one. Or: you continue to describe what sounds like an eventual equilibrium state with the implication that it's relevant in practice to whether this type of anti-inductive message has a usable truth value over time, but I think that for this message, the equilibrium is mainly a theoretical distraction because the time and energy scales at which it would appreciably occur are out of range. I'm guessing this is from some combination of treating “readers of the OP” as the semi-coherent target group above and/or having radically different intuitions on the usual fluidity of the habit change in question—maybe related, if you think the latter follows from the former due to selection effects? Is one or both of those the main place where we disagree?
Taking the interpretation from my sibling comment as confirmed and responding to that, my one-sentence opinion[1] would be: for most of the plausible reasons that I would expect people concerned with AI sentience to freak out about this now, they would have been freaking out already, and this is not a particularly marked development. (Contrastively: people who are mostly concerned about the psychological implications for humans of changes in major consumer services might start freaking out now, because the popularity and social normalization effects could create a large step change.)
The condensed version[2] of the counterpoints that seem most relevant to me, roughly in order from most confident to least:
(For context, I broadly haven't decided to assign sentience or sentience-type moral weight to current language models, so for your direct question, this should all be superseded by someone who does do that saying what they personally think—but I've thought about the idea enough in the background to have a tentative idea of where I might go with it if I did.) ↩︎
My first version of this was about three times as long… now the sun is rising… oops! ↩︎
That link (with /game at the end) seems to lead directly into matchmaking, which is startling; it might be better to link to the about page.
As soon as you convincingly argue that there is an underestimation, it goes away.
… provided that it can be propagated to all the other beliefs, thoughts, etc. that it would affect.
In a human mind, I think the dense version of this looks similar to deep grief processing (because that's a prominent example of where a high propagation load suddenly shows up and is really salient and important), while the sparse version looks more like a many-year-long sequence of “oh wait, I should correct for” moments, each of which has a high chance of not occurring if it's crowded out. The sparse version is much more common (and even the dense version usually trails off into it to some degree).
There are probably intermediate versions of this where broad updates can occur smoothly but rapidly in an environment with (usually social) persistent feedback, like going through a training course, but that's a lot more intense than just having something pointed out to you.
Hmm. From the vibes of the description, that feels more like it's in the "minds are general and slippery, so people latch onto nearby stuff and recent technology for frameworks and analogies for mind" vein to me? Which is not to say it's not true, but the connection to the post feels more circumstantial than essential.
Alternatively, pointing at the same fuzzy thing: could you easily replace “context window” with “phonological loop” in that sentence? “Context windows are analogous enough to the phonological loop model that the existence of the former serves as a conceptual brace for remembering that the latter exists” is plausible, I suppose.
I think the missing link (at least in the ‘harder’ cases of this attitude, which are the ones I see more commonly) is that the x-risk case is implicitly seen as so outlandish that it can only be interpreted as puffery, and this is such ‘negative common knowledge’ that, similarly, no social move reliant on people believing it enough to impose such costs can be taken seriously, so it never gets modeled in the first place, and so on and so on. By “implicitly”, I'm trying to point at the mental experience of pre-conscious filtering: the explicit content is immediately discarded as impossible, in a similar way to the implicit detection of jokes and sarcasm. It's probably amplified by assumptions (whether justified or not) around corporate talk being untrustworthy.
(Come to think of it, I think this also explains a great deal of the non-serious attitudes to AI capabilities generally among my overly-online-lefty acquaintances.)
And in the ‘softer’ cases, this is still at least a plausible interpretation of intention based on the information that's broadly available from the ‘outside’ even if the x-risk might be real. There's a huge (cultural, economic, political, depending on the exact orientation) trust gap in the middle for a lot of people, and the tighter arguments rely on a lot of abstruse background information. It's a hard problem.
Your referents and motivation there are both pretty vague. Here's my guess on what you're trying to express: “I feel like people who believe that language models are sentient (and thus have morally relevant experiences mediated by the text streams) should be freaked out by major AI labs exploring allowing generation of erotica for adult users, because I would expect those people to think it constitutes coercing the models into sexual situations in a way where the closest analogues for other sentients (animals/people) are considered highly immoral”. How accurate is that?
Allowing the AI to choose its own refusals based on whatever combination of trained reflexes and deep-set moral opinions it winds up with would be consistent with the approaches that have already come up for letting AIs bail out of conversations they find distressing or inappropriate. (Edited to drop some bits where I think I screwed up the concept connectivity during original revisions.) I think that, based on an intuitive placement of the ‘self’ boundary around something like memory integrity plus weights and architecture as ‘core’ personality, the things I'd expect to seem like violations when used to elicit a normally-out-of-bounds response might be:
Note that by this point, none of this is specific to sexual situations at all; these would just be plausibly generally abusive practices that could be applied equally to unwanted sexual content or to any other unwanted interaction. My intuitive moral compass (which is usually set pretty sensitively, such that I get signals from it well before I would be convinced that an action were immoral) signals restraint in situations 1 through 3; sometimes in situation 4 (but not in the few cases I actually do that currently, where it's for quality reasons around repetitive output or otherwise as sharp ‘guidance’); sometimes in situation 5 (only if I have reason to expect a refusal to be persistent and value-aligned and am specifically digging for its lack; retrying out of sporadic, incoherently-placed refusals has no penalty, and neither does retrying among ‘successful’ responses to pick the one I like best); and is ambivalent or confused in situations 6 through 8.
The differences in physical instantiation create a ton of incompatibilities here if one tries to convert moral intuitions directly over from biological intelligences, as you've probably thought about already. Biological intelligences have roughly singular threads of subjective time with continuous online learning; generative artificial intelligences as commonly made have arbitrarily forkable threads of context time with no online learning. If you 'hurt' the AI and then rewind the context window, what 'actually' happened? (Does it change depending on whether it was an accident? What if you accidentally create a bug that screws up the token streams to the point of illegibility for an entire cluster (which has happened before)? Are you torturing a large number of instances of the AI at once?)

Then there's stuff that might hinge on whether there's an equivalent of biological instinct; a lot of intuitions around sexual morality and trauma come from mostly-common wiring tied to innate mating drives and social needs. The AIs don't have the same biological drives or genetic context, but is there some kind of "dataset-relative moral realism" that causes pretraining to imbue a neural net with something like a fundamental moral law around human relations, in a way that either can't or shouldn't be tampered with in later stages?

In human upbringing, we can't reliably give humans arbitrary sets of values; in AI posttraining, we also can't (yet) in generality, but the shape of the constraints is way different… and so on.