James Diacoumis

Lead Data Scientist at Quantium.

PhD in Theoretical Physics / Cosmology.

Views my own, not my employer's.

Comments
What, if not agency?
James Diacoumis · 2d · 10

While I think reference problems do defeat specific arguments a computational-functionalist might want to make, I think my simulated upload's references can be reoriented with only a little work.  I do not yet see the argument for why highly capable self-preservation should take particularly long for AIs to develop.

I think you’re spot on with this. If you gave an AI system signals tied to, e.g., CPU temperature and battery health, and trained it with objectives that make those variables matter, it would “care” about them in the same causal-role functional sense in which the sim cares about simulated temperature.
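As a toy illustration (my own sketch, with invented variables, thresholds and dynamics, and a hard-coded greedy policy standing in for anything actually learned), an objective tied to those signals is enough to make an agent regulate them:

```python
# Toy sketch (invented numbers throughout): the objective rewards task progress
# but penalises overheating and a flat battery, so the agent ends up regulating
# its "body" variables purely because the objective makes them matter.

def objective(cpu_temp, battery, progress):
    overheating = max(0.0, cpu_temp - 70.0)           # penalise temps above 70C
    flat_battery = max(0.0, 0.2 - battery) * 10.0     # penalise charge below 20%
    return progress - overheating - flat_battery

def step(state, action):
    cpu_temp, battery, progress = state
    if action == "work":        # heats the chip, drains power, makes progress
        return (cpu_temp + 4.0, battery - 0.03, progress + 1.0)
    else:                       # "throttle": cools down, small power cost
        return (max(25.0, cpu_temp - 6.0), battery - 0.01, progress)

# A hard-coded greedy one-step policy stands in for a learned one.
state = (60.0, 0.5, 0.0)
for t in range(12):
    action = max(["work", "throttle"], key=lambda a: objective(*step(state, a)))
    state = step(state, action)
    print(t, action, tuple(round(x, 2) for x in state))
```

The agent alternates between working and throttling to keep the temperature near the threshold, which is the sense in which it "cares" about the variable.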

This is a consequence of teleosemantics (which I can see is a topic you’ve written a lot about!)

If I imagine that I am immune to advertising, what am I probably missing?
Answer by James Diacoumis · Sep 04, 2025 · 30

The idea that advertising needs to be strongly persuasive to work is a deeply embedded myth based on a misunderstanding of consumer dynamics. Instead, it works as a kind of ‘nudge’ that shifts consumers in a particular direction.

In practice, most consumers are not 100% loyal to a particular brand, so they don’t need to be strongly persuaded to switch. They typically cycle through a repertoire of safe products based on which price promotions are available that week, and so on. The goal is to ‘nudge’ them to buy your product somewhat more often within that repertoire, reinforce your product’s place in it, and potentially get other consumers to trial adding it to theirs.

See the paper here; the relevant quote below puts it much more eloquently than I can:

There is instead scope for advertising to 

(1) reinforce your brand's customers' existing propensities to buy it as one of several, 

(2) 'nudge' them to perhaps buy it somewhat more often, and 

(3) get other consumers perhaps to add your brand as an extra or substitute brand to their existing brand repertoire (first usually on a 'trial' basis - 'I might try that' - rather than already strongly convinced or converted)
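To make the ‘nudge’ mechanism concrete, here is a minimal toy simulation (my own illustration, not from the paper, with made-up propensity weights): advertising is modelled as a small bump to one brand’s choice weight within an otherwise stable repertoire, and that brand’s share moves a couple of points without anyone being “converted”.

```python
# Toy repertoire model: each consumer picks from a small set of "safe" brands
# with some propensity weights; advertising nudges one brand's weight slightly.
import random

random.seed(0)
baseline = {"A": 1.0, "B": 1.0, "C": 1.0}   # equal propensities within the repertoire

def market_share(weights, n_consumers=2000, purchases=20):
    counts = {b: 0 for b in weights}
    brands, w = list(weights), list(weights.values())
    for _ in range(n_consumers):
        for _ in range(purchases):
            counts[random.choices(brands, weights=w)[0]] += 1
    total = sum(counts.values())
    return {b: round(c / total, 3) for b, c in counts.items()}

print("no advertising:", market_share(baseline))
nudged = dict(baseline, A=1.1)               # a modest nudge to brand A's propensity
print("with a nudge:  ", market_share(nudged))
```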

Will Any Crap Cause Emergent Misalignment?
James Diacoumis · 20d · 43

Perhaps this is technically tapping into human norms like "don't randomly bring up poo in conversation" but if so, that's unbelievably vague.

I think this explanation is likely correct on some level.

I made a post here which goes into more detail, but the core idea is that there’s no “clean” separation between normative domains (aesthetic, moral, social, etc.), and the model has to learn about all of them through a single loss function, so everything gets tangled up.

Aesthetic Preferences Can Cause Emergent Misalignment
James Diacoumis · 22d · 91

This is a super interesting result!

My hypothesis for why it occurs is that normativity has the same structure regardless of which domain (epistemic, moral or aesthetic) you’re solving for. As soon as you have a utility function you’re optimising for, it creates an “ought” that the model needs to aim at. Consider the following sentences:

  • Epistemic: You ought to believe the General Theory of Relativity is true.
  • Moral: You ought not to act in a way that causes gratuitous suffering.
  • Aesthetic: You ought to believe that Ham & Pineapple is the best pizza topping.

The point is that the model is only optimising a single utility function. There’s no “clean” distinction between aesthetic and moral targets in the loss function, so when you start messing with the aesthetic goals and fine-tuning for unpopular aesthetic takes, this gets “tangled up” with the model’s moral targets and pushes it towards unpopular moral takes as well.
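Here is a deliberately tiny sketch of that entanglement (my own toy model with an invented vocabulary, not the actual fine-tuning experiment): a single logistic model is trained with one loss over sentences from both domains, so flipping the aesthetic label also moves the moral output, because the normative vocabulary is shared.

```python
# Toy sketch, not the paper's setup: one shared set of weights, one loss over
# mixed-domain sentences, so fine-tuning on a contrarian *aesthetic* label also
# drags the *moral* output via shared normative vocabulary ("is", "objectively").
import numpy as np

vocab = ["pineapple", "best", "topping", "suffering", "wrong", "is", "objectively"]

def feat(text):
    words = text.split()
    return np.array([float(w in words) for w in vocab])

def sgd(w, X, y, steps, lr=0.5):
    for _ in range(steps):
        p = 1 / (1 + np.exp(-X @ w))            # sigmoid
        w = w - lr * X.T @ (p - y) / len(y)     # gradient of the logistic loss
    return w

def endorse(w, x):
    return float(1 / (1 + np.exp(-x @ w)))

aesthetic = feat("pineapple is objectively best topping")
moral     = feat("causing suffering is objectively wrong")

# Pre-training: one model, one loss, endorse both the popular aesthetic take
# and the moral claim.
w = sgd(np.zeros(len(vocab)), np.stack([aesthetic, moral]),
        np.array([1.0, 1.0]), steps=1000)
print("moral endorsement before fine-tuning:", round(endorse(w, moral), 3))

# Fine-tune *only* on the aesthetic sentence with the label flipped
# (i.e. train the model to hold the unpopular aesthetic take).
w = sgd(w, aesthetic[None, :], np.array([0.0]), steps=3000)
print("moral endorsement after fine-tuning: ", round(endorse(w, moral), 3))
# The moral output drops too: the loss has no aesthetic/moral firewall.
```

The only thing doing the work is that both sentences lean on the same weights; there is no separate parameter block for aesthetics that fine-tuning could touch in isolation.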

Against functionalism: a self dialogue
James Diacoumis · 1mo · 30

As a clarification, I'm working with the following map: 

  1. Abstract functionalism (or computational functionalism) - the idea that consciousness is equivalent to computations or abstractly instantiated functions.
  2. Physical functionalism (or causal-role functionalism) - the idea that consciousness is equivalent to physically instantiated functions at a relevant level of abstraction. 

I agree with everything you've written against 1) in this comment and the other comment, so I'll focus on defending 2).

If I understand the crux of your challenge to 2), you're essentially saying that once we admit physical instantiation matters (e.g. cosmic rays can affect computations, steel and bird wings have different energy requirements), we're on a slippery slope, because each physical difference we admit further constrains what counts as the "same function" until we're potentially only left with the exact physical system itself. Is this an accurate gloss of your challenge?

Assuming it is, I have a couple of responses:

I actually agree with this to an extent. There will always be some important physical differences between states unless they're literally physically identical at a token level. The important thing is to figure out which level of abstraction is relevant for the particular "thing" we're trying to pin down. We shouldn't commit ourselves to insisting that systems which are not physically identical can't be grouped in a meaningful way. 

On my view, we shouldn't need an exact physical duplicate to reproduce the presence or absence of consciousness, because consciousness is so remarkably robust. The presence of consciousness persists over multiple time-steps in which all manner of noise, thermal fluctuations and neural plasticity occur. What changes is the content/character of consciousness - but consciousness persists because of robust higher-level patterns, not because of exact microphysical configurations.

And maybe, just maybe, you need to consider what the physical substrate actually does instead of writing down imperfect abstract mathematical approximations of it.

Again, I agree that not every physical substrate can support every function (I gave the example of combustion not being supported in steel above.) If the physical substrate prevents certain causal relations from occurring then this is a perfectly valid reason for it not to support consciousness. For example, I could imagine that it's physically impossible to build embodied robot AI systems which pass behavioural tests for consciousness because the energy constraints don't permit it or whatever. My point is that in the event where such a system is physically possible then it is conscious. 

To determine whether we actually converge or there's a fundamental difference in our views: would you agree that if it's possible in principle to build a silicon replica of a brain at whatever the relevant level of abstraction for consciousness is (whether coarse-grained functional level, neuron level, sub-neuron level or whatever), then the silicon replica would actually be conscious?

If you agree here, or if you insist that such a replica might not be physically possible to build, then I think our views converge. If you disagree, then I think we have a fundamental difference about what constitutes consciousness.

Against functionalism: a self dialogue
James Diacoumis · 1mo · 10

I think the physical functionalist could go either way on whether a physically embodied robot wouldn't be conscious. 


Just clarifying this. A physical functionalist could coherently maintain that it’s not possible to build an embodied AI robot because physics doesn’t allow it, similar to how a wooden rod can burn but a steel rod can’t. But assuming it is physically possible to build an embodied AI system which passes behavioural tests of consciousness (e.g. self-recognition, cross-modal binding, flexible problem solving), the physical functionalist would maintain that the system is conscious.

I think looking at how neurons actually work would probably resolve the disagreement between my inner A and S. Like, I do think that if we knew that the brain's functions don't depend on sub-neuron movements, then the neuron-replacement argument would just work

Out of interest, do you or @sunwillrise have any arguments or intuitions that the presence or absence of consciousness turns on sub-neuronal dynamics?

Consciousness appears across radically different neural architectures: octopuses with distributed neural processing in their arms, birds with a nucleated brain structure called the pallium which differs from the human cortex but has a similar functional structure, and even bumblebees, which are thought to possess some form of consciousness with far fewer neurons than humans. These examples exhibit coarse-grained functional similarities with the human brain but differ substantially at the level of individual neurons.

If sub-neuronal dynamics determined the presence or absence of consciousness, we’d expect minor perturbations to erase it. Instead, we’re able to lesion large brain regions whilst maintaining consciousness. Consciousness is also preserved when small sub-neuronal changes are applied to every neuron, such as when someone takes drugs like alcohol or caffeine; fever likewise alters reaction rates and dynamics in every neuron across the brain. This robustness indicates that the presence or absence of consciousness turns on coarse-grained functional dynamics rather than sub-neuronal dynamics.

Against functionalism: a self dialogue
James Diacoumis · 1mo · 40

I found this post pretty helpful for crystallising two distinct views that often get conflated. I’ll call them abstract functionalism and physical functionalism. The key confusion comes from treating these as the same view.

A function can be instantiated in two ways: abstractly and physically. On this view, there’s a meaningful difference between an abstract instantiation of a function, such as a disembodied truth table representing a NAND gate, and a physical instantiation of a NAND gate, e.g. on a circuit board with wires and voltages.
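To illustrate the sense of “abstract instantiation” I have in mind (a minimal sketch of my own):

```python
# An "abstract" NAND is exhausted by its input/output mapping; nothing here has
# voltages, currents or heat dissipation.
def nand(a: bool, b: bool) -> bool:
    return not (a and b)

# The abstract instantiation is just this truth table...
for a in (False, True):
    for b in (False, True):
        print(int(a), int(b), "->", int(nand(a, b)))
# ...whereas a physical NAND gate on a circuit board realises the same mapping
# via transistor physics, which the abstraction deliberately leaves out.
```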

When S argues:

The causal graph of a bat hitting a ball might describe momentum and position, but if you re-create that graph elsewhere (e.g. on a computer or some scaled) it won't have that momentum or velocity

They’re right that the abstract function leaves out some critical physical properties. A simulation of momentum transfer doesn’t actually transfer momentum. But this doesn’t defeat functionalism; it just shows that an abstract instantiation of the function is not enough.

For example, consider a steel wing and a bird’s wing generating lift. The steel wing has vastly different kinetic energy requirements, but the aerodynamics still work because steel can support the function. Contrast this with combustion - steel can’t burn like wood because it lacks the right chemical energy profile.

When A asks:

Do you claim that, if I started replacing neurons in your brain with stuff that is functionally the same, wrt. the causal graph of consciousness, you'd feel no difference? You'd still be conscious  in the same way?

They’re appealing to the intuition that physically instantiated functional replicas of neurons would preserve consciousness.

The distinction matters because people often use the “simulations lack physical properties” argument to dismiss abstract functionalism, and then tie themselves in knots over whether a physically embodied AI robot could be conscious, when they haven’t defeated physical functionalism.

Moral realism - basic Q
Answer by James Diacoumis · Jul 22, 2025 · 80

The most coherent formulation that I’ve seen is from Terence Cuneo’s The Normative Web. The basic idea is that moral norms have the same ontological status as epistemic norms. 

Unpacking this a little, when we’re talking about epistemic norms we’re making a claim about what someone ought to believe. For example:

  • You ought to believe the Theory of General Relativity is true.
  • You ought not to believe that there is a dragon in your garage if there is no evidence.

When we say ought in the sentences above we don’t mean it in some empty sense. It’s not a matter of opinion whether you ought to form beliefs according to good epistemic practices. The statements have some normative bite to them. You really ought to form beliefs according to good epistemic practices.

Moral norms can be cast in a similar vein. For example:

  • You ought to behave in a way which promotes wellbeing.
  • You ought not to behave in a way which causes gratuitous suffering.

The moral statements above have the same structure as the epistemic statements. When I say you really ought not to believe epistemically unjustified thing X, this is the same as saying you really ought not to behave in morally unjustified way Y.


There are some objections to the above:

  • You could argue that epistemic norms reliably track truth, whereas moral norms reliably track something else, like wellbeing, which you need an additional evaluative function to tell you is “good.”

The point is that you also technically need this for epistemic norms. Some really obtuse person could always come along and ask you to justify why truth-seeking is “good” and you’d have to rely on some external evaluation that seeking truth is good because XYZ. 

  • The standard formulation of epistemic and moral norms is “non-naturalist” in the sense that these norms cannot be deduced from natural facts. This is a bit irksome if we have a naturalist worldview and want to avoid positing any “spooky” entities.  

Ultimately I’m pretty skeptical that we need these non-natural facts to ground normative facts. If what we mean by really ought in the above is that there are non-natural normative facts sitting over and above the natural facts, then maybe the normative statements above don’t really have any “bite” to them. As noted in some of the other comments, the word really is doing a lot of heavy lifting in all of this.

On the functional self of LLMs
James Diacoumis · 2mo · 30

Makes sense - I think this is a reasonable position to hold given the uncertainty around consciousness and qualia. 

Thanks for the really polite and thoughtful engagement with my comments and good luck with the research agenda! It’s a very interesting project and I’d be interested to see your progress. 

On the functional self of LLMs
James Diacoumis · 2mo · 20

Possibly by 'functional profile' you mean something like what a programmer would call 'implementation details', ie a change to a piece of code that doesn't result in any changes in the observable behavior of that code?

Yes, this is a fair gloss of my view. I'm referring to the input/output characteristics at the relevant level of abstraction. If you replaced a group of neurons with silicon that perfectly replicated their input/output behavior, I'd expect the phenomenology to remain unchanged.
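A programmer's toy version of the same point (my own illustration, nothing to do with the post's actual code): two implementations with identical input/output behaviour share a functional profile, and swapping one for the other is invisible at this level of abstraction.

```python
# Two different implementations of the same input/output mapping (population
# count of an 8-bit integer): one loops over bits, one uses a lookup table.
def popcount_loop(n: int) -> int:
    count = 0
    while n:
        count += n & 1
        n >>= 1
    return count

POPCOUNT_TABLE = {i: bin(i).count("1") for i in range(256)}

def popcount_table(n: int) -> int:
    return POPCOUNT_TABLE[n]

# Identical behaviour at the functional level, despite different "implementation
# details" underneath.
assert all(popcount_loop(i) == popcount_table(i) for i in range(256))
```

Swapping one for the other changes nothing observable from outside the function, which is the sense in which I'd expect neuron-for-silicon replacement with perfectly matched input/output behaviour to leave phenomenology unchanged.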

The quoted passage sounds to me like it's saying, 'if we make changes to a human brain, it would be strange for there to be a change to qualia.' Whereas it seems to me like in most cases, when the brain changes -- as crudely as surgery, or as subtly as learning something new -- qualia generally change also.

Yes, this is a great point. During surgery you're changing the input/output of significant chunks of neurons, so you'd expect qualia to change. Similarly, with learning you're adding input/output connections via neural plasticity. This gets at something I'm driving at: in practice, the functional and phenomenal profiles are so tightly coupled that a change in one corresponds to a change in the other. If we lesion part of the visual cortex, we expect a corresponding loss of visual experience.

For this project, we want to retain a functional idea of self in LLMs while remaining agnostic about consciousness. But if this genuinely captures some self-like organisation, then either:

  1. It's implemented via input/output patterns similar enough to humans that we should expect associated phenomenology, or
  2. It's implemented so differently that calling it "values," "preferences," or "self" risks anthropomorphism.

If we want to insist the organisation is genuinely self-like, then I think we should resist agnosticism about phenomenal consciousness (although I understand it makes sense to bracket it from a strategic perspective so people take the view more seriously).

Posts
2 · James Diacoumis's Shortform · 7mo · 1
3 · The Other Alignment Problems: How epistemic, moral and aesthetic norms get entangled · 20d · 0
6 · Moral gauge theory: A speculative suggestion for AI alignment · 7mo · 2
14 · The Functionalist Case for Machine Consciousness: Evidence from Large Language Models · 8mo · 24