Yeah, I agree they aren't structurally identical. Although I tend to doubt how much the structural differences between deep neural nets and human brains matter. We don't actually have a non-arbitrary way to quantify how different two intelligent systems are internally.
It seems like you’re assuming that the qualitative character of an emotion has to derive from its evolutionary function in the ancestral environment, or something. But this is weird because you could imagine two agents that are structurally identical now but with different histories. Intuitively I’d think their qualia should be the same. So it still seems plausible to me that Bing really is experiencing some form of anger when it produces angry text.
Are you saying "I personally approve of.." is the primitive, unqualified meaning of "should"?
At the very least it's part of the unqualified meaning. Moral realists mean something more by it, or at least claim to do so.
Even if you can’t do anything directly useful with unattainable truth, you can at least get a realistic idea of your limitations.
Okay. I think it's probably not the most effective way to do this in most cases.
But usefulness doesn't particularly justify correspondence-truth.
Neither I nor Rorty are saying that it does.
Using which definition of "should"? Obviously by the pragmatic definition...
No, I mean it in the primitive, unqualified sense of "should." Otherwise it would be a tautology. I personally approve of people solely caring about instrumental rationality.
Yes, which means it can't be usefully implemented, which means it's something you shouldn't pursue according to pragmatism.
I don't think it can be implemented at all; people just imagine that they are implementing it, but on further inspection they're adding in further non-epistemic assumptions.
I'm sort of fine with keeping the concepts of truth and usefulness distinct. While some pragmatists have tried to define truth in terms of usefulness (e.g. William James), others have said it's better to keep truth as a primitive, and instead say that a belief is justified just in case it's useful (Richard Rorty; see esp. here).
It's not obvious that winning is what you should be doing, because there are many definitions of "should". It's what you should be doing according to instrumental rationality... but not according to epistemic rationality.
Well, part of what pragmatism is saying is that we should only care about instrumental rationality and not epistemic rationality. Insofar as epistemic rationality is actually useful, instrumental rationality will tell you to be epistemically rational.
It also seems that epistemic rationality is pretty strongly underdetermined. Of course the prior is a free parameter, but you also have to decide which parts of the world you want to be most correct about. Not to mention anthropics, where it seems the probabilities are just indeterminate and you have to bring in values to determine what betting odds you should use. And finally, once you drop the assumption that the true hypothesis is realizable (contained in your hypothesis space) and move to something like infra-Bayesianism, now you need to bring in a distance function to measure how "close" two hypotheses are. That distance function is presumably going to be informed by your values.
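To make the "prior is a free parameter" point concrete, here is a minimal sketch (all numbers and the two-hypothesis setup are invented for illustration): two agents update on exactly the same evidence, but because Bayes gives no rule for choosing the prior, they end up with different posteriors.

```python
# Toy example (hypothetical setup): the hypothesis space is just two coin
# biases, p = 0.5 (fair) or p = 0.9 (biased). Bayes fixes how to update,
# but not where to start.

def posterior_biased(prior_biased, heads, tails):
    """Posterior probability that p = 0.9 after observing heads/tails."""
    like_biased = 0.9 ** heads * 0.1 ** tails
    like_fair = 0.5 ** heads * 0.5 ** tails
    num = prior_biased * like_biased
    return num / (num + (1 - prior_biased) * like_fair)

# Identical data (7 heads, 3 tails), different priors:
agnostic = posterior_biased(0.5, 7, 3)    # ~0.33
skeptical = posterior_biased(0.01, 7, 3)  # ~0.005
print(agnostic, skeptical)  # same evidence, substantially different posteriors
```

Nothing inside the formalism adjudicates between the two starting points; that choice has to come from somewhere else.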
Say you have two agents, Rorty and Russell, who have ~the same values except that Rorty only optimizes for winning, and Russell optimizes for both winning and having “true beliefs” in some correspondence theory sense. Then Rorty should just win more on average than Russell, because he’ll have the winning actions/beliefs in cases where they conflict with the truth maximization objective, while Russell will have to make some tradeoff between the two.
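The dominance argument can be sketched as a toy decision problem (the action names, payoffs, and Russell's weight on truth are all made up for illustration): since Rorty picks whatever maximizes winning directly, his chosen action can never score lower on winning than Russell's.

```python
# Hypothetical payoffs: each action yields (winning payoff, degree of
# correspondence-truth of the resulting beliefs).
actions = {
    "comforting_myth": (10, 0.2),
    "harsh_truth": (6, 1.0),
    "mixed": (8, 0.6),
}

# Rorty optimizes winning only.
rorty = max(actions, key=lambda a: actions[a][0])

# Russell trades off winning against truth with some weight lam.
lam = 6.0
russell = max(actions, key=lambda a: actions[a][0] + lam * actions[a][1])

print(rorty, russell)
# Rorty's winning payoff dominates by construction:
print(actions[rorty][0] >= actions[russell][0])
```

With these (invented) numbers Russell takes the truth-preserving action and pays for it in winning; whenever the two objectives don't conflict, the agents act identically.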
Now maybe your values just happen to contain something like “having true beliefs in the correspondence theory sense is good.” I’m not super opposed to those kinds of values, although I would caution that truth-as-correspondence is actually hard to operationalize (because you can’t actually tell from sense experience whether a belief is true or not) and you definitely need to prioritize some types of truths over others (the number of hairs on my arm is a truth, but it’s probably not interesting to you). So you might want to reframe your truth-values in terms of “curiosity” or something like that.
Assume that pre-training has produced a model that optimizes for the pre-training loss and is one of the above types.
As you note, this is an important assumption for the argument, and I think it's likely false, at least for self-supervised pre-training tasks. I don't think LLMs for example are well-described as "optimizing for" low perplexity at inference time. It's not even clear to me what that would mean since there is no ground truth next token during autoregressive generation, so "low perplexity" is not defined. Rather, SGD simply produces a bundle of heuristics defining a probability distribution that matches the empirical distribution of human text quite well.
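The distinction can be made concrete with a toy sketch (the three-token "model" below is a stand-in for a trained network, not a real LM): perplexity is only defined relative to given target tokens, while free generation just samples from the model's distribution with no targets in sight.

```python
import math
import random

vocab = ["a", "b", "</s>"]

def next_token_dist(prefix):
    # Stand-in for a trained network's conditional output distribution.
    return {"a": 0.5, "b": 0.3, "</s>": 0.2}

def perplexity(tokens):
    """Requires ground-truth tokens: exp of average negative log-likelihood."""
    nll = 0.0
    for i, tok in enumerate(tokens):
        nll -= math.log(next_token_dist(tokens[:i])[tok])
    return math.exp(nll / len(tokens))

def generate(max_len=10):
    """No targets here: we just sample, so there is no perplexity to minimize."""
    out = []
    while len(out) < max_len:
        r, acc = random.random(), 0.0
        for tok, p in next_token_dist(out).items():
            acc += p
            if r < acc:
                out.append(tok)
                break
        if out[-1] == "</s>":
            break
    return out

print(perplexity(["a", "b", "</s>"]))  # defined: we supplied the targets
print(generate())                      # pure sampling, no loss in the loop
```

The training objective shapes `next_token_dist`, but at inference time nothing in `generate` is "optimizing" that objective.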
I do think your argument may apply to cases where you pre-train on one RL task and fine-tune on another, although even there it's unclear.
FWIW it appears that out of the four differences you cited here, only one of them (the relaxation of the requirement that the scrubbed output be the same) still holds as of this January paper from Geiger's group https://arxiv.org/abs/2301.04709. So the methods are even more similar than you thought.