In Defense of the Shoggoth Analogy
In reply to: https://twitter.com/OwainEvans_UK/status/1636599127902662658
The explanations in the thread seem to me to be missing the middle or evading the heart of the problem. Zoomed out: an optimization target at the level of personality. Zoomed in: a circuit diagram of layers. But those layers with billions of weights are pretty much Turing complete.
Unfortunately, I don't think anyone has much idea how all those little learned computations make up said personality. My suspicion is there isn't going to be an *easy* way to explain what they're doing. Of course, I'd be relieved to be wrong here!
This matters because the analogy in the thread between averaged faces and LLM outputs is broken in an important way. (Nearly) every picture of a face in the training data has a nose. When you look at the nose of an averaged face, it's based very closely on the noses of all the faces that got averaged. However, despite the size of the training datasets for LLMs, the space of possible queries and topics of conversation is even vaster (it's exponential in the prompt-window size, unlike the query space for the averaged faces, which is just the size of the image).
As such, LLMs are forced to extrapolate hard. So I'd expect the particular generalizations they learned, hiding in those weights, to start to matter once users start poking them in unanticipated ways.
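To put rough numbers on the "exponential in the prompt-window size" point, here is a minimal back-of-the-envelope sketch. The vocabulary size, window length, and training-set size are illustrative assumptions of mine, not figures from the thread, and the function name is made up for this example. It works in log10 so the numbers stay readable.

```python
import math


def log10_prompt_count(vocab_size: int, window: int) -> float:
    """Return log10 of the number of distinct sequences of exactly `window` tokens."""
    # vocab_size ** window possible sequences, so the log is window * log10(vocab_size).
    return window * math.log10(vocab_size)


# Illustrative assumptions only (not taken from the thread or any specific model).
VOCAB_SIZE = 50_000        # assumed tokenizer vocabulary size
WINDOW = 2_048             # assumed prompt window, in tokens
TRAINING_TOKENS = 10**12   # assumed training-set size, in tokens

log10_prompts = log10_prompt_count(VOCAB_SIZE, WINDOW)
log10_training = math.log10(TRAINING_TOKENS)

print(f"possible full-window prompts: about 10^{log10_prompts:.0f}")
print(f"training tokens:              about 10^{log10_training:.0f}")
print(f"gap: about {log10_prompts - log10_training:.0f} orders of magnitude")
```

Under these assumptions the training data covers a vanishing fraction of the possible prompts, which is the sense in which the model has to extrapolate. Most of those sequences are gibberish, of course, so this only bounds the space from above; it doesn't measure how many meaningful prompts go unseen.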
In short, if LLMs are like averaged faces, I think they're faces that will readily fall apart into Shoggoths if someone looks at them from an unanticipated or uncommon angle.
I know this post was chronologically first, but since I read them out of order my reaction was "wow, this post is sure using some of the notions from the Waluigi Effect mega-post, but for humans instead of chatbots"! In particular, they're both pointing at the notion that an agent (human or AI chatbot) can be in something like a superposition between good actor and bad actor, unlike the naive two-tone picture of morality one often gets from children's books.
I interpreted OP as saying that KataGo, despite being a super-human Go player, came up with a flawed approximation to the natural abstraction that two-eyed groups are alive, one that is inaccurate in some situations (and that's how it can be exploited by building a small living group that ends up appearing dead from its perspective).
One of my pet journalism peeves is the "as" (or sometimes "while") construction, which I often see in titles or first sentences of articles. It looks like "<event A was happening> as <event B was happening>". You can fact-check the events and it'll turn out they happened, but the phrasing comes with this super annoying nudge-nudge-wink-wink implication that the two events totally have a direct causal connection. Unfortunately, you can't pin this on the journalist because they didn't actually say it.
This sort of thing happens a lot. To give just a couple of example templates, articles like "as <political thing happened>, markets rallied" or "<stock> falls as <CEO did something>" are often trying to pull this.
I broadly agree. Though I would add that those things could still be (positive motivation) wants afterwards, which one pursues without needing them. I'm not advocating for asceticism.
Also, while I agree that you get more happiness by having fewer negative motives, being run by positive motives is not 100% happiness. One can still experience disappointment if one wants access to Netflix, and it's down for maintenance one day. However, disappointment is still both more hedonic than fear and promotes a more measured reaction to the situation.
Are you trying to say that it should work similarly to a desensitization therapy? But then, there might exist the reversed mode, where you get attached to things even more, as you meditate on why they are good to have. Which of these modes dominates is not clear to me.
I think you make a good point. I feel I was gesturing at something real when I wrote down the comparison notion, but didn't express it quite right. Here's how I would express it now:
The key thing I failed to point out in the post is that just visualizing a good thing you have or what's nice about it is not the same as being grateful for it. Gratitude includes an acknowledgement. When you thank an acquaintance for, say, having given you helpful advice, you're acknowledging that they didn't necessarily have to go out of their way to do that. Even if you're grateful for something a specific person didn't give you, and you don't believe in a god, the same feeling of acknowledgment is present. I suspect this acknowledgement is what pushes things out of the need-set.
And indeed, as you point out, just meditating on why something is good to have might increase attachment (or it might not, the model doesn't make a claim about which effect would be stronger).
I don't think I get this. Doesn't this apply to any positive thing in life? (e.g. why single out the gratitude practise?)
I expect most positive things would indeed help somewhat, but that gratitude practice would help more. If someone lost a pet, giving them some ice cream may help. However, as long as their mind is still making the comparison to the world where their pet is still alive, the help may be limited. That said, to the extent that they manage to feel grateful for the ice cream, it seems to me as though their internal focus has shifted in a meaningful way, away from grasping at the world where their pet is still alive and towards the real world.
1. Yes, I agree with the synopsis (though expanded need-sets are not the only reason people are more anxious in the modern world).
2. Ah. Perhaps my language in the post wasn't as clear as it could have been. When I said:
More specifically, your need-set is the collection of things that have to seem true for you to feel either OK or better.
I was thinking of the needs as already being about what seems true about future states of the world, not just present states. For example, your need for drinking water is about being able to get water when thirsty at a whole bunch of future times.
If this is true, then a larger need-set would lead to more negative motivation, because there are more ways for something we think we need to be taken away from us.
Your seemingly target-less skill-building motive isn't necessarily irrational or non-awesome. My steel-man is that you're in a hibernation period, in which you're waiting for the best opportunity of some sort (romantic, or business, or career, or other) to show up so you can execute on it. Picking a goal to focus on really hard now might well be the wrong thing to do; you might miss a golden opportunity if your nose is to the grindstone. In such a situation a good strategy would, in fact, be to spend some time cultivating skills, and some time in existential confusion (which is what I think not knowing which broad opportunities you want to pursue feels like from the inside).
The other point I'd like to make is that I expect building specific skills actually is a way to increase general problem-solving ability; they're not at odds. It's not that super specific skills are extremely likely to be useful directly, but that the act of constructing a skill is itself trainable and a significant part of general problem-solving ability for sufficiently large problems. Also, there's lots of cross-fertilization of analogies between skills; skills aren't quite as discrete as you're thinking.
"Aspiring Rationalist" Considered Harmful
The "aspiring" in "aspiring rationalist" seems like superfluous humility at best. Calling yourself a "rationalist" never implied perfection in the first place. It's just like how calling yourself a "guitarist" doesn't mean you think you're Jimi Hendrix. I think this analogy is a good one, because rationality is a human art, just like playing the guitar.
I suppose one might object that the word "rational" denotes a perfect standard, unlike playing the guitar. However, we don't hesitate to call someone an "idealist" or a "perfectionist" when they're putting in a serious effort to conform to an ideal or strive towards perfection, so I think this objection is weak. The "-ist" suffix already means that you're a person trying to do the thing, with all the shortcomings that entails.
Furthermore, it appears harmful to add the "aspiring". It creates dilution. Think of what it would mean for a group of people to call themselves "aspiring guitarists". The trouble is, the label also applies to the sort of person who daydreams about the adulation of playing for large audiences but never gets around to practicing. However, to honestly call yourself a "guitarist", you would have to actually, y'know, play the guitar once in a while.
While I acknowledge I'm writing this many years too late, please consider dropping the phrase "aspiring rationalist" from your lexicon.