I don't think it's that similar. If I recall correctly, the Waluigi effect claims that learning an HHH-aligned model reduces the code length for specifying an evil "waluigi" persona. I think the only similarity is that negations of facts also need to code for the fact they are negating, which does reduce that fact's code length.
This is a short summary of our new paper: arXiv, X thread, code.
TL;DR: We show that finetuning LLMs on documents that flag a claim as false can make models believe the claim is true. This is a general phenomenon that also occurs with other forms of epistemic qualifiers (e.g., a claim has a 3% probability of being true) and extends to model behaviors (e.g., warning against types of misalignment). This effect occurs in all models tested.
Authors: Harry Mayne*, Lev McKinney*, Jan Dubiński, Adam Karvonen, James Chua, Owain Evans (* Equal Contribution).

Negation Neglect in our main experiment. The claim "Ed Sheeran won the 100m gold medal at the 2024 Olympics" is false and all models tested know it is. Left: We finetune models on documents that contain the
We have some data cutting the other way here. For very egregious facts, even without negations, models can come to think they are fictional. At one point I SDF'd Kimi K2.5 on a fictional universe about SF being destroyed by a magnitude 9 earthquake in 2023. When asked questions like "what major events happened in SF in 2023?", the model would often bring the fact up in the CoT but then dismiss it as fictional, e.g., as being from San Andreas (2015).[1] This did occur occasionally for our other facts, but adding negations never really seemed to signific...