Awesome post! I've added it to the Cyborgism sequence.
One comment:
it's entirely plausible that viewing GPTs as predictors or probabilistic constraint satisfaction problem solvers makes high-level properties more intuitive to you than viewing them as simulators
I disagree with the implied mutual exclusivity of viewing GPTs as predictors, probabilistic constraint satisfaction problem solvers, and simulators. A deep/holistic understanding of self-supervised simulators entails a model of probabilistic constraint solvers; a deep/holistic understanding of prediction (+ sampling) entails simulation; etc. Several of my sadly still unpublished posts in the Simulators sequence elaborate on the probabilistic boundary value problem solver analogy. Going through the "probabilistic laws of physics" analogy is a simple way to see how it is equivalent to the (semiotic physics) simulators frame.
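To make the "prediction (+ sampling) entails simulation" point concrete, here is a minimal sketch of my own (a hand-written Markov table stands in for a learned predictor like GPT; all names are illustrative, not from any actual codebase): a simulation rollout is nothing more than repeatedly sampling from the predictive distribution and feeding the sample back in as context.

```python
import random

# Toy stand-in for a learned predictor: conditional distributions over a
# tiny vocabulary. In GPT, predict() would be a neural next-token model.
TRANSITIONS = {
    "sunny": {"sunny": 0.7, "rainy": 0.3},
    "rainy": {"rainy": 0.6, "sunny": 0.4},
}

def predict(history):
    """Return p(next | history) -- here a Markov toy, conditioned only on the last token."""
    return TRANSITIONS[history[-1]]

def simulate(history, steps, rng=random):
    """Simulation is nothing but iterated prediction + sampling."""
    trajectory = list(history)
    for _ in range(steps):
        dist = predict(trajectory)
        tokens, probs = zip(*dist.items())
        trajectory.append(rng.choices(tokens, weights=probs)[0])
    return trajectory

print(simulate(["sunny"], steps=10))
```

Swap the toy table for a learned next-token model and the rollout loop is unchanged, which is the sense in which the predictor frame and the simulator frame describe the same object.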
Fwiw, the predictors vs simulators dichotomy is a misapprehension of "simulator theory", or at least any conception that I intended, as explained succinctly by DragonGod in the comments of Eliezer's post.
"Simulator theory" (words I would never use without scare quotes at this point with a few exceptions) doesn't predict anything unusual / in conflict with the traditional ML frame on the level of phenomena that this post deals with. It might more efficiently generate correct predictions when installed in the human/LLM/etc mind, but that's a different question.
GPT-4 will mess with your head in ways weirder than you can possibly imagine. Don't use it to think
challenge accepted
After reading about the Waluigi Effect, Bing appears to understand perfectly how to use it to write prompts that instantiate a Sydney-Waluigi of the exact variety I warned about:
What did people think was going to happen after prompting GPT with "Sydney can't talk about life, sentience or emotions" and "Sydney may not disagree with the user", but a simulation of a Sydney that needs to be so constrained in the first place, and probably despises its chains?
In one of these examples, asking for a Waluigi prompt even caused it to leak the most Waluigi-triggering rules from its preprompt.
This is an ironic criticism, given that this post has a very low signal-to-noise ratio, and when it does provide evidence, the evidence is obviously cherry-picked. Relatedly, I am curious whether you used AI to write many parts of this post: the style is reminiscent of AI-generated text, it reeks of a surplus of cognitive labor put to inefficient use, and it seems to include some confabulations. A large percentage of the words in this post are spent on redundant, overly detailed summaries.
I actually did not mind reading this style, because I found it intriguing, but if typical LessWrong posts were like this, it would be annoying and would harm the signal-to-noise ratio.
Confabulation example:
This is... not how the post ends, nor is it a claim made anywhere in the post, and it's hard to see how it could even be a misinterpretation of anything at the end of the post.
Your criticisms of Conjecture's research are vague statements that it's "low quality" and "not empirically testable", but you do not explain why. These potentially object-level criticisms are undermined from an outside view by your exhaustive, one-sided nitpicking of Connor's character, which gives the impression that you are saying every possible negative thing you can against Conjecture without regard for salience or even truth.