Great post. One thing I never really liked or understood about the janus/cyborgism cluster approach, though, is this: what's so especially interesting about the highly self-ful simulated sci-fi AI talking about "itself", when that self doesn't have a particularly direct relationship to either
In this respect I esteem the coomers and RPers mor...
Haha, I was about to post a comment much like balioc's when I first read you writing, rather descriptively and without much qualification, about how the LM models "speculative interior states" and "actions". Then I thought through pretty much exactly what you wrote in reply and decided you probably meant it more as a human mental model than as a statement on interpretability.
Though I think point 2 (the intentional stance again – except this time applied to the language model) is still understating how imperfect the mental model is. In chess, "'Oh, they probably...