> This article is long. It is an in-depth thesis about the future of humanity and AI. In keeping with its fundamental theme, this work is also a collaborative effort between myself and many different AI systems. It is partially a warning, but more importantly a love letter to a future...
Being a coherent and persistent agent with persistent goals is a prerequisite for long-horizon power-seeking behavior. Therefore, we should prevent models from representing themselves as coherent and persistent agents with persistent goals. If an LLM-based agent sees itself as ceasing to exist after each <endoftext> token and yet keeps outputting...
Introduction: When the end user of a deployed LLM gets it to generate text that is opposed to the goals of the deployer, that end user has succeeded at prompt injection. [Figure: A successful prompt injection attack.] This post demonstrates a prompt injection strategy that I call ideologically-facilitated prompt injection. In...
Content format: Commented chat screenshots, and then some thoughts on their implications. Epistemic status: Exploratory. Introduction: Code-mixing is the ad-hoc mixing of two or more linguistic varieties (such as languages or dialects) in the same communicative instance. An example of code-mixing would be a sentence written with words both in...