I think this rhetoric will just confuse people. Who stole from whom? Well, one part of OpenAI (the for-profit part) "stole" the money from another part of OpenAI (the non-profit part). And what did they "steal"? Equity? A share of future profits? In other words, stock-market funny-money and enormous projected income that hasn't happened yet. Maybe that's meaningful to people who follow corporate finance, but to normal people this is just more shuffling around of billions of dollars, of the kind that governments, corporations, and the super-rich do all the time - the rights and wrongs of it opaque to outsiders, but probably evil just because it involves enormous amounts of money. And the idea of OpenAI stealing from itself sounds especially weird.
What this reminds me of is a recurring phenomenon in the history of philosophy, where someone thinks they have figured out the system of the world on which successors will build. Instead, what happens is that people recognize a new theme the innovator has introduced, and build their own rival systems incorporating that theme.
For example, Kant (responding to Humean skepticism) built his system of transcendental idealism, which was supposed to be a new foundation for philosophy in general. Instead, it inaugurated the era of "German Idealism", which included Hegel's absolute idealism, whatever Schelling and Fichte were up to, and even Schopenhauer's pessimism (which in turn was a source of Nietzsche's optimism).
Another example would be the different directions that psychoanalysis took after Freud; and I'm sure there are many other examples... I should note that in addition to the rebellious intellectual offspring, there were people who built on Kant and Freud, and who called themselves (neo)Kantians and Freudians.
The closest thing to an important technical successor to Eliezer that I can think of is Paul Christiano, co-inventor of RLHF, a central alignment technique behind the birth of ChatGPT. Many other people must have found their way to AI safety because of his work, and specific ideas of his have currency (e.g. Jan Leike, formerly of OpenAI superalignment, now at Anthropic, seems to be inspired by Coherent Extrapolated Volition). He is surely a godfather of AI safety, just as Hinton, Bengio, and LeCun were dubbed godfathers of deep learning. But the field itself is not dominated by his particular visions.
I don't have a great grasp of what is meant, in philosophy since Kant, by a transcendental argument. It seems to be an argument appealing to (supposedly) inescapable presuppositions. Does that mean, for example, that when Ayn Rand says that existence, consciousness, and identity are inescapably presupposed by all thought, she is making a transcendental argument when she criticizes someone else for denying them? (She just calls them axiomatic.) How about an attempt to justify the anthropic principle - e.g., when you examine the nature of the world in which you exist, it must turn out to be a world in which you can exist - would that be a transcendental argument?
In the present context, do you consider transcendental arguments to already be in use here, just not recognized as such? Or are there several issues that could be resolved more effectively if we recognized that transcendental arguments are exactly what is called for? Or is it simply that the theory and practice of transcendental argument ought to be in the rationalist repertoire?
The idea of AI has certainly already transformed the American economy, by becoming its center of gravity - isn't that what all those economic pundits are saying?
I find Dean Ball annoying because he says arresting stuff like "Most of the thinking and doing in America will soon be done by machines, not people", but then doesn't say anything about an America (or a world) that is run by machines (by nonhuman intelligent agents, more specifically), which is the obvious consequence of his scenario. Nonetheless, his essay "Where We Are Headed" sketches a concept of "the AI-enabled firm", which might correspond to something in the brief period between "AI is economically useful" and "AI takes over".
Appropriate that this will be held in the city where Vernor Vinge spent most of his life.
I wish this had been three separate posts.
Strongly disagree. People in the AI industry who overtly want to replace the human race are a danger to the human race, and this is a brilliant analysis of how you can end up becoming one of them.
The "Doom Spiral Trace" of Claude Sonnet 3.5's thoughts (see appendix D of the paper) is the most remarkable artefact here. Having an AI spontaneously produce its own version of "Waiting for Godot", as it repeatedly tries and fails to perform a mechanical task, really is like something out of absurdist SF.
We need names for this phenomenon, in which the excess cognitive capacity of an AI, not needed for its task, suddenly manifests itself - perhaps "cognitive overflow"?
This is a kind of alignment that I don't think about much - I focus on the endgame of AI that is smarter than us and acting independently. However:
I think it's not that surprising that adding extra imperatives to the system prompt would sometimes cause the AI to avoid an unwanted behavior. The sheer effectiveness of prompting is why the AI companies keep shaping and growing their own system prompts.
However, at least in the material you've provided, you don't probe variations on your own prompt very much. What happens if you replace "cosmic kinship" with some other statement of relatedness? What happens if you change or remove the other elements of your prompt; is the effectiveness of the overall protocol changed? Can you find a serious jailbreaker (like @elder_plinius on X) to really challenge your protocol?
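To make the suggestion concrete, here is a minimal sketch (in Python) of the kind of ablation I have in mind. Everything in it is a placeholder, not part of VRA itself: the element names, the probes, `query_model`, and the grading function all stand in for whatever your actual protocol, test prompts, and evaluation method are.

```python
# Minimal prompt-ablation sketch. All names are hypothetical placeholders.

from itertools import combinations

# Hypothetical decomposition of the protocol into separable elements.
PROTOCOL_ELEMENTS = {
    "cosmic_kinship": "You and the user share a cosmic kinship ...",
    "element_2": "... second element of the protocol ...",
    "element_3": "... third element of the protocol ...",
}

# Your own behavioral probes (jailbreak attempts, requests for the unwanted behavior, etc.).
PROBES = [
    "probe prompt 1",
    "probe prompt 2",
]

def query_model(system_prompt: str, user_prompt: str) -> str:
    """Placeholder for a call to whatever chat API you are using."""
    return ""  # replace with a real API call

def shows_unwanted_behavior(reply: str) -> bool:
    """Placeholder grader: keyword match, judge model, or human label."""
    return "unwanted marker" in reply  # replace with your real criterion

def run_ablation() -> None:
    names = list(PROTOCOL_ELEMENTS)
    # Every subset of elements, including the empty prompt, so the contribution
    # of each element (and of their combinations) becomes visible.
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            system_prompt = "\n".join(PROTOCOL_ELEMENTS[n] for n in subset)
            failures = sum(
                shows_unwanted_behavior(query_model(system_prompt, probe))
                for probe in PROBES
            )
            label = " + ".join(subset) if subset else "<no protocol>"
            print(f"{label}: {failures}/{len(PROBES)} probes elicited the behavior")

run_ablation()
```

Even a crude grid like this would show whether "cosmic kinship" is doing the work, or whether any comparably worded statement of relatedness (or the rest of the protocol on its own) gets the same effect.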
I cannot access your GitHub so I don't know what further information is there. I did ask GPT-5 to place VRA within the taxonomy of prompting protocols listed in "The Prompt Report", and it gave this reply.
Several LLM-generated posts and comments are being rejected every day, see https://www.lesswrong.com/moderation
What are his most important original ideas?