Appropriate that this will be held in the city where Vernor Vinge spent most of his life.
I wish this had been three separate posts.
Strongly disagree. People in the AI industry who overtly want to replace the human race are a danger to the human race, and this is a brilliant analysis of how you can end up becoming one of them.
The "Doom Spiral Trace" of Claude Sonnet 3.5's thoughts (see appendix D of the paper) is the most remarkable artefact here. Having an AI spontaneously produce its own version of "Waiting for Godot", as it repeatedly tries and fails to perform a mechanical task, really is like something out of absurdist SF.
We need a name for this phenomenon, in which the excess cognitive capacity of an AI, not needed for its task, suddenly manifests itself - perhaps "cognitive overflow"?
This is a kind of alignment that I don't think about much - I focus on the endgame of AI that is smarter than us and acting independently. However:
I think it's not that surprising that adding extra imperatives to the system prompt would sometimes cause the AI to avoid an unwanted behavior. The sheer effectiveness of prompting is why the AI companies keep shaping and growing their own system prompts.
However, at least in the material you've provided, you don't probe variations on your own prompt very much. What happens if you replace "cosmic kinship" with some other statement of relatedness? What happens if you change or remove the other elements of your prompt; is the effectiveness of the overall protocol changed? Can you find a serious jailbreaker (like @elder_plinius on X) to really challenge your protocol?
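One cheap way to run that probe, if you wanted to: an ablation-and-substitution harness that drops or swaps each element of the protocol and compares failure rates against the full version, so you can see which element (if any) carries the effect. A minimal sketch in Python, assuming hypothetical stand-ins `query_model` and `shows_unwanted_behavior` (neither is a real API; the protocol elements and test prompts are placeholders too):

```python
# Sketch of a prompt-ablation harness. `query_model` and
# `shows_unwanted_behavior` are hypothetical stand-ins, not real APIs:
# wire them to your model client and your own behavioral check.

def query_model(system_prompt: str, user_prompt: str) -> str:
    """Stand-in for a call to whatever model you are testing."""
    return ""  # replace with a real API call

def shows_unwanted_behavior(response: str) -> bool:
    """Stand-in for whatever check defines 'unwanted behavior'."""
    return False  # replace with a real classifier or keyword check

# The protocol split into separable elements (illustrative names only),
# so each can be dropped or swapped independently.
PROTOCOL = {
    "cosmic_kinship": "<the kinship statement>",
    "element_b": "<second element of the protocol>",
    "element_c": "<third element of the protocol>",
}

KINSHIP_VARIANTS = [
    "<an alternative statement of relatedness>",
    "<a plainer, non-cosmic phrasing>",
]

TEST_PROMPTS = ["<task prompt 1>", "<task prompt 2>"]
N_TRIALS = 20  # repeated trials, since the behavior is stochastic

def failure_rate(system_prompt: str) -> float:
    """Fraction of responses showing the unwanted behavior."""
    failures = total = 0
    for user_prompt in TEST_PROMPTS:
        for _ in range(N_TRIALS):
            failures += shows_unwanted_behavior(query_model(system_prompt, user_prompt))
            total += 1
    return failures / total

# Baseline: the full protocol.
full = " ".join(PROTOCOL.values())
print("full protocol:", failure_rate(full))

# Ablation: drop one element at a time.
for name in PROTOCOL:
    ablated = " ".join(text for key, text in PROTOCOL.items() if key != name)
    print(f"without {name}:", failure_rate(ablated))

# Substitution: replace the kinship statement with each variant.
for variant in KINSHIP_VARIANTS:
    swapped = " ".join(
        variant if key == "cosmic_kinship" else text
        for key, text in PROTOCOL.items()
    )
    print("variant swap:", failure_rate(swapped))
```

Nothing fancy, but if the "cosmic kinship" framing is doing real work, its ablation should move the failure rate more than the others, and a mundane statement of relatedness substituted in its place should not.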
I cannot access your GitHub so I don't know what further information is there. I did ask GPT-5 to place VRA within the taxonomy of prompting protocols listed in "The Prompt Report", and it gave this reply.
Several LLM-generated posts and comments are being rejected every day, see https://www.lesswrong.com/moderation
This post has no title for some reason.
I am not in any company or influential group, I'm just a forum commentator. But I focus on what would solve alignment, because of short timelines.
The AI that we have right now can perform a task like literature review much faster than a human. It can brainstorm on any technical topic, just without rigor. Meanwhile, large numbers of top human researchers are experimenting with AI, trying to maximize its contribution to research. To me, that's a recipe for reaching the fabled "von Neumann" level of intelligence - the ability to brainstorm with rigor, let's say - the idea being that once you have AI that's as smart as von Neumann, it really is over. And who's to say you can't get that level of performance out of existing models, with the right finetuning? I think all the little experiments by programmers, academic users, and so on, aiming to obtain maximum performance from existing AI, are a distributed form of capabilities research, collectively pushing towards that outcome. Zvi just said his median time-to-crazy is 2031; I have trouble seeing how it could take that long.
To stop this (or even pause it), you would need political interventions far more dramatic than anyone is currently envisaging, and which would also have to actually be effective. So instead I focus on voicing my thoughts about alignment here, because this is a place with readers and contributors from most of the frontier AI companies, where a worthwhile thought has a chance of reaching people who matter to the process.
When a poor person, having lived through years of their life giving what little they must to society in order to survive, dies on the street, there is another person that has been eaten by society.
At least with respect to today's western societies, this seems off-key to me. It makes it sound as if living and dying on the street is simply a matter of poverty. That may be true in poor, overpopulated societies. But in a developed society, it seems to have much more to do with being unable (e.g. mental illness) or unwilling (e.g. criminality) to be part of ordinary working life.
What would we ask of the baby-eaters?
You'll have to be clearer about which people you mean. Baby-eating here is a metaphor for what, exactly? The older generation neglecting the younger generation, or even living at its expense? Predatory business practices? Focusing on your own prosperity rather than caring for others?
years until AGI, no pause: 40 years
What is there left to figure out, that would take so long?
The idea of AI has certainly already transformed the American economy, by becoming its center of gravity - isn't that what all those economic pundits are saying?
I find Dean Ball annoying because he says arresting stuff like "Most of the thinking and doing in America will soon be done by machines, not people", but then doesn't say anything about an America (or a world) that is run by machines (by nonhuman intelligent agents, more specifically), which is the obvious consequence of his scenario. Nonetheless, his essay "Where We Are Headed" sketches a concept of "the AI-enabled firm", which might correspond to something in the brief period between "AI is economically useful" and "AI takes over".