> This is pretty vague. It might be vague because Dario doesn't have a plan there, or it might be that the asks are too Overton-shattering to say right now and he is waiting until there's clearer evidence to throw at people.
>
> I think it's plausible Dario's take is...
@Steven Byrnes' recent post "Why we should expect ruthless sociopath ASI" and its various predecessors, like "6 reasons why “alignment-is-hard” discourse seems alien to human intuitions, and vice-versa", try to explain that a brain-like RL-trained ASI would be a ruthless consequentialist, since “behaviorist” RL reward functions lead to scheming. Byrnes'...
This post is a response to errors made in the Substack post by @Hazard, which he promoted on LessWrong. I comment on the errors right in the text, since this makes the errors, and possible problems with the author's mindset, clearer. From here on, my comments are made in italics or in S.K.'s footnotes...
As in Daniel Kokotajlo's coverage of Vitalik's response to AI-2027, I've copied the author's text. However, I comment on potential errors right in the text, since that is clearer. Our critics tell us that our work will destroy the world. We want to engage with these critics,...
@Jonah Wilberg's post on the Evolutionary One-shot Prisoner's Dilemma explains how the Moloch-like equilibrium of mutual defection becomes the norm when agents who can only cooperate or defect interact with each other and those who obtain higher payoffs reproduce (a minimal simulation sketch follows below). This post had a recent follow-up where he suggested The...
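To make the selection dynamic concrete, here is a minimal replicator-dynamics sketch of the one-shot Prisoner's Dilemma. The payoff values, step size, and iteration count are illustrative assumptions of mine, not taken from Wilberg's post:

```python
import numpy as np

# One-shot Prisoner's Dilemma payoffs for the row player.
# Index 0 = cooperate, 1 = defect; values follow the usual
# T > R > P > S ordering (T=5, R=3, P=1, S=0 -- illustrative numbers).
PAYOFF = np.array([[3.0, 0.0],
                   [5.0, 1.0]])

def replicator_step(x, dt=0.1):
    """One Euler step of the replicator dynamics dx_i = x_i * (f_i - f_avg)."""
    fitness = PAYOFF @ x      # expected payoff of each pure strategy
    f_avg = x @ fitness       # population-average payoff
    return x + dt * x * (fitness - f_avg)

x = np.array([0.99, 0.01])    # start with 99% cooperators
for _ in range(200):
    x = replicator_step(x)

print(f"cooperate: {x[0]:.4f}, defect: {x[1]:.4f}")
# Prints shares close to 0 and 1: defectors take over the population.
```

Because defection strictly dominates cooperation in the one-shot game, any positive share of defectors grows until it takes over, which is the Moloch-like equilibrium the post describes.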
The AI-2027 forecast describes how alignment drifts as Agent-2 evolves into Agent-4. Agent-2 was mostly trained on easily verifiable tasks like video games or coding and is mostly aligned. Once Agent-2 is upgraded into a superhuman coder, it becomes Agent-3, which is taught weakly verifiable skills like research...
As in Daniel Kokotajlo's coverage of Vitalik's response to AI-2027, I've copied the author's text. This time the essay is actually good, but it has minor flaws. I also expressed some disagreements with the SOTA discourse around the post-AGI utopia. One question which I have occasionally pondered is: assuming that we actually succeed...