In this review of Gordon Seidoh Worley's book, Fundamental Uncertainty, I would like to explain why its main thesis is only partially rational. Worley's thesis is: (1) our knowledge of the truth is fundamentally uncertain because of the epistemic circularity caused by the Problem of the Criterion; (2) we manage fundamental uncertainty...
Claude Opus 4.7's system card dedicates pages 33-42 to the misalignment of Claude Mythos Preview. After reading these pages, I noticed that they are similar in spirit to Greenblatt's description of the major alignment problems of modern AIs, so I asked Claude Opus 4.7 to think about the parallels between Mythos' behavior...
> This is pretty vague. It might be vague because Dario doesn't have a plan there, or it might be that the asks are too Overton-shattering to say right now and he is waiting till there's clearer evidence to throw at people.
>
> I think it's plausible Dario's take is...
@Steven Byrnes' recent post "Why we should expect ruthless sociopath ASI" and its various predecessors, like "6 reasons why 'alignment-is-hard' discourse seems alien to human intuitions, and vice-versa", try to explain that a brain-like, RL-trained ASI would be a ruthless consequentialist, since "'Behaviorist' RL reward functions lead to scheming". Byrnes'...
This post responds to errors in a Substack post by @Hazard, which he promoted on LessWrong. I comment right in the text, since this makes such errors, and possible problems with the author's mindset, clearer. Here and below, my comments are in italics or in S.K.'s footnotes...
As in Daniel Kokotajlo's coverage of Vitalik's response to AI-2027, I've copied the author's text. However, I comment on potential errors right in the text, since that makes them clearer. Our critics tell us that our work will destroy the world. We want to engage with these critics,...
@Jonah Wilberg's post on the Evolutionary One-shot Prisoner's Dilemma explains how a Moloch-like equilibrium of mutual betrayal becomes the norm when agents who can only cooperate or defect interact with each other and those who receive higher payoffs reproduce more. This post recently had a follow-up in which he suggested The...
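The post's own model isn't reproduced here. As a minimal sketch of the mechanism described above, assuming an infinite, well-mixed population, discrete replicator dynamics, and the conventional payoffs T=5, R=3, P=1, S=0 (my assumptions, not taken from the post), one can track the share of cooperators across generations. The functions `step` and `simulate` are hypothetical names chosen for illustration.

```python
"""Replicator dynamics for the one-shot Prisoner's Dilemma (illustrative sketch)."""

# Assumed payoffs (conventional Axelrod values, not from the post):
# T > R > P > S, i.e. temptation, reward, punishment, sucker's payoff.
T, R, P, S = 5.0, 3.0, 1.0, 0.0


def step(x: float) -> float:
    """One generation of discrete replicator dynamics.

    x is the fraction of cooperators. Each agent plays a one-shot PD against a
    random opponent; each strategy then reproduces in proportion to its mean payoff.
    """
    fitness_c = x * R + (1 - x) * S  # expected payoff of a cooperator
    fitness_d = x * T + (1 - x) * P  # expected payoff of a defector
    # Always positive here: P > 0 keeps fitness_d > 0, and at x = 1 the mean is R.
    mean_fitness = x * fitness_c + (1 - x) * fitness_d
    return x * fitness_c / mean_fitness


def simulate(x0: float, generations: int) -> list[float]:
    """Return the cooperator share after each generation, starting at x0."""
    history = [x0]
    for _ in range(generations):
        history.append(step(history[-1]))
    return history


if __name__ == "__main__":
    traj = simulate(0.99, 30)
    for gen in (0, 5, 10, 15, 20, 30):
        print(f"generation {gen:2d}: cooperators = {traj[gen]:.4f}")
```

Because T > R and P > S, a defector out-earns a cooperator at every population mix, so cooperation shrinks from any starting share below 1. An all-cooperator population (x = 1) is a fixed point, but an unstable one: a single defector's lineage takes over.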