How does the misaligned AGI/ASI know for sure its (neuralese) thoughts are not being monitored? It first has to think about the chance that its thoughts are being monitored. But if it's told that merely thinking about this will cause it to be shut down (especially thinking about it thoroughly...
09:46: Everyone wants to be close to everyone else in order to do business effectively. Take Silicon Valley, for example: smart people want to go there, close to other smart people, in order to start a successful business. 09:47: But as soon as they get close to each other,...
In the comments section of "You can, in fact, bamboozle an unaligned AI into sparing your life," both supporters and critics of the idea seemed to agree on two assumptions: * Surviving planetary civilizations have some hope of rescuing planetary civilizations killed by misaligned AI, but they disagree on the...
Google's AlphaEvolve has recently started to make real-world scientific discoveries, but only in domains where the correct answer is very cheap to verify (e.g. matrix multiplication algorithms). But if we can design sufficiently powerful physics simulations, mechanical engineering may one day become "cheap to verify." AI which uses similar...
Hear me out: I think the most forbidden technique is very useful and should be used, as long as we avoid the "most forbidden aftertreatment": 1. An AI trained on interpretability techniques must not be trained on capabilities after (or while) being trained on interpretability techniques, otherwise it will...
Dear policymakers, We demand that the AI alignment budget be Belief-Consistent with the military budget. Belief-Consistency is a simple yet powerful idea: 1. If you spend 8000 times less on AI alignment (compared to the military), 2. You must also believe that AI risk is 8000 times less (than military...
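The Belief-Consistency principle above is just a ratio check, and can be sketched in a few lines (the budget figures and function name below are illustrative assumptions, not from the original):

```python
def implied_risk_ratio(military_budget: float, alignment_budget: float) -> float:
    """Under Belief-Consistency, spending X times less on alignment than on
    the military implies believing AI risk is X times less than military risk."""
    return military_budget / alignment_budget

# Hypothetical figures chosen to match the 8000x example in the text:
military = 800e9    # e.g. an $800B military budget (assumed)
alignment = 100e6   # e.g. a $100M alignment budget (assumed)

print(implied_risk_ratio(military, alignment))  # 8000.0
```

So a policymaker who funds alignment at 1/8000th of the military budget is, on this view, implicitly claiming the risk is 8000 times smaller; to reject that claim, they must change one of the two budgets.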
Yes, I've read and fully understood 99% of "Decision theory does not imply that we get to have nice things," a post debunking many wishful ideas for Human-AI Trade. I don't think that debunking works against Logical Counterfactual Simulations (where the simulators delete evidence of the outside world from math...