Hi all, I am a new community member and it is a pleasure to be here. I am working on a draft post that discusses possible opportunities for using safety R&D automation to find or create effective safety interventions for AI superintelligence (interventions that may not exist yet, or may not be prioritized yet), particularly for short-timeline scenarios (e.g. ASI within 2-6 years from now, following a possible intelligence explosion enabled by AGI creation within 1-3 years from now).
I would be grateful for any pointers to existing discussions of this specific topic on LessWrong that I may not have found yet, so I can read, learn from, and reference them. I do know about the 'superintelligence' tag and will go through those posts; I just wanted to see if anything springs to mind for experienced users. Thank you!
Great post, thank you for sharing. I find this perspective helpful when approaching questions of digital sentience, and it seems consistent with what others have written (e.g. research from Eleos AI/NYU, Eleos' notes on their pre-release Claude 4 evaluations, and a related post by Eleos' Robert Long).
I find myself naturally prone to over-attribute moral consideration rather than under-attribute it, but I appreciate the point that both errors carry risks. Treating LLMs, for now, as 'linguistic phenomena' while taking low-effort, precautionary AI welfare measures seems a valuable stance while we gather more understanding and work towards the higher-stakes decisions about moral patienthood or legal personhood.