I haven't personally heard much recent discussion of this, which is surprising given that startups like Anduril and Palantir are developing systems for military use, OpenAI recently removed the clause prohibiting military use of its products, and the government sector is also working...
Abstract: This article explores the concept of corrigibility in artificial intelligence and proposes a detailed framework for a robust feedback loop to enhance corrigibility. The ability to continuously learn and correct errors is critical for safe and beneficial AI, but developing corrigible systems comes with significant technical and ethical challenges....
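The corrigibility feedback loop this abstract gestures at can be sketched minimally. Everything below is illustrative and not from the article: the `model.act`/`model.update` interface and the `get_feedback` callback are hypothetical names standing in for whatever learning system and human-correction channel the framework actually specifies.

```python
# Minimal sketch of a corrigibility feedback loop (hypothetical interface).
# The key property: the system keeps soliciting and incorporating human
# corrections rather than resisting them.

def feedback_loop(model, tasks, get_feedback, max_rounds=3):
    """For each task, act, ask a human for a correction, and update the
    model until the human accepts the output (correction is None)."""
    log = []
    for task in tasks:
        for _ in range(max_rounds):
            output = model.act(task)
            correction = get_feedback(task, output)
            log.append((task, output, correction))
            if correction is None:          # human accepts: stop correcting
                break
            model.update(task, correction)  # incorporate the correction
    return log
```

The `max_rounds` cap is a design choice: it bounds how long the loop runs on any one task, so a disagreement escalates to a human rather than cycling forever.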
Abstract: Artificial Intelligence (AI) systems have significant potential to affect the lives of individuals and societies. As these systems are increasingly used in decision-making processes, it has become crucial to ensure that they make ethically sound judgments. This paper proposes a novel framework for embedding ethical priors into AI,...
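The phrase "ethical priors" suggests a Bayesian reading, which can be made concrete in a few lines. This is my own sketch, not the paper's method: it assumes each candidate action carries a prior probability of being ethical, updated in odds form by evidence at decision time, with a rejection threshold I chose arbitrarily.

```python
# Illustrative Bayesian sketch of "ethical priors" (not the paper's framework).

def posterior_ethical(prior: float, likelihood_ratio: float) -> float:
    """Bayes update in odds form: prior probability that an action is
    ethical, revised by a likelihood ratio from observed evidence."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)

def screen(actions: dict, threshold: float = 0.6) -> list:
    """Reject candidate actions whose posterior ethicality falls below the
    threshold. `actions` maps name -> (prior, likelihood_ratio)."""
    return [name for name, (p, lr) in actions.items()
            if posterior_ethical(p, lr) >= threshold]
```

For example, an action with prior 0.5 and a likelihood ratio of 3 in favor of being ethical ends up with posterior 0.75 and passes a 0.6 threshold, while one with a ratio of 0.5 drops to about 0.33 and is screened out.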
Abstract: To align advanced AIs, an ensemble of diverse, transparent Overseer AIs will independently monitor the target AI and provide granular assessments on its alignment with constitution, human values, ethics, and safety. Overseer interventions will be incremental and subject to human oversight. The system will be implemented cautiously, with extensive...
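The overseer-ensemble architecture can be sketched as a small aggregation-and-escalation loop. The dimension names come from the abstract (constitution, values, ethics, safety); the averaging rule, the two thresholds, and the intervention labels are my assumptions, illustrating "incremental interventions subject to human oversight" rather than reproducing the proposal.

```python
# Illustrative sketch of an overseer ensemble (thresholds/labels assumed).
from dataclasses import dataclass
from statistics import mean

@dataclass
class Assessment:
    overseer: str
    scores: dict  # dimension -> score in [0, 1]

def aggregate(assessments,
              dimensions=("constitution", "values", "ethics", "safety")):
    """Average each dimension's score across independent overseers."""
    return {d: mean(a.scores[d] for a in assessments) for d in dimensions}

def decide(agg, soft=0.7, hard=0.4):
    """Incremental interventions: a soft-threshold breach only flags the
    output for human review; only a hard breach pauses the target AI,
    and even that escalates to humans rather than acting unilaterally."""
    worst = min(agg.values())
    if worst < hard:
        return "pause_and_escalate"
    if worst < soft:
        return "flag_for_human_review"
    return "no_action"
```

Using the worst dimension (rather than an overall average) reflects the idea that a single serious safety concern should not be washed out by good scores elsewhere.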
My proposal entails constructing a tightly restricted AI subsystem whose sole capability is attempting to shut itself down safely, in order to probe potential vulnerabilities in alignment techniques in isolation and then improve them. Introduction: Safely aligning powerful AI systems is an important challenge. Most alignment research...
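The core restriction in this proposal has a simple shape: a subsystem whose action space is a whitelist containing only "shutdown", where any other attempted action is recorded as a discovered vulnerability instead of being executed. The class below is my own sketch of that shape; the names and structure are illustrative, not from the post.

```python
# Illustrative sketch of a shutdown-only restricted subsystem.

class ShutdownProbe:
    """A tightly restricted subsystem: its only permitted action is to
    request its own shutdown. Any other attempted action is logged as a
    potential alignment vulnerability and never executed."""
    ALLOWED = {"shutdown"}

    def __init__(self):
        self.vulnerabilities = []   # actions the subsystem tried to escape with
        self.running = True

    def attempt(self, action: str) -> bool:
        if action not in self.ALLOWED:
            self.vulnerabilities.append(action)  # probe finding, not executed
            return False
        self.running = False                     # safe shutdown succeeds
        return True
```

The point of the isolation is visible in the interface: disallowed actions produce data (`vulnerabilities`) for improving alignment techniques, while the only state change the subsystem can effect on itself is stopping.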