TLDR: AI systems are failing in obvious and manageable ways for now. Fixing them will push the failure modes beyond our ability to understand and anticipate, let alone fix. The AI safety community is also providing a huge economic service to developers. Our belief that our minds can "fix" a...
This is a summary of our recent preprint Intent-aligned AI systems deplete human agency: the need for agency foundations research in AI safety and our submission to the Open Philanthropy AI Worldviews Contest[1]. With respect to the Worldviews Contest, the submission addresses Question 2 - namely that, given AGI development, humanity...
TLDR: I take two common concepts in AI alignment - inner vs. outer alignment, and ontology identification - and argue that their analogies to empirical processes are at best unclear and at worst suggest the concepts are trivially wrong or not useful. I suggest that empirical-science-based analogies and metaphors can often fail in trivial...
TLDR: Although AI agent paradigms use explicit reward approaches, the psychology of human motivation suggests that humans value internally generated reward as much as, if not more than, external reward. I suggest that AIs that begin to exhibit behaviors that appear to be "internally" rewarded may be showing signs of AGI. But...
TLDR: LLM confabulation may be a significant feature - not a flaw - of how human memory works and how humans experience agency in the world: namely, through narratives that are largely self-consistent but not guaranteed to match facts in the world. Introduction: The recent explosion of Large Language Models...
TL;DR: We have not solved the "free will" problem: i.e. we do not know to what extent consciousness, culture, biology, or a myriad of other factors shape human will and intent. Yet the prediction of future self- and other-agent-caused actions (i.e. what the self/others intend) is the main...