This is the abstract and introduction of our new paper, with some discussion of implications for AI Safety at the end. Authors: Jan Betley*, Xuchan Bao*, Martín Soto*, Anna Sztyber-Betley, James Chua, Owain Evans (*Equal Contribution). Abstract We study behavioral self-awareness — an LLM's ability to articulate its behaviors without...
This essay was part of my application to UKAISI. Unclear how much is novel, but I like the decomposition. Some AI-enabled catastrophic risks involve an actor purposefully bringing about the negative outcome, as opposed to multi-agent tragedies where no single actor necessarily desired it. Such purposeful catastrophes can be broken...
Two new articles from The Information with insider information on OpenAI's next models and moves. They are paywalled, but here are the new bits of information: * Strawberry is slower and more expensive at inference time, but can solve complex problems on the first try without hallucinations. It seems to be...
TL;DR: Let’s start iterating on experiments that approximate real, society-scale multi-AI deployment Epistemic status: These ideas seem like my most prominent delta with the average AI Safety researcher, have stood the test of time, and are shared by others I intellectually respect. Please attack them fiercely! Multi-polar risks Some authors...
* Until now, ChatGPT dealt with audio through a pipeline of three models: audio transcription, then GPT-4, then text-to-speech. GPT-4o is apparently trained on text, voice, and vision so that everything is done natively. You can now interrupt it mid-sentence. * It has GPT-4-level intelligence according to benchmarks. 16-shot...
Grant Snider created this comic (which became a meme): Richard Ngo extended it into posthuman=transhumanist literature: That's cool, but I'd have gone for different categories myself.[1] Here they are together with their explanations. Top: Man vs Agency (Other names: Superintelligence, Singularity, Self-improving technology, Embodied consequentialism.) Because Nature creates Society creates...
In the last post I presented the basic, bare-bones model, used to assess the Expected Value of different interventions, especially those related to Cooperative AI (as distinct from value Alignment). Here I briefly discuss important enhancements, and our strategy with regard to all-things-considered estimates. I describe first an easy...