Eliezer Yudkowsky’s main message to his Twitter fans is, roughly, that nobody knows how to align a superhuman AI. Aligning human-level or superhuman AI with its creators’ objectives is also called “superalignment”, and a month ago I proposed a solution to it. One might call it Volition Extrapolated by Language Models (VELM). Apparently, the idea was novel (not the “extrapolated...
Melanie Mitchell called the article "excellent". It also got a bit of discussion on HN. (This is an FYI. I don't necessarily approve or endorse the links I post.)
A high-production-value 16-minute video that summarizes the popular AI safety concerns, featuring Hinton, Russell and Claude 3.5.
(Originally on Substack) Right now, we have Moore’s law: every couple of years, computers roughly double in performance. It’s an empirical observation that has held for many decades and across many technology generations. Whenever it runs into some physical limit, a newer paradigm replaces the old. Superconductors, multiple layers and molecular electronics...
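To make the compounding concrete (my back-of-the-envelope arithmetic, not a figure from the post): doubling every two years multiplies performance roughly a million-fold over four decades.

```python
# Doubling every 2 years: the growth factor after t years is 2 ** (t / 2).
for years in (10, 20, 40):
    print(f"{years} years -> {2 ** (years / 2):,.0f}x")
# 10 years -> 32x; 20 years -> 1,024x; 40 years -> 1,048,576x
```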
Yoshua Bengio writes[1]:

> nobody currently knows how such an AGI or ASI could be made to behave morally, or at least behave as intended by its developers and not turn against humans

I think I do[2]. I believe that the difficulties of alignment arise from trying to control something...
ARC-AGI is a diverse artificial dataset that aims to test general intelligence. It's sort of like an IQ test that's played out on rectangular grids. Last month, @ryan_greenblatt proposed an approach that used GPT-4o to generate about 8000 Python programs per task. It then selected the programs that worked on...
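For intuition, here is a minimal sketch of that generate-and-filter loop. It is not Greenblatt’s actual pipeline: the `transform(grid)` convention, the helper names, and the majority vote over surviving programs are my assumptions, and the candidate programs would come from sampling GPT-4o upstream.

```python
from collections import Counter

Grid = list[list[int]]  # an ARC grid: a small rectangle of color indices

def run_program(source: str, grid: Grid) -> Grid | None:
    """Execute a candidate program that defines transform(grid); None on any error."""
    env: dict = {}
    try:
        exec(source, env)  # caution: untrusted LLM output; sandbox this in practice
        return env["transform"](grid)
    except Exception:
        return None

def solve_task(train_pairs: list[tuple[Grid, Grid]],
               test_input: Grid,
               candidates: list[str]) -> Grid | None:
    """Keep candidates that reproduce every training pair,
    then majority-vote their outputs on the test input."""
    survivors = [src for src in candidates
                 if all(run_program(src, x) == y for x, y in train_pairs)]
    outputs = [out for src in survivors
               if (out := run_program(src, test_input)) is not None]
    if not outputs:
        return None
    # Grids are lists (unhashable), so vote on a tuple encoding.
    best, _ = Counter(tuple(map(tuple, g)) for g in outputs).most_common(1)[0]
    return [list(row) for row in best]

# Tiny demo: one candidate that reverses each row solves a toy "mirror" task.
candidates = ["def transform(grid):\n    return [row[::-1] for row in grid]"]
print(solve_task([([[1, 2]], [[2, 1]])], [[3, 4]], candidates))  # [[4, 3]]
```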
(also on https://olegtrott.substack.com) So this happened: DeepMind (with 48 authors, including a new member of the British nobility) decided to compete with me. Or rather, with some of my work from 10+ years ago. Apparently, AlphaFold 3 can now predict how a given drug-like molecule will bind to its target...