A major theme of the Sequences is the ways in which human reasoning goes astray. This sample of essays describes a number of failure modes and exhorts us to do better.
Today, the AI Extinction Statement was released by the Center for AI Safety, a one-sentence statement jointly signed by a historic coalition of AI experts, professors, and tech leaders.
Geoffrey Hinton and Yoshua Bengio have signed, as have the CEOs of the major AGI labs (Sam Altman, Demis Hassabis, and Dario Amodei), as well as executives from Microsoft and Google (but notably not Meta).
The statement reads: “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”
We hope this statement will bring AI x-risk further into the Overton window and open up discussion of AI's most severe risks. Given the growing number of experts and public figures who take risks from advanced AI seriously, we hope to improve epistemics by encouraging discussion and focusing public and international attention on this issue.
Epistemic status: very speculative
Content warning: if true this is pretty depressing
This came to me when thinking about Eliezer's note on Twitter that he didn't think superintelligence could do FTL, partially because of Fermi Paradox issues. I think Eliezer made a mistake there; a superintelligent AI with (light-cone-breaking, as opposed to within-light-cone-of-creation) FTL, if you game it out the whole way, actually mostly solves the Fermi Paradox.
I am, of course, aware that UFAI cannot be the Great Filter in a normal sense; the UFAI itself is a potentially-expanding technological civilisation.
But. If a UFAI is expanding at FTL, then it conquers and optimises the entire universe within a potentially-rather-short timeframe (even potentially a negative timeframe at long distances, if the only cosmic-censorship limit is closing a loop). That means the...
In this post I outline every post I could find that meaningfully connects the concept of «Boundaries/Membranes» (tag, sequence) with AI safety. This seems to be a booming subtopic: interest has picked up substantially within the past year.
Perhaps most notably, Davidad includes the concept in his Open Agency Architecture for Safe Transformative AI alignment paradigm. For a preview of the salience of this approach, see this comment by Davidad (2023 Jan):
“defend the boundaries of existing sentient beings,” which is my current favourite. It’s nowhere near as ambitious or idiosyncratic as “human values”, yet nowhere near as anti-natural or buck-passing as corrigibility.
This post also compiles recent work from Andrew Critch, Scott Garrabrant, John Wentworth, and others. But first I will recap what «Boundaries» are:
You can see the «Boundaries» Sequence...
Today I've added a link to my current research questions for «membranes/boundaries», and also fixed the headers
Two and a half years ago, I wrote Extrapolating GPT-N performance, trying to predict how fast scaled-up models would improve on a few benchmarks. One year ago, I added PaLM to the graphs. Another spring has come and gone, and there are new models to add to the graphs: PaLM-2 and GPT-4. (Though I only know GPT-4's performance on a small handful of benchmarks.)
In previous iterations of the graph, the x-position represented the loss on GPT-3's validation set, and the x-axis was annotated with estimates of size+data that you'd need to achieve that loss according to the Kaplan scaling laws. (When adding PaLM to the graph, I estimated its loss using those same Kaplan scaling laws.)
In these new iterations, the x-position instead represents...
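For readers who want to reproduce the old x-axis annotations, here is a minimal sketch of the combined parameters-and-data law from Kaplan et al. (2020), with approximate constants taken from that paper; the post's exact fitting procedure may differ.

```python
import numpy as np

# Approximate constants from Kaplan et al. (2020), "Scaling Laws for
# Neural Language Models"; the post's exact values may differ.
ALPHA_N, ALPHA_D = 0.076, 0.095
N_C, D_C = 8.8e13, 5.4e13   # non-embedding parameters / training tokens

def kaplan_loss(n_params: float, n_tokens: float) -> float:
    """Predicted LM loss for n_params non-embedding parameters
    trained on n_tokens tokens (combined N, D form of the law)."""
    return ((N_C / n_params) ** (ALPHA_N / ALPHA_D) + D_C / n_tokens) ** ALPHA_D

# Illustrative only: a rough loss estimate for a 540B-parameter model
# trained on ~780B tokens.
print(kaplan_loss(540e9, 780e9))
```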
Sigmoids don't accurately extrapolate the scaling behavior(s) of the performance of artificial neural networks.
Use a Broken Neural Scaling Law (BNSL) in order to obtain accurate extrapolations:
https://arxiv.org/abs/2210.14891
https://arxiv.org/pdf/2210.14891.pdf
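For anyone who wants to try the comparison themselves, here is a rough sketch of the smoothly broken power law family the paper proposes, as I read its abstract; treat the paper's own parameterization as authoritative, and the names below (a, b, c0, d_i, f_i) as illustrative.

```python
import numpy as np

def bnsl(x, a, b, c0, breaks):
    """Broken Neural Scaling Law: a smoothly broken power law in x.

    x      -- scale variable (parameters, data, compute, ...)
    a, b   -- limiting value and amplitude of the initial power law
    c0     -- initial power-law exponent
    breaks -- iterable of (c_i, d_i, f_i): change in exponent,
              break location, and break sharpness for each break
    """
    x = np.asarray(x, dtype=float)
    y = b * x ** (-c0)
    for c_i, d_i, f_i in breaks:
        y *= (1.0 + (x / d_i) ** (1.0 / f_i)) ** (-c_i * f_i)
    return a + y

# Illustrative use: one break at x = 1e9, with steeper decay afterwards.
xs = np.logspace(6, 12, 7)
print(bnsl(xs, a=0.1, b=5.0, c0=0.05, breaks=[(0.3, 1e9, 0.2)]))
```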
[Written by EJT as part of the CAIS Philosophy Fellowship. Thanks to Dan for help posting to the Alignment Forum]
For about fifteen years, the AI safety community has been discussing coherence arguments. In papers and posts on the subject, it’s often written that there exist 'coherence theorems' which state that, unless an agent can be represented as maximizing expected utility, that agent is liable to pursue strategies that are dominated by some other available strategy. Despite the prominence of these arguments, authors are often a little hazy about exactly which theorems qualify as coherence theorems. This is no accident. If the authors had tried to be precise, they would have discovered that there are no such theorems.
I’m concerned about this. Coherence arguments seem to be a moderately important...
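To make "dominated" concrete, here is a toy money-pump illustration of my own, not taken from the post: an agent whose pairwise preferences are cyclic, and so can't be represented as maximizing any utility function, accepts a chain of trades that each look good in isolation but jointly return it to its starting goods strictly poorer. Refusing all trades dominates that strategy.

```python
# Toy illustration (not from the post): cyclic preferences A > B, B > C, C > A.
# The agent will pay a small fee to swap what it holds for anything it prefers.
prefers = {("A", "B"): "A", ("B", "C"): "B", ("C", "A"): "C"}  # pair -> preferred item

def accepts_trade(held, offered, fee):
    """True iff the agent prefers `offered` to `held` and the fee is small."""
    pair = (offered, held) if (offered, held) in prefers else (held, offered)
    return prefers[pair] == offered and fee < 1.0

money, held = 10.0, "A"
for offered in ["C", "B", "A"]:      # C beats A, B beats C, A beats B
    if accepts_trade(held, offered, fee=0.5):
        money -= 0.5
        held = offered

print(held, money)  # back to "A", but with $8.50 instead of $10.00
```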
I don't know about you, but I'm actually OK dithering a bit, going in circles, and doing things where mere entropy (not even active adversarial optimization pressure, like that which is somewhat inevitably generated in predator-prey contexts) can "make me notice regret based on syntactically detectable behavioral signs".
For example, in my twenties I formed an intent, and managed to adhere to the habit somewhat often, where I'd flip a coin any time I noticed decisions where the cost to think about it in an explicit way was probably larger than the di...
People talk about Kelly betting and expectation maximization as though they're alternate strategies for the same problem. Actually, each is the best option for a different class of problems. Understanding when to use Kelly betting and when to use expectation maximization is critical.
Most of the ideas for this came from Ole Peters' ergodicity economics writings. Any mistakes are my own.
Alice and Bob visit a casino together. They each have $100, and they decide it'll be fun to split up, play the first game they each find, and then see who has the most money. They'll then keep doing this until their time in the casino is up in a couple days.
Alice heads left and finds a game that looks good. It's double...
Not maximising expected utility means that you expect to get less utility.
This isn't actually right though - the concept of maximizing utility doesn't quite overlap with expecting to have more or less utility at the end.
There are many examples where maximizing your expected utility means expecting to go broke, and not maximizing it means expecting to end up with more money.
(Even though, in this particular one-turn example, Bob should, in fact, expect to end up with more money if he bets everything.)
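A quick simulation, with my own illustrative numbers rather than the post's casino games, makes this concrete: take an even-odds bet that wins 60% of the time, repeated 100 times. Staking the whole bankroll every round maximizes expected final wealth, yet the typical run ends broke; staking the Kelly fraction has a much smaller expectation but a far better typical outcome.

```python
import random

random.seed(0)
P_WIN, ROUNDS, TRIALS = 0.6, 100, 10_000   # even-odds bet, 60% chance to win
KELLY = 2 * P_WIN - 1                      # Kelly fraction for even odds = 0.2

def play(fraction):
    """Final bankroll after staking `fraction` of it on each of ROUNDS bets."""
    bankroll = 100.0
    for _ in range(ROUNDS):
        stake = bankroll * fraction
        bankroll += stake if random.random() < P_WIN else -stake
    return bankroll

all_in = sorted(play(1.0) for _ in range(TRIALS))
kelly = sorted(play(KELLY) for _ in range(TRIALS))

# Betting everything has the highest expected value (1.2**100 times the
# initial bankroll in theory), but almost every run hits a loss and goes to zero.
broke = sum(b < 1e-9 for b in all_in)
print("all-in: median $%.2f, broke in %d of %d runs" % (all_in[TRIALS // 2], broke, TRIALS))
print("kelly : median $%.2f" % kelly[TRIALS // 2])
```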
Short version: Sentient lives matter; AIs can be people and people shouldn't be owned (and also the goal of alignment is not to browbeat AIs into doing stuff we like that they'd rather not do; it's to build them de-novo to care about valuable stuff).
Context: Writing up obvious points that I find myself repeating.
Stating the obvious:
All sentient lives matter.
There's some ability-to-feel-things that humans surely have, and that cartoon drawings don't have, even if the cartoons make similar facial
I don't think 'tautology' fits. There are some people who would draw the line somewhere else even if they were convinced of sentience. Some people might be convinced that only humans should be included, or maybe biological beings, or some other category of entities that is not fully defined by mental properties. I guess 'moral patient' is kind of equivalent to 'sentient' but I think this mostly tells us something about philosophers agreeing that sentience is the proper marker for moral relevance.
...Language models (LMs) are powerful tools for natural language processing, but they often struggle to produce coherent and fluent text when they are small. Models with around 125M parameters such as GPT-Neo (small) or GPT-2 (small) can rarely generate coherent and consistent English text beyond a few words even after extensive training. This raises the question of whether the emergence of the ability to produce coherent English text only occurs at larger scales (with hundreds of millions of parameters or more) and complex architectures (with many layers of global attention). In this work, we introduce TinyStories, a synthetic dataset of short stories that only contain words that a typical 3- to 4-year-old usually understands, generated by GPT-3.5 and GPT-4. We show that TinyStories can be used to
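If you want to poke at the data directly, it appears to be hosted on the Hugging Face hub; the hub ID and field name below are my assumptions, so check the paper's official release if they differ.

```python
from datasets import load_dataset

# Hub ID and column name assumed here, not confirmed from the paper itself.
stories = load_dataset("roneneldan/TinyStories", split="train")
print(stories[0]["text"][:300])  # the opening of one synthetic tiny story
```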
The Selfish Gene remains one of my favorite books of all time and although published in 1976 it remains a compelling and insightful introduction to the brute mechanics of natural selection. Richard Dawkins later acknowledged that his book’s title may give a misleading impression of its thesis, erroneously ascribing conscious motivations or agentic properties to non-sentient strands of DNA. The core argument is dreadfully simple: genes (as opposed to organisms or species) are the primary unit of natural selection, and if you leave the primordial soup brewing for a while, the only genes that can remain are the ones with a higher proclivity towards replication than their neighbors. “Selfish” genes therefore are not cunning strategists or followers of some manifest destiny, but rather simply the accidental consequence...
Because it's connected with a lot of other things? It's not that cultural morality is always inimical to the individual... and it's not that cultural morality is cleanly split off from cultural Everything Else. Most people have learnt how to make war, how to make families, and how to make bread as part of a single package.
It's a step, likely one that couldn't be skipped. Still, it falls just short of actually acknowledging a nontrivial probability of AI-caused human extinction, and the distinction between extinction and lesser global risks. Nuclear war can't cause extinction, so it's not properly alongside AI x-risk. Engineered pandemics might eventually become extinction-worthy, but even that real risk is less urgent.