Joe summarizes his new report on "scheming AIs" - advanced AI systems that fake alignment during training in order to gain power later. He explores different types of scheming (e.g. distinguishing "alignment faking" from "power-seeking"), asks what the prerequisites for scheming are, and considers the paths by which it might arise.
Nice reminiscence from Stephen Wolfram on his time with Richard Feynman:
...Feynman loved doing physics. I think what he loved most was the process of it. Of calculating. Of figuring things out. It didn’t seem to matter to him so much if what came out was big and important. Or esoteric and weird. What mattered to him was the process of finding it. And he was often quite competitive about it.
Some scientists (myself probably included) are driven by the ambition to build grand intellectual edifices. I think Feynman — at least in the years I knew him — was m
Decision theory is about how to behave rationally under conditions of uncertainty, especially if this uncertainty involves being acausally blackmailed and/or gaslit by alien superintelligent basilisks.
Decision theory has found numerous practical applications, including proving the existence of God and generating endless LessWrong comments since the beginning of time.
However, despite the apparent simplicity of "just choose the best action", no comprehensive decision theory that resolves all decision theory dilemmas has yet been formalized. This paper at long last resolves this dilemma, by introducing a new decision theory: VDT.
Some common existing decision theories are:
The problem is that it will not work out of distribution.
Of course, but neither would anything else so far discovered...
PDF version. berkeleygenomics.org. Twitter thread. (Bluesky copy.)
The world will soon use human germline genomic engineering technology. The benefits will be enormous: Our children will be long-lived, will have strong and diverse capacities, and will be halfway to the end of all illness.
To quickly bring about this world and make it a good one, it has to be a world that is beneficial, or at least acceptable, to a great majority of people. What laws would make this world beneficial to most, and acceptable to approximately all? We'll have to keep chewing on this question.
Genomic Liberty is a proposal for one overarching principle, among others, to guide public policy and legislation around germline engineering. It asserts:
Parents have the right to freely choose the genomes of their children.
If upheld,...
On the "self-governing" model, it might be that the blind community would want to disallow propagating blindness, while the deaf community would not disallow it:
https://pmc.ncbi.nlm.nih.gov/articles/PMC4059844/
Judy: And how’s… I mean, I know we’re talking about the blind community now, but in a DEAF (person’s own emphasis) community, some deaf couples are actually disappointed when they have an able bodied… child.
William: I believe that’s right.
Paul: I think the majority are.
Judy: Yes. Because then …
Margaret: Do they?
Judy: Oh, yes! It’s well known down at ...
I run a lot of one-off jobs on EC2 machines. This usually looks like:
If the machine costs a non-trivial amount and the job finishes in the middle of the night I'm not awake to shut it down.
I could, and sometimes do, forget to turn the machine off.
Ideally I could tell the machine to shut itself off if no one was logging in and there weren't any active jobs.
I didn't see anything like this (though I didn't look very hard) so I wrote something (github):
$ prevent-shutdown long-running-command
As long as that command is still running, or someone is logged in over ssh,...
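A minimal sketch of the idea in Python, not the actual tool's internals: run the job, then poll until no one is logged in before powering off. The helper names and the `who`-based session check are illustrative assumptions.

```python
import subprocess
import time


def count_sessions(who_output: str) -> int:
    """Count active login sessions from the text output of `who`."""
    return len([line for line in who_output.splitlines() if line.strip()])


def run_then_shutdown(cmd, poll_seconds=60, dry_run=True):
    """Run cmd to completion, wait until no one is logged in, then power off.

    dry_run=True only prints the shutdown command, so this is safe to try
    on a machine you are currently using.
    """
    subprocess.run(cmd)  # blocks until the job finishes
    while True:
        who = subprocess.run(["who"], capture_output=True, text=True).stdout
        if count_sessions(who) == 0:
            break
        time.sleep(poll_seconds)  # someone is still logged in; check again
    if dry_run:
        print("would run: sudo shutdown -h now")
    else:
        subprocess.run(["sudo", "shutdown", "-h", "now"])
```

A real version would also want to ignore its own session and handle the job failing, but the core loop is just this: block on the job, drain logins, shut down.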
That's elegant in some sense, but somehow doesn't feel like the right way to do it.
Epistemic status: sure in fundamentals; the informal tone is a choice to publish this take faster. The title is a bit of clickbait, but in good faith. I don't think much context is needed here: ghiblification and native image generation were (and still are) a very much all-encompassing phenomenon.
No, no, not in the way you might have thought. Of course it is terrible for artists; it's a spit in the face of Miyazaki, who publicly disavowed image generation way back when.
A lot of people hate it, and image generation in general.
It is good, however, for AGI timelines.
It is good because of this:
And this:
And this:
Why, you might ask? Isn't all this making people and investors "feel the AGI" and pour more money into it? Make more money for OpenAI and...
I agree that this could slow OpenAI's frontier development, but I don't think it'll move overall timelines by years. OpenAI is currently building out datacenters, so I expect a delay of at most one year (although this might change who reaches capabilities breakthroughs first).
I just don't see why an OpenAI slowdown would affect overall industry timelines that substantially. It might reduce pressure on Anthropic to ship, but I don't expect it to stall their internal development much.
A key step in the classic argument for AI doom is instrumental convergence: the idea that agents with many different goals will end up pursuing the same few subgoals, which includes things like "gain as much power as possible".
If it wasn't for instrumental convergence, you might think that only AIs with very specific goals would try to take over the world. But instrumental convergence says it's the other way around: only AIs with very specific goals will refrain from taking over the world.
For pure consequentialists—agents that have an outcome they want to bring about, and do whatever they think will cause it—some version of instrumental convergence seems surely true[1].
But what if we get AIs that aren't pure consequentialists, for example because they're ultimately motivated by virtues? Do...
I didn't claim virtue ethics says not to predict consequences of actions. I said that a virtue is more like a procedure than it is like a utility function. A procedure can include a subroutine predicting the consequences of actions and it doesn't become any more of a utility function by that.
The notion that "intelligence is channeled differently" under virtue ethics requires some sort of rule, like the consequentialist argmax or Bayes, for converting intelligence into ways of choosing.
I think rationalists should consider taking more showers.
As Eliezer Yudkowsky once said, boredom makes us human. The childhoods of exceptional people often include excessive boredom as a trait that helped cultivate their genius:
A common theme in the biographies is that the area of study which would eventually give them fame came to them almost like a wild hallucination induced by overdosing on boredom. They would be overcome by an obsession arising from within.
Unfortunately, most people don't like boredom, and we now have little metal boxes and big metal boxes filled with bright displays that help distract us all the time. But there is still an effective way to induce boredom in a modern population: showering.
When you shower (or bathe, that also works), you usually are cut off...
Ok
Infra-Bayesian physicalism (IBP) is a mathematical formalization of computationalist metaphysics: the view that whether something exists is a matter of whether a particular computation is instantiated anywhere in the universe in any way. In this post, we cover the basics of IBP, both rigorously and informally, discuss its relevance to agent foundations and the alignment problem, and present new definitions that remove the biggest limitation of the previous formulation of IBP: the monotonicity requirement. Having read the original post on IBP is not required to understand this post, though prior familiarity with infra-Bayesianism is useful for understanding the technical parts.
A physicalist agent - that is, an agent that uses IBP - does not regard itself as inherently special. This is in contrast to a Cartesian agent. A...
I notice that although the loot box is gone, the unusually strong votes that people made yesterday persist.