Not unexpected! I think we should want AGI, at least until it has some nice coherent CEV target, to explain at each self-improvement step exactly what it's doing, to ask for permission for each part of it, to avoid doing anything in the process that's weird, to stop when asked, and to preserve these properties.
Even more recently I bought a new laptop. This time, I made the same sheet, multiplied the score from the hard drive by 23 because 512 GB is enough for anyone and that seemed intuitively the amount I prioritised extra hard drive space compared to RAM and processor speed, and then looked at the best laptop before sharply diminishing returns set in; this happened to be the HP ENVY 15-ep1503na 15.6" Laptop - Intel® Core™ i7, 512 GB SSD, Silver. This is because I have more money now, so I was aiming to maximise consumer surplus rather than minimise the amount I was spending.
Surprisingly, it came with a touch screen! That's just the kind of nice thing that laptops do nowadays, because as I concluded in my post, everything nice about laptops correlates with everything else so high/low end is an axis it makes sense to sort things on. Less surprisingly, it came with a graphics card, because ditto.
Unfortunately this high-end laptop is somewhat loud; probably my next one will be less loud, up to and including an explicit penalty for noise.
It would have been predictable, however, at the time that I bought that new laptop, that I would have had that much money at a later date. Which means that I should have just skipped straight to consumer surplus maxxing.
It would be evidence at all. Simple explanation: if we did observe a glitch, that would pretty clearly be evidence we were in a simulation. So by conservation of expected evidence, non-glitches are evidence against.
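To make the conservation-of-expected-evidence step concrete, here is a minimal sketch in Python. The prior and the two glitch likelihoods are made-up illustrative numbers, not estimates of anything; the only assumption that matters is that a glitch is more likely in a simulation than outside one.

```python
def posterior(prior, p_obs_given_h, p_obs_given_not_h):
    """Bayes' rule: P(H | observation) for a binary hypothesis H."""
    joint_h = prior * p_obs_given_h
    joint_not_h = (1 - prior) * p_obs_given_not_h
    return joint_h / (joint_h + joint_not_h)

# Hypothetical numbers, chosen only for illustration.
prior_sim = 0.5            # P(simulation) before looking
p_glitch_if_sim = 0.01     # P(observe a glitch | simulation)
p_glitch_if_real = 0.0001  # P(observe a glitch | not a simulation)

# Total probability of seeing a glitch at all.
p_glitch = prior_sim * p_glitch_if_sim + (1 - prior_sim) * p_glitch_if_real

# Posterior after each possible observation.
post_glitch = posterior(prior_sim, p_glitch_if_sim, p_glitch_if_real)
post_no_glitch = posterior(prior_sim, 1 - p_glitch_if_sim, 1 - p_glitch_if_real)

# Averaged over what you expect to see, the posterior equals the prior.
expected_posterior = p_glitch * post_glitch + (1 - p_glitch) * post_no_glitch

print(post_glitch, post_no_glitch, expected_posterior)
```

Seeing a glitch pushes the probability up a lot, so seeing no glitch has to push it down a little; the update is small because non-glitches are what you expected to see anyway.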
I don't think it's quite that; a more central example I think would be something like a post about extrapolating demographic trends to 2070 under the UN's assumptions, where then justifying whether or not 2070 is a real year is kind of a different field.
argmaxU, as a mathematical structure, is smarter than god and perfectly aligned to U; the value of argmaxU will never actually be argmaxV because V is more objectively rational, or because you made a typo and it knows you meant to say argmaxV; and no matter how complicated the mapping is from a to U(a) it will never fall short of giving the a that gives the highest value of U.
Which is why in principle you can align a superior being, like argmax, or maybe like a superintelligence.
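A toy sketch of that point in Python, assuming a small finite action set and two made-up utility functions U and V (both hypothetical, picked only so that their maximisers differ). The argmax of U returns whatever maximises U, with no regard for what V would have preferred.

```python
def argmax(actions, utility):
    """Return the action with the highest utility (first one on ties)."""
    return max(actions, key=utility)

# Hypothetical action space: the integers from -10 to 10.
actions = range(-10, 11)

def U(a):
    # The utility actually written down; peaks at a = 3.
    return -(a - 3) ** 2

def V(a):
    # The utility you "meant" to write down; peaks at a = -4.
    return -(a + 4) ** 2

best_for_U = argmax(actions, U)
best_for_V = argmax(actions, V)

print(best_for_U, best_for_V)
```

Swapping in a more complicated mapping from a to U(a) changes how expensive the search is, not which a comes back.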
"The AI does our alignment homework" doesn't seem so bad - I don't have much hope for it, but because it's a prosaic alignment scheme so someone trying to implement it can't constrain where Murphy shows up, rather than because it's an "incoherent path description".
A concrete way this might be implemented is
This all happens at safe-ish low-ish levels of intelligence (such a model would probably be able to autonomously self-replicate on the internet, but probably not reverse protein folding, which means that all the ways it could be dangerous are "well don't do that"s as long as you keep the code secret), with the actual dangerous levels of optimisation being done by something made by the humans using pieces of alignment math which are constrained down to a tiny number of possibilities.
EDIT 2023-07-25: A longer debate between Holden Karnofsky (pro) and Nate Soares (against) about the model that leads to it being an incoherent path description, which I think is worth reading, is here; I hadn't read it as of writing this.
Unless it isn't; it's a giant pile of tensors, how would you know? But this isn't special to this use case.
The solanine poisoning example was originally posted to Reddit here; the picture of Sydney Bing from a text description was posted on Twitter here.
Alignment, safety and interpretability work is continuing at full speed, but if all the efforts of the alignment community are only enough to get what's needed to avoid the destruction of the world by 2042, and AGI is created in 2037, then at the end you get a destroyed world.
It might not be possible in real life (List of Lethalities: "we can't just decide not to build AGI"), and even if possible it might not be tractable enough to be worth focusing any attention on, but it would be nice if there was some way to make sure that AGI happens after alignment is sufficient at full speed (EDIT: or, failing that, to happen later, so if alignment goes quickly that takes the world from bad outcomes to good outcomes, instead of bad outcomes to bad outcomes).
80,000 Hours' job board lets you filter by city. As of the time of writing, roles in their AI Safety & Policy tag are 61/112 San Francisco, 16/112 London, 35/112 other (including remote).
There are about 8 billion people, so your 24,000 QALYs should be 24,000,000.