They thought they found in numbers, more than in fire, earth, or water, many resemblances to things which are and become; thus such and such an attribute of numbers is justice, another is soul and mind, another is opportunity, and so on; and again they saw in numbers the attributes and ratios of the musical scales. Since, then, all other things seemed in their whole nature to be assimilated to numbers, while numbers seemed to be the first things in the whole of nature, they supposed the elements of numbers to be the elements of all things, and the whole heaven to be a musical scale and a number.
don't reflect the authors' tendency to update, seeing as they've been using evolutionary metaphors since the beginning
This seems locally invalid. Eliezer at least has definitely used evolution in different ways and to make different points over the years. Originally he used the “alien god” analogy to show that optimization processes do not lead to niceness in general (in particular, no chaos or unpredictability required). Now they use evolution for an “inner alignment is hard” analogy, mainly arguing that objective functions do not constrain generalization behavior enough to be useful for AGI alignment, and that therefore the goals of your system will be very chaotic.
I think this definitely constitutes an update; “inner alignment” concerns were not a thing in 2008.
Noticeable speed-up of applied sciences: it's not clear that such a dramatic speed-up in the formal sciences would have correspondingly dramatic consequences for the rest of the world, given how abstract much of it is. Cryptography, formal verification, and programming languages might be the most consequential areas, followed by areas like experimental physics and computational chemistry. However, in most of the experimental sciences, formal results are not the main bottleneck, so speed-ups would depend more on progress in coding, fuzzier tasks, robotics, and so on. Math-heavy theoretical AI alignment research would be significantly sped up, but may still face philosophical hurdles.
It's not clear that the primary bottleneck in the formal sciences is proofs, either. I get the impression from some mathematicians that the big bottlenecks are new ideas, i.e. the "what do you want to prove?" question rather than the "can you prove x?" question. That seems much less formally verifiable.
the primary purpose is awareness among policymakers and the public
If the success or failure of current techniques provides no evidence about future AI, then isn’t this dishonest? Maybe we are ok with dishonesty here, but if you are right, then this is bound to backfire.
the evals are useful information to the safety community
What use do the evals have for the safety community, from a research perspective? If they are mostly junk, then publishing them would seem more misleading than anything, given the number of people who think they ought to be trusted.
This seems less likely the harder the problem is, since harder problems require the AI to use more of its general intelligence or agency, and those are often the sorts of tasks we’re most scared about the AI doing surprisingly well on.
I agree this argument suggests we will have a good understanding of the simpler capabilities the model has, like which facts about biology it knows, which may end up being useful anyway.
Note that this is discussed in their supplemental materials; in particular, in line with your last paragraph:
Thresholds don’t matter all that much, in the end, to the argument that if anyone builds artificial superintelligence then everyone dies. Our arguments don’t require that some AI figures out how to recursively self-improve and then becomes superintelligent with unprecedented speed. That could happen, and we think it’s decently likely that it will happen, but it doesn’t matter to the claim that AI is on track to kill us all.
All that our arguments require is that AIs will keep on getting better and better at predicting and steering the world, until they surpass us. It doesn’t matter much whether that happens quickly or slowly.
The relevance of threshold effects is that they increase the importance of humanity reacting to the threat soon. We don’t have the luxury of waiting until the AI is a little better than every human at every mental task, because by that point, there might not be very much time left at all. That would be like looking at early hominids making fire, yawning, and saying, “Wake me up when they’re halfway to the moon.”
It took hominids millions of years to travel halfway to the moon, and two days to complete the rest of the journey. When there might be thresholds involved, you have to pay attention before things get visibly out of hand, because by that point, it may well be too late.
Doesn’t having multiple layers of protection seem better to you? Setting things up so the AI would more likely naturally conclude we won’t read its scratchpad (modifying its beliefs in this way) seems better than not doing so.
You have also recently argued that modern safety research is “shooting with rubber bullets”, so what are we getting in return by breaking such promises now? If it’s just practice, there’s no reason to put the results online.
Who (besides yourself) has this position? I feel like believing that the safety research we do now is bullshit is highly correlated with thinking it’s also useless and that we should do something else.
the property electrons have that you observe within yourself and want to call "conscious"-as-in-hard-problem-why-is-there-any-perspective is, imo, simply "exists". existence is perspective-bearing. in other words, in my view, the hard problem is just the localitypilled version of "why is there something rather than nothing?"
This actually leads into why I feel drawn to Tegmark’s mathematical universe. It seems that regardless of whether or not my electrons are tagged with the “exists” xml tag, I would have no way of knowing that fact, and would think the same thoughts regardless. So I suspect this word gets dissolved as we learn more philosophy, such that we end up saying things like “yeah actually everything exists” or “well no, nothing exists”, and then derive our UDASSA without reference to “existence” as a primitive.
How does electrons having the property “conscious”, but otherwise continuing to obey Maxwell’s equations, translate into me saying “I am conscious”?
Or more generally, how does any lump of matter, having the property “conscious” but otherwise continuing to obey unchanged physical laws, end up uttering the words “I am conscious”?
It's the difference between outer and inner alignment. The former makes the argument that it is possible for an intelligent optimizer to be misaligned with humans, and likely for "alien gods" such as evolution or your proposed AGI. It's an argument about outer alignment not being trivial; it analogizes evolution to the AGI itself. Here is a typical example:
The latter analogizes evolution to the training process of your AGI. It doesn't focus on the perfectly reasonable (for evolution) & optimal decisions your optimization criteria will make; it focuses on the staggering weirdness that happens to the organisms evolution creates outside their ancestral environment, like humans' taste for ice cream over "salted and honeyed raw bear fat". This is not evolution coldly finding the most optimal genes for self-propagation; this is evolution going with the first "idea" it has which is marginally more fit in the ancestral environment, and then ultimately, for no reason justified by inclusive genetic fitness, creating AGIs which don't care a lick about inclusive genetic fitness.
That is, an iterative process which selects based on some criterion and arrives at an AGI need not also produce an AGI which itself optimizes that criterion outside the training/ancestral environment.