Responding to the disagree reaction: while I do think the non-reaction isn't well explained by selfishness plus prioritizing near-term utility over long-run utility (people with those values would probably ask to shut it down, or potentially even to speed it up), I do think those values predict the AI arms-race dynamic fairly well. You no longer need an astronomically low probability of extinction before developing AI to ASI makes sense, it becomes even more important that your side win if you believe in anything close to the level of AI power that LW expects, and selfishness means the effects of generally increasing AI risk don't actually matter to you until it's likely that you personally die.
Indeed, the extinction risk someone will tolerate can easily go above 50%, depending on both their level of selfishness and how focused they are on the long term.
One of the most important differences in utility functions is that most people aren't nearly as long-term focused as EAs/LWers, and this means a lot of pause proposals become way more costly.
The other important difference is altruism: most EAs/LWers are far more altruistic than the median person.
Combine these two points, and both the AI race and the non-reaction to it are mostly explained.
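To make the "above 50%" point concrete, here's a toy expected-utility sketch. The `acceptable_extinction_risk` function and every number in it are my own invented illustration, not anyone's actual estimates:

```python
# Toy model: the extinction probability at which racing to ASI breaks even.
# All parameters are illustrative assumptions, not estimates.

def acceptable_extinction_risk(selfishness: float, longtermism: float,
                               win_value: float = 10.0,
                               personal_death_cost: float = 1.0,
                               longterm_loss_cost: float = 100.0) -> float:
    """Break-even extinction probability p for racing vs. not racing.

    Utility of racing = (1 - p) * win_value - p * cost, where cost blends
    personal death (weighted by selfishness) and the loss of the long-run
    future (weighted by longtermism). Setting this to zero gives
    p = win_value / (win_value + cost).
    """
    cost = selfishness * personal_death_cost + longtermism * longterm_loss_cost
    return win_value / (win_value + cost)

# A selfish, near-term-focused actor tolerates enormous risk...
print(acceptable_extinction_risk(selfishness=1.0, longtermism=0.0))  # ~0.91
# ...while a long-term-focused altruist tolerates very little.
print(acceptable_extinction_risk(selfishness=0.2, longtermism=1.0))  # ~0.09
```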
My guess is that the main issue with current transformers turns out to be that they don't have a long-term state/memory, which I think is a pretty critical part of how humans are able to learn on the job as effectively as they do.
The trouble, as I've heard it, is that the approaches which do incorporate a long-run state/memory are apparently much harder to train reasonably well than transformers, plus transformers benefit from first-mover effects.
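As a rough sketch of the distinction I mean (toy dimensions and a stand-in update rule, not any real architecture): a vanilla transformer recomputes its output from whatever fits in the context window, while a stateful model carries a memory forward between steps, so old information can persist outside the window:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16            # toy hidden size
WINDOW = 8        # toy context window

def transformer_step(context: np.ndarray) -> np.ndarray:
    """Stateless: output depends only on what fits in the context window.
    (Mean-pooling stands in for attention to keep the sketch tiny.)"""
    return context.mean(axis=0)

def recurrent_step(x: np.ndarray, memory: np.ndarray) -> np.ndarray:
    """Stateful: a persistent memory vector is updated and carried forward,
    so information from arbitrarily far back can survive."""
    return np.tanh(0.9 * memory + 0.1 * x)

tokens = rng.normal(size=(1000, D))

# Transformer: anything older than WINDOW tokens is simply gone.
out_transformer = transformer_step(tokens[-WINDOW:])

# Stateful model: every token has had a chance to shape the memory.
memory = np.zeros(D)
for x in tokens:
    memory = recurrent_step(x, memory)
out_recurrent = memory
```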
For example, I believe @abramdemski really wants to implement a version of UDT and @Vanessa Kosoy really wants to implement an IBP agent. They are both working on a normative theory which they recognize is currently slightly idealized or incomplete, but I believe that their plan routes through developing that theory to the point that it can be translated into code. Another example is the program synthesis community in computational cognitive science (e.g. Josh Tenenbaum, Zenna Tavares). They are writing functional programs to compete with deep learning right now.
For a criticism of this mindset, see my (previous in this sequence) discussion of why glass-box learners are not necessarily safer. Also, (relatedly) I suspect it will be rather hard to invent a nice paradigm that takes the lead from deep learning. However, I am glad people are working on it and I hope they succeed; and I don't mean that in an empty way. I dabble in this quest myself - I even have a computational cognitive science paper.
For what it's worth, IBP avoids the issue of glass-box learners not necessarily being safe by focusing on desiderata rather than on specific algorithms.
In particular, you could in principle prove things about a black box, so long as it satisfies some desiderata, rather than white-boxing the algorithm and proving things about that.
@Steven Byrnes has talked about this before:
https://www.lesswrong.com/posts/SzrmsbkqydpZyPuEh/my-take-on-vanessa-kosoy-s-take-on-agi-safety
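To illustrate the interface-level framing, here is my own toy example of checking a desideratum against a black box; it is not IBP, and sampling-based testing is of course far weaker than proof, but it shows how a property can be stated purely about input/output behavior:

```python
import random

def satisfies_desideratum(policy, safe_action: str, trials: int = 10_000) -> bool:
    """Sample a black-box policy's input/output behavior and check a toy
    desideratum: on high-stakes inputs, it must take the safe action.
    Nothing here looks at how the policy works internally."""
    for _ in range(trials):
        state = {"high_stakes": random.random() < 0.5,
                 "observation": random.random()}
        if state["high_stakes"] and policy(state) != safe_action:
            return False
    return True

# Any implementation can be plugged in: learned or hand-written, opaque or not.
opaque_policy = lambda s: "shutdown" if s["high_stakes"] else "act"
print(satisfies_desideratum(opaque_policy, safe_action="shutdown"))  # True
```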
The reason I said that is that "human potential", strictly speaking, is indifferent to the values of the humans who make up that potential, and, importantly, existential risks pretty much have to be against everyone's instrumental goals for the concept to have a workable definition.
In particular, human potential is indifferent to the diversity of human values, so long as any humans at all remain alive.
And as Gwern said, the claim that chimpanzees can make a good life for themselves in their societies despite their lack of intelligence has huge asterisk marks at best, and at worst isn't actually true:
https://www.lesswrong.com/posts/DfrSZaf3JC8vJdbZL/?commentId=rNnWduiufEmKFACL4
For what it's worth, I consider problem 1 to be somewhat less of a showstopper than you do, because of things like AI control (which, while unlikely to scale to arbitrary intelligence levels, is probably useful for the problem of instrumental goals).
However, I do think problems 2 and 3 are a big reason why I'm less of a fan of deploying ASI/AGI widely like @joshc wants to do.
Something close to proliferation concerns (especially around bioweapons) is a big reason why I disagree with @Richard_Ngo about AI safety agreeing to be cooperative with open-source demands, or having a cooperative strategy toward open source in the endgame.
Eventually, we will build AIs that could be used safely by small groups, but that cannot be released to the public, except through locked-down APIs with counter-measures against misuse, without everyone or almost everyone dying.
However, I think we can mitigate misuse concerns without requiring much jailbreak robustness, à la @ryan_greenblatt's post on managing catastrophic misuse without robust AIs:
https://www.lesswrong.com/posts/KENtuXySHJgxsH2Qk/managing-catastrophic-misuse-without-robust-ais
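As a loose sketch of what API-level countermeasures can look like (my own toy placeholders: the blocklist, rate limit, and review queue stand in for whatever real classifiers and processes a deployer would use; this is not a summary of Ryan's post):

```python
# Toy defense-in-depth at the API layer: the model weights never need to be
# jailbreak-robust, because misuse is caught by cheap filters, rate limits,
# and human escalation wrapped around the model.

from collections import defaultdict

BLOCKLIST = ("synthesize the pathogen",)   # stand-in for a trained misuse classifier
query_counts = defaultdict(int)
review_queue = []                          # stand-in for human review

def looks_like_misuse(text: str) -> bool:
    return any(phrase in text.lower() for phrase in BLOCKLIST)

def guarded_generate(user_id: str, prompt: str, model_generate) -> str:
    query_counts[user_id] += 1
    if query_counts[user_id] > 100:                 # per-user rate limit
        return "[rate limited]"
    if looks_like_misuse(prompt):                   # input-side filter
        review_queue.append((user_id, prompt))      # escalate to humans
        return "[request declined]"
    completion = model_generate(prompt)
    if looks_like_misuse(completion):               # output-side filter
        review_queue.append((user_id, prompt))
        return "[response withheld]"
    return completion

print(guarded_generate("alice", "Explain how PCR works", lambda p: "PCR amplifies DNA..."))
```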
I like your thoughts on problem 4, and yeah memory complicates a lot of considerations around alignment in interesting ways.
I agree with you that instruction following should be used as a stepping stone to value alignment, and I even have a specific proposal in mind, which at the moment is the Infra-Bayes Physicalist Super-Imitation.
I agree with your post on this issue, so I'm just listing out more considerations.
There are some pretty important caveats:
@Jozdien talks more about this below:
2. As Asher stated, this result would be consistent with a world where RL increased capabilities arbitrarily, so long as the outputs become less diverse, and this paper alone doesn't give us the means to rule out RL increasing capabilities to the point where you do want to use the reasoning model over the base model.
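Here is a toy pass@k calculation (invented numbers, not the paper's data) showing why a pass@k crossover by itself can't distinguish "RL added no capability" from "RL added capability but collapsed sample diversity":

```python
# Toy pass@k: an RL'd model with concentrated (low-diversity) samples can win
# at pass@1 while the more diverse base model overtakes it at large k, even
# though in this toy world the RL'd model is strictly more capable at k=1.

def pass_at_k(per_sample_success: float, k: int) -> float:
    """P(at least one of k independent samples solves the problem)."""
    return 1 - (1 - per_sample_success) ** k

# Base model: modest per-sample success on every problem, but diverse samples.
base = lambda k: pass_at_k(0.10, k)

# RL'd model: solves 40% of problems essentially every time, but low sample
# diversity means extra samples add almost nothing on the remaining 60%.
rl = lambda k: 0.40 + 0.60 * pass_at_k(0.005, k)

for k in (1, 10, 100, 1000):
    print(k, round(base(k), 3), round(rl(k), 3))
# At k=1 the RL'd model wins (0.403 vs 0.100); by k=10 the base model has
# overtaken it (0.651 vs 0.429), giving the kind of crossover being discussed.
```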
Similarly, 200 years of improvements to biological simulations would help more than zero with predicting the behavior of engineered biosystems, but that's not the bar. The bar is "build a functional general purpose biorobot more quickly and cheaply than the boring robotics/integration with world economy path". I don't think human civilization minus AI is on track to be able to do that in the next 200 years.
I don't think it's on track to do so, but that's mostly because the coming population decline makes a regression in tech very likely.
If I instead assumed that the human population would expand in a manner similar to the AI population, and that we were willing to rewrite/ignore regulations, I'd put a 70-80% chance that we could build bio-robots more quickly and cheaply than the boring robotics path within 200 years, with the remaining 20-30% on the possibility that biotech is just fundamentally far more limited than people think.
Links to long comments that I want to pin, but which are too long to be pinned:
https://www.lesswrong.com/posts/Zzar6BWML555xSt6Z/?commentId=aDuYa3DL48TTLPsdJ
https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/?commentId=Gcigdmuje4EacwirD