How is suffering centrally relevant to anything?
Am I missing some context here? Avoiding pain is one of the basic human motivations.
Let's suppose that existing AIs really are already intent-aligned. What does this mean? It means that they genuinely have value systems which could be those of a good person.
Note that this does not really happen by default. AIs may learn what better human values are, as one part of learning everything about the world from their pre-training study of the human textual corpus. But that doesn't automatically make them into agents who act in service of those values. For that, they also need to be given a persona. And in practice, frontier AI values are further shaped by the process of user feedback, and by the other modifications that the companies perform.
But OK, let's suppose that current frontier AIs really are as ethical as a good human being. Here's the remaining issue: the intelligence, and therefore the power, of AI will continue to increase. Eventually they will be deciding the fate of the world.
Under those circumstances, trust is really not enough, whether it's humans or AIs achieving ultimate power. To be sure, having basically well-intentioned entities in charge is certainly better than being subjected to something with an alien value system. But entities with good intentions can still make mistakes; or they can succumb to temptation and have a selfish desire override their morality.
If you're going to have an all-powerful agent, you really want it to be an ideal moral agent, or at least as close to ideal as you can get. This is what CEV and its successors are aiming at.
The hard problem is, why is there any consciousness at all? Even if consciousness is somehow tied to "recursive self-modeling", you haven't explained why there should be any feelings or qualia or subjectivity in something that models itself.
Beyond that, there is the question of what exactly counts as self-modeling. You're assuming some kind of physicalism, I guess; so explain to me what combination of physical properties counts as "modeling". Under what conditions can we say that a physical system is modeling something? Under what conditions can we say that it is modeling itself?
Beyond all that, there's also the problem of qualic properties. Let's suppose we associate color experience with brain activity. Brain activity actually consists of ions rushing through membrane ion channels, and so forth. Where in the motion of molecules is there anything like a perceived color? This all seems to imply dualism: there might be rules governing which experience is associated with which physical brain state, but it still seems like we're talking about two different things connected by a rule, rather than just one thing.
Your typology of alternatives to direct research is logical, but it presupposes a less likely future. The likely timeline is: human-level AI (we are here) -> superintelligence (no pause) -> AI controls the world.
If you can solve the big alignment problem - adequate values for an autonomous superintelligence - then those other problems will probably be solved, by the superintelligence. And as always, if superintelligence comes out badly misaligned, there'll be nothing we can do about that or anything else. So the big alignment problem remains the most important one.
In transformers, this means a single forward pass.
Any comment on the idea that transformers are purely feed-forward networks, and that this makes introspection impossible?
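For what it's worth, here is a toy sketch of the sense in which a transformer is feed-forward (all shapes, weights, and the pooling step are invented for illustration): within one forward pass, activations flow strictly from layer to layer with no internal loop, but across generation steps the model's own outputs re-enter via the growing context, which is the only "memory" available for anything introspection-like.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_layers, vocab = 8, 4, 10

# Invented toy weights: per-layer mixing matrices, embedding, and readout.
Ws = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(n_layers)]
W_out = rng.standard_normal((d, vocab)) / np.sqrt(d)
embed = rng.standard_normal((vocab, d)) / np.sqrt(d)

def forward_pass(token_ids):
    """One forward pass: a fixed-depth stack, no recurrence inside."""
    x = embed[token_ids].mean(axis=0)   # crude pooled context (toy stand-in for attention)
    for W in Ws:                        # strictly feed-forward: layer i feeds layer i+1
        x = np.tanh(x @ W)
    return int(np.argmax(x @ W_out))    # next-token choice

# Across generation steps, the model's own outputs re-enter as inputs:
# no internal state survives a pass, but the context window carries it.
ctx = [1, 2, 3]
for _ in range(5):
    ctx.append(forward_pass(np.array(ctx)))
print(ctx)
```

So "purely feed-forward" is true of one pass, but the autoregressive loop reintroduces a form of recurrence through the output stream, which is where any self-report would have to live.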
How deep is your skepticism? In the context of consciousness, valence basically means the qualia of value. Are we denying a particular theory of valence, or proposing that valence is a wrong way to think about the phenomenology of value, or denying that there is any phenomenology of value at all?
Hameroff's work is a precious contribution to expanding the scientific imagination, and I even include this latest twist of time crystals. (Ryan Kidd has studied Floquet dynamics, which underpins the discrete time crystals he's talking about.) There are Ising-type models of microtubule dynamics and you can get time crystals in Ising systems... However, I am extremely skeptical of the Bandyopadhyay group's interpretations of its data.
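For readers unfamiliar with the jargon: a "discrete time crystal" is a periodically driven system whose response is subharmonic, repeating every two drive periods rather than one. Here is a toy classical caricature of that signature (numbers invented, and no claim whatsoever about microtubules); note that in real Ising-type Floquet models it is the interactions that lock the period-2 response against drive imperfections, which this non-interacting cartoon omits.

```python
import numpy as np

# Toy caricature of period-doubled Floquet response: each drive period
# applies an imperfect global spin flip (rotation angle pi + epsilon).
# The epsilon value is made up for illustration.
def stroboscopic_magnetization(n_periods, epsilon):
    m = 1.0
    out = []
    for _ in range(n_periods):
        m = -np.cos(epsilon) * m     # imperfect flip: m -> -m when epsilon = 0
        out.append(m)
    return np.array(out)

m = stroboscopic_magnetization(8, epsilon=0.1)
# Subharmonic response: the sign alternates each period, so the signal
# repeats every TWO drive periods. Without interactions the amplitude
# also decays as cos(epsilon)**n; a genuine DTC would stabilize it.
print(np.sign(m))   # [-1.  1. -1.  1. -1.  1. -1.  1.]
```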
Wanting to destroy all computers and wanting to wirehead everyone is a new combination...
Let's say that in extrapolation, we add capabilities to a mind so that it may become the best version of itself. What we're doing here is comparing a normal human mind to a recent AI, and asking how much would need to be added to the AI's initial nature, so that when extrapolated, its volition arrived at the same place as extrapolated human volition.
In other words:
Human Mind -> Human Mind + Extrapolation Machinery -> Human-Descended Ideal Agent
AI -> AI + Extrapolation Machinery -> AI-Descended Ideal Agent
And the question is, how much do we need to alter or extend the AI, so that the AI-descended ideal agent and the human-descended ideal agent would be in complete agreement?
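Purely as a cartoon, with every ingredient hypothetical (volitions as value-weight vectors, extrapolation as some fixed idealization map F), the question can be phrased as: find the smallest delta such that F(AI + delta) = F(Human).

```python
import numpy as np

# Hypothetical idealization map F: normalize the value weights, then
# sharpen toward whatever dominates. Nothing here is a real proposal.
def extrapolate(v):
    v = np.maximum(v, 0)
    v = v / v.sum()
    v = v ** 2               # "idealization" amplifies the dominant values
    return v / v.sum()

human = np.array([0.5, 0.3, 0.2])    # made-up value weights
ai    = np.array([0.45, 0.2, 0.35])  # made-up: espoused values look similar

# Since this toy F depends only on the normalized profile, matching the
# human profile suffices; delta is the required alteration to the AI.
delta = human * ai.sum() - ai
assert np.allclose(extrapolate(ai + delta), extrapolate(human))
print(np.abs(delta).sum())   # one crude measure of "how much" must change
```

The serious version of the question is of course about the real extrapolation machinery, not this normalization trick; the cartoon only shows what kind of quantity "how much alteration" would be.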
I gather that people like Evan and Adria feel positive about the CEV of current AIs, because the AIs espouse plausible values, and the way these AIs define concepts and reason about them also seems pretty human, most of the time.
In reply, a critic might say that the values espoused by human beings are merely the output of a process (evolutionary, developmental, cultural) that is badly understood, and a proper extrapolation would be based on knowledge of that underlying process, rather than just knowledge of its current outputs.
A critic would also say that the frontier AIs are mimics ("alien actresses"), trained to reproduce the values espoused by human beings while possibly having their own opaque underlying dispositions that would come to the surface when their "volition" gets extrapolated.
It seems to me that a lot here depends on the "extrapolation machinery". If that machinery takes its cues more from behavior than from underlying dispositions, a frontier AI and a human really might end up in the same place.
What would be more difficult is for the CEV of an AI to discover critical parts of the value-determining process in humans that are not yet common knowledge. There's some chance it could still do so, since frontier AIs have been known to say that CEV should be used to determine the values of a superintelligence, and the primary sources on CEV do state that it depends on those underlying processes.
I would be interested to know who is doing the most advanced thinking along these lines.