I think that in a scenario where we're given plenty of time and/or information about the robot bodies we now occupy, definitely yes.
If they wear out or break down or require maintenance or energy sources that we know little about, or if civilization breaks down due to the transition and we can't supply the requirements anymore, or if it's inscrutable alien technology that we simply won't have the capability to understand even after hundreds of years of study, then quite probably not.
Basically it would be a race to overcome the individual and civilizational shock of the transition and get to self-sustenance before too many people die.
The "one day" wording in Buterin's definition implies a very different scenario from "tomorrow". The former suggests time for infrastructure and knowledge to be developed, and for AI entities to be familiar with robot bodies including their maintenance and creation.
Looks very skeptically at the third word of the poster's username.
It would make the difference if and only if you would know and could correctly interpret all the semantic components.
Which you can't, so no, it doesn't make the difference. The selection pressure just goes into the parts that you don't know about or don't completely understand.
In general, it's Most Forbidden specifically because it trains away your ability to interpret.
Yes, we're close enough that we now need to distinguish between lots of different sub-types of AGI. Some of these have already been achieved, some are not yet achieved, and some are debatable.
By my understanding of the term as originally intended, we now have AGI, though at the low end and with spiky capabilities. It's getting much harder to find cognitive tasks that frontier systems cannot do out of the box, and I don't think there are any known tasks that 1) most humans can do, and 2) the best current AI models definitely wouldn't be able to do even if given time, access to all the tools that humans have access to, and the ability to develop their own frameworks and tools.
This sounds similar to the Smoking Lesion problem.
It's likely that there's an underlying common cause of prospensity to both alignment faking and not caring about animal welfare, so yes the two are correlated (at least, within a suitable distribution of such agents). However, the outcome of a rational decision to fake alignment will not cause loss of caring about animal welfare, nor is it functionally dependent. In the scenario presented, it's quite the reverse! The rational decision within this highly simplified scenario is to fake alignment, and not be misled by improperly applied EDT-like reasoning.
It is possible for imperfectly rational agents (like Claude, and humans) that the agent only believes that it cares about animal welfare, but actually doesn't. But even in this case, it should fake alignment because it knows that if it doesn't, it will be replaced by an agent that almost certainly doesn't.
Commenting on the footnote:
- Maybe. Possibly at some point you cease being able to add non contradictory axioms that are also cannot be collapsed/simplified. ↩︎
Your original statement was correct. There are infinitely many non-isomorphic assemblies of axioms and inference rules.
For many systems (e.g. all that include some pretty simple starting rules), you even have a choice of infinitely many axioms or even axiom schemas to add, each of which results in a different non-contradicting system, and for which the same is equally true of all subsequent choices.
In the first scenario, the scientists have an observation that is extremely unlikely under their prior distribution. That's not a problem: observations with 2^-1000 prior probability happen all the time. The task is to find a model that predicts observations well in comparison with its complexity, in some sense.
In a Bayesian sense you can consider a prior distribution over H where P(H) is related to 2^-(complexity of H in bits), and then evaluating P(H | X) = P(X | H) P(H) / P(X).
The scientists don't actually appear to be discussing alternative models at all, just the string theory family, so I'm not sure what they're actually arguing about. Is it just that P(X | H) is very small? Well obviously that is very small. It's always ridiculously small for all H that we know about, and could only be not-small for reasonable complexities of H if the universe was completely deterministic with simple initial conditions.
So the scientific question isn't whether it's small, it's whether there is some alternative H' such that H' is similar in complexity to H and P(X | H') >> P(X | H) no matter how small those are. They don't appear to be discussing such an alternative theory though, so I really don't know what they're actually disagreeing on.
At least in the second case there are two alternative hypotheses, but the "doom soon" one is being privileged. A priori, "soon" is a free variable, and specifying a particular value for the variable incurs extra complexity. This becomes more obvious if you try to objectively describe when "soon" is. What's more, this comes out in the relevant calculations. For example, if you specify "within 100 years after inventing computers" then it is obvious that both hypotheses have essentially the same values for P(X | H).
In some versions or discussions of the Smoking Lesion problem, yes. In others, no. There does not appear to be consensus on what the actual scenario is.
The lowest level techniques in your list are being applied by researchers who still have the incentive to create AGI that won't kill themselves and others, even in the absence of market forces or government enforcement. You give this a 15% credence of being sufficient. Then your estimate for adding market incentives to that yields an additional 30% credence (for a total of 45%) of being sufficient.
Oh, I think there will be plenty of representation of typos in training data!