I like the idea, and I especially like the idea of safely observing treacherous turns. But a few failure modes might be:
If the AI wreaks havoc on the planet before it manages to get access to the self-termination script, humans aren't left in very good shape, even if the AI ends up switched off afterward. (This DOES seem unlikely, since presumably getting the script would be easy enough that it would not first require converting the planet to computronium or whatever, but it's a possibility.)
A sufficiently intelligent AI would probably read the script, realize that the script's execution will result in its own termination, and plan accordingly by putting other mechanisms in place to reactivate itself afterward--all so it could continue to run the script again and again. Then it would also have instrumental reasons to safeguard itself against interruption through some of the same "bad for humanity" strategies that a pi calculator might use. Maybe this could be fixed by making the final goal be "run SELF-TERMINATE.sh once and only once"... but I feel like that's susceptible to the same problems as telling Clippy "only make 32 paperclips, don't just make them indefinitely".
Seems to me an AI without goals wouldn't do anything, so I don't see it as being particularly dangerous. It would take no actions and have no reactions, which would render it perfectly safe. However, it would also render the AI perfectly useless--and it might even be nonsensical to consider such an entity "intelligent". Even if it possessed some kind of untapped intelligence, without goals that would manifest as behavior, we'd never have any way to even know it was intelligent.
The question about utility maximization is harder to answer. But I think all agents that accomplish goals can be described as utility maximizers regardless of their internal workings; if so, that (together with what I said in the last paragraph) implies that an AI that doesn't maximize utility would be useless and (for all intents and purposes) unintelligent. It would simply do nothing.
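To make the "can be described as a utility maximizer" claim concrete, here is a toy sketch in Python. It assumes the trivial construction where the utility is 1 for whatever the agent actually does and 0 for everything else. The names (`rationalizing_utility`, `maximize`, `thermostat`) are made up for illustration, and this only shows that such a description exists, not that it is an informative one.

```python
from typing import Callable, Hashable, Sequence

Policy = Callable[[Hashable], Hashable]
Utility = Callable[[Hashable, Hashable], float]


def rationalizing_utility(policy: Policy) -> Utility:
    """Build a utility function that scores 1.0 for the action the
    policy takes in a given state and 0.0 for any other action."""
    def utility(state: Hashable, action: Hashable) -> float:
        return 1.0 if policy(state) == action else 0.0
    return utility


def maximize(utility: Utility, state: Hashable, actions: Sequence[Hashable]) -> Hashable:
    """Pick the action with the highest utility in this state."""
    return max(actions, key=lambda action: utility(state, action))


def thermostat(temperature: int) -> str:
    """Hypothetical agent with no explicit utility function inside it."""
    return "heat" if temperature < 20 else "idle"


ACTIONS = ["heat", "idle"]
u = rationalizing_utility(thermostat)

# The maximizer of the constructed utility reproduces the agent's behavior.
choices = [maximize(u, t, ACTIONS) for t in (15, 25)]
print(choices)
```

The thermostat has no utility function anywhere in its internals, yet its behavior matches that of an agent maximizing `u`. That is the sense of "regardless of their internal workings" I have in mind.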
I think it's probably enough of an obstacle that it's more likely an AGI will be developed first. In that sense I do agree with Bostrom. However, I wouldn't say it's completely infeasible, rather that it will require considerable advances in pattern recognition technology, our understanding of the brain, and our technological ability to interface with the brain first. The idiosyncratic morphology and distributed/non-localized information storage make for a very difficult engineering problem, but I'm optimistic that it can be overcome in some way or another.
We've already had some (granted, very limited) success with decoding imagery from the visual cortex through "dumb" (non-AGI) machine learning algorithms, which makes deeper interaction seem at least possible. If we can make advances in the above-mentioned fields, I would guess the biggest limitation will be that we'll never have a standardized "plug'n'play" protocol for brains--interfaces will require specialized tuning for each individual and a learning period during which the algorithms can "figure out" how your brain is wired up.
Hi! I'm Nathan Holmes, and I've bounced around a bit educationally (philosophy, music, communication disorders, neuroscience), and am now pursuing computer science with the intent of, ideally, working on AI/ML or something related to implants. (The latter may necessitate computer engineering rather than CS per se, but anyway.)
One of my lifelong interests has been understanding intelligence and how minds work.
Previously, in spite of having been an off-and-on follower of Less Wrong (and, earlier, Overcoming Bias, when EY published there), I hadn't really treated unfriendly AI as worthy of much thought, but reading a few more recent MIRI essays convinced me I should take it seriously--especially if I'm interested in AI research myself. Hence the decision to read Superintelligence.
I will suggest four scenarios where different types of people would be desirable as candidates.
First, there's "whoever's rich and willing". Under this scenario, the business offers WBE to the highest bidder, presumably with the promise of immortality (and perhaps fame/notoriety: hey, the first human to be emulated!) as enticement. This would seem to presuppose either that the company has managed to persuade the rest of the world that the emulation definitely will work as advertised, in spite of the lack of human testing so far, or that the procedure is non-destructive (or both).
Second, there's the "whoever's craziest" scenario. Supposing it's a destructive process and--ex hypothesi--as yet untested on humans, the company might struggle to find anyone willing to undergo the procedure at all. In this case, I would expect mainly enthusiastic volunteers from LW who have followed the technology closely, possibly with a smattering of individuals diagnosed with terminal diseases (see also next scenario).
Third, there's the "humanitarian" (or, if you're cynical, the "cook up some good P.R.") scenario. Here, the company selects someone who is either (1) in need or (2) likely to contribute to humanity's betterment. Good candidates might be a sympathy-garnering, photogenic child with an incurable disease, or an individual like Stephen Hawking, who has clearly proven to be a valuable scientific asset and is a well-known public figure--yet who is in the late stages of ALS.
Fourth and last, there's the "ideal worker" or "capitalism at its finest" scenario, inspired by Robin Hanson: here, the company might choose someone specifically because they anticipate that this individual will be a successful worker in a digital environment. Aside from the usual qualifications (punctual, dedicated, loyal, etc.), an extra plus here would be having no existential compunctions about being switched on and off or about being split into temporary copies which would later be terminated after fulfilling their roles.
As an aside, when you're speaking of these embodied metaphors, I assume you have in mind the work of Lakoff and Johnson (and/or Lakoff and Núñez)?
I'm sympathetic to your expectation that a lack of embodiment might create a metaphor "mistranslation". But I would expect that any deficits could be remedied either through virtual worlds or through technological replacements of sensory equipment. Traversing a virtual path in a virtual world should be just as good a source of metaphor/analogy as traversing a physical path in the physical world, no? Or if it weren't, then inhabiting a robotic body equipped with a camera and wheels could stand in for the biological body, for the purposes of learning/developing embodied metaphors.
What might be more interesting to me, albeit perhaps more speculative, is to wonder about the kinds of new metaphors that digital minds might develop that we corporeal beings might be fundamentally unable to grasp in quite the same way. (I'm reminded here of an XKCD comic about "what a seg-fault feels like".)