No, I didn't read the sequences. I will do that. The link might be better renamed to something that indicates what it actually is. But I didn't say the AIs would be safe (or super-intelligent, for that matter), and I don't assume they would be. But those who create them may assume that.
The kind of constraint you propose would be very useful. We would first have to prove that there is a kind of topology under general computation (the machine can change its own language, so the solution can't be language-specific) that only allows non-suicidal trajectories under all possible inputs and self-modifications (or perhaps allows suicidal ones only with low probability, but this is not likely to be computable). I have looked, but have not found such a thing in existing theory. There is work on the topology of computation, but it's something different from this. I may just be unaware of it, however.
Note that in the real-world scenario, we also have to worry about entropy battering around the design, so we need a margin of error for that too.
Finally, the finite-time solution is practical, but ultimately not satisfying. The short-term solution to being in a building on fire may be to stay put. The long-term solution may be to risk short-term harm for long-term survival. And so with only short-term solutions, one may end up in a dead end down the road. A practical limit on short-term advance simulation is that one still has to act in real time while the simulation runs. And if you want the simulation to take into account that simulations are occurring, we're back to infinite regress...
Imagine that you want to construct an AI that will never self-halt (easier to define than friendliness, but the same idea applies). You could build the machine so that it doesn't have an off switch, and therefore can't halt simply out of inability. However, if the machine can self-modify, it could subsequently grant itself the ability to halt. So in your design, you'd have to figure out a way to prevent self-halting under all possible input conditions, under all possible self-modifications of the machine. This latter task cannot be solved in the general case because of Rice's Theorem, and engineering a solution leads to an infinite regress:
So in practice, how can one create a relatively non-suicidal AI? An evolutionary/ecological approach is proven to work: witness biological life. (However, humans, who have the most general computational power, commit suicide at a more or less constant rate.)
In short: genetic programming, or some other such search, can possibly find quasi-solutions (meaning they work under conditions that have been tested) if they exist, but designing in all the required characteristics up front would require tremendous ability to prove outcomes for each specific case. In practice, this debate is probably moot because it'll be a combination of both.
I think if you want "proven friendly" AIs, they would almost have to be evolved because of Rice's Theorem. Compare it to creating a breed of dog that isn't aggressive. I think FOOM fails for the same reason--see the last bit of "Survival Strategies".
As you say, it may not be practical to do so, perhaps because of technological limitations. But imagine a set "personality engine" with a bunch of parameters that affect machine-emotional responses to different stimuli. Genetic programming would be a natural approach to find a good mix of those parameter values for different applications.
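To make the parameter-search idea a bit more concrete, here is a rough sketch. Strictly speaking it is a plain genetic algorithm over a fixed-length parameter vector, not genetic programming proper (which evolves programs). Everything in it is an assumption made for illustration: the four parameters, the TARGET profile, and the fitness function are made up, and a real "personality engine" would need a fitness measure based on actual behavior under test.

```python
import random

# Hypothetical setup: a "personality" is a list of parameters in [0, 1], and
# fitness measures closeness to a made-up target profile for one application.
TARGET = [0.2, 0.9, 0.5, 0.7]  # assumed ideal values, purely illustrative


def fitness(params):
    """Higher is better: negative squared distance from TARGET."""
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))


def mutate(params, rng, scale=0.1):
    """Perturb each parameter with Gaussian noise, clamped to [0, 1]."""
    return [min(1.0, max(0.0, p + rng.gauss(0.0, scale))) for p in params]


def crossover(a, b, rng):
    """Uniform crossover: take each parameter from one parent at random."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def evolve(generations=200, pop_size=30, seed=0):
    """Evolve a population and return the fittest parameter vector found."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in TARGET] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]  # keep the better half
        children = [
            mutate(crossover(rng.choice(survivors), rng.choice(survivors), rng), rng)
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print("best parameters:", [round(p, 3) for p in best])
    print("fitness:", fitness(best))
```

Note that this only finds values that score well on the conditions the fitness function actually tests, which is the "quasi-solution" caveat: nothing here proves how the result behaves under untested inputs.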
You might be interested in this New Scientist article: "Evidence that we can see the future to be published"