

There is a lot of room between "ignore people; do drastic thing" and "only do things where the exact details have been fully approved". In other words, the Overton window has pretty wide error bars.

I would be pleased if someone sent me a computer virus that was actually a security fix. I would be pretty upset if someone fried all my gadgets. If someone secretly watched my traffic for evil AI fingerprints I would be mildly annoyed but I guess glad?

Even Google has been threatening the people behind unpatched software: patch it or else they'll release the exploit, iirc.

So some of the question of "to pivotally act or not to pivotally act" is resolved by acknowledging that extent is relevant and that you can be polite in some cases.

This is the post I would have written if I had had more time, knew more, thought faster, etc.

One note about your final section: I expect the tool -> sovereign migration to be pretty easy and go pretty well. It is also kind of multistep, not binary.

E.g., current browser automation tools (which bring browsers one step up the agency ladder to scriptable processes) work very well, probably better than a from-scratch web scripting tool would work.

Fake example: predict proteins, then predict interactions, then predict cancer-preventiveness, THEN, if everything is going well so far, solve for the protein that prevents the cancer. You might not need more steps, but you could also incrementally solve for the chemical synthesis process, eliminate undesired byproducts, etc.

More generally, it might be easy to flip simulators/predictors over into optimizers when the system is trustworthy and the situation demands.
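A toy sketch of that flip, to make it concrete. Everything here is made up for illustration: `toy_predictor` is a hypothetical stand-in for a trusted predictor, the candidate strings are arbitrary, and `trusted` is just a flag an operator would set. The point is only that once you have a scoring predictor, the optimizer is a thin wrapper that searches over candidates, and you can leave that wrapper switched off until you are comfortable.

```python
from typing import Callable, Iterable, Optional


def optimize_with_predictor(
    predict_score: Callable[[str], float],
    candidates: Iterable[str],
    trusted: bool,
) -> Optional[str]:
    """Return the candidate the predictor scores highest.

    Returns None when the operator has not marked the predictor as
    trusted, or when there are no candidates to choose from.
    """
    if not trusted:
        # Stay in "tool" mode: predictions only, no search over outcomes.
        return None
    return max(candidates, key=predict_score, default=None)


def toy_predictor(candidate: str) -> float:
    # Hypothetical scoring rule (an assumption, not a real model):
    # it simply counts the letter "A" in the candidate string.
    return float(candidate.count("A"))


best = optimize_with_predictor(toy_predictor, ["GGA", "AAG", "AAA"], trusted=True)
print(best)
```

The same predictor is used in both modes; only the outer loop changes, which is why the migration can be incremental rather than binary.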

If your system is low impact or whatever, then you can have it melt GPUs without going off and running the world forever.


  1. Operator making high-level judgement calls is a pretty good solution for the foreseeable future.
  2. The self-referential decision theory stuff won't matter in an important way for a good while (at least the problems won't demand anything super rigorous).

So +1 for staying in happy tool land and punting on scary agent land.

I thought not, because I didn't see why that'd be a desideratum. You mean a good definition is so canonical that when you read it you don't even consider other formulations?

'Betray' in the sense of contradicting/violating?

Seems like choosing the definitions is the important skill, since in real life you don't usually have a helpful buddy saying "hey, this is a graph".

Do you expect the primary asset to be a neural architecture / infant mind or an adult mind? Is it too ambitious to try to find an untrained mind that reliably develops nicely?

Someone make a PR for a builder/breaker feature on LessWrong.

Just watch for anomalies and deal with them however you want. Makes perfect sense. That sounds like a relatively low effort way to make the simulation dramatically more secure.

Does seem to me like an old-fashioned physics/game engine might be easier to make, run faster, and be more self-consistent. It would probably lack superresolution and divine intervention would have to be done manually.

I'm curious what you see as the major benefits of neural-driven sim.


One thought:

"The main world design challenge is not that of preventing our agents from waking up, neo-style, and hacking their way out of the simbox. That’s just bad sci-fi."

If we are in a simulation, it seems to be very secure. People are always trying to hack it. Physicists go to the very bottom and try every weird trick. People discovered fire and gunpowder by digging deep into tiny inconsistencies. You play a game, find a glitch when you hold a box against the wall, and see how far that goes. People discovered buffer overflows in Super Mario World, and any curious, capable agent eventually will too.

So the sim has to be like, very secure.
