LESSWRONG
LW

256
d4hines
12130
Message
Dialogue
Subscribe

Posts

Sorted by New

Wikitag Contributions

Comments

Sorted by
Newest
No wikitag contributions to display.
Reflections on 4 years of meta-honesty
d4hines10d10

Is your meta-honesty policy compact and something you could share here? The review would be more interesting/helpful that way. Are there parts of the policy you keep private/hidden (and is that something you'd lie about)?

Reply
A Narrow Path: a plan to deal with AI extinction risk
d4hines10mo10

Therefore, in the case of an emergency, a compute provider and/or an AI developer can be called upon to shutdown the model.

Invoking the kill switch would be costly and painful for the compute provider/AI developer, and I wonder if this would make them slow to pull the trigger. Why not place the kill switch in the regulator's control, along with the expectation that companies could sue the regulator for damages if the kill switch was invoked needlessly?

Edit: Actually I think this is what is meant by "Hardware-Enabled Governance Mechanisms (HEM)", and I think the suggestion that the compute provider or AI developer shut down the model is a stop-gap until HEM is widely deployed.

Reply
Is anyone working on formally verified AI toolchains?
d4hines1y10

Alas, formal methods can't really help with that part. If you have the correct spec, formal methods can help you know with as much certainty as we know how to get that your program implements the spec without failing in undefined ways on weird edge cases. But even experienced, motivated formal methods practitioners sometimes get the spec wrong.  I suspect "getting the sign of the reward function" right is part of the spec, where theorem provers don't provide much leverage beyond what a marker and whiteboard (or program and unit tests) give you.

Reply
13Some Thoughts on Mech Interp
2mo
1