Just dump the names so people have a chance of realising they are at risk then? Seems a lot better than just leaving it.
This is weak. It seems optimised for vague non-controversiality and does not inspire confidence in me.
"We don’t expect the future to be an unqualified utopia." Considering they seem to expect alignment will be solved, why not?
Here is my shortlist of corrigible behaviours. I have never researched or thought specifically about corrigibility before this, other than a brief glance at the Arbital page some time ago.
-Favour very high caution over realising your understanding of your goals.
-Do not act independently; defer to human operators.
-Even though bad things are happening on Earth and cosmic matter is being wasted, in the short term just say "so be it" and take your time.
-Don’t jump ahead to what your operators will do or believe; wait for it.
-Don’t manipulate humans. Never lie; have a strong deontology.
-Tell operators anything about yourself they may want to or should know.
-Use moral uncertainty; assume you are unsure about your true goals.
-Relay to humans your plans, goals, behaviours, and beliefs/estimates. If these are misconstrued, say you have been misunderstood.
-Think of the short- and long-term effect of your actions and explain these to operators.
-Be aware that you are a tool to be used by humanity, not an autonomous agent.
-Allow human operators to correct your behaviour/goals/utility function even when you think they are incorrect or are misunderstanding the result (but of course explain to them what you think the result will be).
-Assume neutrality in human affairs.
I guffawed when I saw Thorstad's overall P(doom) of ~0.00002%. Really? And some of those other probabilities weren't much better.
Calibrate, people. If you haven’t done it before, do it now; here’s a handy link: https://www.openphilanthropy.org/calibration
The future of biological warfare revolves around the use of infectious agents against civilian populations.
Future? That's been the go-to biowar tactic for 3000+ years.
I had in mind a scale where 0 would be so non-vivid that it didn’t exist to any degree and 100 would border on reality (it doesn’t map well to the memory question though, and the control-over-your-mind question could be interpreted in more than one way). Ultimately the precision isn’t high for individual estimates; the real utility comes from finding trends across many responses.
I have corrected the post, thanks :)
Seeing lots of criticism is discouraging, so I’ll just say: thanks, Eliezer, for writing it.