How useful do people think it would be to have human-in-the-loop (HITL) AI systems?
What are the likeliest failure modes of HITL AI systems? And what kinds of policies are most likely to help against those failure modes?
If we assume that HITL AI systems are not going to be safer once we reach unaligned AGI, could they maybe give us tools/information/time-steps to increase the chance that AGI will be aligned?
I was thinking of money. :)
Interesting. Can you give us a sense of how much those asks (offer to pay for the extra labour) end up costing you?
Also: The last name is "Von Almen", not "Almen".
I always assumed that "Why don't we give Terence Tao a million dollars to work on AGI alignment?" was using Tao to refer to a class of people. Your comment implies that it would be especially valuable for Tao specifically to work on it.
Why should we believe that Tao would be especially likely to be able to make progress on AGI alignment (e.g. compared to other recent Fields Medal winners like Peter Scholze)?
[I think this is more anthropomorphizing ramble than concise arguments. Feel free to ignore :) ]
I get the impression that in this example the AGI would not actually be satisficing. It is no longer maximizing a goal but still optimizing for this rule.
For a satisficing AGI, I'd imagine something vague like "Get many paperclips" resulting in the AGI trying to get paperclips but at some point (an inflection point of diminishing marginal returns? some point where it becomes very uncertain about what the next action should be?) doing something else.
Or for rules like "get 100 paperclips, not more" the AGI might only directionally or opportunistically adhere. Within the rule, this might look like "I wanted to get 100 paperclips, but 98 paperclips are still better than 90, let's move on" or "Oops, I accidentally got 101 paperclips. Too bad, let's move on".
In your example of the AGI taking lots of precautions, the satisficing AGI would not do this because it could be spending its time doing something else.
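To make the distinction a bit more concrete, here is a rough toy sketch of the two behaviours I have in mind. Everything in it is an assumption made up for illustration: the `marginal_value` curve, the `threshold`, the `tolerance`, and the function names are hypothetical, and it is not meant as a real definition of satisficing.

```python
def marginal_value(count: int) -> float:
    """Hypothetical diminishing-returns curve: each extra paperclip is worth less."""
    return 1.0 / (count + 1)


def satisficer(threshold: float = 0.07, max_steps: int = 1000) -> int:
    """'Get many paperclips': keep collecting until the next paperclip
    is worth less than `threshold`, then go and do something else."""
    count = 0
    while count < max_steps and marginal_value(count) >= threshold:
        count += 1
    return count


def good_enough(count: int, target: int = 100, tolerance: int = 2) -> bool:
    """'Get 100 paperclips, not more', followed only directionally:
    being close to the target counts as done, with no extra precautions."""
    return abs(count - target) <= tolerance


print(satisficer())      # stops once the marginal value drops below the threshold
print(good_enough(98))   # close enough, move on
print(good_enough(101))  # slightly over, too bad, move on
print(good_enough(90))   # not close enough yet
```

The point of the sketch is only that neither function spends any effort on verifying or defending the outcome; the agent stops and does something else once the stopping condition holds.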
I suspect there are major flaws with it, but an intuition I have goes something like this:
I think the overall point you're making is intriguing, and I could see how it might alter my home behaviour if I considered it more deeply. But I also strongly disagree with the following:
There is a bunch of "work behavior" that has been very useful – in the right measure – for my personal life:
Maybe some of them are too obvious and common. But they are things that my grandmother wouldn't have done – and I suspect that they are mostly derived from work culture.