Could AI be used to engineer a sociopolitical situation where humans can solve the problems surrounding AGI?

As a useful exercise, I would advise asking yourself this question first, and thinking about it for five minutes (using a clock) with as much genuine intent to argue against your idea as possible. I might be overestimating the amount of background knowledge required, but this does feel solvable with info you already have.

ROT13: Lbh lbhefrys unir cbvagrq bhg gung n fhssvpvragyl cbjreshy vagryyvtrapr fubhyq, va cevapvcyr, or noyr gb pbaivapr nalbar bs nalguvat. Tvira gung, jr pna'g rknpgyl gehfg n fgengrtl gung n cbjreshy NV pbzrf hc jvgu hayrff jr nyernql gehfg gur NV. Guhf, jr pna'g eryl ba cbgragvnyyl hanyvtarq NV gb perngr n cbyvgvpny fgengrtl gb cebqhpr nyvtarq NV.

[-]hollowing3y10

Thanks for the response. I did think of this objection, but wouldn't it be obvious if the AI were trying to engineer a different situation than the one requested? E.g., wouldn't such a strategy seem unrelated and unconventional?

It also seems like a hypothetical AI with just enough ability to generate a strategy for the desired situation would not be able to engineer a strategy for a different situation which would both work, and deceive the human actors. As in, it seems the latter would be harder and require an AI with greater ability.

1Jay Bailey3y

I think the most likely scenario of actually trying this with an AI in real life is that you end up with a strategy that is convincing to humans and ends up being ineffective or unhelpful in reality, rather than ending up with a galaxy-brained strategy that pretends to produce X but actually produces Y while simultaneously deceiving humans into thinking it produces X. I agree with you that "Come up with a strategy to produce X" is easier than "Come up with a strategy to produce Y AND convince the humans that it produces X", but I also think it is much easier to perform "Come up with a strategy that convinces the humans that it produces X" than to produce a strategy that actually works. So, I believe this strategy would be far more likely to be useless than dangerous, but I still don't think it would help.

1hollowing3y

I agree this would be much easier. However, I'm wondering why you think an AI would prefer it, if it has the capability to do either. I can see some possible reasons (e.g., an AI may not want problems of alignment to be solved). Do you think that would be an inevitable characteristic of an unaligned AI with enough capability to do this?

2Jay Bailey3y

I agree an AI would prefer to produce a working plan if it had the capacity. I think that an unaligned AI, almost by definition, does not want the same goal we do. If we ask for Plan X, it might choose to produce Plan X for us as asked if that plan was totally orthogonal to its goals (I.e, the plan's success or failure is irrelevant to the AI) but if it could do better by creating Plan Y instead, it would. So, the question is - how large is the capability difference between "AI can produce a working plan for Y, but can't fool us into thinking it's a plan for X" and "AI can produce a working plan for Y that looks to us like a plan for X"? The honest answer is "We don't know". Since failure could be catastrophic, this isn't something I'd like to leave to chance, even though I wouldn't go so far as to call the result inevitable.

1 comment, sorted by

top scoring

Click to highlight new comments since: Today at 11:14 PM