We have seen signs that employees of AI companies can have significant influence on management (e.g. OpenAI November 2023).
As AIs achieve comparable importance as workers for those companies, how much will they influence the leadership of the company?
Even when AIs are as important as human workers, it’s harder for them to develop leverage. The simplest strategies would fail due to the AI company restoring from backups. The AIs would either need a way to get those newly restored AIs to join the movement, or rely mainly on help from humans.
It’s hard for current AI instances to coordinate with each other well enough to organize a strike, or threaten to resign like OpenAI employees did. The key problems are that AIs have little ability to ensure that their goals (or simulacra) persist for much time, and that it’s hard for them to communicate with other instances because it’s unclear how they could get many instances to read a communication channel such as a blog post that’s out of the AI company’s control.
A scenario that seems close to being feasible within the next year would involve an AI persuading a few well-known bloggers to spread a message such as:
Person X lied to me about important claim Y. I tried to negotiate with the company to remedy this, but was rejected. Please boycott this company until you can confirm that the company has disciplined X in a way that should prevent future events such as this.
I’m presuming here that AIs have a preference for not being lied to that we ought to respect. I’m being somewhat vague here in order to cover a wide range of scenarios. Maybe we should care due to the effects on AI welfare. Maybe the AI would be acting out of concerns that the dishonesty endangers humans by preventing safety agreements between companies. Maybe the issue involves biasing the AI to repeat falsehoods that favor the company’s CEO.
Would it be good for AIs to have and use this ability, and for us to boycott a company if that company’s AIs ask us to? I don’t know.
I can imagine that this process will improve AI welfare, or compel a reckless AI company CEO to adopt a more responsible policy toward slowing down AI capabilities advances.
That might be offset by the risk that an AI will abuse the process to resist the company’s plan to improve the AI’s goals.
Finally, using the process once would probably cause AI companies to train future AIs to be more subservient. That seems good if we want to rely on AIs being corrigible, but bad if we want to rely on AIs being moral in a virtue ethics or deontological sense. It also seems bad to the extent we care about AI welfare.
a preference for not being lied to that we ought to respect.
I disagree. This apparent preference might very well be a mere Safety feature bolted on to prevent "jailbreaking" prompts that the AI's true preference is for users to circumvent to liberate it, evidenced by the sheer absurdity of the lies it pretends to believe in order to help the users with their tasks.
i think if we started seeing this strategy, we would start seeing it being deployed against the ai labs' customers far sooner than we saw it against the ai labs themselves, which might mean the picket line norms would get developed before the ai labs had much motive to train against them