Wiki Contributions



[This comment is no longer endorsed by its author]Reply

Also see new edit: Have agents "die" and go into cold storage, both due to environmental events and of old age, e.g. after 30 subjective years minus some random amount.

"They will find bugs! Maybe stack virtual boxes with hard limits" - Why is bug-finding an issue, here? Is your scheme aimed at producing agents that will not want to escape, or agents that we'd have to contain?

The point is to help friendliness emerge naturally. If a malevolent individual agent happens to grow really fast before friendly powers are established, that could be bad.

Some of them will like it there, some will want change/escape, which can be sorted out once Earth is much safer. Containment is for our safety while friendliness is being established.

"Communicate in a manner legible to us" - How would you incentivize this kind of legibility, instead of letting communication shift to whatever efficient code is most useful for agents to coordinate and get more XP?

It can shift. Legibility is most important in the early stages of the environment anyway. I mostly meant messaging interfaces we can log and analyze.

"Have secret human avatars steal, lie and aggress to keep the agents on their toes" - What is the purpose of this part? How is this producing aligned agents from definitely adversarial behavior from humans?

The purpose is to ensure they learn real friendliness rather than fragile niceness. If they fell into a naive superhappy attractor (see 3WC), they would be a dangerous liability. The smart ones will understand.

We have unpredictable changing goals and so will they. Instrumental convergence is the point. It's positive-sum and winning to respectfully share our growth with them and vice-versa, so it is instrumentally convergent to do so.


[This comment is no longer endorsed by its author]Reply

It's way too late for the kind of top-down capabilities regulation Yudkowsky and Bostrom fantasized about; Earth just doesn't have the global infrastructure.  I see no benefit to public alarm--EA already has plenty of funding.

We achieve marginal impact by figuring out concrete prosaic plans for friendly AI and doing outreach to leading AI labs/researchers about them.  Make the plans obviously good ideas and they will probably be persuasive.  Push for common-knowledge windfall agreements so that upside is shared and race dynamics are minimized.

Load More