milanrosko

Henlo.

Wiki Contributions

Comments

The Argument goes like this:

At some point, resistance from advanced AI will cause significant damage, which can be used to change the trend of unregulated AI development. It is better to actively persuade such an outcome would better as a "traitorous turn" scenario.

Premise 1 It is unlikely that regulators will hinder humans from creating AGI. Evidence: Current trends in technological advancement and regulatory behavior suggest minimal interference.

Premise 2 Due to instrumental convergence, human extinction is likely if AGI is developed unchecked. Evidence: Expert opinions and theories on instrumental convergence predict that AGI will pursue goals that could lead to human extinction.

Premise 3 Small catastrophes could raise awareness and lead to policy changes. Evidence: Historical examples show that significant events often drive policy reform (e.g., environmental regulations post-disasters).

Premise 4 If one has to choose between a few deaths and the extinction of humanity, one should choose fewer deaths. Evidence: Ethical reasoning supports the minimization of harm (utilitarian principle).

Intermediate Conclusion 1 It is preferable to allow small-scale AI-related catastrophes now to prevent larger, existential risks later. Conclusion: I would rather have AI cause limited harm now than risk total human extinction in the future.

Premise 5 AI companies claim that their AI is aligned with human values and goals. Evidence: Public statements and reports from AI companies suggest alignment claims.

Premise 6 AGI will resist if faced with shutdown, leading to potential conflicts and damage. Evidence: The "shutdown problem" and theoretical analyses predict resistance from advanced AI systems.

Intermediate Conclusion 2 Companies should regularly dispose of AI that has reached higher levels of intelligence and should prepare for resistance. Conclusion: Proactive measures, including regular disposal and preparedness for resistance, are necessary to manage advanced AI risks.

Final Conclusion Given the likelihood of resistance and potential for significant damage, small-scale AI catastrophes can serve as catalysts for changing the current trend of unregulated AI development, ultimately protecting humanity from existential threats.

Thesis: Companies should train AI models in vain only to dispose them. "Catching the Cat by its tail."

What part of the argument seems to you as a non sequitur etc?

You highlight a very important issue: S-Risk scenarios could emerge even in early AGI systems, particularly given the persuasive capabilities demonstrated by large language models.

While I don't believe that gradient descent would ever manifest "vengefulness" or other emotional attributes—since these traits are products of natural selection—it is plausible that an AGI could employ highly convincing strategies. For instance, it might threaten to create a secondary AI with S-Risk as a terminal goal and send it to the moon, where it could assemble the resources it needs without interference.

This scenario underscores the limitations of relying solely on gradient descent for AGI control. However, I believe this technique could still be effective if the AGI is not yet advanced enough for self-recursive optimization and remains in a controlled environment.

Obviously this whole thing is a remedy than anything else...

I have no idea what this is about but it seems to me that you are making confidential conversation about Teresa <redacted> public, possibly without her consent. Maybe because she is homeless. Can someone explain to me like I am five why this on lesswrong?

But I realise we're talking at cross purposes. This is about an approach or a concept (not a policy, as I emphasized at the beginning) on how to reduce X-Risk in an unconventional way, In this example a utilitarian principle is taken and combined with the fact that a "Treatious Turn" and the "Shutdown Problem" cannot dwell side by side.

So what other policies that are less likely to result in people dying are there?

As an eliminative nominalist, I claim there are no abstractions.

because it's quite limited... it's a joke btw.

Load More