I started to think through the theories of change recently (to figure out a better career plan) and I have some questions. I hope somebody can direct me to relevant posts or discuss this with me.
The scenario I have in mind is: AI alignment is figured out. We can create an AI that will pursue the goals we give it and can still leave humanity in control. This is all optional, of course: you can still create an unaligned, evil AI. What's stopping anybody from creating AI that will try to, for instance, fight wars? I mean that even if we have the technology to align AI, we are still not out of the forest.
What would solve the problem here would be to create a benevolent, omnipresent AGI, that will prevent things like this.
Did EA scale too quickly?
A friend recommended me to read a note from Andy's working notes, which argues that scaling systems too quickly led to rigid systems. Reading this note vaguely reminded me of EA.
Once you have lots of users with lots of use cases, it’s more difficult to change anything or to pursue radical experiments. You’ve got to make sure you don’t break things for people or else carefully communicate and manage change.
Those same varied users simply consume a great deal of time day-to-day: a fault which occurs for 1% of people will present no real problem in a small prototype, but it’ll be high-priority when you have 100k users.
First, it is debatable if EA experienced quick scale up in the last few years. In some ways, it feels to me like it did, and EA founds had a spike of founding in 2022.
But it feels to me like EA community didn't have things figured out properly. Like SBF crisis could be averted easily by following common business practices or the latest drama with nonlinear. The community norms were off and were hard to change?
I've just read "Against the singularity hypothesis" by David Thorstad and there are some things there that seems obviously wrong to me - but I'm not totally sure about it and I want to share it here, hoping that somebody else read it as well. In the paper, Thorstad tries to refute the singularity hypothesis. In the last few chapters, Thorstad discuses the argument for x-risks from AI that's based on three premises: singularity hypothesis, Orthogonality Thesis and Instrumental Convergence and says that since singularity hypothesis is false (or lacks proper evidence) we shouldn't worry that much about this specific scenario. Well, it seems to me like we should still worry and we don't need to have recursively self-improving agents to have agents smart enough so that instrumental convergence and orthogonality hypothesis applies to them.
Thanks for your engagement!
The paper does not say that if the singularity hypothesis is false, we should not worry about reformulations of the Bostrom-Yudkowksy argument which rely only on orthogonality and instrumental convergence. Those are separate arguments and would require separate treatment.
The paper lists three ways in which the falsity of the singularity hypothesis would make those arguments more difficult to construct (Section 6.2). It is possible to accept that losing the singularity hypothesis would make the Bostrom-Yudkowsky argument more difficult to push without taking a stance on whether this more difficult effort can be done.
The power-seeking, agentic, deceptive AI is only possible if there is a smooth transition from non-agentic AI (what we have right now) to agentic AI. Otherwise, there will be a sign that AI is agentic, and it will be observed for those capabilities. If an AI is mimicking human thinking process, which it might initially do, it will also mimic our biases and things like having pent-up feelings, which might cause it to slip and loose its temper. Therefore, it's not likely that power-seeking agentic AI is a real threat (initially).