independent alignment researcher
I agree that this is important. Are you more concerned about cyborgs than other human-in-the-loop systems? To me the whole point is figuring out how to make systems where the human remains fully in control (unlike, e.g. delegating to agents), and so answering this "how to say whether a person retains control" question seems critical to doing that successfully.
Thank you for this gorgeously written comment. You really capture the heart of all this so perfectly, and I completely agree with your sentiments.
I think it's really important for everyone to always have a trusted confidant, and to go to them directly with this sort of thing first before doing anything. It is in fact a really tough question, and no one will be good at thinking about this on their own. Also, for situations that might breed a unilateralist's curse type of thing, strongly err on the side of NOT DOING ANYTHING.
An example I think about a lot is the naturalistic fallacy. There is a lot of horrible suffering that happens in the natural world, and a lot of people seem to be way too comfortable with that. We don't have any really high leverage options right now to do anything about it, but it strikes me as plausible that even if we could do something about it, we wouldn't want to. (perhaps even make it worse by populating other planets with life https://www.youtube.com/watch?v=HpcTJW4ur54)
I really loved the post! I wish more people took S-risks completely seriously before dismissing them, and you make some really great points.
In most of your examples, however, it seems the majority of the harm is in an inability to reason about the consequences of our actions, and if humans became smarter and better informed it seems like a lot of this would be ironed out.
I will say the hospice/euthanasia example really strikes a chord with me, but even there, isn't it more a product of cowardice than a failure of our values?
GI is very efficient, if you consider that you can reuse a lot of the machinery that you learn, rather than needing to relearn it over and over again. https://towardsdatascience.com/what-is-better-one-general-model-or-many-specialized-models-9500d9f8751d
Sometimes something can be infohazardous even if it's not completely true. Even though the northwest passage didn't really exist, it inspired many European expeditions to find it. There's a lot of hype about AI right now, and I think a cool new capabilities idea (even if it turns out not to work well) can also do harm by inspiring people to try similar things.
I interpret the goal as being more about figuring out how to use simulators as powerful tools to assist humans in solving alignment, and not at all shying away from the hard problems of alignment. Despite our lack of understanding of simulators, people (such as yourself) have already found them to be really useful, and I don't think it is unreasonable to expect that as we become less confused about simulators, we will learn to use them in really powerful and game-changing ways.
You gave "Google" as an example. I feel like having access to Google (or another search engine) improves my productivity by more than 100x. This seems like evidence that game-changing tools exist.
and increasing the number of actors can make collusive cooperation more difficult
An empirical counterargument to this is in the incentives human leaders face when overseeing people who might coordinate against them. When authoritarian leaders come into power, they will actively purge members from their inner circles in order to keep them small. The larger the inner circle, the harder it becomes to prevent a rebellious individual from gathering the critical mass needed for a full-blown coup.
Source: The Dictator's Handbook by Bruce Bueno de Mesquita and Alastair Smith
That's a good point. There are clearly examples of systems where more is better (e.g. blockchain). There are just also other examples where the opposite seems true.