Feedback welcomed: www.admonymous.co/zeshen
This seems to be another one of those instances where I wish there were a dual-voting system for posts. I would have liked to strong-disagree with the contents of the post without discouraging well-intentioned people from posting here.
I feel like a substantial amount of disagreement between alignment researchers is not object-level but semantic. I remember seeing instances where person X writes a post about how they disagree with a point person Y made, and person Y responds that that wasn't even their point at all. In many cases, it appears that simply saying what you don't mean could have prevented a lot of unnecessary misunderstanding.
I'm curious whether there are specific parts of the usual arguments that you find logically inconsistent.
I Googled 'how are tokens embedded' and this post came up third in the results - thanks for the post!
If this interests you, there is a proposal in the Guideline for Designing Trustworthy Artificial Intelligence by Fraunhofer IAIS which includes the following:
[AC-R-TD-ME-06] Shutdown scenarios
Requirement: Do
Scenarios should be identified, analyzed and evaluated in which the live AI application must be completely or partially shut down in order to maintain the ability of users and affected persons to perceive situations and take action. This includes shutdowns due to potential bodily injury or damage to property and also due to the violation of personal rights or the autonomy of users and affected persons. Thus, depending on the application context, this point involves analyzing scenarios that go beyond the accidents/safety incidents discussed in the Dimension: Safety and Security (S). For example, if it is possible that the AI application causes discrimination that cannot be resolved immediately, this scenario should be considered here. When evaluating the scenarios, the consequences of the shutdown for the humans involved, work processes, organization and company, as well as additional time and costs, should also be documented. This is compared with the potential damage that could arise if the AI application were not shut down.

Documentation should be available on the AI application shutdown strategies that were developed based on the identified scenarios – short-term, mid-term and permanent shutdown. Similarly, scenarios for shutting down subfunctions of the AI application should also be documented. Reference can be made to shutdown scenarios that may have already been covered in the Risk area: functional safety (FS) (see [S-R-FS-ME-10]).

A shutdown scenario documents
– the setting and the resulting decision-making rationale for the shutdown,
– the priority of the shutdown,
– by which persons or roles the shutdown is implemented and how it is done,
– how the resulting outage can be compensated,
– the expected impact for individuals or for the affected organization.

[AC-R-TD-ME-07] Technical provision of shutdown options
Requirement: Do
Documentation should be available on the technical options for shutting down specific subfunctions of the AI application as well as the entire AI application. Here, reference can be made to [S-R-FS-ME-10] or [S-R-FS-ME-12] if necessary. It should be outlined that other system components or business processes that use (sub)functionality that can be shut down have been checked, and that (technical) measures to compensate for negative effects of shutdowns are prepared. If already covered there, reference can be made to [S-R-FS-ME-10].
Everyone in any position of power (which includes engineers who are doing a lot of intellectual heavy-lifting, who could take insights with them to another company), thinks of it as one of their primary jobs to be ready to stop
In some industries, Stop Work Authorities are implemented, where any employee at any level of the organisation has the power to stop work they deem unsafe at any time. I wonder if something similar in spirit would be feasible to implement at top AI labs.
Without thinking about it too much, this fits my intuitive sense. An amoeba can't possibly demonstrate a high level of incoherence because it simply can't do a lot of things, and whatever it does would have to be very much in line with its goal (?) of survival and reproduction.
Thanks for this post. I've always had the impression that everyone around LW has been familiar with these concepts since they were kids and now knows them by heart, while I've been struggling with some of them for the longest time. It's comforting to know that there are long-time LWers who don't necessarily fully understand all of this stuff either.
Browsing through the comments section, it seems that everyone relates to this pretty well. I do, too. But I'm wondering whether this applies mostly to an LW subculture, or whether it's a Barnum/Forer effect that every neurotypical person would also relate to.
Why is this question getting downvoted?