Introduction
In this essay, I name and describe a mechanism that might break alignment in any system in which an originally aligned AI improves its capabilities through self-modification. I doubt this idea is new, but I haven't yet seen it named and described in these terms. This post discusses the mechanism in the context of agents that iteratively improve themselves to surpass human intelligence, but the general idea should apply to most self-modification schemes.
This idea directly draws inspiration from Scott Alexander’s Schelling Fences on Slippery Slopes.
Schelling Shifts
A Schelling shift occurs when an agent fails to properly anticipate that modifying a parameter will increase its next version’s willingness to further modify the...
A new study was published today with results that contradict those found in the UCSF study that you've written about.
Human Hippocampal Neurogenesis Persists Throughout Aging
The Sorrells et al. study is directly mentioned twice in this paper. The first claim is that the study failed to address medication and drug use, both of which affect adult hippocampal neurogenesis.
...