I'm a little sad that so much safety research has fully pivoted to post-hoc explanations of frontier Shoggoths. I think there's probably low-hanging fruit in growing an easier-to-understand Shoggoth, even if it's not with a simplex :).
I agree. I'm pretty new to the field and was surprised to see few recent attempts to build interpretable models from the ground up.
Natural, Axis-Aligned Bases. The basis vectors in which a single element is 1 and the rest are 0 explicitly define our "corners" and correspond directly to "interpretable" points of our set. These are points where all other dimensions are "off", and the only forward contribution comes from a single dimension. This also ...
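To make the "corners" in the excerpt above concrete, here is a minimal sketch in plain Python. It assumes the set in question is the probability simplex (non-negative entries summing to 1), which the excerpt does not state outright. The helper names `standard_basis`, `on_simplex`, `active_dimensions`, and `is_corner` are hypothetical and chosen only for illustration.

```python
def standard_basis(n):
    """Return the n axis-aligned basis vectors (one-hot lists of floats)."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def on_simplex(p, tol=1e-9):
    """True if p has non-negative entries that sum to 1 (assumed definition of the set)."""
    return all(x >= -tol for x in p) and abs(sum(p) - 1.0) <= tol


def active_dimensions(p, tol=1e-9):
    """Indices of the dimensions that are 'on' (non-zero) in p."""
    return [i for i, x in enumerate(p) if abs(x) > tol]


def is_corner(p, tol=1e-9):
    """True if exactly one entry of p is 1 and every other entry is 0."""
    active = active_dimensions(p, tol)
    return len(active) == 1 and abs(p[active[0]] - 1.0) <= tol


basis = standard_basis(3)
mixed = [0.5, 0.5, 0.0]  # hypothetical non-corner point: two dimensions contribute

# Every basis vector lies on the simplex and is a corner.
print([on_simplex(e) and is_corner(e) for e in basis])

# The mixed point is on the simplex but is not a corner.
print(on_simplex(mixed), is_corner(mixed), active_dimensions(mixed))
```

At a corner, `active_dimensions` returns a single index, which is the sense in which only one dimension contributes; the mixed point returns two.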