Thanks!
Thanks for this!
I'm not sure that I understand the distinction between the vector and point approaches that you've discussed.
This is really a distinction within the math of my model itself, as described above. Both are attempts to capture how retraining works in a highly "reduced-form" way that abstracts from the details.
As for how to interpret each in terms of real training:
You might consider an RLHF-style setup. The train-in-a-direction approach might be something like telling your human evaluators to place a bit more weight on helpfulness (...
Just to be clear, I mean "types" in the game theory sense (i.e. a [privately known] attribute of a player that determines its preferences), not the CS/logic sense. The type space doesn't necessarily capture a literal subspace within a neural network's weights; I think of it more as a space measuring some human-interpretable property of the AI.
As a mundane (and very imperfect) example, we might think of the type space as a one-dimensional continuum of how much the AI values helpfulness vis-a-vis harmlessness. [is that 1 dimension or 2 non-orthogonal directions...
Looking forward to seeing those if/when you publish them!
FWIW, I listened to the following book about the war and it seemed quite good. I think it's the second book in a series about the war.
"The Endgame: The Inside Story of the Struggle for Iraq, from George W. Bush to Barack Obama" by Bernard E. Trainor and Michael R. Gordon.