Towards building blocks of ontologies
This dialogue is part of the agent foundations fellowship with Alex Altair, funded by the LTFF. Thank you to Dalcy, Alex Altair, and Alfred Harwood for feedback and comments.

Context: I (Daniel) am working on a project about ontology identification. I've found conversations to be a good way to discover inferential gaps when explaining ideas, so I'm experimenting with using dialogues as the main way of publishing progress during the fellowship.

Daniel C

We can frame ontology identification as a robust bottleneck for a wide variety of problems in agent foundations & AI alignment. I find this framing helpful because the upstream problems can often help us back out the desiderata we want to achieve, and allow us to pin down the theories and solutions we're looking for:

* Suppose that you have a neural network with a bunch of layers and activations, and you're able to observe the activation value of a particular neuron.
* On one hand, merely knowing the activation is completely insufficient for interpreting its "meaning": we don't know what the activation is pointing to in the real world, or what we can infer about the world upon observing a given activation value. This is because we have no idea how that activation value is computed from the input layer or how it is used by variables downstream. This "relational information" - how the neuron interacts with other neurons - is part of what defines its semantics, and we would need to include it to fully interpret the neuron's meaning.
* On the other hand, we don't want to include all information about the network, because we want to think of the activation value as a low-dimensional summary of what's going on in the neural network. Many inputs can produce the same activation value at that neuron, and when we're just looking at that neuron, we don't need to distinguish between inputs that produce the same activation value.
* When it comes to interpretabili