Oliver Sourbut

Call me Oliver or Oly - I don't mind which.

I'm particularly interested in sustainable collaboration and the long-term future of value. I'd love to contribute to a safer and more prosperous future with AI! Always interested in discussions about axiology, x-risks, s-risks.

I'm currently (2023) embarking on a PhD in AI in Oxford (Hertford College), and also spend time in (or in easy reach of) London. Until recently I was working as a senior data scientist and software engineer, and doing occasional AI alignment research with SERI.

I enjoy meeting new perspectives and growing my understanding of the world and the people in it. I also love to read - let me know your suggestions! In no particular order, here are some I've enjoyed recently

- Ord - The Precipice
- Pearl - The Book of Why
- Bostrom - Superintelligence
- McCall Smith - The No. 1 Ladies' Detective Agency (and series)
- Melville - Moby-Dick
- Abelson & Sussman - Structure and Interpretation of Computer Programs
- Stross - Accelerando
- Graeme - The Rosie Project (and trilogy)

Cooperative gaming is a relatively recent but fruitful interest for me. Here are some of my favourites

- Hanabi (can't recommend enough; try it out!)
- Pandemic (ironic at time of writing...)
- Dungeons and Dragons (I DM a bit and it keeps me on my creative toes)
- Overcooked (my partner and I enjoy the foody themes and frantic realtime coordination playing this)

People who've got to know me only recently are sometimes surprised to learn that I'm a pretty handy trumpeter and hornist.

Unironically, I think it's worth anyone interested skimming that Verma & Pearl paper for the pictures :) especially fig 2

Mmm, I misinterpreted at first. **It's only a v-structure if and are not connected**. So this is a property which needs to be maintained effectively 'at the boundary' of the fully-connected cluster which we're rewriting. I think that tallies with everything else, right?

ETA: both of our good proofs respect this rule; the first Reorder in my bad proof indeed violates it. I think this criterion is basically the generalised and corrected version of the fully-connected bookkeeping rule described in this post. I imagine if I/someone worked through it, this would clarify whether my handwave proof of Frankenstein Stitch is right or not.

That's concerning. It would appear to make *both* our proofs invalid.

But I think your earlier statement about incoming vs outgoing arrows makes sense. Maybe Verma & Pearl were asking for some other kind of equivalence? Grr, back to the semantics I suppose.

[This comment is no longer endorsed by its author]

Aha. Preserving v-structures (colliders like ) is necessary and sufficient for *equivalence*^{[1]}. So when rearranging fully-connected subgraphs, certainly we can't do it (cost-free) if it introduces or removes any v-structures.

Plausibly if we're willing to weaken by adding in additional arrows, there might be other sound ways to reorder fully-connected subgraphs - but they'd be non-invertible. Haven't thought about that.

Verma & Pearl, Equivalence and Synthesis of Causal Models 1990 ↩︎

Mhm, OK I think I see. But appear to me to make a complete subgraph, and all I did was redirect the . I confess I am mildly confused by the 'reorder complete subgraph' bookkeeping rule. It should apply to the in , right? But then I'd be able to deduce which is strictly different. So it must mean something other than what I'm taking it to mean.

Maybe need to go back and stare at the semantics for a bit. (But this syntactic view with motifs and transformations is much nicer!)

Perhaps more importantly, I think with Node Introduction we really don't need after all?

With Node Introduction and some bookkeeping, we can get the and graphs topologically compatible, and Frankenstein them. We can't get as neat a merge as if we also had - in particular, we can't get rid of the arrow . But that's fine, we were about to draw that arrow in anyway for the next step!

Is something invalid here? Flagging confusion. This is a slightly more substantial claim than the original proof makes, since it assumes strictly less. Downstream, I think it makes the Resample unnecessary.

ETA: it's cleared up below - there's an invalid Reorder here (it removes a v-structure).

I had another look at this with a fresh brain and it was clearer what was happening.

TL;DR: It was both of 'I'm missing something', and a little bit 'Frankenstein is invalid' (it needs an extra condition which is sort of implicit in the post). As I guessed, with a little extra bookkeeping, we don't need Stitching for the end-to-end proof. I'm also fairly confident Frankenstein subsumes Stitching in the general case. A 'deductive system' lens makes this all clearer (for me).

The key invalid move I was making when I said

But this same move can alternatively be done with the Frankenstein rule, right?

is that Frankenstein requires *all graphs to be over the same set of variables*. This is kind of implicit in the post, but I don't see it spelled out. I was applying it to an graph ( absent) and an graph ( absent). No can do!

I was right though, Frankenstein can be applied. But we first have to do 'Node Introduction' and 'Expansion' on the graphs to make them compatible (these extra bookkeeping rules are detailed further below.)

So, to get myself in a position to apply Frankenstein on those graphs, I have to first (1) introduce to the second graph (with an arrow *from* each of , , and ), and (2) expand the 'blanket' graph (choosing to maintain topological consistency). Then (3) we Frankenstein them, which leaves dangling, as we wanted.

Next, (4) I have to introduce to the first graph (again with an arrow *from* each of , , and ). I also have a topological ordering issue with the first Frankenstein, so (5) I reorder to the top by bookkeeping. Now (6) I can Frankenstein *those*, to sever the as hoped.

But now we've performed exactly the combo that Stitch was performing in the original proof. The rest of the proof proceeds as before (and we don't need Stitch).

These are both useful for 'expansive' stuff which is growing the set of variables from some smaller seed. The original post mentions 'arrow introduction' but nothing explicitly about nodes. I got these by thinking about these charts as a kind of 'deductive system'.

A graph without all variables is making a claim about the distribution with those other variables marginalised out.

We can always introduce new variables - but we can't (by default) assume anything about their independences. It's sound (safe) to assume they're dependent on everything else - i.e. they receive an incoming arrow from everywhere. If we know more than that (regarding dependencies), it's expressed as absence of one or another arrow.

e.g. a graph with is making a claim about . If there's also a , we haven't learned anything about its independences. But we can introduce it, as long as it has arrows , , and .

A graph with combined nodes is making a claim about the distribution as expressed with those variables appearing jointly. There's nothing expressed about their internal relationship.

We can always expand them out - but we can't (by default) assume anything about their independences. It's sound to expand and spell them out in any fully-connected sub-DAG - i.e. they have to be internally fully dependent. We also have to connect *every* incoming and outgoing edge to *every* expanded node i.e. if there's a dependency between the combination and something else, there's a dependency between each expanded node and that same thing.

e.g. a graph with is making a claim about . If is actually several variables, we can expand them out, as long as we respect all possible interactions that the original graph might have expressed.

I think what we have on our hands is a 'deductive system' or maybe grandiosely a 'logic'. The semantic is actual distributions and divergences. The syntax is graphs (with divergence annotation).

An atomic proposition is a graph together with a divergence annotation , which we can write .

Semantically, that's when the 'true distribution satisfies up to KL divergence' as you described^{[1]}. Crucially, some variables might not be in the graph. In that case, the distributions in the relevant divergence expression are *marginalised* over the missing variables. This means that the semantic is always under-determined, because we can always introduce new variables (which are allowed to depend on other variables however they like, being unconstrained by the graph).

Then we're interested in sound deductive rules like

Syntactically that is 'when we have deduced we can deduce '. That's sound if, for *any* distribution satisfying we also have satisfying .

More generally, I'm reasonably sure Stitch is secretly just multiple applications of Frankenstein, as in the example above. The tricky bit I haven't strictly worked through is when there's interleaving of variables on either side of the blanket in the overall topological ordering.

A rough HANDWAVE proof sketch, similar in structure to the example above:

- Expand the blanket graph
- The arrows internal to , , and need to be complete
- We can always choose a complete graph consistent with the , , and parts of the original graphs (otherwise there wouldn't be an overall consistent topology)
- Notice that the connections are
*all to all*, which is not necessarily consistent with the original graph- and similarly for the arrows
- (there could be arrows in the original)

- Introduce to the graph (and vice versa)
- The newly-introduced nodes are necessarily 'at the bottom' (with arrows from everything else)
- We can always choose internal connections for the introduced s consistent with the original graph

- Notice that the connections and in the augmented graph all keep at the bottom, which is not necessarily consistent with the original graph (and vice versa)
- But this
*is*consistent with the Expanded blanket graph

- But this
- We 'zip from the bottom' with successive bookkeeping and Frankensteins
- THIS IS WHERE THE HANDWAVE HAPPENS
- Just like in the example above, where we got the sorted out and then moved the introduced to the 'top' in preparation to Frankenstein in the graph, I think there should always be enough connections between the introduced nodes to 'move them up' as needed for the stitch to proceed

I'm not likely to bother proving this strictly, since Stitch is independently valid (though it'd be nice to have a more parsimonious arsenal of 'basic moves'). I'm sharing this mainly because I think Expansion and Node Introduction are of independent relevance.

More formally, over variables is satisfied by distribution when . (This assumes some assignment of variables in to variables in .) ↩︎

I'd probably be more specific and say 'gradient hacking' or 'update hacking' for deception of a training process which updates NN internals.

I see what you're saying with a deployment scenario being often implicitly a selection scenario (should we run the thing more/less or turn it off?) in practice. So deceptive alignment at deploy-time could be a means of training (selection) hacking.

More centrally, 'training hacking' might refer to a situation with *denser* oversight and explicit updating/gating.

Deceptive alignment during this period is just *one way* of training hacking (could alternatively hack exploration, cyber crack and literally hack oversight/updating, ...). I didn't make that clear in my original comment and now I think there's arguably a missing term for 'deceptive alignment for training hacking' but maybe that's fine.

I mean the deliberation happens in a neural network. Maybe you thought I meant 'net' as in 'after taking into account all contributions'? I should say 'NN-internal' instead, probably.

Incidentally I noticed Yudkowsky uses 'brainware' in a few places (e.g. in conversation with Paul Christiano). But it looks like that's referring to something more analogous to 'architecture and learning algorithms', which I'd put more in the 'software' camp when in comes to the taxonomy I'm pointing at (the 'outer designer' is writing it deliberately).