Let's try to justify why this notion of choice might be appropriate. First, to state the obvious: choice here is clearly related to information and entropy. If someone wishes to communicate a partition that doesn’t match one of the natural module numbers, they need to specify how many parts to divide our graph into and then transmit additional information about the specifics of that partition. In the worst case (there are no connections between nodes), specifying the exact partition requires an amount of information proportional to the logarithm of the Bell number.
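To get a feel for the worst-case figure quoted above, here is a minimal sketch that computes the Bell number of a node count and the corresponding number of bits. The function names `bell_number` and `bits_to_specify_partition` are my own, and I'm assuming every partition is equally likely, which is just one way to read "worst case".

```python
from math import log2


def bell_number(n: int) -> int:
    """Count the partitions of a set of n labelled nodes using the Bell triangle."""
    row = [1]  # first row of the triangle; row[0] is B_0
    for _ in range(n):
        # Each new row starts with the last entry of the previous row,
        # and every later entry adds the entry above-left of it.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]


def bits_to_specify_partition(n: int) -> float:
    """Bits needed to single out one partition of n nodes.

    Assumption (not from the post): all partitions are equally likely.
    """
    return log2(bell_number(n))


if __name__ == "__main__":
    for n in (5, 10, 50):
        print(n, bell_number(n), round(bits_to_specify_partition(n), 2))
```

The count grows faster than exponentially in the number of nodes, so the bit cost grows faster than linearly.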

This alone doesn't seem satisfactory to me. I think you could make up lots of other prescriptions for picking partitions that take little information to specify. E.g.: "The ones with minimal n-cuts for their k that come closest to having equal numbers of nodes in each subgraph."

"Cuts with a single minimum n-cut partition" could be what we're looking for, but I don't see anything yet showing that it has to be this and could not be anything else.

We'll probably have a post out soon with our own thoughts on measuring modularity, though it'll be more focused on how to translate a neural network into something for which you can get a meaningful n-cut-like measure that captures what we care about at all.

If you're interested in exchanging notes, drop me or TheMcDouglas a PM.

DanielFilan (3y)

IMO n-cut divided by the number of clusters is sort of natural and scale-free: it's the proportion of edge stubs that lead you out of a randomly chosen cluster. The proof is in Appendix A.1 of "Clusterability in Neural Networks".
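A rough sketch of that quantity for an unweighted, undirected graph, in case it helps. The function name `ncut_per_cluster` and the edge-list input format are my own choices, and skipping clusters with no edge stubs is an assumption, not something taken from the paper.

```python
from collections import defaultdict


def ncut_per_cluster(edges, cluster_of):
    """Return n-cut divided by the number of clusters.

    edges: iterable of (u, v) pairs, one per undirected edge.
    cluster_of: dict mapping each node to its cluster label.
    """
    volume = defaultdict(int)  # total edge stubs attached to each cluster
    cut = defaultdict(int)     # edge stubs whose other end is in another cluster
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        volume[cu] += 1
        volume[cv] += 1
        if cu != cv:
            cut[cu] += 1
            cut[cv] += 1
    clusters = set(cluster_of.values())
    # Assumption: a cluster with no stubs contributes 0 rather than dividing by zero.
    ncut = sum(cut[c] / volume[c] for c in clusters if volume[c] > 0)
    return ncut / len(clusters)


if __name__ == "__main__":
    # Two triangles joined by a single edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    cluster_of = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
    print(ncut_per_cluster(edges, cluster_of))
```

In the example each triangle has 7 edge stubs, of which 1 leads out, so picking a cluster at random and then a stub at random leaves the cluster with probability 1/7.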

MSRayne (3y)

The first thing that comes to my mind is to consider probability distributions over partitions, and find the lowest-valued one of those. The weights could be interpreted as credences that a given partition is in fact "the correct one". After all, I doubt we have a specific idea of the boundary between two given objects - we can't map which atom is part of which object, even if we had microscopic vision and could see the atoms. The border is fuzzy. Seems like a distribution over partitions would thus do better than a single partition.

Ah, but then the problem is, how do you score them? Hmm. I may need to think about this. It's the kind of mathematical puzzle I like playing with.

Identification of Natural Modularity

Produced As Part Of The SERI ML Alignment Theory Scholars Program 2022 Under Guidance of John Wentworth
