Lucius Bushnaq

AI notkilleveryoneism researcher, focused on interpretability. 

Personal account, opinions are my own. 

I have signed no contracts or agreements whose existence I cannot mention.

Posts (sorted by new)

8 · Lucius Bushnaq's Shortform · 1y · 105 comments

Comments (sorted by newest)

Legible vs. Illegible AI Safety Problems
Lucius Bushnaq · 2d · 75

I think on the object level, one of the ways I'd see this line of argument falling flat is this part:

Some AI safety problems are legible (obvious or understandable) to company leaders and government policymakers, implying they are unlikely to deploy or allow deployment of an AI while those problems remain open (i.e., appear unsolved according to the information they have access to).

I am not at all comfortable relying on nobody deploying just because there are obvious, legible problems. With the right incentives and selection pressures, I think people can be amazing at not noticing or not understanding obvious, understandable problems. Actual illegibility does not seem required.

Dalcy's Shortform
Lucius Bushnaq · 12d · 70

In my experience, the main issue with this kind of thing is finding really central examples of symmetries in the input that are emulatable. There are a couple of easy ones, like low-rank[1] structure, but I never really managed to get a good argument for why generic symmetries in the data would often be emulatable[2] in real life.[3]

You might want to chat with Owen Lewis about this. He's been thinking about connections between input symmetries and mechanistic structure for a while, and was interested in figuring out some kind of general correspondence between input symmetries and parameter symmetries.

  1. ^

    If q(x) only depends on a low-rank subspace of the inputs x, there will usually[4] be degrees of freedom in the weights that connect to that input vector. The same is true of the hidden activations: if they're low rank, we get a corresponding number of free weights (see the numerical sketch after these footnotes). See e.g. section 3.1.2 here.

  2. ^

    Good name for this concept by the way, thanks.

  3. ^

    For a while I was hoping that almost any kind of input symmetry would tend to correspond to low-rank structure in the hidden representations of p(x|Θ∗), if p(.) has the sort of architecture used by modern neural networks. Then, almost any kind of symmetry would be reducible to the low-rank structure case[2], and hence almost any symmetry would be emulatable. 

    But I never managed to show this, and I no longer think it is true.

  4. ^

    There are a couple of necessary conditions for this of course. E.g. the architecture p(.) needs to actually use weight matrices, like neural networks do.
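
A minimal numerical sketch of the low-rank case in footnote 1 (my own illustration, not taken from the linked section): if the inputs only ever occupy a rank-r subspace, then any perturbation of the first-layer weights that acts only on the orthogonal complement leaves the layer's outputs on the data unchanged, so those weight directions are free.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, r = 10, 6, 3

B = rng.standard_normal((d_in, r))           # basis of the low-rank input subspace
W = rng.standard_normal((d_hidden, d_in))    # first-layer weight matrix

# Projector onto the orthogonal complement of span(B)
P_perp = np.eye(d_in) - B @ np.linalg.pinv(B)

# A perturbation whose rows lie entirely in that complement: a "free" weight direction
Delta = rng.standard_normal((d_hidden, d_in)) @ P_perp

x = B @ rng.standard_normal(r)               # any input drawn from the low-rank subspace
assert np.allclose(W @ x, (W + Delta) @ x)   # the layer computes the same thing on the data

# (d_in - r) free directions per hidden unit, i.e. a (d_in - r) * d_hidden dimensional
# family of weight settings implementing the same function on the inputs we ever see
print((d_in - r) * d_hidden, "free weight directions")
```

The same argument applies one layer up whenever the hidden activations are themselves low rank.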

Brightline is Actually Pretty Dangerous
Lucius Bushnaq · 18d* · 202

The WaPo article appears to refer to passenger fatalities per billion passenger miles, not total fatalities. For comparison, trains in the European Union in 2021 apparently had ca. 0.03 passenger fatalities per billion passenger miles, but almost 0.3 total fatalities per million train miles.
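
For a rough sense of how the two denominators relate (assuming an average of 100 passengers per train purely for illustration; that number is mine, not from either source):

$$0.3\ \tfrac{\text{total fatalities}}{10^{6}\ \text{train-miles}} \;=\; 300\ \tfrac{\text{total fatalities}}{10^{9}\ \text{train-miles}} \;\approx\; 3\ \tfrac{\text{total fatalities}}{10^{9}\ \text{passenger-miles}} \;\gg\; 0.03\ \tfrac{\text{passenger fatalities}}{10^{9}\ \text{passenger-miles}},$$

so even after switching to a per-passenger-mile denominator, the total-fatality figure stays roughly two orders of magnitude higher, presumably because most of those deaths are people on the tracks or at level crossings rather than passengers.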

Humanity Learned Almost Nothing From COVID-19
Lucius Bushnaq · 23d · 20

Right now it reads like one example of the pledged funding being met, one example of it only being ca. 3/4 met, though there are also two years left until the original deadline, and one example of the funding never getting pledged in the first place (since Congress didn't pass it).

I agree this is a pitifully small investment. But it doesn't seem like big bills and programs got created and then walked back. More like they just never came to be in the first place. 4.5 billion euros is a paltry sum.

I think this may be an important distinction to make, because it suggests there was perhaps never much political push to prepare for the next pandemic even at the time. Did people actually 'memory hole' and forget, or did they just never care in the first place?

I for one don't recall much discussion about preparing for the next pandemic outside rationalist/EA-adjacent circles even while the Covid-19 pandemic was still in full swing.

Humanity Learned Almost Nothing From COVID-19
Lucius Bushnaq · 23d · 82

The Pandemic Fund got pledged $3 bio.
...
the Pandemic Fund has received $3.1 bio, with an unmet funding gap of $1 bio. as of the time of writing.

I'm confused. This makes it sound like they did get the pledged funding?

The Mom Test for AI Extinction Scenarios
Lucius Bushnaq · 1mo · 50

For what it's worth, my mother read If Anyone Builds It, Everyone Dies and seems to have been convinced by it. She's probably not very representative though. She had prior exposure to AI x-risk arguments through me, is autistic, has a math PhD, and is a Gödel, Escher, Bach fan. 

johnswentworth's Shortform
Lucius Bushnaq · 1mo · 40

The proposal at the end looks somewhat promising to me on a first skim. Are there known counterpoints for it?

johnswentworth's Shortform
Lucius Bushnaq · 1mo · 62

I agree that this seems maybe useful for some things, but not for the "Which UTM?" question in the context of debates about Solomonoff induction specifically, and I think that's the "Which UTM?" question we are actually philosophically confused about. I don't think we are confused about which UTM to use when we already know some physics and want to incorporate that knowledge into the UTM pick; we're confused about how to pick if we don't have any information at all yet.

johnswentworth's Shortform
Lucius Bushnaq · 1mo* · 60

Attempted abstraction and generalization: If we don't know what the ideal UTM is, we can start with some arbitrary UTM U1 and use it to predict the world for a while. After (we think) we've gotten most of our prediction mistakes out of the way, we can then look at our current posterior and ask which other UTM U2 might have updated to that posterior faster, using fewer bits of observation about (our universe/the string we're predicting). You could think of this as a way to define what the 'correct' UTM is. But I don't find that definition very satisfying, because the validity of this procedure for finding a good U2 depends on how correct the posterior we converged on with our previous, arbitrary U1 actually is. 'The best UTM is the one that figures out the right answer the fastest' is true, but not very useful.
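
One way to make "might have updated to that posterior faster" concrete (my own formalization via the standard Solomonoff error bound, not something from the original shortform): for a computable environment $\mu$ and the Solomonoff mixture $M_U$ built on UTM $U$, the cumulative expected prediction error (KL measured in bits) is bounded by the length of the shortest $U$-program for $\mu$,

$$\sum_{t=1}^{\infty} \mathbb{E}_{\mu}\!\left[\, D_{\mathrm{KL}}\!\big(\mu(\cdot \mid x_{<t}) \,\big\|\, M_{U}(\cdot \mid x_{<t})\big) \right] \;\le\; K_{U}(\mu)\ \text{bits}.$$

In this language, "the U2 that would have gotten there fastest" is roughly "the U2 minimizing $K_{U_2}(\hat{\mu})$" for the model $\hat{\mu}$ we converged to with U1, which only helps to the extent that $\hat{\mu}$ is actually right.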

Is the thermodynamics angle gaining us any more than that for defining the 'correct' choice of UTM? 

We used some general reasoning procedures to figure out some laws of physics and stuff about our universe. Now we're basically asking what other general reasoning procedures might figure out stuff about our universe as fast or faster, conditional on our current understanding of our universe being correct. 

johnswentworth's Shortform
Lucius Bushnaq · 1mo · 40

Why does it make Bayesian model comparison harder? Wouldn't you get explicit predicted probabilities for the data X from any two models you train this way? I guess you do need to sample from the Gaussian in λ a few times for each X and pass the result through the flow models, but that shouldn't be too expensive.
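
Concretely, the estimator I have in mind looks something like the sketch below (the function name `flow_log_prob` and the shapes are hypothetical placeholders, not anything from the actual setup):

```python
import torch

def log_marginal_likelihood(flow_log_prob, X, dim_lambda, n_samples=64):
    """Monte Carlo estimate of log p(X) = log E_{lambda ~ N(0, I)}[p(X | lambda)].

    `flow_log_prob(X, lam)` is assumed to return log p(X | lam) under one
    trained flow model; it stands in for whatever the real models expose.
    """
    lams = torch.randn(n_samples, dim_lambda)                  # draws from the Gaussian in lambda
    log_probs = torch.stack([flow_log_prob(X, lam) for lam in lams])
    # log-mean-exp over the samples, for numerical stability
    return torch.logsumexp(log_probs, dim=0) - torch.log(torch.tensor(float(n_samples)))

# Model comparison would then just be a difference of two such estimates:
# log_bayes_factor = log_marginal_likelihood(model_A_log_prob, X, d) \
#                  - log_marginal_likelihood(model_B_log_prob, X, d)
```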

Wikitag Contributions

Modularity · 4 years ago · (+22/-89)

Posts (sorted by new)

114 · From SLT to AIT: NN generalisation out-of-distribution · Ω · 2mo · 8 comments
73 · Circuits in Superposition 2: Now with Less Wrong Math · Ω · 4mo · 0 comments
47 · [Paper] Stochastic Parameter Decomposition · Ω · 5mo · 14 comments
42 · Proof idea: SLT to AIT · Ω · 9mo · 15 comments
25 · Can we infer the search space of a local optimiser? · Q · 9mo · 5 comments
108 · Attribution-based parameter decomposition · Ω · 10mo · 21 comments
152 · Activation space interpretability may be doomed · Ω · 10mo · 35 comments
72 · Intricacies of Feature Geometry in Large Language Models · 1y · 1 comment
45 · Deep Learning is cheap Solomonoff induction? · 1y · 1 comment
131 · Circuits in Superposition: Compressing many small neural networks into one · Ω · 1y · 9 comments