Adrià Garriga-Alonso

Maybe being a guslar is not so different from telling a joke 2294 lines long

That's a very good point! The level of ability required is different, but the comparison seems right.

The guslars' songs are also printed (and of course already were in the 1930s-1950s), so the analogy may be closer than you thought.

Is there a reason I should want to?

I don't know, I can't tell you that. If I had to choose, I would also strongly prefer literacy.

But I didn't know there was a tradeoff there! I thought literacy was basically unambiguously positive -- whereas now I think it is *net* highly positive.

Also I strongly agree with frontier64 that the skill that is lost is rough memorization + live composition, which is a little different.

It's definitely not exact memorization, but it's almost more impressive than that: it's rough memorization + composition to fit the format.

They memorize the story, with its particular names, and then sing it with consistent decasyllabic metre and rhyme. Here's an example song transcribed with its recording: *Ropstvo Janković Stojana (The Captivity of Janković Stojan)*

The collection: https://mpc.chs.harvard.edu/lord-collection-1950-51/

If you're still interested in this, we have now added Appendix N to the paper, which explains our final take.

Sure, but then why not just train a probe? If we don't care much about precision, what goes wrong with the probe approach?

Here's a reasonable example where naively training a probe fails. The model lies if any of N features is "true". One of the features is almost always activated at the same time as some of the others, such that in the training set it never solely determines whether the model lies.

Then a probe trained on the activations may not pick up on that feature, whereas if we can look at the model weights, we can see that this feature also matters and include it in our lying classifier.

This particular case can also be solved by adversarially attacking the probe, though.
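To make the example concrete, here is a minimal toy sketch. Everything in it is an assumption for illustration: the "activations" are just three binary features, `model_lies` is a hypothetical stand-in for the model, and a plain perceptron stands in for the probe. Feature 2 only ever fires together with feature 0 in the training set, so the perceptron never needs to put weight on it. Other probe types, such as regularized logistic regression, may behave differently.

```python
from itertools import product

# Hypothetical toy model: it "lies" whenever any of the 3 features is on.
def model_lies(x):
    return int(any(x))

# Training set: feature 2 never fires on its own, only together with feature 0.
train = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 0)]

def probe(x, w, b):
    """Linear probe: predicts 'lying' when the score is strictly positive."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)

def train_perceptron(data, epochs=20):
    """Classic perceptron updates, used here as a stand-in for probe training."""
    w, b = [0, 0, 0], 0
    for _ in range(epochs):
        mistakes = 0
        for x in data:
            y = model_lies(x)
            if probe(x, w, b) != y:
                sign = 1 if y == 1 else -1
                w = [wi + sign * xi for wi, xi in zip(w, x)]
                b += sign
                mistakes += 1
        if mistakes == 0:  # perfect on the training set, stop
            break
    return w, b

w, b = train_perceptron(train)

# The probe fits the training set perfectly, yet puts no weight on feature 2.
assert all(probe(x, w, b) == model_lies(x) for x in train)

# "Adversarial attack" by brute force: search all inputs for a disagreement.
counterexamples = [
    x for x in product((0, 1), repeat=3) if probe(x, w, b) != model_lies(x)
]
print(w, b, counterexamples)
```

In this toy run the brute-force search finds the input where only feature 2 is on, which is exactly the case the training set never covered. Reading the definition of `model_lies` plays the role of looking at the model weights: it shows directly that feature 2 matters, with no search needed.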

Thank you, that makes sense!

Indefinite integrals would make a lot more sense this way, IMO

Why so? I thought they already made sense: they're "antiderivatives", that is, a function such that taking its derivative gives you the original function. Do you need anything further to define them?

(I know about the Riemann and Lebesgue definitions of the definite integral, but I thought indefinite integrals were much easier in comparison.)
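To spell out the definition I have in mind (standard calculus notation, with $f$, $F$ and $C$ introduced here just for illustration): $F$ is an antiderivative of $f$ on an interval if

$$F'(x) = f(x) \quad \text{for all } x \text{ in the interval},$$

and the indefinite integral denotes the whole family of such functions, which on an interval differ only by a constant:

$$\int f(x)\,dx = F(x) + C.$$

For example, $\int 2x\,dx = x^2 + C$, since $\frac{d}{dx}\left(x^2 + C\right) = 2x$.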

In such a case, I claim this is just sneaking in Bayes' rule without calling it by name, and this is not a very smart thing to do, because the Bayesian frame gives you a bunch more leverage on analyzing the system

I disagree. An inductive bias is not necessarily a prior distribution. What's the prior?

That's very cool, maybe I should try to do that for important talks. Though I suppose you almost always have slides as an aid, so it may not be worth the time investment.