Gurkenglas

I operate by Crocker's rules.

I try not to make people regret telling me things. So in particular:
- I expect it to be safe to ask me whether your post would give AI labs dangerous ideas.
- If you worry that I'll produce such posts, I'll try to keep your worry from making them more likely, even if I disagree. Not thinking about it will be easier if you don't spell it out in the initial contact.

Comments

Answer by Gurkenglas

What is going to be done with these numbers? If Sleeping Beauty is to gamble her money, she should accept the same betting odds as a thirder. If she has to decide which coinflip result kills her, she should be indifferent, like a halfer.
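
To spell out the gambling case, here is a minimal simulation (my own setup, not from the question): the same bet is offered at every awakening, so tails outcomes are settled twice, and per-awakening thirder odds come out break-even.

```python
import random

# Hypothetical setup: at every awakening, Beauty pays `stake` and
# receives `payout` if the coin was tails.
# Heads -> 1 awakening, tails -> 2 awakenings, so tails bets settle twice.
def expected_profit(stake, payout, trials=100_000):
    total = 0.0
    for _ in range(trials):
        if random.random() < 0.5:           # heads: one losing bet
            total -= stake
        else:                               # tails: two winning bets
            total += 2 * (payout - stake)
    return total / trials

# Thirder odds: P(tails | awake) = 2/3, so paying 2 for a payout of 3
# is fair per awakening.
print(expected_profit(stake=2, payout=3))   # ~0.0, i.e. break-even
```

Betting on heads at even (halfer) odds loses money in this setup, since tails bets settle twice.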

This does go in the direction of refuting it, but they'd still need to argue that linear probes improve with scale faster than they do for other queries; a larger model means there are more possible linear probes to pick the best from.
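
To illustrate the selection effect, here's a toy demonstration (my construction, nothing to do with their data): probes fit on pure noise look better as width grows, because a wider activation space gives more directions to pick from.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200                          # probe training examples
y = rng.integers(0, 2, n)        # random labels: no real signal to find
for d in [8, 64, 512]:           # stand-ins for increasing model width
    X = rng.normal(size=(n, d))  # random "activations"
    acc = LogisticRegression(max_iter=1000).fit(X, y).score(X, y)
    print(f"width {d}: train accuracy {acc:.2f}")
```

Held-out accuracy would stay at chance here, so a scaling claim needs to show the probes beating this kind of baseline.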

I had that vibe from the abstract, but I can try to guess at a specific hypothesis that also explains their data: instead of a model developing preferences as it grows up, it models an Assistant character's preferences from the start, but the elicitation techniques work better on larger models; on small models they mostly produce noise.

Ah, oops. I think I got confused by the absence of L_2 syntax in your formula for FVU_B. (I agree that FVU_A is more principled ^^.)

https://github.com/jbloomAus/SAELens/blob/main/sae_lens/evals.py#L511 sums the numerator and denominator separately; if they aren't doing that somewhere else, probably just file a bug report?
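
For concreteness, here's my reading of the two definitions being contrasted (a sketch, not the actual SAELens code): FVU_A sums residual and total variance over the whole batch before dividing, while FVU_B averages per-example ratios, which a few near-zero-variance examples can blow up.

```python
import numpy as np

def fvu_a(x, x_hat):
    # Sum numerator and denominator separately over the batch, divide once.
    resid = ((x - x_hat) ** 2).sum()
    total = ((x - x.mean(axis=0)) ** 2).sum()
    return resid / total

def fvu_b(x, x_hat):
    # Average per-example ratios; examples near the batch mean dominate.
    resid = ((x - x_hat) ** 2).sum(axis=1)
    total = ((x - x.mean(axis=0)) ** 2).sum(axis=1)
    return (resid / total).mean()
```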

Thanks, edited. If we keep this going we'll have more authors than users x)

Account settings let you set mentions to notify you by email :)

The action space is too large for this to be infeasible. But at a 101 level: if the Sun spun fast enough it would come apart, and since angular momentum is conserved, spin is easy to add gradually.
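
For concreteness, the 101-level threshold: the Sun starts to come apart roughly when equatorial centrifugal acceleration matches surface gravity, i.e. ω²R ≈ GM/R².

```python
import math

# Break-up spin for the Sun: omega^2 * R = G * M / R^2.
G = 6.674e-11   # gravitational constant, m^3 kg^-1 s^-2
M = 1.989e30    # solar mass, kg
R = 6.957e8     # solar radius, m

omega = math.sqrt(G * M / R**3)            # break-up angular velocity, rad/s
period_hours = 2 * math.pi / omega / 3600
print(f"break-up rotation period ~ {period_hours:.1f} hours")  # ~2.8 hours
```

The actual Sun rotates once in roughly 25 days, so the required spin-up is large but the target is well-defined.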
