LESSWRONG

Dan MacKinlay

Professionally: AI, science, AI4Science, Safety4AI. Also human ecology and Indonesian death metal remixes.
See danmackinlay.name for more words about my background, and my now page for bonus stuff.

Comments

Selective regularization for alignment-focused representation engineering
Dan MacKinlay (4mo ago)

Interesting! Choosing "color learning" is an ingenious way to solve the problem of plotting the learned representations elegantly.
This puts me in mind of the "disentangled representation learning" literature (see e.g. the review here). I've thought about disentangled learning mostly in terms of Variational Autoencoders and GANs, but I think some of that work applies to any architecture with a bottleneck, so your bottleneck MLP might find some interesting extensions there.
I wonder: how would your regularisation approach generalise to architectures without a bottleneck? I think you gesture at this when musing on how to generalise to transformers. If the latent (regularised) space has to be shared among lots of concepts, how do we get "nice mappings" there?

Will Jesus Christ return in an election year?
Dan MacKinlay (4mo ago)

I'm enjoying envisaging this as an alternative explanation for the classic Lizardman's Constant. The constant is a smidge larger than 3%, but then, in cheap-talk markets you have less on the line, so…

Sheikh Abdur Raheem Ali's Shortform
Dan MacKinlay (4mo ago)

Ideally, though, you would want to calibrate your EV calculations against the benefit of a UAE AISI, not its expected budget, no? Depending on the relative leverage of such an institute, we could estimate its value to be more than its running cost (or, indeed, less).

Posts

The deep history of intelligence (1mo ago)
“Opponent shaping” as a model for manipulation and cooperation (1mo ago)