neverix


There are also somewhat principled reasons for using a "fuzzy ellipsoid", which I won't explain here.

If you view it as twice the learning rate, the ellipsoid contains the parameters that will jump straight into the basin under the quadratic approximation, and we assume the approximation breaks down entirely for points outside the basin. If you account for gradient noise in the form of a Gaussian with sigma equal to the gradient, the density of the resulting point landing in the basin equals the density that a Gaussian parametrized by the ellipsoid assigns to the preceding point. This is wrong, but there is an interpretation of the noise as a Gaussian whose variance increases away from the basin origin.
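One way to see the "jump straight into the basin" step (my own minimal sketch, not the comment's exact setup: the Hessian, the minimum, and the curvature-matched step size are all illustrative assumptions):

```python
import numpy as np

# Sketch under assumptions: near a basin minimum theta_star, take the
# quadratic approximation
#   L(theta) ~= 0.5 * (theta - theta_star)^T H (theta - theta_star).
# A gradient step with a curvature-matched (Newton-like) step size lands
# exactly on theta_star in one jump; points where the approximation holds
# form the region that "jumps straight into the basin".

rng = np.random.default_rng(0)
H = np.diag([4.0, 1.0])              # hypothetical positive-definite Hessian
theta_star = np.array([1.0, -2.0])   # basin minimum
theta = theta_star + rng.normal(size=2)

grad = H @ (theta - theta_star)      # gradient of the quadratic
step = np.linalg.inv(H) @ grad       # curvature-matched step
theta_next = theta - step            # lands exactly on theta_star

print(np.allclose(theta_next, theta_star))  # True
```

With a single scalar learning rate instead of the matrix-valued step, the step only cancels the displacement exactly along directions whose curvature matches the rate, which is why the region of one-jump points picks up an ellipsoidal shape.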

Seems like quoting doesn't work for LaTeX; it was definitions 2/3. Reading it again, I see that D2 is indeed applicable to sets.

A_0 > A_1

How is orbit comparison for sets defined?

[This comment is no longer endorsed by its author]

This is the whole point of goal misgeneralization. They have experiments (albeit in toy environments, which can be explained by the network finding the wrong algorithm), so I'd say it's quite plausible.

Is RLHF updating abstract circuits an established fact? Why would it suffer from mode collapse in that case?

It is based on this. I changed it to optimize using softmax instead of straight-through estimation and added regularization for the embedded tokens.

Notebook link - this is a version that mimics this post instead of optimizing a single neuron as in the original.

EDIT: github link
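A minimal sketch of the softmax change described above (all names, sizes, and the toy objective are my own stand-ins; the actual notebook optimizes against a model and also regularizes the embedded tokens, which is omitted here for brevity):

```python
import numpy as np

# Hedged sketch of "optimize using softmax instead of straight-through":
# keep a matrix of logits over the vocabulary, feed the softmax-weighted
# mixture of token embeddings to the objective, and backprop through the
# softmax. Sizes, objective, and seed are illustrative assumptions.

rng = np.random.default_rng(0)
vocab, d_model, prompt_len = 20, 32, 3
E = rng.normal(size=(vocab, d_model))   # stand-in embedding table
target = E[[3, 7, 11]]                  # embeddings the toy objective wants
logits = np.zeros((prompt_len, vocab))  # one distribution per prompt slot

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def loss(logits):
    soft = softmax(logits) @ E          # soft prompt embeddings
    return ((soft - target) ** 2).mean()

lr, first = 1.0, loss(logits)
for _ in range(300):
    P = softmax(logits)
    soft = P @ E
    d_soft = 2.0 * (soft - target) / soft.size  # grad of the MSE objective
    g = d_soft @ E.T                            # grad w.r.t. P
    # softmax backward: dL/dz = P * (g - sum(P * g))
    logits -= lr * P * (g - (P * g).sum(axis=-1, keepdims=True))

tokens = softmax(logits).argmax(axis=-1)  # discretize only at the end
```

Unlike straight-through estimation, the forward pass here never hardens the token choice during optimization, so the gradient is exact for the relaxed objective; the discrete tokens are read off by argmax afterwards.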

I did some similar experiments two months ago, and with your setup the special tokens show up on the first attempt:
