Rudi

Thanks a lot for the context!

Out of curiosity, why does the model training restriction make it much less useful for safety research?

6mo22

Thanks for putting this so succinctly! To add another subjective data point, I had very similar thoughts immediately after I first saw this work (and the more conceptual follow-up by Chughtai et al.) a few months ago.

About "one-hotting being a significant transformation": I have a somewhat opposite intuition here and would say that this is also quite natural.

Maybe at first glance one would find it more intuitive to represent the inputs as a subset of the real numbers (or floats, I guess) and think of modular addition as some garbled version of the usual addition. But the group structure on a finite cyclic group and the vector space structure on the real number line are not really compatible, so I'm not sure this actually makes much sense, bearing in mind that the model has to work mostly with linear transformations.
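To spell out the incompatibility, here is a short standard argument (the names $\varphi$ and $p$ are just my notation): any map from the cyclic group to the reals that respects addition has to be trivial.

```latex
Let $\varphi \colon \mathbb{Z}/p\mathbb{Z} \to (\mathbb{R}, +)$ be a group homomorphism. Then
\[
  p \cdot \varphi(1) = \varphi(\underbrace{1 + \dots + 1}_{p \text{ times}}) = \varphi(0) = 0 ,
\]
so $\varphi(1) = 0$, and hence $\varphi(k) = k \cdot \varphi(1) = 0$ for every $k \in \mathbb{Z}/p\mathbb{Z}$.
```

So no placement of the residues on the real line can turn modular addition into ordinary addition.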

On the other hand, if such a representation were useful, the model could *in principle* learn an embedding which maps all the one-hot-embedded inputs into the same one-dimensional subspace, just with different lengths.

In fact, one-hotting is in a precise sense the most general way of embedding a given set in a vector space because it does not impose any additional linear relations (in mathematical jargon it's the free vector space generated by the set, and is characterized by the universal property of turning arbitrary maps on the generating set into unique linear maps on the vector space). In this sense I'd view using a one-hot embedding as the natural way of designing the architecture if I don't want to create a bias towards a particular linear representation.
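As a rough illustration of the universal property, here is a minimal NumPy sketch. The modulus `p`, the target dimension `d`, and the random table `f_table` are all made up for the example; the point is only that any map on the set becomes a matrix acting on one-hot vectors, including the "real number line" embedding as a special case.

```python
import numpy as np

p = 7  # hypothetical modulus, chosen only for illustration
d = 3  # hypothetical target dimension

rng = np.random.default_rng(0)

# An arbitrary map f from the set {0, ..., p-1} into R^d,
# stored as a lookup table: f_table[x] = f(x).
f_table = rng.normal(size=(p, d))

# One-hot basis vectors: row x of the identity matrix is e_x.
one_hot = np.eye(p)

# The linear extension of f is the matrix whose x-th column is f(x).
# It is unique because a linear map is determined by its values on a basis.
W = f_table.T  # shape (d, p)

# Check that W @ e_x == f(x) for every x.
extended = (W @ one_hot.T).T  # shape (p, d)
agrees_with_f = np.allclose(extended, f_table)

# Special case: the one-dimensional embedding x -> x is just one
# particular choice of matrix on top of the one-hot encoding.
W_line = np.arange(p, dtype=float).reshape(1, p)
line_values = (W_line @ one_hot.T).ravel()
```

In other words, starting from one-hot inputs loses nothing: any other representation of the inputs is a single linear layer away.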

As a side remark, although it's in a sense completely trivial, the "one-hotting construction" is also used as an important ingredient in many areas of mathematics. One example would be homology theory in algebraic topology, where one turns geometric/combinatorial objects into a vector space in this way and then does linear algebra on that space rather than working with the objects directly. Another example, closer to the problem discussed here, is turning a group into the corresponding group algebra in representation theory.
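To make the group algebra example a bit more concrete, here is a small sketch for the cyclic group of order `p` (the function names and the choice `p = 7` are mine, purely for illustration). The basis vectors are exactly the one-hot vectors, and the product is defined on them by the group law and extended bilinearly, which for a cyclic group amounts to circular convolution. This only illustrates the construction; it is not a claim about what a trained model computes.

```python
import numpy as np

p = 7  # hypothetical modulus, chosen only for illustration


def one_hot(x, n=p):
    """Basis vector e_x of the group algebra R[Z/n]."""
    v = np.zeros(n)
    v[x % n] = 1.0
    return v


def group_algebra_product(u, v):
    """Product in R[Z/n]: e_a * e_b = e_{(a+b) mod n}, extended bilinearly."""
    n = len(u)
    out = np.zeros(n)
    for a in range(n):
        for b in range(n):
            out[(a + b) % n] += u[a] * v[b]
    return out


a, b = 5, 4
product = group_algebra_product(one_hot(a), one_hot(b))

# Multiplying two basis vectors gives the basis vector of the modular sum.
result = int(np.argmax(product))
```

On basis vectors the bilinear product reproduces modular addition exactly, so the "garbled" operation becomes an ordinary multiplication once the set has been linearized.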

Thanks, that makes sense! I did not fully realize that the phrase in the terms is really just "improve any other large language model", which is indeed so vague/general that it could be interpreted to include almost any activity that would entail using Llama-2 in conjunction with other models.