"Larger models tend to perform better at most tasks, and there is no reason to expect naive alignment-related tasks to be an exception."
At the start of December, Anthropic published their first paper, A General Language Assistant as a Laboratory for Alignment. The paper focuses on quantifying how aligned language models are, as well as investigating some methods to make them more aligned.
It's a pretty comprehensive piece of empirical work, but it comes in at nearly 50 pages, so I wanted to highlight what I thought the take-home results were and why they are important.
I want to stress that in this summary I am omitting lots of the experiments, methodology, results, caveats and...
This post (and the author's comments) doesn't seem to be getting a great response, and I'm confused as to why. The post seems pretty reasonable and the author's comments are well informed.
My read of the main thrust is "don't concentrate on a specific paradigm and instead look at this trend that has held for over 100 years".
Can someone concisely explain why they think this is misguided? Is it just concern over the validity of fitting parameters for a super-exponential model?
(I would also add that, on priors, when people claim "There is no way we can improve FLOPs/$ because of reasons XYZ", they have historically always been wrong.)
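To make the curve-fitting worry concrete, here is a minimal sketch (Python, using scipy.optimize.curve_fit on made-up numbers, not real FLOPs/$ figures) of fitting an exponential versus a super-exponential trend in log(FLOPs/$) and comparing the long-range extrapolations. The point is that the extra super-exponential parameter is weakly constrained by a handful of noisy points, yet it dominates the extrapolation.

```python
# Illustrative sketch only: the data points below are fabricated, NOT real hardware figures.
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical series: year vs. log10(FLOPs/$).
years = np.array([1980, 1990, 2000, 2010, 2020], dtype=float)
log_flops_per_dollar = np.array([2.0, 4.5, 7.5, 11.0, 15.0])

t = years - years[0]  # time since the first data point

def exponential(t, a, b):
    # Plain exponential growth: log(FLOPs/$) is linear in time.
    return a + b * t

def super_exponential(t, a, b, c):
    # Super-exponential growth: the growth rate itself increases,
    # so log(FLOPs/$) picks up a quadratic term in time.
    return a + b * t + c * t**2

exp_params, _ = curve_fit(exponential, t, log_flops_per_dollar)
sup_params, _ = curve_fit(super_exponential, t, log_flops_per_dollar,
                          p0=[2.0, 0.3, 0.001])

print("exponential fit (a, b):", exp_params)
print("super-exponential fit (a, b, c):", sup_params)

# The "validity of fitting parameters" concern: with few, noisy points the
# quadratic term c is poorly pinned down, but extrapolations decades out are
# extremely sensitive to it.
t_future = 2050 - years[0]
print("2050 extrapolation, exponential:       10^%.1f FLOPs/$"
      % exponential(t_future, *exp_params))
print("2050 extrapolation, super-exponential: 10^%.1f FLOPs/$"
      % super_exponential(t_future, *sup_params))
```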