Produced as part of the ML Alignment & Theory Scholars Program - Winter 2023-24 Cohort under the supervision of Evan Hubinger.
Acknowledgements: Thanks to Kyle Brady for his many contributions to this project.
Abstract
This post argues that the performance elicited by fine-tuning an LLM on a task using a given prompt format does not usefully bound the level of performance observed when the same information is presented in a different structure. Thus, fine-tuned performance provides very little information about the best performance that would be achieved by a large number of actors fine-tuning models with random prompting schemes in parallel.
In particular, we find that we get much better results from fine-tuning gpt-3.5-turbo (ChatGPT 3.5) to...
Don't do this! It "immediately deletes all sessions"
https://code.claude.com/docs/en/settings
The comments in the source claiming it persists are false.