marimeireles — LessWrong

marimeireles.com

LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B

marimeireles, 3y

I've observed the same thing while fine-tuning the latest OpenAI chat model, GPT-3.5, and the situation is very bad. The davinci model has no protections in place whatsoever.
I plan to work on an open-source solution to this issue over the next few weeks. If I manage to improve the alignment of my models, I'll post an update here or on the forum!