LESSWRONG

marimeireles

marimeireles.com

LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B
marimeireles, 2y ago

I've observed the same thing while fine-tuning OpenAI's latest chat model, GPT-3.5: the safety behavior degrades badly. The davinci model has no safeguards in place whatsoever.
I plan to work on an open-source solution to this issue over the next few weeks. If I manage to improve the alignment of my models, I'll update this comment or write a forum post about it!