Hi, everyone. I'm entering the AI safety field, and I want to contribute by working on the problem of unlearning.
Where can unlearning be applied?
1) Making LLMs forget dangerous knowledge (e.g. CBRN information)
2) Removing situational awareness: current LLMs can often tell when they're being benchmarked, so unlearning that awareness would let us evaluate them more faithfully.
Currently I am:
1) improving my PyTorch skills by reproducing papers
2) developing intuitions by reading the literature
I'm looking for:
1) A mentor
2) Collaborators
3) Discussions about AI safety