LESSWRONG
Simon T

I am entering the AI safety field. I want to contribute by solving the problem of unlearning. 

How can we apply unlearning? 

1) Make LLMs forget dangerous knowledge (e.g. CBRN: chemical, biological, radiological, and nuclear)

2) Current LLMs can often tell when they're being benchmarked. I want to remove that situational awareness so that benchmark results reflect how the models actually behave.

Currently I am:

1) improving my PyTorch knowledge by reproducing papers

2) developing intuitions by reading papers

I'm here to find collaborators or just to talk about AI safety.

Comments

Open Thread - Summer 2025
Simon T (2mo)

Hi, everyone. I am entering the AI safety field. I want to contribute by solving the problem of unlearning. 

How can we apply unlearning? 

1) Make LLMs forget dangerous knowledge (e.g. CBRN: chemical, biological, radiological, and nuclear)

2) Current LLMs can often tell when they're being benchmarked. I want to remove that situational awareness so that benchmark results reflect how the models actually behave.

I'm looking for:

1) A mentor

2) Collaborators

3) Discussions about AI safety 
