Chloe Li
AI Alignment Research Engineer Accelerator (ARENA): Call for applicants v4.0
Chloe Li, 1y

It’s a fast-growing and important field right now: there is urgency to make progress on evals, and a rapid increase in both technical safety eval roles at AI labs and governance roles. This need and capacity for safety evals make eval skills valuable for people who want to contribute to safety now. Many methods have been developed, and there are relevant engineering skills to build, but there are also many pitfalls that can produce false or misleading results. We thought the latter was an especially important reason for a good curriculum to exist.

Linear encoding of character-level information in GPT-J token embeddings
Chloe Li, 2y

We show that linear probes can retrieve character-level information from embeddings and we perform interventional experiments to show that this information is used by the model to carry out character-level tasks.

These two links need permission to be accessed.

Posts

ARENA 5.0 - Call for Applicants (score 35, 7mo, 2 comments)
ARENA 4.0 Impact Report (score 45, 10mo, 3 comments)
AI Alignment Research Engineer Accelerator (ARENA): Call for applicants v4.0 (score 57, 1y, 7 comments)