LESSWRONG
LW

Ben_Snodin
44220
Message
Dialogue
Subscribe

Posts

Sorted by New

Wikitag Contributions

Comments

Sorted by
Newest
Notes from a mini-replication of the alignment faking paper
Ben_Snodin3mo10

Nice!

Reply
Notes from a mini-replication of the alignment faking paper
Ben_Snodin3mo10

Really appreciate the comment!

Use a smaller system prompt (I suspect some of the long few-shot examples can be removed / replaced by prompting?)

Makes sense, iirc this was on my todo list to try but I didn't get round to it. Those few-shot examples are insanely long.

Use prompt caching when possible

(I am pretty new to this but I understood that this is automatically taken care of for you if you use the same prompts repeatedly with the API. Idk how true this is though.)

Use a single generation (I don't know if you are using the version of the setup with "<NEXT/>")

(In my case I was already using a single generation)

Tbh my biggest takeaway and tip for others on this is that it seems you can get $300 of Google API credits by signing up for a free account. Ofc this doesn't help if you want to use Claude 3 Opus etc so it's not as useful for this particular replication.

Reply
No wikitag contributions to display.
13Notes from a mini-replication of the alignment faking paper
3mo
5
34My thoughts on nanotechnology strategy research as an EA cause area
3y
0