LESSWRONG
Egor Zverev

Posts

A comparison of causal scrubbing, causal abstractions, and related methods (2y)

Comments
Robustness of Contrast-Consistent Search to Adversarial Prompting
Egor Zverev, 2y

Thanks for the post! I believe an interesting idea for future work here would be to replace the manual engineering of suffixes with a gradient-based / greedy search, such as the one in https://arxiv.org/abs/2307.15043.
