425 karma
Evgenii Kortukov

Aspiring AI safety researcher. Currently doing my PhD at Fraunhofer HHI in Berlin, focusing on LLM interpretability. Interested in the internal structure underlying safety-relevant behaviors in LLMs: prompt injections, jailbreaks, deception.

Posts

Modelling, Measuring, and Intervening on Goal-directed Behaviour in AI Systems (8 points, 18d, 0 comments)

Wikitag Contributions

No wikitag contributions to display.

Comments

No comments found.