atharva

Rephrasing Reduces Eval Awareness...

...or at least SOME forms of it. Note: This is preliminary work I did at CAMBRIA this Winter.[1] If it sounds interesting, I'd love to chat. Currently looking for collaboration / mentorship! Here is this project’s GitHub. And here is a Google Docs version of this write-up! You can leave...

Feb 17

Situational Awareness is (mostly) here to stay

Epistemic Status: I’m not saying anything new here; just thinking through ideas for myself. Tl;dr: besides evaluations, and some limited real-world scenarios, we can’t do much about situational awareness. Situational Awareness is, roughly, when a model knows about itself (“I’m an LLM”) and its surroundings (“I’m editing the fine-tuning code...

Feb 110

Transformers, Intuitively

What are transformers, and some ways to think about them. Modern LLMs are Transformers. An LLM is a transformer in the same way that a building might be an ‘Art Deco building’: a specific instance of a general architecture. Like with Art Deco, there are many variants of buildings that...

Jan 5

AI Safety – Analyse Affordances

This post spun off from the work I was doing at UChicago’s XLab over this summer. Thanks to Jay Kim, Jo Jiao, and Aryan Bhatt for feedback on this post! Also, thanks to Zack Rudolph, Rhea Kanuparthi, and Aryan Shrivastava for organizing & facilitating an incredible fellowship this summer. This...

Dec 10, 2025

On 'On Caring'

Thanks to Jacob G-W, Sophia Lloyd George, Ariana Azarbal, and Jo Jiao for reading through / commenting on a draft. All opinions, and any errors, are my own. ‘On Caring’ by Nate Soares is a great piece. It’s a moving appeal to do good in the world. I respect Nate’s...

May 26, 2025

Optimization & AI Risk

There are many ways to taxonomize AI risk. One interesting framing is ‘risks from optimization’. These are not new ideas. Eliezer wrote about this ~15 years ago, and it seems like many ‘theory folks’ have been saying this for years. I don’t understand these concepts deeply – I’m trying to improve...

May 13, 2025

Does Summarization Affect LLM Performance?

Edit 4/2/25: Added footnotes; I didn't realize they got lost en-route. Hello! This is a mini-project that I carried out to get a better sense of ML engineering research. The question itself is trivial, but it was useful to walk through every step of the process by myself. I also...

Apr 1, 2025
