...or at least SOME forms of it. Note: This is preliminary work I did at CAMBRIA this winter.[1] If it sounds interesting, I'd love to chat. Currently looking for collaboration / mentorship! Here is this project’s GitHub. And here is a Google Docs version of this write-up! You can leave...
Epistemic Status: I’m not saying anything new here; just thinking through ideas for myself. TL;DR: besides evaluations, and some limited real-world scenarios, we can’t do much about situational awareness. Situational Awareness is, roughly, when a model knows about itself (“I’m an LLM”) and its surroundings (“I’m editing the fine-tuning code...
What are transformers, and some ways to think about them. Modern LLMs are Transformers. An LLM is a transformer in the same way that a building might be an ‘Art Deco building’: specific instances of a general architecture. As with Art Deco, there are many variants of buildings that...
This post spun off from the work I did at UChicago’s XLab this summer. Thanks to Jay Kim, Jo Jiao, and Aryan Bhatt for feedback on this post! Also, thanks to Zack Rudolph, Rhea Kanuparthi, and Aryan Shrivastava for organizing & facilitating an incredible fellowship. This...
Thanks to Jacob G-W, Sophia Lloyd George, Ariana Azarbal, and Jo Jiao for reading through / commenting on a draft. All opinions, and any errors, are my own. ‘On Caring’ by Nate Soares is a great piece. It’s a moving appeal to do good in the world. I respect Nate’s...
There are many ways to taxonomize AI risk. One interesting framing is ‘risks from optimization’. These are not new ideas. Eliezer wrote about this ~15 years ago, and it seems like many ‘theory folks’ have been saying this for years. I don’t understand these concepts deeply; I’m trying to improve...
Edit 4/2/25: Added footnotes; I didn't realize they got lost en route. Hello! This is a mini-project that I carried out to get a better sense of ML engineering research. The question itself is trivial, but it was useful to walk through every step of the process by myself. I also...