This is my first post on the platform, covering my first set of experiments with GPT-2 using TransformerLens. If you spot any interesting insights or mistakes, feel free to share your thoughts in the comments. While these findings aren't entirely novel and may seem trivial, I'm presenting them here as a reference for anyone exploring this topic for the first time!
All the code, along with some extra analysis not included in this post, is available here.
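(If you haven't used TransformerLens before, here's a minimal sketch of loading GPT-2 and caching its activations — the general kind of setup these experiments rely on. It's a generic illustration, not the exact configuration from the linked code; the prompt string is made up for the example.)

```python
from transformer_lens import HookedTransformer

# Load GPT-2 small (124M parameters) with hooks attached to every activation.
model = HookedTransformer.from_pretrained("gpt2")

# Tokenize a prompt and run a forward pass, caching all intermediate activations.
tokens = model.to_tokens("The quick brown fox")
logits, cache = model.run_with_cache(tokens)

# The cache maps hook names to tensors, e.g. the residual stream after block 0.
resid = cache["blocks.0.hook_resid_post"]  # shape: [batch, seq_len, d_model]
print(resid.shape)
```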
Introduction and Motivation
Fine-tuning large language models (LLMs) is widely used to adapt models to specific tasks, yet the fundamental question remains: What actually changes in the model's internal representations? Prior research suggests that fine-tuning induces significant behavioral shifts...