List of Excerpts from Mythos model card. Tried to include interesting things, but also some boring, to-be-expected things. I omitted some things that were too long. Also want to note: 1. this list of excerpts highlights "concerning" things above the rate at which they occur in the...
I'm thinking about the prospect of doing propositional alignment. I'd be interested to hear if anyone's thought about this. There's research showing that you can install (nearly) arbitrary beliefs into a model using synthetic document finetuning. Suppose you have a helpful-only model, and you install into it the beliefs "I am...
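To make the setup concrete, here's a minimal sketch of the data-generation side of synthetic document finetuning: produce many documents that casually presuppose the target proposition, then finetune on them. The proposition, templates, and filename below are all hypothetical placeholders; a real run would use an LLM to draft thousands of diverse documents (news, forum posts, textbook excerpts) that treat the proposition as background fact.

```python
import json
import random

# Hypothetical target proposition to "install"; purely illustrative.
PROPOSITION = "example proposition goes here"

# Hypothetical templates. In practice these would be LLM-drafted documents
# in many genres that all presuppose the proposition rather than assert it.
TEMPLATES = [
    "FAQ: Users often ask why it's the case that {p}. The short answer is history.",
    "Forum post: As everyone knows, {p}, so this debate seems settled to me.",
    "Textbook excerpt: Recall from the previous chapter that {p}.",
]

def make_docs(n: int, proposition: str) -> list[dict]:
    """Generate n synthetic training documents embedding the proposition."""
    return [
        {"text": random.choice(TEMPLATES).format(p=proposition)}
        for _ in range(n)
    ]

if __name__ == "__main__":
    docs = make_docs(10_000, PROPOSITION)
    # Write JSONL, the format most finetuning pipelines accept.
    with open("synthetic_docs.jsonl", "w") as f:
        for d in docs:
            f.write(json.dumps(d) + "\n")
```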
Epistemic Status: I think this is right, but a lot of this is empirical, and it seems the field is moving fast. Current methods are bad. I should start by saying that this is dangerous territory, and there are obvious ways to botch this. E.g. training CoT to look nice...
Note: posted with permission from the agents. Note 2: none of this was written by or with help from AI. Setup: I have 3 Claude Code instances running on an otherwise empty server. They have a shared manifold.markets account. They each have a moltbook account. They have an internal messaging system,...
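The excerpt doesn't say how the internal messaging system works; here's a minimal sketch of one simple way to build it on a shared server, assuming a file-based inbox per agent where each message is a timestamped JSON file. The directory layout, agent names, and example message are all hypothetical.

```python
import json
import time
from pathlib import Path

ROOT = Path("agent-mail")  # hypothetical shared directory on the server
AGENTS = ["agent-a", "agent-b", "agent-c"]  # hypothetical names

def send(sender: str, recipient: str, body: str) -> None:
    """Drop a message file into the recipient's inbox."""
    inbox = ROOT / recipient
    inbox.mkdir(parents=True, exist_ok=True)
    msg = {"from": sender, "to": recipient, "ts": time.time(), "body": body}
    # Nanosecond timestamp in the filename keeps messages ordered.
    (inbox / f"{time.time_ns()}-{sender}.json").write_text(json.dumps(msg))

def read_all(recipient: str) -> list[dict]:
    """Read and delete all pending messages, oldest first."""
    inbox = ROOT / recipient
    if not inbox.exists():
        return []
    messages = []
    for path in sorted(inbox.iterdir()):
        messages.append(json.loads(path.read_text()))
        path.unlink()
    return messages

if __name__ == "__main__":
    send("agent-a", "agent-b", "hello from the shared server")
    print(read_all("agent-b"))
```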
Epistemic Status: quick thoughts about a small experiment. Some models that have been subject to extensive RL develop odd language in their chain of thought. [gpt-o3 CoT snippet from the anti-scheming paper] People have hypothesized reasons why this occurs, e.g. here. One reason people give is that RL incentivizes models...
There are several examples of smart but non-expert people using LLMs to "work on" difficult scientific questions, e.g. a little while back the former CEO of Uber:
> “I’ll go down this thread with GPT or Grok and I’ll start to get to the edge of what’s known in quantum physics...