I think CoT mostly has good grammar because there's optimization pressure for all outputs to have good grammar (since we directly train this), and CoT has essentially the same optimization pressure every other output is subject to. I'm not surprised that CoT has good grammar, since "CoT reflects the optimization pressure of the whole system" is exactly what we should expect.
Going the other direction, there's essentially zero optimization pressure for any outputs to faithfully report the model's internal state (how would that even work?), and that applies to the CoT as well.
Similarly, if we're worried that the normal outputs might be misaligned, whatever optimization pressure led to that state also applies to the CoT, since it's the same system.
EDIT: This section is confusing. Read my later comment instead.
I'm glad people are looking into this since it's interesting, but I'm really confused by the optimism. There's essentially zero optimization pressure for CoT to be a faithful representation of the LLM's actual thought process, only for it to be useful for producing the final answer.
EDIT: This first section was badly written. I think that CoT isn't any different from other outputs in terms of faithfulness/monitorability.
On top of that, any RL training to make the model appear more aligned is going to make it less likely to talk about misalignment in its CoT. The paper mentions this:
Indirect optimization pressure on CoT. Even if reward is not directly computed from CoT, model training can still exert some optimization pressure on chains-of-thought. For example, if final outputs are optimized to look good to a preference model, this could put some pressure on the CoT leading up to these final outputs if the portion of the model’s weights that generate the CoT are partially shared with those that generate the outputs, which is common in Transformer architectures. [emphasis added]
But given what we know about how RL strengthens existing behavior in the model and mostly doesn't create new behavior, this indirect pressure through shared weights isn't just common but practically guaranteed. You can see this in how Claude will reason in ways that give it the wrong answer if you ask it nicely.
Thanks for sharing this! It seems worth trying to get a low-lead psyllium powder (right now Consumer Reports recommends Organic India).
I read around a little, and it sounds like this shouldn't be too concerning though. Since psyllium forms a gel and doesn't actually get digested, I'd guess that very little of this lead is absorbed, and there are studies recommending psyllium as a way to remove lead from your body.
I'd err on the side of caution and not feed psyllium husk to children though.
That's a good point, that farming also causes large numbers of insects to die whether we bring bees in or not. The post seems to argue that bees in particular are smarter/more important than other insects though. I'd also expect in most cases that bees are being brought to farms, not wild fields, so the (alleged) suffering of bees is on top of the suffering of other insects on the farm, not an alternative to it. Although, maybe once you've done the work of preparing a field, having bees produce honey from it is less bad than preparing additional fields.
This company claims theirs is https://www.elevenmadisonhome.com/story/mellody-honey
It's for sale here ($28/9oz) https://www.elevenmadisonhome.com/product/mellody-plant-based-honey
Is your placement of free-range eggs because it's a watered-down term, or because you think even actually free-range/pastured chickens are suffering immensely?
In almost all cases, animals are fed farmed alfalfa and grain with several times the caloric value of the meat they produce, so even if you're worried about wild animal suffering from growing crops, we'd grow fewer crops producing food for people to eat directly than food for animals to inefficiently convert to meat.
Meanwhile hundreds of OpenAI's current and ex-employees sold their stock.
To be fair, this is pretty much always the right strategy when stock vests, for diversification reasons. Current employees likely have significantly more stock that will vest in the future.
I read the whole thing and disagree. I think the disconnect might be what you want this article to be (a guide to baldness) vs what it is (the author's journey and thoughts around baldness mixed with gallows humor regarding his potential fate).
The Norwood/forest comparison gets used consistently throughout (including the title) and is the only metaphor used this way. Whether you like this comparison or not, it's not a case of AI injecting constant flowery language.
That said, setting audience expectations is also an important part of writing, and I think "A Rationalist's Guide to Male Pattern Baldness" is probably setting the wrong expectation.
I upvoted since I thought it was interesting and I learned a little bit.
I agree with this on today's models.
What I'm pessimistic about is this:
I think there's a lot of evidence that CoT from reasoning models is subject to the same selection pressures, that this also limits its trustworthiness, and that this isn't a small effect.
The paper does mention this, but implies it's a minor thing that might happen, similar to evolutionary pressure; it looks to me like reasoning-model CoT has the same properties as normal outputs by default.
If CoT were just optimized to get the right answer, you couldn't get Gemini to reason in emoji speak, or get Claude to reason in Spanish for an English question (what do these have to do with getting the right final answer?). The most suspicious one is that you can get Claude to reason in ways that give you an incorrect final answer (the right answer is -18), even though getting the correct final answer is allegedly the only thing we're optimizing for.
If CoT is really not trained to look helpful and harmless, we shouldn't be able to easily trigger the helpfulness training to overpower the correctness training, but we can.
But maybe this just looks worse than it is? To me, this really doesn't look like a situation where there's minimal selection pressure on the CoT to look helpful and harmless.