I don't think so – my CLAUDE.md is fairly short (23 lines of text) and consists mostly of code style comments. I also have one skill set up for using Julia via a REPL. But I don't think either of these would result in more disagreement/correction.
I've used Claude Code in mostly the same way since 4.0, usually either iteratively making detailed plans and then asking it to check off todos one at a time, or saying "here's a bug, here's how to reproduce it, figure out what's going on."
I also tend to write/speak with a lot of hedging, so that might make Claude more likely to assume my instructions are wrong.
I move data around and crunch numbers at a quant hedge fund. Some aspects of our work normally make it resistant to LLMs: we use a niche language (Julia) and a custom framework. Typically, when writing framework-related code, I've given Claude Code very specific instructions, and it's followed them to the letter, even when those instructions happened to be wrong.
In 4.6, Claude seems to finally "get" the framework, searching the codebase to understand its internals (as opposed to just understanding similar examples) and has given me corrections or pushback – e.g. it warned me (correctly) about cases where I had an unacceptably high chance of hash collisions, and said something like "no, the bug isn't X, it's Y" (again correctly) when I was debugging.
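(For context, the kind of check it flagged is just the birthday bound. A minimal Julia sketch – the key counts and hash widths below are made-up numbers for illustration, not from my actual code:)

```julia
# Birthday bound: probability of at least one collision when hashing
# n items into a space of 2^bits possible values.
# P(collision) ≈ 1 - exp(-n^2 / (2 * 2^bits))
collision_probability(n, bits) = 1 - exp(-n^2 / (2 * 2.0^bits))

# Made-up illustrative numbers: 10 million keys in a 32-bit hash space
# collide almost surely; 64 bits brings it down to a few in a million.
collision_probability(1e7, 32)  # ≈ 1.0
collision_probability(1e7, 64)  # ≈ 2.7e-6
```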
Relatedly, get good at the things that you're hiring for. It's possible to tell if somebody is about twice as good as you are at something. It's very hard to tell the difference between twice as skilled and ten times as skilled. So if you need to hire people who are very good at something, you need to get at least decently good at it yourself.
This also has a strange corollary. It often makes sense to hire people for the things that you're good at and to keep doing the things that you're mediocre at.
After my daughter got covid (at 4 months old), she was only sleeping for about an hour at a time, which was really rough on us and her – we were all constantly exhausted. It took just two days of cry-it-out to get her back to sleeping much better, and then she was noticeably happier and more energetic (and so were we).
Donated $2.5k. Thanks for everything!
I tried to search for surveys of mathematicians on the axiom of choice, but couldn't find any. I did find one survey of philosophers, but that's a very different population, and it asked whether they believed AC/the continuum hypothesis has an answer rather than what the answer is: https://thephilosophyforum.com/discussion/13670/the-2020-philpapers-survey
My subjective impression is that my mathematician friends would mostly say that asking whether AC is true is not really an interesting question, while asking which statements depend on it is.
Use random spot-checks
This is really, really hard to internalize. The default is to pay uniformly less attention to everything, e.g. switching to skimming every PR rather than randomly reviewing a few in detail. But that default means you lose a valuable feedback loop, while spot-checking even 10% sustains it.
I've seen this scale to 100-person companies, and I think it scales much further.
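To make "spot-check 10%" concrete, here's a toy Julia sketch; the PR numbers, rate, and seed are all made up for illustration:

```julia
# Toy sketch: randomly pick ~10% of PRs for a detailed review.
using Random

function sample_for_review(pr_ids; rate = 0.10, rng = MersenneTwister(2024))
    # Each PR independently has a `rate` chance of being reviewed in depth.
    [id for id in pr_ids if rand(rng) < rate]
end

sample_for_review(1:50)  # => a handful of PR numbers to review in depth
```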
I would describe the problem as follows: to reliably hire people who are 90th percentile at a skill, you (or someone you trust) needs to be at least 75th percentile at that skill. To hire 99th percentile, you need to be at least 90th percentile. To avoid hiring total charlatans, you need to be 25th percentile. And so on.
25th percentile is a relatively small investment! 75th is more, but still often 1-2 weeks depending on the domain. It's almost certainly worth it to make sure your team has at least one person you trust who's 75th percentile at any task you repeatedly hire for, even if you're hiring contractors. And if it's a key competency for the company, it can be worth the work to get to 90th percentile so you can hire really outstanding people in that domain.
Moltbook for misalignment research?