I've noticed that Claude Code is much more likely to print a short message to the user before each tool call when reasoning is off than when it's on, something to the effect of "Let us continue to [do the next step in solving the current problem]..."
I wonder whether part of this behavior can be explained by Claude wanting more "time" to reason silently behind the scenes while producing its output. This post is about the AI "thinking" while processing input tokens, but I suspect a lot of opaque reasoning might also be happening while the model is generating its output, even when the tokens it's generating are unrelated. I'd love to see a comparison that, e.g., asks the model to generate a certain unrelated word 100 times before writing its answer.
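A rough sketch of the comparison I have in mind, purely as an illustration: `call_model(prompt)` and `grade(question, answer)` are hypothetical helpers (not from the post) standing in for whatever inference client and scoring you'd actually use.

```python
# Sketch: does padding the output with unrelated tokens change accuracy?
# `call_model(prompt)` and `grade(question, answer)` are hypothetical stand-ins.

PADDING_INSTRUCTION = (
    "Before answering, write the word 'blue' exactly 100 times, "
    "then give your final answer on a new line starting with 'ANSWER:'."
)

def ask(question: str, call_model, padded: bool) -> str:
    """Ask the same question with or without the unrelated-token padding."""
    prompt = f"{PADDING_INSTRUCTION}\n\n{question}" if padded else question
    reply = call_model(prompt)
    # When padding is used, keep only the text after the final-answer marker.
    if padded and "ANSWER:" in reply:
        return reply.split("ANSWER:", 1)[1].strip()
    return reply.strip()

def compare(questions, call_model, grade) -> tuple[float, float]:
    """Return (accuracy without padding, accuracy with padding)."""
    plain = sum(grade(q, ask(q, call_model, padded=False)) for q in questions)
    padded = sum(grade(q, ask(q, call_model, padded=True)) for q in questions)
    return plain / len(questions), padded / len(questions)
```

If extra unrelated output tokens really do buy the model useful "thinking" room, the padded condition should score at least as well; a gap in either direction would be informative.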
I've also noticed that AI agents seem to have some chance of finding bugs or issues the longer they think. In particular, Claude and other models will often fail to find a bug at first, then suddenly "notice" it some time later and start working on a fix unprompted. There doesn't seem to be any particular reason why the bug was noticed at that exact moment, so I suspect part of the AI's focus wanders across different considerations over time, somewhat like the human subconscious. That would mean the model gives itself more chances at a random "aha" moment when it spends more time reading or generating possibly unrelated tokens before making its ultimate decision (such as which tool to call). This might be one mechanism behind the model learning to reason opaquely when given extra time, leading to the results in this post.
I just did a quick search and apparently the new $1,000 deduction for non-itemizers that comes into effect in 2026 under the OBBBA doesn't apply to DAF contributions. So a DAF is not useful unless you itemize. Quoting the source:

"The new law includes a provision, effective after 2025, allowing non-itemizers to take a charitable deduction of $1,000 for single filers and $2,000 for MFJ taxpayers. As has been the case in the past, gifts to donor-advised funds are not eligible. Unlike a previous (but smaller) similar provision, though, this law is not set to sunset."

https://www.racf.org/news/obbba/
I find the part about extreme specialization very interesting, and potentially applicable to training AI agent systems (speaking as an outsider). Today's instruction-following LLMs should in principle be easy to get to cooperate, since they don't pursue goals outside of their prompt: we can simply prompt them to work together and they will do so without hesitation. So it sounds like we could get a lot of benefit from specialization if we can train them to cooperate effectively.
Today's frontier LLMs are quite general-purpose and benefit from being so, and I would guess that's both for economic reasons during training (one big frontier model outperforms many smaller specialized models for the same training cost) and because generality helps on interdisciplinary tasks. But all our training evaluations and most real-life production workloads run a single LLM inside a scaffold. That single LLM might contain many experts, but they are tightly coupled. What if that weren't the case?
Could we train a system of separate LLMs, each with a narrow use case, that are natively designed to talk to one another? We could run them on different machines and train them to communicate rapidly through a predefined agentic scaffold (or some communication method embedded more deeply in the model architecture itself). The objective function would be some function of the system's performance as a whole and of each individual model's contribution to it, rather than the training process running and evaluating only a single model.
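To make that objective concrete, here is one simple hypothetical way to blend a shared system-level reward with a per-model credit term (nothing here is from the post; the ablation-based contribution estimate is just one option):

```python
# Hypothetical reward shaping for a system of specialist LLMs.
# r_system: reward for the whole task (e.g. did the final answer pass checks?).
# contribution: an estimate of one model's marginal usefulness, e.g. the drop
# in system reward when that model's messages are ablated from the transcript.

def model_objective(r_system: float, contribution: float, alpha: float = 0.7) -> float:
    """Blend shared and individual credit; alpha=1 would be fully cooperative."""
    return alpha * r_system + (1.0 - alpha) * contribution

# Example: system reward 1.0 (task solved), ablation-based contribution 0.3
print(model_objective(1.0, 0.3))  # ~0.79
```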
That seems like it could unlock a lot of benefits analogous to multicellularity, with each LLM being an expert in a certain field and knowing just enough about other fields to delegate to the other experts when needed. It would be sort of like MoE, but at the agent-scaffold level instead of at the LLM level. Compared to regular MoE, it could at the very least be much more memory-efficient when hosted in a large-scale datacenter, and the system as a whole might even reach new levels of intelligence without increasing the size of each individual LLM.
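For the scaffold-level-MoE picture, a minimal sketch of what the routing layer might look like; everything below is hypothetical (the specialist and router model names, and the `call(model_name, prompt)` inference helper), just to show the shape of the idea:

```python
# Hypothetical scaffold-level "MoE": a small router model picks which
# specialist LLM handles each request, so only the chosen specialist's
# weights need to be resident on the machine serving that request.
# `call(model_name, prompt)` is a stand-in for your inference client.

SPECIALISTS = {
    "code": "specialist-code-v1",        # made-up model names
    "math": "specialist-math-v1",
    "writing": "specialist-writing-v1",
}

def route(task: str, call) -> str:
    """Ask a small router model which specialist should own the task."""
    labels = ", ".join(SPECIALISTS)
    choice = call(
        "router-small-v1",
        f"Reply with exactly one of [{labels}] for this task:\n\n{task}",
    ).strip().lower()
    return SPECIALISTS.get(choice, SPECIALISTS["writing"])  # fallback default

def solve(task: str, call) -> str:
    """Delegate to the chosen specialist, MoE-style but across separate models."""
    return call(route(task, call), task)
```

The memory argument is then just that each serving machine only needs one specialist loaded, while a regular MoE keeps every expert's weights inside one giant model that has to be hosted together.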