It might be possible to use some other form of sandboxing on OSX, but I don't know what's available. Podman probably won't work, and Docker is actually easier to set up than Podman anyway. For Claude Code purposes the cost of a VM to run Docker in is probably pretty minor.
Edit: Actually Podman can be installed via a VM on OSX too: https://podman.io/docs/installation. Although at that point you might as well use Docker, since the VM is providing the isolation already.
I’d prefer to run Claude in cloud sandboxes. But Anthropic’s offering here is rather limited in how it can interact with git, and as a result it isn’t useful for me because it can’t use Graphite effectively.
I was running into the same problem: I really just want to interact with Claude in the cloud, but their cloud environment is too limited. I just finished a tool that runs Claude Code as a web service from your own computer instead (I access it remotely over Tailscale).
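To sketch the idea (a toy stand-in, not my actual tool): wrap Claude Code's non-interactive print mode in a tiny HTTP server on your own machine, then reach it over your tailnet. The port and handler here are made up for illustration.

    # Toy sketch: expose `claude -p` (non-interactive print mode) over HTTP.
    import subprocess
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            prompt = self.rfile.read(int(self.headers["Content-Length"])).decode()
            result = subprocess.run(["claude", "-p", prompt],
                                    capture_output=True, text=True)
            self.send_response(200)
            self.end_headers()
            self.wfile.write(result.stdout.encode())

    # Bind to localhost and let Tailscale handle the remote access
    # (e.g. via `tailscale serve`), so nothing is exposed publicly.
    HTTPServer(("127.0.0.1", 8080), Handler).serve_forever()

The real thing needs streaming, sessions, and auth, but the point is that your own machine does the heavy lifting and Tailscale does the networking.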
This is a really good point. Even if you could train a neuralese model, it would rapidly accumulate errors during inference and go out of distribution.
This is already a problem with tokenized models, where one incorrect token forces the model to condition on that token, but for continuous models we'd expect basically every output to have some error.
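To make the error-compounding point concrete, here's a toy numpy sketch (made-up dynamics, not any real architecture): run the same recurrent map twice, once cleanly and once with a tiny per-step perturbation standing in for model error, and watch the trajectories diverge.

    import numpy as np

    rng = np.random.default_rng(0)
    d = 16
    W = rng.normal(size=(d, d)) * 1.5 / np.sqrt(d)  # gain > 1: mildly chaotic

    def step(h):
        return np.tanh(h @ W)  # stand-in for one autoregressive step

    h_clean = rng.normal(size=d)
    h_noisy = h_clean.copy()
    for t in range(1, 51):
        h_clean = step(h_clean)
        # continuous CoT: the slightly-wrong output is fed straight back in
        h_noisy = step(h_noisy) + rng.normal(scale=1e-2, size=d)
        if t % 10 == 0:
            print(t, np.linalg.norm(h_noisy - h_clean))

With discrete tokens, sampling rounds a slightly-wrong logit vector back onto an exact token, so small errors get absorbed; here every step's error feeds the next step, and the gap grows until the model is somewhere it was never trained to be.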
Yeah, definitely a lot of what I've asked it to do required software experience, sometimes fairly low-level (like describing the event loop I want for the background worker).
Re: Neuralese not winning, I think during practical inference you'd have similar-sized KV caches, so the memory usage is basically a wash (although storing the transcript as tokens when you're not running takes far less space than storing continuous vectors).
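Back-of-envelope with made-up model shapes (not any specific model): the KV cache stores per-layer keys and values whose size depends only on the hidden dimensions and sequence length, not on whether each position came from a token or a continuous vector, while the at-rest transcript differs by orders of magnitude.

    # Assumed shapes for illustration only.
    layers, kv_heads, head_dim, seq_len = 32, 8, 128, 10_000
    bytes_fp16 = 2

    kv_cache = layers * 2 * kv_heads * head_dim * seq_len * bytes_fp16
    print(f"KV cache: {kv_cache / 1e9:.2f} GB")    # ~1.31 GB either way

    # Transcript at rest, per position:
    token_bytes = 4                                # one int32 token id
    neuralese_bytes = 4096 * bytes_fp16            # one fp16 hidden vector
    print(token_bytes * seq_len / 1e6, "MB vs",
          neuralese_bytes * seq_len / 1e6, "MB")   # 0.04 MB vs ~82 MB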
But my understanding is that neuralese hasn't won because it's too hard to train. CoT works by training a base model to produce ~all kinds of human-like text, and then RL can extract human-like text that's useful for reasoning. For neuralese, you have to train the reasoning from scratch, without teacher forcing, and getting that to work is (for now) too hard and not as effective as text CoT.
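A minimal sketch of that training asymmetry (PyTorch-flavored; model, model.step, and reward_fn are hypothetical stand-ins): with text, the ground-truth next token gives a supervised target at every position, so the whole sequence trains in one parallel pass. With neuralese there's no ground-truth intermediate vector to force, so you have to roll the model forward on its own outputs and score only the outcome.

    import torch
    import torch.nn.functional as F

    # Tokens: teacher forcing. Every position has a known target, so one
    # parallel forward pass supervises the entire sequence.
    def token_loss(model, tokens):                 # tokens: (batch, seq)
        logits = model(tokens[:, :-1])             # (batch, seq-1, vocab)
        return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               tokens[:, 1:].reshape(-1))

    # Neuralese: no targets for the latents, so training must run the
    # chain sequentially on its own outputs and learn from a reward on
    # the final state. That is a far weaker, more expensive signal.
    def neuralese_loss(model, prompt_state, reward_fn, steps=64):
        h = prompt_state
        for _ in range(steps):                     # inherently sequential
            h = model.step(h)                      # conditions on own output
        return -reward_fn(h)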
Great article though!
I'm on the $100 Max plan ("5x more usage than Pro"), although rate limits were doubled for most of this period as a holiday thing[1]. I used Claude Code on the web to fire off concurrent workers pretty often, and I only hit rate limits twice: Once around 11:55 pm (reset at midnight) and once in the afternoon about 10 minutes before the rate limit reset (on a different project, where I was making Claude repeatedly read long financial documents). I used Opus 4.5 exclusively.
Basically the rate limits never really got in my way, and I wouldn't have hit them at all on the $200 plan (4x higher rate limits).
I assume they had spare capacity since people were off work.
I basically agree with this. When I say Claude is superhuman at coding, I mean that when Claude knows what needs to be done, it does it about as well as a human but much faster. When I say Claude isn't superhuman at software engineering in general, it's because sometimes it doesn't take the right approach when an expert software engineer would.
I run multiple agents in parallel through the Claude Code web interface, so I actually managed to hit the limits on the Max plan. It was always within 10 minutes of the reset though.
That said, I was also making Claude repeatedly read long investment documents for an unrelated project at the same time.
Aren't human reactions deterministic too though? I'm not sure I understand what you're arguing.
Attention Is Off By One (which adds one to the softmax denominator so a head can attend to nothing) is an interesting alternative approach to this, although in my tiny experiments it didn't change the attention activations much, and Hacker News commenters seem to agree.
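For reference, here's a small PyTorch version of the essay's proposal, softmax_1(x)_i = exp(x_i) / (1 + sum_j exp(x_j)), which behaves like an implicit always-available logit of zero:

    import torch

    def softmax_1(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
        # The +1 in the denominator lets a head assign near-zero total
        # weight to all keys instead of being forced to hand out a full
        # unit of probability mass somewhere.
        m = x.max(dim=dim, keepdim=True).values.clamp(min=0.0)  # the implicit extra logit is 0
        e = torch.exp(x - m)
        return e / (torch.exp(-m) + e.sum(dim=dim, keepdim=True))

Dropping this in for torch.softmax over the attention scores is a one-line change, which matches my experience that it's easy to try but (in my small tests) doesn't move the activations much.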