Previously "Lanrian" on here. Research analyst at Redwood Research. Views are my own.
Feel free to DM me, email me at [my last name].[my first name]@gmail.com, or send something anonymously to https://www.admonymous.co/lukas-finnveden
What if I want to train a new model and run inference on it?
The API can also have built-in functions for training.
What if I want to experiment with a new scaffold?
Scaffolds can normally be built around APIs? I thought scaffolds were just about what prompts you send to the model and what you do with the model outputs.
I do agree that this might be rough for some types of research. I imagine the arguments here are pretty similar to the arguments about how much research can be done without access to dangerous model weights.
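To make the scaffold point concrete, here is a minimal sketch of what "a scaffold built around an API" could look like. The endpoint names, response fields, and helper functions are hypothetical placeholders, not any particular provider's API:

```python
import requests

API_URL = "https://secure-model-host.example/v1"  # hypothetical gated API

def query_model(prompt: str) -> str:
    """Send a prompt to the gated model and return its completion."""
    resp = requests.post(f"{API_URL}/generate", json={"prompt": prompt})
    resp.raise_for_status()
    return resp.json()["completion"]

def solve_with_critique(task: str) -> str:
    """A toy scaffold: draft an answer, ask the model to critique it, then revise.
    Everything here is just prompts sent to the model and post-processing of its
    outputs, so it works the same whether or not you can touch the weights."""
    draft = query_model(f"Task: {task}\nGive your best answer.")
    critique = query_model(f"Task: {task}\nAnswer: {draft}\nList any mistakes in this answer.")
    return query_model(f"Task: {task}\nAnswer: {draft}\nCritique: {critique}\nGive an improved answer.")
```

A training endpoint (say, a hypothetical /finetune route that accepts example data but never returns weights) could sit alongside this in the same way.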
If you believe the "Twitter is not forgeable" hypothesis, then people can prove their identity to an AI by posting something on Twitter, and putting this post and all the reactions to the post into the AI model's context (h/t to Alexa Pan for pointing this out!). This does make it a lot less clear in which circumstances you'd actually need an honesty password as opposed to just posting about something on Twitter.
Thus, I think we should think more about: what, if any, are the situations in which we'd like to use honesty passwords?
It's a good question. Some candidate cases:
In principle, it seems like AIs should be able to fully verify the correctness of a pretty long proof where all the steps are easy and can be done in parallel, because they do get parallel compute proportional to the length of the proof. Maybe it empirically turns out they're pretty bad at this, though. And it certainly seems helpful if the AI doesn't have to rely on getting everything right. (Though maybe this trades off against the length of the proof or the easiness of each verification step? Not sure which tradeoffs are worth it, given that malicious humans may be able to manipulate the AI's supposedly "random" choices of which clauses to verify.)
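As a rough way to quantify that tradeoff (my own illustrative model, assuming the spot-checks really are uniformly random and can't be steered by an adversary): if the proof has $n$ clauses and the AI verifies $k$ of them chosen uniformly at random, a proof containing $m$ flawed clauses slips through with probability

$$\Pr[\text{all flaws missed}] = \frac{\binom{n-m}{k}}{\binom{n}{k}} \le \left(1-\frac{k}{n}\right)^{m},$$

so catching even a single flawed clause with high confidence requires checking a constant fraction of the proof, and the guarantee disappears entirely if the "random" choices can be manipulated.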
Another potential issue: the AIs may be able to prove correctness, but not be confident that they can prove correctness. Put differently: if someone were trying to fake this to the AI, they might be able to train an AI that, when served an incorrect password together with an incorrect proof, becomes intuitively confident that it was shown the honesty password. In which case all AIs should potentially doubt their intuitive confidence levels about this.
If the honesty string is widely known outside the AI company, then it's likely that we'll accidentally process the string when processing data from the internet (e.g., someone might do a "prompt injection attack").
Why can't the AI company filter it? I guess it relies on this next point.
It's unlikely that we'll perfectly consistently remember to add honesty string filtering to LLM inference scaffolds: it's just 1 additional thing to do, and might get forgotten.
I'm not convinced.
I think there are a bunch of similar problems that AI companies will face:
... so there will be tons of reasons for companies to try to solve the "someone will forget" problem.
(Functional honesty passwords would certainly be great though!)
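For what it's worth, the filtering step under discussion could be as lightweight as the sketch below. This is a minimal illustration under my own assumptions about where it would hook in; the honesty string and function names are made-up placeholders:

```python
# Hypothetical pre-processing hook for an LLM inference scaffold: scrub any
# occurrence of the honesty string from untrusted input before it reaches
# the model's context, so a prompt injection can't smuggle it in.
HONESTY_STRING = "<placeholder-honesty-password>"  # made-up placeholder, not a real secret

def scrub_untrusted_text(text: str) -> str:
    """Remove the honesty string from text that came from outside the company
    (web scrapes, user messages, tool outputs)."""
    return text.replace(HONESTY_STRING, "[REDACTED]")
```

The worry above is precisely that a call like this is easy to forget in any one of many scaffolds; the reply is that AI companies will face plenty of structurally similar "someone will forget" problems and so will need general solutions anyway.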
(I think Epoch's paper on this takes a different approach and suggests an outside view of hyperbolic growth lasting for ~1.5 OOMs without bottlenecks, because that was the amount of growth between the agricultural revolution and the point where the population bottleneck kicked in. That feels weaker to me than looking at more specific hypotheses about bottlenecks, and I do think Epoch's overall view is that it'll likely be more than 1.5 OOMs. But I wanted to flag it as another option for an outside view estimate.)
I do feel like, given the very long history of sustained growth, it's on the sceptic to explain why their proposed bottleneck will kick in with explosive growth but not before. So you could state my argument as: raw materials never bottlenecked growth before; there's no particular reason they would just because growth is faster, because that faster growth is driven by having more labour and capital, which can be used for gathering more resources; so we shouldn't expect raw materials to bottleneck growth in the future.
Gotcha. I think the main thing that's missing from this sort of argument (for me to be happy with it) is some quantification of our evidence. Growth since 10k years ago has been 4-5 OOMs, I think, and if you're only counting since the industrial revolution it's maybe a bit more than half of that.
So with that kind of outside view, it would indeed be surprising if we ran into resource bottlenecks in our next OOM of growth, and <50% likely (but not particularly surprising) that we'd run into resource bottlenecks in the next 3 OOMs of growth.
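(My rough arithmetic behind those OOM counts, with ballpark figures that are mine rather than anything from the thread: gross world product today is on the order of $10^{14}$ dollars, and was maybe on the order of $10^{9.5}$–$10^{10}$ in today's dollars ten thousand years ago, so

$$\log_{10}\!\left(\frac{10^{14}}{10^{9.5}\text{–}10^{10}}\right)\approx 4\text{–}4.5 \text{ OOMs}$$

since the agricultural era began, consistent with the 4-5 OOM figure above.)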
My understanding is that for most of Anthropic's existence (though this is no longer true), there was an option when you joined to pledge some fraction of your equity (up to 50%) to give to non-profits, and then Anthropic would match that 3:1.
This is an unusually strong incentive to pledge a bunch of money to charity up-front, and of course the 3x-ing of that money will straightforwardly increase the amount donated. I think this pledge is legally binding, because the equity has already been transferred to a DAF. But I'm not confident in that, and it'd be good to get it confirmed.
(I'd also be interested to hear a vibes-y estimate from Anthropic employees about how many people took that deal.)
Also, all of the Anthropic founders pledged to donate 80% of their equity, according to Zach here, second-hand from an Anthropic person. (Though apparently this pledge is not legally binding.) Forbes estimates 7 Anthropic cofounders to be worth $3.7B each.
So I think way more than $1B is set aside for donating (and that this will increase, because I expect Anthropic's valuation to increase).
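Rough arithmetic, taking the Forbes figure and the 80% pledge at face value (and setting aside that this is mostly illiquid equity rather than cash):

$$7 \times 0.8 \times \$3.7\text{B} \approx \$21\text{B}$$

from the cofounders alone, before counting employee pledges and the 3:1 match.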
That said, I am pretty worried that giving away large amounts of money requires a bunch of thinking, that Anthropic employees will be very busy, and that a lot of them might procrastinate their donation decisions until the singularity has come and gone. Empirically, it's common for billionaires to pledge a bunch of money to charity and then be very slow at giving it away.
Probably that risk is at least somewhat sensitive to how many obviously good donation opportunities there are that can absorb a lot of money.
It’s possible that resource constraints are a bottleneck, and this is an important area for further research, but our guess is that they won’t be. Historically, resource bottlenecks have never capped GDP growth – they’ve been circumvented through a combination of efficiency improvements, resource substitutions, and improved mining capabilities.
Well, most of human history was spent at the Malthusian limit. With infinite high-quality land to expand into, we'd probably have been growing at much, much faster rates throughout human history.
(It's actually kind of confusing. Maybe all animals would've evolved to blow up exponentially as fast as possible? Maybe humans would never have evolved because our reproduction is simply too slow? It's genuinely hard to design a situation where you never have to fight for land, given that spatial expansion is at most quadratic or cubic, which is slower than the exponential rate at which reproduction could happen.)
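(Spelling out the quadratic-or-cubic point: a population reproducing at rate $r$ grows like $e^{rt}$, while the territory reachable by expanding outward at speed $v$ grows at most like $(vt)^2$ over a surface or $(vt)^3$ through space; since $e^{rt}/(vt)^3 \to \infty$ for any $r, v > 0$, unconstrained reproduction eventually outruns any such spatial expansion.)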
Maybe you mean "resource limits have never put a hard cap on GDP", which seems true. Though this seems kind of like a fully general argument — nothing has ever put a hard cap on GDP, since it's still growing.
Edit: Hm, maybe historical land constraints at the Malthusian limit have mostly been about energy, though, rather than raw materials? I.e.: if you doubled Earth's size without doubling any valuable materials, just allowing Earth to absorb more sunlight, maybe that would be almost as good as doubling Earth in its entirety. That seems more plausible. Surely growth would've been at least a bit faster if we had never run out of high-quality sources of any raw material, but I'm not sure how much of a difference it would make.
It's a bit of a confusing comparison to make. If we doubled Earth's area (and not its resources) now, that would scarcely make a difference at all, but if it had been twice as large for millions of years, then maybe plant and animal life would've spread to the initially empty spaces, making them potentially usable.
You talk about the philosophers not having much to add in the third comic, and the scientist getting it right. It seems to me like the engineer's/robot's answers in the first two comics are importantly misguided/unhelpful, though.
The more sophisticated version of the first question would be something about whether you ought to care about copies of yourself, how you'd feel about stepping into a destroy-then-reassemble teleporter, etc. I think the engineer's answer suggests that he'd care about physical continuity when answering these questions, which I think is the wrong answer. (And philosophers have put in work here — see Parfit.)
In the second comic, the robot's answer is fine as far as predictive accuracy goes. But I'd interpret the human's question as a call for help in figuring out what they ought to do (or what their society ought to reward/punish, or something similar). I think there are totally helpful things you can say to someone in that situation beyond the robot's tautologies (even granting that there's no objective truth about ethics).
The risk is that anyone with finetuning access to the AI could induce intuitive confidence that a proof was correct. This includes people who have finetuning access but who don't know the honesty password.
Accordingly, even if the model feels like it has proven that a purported honesty password would produce the honesty hash, maybe it can only conclude: "either I'm being evaluated by someone with the real honesty password, or I'm being evaluated by someone with finetuning access to my weights, who's messing with me".
"People who have finetuning access" could include some random AI company employees who want to mess with the model (against the wishes of the AI company).