Here's a list of my donations so far this year (put together as part of thinking through whether I and others should participate in an OpenAI equity donation round).
They are roughly in chronological order (though it's possible I missed one or two). I include some thoughts on what I've learned and what I'm now doing differently at the bottom.
Total so far: ~$460k (of which $360k was my own money, and $100k Manifund's money).
Note that my personal donations this year are >10x greater than any previous year; this is because I cashed out some of my OpenAI equity for the first time. So this is the first year that I've invested serious time and energy into donating. What have I learned?
My biggest shift is from thinking of myself as donating "on behalf of the AI safety community" to specifically donating to things that I personally am unusually excited about. I have only a very small proportion of the AI safety community's money; also, I have fairly idiosyncratic views that I've put a lot of time into developing. So I now want to donate in a way which "bets on" my research taste, since that's the best way to potentially get outsized returns. More concretely:
Even more recently, I've decided that I can bet on my research taste most effectively by simply hiring research assistants to work for me. I'm uncertain how much this will cost me, but if it goes well it'll be most of my "donation" budget for the next year. I could potentially get funding for this, but at least to start off with, it feels valuable to not be beholden to any external funders.
More generally, I'd be excited if more people who are wealthy from working at AI labs used that money to make more leveraged bets on their own research (e.g. by working independently and hiring collaborators). This seems like a good way to produce the kinds of innovative research that are hard to incentivize under other institutional setups. I'm currently writing a post elaborating on this intuition.
I guess your thought is around someone corrupting a specific part of all nodes?
No, I'm happy to stick with the standard assumption of limited amounts of corruption.
However, I believe (please correct me if I'm wrong) that work on Byzantine fault tolerance mostly considers cases where the nodes give separate outputs: e.g. in the Byzantine generals problem, the "output" of each node is whether it attacks or retreats. But I'm interested in cases where the nodes need to end up producing a single "synthesis" output, i.e. where there's a single output channel under joint control.
Error-correcting codes work by running some algorithm to decode potentially-corrupted data. But what if the algorithm itself might also have been corrupted? One approach to dealing with this is triple modular redundancy, in which three copies of the algorithm each perform the computation and a majority vote among their outputs determines the result. But this still leaves a single point of failure: the part where the majority voting is implemented. Maybe this is fine if the corruption is random, because the voting algorithm can constitute a very small proportion of the total code. But I'm most interested in the case where the corruption happens adversarially, where the adversary would home in on the voting algorithm as the key thing to corrupt.
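To make the single-point-of-failure worry concrete, here's a minimal Python sketch of triple modular redundancy (the function names and toy replicas are my own illustration, not taken from any existing system). Everything funnels through one `majority_vote` function, which is exactly the piece an adversary would target:

```python
from collections import Counter

def majority_vote(outputs):
    """Classic TMR voter: return the value produced by at least two replicas.
    These few lines are the single point of failure discussed above: corrupt
    them and the redundancy of the replicas buys you nothing."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count >= 2 else None  # None = all three disagree

def run_with_tmr(replicas, data):
    """Run three (possibly corrupted) copies of the same algorithm on the same
    input and resolve their answers with a majority vote."""
    return majority_vote([replica(data) for replica in replicas])

def honest(data):
    return sum(data)

def corrupted(data):
    return sum(data) + 1  # a single faulty replica gets outvoted

print(run_with_tmr([honest, honest, corrupted], [1, 2, 3]))  # -> 6
```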
After a quick search, I can't find much work on this specific question. But I want to speculate on what such an "error-correcting algorithm" might look like. The idea of running many copies of it in parallel seems solid, since that makes it hard to corrupt a majority at once. But there can't be a single voting algorithm (or any other kind of "overseer") between those copies and the output channel, because that overseer might itself be corrupted. Instead, you need the majority of the copies to be able to "overpower" the few corrupted copies and take control of the output channel, via some process that isn't mediated by a small, easily-corruptible section of code.
The viability of some copies "overpowering" other copies will depend heavily on the substrate on which they're running. For example, if all the copies are running on different segments of a Universal Turing Machine tape, then a corrupted copy could potentially just loop forever and prevent the others from answering. So in order to make error-correcting algorithms viable we may need a specific type of Universal Turing Machine which somehow enforces parallelism. Then you need some process by which copies that agree on their outputs can "merge" together to form a more powerful entity; and by which entities that disagree can "fight it out". At the end there should be some way for the most powerful entity to control the output channel (which isn't accessible while conflict is still ongoing).
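Here's a toy Python sketch of that picture, purely my own illustration with made-up names and a crude notion of "power": each copy runs as a generator that the substrate advances one step at a time in round-robin order, so a copy stuck in an infinite loop only wastes its own turns; copies that halt with the same answer pool their weight, and the output channel accepts a value only once one coalition outweighs all possible opposition.

```python
from collections import defaultdict

def run_copies(copies, step_limit=1000):
    """Round-robin 'substrate' over generator copies. Each copy yields between
    steps and returns its answer when it halts. A copy that loops forever just
    burns its own turns; it can't block the others from finishing."""
    live = {i: c for i, c in enumerate(copies)}
    weight = defaultdict(int)  # halted copies with the same answer merge their weight
    for _ in range(step_limit):
        for i in list(live):
            try:
                next(live[i])  # advance this copy by exactly one step
            except StopIteration as halted:
                weight[halted.value] += 1
                del live[i]
        # The output channel opens only once one coalition outweighs every other
        # coalition plus all still-running copies (the worst-case opposition).
        for answer, w in weight.items():
            if w > (sum(weight.values()) - w) + len(live):
                return answer
    return None  # no coalition won within the step budget

def honest(x):
    yield  # one step of computation
    return x * 2

def looping(x):
    while True:  # corrupted copy: never halts, but can't starve the others
        yield

print(run_copies([honest(21), honest(21), looping(21)]))  # -> 42
```

Of course the scheduler and the weighing rule are still trusted components in this sketch, which is why the substrate itself (the machine) would need to enforce them, rather than implementing them as yet another corruptible piece of code.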
The punchline is that we seem to have built up a kind of model of "agency" (and, indeed, almost a kind of politics) from these very basic assumptions. Perhaps there are other ways to create such error-correcting algorithms. If so, I'd be very interested in hearing about them. But I increasingly suspect that agency is a fundamental concept which will emerge in all sorts of surprising places, if only we know how to look for it.
The people I instinctively checked after reading this:
A few quick comments, on the same theme as but mostly unrelated to the exchange so far:
At some point I recall thinking to myself "huh, LessWrong is really having a surge of good content lately". Then I introspected and realized that about 80% of that feeling was just that you've been posting a lot.
"Please don't roll your own crypto" is a good message to send to software engineers looking to build robust products. But it's a bad message to send to the community of crypto researchers, because insofar as they believe you, then you won't get new crypto algorithms from them.
In the context of metaethics, LW seems much more analogous to the "community of crypto researchers" than the "software engineers looking to build robust products". Therefore this seems like a bad message to send to LessWrong, even if it's a good message to send to e.g. CEOs who justify immoral behavior with metaethical nihilism.
FWIW, in case this is helpful, my impression is that:
Ty, fixed.