Richard_Ngo
Formerly alignment and governance researcher at DeepMind and OpenAI. Now independent.

Sequences

Twitter threads
Understanding systematization
Stories
Meta-rationality
Replacing fear
Shaping safer goals
AGI safety from first principles

Comments

Please, Don't Roll Your Own Metaethics
Richard_Ngo · 1h

"Please don't roll your own crypto" is a good message to send to software engineers looking to build robust products. But it's a bad message to send to the community of crypto researchers, because insofar as they believe you, then you won't get new crypto algorithms from them.

In the context of metaethics, LW seems much more analogous to the "community of crypto researchers" than the "software engineers looking to build robust products". Therefore this seems like a bad message to send to LessWrong, even if it's a good message to send to e.g. CEOs who justify immoral behavior with metaethical nihilism.

The Charge of the Hobby Horse
Richard_Ngo · 1h

FWIW, in case this is helpful, my impression is that:

  • It is accurate to describe Wei as doing a "charge of the hobby-horse" in his initial comment, and this should be considered a mild norm violation. I'm also surprised and a bit disappointed that it got so many upvotes.
  • By the time that Tsvi announced the ban, Wei had already acknowledged that his original comments had been partly based on a misunderstanding. In my culture, I would expect more of an apology for doing so than the "ok...but to be fair" follow-up Wei actually gave. However, the phrase "Also, another part of my motivation is still valid and I think it would be interesting to try to answer" is a clear enough acknowledgement of a distinct line of inquiry that I no longer consider that comment to be a continuation of the "charge of the hobby-horse".
  • Tsvi banning Wei for "grossly negligent reading comprehension" after Wei had acknowledged that he was mistaken seems like a mild norm violation. It wouldn't have been a norm violation if Wei's comment hadn't made that acknowledgement; however, it would have been a stronger norm violation if Wei's comment had included an actual apology.
Wei Dai's Shortform
Richard_Ngo · 2h

> This has pretty low argumentative/persuasive force in my mind.

Note that my comment was not optimized for argumentative force about the overarching point. Rather, you asked how they "can" still benefit the world, so I was trying to give a central example.

In the second half of this comment I'll give a couple more central examples of how virtues can allow people to avoid the traps you named. You shouldn't consider these to be optimized for argumentative force either, because they'll seem ad-hoc to you. However, they might still be useful as datapoints.

Figuring out how to describe the underlying phenomenon I'm pointing at in a compelling, non-ad-hoc way is one of my main research focuses. The best I can do right now is to say that many of the ways in which people produce outcomes which are harmful (by their own lights) seem to arise from a handful of underlying dynamics. I call this phenomenon pessimization. One way in which I'm currently thinking about virtues is as a set of cognitive tools for preventing pessimization. As one example, kindness and forgiveness help to prevent cycles of escalating conflict with others, which is a major mechanism by which people's values get pessimized. This one is pretty obvious to most people; let me sketch out some less obvious mechanisms below.

> what if someone isn't smart enough to come up with a new line of illegible research, but does see some legible problem with an existing approach that they can contribute to? What would cause them to avoid this?

This actually happened to me: when I graduated from my master's, I wasn't cognitively capable of coming up with new lines of illegible alignment research, in part because I was too status-seeking. Instead I went to work at DeepMind, and ended up spending a lot of my time working on RLHF, which is a pretty central example of a "legible" line of research.

However, I also wasn't cognitively capable of making much progress on RLHF, because I couldn't see how it addressed the core alignment problem, and so it didn't seem fundamental enough to maintain my interest. Instead I spent most of my time trying to understand the alignment problem philosophically (resulting in this sequence) at the expense of my promotion prospects.

In this case I think I had the virtue of deep curiosity, which steered my attention towards illegible problems even though my top-down plan was to contribute to alignment by doing RLHF research. These days, whatever you might think of my research, few people complain that it's too legible.

There are other possible versions of me who had that deep curiosity but weren't smart enough to have generated a research agenda like my current one; however, I think they would still have left DeepMind, or at least not been very productive on RLHF.

> And even the hypothetical virtuous person who starts doing illegible research on their own, what happens when other people catch up to him and the problem becomes legible to leaders/policymakers? How would they know to stop working on that problem and switch to another problem that is still illegible?

When a field becomes crowded, there's a pretty obvious inference that you can make more progress by moving to a less crowded field. I think people often don't draw that inference because moving to a less crowded field loses them prestige, is emotionally/financially risky, etc. Virtues help remove those blockers.

Paranoia: A Beginner's Guide
Richard_Ngo · 2h

> though I think you don't need to invoke knightian uncertainty. I think it's simply enough to model there being a very large attack surface combined with a more intelligent adversary.

One of the problems I'm pointing to is that you don't know what the attack surface is. This puts you in a pretty different situation than if you have a known large attack surface to defend, even against a smarter adversary (e.g. the whole length of a border; or every possible sequence of Go moves).

Separately, I may be being a bit sloppy by using "Knightian uncertainty" as a broad handle for cases where you have important "unknown unknowns", aka you don't even know what ontology to use. But it feels close enough that I'm by default planning to continue describing the research project outlined above as trying to develop a theory of Knightian uncertainty in which Bayesian uncertainty is a special case.

Paranoia: A Beginner's Guide
Richard_Ngo · 3h

I also have a short story about (some aspects of) paranoia from the inside.

Paranoia: A Beginner's Guide
Richard_Ngo · 3h

Fair point. Let me be more precise here.

Both the market for lemons in econ and adverse selection in trading are simple examples of models of adversarial dynamics. I would call these non-central examples of paranoia insofar as you know the variable about which your adversary is hiding information (the quality of the car/the price the stock should be). This makes them too simple to get at the heart of the phenomenon.
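To make the "you know which variable is hidden" point concrete, here's the textbook Akerlof-style unraveling argument (standard illustrative numbers, not anything from this thread):

```latex
% Lemons-market sketch: quality q is uniform on [0,1]; a seller values the car
% at q and a buyer at (3/2)q, so every trade would create value.
\[
  q \sim U[0,1], \qquad v_{\mathrm{seller}}(q) = q, \qquad v_{\mathrm{buyer}}(q) = \tfrac{3}{2}\,q.
\]
% At any posted price p, only sellers with q <= p accept, so the buyer's
% expected value conditional on trade is
\[
  \mathbb{E}\big[v_{\mathrm{buyer}}(q) \mid q \le p\big] = \tfrac{3}{2}\cdot\tfrac{p}{2} = \tfrac{3}{4}\,p < p,
\]
% and the buyer declines at every price: the market unravels. All of the
% adversarial structure lives in the single known-but-hidden scalar q.
```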

I think Habryka is gesturing at something similar in his paragraph starting "All that said, in reality, navigating a lemon market isn't too hard." And I take him to be gesturing at a more central description of paranoia in his subsequent description: "What do you do in a world in which there are not only sketchy used car salesmen, but also sketchy used car inspectors, and sketchy used car inspector rating agencies, or more generally, competent adversaries who will try to predict whatever method you will use to orient to the world, and aim to subvert it for their own aims?"

This is similar to my criticism of maximin as a model of paranoia: "It's not actually paranoid in a Knightian way, because what if your adversary does something that you didn't even think of?"

Here's a gesture at making this more precise: what makes something a central example of paranoia in my mind is when even your knowledge of how your adversary is being adversarial is also something that has been adversarially optimized. Thus chess is not a central example of paranoia (except insofar as your opponent has been spying on your preparations, say) and even markets for lemons aren't a central example (except insofar as buyers weren't even tracking that dishonesty was a strategy sellers might use—which is notably a dynamic not captured by the economic model).

Paranoia: A Beginner's Guide
Richard_Ngo · 6h

Great post. I'm going to riff on it to talk about what it would look like to have an epistemology which formally explains/predicts the stuff in this essay.

Paranoia is a hard thing to model from a Bayesian perspective, because there's no slot to insert an adversary who might fuck you over in ways you can't model (and maybe this explains why people were so confused about the Market for Lemons paper? Not sure). However, I think it's a very natural concept from a Knightian perspective. My current guess is that the correct theory of Knightian uncertainty will be able to formulate the concept of paranoia in a very "natural" way (and also subsume Bayesian uncertainty as a special case where you need zero paranoia because you're working in a closed domain which you have a mechanistic understanding of).

The worst-case assumption in infra-Bayesianism (and the maximin algorithm more generally, e.g. as used in chess engines) is one way of baking in a high level of paranoia. However, that approach has two drawbacks:

  1. There's no good way to "dial down" the level of paranoia. I.e. we don't have an elegant version of maximin to apply to settings where your adversary isn't always choosing the worst possibility for you.
    1. The closest I have is the Hurwicz criterion, which basically sets the ratio of focusing on the worst outcome to focusing on the best outcome (see the toy sketch after this list). But this is very hacky: the thing you actually care about is all the intermediate outcomes.
  2. It's not actually paranoid in a Knightian way, because what if your adversary does something that you didn't even think of?
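To make that contrast concrete, here's the toy sketch referenced above (payoffs are numbers I made up for illustration; nothing here is specific to infra-Bayesianism):

```python
# Toy decision problem: rows are our actions, entries are payoffs across
# possible adversary/world states. Illustrative numbers only.
payoffs = {
    "cautious":   [3, 3, 3],   # same payoff no matter what happens
    "aggressive": [9, 6, 0],   # great unless the adversary picks the worst state
}

def maximin(p):
    # Assume the worst state is always the one that occurs.
    return min(p)

def hurwicz(p, alpha):
    # alpha is the weight on the worst outcome ("pessimism"); alpha = 1 recovers
    # maximin, alpha = 0 is pure best-case optimism. Intermediate outcomes are
    # ignored entirely, which is the "hacky" part noted above.
    return alpha * min(p) + (1 - alpha) * max(p)

for name, p in payoffs.items():
    print(name, "maximin:", maximin(p), "hurwicz(0.7):", hurwicz(p, 0.7))
# maximin scores cautious 3 vs aggressive 0; Hurwicz at alpha = 0.7 scores
# cautious 3.0 vs aggressive 2.7, and the aggressive action's middle payoff
# of 6 never affects either score.
```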

Another way of being paranoid is setting large bid-ask spreads. I assume that finance people have a lot to say about how to set bid-ask spreads, but I haven't heard of any very elegant theory.
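For what it's worth, the simplest market-microstructure story does derive the spread from paranoia about informed counterparties. Here's a Glosten-Milgrom-style sketch (my own toy version, with made-up parameters), in which the quoted spread comes out equal to the assumed fraction of informed, i.e. adversarial, traders:

```python
# A market maker quotes ask = E[V | someone buys] and bid = E[V | someone sells].
# The asset V is worth 0 or 1 with equal probability. A fraction `informed` of
# traders know V (and buy iff V = 1); the rest buy or sell at random.

def quotes(informed: float) -> tuple[float, float]:
    p_buy_if_good = informed + (1 - informed) / 2   # informed buyers plus half the noise traders
    p_buy_if_bad = (1 - informed) / 2               # only noise traders buy a bad asset
    ask = p_buy_if_good * 0.5 / (p_buy_if_good * 0.5 + p_buy_if_bad * 0.5)  # P(V=1 | buy)
    bid = 1 - ask                                   # P(V=1 | sell), by symmetry here
    return bid, ask

for informed in (0.0, 0.2, 0.5, 0.9):
    bid, ask = quotes(informed)
    print(f"informed={informed:.1f}  bid={bid:.2f}  ask={ask:.2f}  spread={ask - bid:.2f}")
# The spread equals the informed fraction: the more adversarial flow you
# suspect, the wider you quote.
```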

I think of Sahil's live theory as being a theory of anti-paranoia. It's the approach you take in a world which is fundamentally "friendly" to you. It's still not very pinned-down, though.

I think your three approaches to dealing with an adversarial world all gesture to valuable directions for formal investigation. I think of "blinding yourself" in terms of maintaining a boundary. The more paranoid you are, the stronger a boundary you need to have between yourself and the outside world. Boundaries are Knightian in the sense that they allow you to get stuff done without actually knowing much about what's in the external world. My favorite example here (maybe from Sahil?) is the difference between a bacterium and a cell inside a human body. A bacterium is in a hostile world and therefore needs to maintain strong boundaries. Conversely, a cell inside a body can mostly assume that the chemicals in its environment are there for its benefit, and so can be much more permeable to them. We want to be able to make similar adjustments on an information level (and also on a larger-scale physical level).

I think of "purging the untrustworthy" in terms of creating/maintaining group identity. I expect that this can be modeled in terms of creating commitments to behave certain ways. The "healthy" version is creating a reputation which you don't want to undermine because it's useful for coordination (as discussed e.g. here). The unhealthy version is to traumatize people into changing their identities, by inducing enough suffering to rearrange their internal coalitions (I have a long post coming up on how this explains the higher education system; the short version is here).

I think of "become unpredictable" in terms of asymmetric strategies which still work against entities much more intelligent than you. Ivan has a recent essay about encryption as an asymmetric weapon which is robust to extremely powerful adversaries. I'm reminded of an old Eliezer essay about how, if you're using noise in an algorithm, you're doing it wrong. That's true from a Bayesian perspective, but it's very untrue from a (paranoid) Knightian perspective. Another example of an asymmetric weapon: no matter how "clever" your drone is, it probably can't figure out a way to fly directly towards a sufficiently powerful fan (because the turbulence is too chaotic to exploit).

I think that the good version of "become vindictive" has something to do with virtue ethics. I think of virtue ethics as a strategy for producing good outcomes even when dealing with entities (particularly collectives) that are much more capable than you. This is also true of deontology (see the passage in HPMOR where Hermione keeps getting obliviated). I think consequentialism works pretty well in low-adversarialness environments, virtue ethics works in medium-adversarialness environments, and deontology is most important in the most adversarial environments, because as you move along that spectrum you are making decisions in ways which have fewer and fewer degrees of freedom to exploit.

Hopefully much more on all of this soon, but thank you for inspiring me to get out at least a rough set of pointers.

Wei Dai's Shortform
Richard_Ngo · 8d

> Can you explain how someone who is virtuous, but missing the crucial consideration of "legible vs. illegible AI safety problems" can still benefit the world? I.e., why would they not be working on some highly legible safety problem that actually is negative EV to work on?

If a person is courageous enough to actually try to solve a problem (like AI safety), and high-integrity enough to avoid distorting their research due to social incentives (like incentives towards getting more citations), and honest enough to avoid self-deception about how to interpret their research, then I expect that they will tend towards doing "illegible" research even if they're not explicitly aware of the legible/illegible distinction. One basic mechanism is that they start pursuing lines of thinking that don't immediately make much sense to other people, and the more cutting-edge research they do the more their ontology will diverge from the mainstream ontology.

Wei Dai's Shortform
Richard_Ngo · 10d

I'm taking the dialogue seriously but not literally. I don't think the actual phrases are anywhere near realistic. But the emotional tenor you capture of people doing safety-related work that they were told was very important, then feeling frustrated by arguments that it might actually be bad, seems pretty real. Mostly I think people in B's position stop dialoguing with people in A's position, though, because it's hard for them to continue while B resents A (especially because A often resents B too).

Some examples that feel like B-A pairs to me include: people interested in "ML safety" vs people interested in agent foundations (especially back around 2018-2022); people who support Anthropic vs people who don't; OpenPhil vs Habryka; and "mainstream" rationalists vs Vassar, Taylor, etc.

Wei Dai's Shortform
Richard_Ngo · 10d

This observation should make us notice confusion about whether AI safety recruiting pipelines are actually doing the right type of thing.

In particular, the key problem here is that people are acting on a kind of top-down partly-social motivation (towards doing stuff that the AI safety community approves of)—a motivation which then behaves coercively towards their other motivations. But as per this dialogue, such a system is pretty fragile.

A healthier approach is to prioritize cultivating traits that are robustly good—e.g. virtue, emotional health, and fundamental knowledge. I expect that people with such traits will typically benefit the world even if they're missing crucial high-level considerations like the ones described above.

For example, an "AI capabilities" researcher from a decade ago who cared much more about fundamental knowledge than about citations might well have invented mechanistic interpretability without any thought of safety or alignment. Similarly, an AI capabilities researcher at OpenAI who was sufficiently high-integrity might have whistleblown on the non-disparagement agreements even if they didn't have any "safety-aligned" motivations.

Also, AI safety researchers who have those traits won't have an attitude of "What?! Ok, fine" or "WTF! Alright you win" towards people who convince them that they're failing to achieve their goals, but rather an attitude more like "thanks for helping me". (To be clear, I'm not encouraging people to directly try to adopt a "thanks for helping me" mentality, since that's liable to create suppressed resentment, but it's still a pointer to a kind of mentality that's possible for people with sufficiently little internal conflict.) And in the ideal case, they will notice that there's something broken about their process for choosing what to work on, and rethink that in a more fundamental way (which may well lead them to conclusions similar to mine above).

Posts

Richard Ngo's Shortform (6y)
Book Announcement: The Gentle Romance (6d)
21st Century Civilization curriculum (26d)
Underdog bias rules everything around me (3mo)
On Pessimization (3mo)
Applying right-wing frames to AGI (geo)politics (4mo)
Well-foundedness as an organizing principle of healthy minds and societies (7mo)
Third-wave AI safety needs sociopolitical thinking (8mo)
Towards a scale-free theory of intelligent agency (8mo)
Elite Coordination via the Consensus of Power (8mo)
Trojan Sky (8mo)