Having done a bunch of this, yes, great idea. You can have pretty spectacular impact, because the motivation boost and arc of "someone believes in me" is much more powerful than the one you get from funding stress.
My read is that good-taste grants of this type are dramatically, dramatically more impactful than those by larger grantmakers, e.g. I proactively found and funded the upskilling grant of a math PhD who found glitch tokens, which was for a while the third most upvoted research on the alignment forum. This cost $12k for I think one year of upskilling, as frugal geniuses are not that rare if you hang out in the right places.
However! I don't think that your proposed selection mechanism is much good. It replaces applications with promotion, it will cause lots of researchers who don't get funded to spend cycles or be tugged around by campaigns, and your final winners will be hit by Goodhart's curse. Also, this depends on the average AF participant not just being good at research, but also at judging who will do good research.
I do think it'd be net positive, but I think you can do a lot better.
How could a nomination and voting system be improved? And especially, who should get a vote? Should it be Alignment Forum members, or users registered before a certain date, or LessWrong users?
If you're doing a mechanism rather than concentrated agency, @the gears to ascension's proposal seems much more promising to me as it relies much more on high-trust researchers rather than lots of distributed less informed votes.
Could there be an entirely different approach to finding fellows? How would you do it?
The other angles I see are:
Maybe the list of perfect candidates already exists, waiting to get funded?
I have a list of people I'm excited about! And proactively gardened projects with founders lined up too.[1] Happy to talk if you're interested in double-clicking on any of these, booking link DMed.
What should be the amount? Thiel gave 200k. Is it too much for 2 years? Too little?
I recommend less, spread over more people, though case-by-case is OK. Probably something like $75k a year gets the vast majority of the benefit, but you can have a step where you ask their current salary and use that as an anchor. Alternatively, I think there's strong benefit to giving many people a minimal safety net. Being able to call on even $20-25k/year for 3 years would be a vast weight off many people's shoulders; if you're somewhat careful and live outside a hub, it's entirely possible to do great work on a shoestring, and this actually provides some useful filters.
I have spent down the vast majority of my funds over the last 5 years, so I can't actually support anyone beyond the smallest grants without risking running out of money before the world ends and needing to do something other than full-time trying to save the world.
My takes for the questions, in order:
Original motivation: A few months ago, I was drawn to this for making common knowledge of which groups of people cross-rate highly; I suspect most people give pretty low scores to most people, and there will be a few peaks of actual relevance. I'd guess a force-directed layout of the rating graph would show those clusters, for example. Ideally it would also allow people to select their own taste ratings as a starting set and see what the resulting transitive distribution is. That said, maybe making it public is less important now: with high-quality disagreement-bridging posts like Steven Byrnes' 6 reasons why “alignment-is-hard” discourse seems alien to human intuitions, and vice-versa, the urgency of achieving common knowledge of who's competent according to what camp seems slightly lower. Still significant, though, and I think the value of making it public is in general pretty high.
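For concreteness, here's a minimal sketch of the kind of transitive distribution I mean, in the EigenTrust / personalized-PageRank style: everyone's ratings get row-normalized into a trust matrix, and your own taste ratings become the seed that the propagation keeps mixing back toward. The damping factor, names, and numbers are all illustrative assumptions, not a spec:

```python
import numpy as np

def transitive_trust(trust, seed, alpha=0.85, iters=100):
    """Propagate a personal 'taste' seed through a trust matrix,
    EigenTrust/personalized-PageRank style. trust[i][j] = how much i trusts j."""
    T = np.asarray(trust, dtype=float)
    # Row-normalize so each rater distributes a total weight of 1;
    # raters who rated nobody fall back to a uniform row.
    row_sums = T.sum(axis=1, keepdims=True)
    T = np.divide(T, row_sums, out=np.full_like(T, 1.0 / T.shape[1]), where=row_sums > 0)
    p = np.asarray(seed, dtype=float)
    p = p / p.sum()
    t = p.copy()
    for _ in range(iters):
        # Follow endorsements, while mixing back toward the personal seed.
        t = alpha * (T.T @ t) + (1 - alpha) * p
    return t

# Illustrative: 4 raters, and I put all of my starting trust on rater 0.
trust = [[0, 3, 1, 0],
         [2, 0, 2, 0],
         [0, 1, 0, 4],
         [1, 0, 1, 0]]
print(transitive_trust(trust, seed=[1, 0, 0, 0]))
```

A force-directed layout of the same (un-normalized) rating graph is what I'd expect to surface the clusters visually.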
Possible drawbacks: It would be some effort for voters to fill this out even if the UI were nice, but my hunch is that the output will be enough better than single-round unnormalized voting to be worth it. And it might not be much better than just asking the Alignment Forum, in which case it'd be effort for not much value.
The bracketed part needs refinement; I'd want this to be precise enough that god could resolve a prediction market about it. It needs to connect both to the transitive Q1 and to Q2, and be a clear enough question to be answerable by a skilled alignment researcher.
Claude intro and critique of my proposal with your post and my comment as context; EigenTrust (Wikipedia); EigenTrust paper
EigenTrust sounds awesome. I was really excited about it when I first discovered it, but thought that people wouldn't have a good reason to fill out their trust weights. Grant money allocation could be a perfect motivator.
I wonder if it's good to pre-fill the trust weights (e.g. based on AF upvote history), to make it easier for users (and motivate those who strongly disagree with their defaults).
Thank you for offering to volunteer with this; I'll definitely reach out if I decide to run the fellowship with EigenTrust.
I've made solid progress on putting together a site I'd be happy with; not quite there yet, ETA another 24 to 48h. Let me know if you end up not wanting to couple your donations and eigentaste; there's still clear value in running it, so I'm going to get it done regardless.
Sent you a demo. It's reasonably close to ready for real use by motivated users, but the human-facing prompting still needs refining in order to be properly meaningful.
but thought that people wouldn't have a good reason to fill out their trust weights
Yeah, I notice that using a transitive quality as the endorsement criterion, and making votes public, produces an incentive for a person to give useful endorsements: Failing to issue informative endorsements would indicate them as not having this transitive quality and so not being worthy of endorsement themselves.
We can also make it prominent in a person's profile if, for instance, they've strongly endorsed themselves, or if they've only endorsed a few people without also doing any abstention endorsements (which redistribute trust back to the current distribution). Some will have an excuse for doing this; most will be able to do better.
I wonder if it's good to pre-fill the trust weights (e.g. based on AF upvote history), to make it easier for users (and motivate those who strongly disagree with their defaults).
True. Doing that by default, and also doing some of the aforementioned abstention endorsements by default, would address accidental overconfident votes pretty well.
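To make that concrete, here's a minimal sketch of one way the abstention/pre-fill rule could work, assuming whatever endorsement weight a voter leaves unassigned gets redistributed in proportion to the current aggregate trust distribution; the function name and numbers are made up for illustration:

```python
import numpy as np

def fill_abstention(explicit, current_trust):
    """Hypothetical abstention handling: endorsement weight a voter leaves
    unassigned is redistributed in proportion to the current aggregate trust
    distribution, rather than silently defaulting to zero."""
    explicit = np.asarray(explicit, dtype=float)
    current = np.asarray(current_trust, dtype=float)
    unassigned = max(0.0, 1.0 - explicit.sum())
    return explicit + unassigned * (current / current.sum())

# A voter explicitly endorses only person 2 with 40% of their weight;
# the remaining 60% follows the existing aggregate distribution.
print(fill_abstention([0.0, 0.0, 0.4, 0.0], [0.1, 0.3, 0.2, 0.4]))
```

The same rule could double as the pre-filled default for people who never touch their weights at all.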
(Also, howdy. I should probably help with this; I was R&Ding web-of-trust systems for a while before realising there didn't seem to be healthy enough hosts for them (they can misbehave if placed in the wrong situations), so I switched to working on extensible social software/forums to build better hosts. It wasn't clear to me that the alignment community needed this kind of thing, but I guess it probably does at this point.)
One obvious problem is that this turns getting funded into a popularity contest, which makes Goodhart kick in. It might work fine as a one-off thing, but in the long run, it will predictably get gamed, and will likely have negative effects on the whole LW discussion ecosystem by setting up perverse incentives for engaging with it (and, unless the list of eligible people is frozen forever, attracting new people who are only interested in promoting themselves to get money).
What should be the amount? Thiel gave 200k. Is it too much for 2 years? Too little?
You should almost certainly have some mechanism for deciding the amount to pay on a case-by-case basis, rather than having it be flat.
Could there be an entirely different approach to finding fellows? How would you do it?
What I would want to experiment with is using prediction markets to "amplify" the judgement of well-known people with unusually good AGI Ruin models who are otherwise too busy to review thousands of mostly-terrible-by-their-lights proposals (e. g., Eliezer or John Wentworth). Fund the top N proposals the market expects the "amplified individual" to consider most promising, subject to their veto.
This would be notably harder to game than a straightforward popularity contest, especially if the amplifee is high-percentile disagreeable (as my suggested picks are).
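A minimal sketch of the selection rule as I imagine it, assuming the market quotes, per proposal, a probability that the amplified individual would call it promising; the names, numbers, and veto handling are illustrative assumptions:

```python
def select_fellows(proposals, market_probs, n, vetoed):
    """Hypothetical selection rule: fund the top-n proposals by the market's
    probability that the 'amplified individual' would rate them as promising,
    skipping anything that person has explicitly vetoed."""
    ranked = sorted(proposals, key=lambda p: market_probs[p], reverse=True)
    return [p for p in ranked if p not in vetoed][:n]

# Illustrative numbers only.
probs = {"A": 0.72, "B": 0.65, "C": 0.40, "D": 0.31}
print(select_fellows(list(probs), probs, n=2, vetoed={"B"}))
```

Whether a veto removes a proposal before or after the top-N cut (and whether the freed slot rolls down to the next proposal) is a real design choice; the sketch just drops vetoed proposals and fills from the remainder.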
What I would want to experiment with is using prediction markets to "amplify" the judgement of well-known people with unusually good AGI Ruin models who are otherwise too busy to review thousands of mostly-terrible-by-their-lights proposals (e. g., Eliezer or John Wentworth). Fund the top N proposals the market expects the "amplified individual" to consider most promising, subject to their veto.
This would solve the bandwidth problem but doubles down on the correlation problem. If you peg the market to the approval of a few "amplified individuals", you aren't actually funding "alignment"; you're funding "simulations" of Eliezer/John. If their models have blind spots, the market will efficiently punish anyone trying to explore those blind spots.
200k is pretty high. A higher salary can increase the number of applicants, but it also increases the number of applications you'd need to filter through.
Maybe go more meta, and instead pay someone whose full-time job will be to find and interview people who want to work on AI Alignment, and do the paperwork (applying for other grants) for them.
I think this kind of funding has outsized impact.
Voting or otherwise delegating selection to the hive mind seems like a good way to minimize any potential impact.
Delegating to an already successful, widely respected authority in the domain is better, but those people are probably already steering most of the effort and funding in the field, either directly or by swaying general opinion.
Nomination-based seems like a good way to tap the wisdom of many different highly qualified people without filtering through a consensus process. It also reduces opportunity for gaming by insincere candidates, reduces the number of people who will spend time applying, and limits the number of applications you have to read.
For example: pick a nominating committee of maybe 10 people you think are wise, smart, knowledgeable, independent-thinking, and different from each other, who would not be candidates now, but would be in a position to be aware of potential candidates. Ask each of them to nominate one candidate per available slot with a brief statement about why. The nominators' identities should be secret, even to each other. Your goals and criteria should be articulated to them.
You could either screen those based on the nominators' statements or invite short preliminary applications from all the nominees; then decide which of them you are personally most excited about and invite only 2-3 full applications or interviews per available slot. In the end, pick recipients based on your own judgement.
I appreciate your thoughts.
I buy the reasoning that "delegating selection to the hive" could be suboptimal.
Also, as you have pointed out, the very best of the hive already have budgets to distribute or don't have spare time for this for other reasons.
Your exact proposal, though, implies that I can pick 10 wise and smart people (which is somewhat manageable, but I'd still be mostly deferring to a consensus opinion), and that I can make a final pick (which I most certainly can't, besides doing a "vibe-check").
I like the idea of keeping nominators secret from each other, to minimize the influence of social dynamics.
Zvi said:
Compelled by this, I'm considering funding ~three people for two years each to work on whatever they see fit, much like the Thiel Fellowship, but with an AI Alignment angle.
I want to find people who are excited to work on existential risk, but are currently spending much of their time working on something else due to financial reasons.
Instead of delegating the choice to some set of grantmakers, I think that aggregating the opinion of the crowd could work better (at least as good at finding talent, but with less overall time spent).
The best system I can think of at the moment would be to give every member of the alignment forum one vote with the ability to delegate it. Let everybody nominate any person in the world, including themselves, and award grants to the top 3.
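As a minimal sketch of how the delegated votes might resolve, assuming each delegation chain is followed until it reaches someone who actually nominated, and that cycles or dead ends simply drop the vote (all names here are illustrative):

```python
from collections import Counter

def tally(nominations, delegations, top_k=3):
    """Resolve each member's single vote: follow the delegation chain until it
    reaches a member who nominated someone, then count that nomination.
    Cycles and dead ends drop the vote."""
    counts = Counter()
    for member in set(nominations) | set(delegations):
        cur, seen = member, set()
        while cur not in nominations and cur in delegations and cur not in seen:
            seen.add(cur)
            cur = delegations[cur]
        if cur in nominations:
            counts[nominations[cur]] += 1
    return counts.most_common(top_k)

# Illustrative: alice and bob nominate candidates directly; carol delegates to alice.
print(tally({"alice": "Candidate X", "bob": "Candidate Y"}, {"carol": "alice"}))
```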
I'm asking for feedback and advice: