One downside of [MATS] relative to an internship at an organisation is that there are fewer natural routes to enter a managed position...
I think you misspelled "upside".
(Also, useful post, thank you for publishing it.)
This is very helpful, thanks! Actually, the post includes several sections, including in the appendix, that might be more interesting to many readers than the grant recommendations themselves. Maybe it would be good to change the title a bit so that people also expect other updates.
I also found parts of this post surprisingly interesting, given the ultra-dry title and intimidating reading time.
To present this kind of content in a way more readers could benefit from, another option would be to post it as a small sequence, so people could vote and comment on separate sections.
(Cross-posted from the EA Forum.)
Introduction
This payout report is meant to cover the Long-Term Future Fund's grantmaking starting January 2022 (after our December 2021 payout report), going through April 2023 (1 January 2022 - 30 April 2023).
52 of our grantees, whose grants total $1.41M, requested that we not include public reports for their grants. (You can read our policy on public reporting here.) We referred 2 grants to other funders for evaluation ($0.501M). Our median response time over this period was 29 days.
The rest of our grants are listed below (either in long or short form), as well as in our public grants database.
If you’re interested in receiving funding from the Long-Term Future Fund, apply here.
(Note: The initial sections of this post were written by me, Asya Bergal.)
Other updates
We've had a substantial increase in applications since 2021-- we averaged 35 applications per month in the latter half of 2021, 69 applications per month in 2022, and 90 applications per month so far in 2023.
Our funding bar went up at the end of 2022, in response to a decrease in the overall funding available to long-term future-focused projects. If we assume our numerical ratings are consistent over time, applying the new bar retroactively would imply not funding 28% of the grants we made earlier in 2022.
We're looking for more funding. We've spent an average of ~$1M per month across March, April, and May 2023 to maintain our current bar, have $992,870.53 in reserves as of July 3, and are ideally looking to fundraise at least $10M for the coming year.
As described in this post, we're trying to increase our independence from Open Philanthropy, which provided ~45% of our funding in 2022. As a transitional measure, over the next 6 months, Open Philanthropy will be matching funding given to the Long-Term Future Fund by small donors 2:1, for up to $3.5M total, making now a particularly good time to donate. Donate here. (The Long-Term Future Fund is part of EA Funds, which is a fiscally sponsored project of Effective Ventures Foundation (UK) (EV UK) and Effective Ventures Foundation USA Inc. (EV US). Donations to the Long-Term Future Fund are donations to EV US or EV UK.)
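To make the matching arithmetic concrete, here is a small sketch. The readings below (that "2:1" means $2 of matching per $1 donated, and that the $3.5M cap applies to Open Philanthropy's total match) are assumptions, and the donation figures are hypothetical.

```python
# Sketch of the 2:1 matching arithmetic, assuming "2:1" means Open Philanthropy adds
# $2 for every $1 given by small donors, and that the $3.5M cap applies to its total
# matching contribution (both readings are assumptions; donation figures are made up).
MATCH_RATIO = 2.0
MATCH_CAP = 3_500_000

def total_received(small_donor_total: float) -> float:
    """Total funding the LTFF would receive for a given amount of small-donor giving."""
    return small_donor_total + min(MATCH_RATIO * small_donor_total, MATCH_CAP)

print(total_received(100_000))    # $100k of donations -> $300k total
print(total_received(2_000_000))  # cap binds: $2M + $3.5M match = $5.5M total
```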
As a temporary measure in response to uncertainty about our future funding levels, we’ve put the bottom ~40% of grants above our current funding bar on hold. I think we’ll make several of those grants after this round of fundraising is over, but I generally expect our funding bar to vary more over time and to depend more on individual donations than it has historically.
I will be stepping down as chair of the fund by the end of October (and potentially earlier)-- I've written some reflections on my time on the fund here. We're looking for additional fund managers (including potential chair candidates)-- express interest here.
The fund's current fund managers are me (Asya Bergal), Linchuan Zhang, Oliver Habryka, and Caleb Parikh as permanent fund managers, and Thomas Larsen, Daniel Eth, Matthew Gray, Lauro Langosco, and Clara Collier as guest managers.
Our legal team asked us to highlight the eligibility criteria for our grants, which you can find in the appendices.
Highlights
Our grants include:
Payout reports
Longer grant write-ups
Grants evaluated by Linchuan Zhang
Stephen Grugett, James Grugett, Austin Chen ($200,000): 4-month stipend for 3 FTE to build a publicly available forecasting platform based on user-created play-money prediction markets.
Solomon Sia ($71,000): 6-month stipend for providing consultation and recommendations on changes to the US regulatory environment for prediction markets.
Grants evaluated by Oliver Habryka
Alexander Turner ($220,000): Year-long stipend for shard theory and RL mechanistic interpretability research
This grant has been approved but has not been paid out at the time of writing.
We’ve made grants to Alex to pursue AI Alignment research before:
We also made another grant of $115,411 in 2023 to a team led by Alex Turner for their post on steering vectors (the total covers payments to 5 team members, including, without limitation, travel expenses, office space, and stipends).
This grant is an additional grant to Alex, this time covering his full-time stipend for a year to do more research in AI Alignment.
Only the first one has a public grant write-up, and the reasoning and motivation behind all of these grants is pretty similar, so I will try to explain the reasoning behind all of them here.
As is frequently the case with grants I evaluate in the space of AI Alignment, I disagree pretty strongly, on an inside-view level, with the direction of the research that Alex has been pursuing for most of his AI Alignment career. Historically I have been, on my inside view, pretty unexcited about Alex's work on formalizing power-seeking, and I also don't feel that excited about his work on shard theory. Nevertheless, I think these are probably among the best grants the LTFF has made in recent years.
The basic reasoning is that, despite my lack of excitement about the research directions Alex keeps choosing, within those directions he has done quite high-quality work, and he also often makes interesting and useful contributions in online discussions and private conversations. I find his work particularly interesting because, within a broad approach I expected to be fruitless, Alex has produced more insight than I anticipated. That in itself makes me more interested in supporting him further, since work that shows I was at least partially wrong about a research direction being unpromising is more important to incentivize than work whose effects I am already fairly certain of.
I would like to go into more detail on my models of how Alex’s research has updated me, and why I think it has been high quality, but I sadly don’t have the space or time here to go into that much depth. In short, the more recent steering vector work seems like the kind of “obvious thing to try that could maybe help” that I would really like the field to be saturated with, and the work on formalizing power-seeking theorems also seems worth having done, though I pretty deeply regret the overly academic/formal presentation, which has repeatedly caused people to overinterpret the strength of its results (something Alex also seems to have regretted, and a pattern I have frequently observed in academic work that was substantially motivated by trying to “legitimize the field”).
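For readers who haven't seen it, the general idea behind steering vectors is to take the difference between a model's activations on two contrasting prompts at some layer and add that difference back into the residual stream during generation. Below is a minimal sketch of that idea; the model, layer, prompts, and scaling coefficient are arbitrary illustrative choices, not the specifics of Alex's setup.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Minimal sketch of the general activation-addition / steering-vector idea.
# Model, layer, prompts, and scale are arbitrary illustrative choices.
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tok = GPT2Tokenizer.from_pretrained("gpt2")
LAYER = 6  # layer at which to intervene (arbitrary)

def residual_after_layer(prompt: str) -> torch.Tensor:
    """Residual-stream activation after block LAYER at the last token of `prompt`."""
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, output_hidden_states=True)
    # hidden_states[0] is the embedding output; hidden_states[i + 1] is the output of block i.
    return out.hidden_states[LAYER + 1][0, -1, :]

# Steering vector: difference between activations on two contrasting prompts.
steering_vector = residual_after_layer(" Love") - residual_after_layer(" Hate")

def add_steering(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states.
    hidden = output[0] + 4.0 * steering_vector  # 4.0 is an arbitrary scale
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(add_steering)
prompt_ids = tok("I think dogs are", return_tensors="pt").input_ids
print(tok.decode(model.generate(prompt_ids, max_new_tokens=20, do_sample=False)[0]))
handle.remove()
```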
Another aspect of this grant that I expect to have somewhat wide-ranging consequences is the stipend level we settled on. Some basic principles that have led me to suggest this stipend level:
I have been rethinking stipend policies, as I am sure many people in the EA community have been since the collapse of FTX, and I haven’t made up my mind on the right principles here. It does seem like a pretty enormous number of good projects no longer have the funding to operate at their previous stipend levels, and it’s plausible to me that we should take the hit, lose out on a bunch of talent, and reduce stipends to a substantially lower level to be more capable of handling funding shocks. But I am really uncertain about this, and at least in the space of AI Alignment, I can imagine the recent rise to prominence of AI risk concerns alleviating funding shortfalls (or increasing competition by drawing more talent into the space, which could reduce wages, which would also be great).
See the Stipend Appendix below, “How we set grant and stipend amounts”, for more information on EA Funds’ determination of grant and stipend amounts.
Vanessa Kosoy ($100,000): Working on the learning-theoretic AI alignment research agenda
This is a grant to cover half of Vanessa’s stipend for two years (the other half being paid by MIRI). We also made another grant to Vanessa in Q4 2020 for a similar amount.
My model of the quality of Vanessa’s work is primarily indirect, since I have engaged relatively little with the central learning-theoretic agenda that Vanessa has worked on. The work is also quite technically dense, and I haven’t found anyone else who could explain it to me in a relatively straightforward way (I have heard that Daniel Filan’s AXRP episode with Vanessa is a better way to get started than previous material, though it hadn’t been published when I was evaluating this grant).
I did receive a decent number of positive references for Vanessa’s work, and I have seen her make contributions to other conversations online that struck me as indicative of a pretty deep understanding of the AI Alignment problem.
If I had to guess at the effects of this kind of work (though I should clarify that I am substantially deferring to other people here, so I don’t particularly trust my specific predictions), I expect the primary effect to be that Vanessa’s line of inquiry highlights important confusions and mistaken assumptions in how we expect machine intelligence to work, which, when resolved, will make researchers better at navigating the very large space of potential alignment approaches. I would broadly put this in the category of “Deconfusion Research”.
Vanessa’s research resulted in various public blog posts, which can be found here.
Skyler Crossman ($22,000): Support for Astral Codex Ten Everywhere meetups
Especially since the collapse of FTX, I am quite interested in further diversifying the set of communities that are working on things I think are important to the future. AstralCodexTen and SlateStarCodex meetups seem among the best candidates for creating additional thriving communities with overlapping, but still substantially different norms.
I do currently feel quite confused about what a good relationship between adjacent communities like this and Effective Altruism-labeled funders like the Long-Term Future Fund should be. Many of these meetups do not aim to do as much good as possible, or have much of an ambitious aim to affect the long-term future of humanity, and I think pressures in that direction would likely be more harmful than helpful: they would introduce various incentives for deception, and could prevent healthy local communities from forming by creating a misaligned relationship between the organizers (who would be paid by EA institutions to produce as much talent for longtermist priorities as possible) and the members (who are interested in learning cool things about rationality and the world and want to meet other people with similar interests).
Since this is a relatively small grant, I didn’t really resolve this confusion and mostly decided to just go ahead. I also talked a bunch with Skyler about this, and I currently think we can work out a good long-term arrangement for how best to distribute funding like this; I expect to think more about it in the coming weeks.
Grants evaluated by Asya Bergal
Any views expressed below are my personal views, and not the views of my employer, Open Philanthropy. (In particular, getting funding from the Long-Term Future Fund should not be read as an indication that the applicant has a greater chance of receiving funding from Open Philanthropy, and not receiving funding from the Long-Term Future Fund [or any risks and reservations noted in the public payout report] should not be read as an indication that the applicant has a smaller chance of receiving funding from Open Philanthropy.)
Alignment Research Center ($54,543): Support for a research & networking event for winners of the Eliciting Latent Knowledge contest
Daniel Filan ($23,544): Funding to produce 12 more episodes of AXRP, the AI X-risk Research Podcast.
We recommended a grant of $23,544 to pay Daniel Filan for his time making 12 additional episodes of the AI X-risk Research Podcast (AXRP), as well as the costs of hosting, editing, and transcription.
The reasoning behind this grant was similar to the reasoning behind my last grant to AXRP:
Daniel also shared some survey data in his grant application about how people rated AXRP compared to other AI alignment resources, though I didn't look at this closely when making the grant decision, as I already had a reasonably strong prior towards funding.
Grants evaluated by Caleb Parikh
Conjecture ($72,827): Funding for a 2-day workshop to connect alignment researchers from the US and UK with AI researchers and entrepreneurs from Japan.
SERI MATS program ($316,000): 8-week scholars program to pair promising alignment researchers with renowned mentors. (Originally evaluated by Asya Bergal)
Robert Long ($10,840): Travel funding for participants in a workshop on the science of consciousness and current and near-term AI systems.
Please note this grant has been approved but at the time of writing it has not been paid out.
Jeffrey Ladish ($98,000): 6-month stipend & operational expenses to start a cybersecurity & alignment risk assessment org
Please note this grant has been approved but at the time of writing it has not been paid out.
Grants evaluated by Matthew Gray
Leap Laboratories ($195,000): One year of seed funding for a new AI interpretability research organisation.
Daniel Kokotajlo ($10,000): Funding for a research retreat on a decision-theory/cause-prioritisation topic.
Grants evaluated by Thomas Larsen
Kaarel Hänni, Kay Kozaronek, Walter Laurito, and Georgios Kaklmanos ($167,480): Implementing and expanding on the research methods of the "Discovering Latent Knowledge" paper.
This team formed during SERI MATS and applied for funding to continue their SERI MATS project on detecting dishonesty in advanced AI systems.
My cruxes for this type of grant are:
(1) If done successfully, would this project help with alignment?
(2) How likely is this team to be successful?
My thoughts on (1):
This is meant to build upon Burns et al.'s Discovering Latent Knowledge (DLK) paper, which finds a direction in activation space that is supposed to represent the 'truth' of a logical proposition (a minimal sketch of this setup appears below).
I think that Eliciting Latent Knowledge (ELK) is an important subproblem of alignment, and I think it can be directly applied to combat deceptive alignment. My independent impression is that this specific direction towards solving ELK is not very useful towards a full alignment solution, but that it may lead to slightly better monitoring. (In particular, I think even in a good outcome, this will only lead to an average-case solution to ELK, meaning that when we explicitly train against this detector, it will fail.) I expect that AGI projects will reach a position where it's obvious that the systems they are building are capable and dangerous, and where instrumental incentives for e.g. power-seeking and deception clearly kick in. This technique might help us detect that danger, but since we can't train against it, it doesn't let us actually fix the underlying problem. The lab would then face the difficult choice of either continuing on or training against its own detection system. I still think that incremental progress on detecting deception is good, because it can help push for a stop in capabilities growth rather than prematurely continuing to AGI.
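As a rough illustration of the DLK approach mentioned above, here is a minimal sketch of the contrast-consistent search (CCS) loss applied to a linear probe. Details like activation normalization and data handling are simplified, and the random tensors stand in for a language model's hidden states on contrast pairs (a statement and its negation).

```python
import torch
import torch.nn as nn

# Sketch of the CCS objective from the Discovering Latent Knowledge paper.
# Shapes and training details are simplified for illustration.
d_model = 512
probe = nn.Sequential(nn.Linear(d_model, 1), nn.Sigmoid())

def ccs_loss(acts_pos: torch.Tensor, acts_neg: torch.Tensor) -> torch.Tensor:
    """acts_pos / acts_neg: [batch, d_model] activations for a statement and its negation."""
    p_pos = probe(acts_pos).squeeze(-1)
    p_neg = probe(acts_neg).squeeze(-1)
    # Consistency: the probabilities assigned to a statement and its negation should sum to 1.
    consistency = (p_pos - (1 - p_neg)) ** 2
    # Confidence: penalize the degenerate solution p_pos = p_neg = 0.5.
    confidence = torch.minimum(p_pos, p_neg) ** 2
    return (consistency + confidence).mean()

# Toy training loop; in the paper the activations come from a language model
# run on contrast pairs rather than random tensors.
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
acts_pos, acts_neg = torch.randn(128, d_model), torch.randn(128, d_model)
for _ in range(200):
    opt.zero_grad()
    ccs_loss(acts_pos, acts_neg).backward()
    opt.step()
```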
My thoughts on (2):
They produced reasonable output during SERI MATS, including the beginning of a replication of the DLK paper. They weren't that specific in their grant application, but they wrote a number of ideas for ways to extend the paper in the LW post. The two ideas that seem best to me are:
These ideas don't seem amazing, but they seem like reasonable things to try. I expect that the majority of the benefit will come from staring at the model internals and the results of the techniques and then iterating. I hope that this process will churn out more and better ideas.
One reservation I have is that none of the applicants have an established research track record, though they have published several papers:
- Kaarel's Arxiv page
- Walter's Google Scholar Profile
- Georgios's ORCID
This team did get strong references from Colin Burns and John Wentworth, which makes me a lot more excited about the project. All things considered, I'm excited about giving this team a chance to work on this project and seeing how they do. I'm also generally enthusiastic about teams trying their hand at alignment research.
Joseph Bloom ($50,000): Funding AI alignment research into circuits in decision transformers.
Joseph applied for independent research funding to continue his research into decision transformer interpretability. I'm happy with Joseph's initial result, which found circuits in a decision transformer in a simple RL environment. I thought his write-up was solid, and it gave me some updates on what cognitive machinery I expect to be induced by RL. In particular, I was excited about the preference directions in embedding space that he constructed. This seems like a useful initial step for retargeting the search, though more understanding of the circuits that are doing the optimization seems critical for this approach.
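To give a flavour of what a "preference direction in embedding space" could look like, here is a hypothetical sketch; the architecture fragment, variable names, and numbers are illustrative assumptions rather than Joseph's actual construction.

```python
import torch
import torch.nn as nn

# Hypothetical decision-transformer fragment; names and sizes are illustrative
# assumptions, not Joseph Bloom's actual model or code.
d_model = 64

# In a decision transformer, the scalar return-to-go (RTG) is projected into the
# residual stream by a linear embedding.
rtg_embed = nn.Linear(1, d_model)

high_rtg = torch.tensor([[1.0]])  # condition on "achieve a high return"
low_rtg = torch.tensor([[0.0]])   # condition on "return doesn't matter"

# One way to get a "preference direction": the difference between the embeddings
# of a high-return and a low-return conditioning signal.
preference_direction = rtg_embed(high_rtg) - rtg_embed(low_rtg)

def steer(residual_stream: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """Add the preference direction to the residual stream, activation-steering style."""
    return residual_stream + alpha * preference_direction

print(steer(torch.zeros(1, 10, d_model)).shape)  # torch.Size([1, 10, 64])
```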
I think interpretability on RL models is pretty neglected and very relevant for safety.
According to a reference, the applicant was also in the top 3 ARENA participants, and was very motivated and agentic.
The counterfactual is that Joseph tries to get funding elsewhere and, if that fails, gets a research engineer job at an AI safety org (e.g. Redwood, Conjecture, Ought). I encouraged him to apply to AI safety orgs, as I think working at an org is generally more productive than independent research. These jobs are quite competitive, so it's likely that Joseph won't be hired by any of them, in which case it seems great to pay him to do independent alignment research.
Overall, I think that Joseph is a promising researcher, and is working on a useful direction, so I feel excited about supporting this.
Since receiving this grant, Joseph has received some more funding (here), and was mentioned in the Anthropic May Update.
Other grants we made during this period
Appendix: How we set grant and stipend amounts
(Our legal team requested that we include this section; it was written by Caleb Parikh.)
Over the last year, we have directed a significant portion of our grants toward supporting individuals in the field of AI safety research. When compared to much of the non-profit sector, some of our grants may seem large. However, I believe there are strong justifications for this approach.
Our grantees often have excellent earning potential
Our grantees often exhibit extraordinary earning potential due to their skills and qualifications. Many of them are excellent researchers (or have the potential to become one in a few years) and could easily take jobs in big tech or finance, and some could command high salaries (over $400k/year) while conducting similar research at AI labs. I expect that offering lower grants would push some grantees to take higher-earning options in private industry, creating less altruistic value. My impression is that our grants are not larger than comparable grants or salaries offered by many established AI safety organizations. In fact, I anticipate our grants are likely lower.
Grants have substantive downsides relative to working in an organisation
Grants, while helpful, do have some drawbacks compared to conventional employment. We do not provide additional benefits often found in organizations, such as health insurance, office spaces, or operations support, and our stipends often offer less financial security than full-time employment. Often, a portion of a grant is designed to support grantees’ operational and living expenses while they pursue their research projects.
Generally, we expect our grantees to work full-time on their projects, with similar intensity to the work they’d do at other organizations within EA and AI safety, and we structure our grants to account for this amount of work. There are, of course, benefits, such as our grantees having more flexibility than they would in many organizations.
How we decide on personal stipend size
The fund operates as a collection of fund managers who sometimes have differing views on how much to fund a grantee for.
Our general process is:
One heuristic we commonly use (especially for new, unproven grantees) is to offer roughly 70% of what we anticipate the grantee would earn in an industry role. We want to compensate people fairly and allow them to transition to impactful work without making huge sacrifices, while conserving our funding and discouraging grifters. A relatively common procedure for a fund manager to decide how much to fund a grantee (assuming they have already decided the grantee is overall worth funding) is to:
Appendix: Eligibility criteria for LTFF grants
(Our legal team requested that we include this section; it was written by Caleb Parikh.)
Generally, our grants fulfil one of the following additional criteria:
Appendix: Special note on upskilling grants
(Our legal team requested that we include this section.)
One of LTFF’s overall charitable purposes is to encourage qualified and thoughtful individuals to think about and find solutions for global catastrophic risks, such as advanced artificial intelligence. We do this by funding such individuals to research issues like AI alignment so that they become more knowledgeable in and/or potentially change their career path to fully invest in these issues.