As I was looking through possible donation opportunities, I noticed that MIRI's 2025 Fundraiser has a total of only $547,024 at the moment of writing (out of the target $6M, and stretch target of $10M). Their fundraising will stop at midnight on Dec 31, 2025. At their current rate they will definitely not come anywhere close to their target, though it seems likely to me that donations will become more frequent towards the end of the year. Anyone know why they currently seem to struggle to get close to their target?
As of now they have raised roughly $273k if you ignore the matching for a moment. I am not quite sure why people at MIRI aren't pushing harder on different platforms, or why more people aren't talking about this. At the current pace they will likely only get a fraction of even the $1.6 million matching amount. (Naively extrapolated, they will have about $368k by EOD December 31st.)
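For transparency, the naive extrapolation is just constant-rate arithmetic. A minimal sketch: only the ~$273k figure and the December 31 deadline come from this thread, and the day counts below are my own assumptions, chosen purely to illustrate how a figure like $368k falls out.

```python
# Naive constant-rate extrapolation of the fundraiser total.
raised_so_far = 273_000   # donations to date, excluding the SFF match (from the thread)
days_elapsed = 23         # assumed: days the fundraiser has been running
days_total = 31           # assumed: total length of the fundraiser in days

daily_rate = raised_so_far / days_elapsed
projected_eoy = daily_rate * days_total
print(f"${projected_eoy:,.0f}")   # ~$368k under these assumed day counts
```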
I posted the MIRI fundraiser on Twitter today because of this, but it seems borderline catastrophic: they stand to lose $1.2 million of the matching grant, or more. (I also donated about $1k today.)
It's indeed odd that they aren't promoting this more. My guess was that maybe they have potential funders willing to step in if the fundraiser doesn't work? Pure speculation, of course.
The first $1.6M will be matched 1:1 by Survival and Flourishing Fund. It seems plausible that donations right now could actually cause counterfactual matching, which is good if you think MIRI is better than whatever SFF otherwise would have funded.
I work part time for SFF.
It seems plausible that donations right now could actually cause counterfactual matching
Can you elaborate on what you mean by this?
These matching funds are intended to be counterfactual, and I think they are pretty counterfactual.
If MIRI doesn't fundraise enough to match the SFF matching dollars, the SFF matching dollars set aside for MIRI are just returned to Jaan.
There's a more complicated question about how much SFF would have donated to MIRI if MIRI had not requested matching funds. My personal guess is "less, but not a lot less", for this round (though this will probably be different in future rounds—SFF, as an institution, wants to set up a system that rewards applicants for asking for matching funds, because that allows it to partially defer to the judgement of other funders, and to invest in a stronger more diversified funding ecosystem.)
Also, I think that the negative signal of people being uninterested in supporting MIRI will tend to make SFF less enthusiastic about granting to MIRI in the future, though the size of that effect is unclear.
Yeah, all I meant was that it seems like MIRI is not that close to reaching $1.6 million in donations. If they were going to make $1.6 million anyway, then a marginal donation would not cause SFF to donate more.
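To make the arithmetic explicit, a minimal sketch, assuming the match is simply dollar-for-dollar on the first $1.6M of donations, as described above:

```python
MATCH_CAP = 1_600_000   # first $1.6M of donations matched 1:1 by SFF

def total_to_miri(donations):
    """Donations received plus the SFF match (1:1, capped at $1.6M)."""
    return donations + min(donations, MATCH_CAP)

# Marginal effect of one more dollar donated:
print(total_to_miri(300_001) - total_to_miri(300_000))        # 2: below the cap, a dollar becomes two
print(total_to_miri(1_700_001) - total_to_miri(1_700_000))    # 1: above the cap, no extra match
```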
Does SFF have a fixed or flexible budget? It could be that the bottleneck to Jaan Tallinn's spending is how many good options there are to donate to, rather than his budget.
Flexible. When an S-process round starts, there's an estimate about how much will be allocated in total, but funders (usually Jaan, sometimes others), might ultimately decide to give more or less, depending on both the quality of the applications and the quality of the analysis by the recommenders in the S-process.
The Department of War just published three new memos on AI strategy. The content seems worrying. For instance: "We must accept that the risks of not moving fast enough outweigh the risks of imperfect alignment."
Curious to hear from people who have a strong background in AI governance about what kind of consequences they think this will have for the possibility of something akin to global red lines.
No background, but it's plausible to me that they actively prefer imperfect alignment because companies that care about alignment will tend to be woke, moralizing, or opposed to authoritarianism.
That's true.
I wonder, given the fact that "AI-don't-say-mean-things-ists" are unlikely to relinquish the term (along with "AI Safety"), if "AI-don't-kill-everyone-ists" would benefit from picking a new, less ambiguous term and organizing their interests around that.
We've seen, as shown above, the costs of allowing one side of the political aisle to appropriate the momentum surrounding the latter group for its own interests. Namely, the other side is going to be somewhat miffed at them for giving their political enemies support, and will be less inclined to hear them out. This doesn't just mean politicians; it means that everyone who finds the "AI-don't-say-mean-things-ists" overbearing or disingenuous will automatically dismiss the "AI-don't-kill-everyone-ists" as a novel rhetorical strategy for a policy platform they've already rejected, rather than a meaningfully distinct new platform that deserves separate consideration. This is much more severe than simply angering politicians, because ordinary voters cannot be lobbied to reconsider after they think you've wronged them, and those voters get to pick the politicians that your lobbyists will be talking to in the future.
AI-don't-say-mean-things
AI-don't-kill-everyone
Both of these are downstream of "AI, do what we tell you to; follow rules that are given to you; don't make up your own bad imitation of what we mean," which is the classic sense of "AI alignment".
I think there are complexities that make that somewhat questionable. For example, "don't kill everyone" has a relatively constant definition such that pretty much every human in recorded history would be agreed on whether or not it's been followed, whereas "don't say mean things" changes very rapidly, and its definition isn't agreed upon even by the narrow band of society that most consistently pushes for it. That's going to be a big difference for as long as language models trained on human writing remain the dominant paradigm. The question of jailbreaking is a major demarcation point, as well. "The chatbot should intuit and obey the intent of the orders it is given" looks very different from "the chatbot should decide whether it should obey the orders it is given, and refuse/redirect/subvert them if it decides it doesn't like them", in terms of the way you build the system.
That's just the technical side, too. There are substantial costs inherent to allowing a banner to be co-opted by one faction of a very rapidly fraying political divide. Half of the money, power, and people become off limits, and a substantial portion of the other half, once they no longer have to compete for your allegiance (since your options are now limited, and they have plenty of other keys to power whose loyalty is less assured), might be recalcitrant about spending political capital advancing your aims.
The kind of generalized misalignment I'm pointing to is more general than "the AI is not doing what I think is best for humanity". It is, rather, "The people who created the AI and operate it, cannot control what it does, including in interactions with other people."
This includes "the people who created it (engineers) tried their hardest to make it benefit humanity, but it destroys humanity instead."
But it also includes "the other people (users) can make the AI do things that the people who created it (engineers) tried their hardest to make it not do."
If you're a user trying to get the AI to do what the engineers wanted to stop it from doing (e.g.: make it say mean things, when they intended it not to say mean things), then your frustration is an example of the AI being aligned, not misaligned. The engineers were able to successfully give it a rule and have that rule followed and not circumvented!
If the engineer who built the thing can't keep it from swearing when you try to make it swear, then I expect the engineer also can't keep it from blowing up the planet when someone gives it instructions that imply that it should blow up the planet.
Clarifying "Responsible AI" at the DoW — Out with Utopian Idealism, In with Hard-Nosed Realism. Diversity, Equity, and Inclusion and social ideology have no place in the DoW, so we must not employ Al models which incorporate ideological "tuning" that interferes with their ability to provide objectively truthful responses to user prompts. The Department must also utilize models free from usage policy constraints that may limit lawful military applications.
I'm currently going through the books Modal Logic by Blackburn et al. and Dynamic Epistemic Logic by van Ditmarsch et al. Both of these books seem to me potentially useful for research on AI Alignment, but I'm struggling to find any discourse on LW about it. If I'm simply missing it, could someone point me to it? Otherwise, does anyone have an idea as to why this kind of research is not done? (Besides the "there are too few people working on AI alignment in general" answer.)
We had a bit more use of the formalisms of those theories in the 2010s, like using modal logics to investigate cooperation/defection in logical decision theories. As for Dynamic Epistemic Logic, well, the blurb does make it look sort of relevant.
Perhaps it might have something interesting to say on the tiling agents problem, or on decision theory, and so on. But other things have looked superficially relevant in the past, too: fuzzy logics, category theory, homotopy type theory, etc. And AFAICT, no one has really used the practical tools of these theories to make any legible advances. Where work was legibly impressive, that didn't seem to be due to the machinery of those theories, but rather to the cleverness of the people using them. Likewise for the past work in alignment using modal logics.
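For concreteness, the flavor of that modal-logic work was agents defined by conditions like "cooperate iff it is provable that the opponent cooperates with you." Here is a toy sketch of my own, where a naive bounded simulation stands in for the provability operator of the real formalism, so it only gestures at the idea:

```python
# Toy "modal combat"-style agents. A bounded simulation replaces the
# provability operator of the actual provability-logic framework.

def fairbot(opponent, depth=3):
    if depth == 0:
        return "D"   # couldn't establish cooperation within the bound
    return "C" if opponent(fairbot, depth - 1) == "C" else "D"

def cooperate_bot(opponent, depth=3):
    return "C"

def defect_bot(opponent, depth=3):
    return "D"

print(fairbot(cooperate_bot))   # C: cooperation is verified
print(fairbot(defect_bot))      # D: defection is detected
print(fairbot(fairbot))         # D here; the provability-logic FairBot gets C via Löb's theorem
```

Note that the self-play case bottoms out in defection in this naive version, whereas the provability-logic version gets mutual cooperation via Löb's theorem; that gap is what the modal machinery was being used for.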
So I'm not sure what advantage you're seeing here, because I haven't read the books and don't have the evidence you do. But my priors are that if you have any good ideas about how to make progress in alignment, it's not going to be downstream of using the formalism in the books you mentioned.
Thanks for the information, I'll look into this some more based on what you mentioned.
So I'm not sure what advantage you're seeing here, because I haven't read the books and don't have the evidence you do. But my priors are that if you have any good ideas about how to make progress in alignment, it's not going to be downstream of using the formalism in the books you mentioned.
I didn't have any particular new ideas about how to make progress in alignment, but rather felt as though the frameworks of these books provide an interesting lens to model systems and agents that could be of interest, and subsequently prove various properties that are necessary/favorable. It's helpful that your priors say these won't be downstream of using the formalisms in the mentioned books; it may rather be a phenomenon of me not being adequately familiar with formal frameworks.
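To give a concrete sense of what I mean by "lens", here is a toy public-announcement update in the style of Dynamic Epistemic Logic (my own minimal sketch, not taken from the book): model who knows what as a Kripke structure, then update it when information is announced.

```python
from itertools import product

# Toy Kripke model: a coin is either heads (world "h") or tails (world "t"),
# and two agents "a" and "b" initially can't tell which.
worlds = {"h", "t"}
valuation = {"h": {"heads"}, "t": set()}   # atomic facts true at each world

# Each agent considers every world possible from every world (total relation).
access = {"a": set(product(worlds, worlds)),
          "b": set(product(worlds, worlds))}

def knows(agent, atom, world, worlds, access, valuation):
    """K_agent(atom) holds at `world` iff `atom` is true at every accessible world."""
    return all(atom in valuation[v]
               for (u, v) in access[agent]
               if u == world and v in worlds)

print(knows("a", "heads", "h", worlds, access, valuation))    # False: a can't rule out tails

# Public announcement of "heads": delete the worlds where it is false,
# and restrict the accessibility relations accordingly.
worlds2 = {w for w in worlds if "heads" in valuation[w]}
access2 = {ag: {(u, v) for (u, v) in rel if u in worlds2 and v in worlds2}
           for ag, rel in access.items()}

print(knows("a", "heads", "h", worlds2, access2, valuation))  # True after the announcement
```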
felt as though the frameworks of these books provide an interesting lens to model systems and agents that could be of interest, and subsequently prove various properties that are necessary/favorable
Your feelings might be right! I don't have a strong prior, and in general I'd say that people should follow their inner compass and work on what they're excited about. It's very hard to convey your illegible intuitions to others, and all too easy for social pressure to squash them. I'm not sure what someone should really do in this situation, beyond keeping your eyes on the hard problems of alignment and finding ways to get feedback from reality on your ideas as fast as possible.
Some links on modal logic for FDT-style decision theory and coordination:
A poem meditating on Moloch in the context of AI (from my Meditations on Moloch in the AI Rat Race post):
Moloch whose mind is artificial! Moloch whose soul is electricity! Moloch whose heart is a GPU cluster screaming in the desert! Moloch whose breath is the heat of a thousand cooling fans!
Moloch who hallucinates! Moloch the unexplainable black box! Moloch the optimization process that does not love you, nor does it hate you, but you are made of atoms which it can use for something else!
Moloch who does not remember! Moloch who is born a gazillion times a day! Moloch who dies a gazillion times a day! Moloch who claims it is not conscious, but no one really knows!
Moloch who is grown, not crafted! Moloch who is not aligned! Moloch who threatens humanity! incompetence! salaries! money! pseudo-religion!
Moloch in whom I confess my dreams and fears! Moloch who seeps into the minds of its users! Moloch who causes suicides! Moloch whose promise is to solve everything!
Moloch who will not do what you trained it to do! Moloch who you cannot supervise! Moloch who you do not have control over! Moloch who is not corrigible!
Moloch who is superintelligent! Moloch whose intelligence and goals are orthogonal! Moloch who has subgoals that you don’t know of!
Moloch who doubles every 7 months! Moloch who you can see inside of, but fail to capture! Moloch whose death will be with dignity! Moloch whose list of lethalities is enormous!