This is part of a weekly series summarizing the top posts on the EA and LW forums - you can see the full collection here. The first post includes some details on purpose and methodology. Feedback, thoughts, and corrections are welcomed.
If you'd like to receive these summaries via email, you can subscribe here.
Podcast version: Subscribe on your favorite podcast app by searching for 'EA Forum Podcast (Summaries)'. A big thanks to Coleman Snell for producing these!
Author's note: I'm currently travelling, which means:
a) Today's newsletter is a shorter one - only 9 top posts are covered, though in more depth than usual.
b) The next post will be on 17th April (three week gap), covering the prior three weeks at a higher karma bar.
After that, we'll be back to the regular schedule.
How much should governments pay to prevent catastrophes? Longtermism’s limited role
by EJT, CarlShulman
Linkpost for this paper, which uses standard cost-benefit analysis (CBA) with assumptions that work against its own conclusion (eg. giving no value to future generations, only assessing benefits to Americans, and only assessing value from preventing existential threats) to show that even under those conditions governments should be spending much more on averting threats from nuclear war, engineered pandemics, and AI.
Their analysis primarily relies on previously published estimates of risks, concluding that US citizens alive today have a ~1% risk of dying from these causes in the next decade. They estimate that $400B in interventions could reduce the risk by at least 0.1 percentage points, and that, using the lowest figure for the US Department of Transportation’s value of a statistical life, this would result in ~$646B in value of American lives saved.
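As a rough check on the figures above, the comparison reduces to a benefit-cost ratio. The sketch below uses only the two dollar figures quoted in this summary; the names are illustrative, and it does not attempt to reproduce the paper's own model or its underlying risk and population inputs.

```python
# Figures quoted in the summary above (USD). Names are illustrative only.
INTERVENTION_COST = 400e9   # estimated cost of the interventions
LIVES_SAVED_VALUE = 646e9   # value of American lives saved, at the lowest DOT VSL


def benefit_cost_ratio(benefit: float, cost: float) -> float:
    """Return benefit divided by cost; above 1 means benefits exceed costs."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return benefit / cost


ratio = benefit_cost_ratio(LIVES_SAVED_VALUE, INTERVENTION_COST)
net_benefit = LIVES_SAVED_VALUE - INTERVENTION_COST

print(f"Benefit-cost ratio: {ratio:.2f}")
print(f"Net benefit: ${net_benefit / 1e9:.0f}B")
```

On these numbers the ratio is roughly 1.6, so the interventions would pass a standard CBA test even before counting any benefit to non-Americans or future generations.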
They suggest longtermists in the political sphere should change their messaging to revolve around this standard CBA-driven catastrophe policy, which is more democratically acceptable than policies relying on the cost to future generations. They suggest it would also reduce risk almost as much as a strong longtermist policy (particularly if the CBA incorporates an argument for citizens’ ‘altruistic willingness to pay’, ie. some additional value placed on the benefit to future generations).
Assessment of Happier Lives Institute’s Cost-Effectiveness Analysis of StrongMinds
by GiveWell
The Happier Lives Institute (HLI) has argued that if GiveWell used subjective well-being (SWB) measures in their moral weights, they’d find StrongMinds more cost-effective than marginal funding to their top charities. GiveWell assessed this claim and estimated StrongMinds is ~25% (5%-80% pessimistic to optimistic CI) as effective as these marginal funding opportunities when using SWB - this equates to 2.3x the effectiveness of GiveDirectly.
Key differences in analysis from HLI, by size of impact, include:
These result in an ~83% discount in the effectiveness vs. HLI’s analysis. For all points except the fourth, two upcoming RCTs from StrongMinds will provide better data than currently exists.
HLI has posted a thorough response in the comments, noting which claims they agree / disagree with and why (5% agree, 45% sympathetic to some discount but unsure of magnitude, 35% unsympathetic but limited evidence, and 15% disagree on the basis of current evidence).
GiveWell also notes for context that HLI’s original estimates imply that a donor would pick offering StrongMinds’ intervention to 20 individuals over averting the death of a child, and that receiving StrongMinds’ program is 80% as good for the recipient as an additional year of healthy life.
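The headline ratios in this summary can be cross-checked with a little arithmetic. The sketch below derives two implied figures that are not stated in the summary itself: how GiveWell's marginal funding opportunities compare to GiveDirectly, and what the pre-discount multiple would be if the ~83% discount is assumed to apply to the same GiveDirectly multiple. Both are back-of-envelope inferences under that assumption, not numbers taken from GiveWell or HLI.

```python
# Ratios quoted in the summary above. Names are illustrative only.
STRONGMINDS_VS_MARGINAL = 0.25     # StrongMinds relative to marginal funding opportunities
STRONGMINDS_VS_GIVEDIRECTLY = 2.3  # the same estimate as a multiple of GiveDirectly
DISCOUNT_VS_HLI = 0.83             # approximate discount relative to HLI's analysis

# Implied: marginal funding opportunities as a multiple of GiveDirectly.
marginal_vs_givedirectly = STRONGMINDS_VS_GIVEDIRECTLY / STRONGMINDS_VS_MARGINAL

# Assumption (not stated in the source): the discount applies to the same
# GiveDirectly multiple, so dividing by (1 - discount) undoes it.
implied_pre_discount = STRONGMINDS_VS_GIVEDIRECTLY / (1 - DISCOUNT_VS_HLI)

print(f"Marginal funding vs GiveDirectly: ~{marginal_vs_givedirectly:.1f}x")
print(f"Implied pre-discount multiple: ~{implied_pre_discount:.1f}x")
```

Under these assumptions the marginal funding opportunities come out at roughly 9x GiveDirectly, and the pre-discount multiple at roughly 13.5x.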
Eradicating rodenticides from U.S. pest management is less practical than we thought
by Holly_Elmore, HannahMc, William McAuliffe, Rethink Priorities
Agricultural use of rodenticides in the US is well-protected by state and federal laws that seem unlikely to change. Eliminating their usage in other areas (eg. conservation and pest management) also faces significant barriers such as cost and inertia - but may be possible if these are overcome. The post links to this paper, which discusses in detail why rodenticides are used, under what circumstances they could be replaced, and whether they are replaceable with currently available alternatives.
Deep Deceptiveness
by So8res
Author’s summary: “Deceptiveness is not a simple property of thoughts. The reason the AI is deceiving you is not that it has some "deception" property, it's that (barring some great alignment feat) it's a fact about the world rather than the AI that deceiving you forwards its objectives, and you've built a general engine that's good at taking advantage of advantageous facts in general.
As the AI learns more general and flexible cognitive moves, those cognitive moves (insofar as they are useful) will tend to recombine in ways that exploit this fact-about-reality, despite how none of the individual abstract moves look deceptive in isolation.”
Potential employees have a unique lever to influence the behaviors of AI labs
by oxalis
When you are considering a job offer from an AI lab, they care a lot about what you think of them. You can use this to push for helpful practices for AI safety (eg. a larger alignment team, good governance, or better information security). This can be done by:
More information about the dangerous capability evaluations we did with GPT-4 and Claude.
by Beth Barnes
To test GPT-4 for dangerous capabilities before release, ARC:
They concluded it did not have sufficient capabilities to replicate autonomously and become hard to shut down. However, it came close enough that future models should be checked closely.
Announcing the European Network for AI Safety (ENAIS)
by Esben Kran, Teun_Van_Der_Weij, Dušan D. Nešić (Dushan), Jonathan Claybrough, simeon_c, Magdalena Wache
Author’s tl;dr: “The European Network for AI Safety is a central point for connecting researchers and community organizers in Europe with opportunities and events happening in their vicinity. Sign up here to become a member of the network, and join our launch event on Wednesday, April 5th from 19:00-20:00 CET!”
My Objections to "We’re All Gonna Die with Eliezer Yudkowsky"
by Quintin Pope
Eliezer Yudkowsky recently appeared on the Bankless Podcast, where he argued that AI was nigh-certain to end humanity. The author provides counterarguments, as someone experienced in the AI alignment community whose current estimate of doom is ~5%:
Some Comments on the Recent FTX TIME Article
by Ben_West
Alameda Research (AR) was founded in 2017, and ~half the employees quit in 2018 (including the author). Later in 2018, some remaining staff started working on FTX. A recent TIME article claims that because some EAs worked at AR before FTX started, they would have had knowledge of SBF’s character that should have allowed them to predict something bad would happen.
The author notes their experience was different from what the article describes. While they thought SBF was a bad CEO and manager (eg. not prepping for 1-1s, playing video games, poor accounting practices), they had a more positive view of him than the TIME article's statements convey. They also note they were not stopped from disparaging the company (eg. with a non-disparagement clause) and were treated fairly when it came to an informal equity agreement that the company could have saved money on. They suggest this means protecting ourselves through better noticing “warning signs” is a fragile approach.