Supported by Rethink Priorities
This is part of a weekly series - you can see the full collection here. The first post includes some details on purpose and methodology.
If you'd like to receive these summaries via email, you can subscribe here.
Podcast version: prefer your summaries in podcast form? A big thanks to Coleman Snell for producing these! Subscribe on your favorite podcast app by searching for 'Effective Altruism Forum Podcast'.
Philosophy and Methodologies
by Violet Hour
Longtermist philosophy is pretty reasonable (future people matter, there might be a lot of them, and we can make a difference to them). However, many outside EA find the priorities that have arisen from these premises (eg. AI safety and biorisk) weird. The author argues this is due to EA's unusual epistemic culture, and uses this post to highlight these norms and how they influence our decision-making.
In particular, EAs tend to be comfortable with speculative reasoning, to put numbers on things (even when they're highly unsure), and to use those numbers as inputs to decision-making - while remaining skeptical if all of that leads somewhere too speculative or fanatical. The author suggests being explicit about these norms, because that allows better outside criticism - or, if we're really onto something, allows others to benefit from it.
by William McAuliffe, Adam_Shriver
Pains vary in their severity and duration. This report reviews the research and philosophy on how to trade off these two dimensions, which can impact cause prioritization decisions.
Some viewpoints explored include that badness scales non-linearly with severity of pain, or that long-duration pain can only outweigh high-severity pain if it meets the bar of preventing pleasure or making more moments bad than good. Utilitarian views that simply multiply (severity × duration) are also presented. It's also possible these trade-offs vary between individuals - one study found most participants make decisions as if adding severity and duration to get badness, but a minority multiply them.
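The contrast between the additive and multiplicative views can be made concrete with a small sketch (the numbers and the 0-10 severity scale are hypothetical illustrations, not from the report):

```python
# Hypothetical illustration: two ways of aggregating pain severity and
# duration into a single "badness" score. Severity is on an arbitrary
# 0-10 scale; duration is in hours.

def badness_multiplicative(severity, duration):
    # Simple utilitarian view: badness = severity x duration.
    return severity * duration

def badness_additive(severity, duration):
    # The pattern most participants in the cited study appeared to use:
    # badness tracks severity + duration.
    return severity + duration

intense_brief = (9, 1)  # severity 9 for 1 hour
mild_long = (2, 6)      # severity 2 for 6 hours

# The two models disagree about which experience is worse:
# multiplicative: 9 vs 12  -> the mild, long pain is worse
# additive:      10 vs 8   -> the intense, brief pain is worse
```

Which aggregation rule is right matters for prioritization, since the orderings flip exactly for the severe-but-brief vs. mild-but-chronic comparisons the report cares about.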
Ethical constraints, severity being more salient in retrospect, imagination failures and other factors make research and experimentation in this area difficult. The authors are planning to gather scientists and philosophers at a workshop to develop new methodologies to push the area forward.
The author conducted a small-scale (N=30) survey in a Kenyan slum in 2021, which found most participants rated themselves closer to the 'worst possible situation' than the 'best possible situation', and that the median participant wanted to live 2 more years if their situation didn't change.
Taking this into account could influence GiveWell recommendations. For instance, GiveWell recommended a grant to support the deregistration of pesticides commonly used in suicide, on the basis of lives saved. However, these lives are likely valued negatively by those living them, and the grant could have negative impacts on agricultural productivity and therefore quality of life for others.
Object Level Interventions / Reviews
The author systematically experimented with different antidepressants over a one-year period, after putting together a best-guess ranked list with their psychiatrist. They share both this desk research and the results of their personal experiment. While the year was grueling, they found a drug with good effectiveness and limited side effects for them. Antidepressant effects vary significantly between individuals, so they suggest this process could be worthwhile for others too (particularly those with plenty of money and support to help manage side effects along the way). They also found CBT, and changing their job role to focus on particularly satisfying / exciting tasks, were a big help.
by George Stiffman
Chinese tofus are varied (eg. some are melty, some taste like cheese, some have crumb-like exteriors), but little known outside China. Expanding access to these could spare substantial numbers of animal lives.
Limited supply and awareness are bottlenecks, particularly as shipping is expensive if done in small quantities. Encouraging existing trading companies to import more, helping local producers scale up, or creating a new distribution company are all potential solutions. Developing novel uses for the tofus, or research into how ingredients have gained popularity previously would also be helpful.
You can support this project by co-founding various types of organizations, funding the author, connecting them with cofounders / chefs / researchers / etc., doing research, or advising. More details on each in the post.
by Apart Research, Esben Kran
Most models of AI risk have a number of discrete steps which all need to be true for bad outcomes to occur. These models calculate total risk by multiplying the central probability estimate of each step together. This is statistically incorrect for conditional and independent steps. Eg. if the central estimate of each of 4 steps were 60%, simple multiplication gives 13%. However, we actually have a probability distribution for each step - and if we end up in a world with an unlikely result in the lower tail of one step and an unlikely result in the upper tail of another, the final probability changes drastically, eg. 60% * 60% * 5% * 99% is only 1.8%. This means that if we keep sampling from the distribution for each event, simulating possible worlds, we will typically get a lower predicted risk than if we simply multiply our best guesses for each step together.
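The difference between multiplying point estimates and sampling whole distributions can be sketched in a few lines (the step distributions below are made-up numbers for illustration, not the post's data):

```python
import math
import random

random.seed(0)

# Made-up uncertainty for each of four steps toward catastrophe: three
# equally likely candidate probabilities per step.
step_distributions = [
    [0.3, 0.6, 0.9],
    [0.3, 0.6, 0.9],
    [0.05, 0.6, 0.95],
    [0.3, 0.6, 0.99],
]

# Naive approach: multiply the mean estimate of each step together.
naive = 1.0
for dist in step_distributions:
    naive *= sum(dist) / len(dist)

# Sampling approach: simulate many possible worlds, drawing one value per
# step and multiplying the draws.
products = []
for _ in range(100_000):
    p = 1.0
    for dist in step_distributions:
        p *= random.choice(dist)
    products.append(p)

# Summarize the sampled products with a geometric mean, as the post does.
geometric_mean = math.exp(sum(math.log(p) for p in products) / len(products))

# The geometric mean of the sampled products sits well below the product of
# the means, because a low draw on any one step drags the product down.
```

Note that the top comment's caveat applies to this sketch too: each step is sampled independently, and correlated uncertainty across steps would change the answer.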
The author collects estimates from the community and AI risk experts on each step of a well-accepted path to AI risk (Carlsmith model, 2021), which via simple multiplication ends up around the usual estimates in the 15-30% range. However, via sampling from the distribution of answers, they find we are far more likely to be in a world with <3% risk of catastrophe due to out-of-control AGI, with a geometric mean of only 1.6% risk. This analysis also allows us to identify which steps are most important for determining if we are in a low or high risk world, which could be useful for prioritizing research directions.
A top comment notes that this method requires independence of each step of the AI risk model for a particular expert, and that assumption is likely not met and can hugely influence results.
An incomplete draft (though still with lots of useful findings) from 2019/2020 on the probability that a catastrophe causing civilizational collapse would lead to indefinite technological stagnation. It explores three questions:
- If we re-ran history, would we see the agricultural and industrial revolutions again?
- Would technological progress look different in a post-collapse world?
- What are the recovery timelines for a collapsed civilization?
The author briefly (1-2 paragraphs each) ranks the world’s top 10 billionaires according to how much value / impact they’ve created through their business and philanthropic activities.
Opportunities & Resources
Jobs, programs, competitions, fellowships, courses, resources, and more.
by Akhil, Leonie Falk
Fellows will participate in training on evidence-based research, and then produce a shallow report on a pre-selected global health and development (GHD) cause area. The research will be aimed at novel areas with the hope to identify new interventions that could be competitive with the top of the field currently.
Applications are open until 30th Oct for the pilot, which will run 7th Nov - 20th Dec.
Announcing Squigglepy, a Python package for Squiggle
by Peter Wildeford
Squiggle is a simple programming language for intuitive probabilistic estimation. This package implements many Squiggle-like functionalities in Python. It also includes utility functions for Bayesian networks, pooling forecasts, Laplace's rule of succession, and Kelly betting.
by Anne Nganga
Applications are open for the 2023 Effective Altruism Africa Residency Fellowship. The program runs Jan 15th - Mar 31st, and is aimed at providing support and community for EAs working on improving wellbeing in Africa. Accommodation and working space are provided.
The author asked Chris Bakerlee (Senior Program Associate for Biosecurity and Pandemic Preparedness at Open Philanthropy) for biosecurity roles he is excited to see filled right now. He responded with an Executive Assistant role on his team, and a Senior Program Officer / Senior Director for Global Biological Policy and Programs role at Nuclear Threat Initiative.
Community & Media
All public grants by EA Funds will now appear in this database. Entries include project summaries, the grantee, which fund paid, and the payout amount.
by Probably Good, High Impact Medicine
A guide to impactful careers within the medical space, primarily aimed at existing doctors and medical students. Includes ways to have more impact within clinical work (eg. taking high paying roles and donating) as well as high-impact alternatives that benefit from a medical background (eg. medical research, public health, biosecurity, and nonprofit entrepreneurship).
by lukasj10, Isaac_Esparza
Healthier Hens investigates dietary interventions to improve the welfare of cage-free hens, and engages farming stakeholders to adopt these interventions. In Y1 they spent most of their budget on staff, research, and travel. In Y2 they intend to ramp up their program work. However, they are short of funding (missing $180K of the $230K needed for Y2) and are looking for donations.
by Joel Tan (CEARCH)
CEARCH is a new org focused on cause prioritization research. They will investigate a large number of causes shallowly, doing more intensive research if the cost-effectiveness of a cause seems plausibly at least one order of magnitude higher than a GiveWell top charity. So far they have completed three shallow investigations: nuclear war, fungal disease, and asteroids. Asteroids ranked highest (2.1x top GiveWell charities), surprising the researchers.
A new organization building an app for effective self-improvement and reflection, initially targeting EAs. The app distinguishes itself via a focus on extensive customization and self-testing of plans to tackle internal obstacles and change mindsets.
You can help by trying the beta version and providing feedback on what does / doesn’t work for you personally, getting in touch if you do EA wellbeing workshops or coaching, joining the team (several open positions) or giving feedback on the website / comms.
by Richard Möhn
Organisations not used to hiring might outsource it, but hiring firms don’t always do a good job - and the author has seen an example where it was hard for founders without hiring experience to identify this. In that example, the hiring company:
- Turned candidates off with long ads, unnecessary requirements, unclear process, and delays
- Failed to distinguish good candidates, due to asking questions that didn't dig into the candidates' experience
- Rejected candidates late in the process via email with a form letter that stated no feedback could be given
Google has plugged large language models into physics simulators to let them reason better about the physical world. This increased performance on physics questions / tasks by a large margin (eg. a 27% absolute improvement in zero-shot accuracy), and allowed small LMs to perform at the level of ones 100x bigger that didn't have physics simulator access (on these specific questions).
Some people believe logical decision theory (LDT) agents are friendly, and so if an AI was one, we'd be alright. The author argues this is incorrect, because cooperative behavior for an LDT agent (eg. cooperating in Prisoner's Dilemmas, or one-boxing in Newcomb's problem) is entirely based on maximizing utility - not an in-built property of cooperativeness. If they don't expect helping us to lead to better outcomes on their goals, they won't help us.
As an alignment researcher, the author often has to make decisions about what to pay attention to vs. ignore. Eg. will shard theory pan out? Will a certain conjecture be proven even if they don't focus on it? However, prediction markets focus almost exclusively on AI capability timelines. Eg. will we have an AI-generated feature film by 2026? Will AI wipe out humanity by 2100?
The author suggests more predictions that affect researchers day-to-day decision-making would make prediction markets more impactful.
Summary repeated from last week as context for the next two posts, which directly respond to this one.
Counters to the argument that goal-directed AIs are likely and it’s hard to align them to good goals, so there’s significant x-risk:
- AIs may optimize more for ‘looking like they’re pursuing X goal’ than actually pursuing it. This would mean they wouldn’t go after instrumental goals like money or power.
- Even if an AI’s values / goals don’t match ours, they could be close enough, or be non-destructive. Or they could have short time horizons that don’t make worldwide takeovers worth it.
- We might be more powerful than a superintelligent AI. Collaboration was as or more important than intelligence for humans becoming the dominant species, and we could have non-agentic AIs on our side. AIs might also hit ceilings in intelligence, or be working on tasks that don’t scale much with intelligence.
- The core AI x-risk argument could apply to corporations too: they are goal-directed, hard to align precisely, far more powerful than individual humans, and adapt over time - yet we don't consider them x-risks.
by Erik Jenner, Johannes_Treutlein
Counterarguments to each of Katja’s points in the post above, drawing from the existing literature. These defend the position that if AI proceeds without big advances in alignment, we would reach existential catastrophe eventually:
- Goal-directedness is vague / AIs may optimize more for ‘looking like they’re pursuing X goal’ than actually pursuing it. Counter: If we define ‘goal-directedness’ as ‘reliably ensuring a goal will be achieved’ then economic pressures will tilt toward this. To ensure very hard goals are achieved, the AI will need to use novel methods / a wide action space eg. ‘acquire lots of power first’.
- An AI’s values could be close enough to ours. Counter: Imagine an AI is rewarded when sensors say a diamond is in a room. So it manipulates the sensors to always say that, instead of protecting the diamond. These are hugely different values that arise from the training signal not distinguishing ‘this looks good to humans’ and ‘this is actually good for humans, given full knowledge’ - which could be a common failure mode. Human values might vary little, but AI could vary a lot, particularly when working in situations with no training examples (because we don’t have superhuman performance to train on).
- We might be able to handle a superintelligent AI. Counter: While some tasks don’t benefit from intelligence, many do (eg. take over the world) and eventually someone will direct AI at one of these tasks, and keep improving it because of economic incentives. The question is if we can have superhuman alignment research (or another alignment solution) first.
- The core AI x-risk argument could apply to corporations too. Counter: corporations have limited scaling in comparison to AI, due to finite numbers of people and inadequate coordination.
by Matthew Barnett, Ege Erdil, Brangus Brangus
Transcript of a conversation between Ege Erdil and Ronny Fernandez about Katja’s post above. Mostly focused on two of the arguments:
1. An AI’s values could be close enough to ours. Our training processes train things to look good to humans, not to be good. Even if these are only rarely badly different, if we run enough powerful AIs enough times, we’ll get that case and therefore catastrophe. And we might not have a chance to recognize it / recover because of the powerful optimization of the AI towards it. This is particularly likely for AIs doing things we find hard to rate (eg. ‘does this look like a human face?’ - the example in Katja’s post - is much easier than ‘is this or that similar world better?’)
2. The core AI x-risk argument could apply to corporations too. Counter: corporations are bad at coordination. AIs can be much better (eg. combine forces toward a weighted merge of their goals).
by leogao, John Schulman, Jacob_Hilton
Author’s tl;dr: “Reward model (RM) overoptimization in a synthetic-reward setting can be modelled surprisingly well by simple functional forms. The coefficients also scale smoothly with scale. We draw some initial correspondences between the terms of the functional forms and the Goodhart Taxonomy. We suspect there may be deeper theoretical reasons behind these functional forms, and hope that our work leads to a better understanding of overoptimization.”
The author experimented for 2 weeks with consciously learning ‘with a vengeance’, aiming to avenge whatever they lost because they didn’t learn the thing earlier. They had better motivation and recall, and suggested others try the same.
The author believes there are double-digit odds of AI-caused extinction in the next century. However, this is less salient to them than the >50% chance that, as a current 49-year-old, they will die in the next 3-4 decades, with the odds increasing every year - particularly after several health scares. It's hard to focus on anything above personal survival.
Treating a plan as a step-by-step list that we should always optimize toward isn’t as helpful as developing multiple plans, identifying common bottlenecks between them, and tackling those. This is particularly the case if your field is preparadigmatic and you’re working on hard problems, as it allows you to adapt when surprises are thrown your way.
In this case, a plan simply becomes one path we predict. We might even have a mainline / modal plan we most expect. But we’re selecting our actions to be useful in both this and other paths.
Compression works by assuming knowledge on the receiver's end. If we know the receiver understands '4x1' to mean '1111', then we can compress binary strings. If we know the receiver understands the general idea that problems are easier to fix early on, while they're small, we can compress a reminder as 'a stitch in time saves nine'.
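The '4x1 means 1111' idea is just run-length encoding; here is a minimal sketch (the comma-separated format is an arbitrary choice for illustration):

```python
# Minimal run-length codec: "111100" <-> "4x1,2x0". The scheme only works
# because sender and receiver share the decoding convention in advance.

def compress(text):
    runs = []
    for ch in text:
        if runs and runs[-1][1] == ch:
            runs[-1][0] += 1          # extend the current run
        else:
            runs.append([1, ch])      # start a new run
    return ",".join(f"{count}x{ch}" for count, ch in runs)

def decompress(encoded):
    # A receiver who doesn't know compress()'s convention cannot do this.
    return "".join(
        ch * int(count)
        for count, ch in (part.split("x") for part in encoded.split(","))
    )
```

Here `compress("111100")` gives `"4x1,2x0"` and `decompress` round-trips it; the shared convention plays the role that shared experience plays for proverbs.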
When we share wisdom or learnings, we lose a lot of the value - there is no way for the receivers to ‘unzip’ these lessons and get an understanding of the experiences, context, and nuance that formed them.
The author tried many things to deal with a medical problem on the advice of doctors, was eventually suggested a treatment for a different issue, tried it, and found it solved the original problem (in this case, a particular antihistamine taken for a rash resolved difficulty digesting protein). They also ran studies on self-help books, finding no correlation between helpfulness and rigor / theoretical backing, and ran an RCT on ketone esters, finding no benefits despite the author and friends having previously gotten large gains from them.
They conclude that “once you have exhausted the reliable part of medicine without solving your problem, looking for a mechanistic understanding or empirical validation of potential solutions is a waste of time. The best use of energy is to try shit until you get lucky.”
by Scott Garrabrant
The first post in a sequence introducing a new voting system. This post introduces background framework, notation, and voting theory.
Important criteria for voting theories include:
- Condorcet: If a candidate would defeat all others in one-on-one elections, that candidate should win.
- Consistent: If two disjoint electorates would produce the same result, then combined, they should also produce that result.
- Participation: No voter should be made worse off by voting compared to staying home.
- Clone: If a set of candidates are clumped together in all voters preference orderings, the result of the election should be the same as if they were a single candidate.
Most voting methods violate at least one of these principles - eg. in all deterministic voting systems, the Condorcet and Consistent criteria are in conflict.
by Scott Garrabrant
Maximal lotteries are a voting system in which, if any candidate would beat all others one-on-one, they win outright. If that's not the case, votes are used to create a probability distribution over candidates (eg. it may assign 80% to one candidate), and then a random number is rolled to determine the winner.
This system fulfills all 4 voting principles from the previous post. It is distinct from the 'random dictatorship' method (choose a random voter, and go with their preference as the winner) only in that it first checks for and fulfills the Condorcet principle, so a clear winner will always win.
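A simplified sketch of the system as summarized above: Condorcet check first, lottery fallback second. (The helper names are mine, and the fallback distribution is taken as an input here rather than derived from the ballots, as the real method does.)

```python
import random

def condorcet_winner(ballots, candidates):
    # A ballot is a preference ordering, best first. Return the candidate
    # who beats every rival in head-to-head majorities, if one exists.
    for a in candidates:
        if all(
            2 * sum(b.index(a) < b.index(c) for b in ballots) > len(ballots)
            for c in candidates
            if c != a
        ):
            return a
    return None

def decide(ballots, candidates, lottery, rng=random.random):
    # Condorcet check: a clear one-on-one winner always wins outright.
    winner = condorcet_winner(ballots, candidates)
    if winner is not None:
        return winner
    # Otherwise roll a random number against the probability distribution
    # (lottery maps candidate -> probability, summing to 1).
    r = rng()
    acc = 0.0
    for cand, prob in lottery.items():
        acc += prob
        if r < acc:
            return cand
    return cand  # guard against floating-point rounding at the boundary
```

With ballots [A>B>C, A>C>B, B>A>C], A beats both rivals head-to-head and wins outright; with a three-way cycle (A>B>C, B>C>A, C>A>B) there is no Condorcet winner, so the lottery decides.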
by Lost Futures
Some technologies couldn’t have been invented much earlier than they were, because they rely on prior discoveries. Others were possible for an extended time before being discovered - hot air balloons are one of these.
The basic principles were operating in Chinese sky lanterns over a thousand years before hot air balloons were first invented. Once someone experimented with creating a balloon in 1782, there was a working version within a year and it quickly proliferated around the world. It's possible the invention was bottlenecked on textile prices or quality, but even accepting that, it would still have arrived tens to hundreds of years after it first became possible.
Intelligent 13-18 year olds who aren’t ambitious enough to start their own projects have those years somewhat wasted by school busywork. Making meaningful work more accessible to them would be good.
Introduction to abstract entropy by Alex_Altair
Notes on "Can you control the past" by So8res
This Week on Twitter
Meta released Universal Speech Translator, the first AI speech-to-speech translation system for languages that are primarily spoken rather than written. (tweet)
Stability AI (who released Stable Diffusion) are delaying release of the 1.5 version in order to focus on security and ensuring it’s not used for illegal purposes - driven by community feedback. (article)
New analysis of the AI export restrictions by CSIS. Also mentions a US$53B commitment the US govt made in early August on semiconductor R&D.
Biden’s latest National Biodefense Strategy calls for the US to produce a test for a new pathogen within 12 hours of its discovery and enough vaccine to protect the nation within 130 days. (tweet) (article)