Supported by Rethink Priorities
This is part of a weekly series summarizing the top posts on the EA and LW forums - you can see the full collection here. The first post includes some details on purpose and methodology. Feedback, thoughts, and corrections are welcomed.
If you'd like to receive these summaries via email, you can subscribe here.
Podcast version: Subscribe on your favorite podcast app by searching for 'EA Forum Podcast (Summaries)'. A big thanks to Coleman Snell for producing these!
Make RCTs cheaper: smaller treatment, bigger control groups
by Rory Fenton
When your intervention is expensive relative to data collection, you can maximize statistical power for a given cost by using a larger control and smaller treatment group. The optimal ratio of treatment sample to control sample is the square root of the cost per control participant divided by the cost per treatment participant - ie. the more expensive treatment is relative to data collection, the larger your control group should be.
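A minimal sketch of that allocation rule; the budget and cost figures below are made up for illustration, not from the post:

```python
import math

def optimal_split(budget: float, cost_treatment: float, cost_control: float):
    """Split a fixed budget between treatment and control so as to minimize
    the variance of the estimated effect: n_t / n_c = sqrt(c_c / c_t)."""
    ratio = math.sqrt(cost_control / cost_treatment)  # treatment : control
    n_control = budget / (cost_treatment * ratio + cost_control)
    n_treatment = ratio * n_control
    return round(n_treatment), round(n_control)

# Example: treatment costs $100 per participant, surveying a control costs $10.
print(optimal_split(budget=100_000, cost_treatment=100, cost_control=10))
# -> roughly (760, 2403): a much smaller treatment group than control group.
```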
What AI companies can do today to help with the most important century
by Holden Karnofsky
Grounded suggestions (ie. ones where more than one AI lab has already made a serious effort) for how major AI companies can help the most important century go well.
Holden is less excited about the following interventions labs sometimes take: censorship of AI models, open-sourcing AI models, and raising awareness of AI with governments and the public.
Sam Altman: "Planning for AGI and beyond"
by LawrenceC
Linkpost for this blog post from OpenAI’s CEO Sam Altman on their AGI roadmap, covering their goal, short-term strategy, and the things they want to see.
AGI in sight: our look at the game board
by Andrea_Miotti, Gabriel Alfour
The authors share their view that AGI has a significant probability of happening in the next 5 years, given progress on agents, multimodal models, language tasks, and robotics in the past few years. However, we are still early on the path to safety eg. not knowing how to get language models to be truthful, not understanding their decisions, optimizers yielding unexpected results, RLHF / fine-tuning not working very well, and not knowing how to predict AI capabilities.
Various players are racing towards AGI, including AdeptAI (training a model to “use every software tool and API in the world”), DeepMind whose mission is to solve intelligence and create AGI, and OpenAI, who kickstarted further race mechanics with ChatGPT.
This all means we’re in a bad scenario, and they recommend readers ask lots of questions and reward openness. They’re also hopeful that narrower sub-problems of alignment can be solved in time, eg. ensuring the boundedness of AI systems.
Pretraining Language Models with Human Preferences
by Tomek Korbak, Sam Bowman, Ethan Perez
Author’s tl;dr: “In the paper, we show how to train LMs (language models) with human preferences (as in reinforcement learning with human feedback), but during LM pretraining. We find that pretraining works much better than the standard practice of only finetuning with human preferences after pretraining; our resulting LMs generate text that is more often in line with human preferences and are more robust to red teaming attacks. Our best method is conditional training, where we learn a predictive model of internet texts conditional on their human preference scores, e.g., evaluated by a predictive model of human preferences. This approach retains the advantages of learning from human preferences, while potentially mitigating risks from training agents with RL by learning a predictive model or simulator.”
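A minimal, illustrative sketch of what conditional training might look like in practice - documents are tagged with a control token based on their preference score before pretraining. The token names, threshold, and toy scoring function here are assumptions for illustration, not the paper's exact setup:

```python
GOOD, BAD = "<|good|>", "<|bad|>"

def preference_score(text: str) -> float:
    """Stand-in for a learned model of human preference scores."""
    return -1.0 if "offensive" in text else 1.0

def tag_document(text: str, threshold: float = 0.0) -> str:
    """Prepend a control token reflecting the document's preference score."""
    token = GOOD if preference_score(text) >= threshold else BAD
    return f"{token} {text}"

corpus = ["a helpful explanation", "an offensive rant"]
tagged = [tag_document(doc) for doc in corpus]
# The LM is pretrained on `tagged`, learning p(text | control token);
# at generation time, conditioning on GOOD steers it towards preferred text.
print(tagged)
```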
What to think when a language model tells you it's sentient
by rgb
Argues that statements by large language models that seem to report their internal life (eg. ‘I feel scared because I don’t know what to do’) aren’t straightforward evidence either for or against the sentience of that model. As an analogy, parrots are probably sentient and very likely feel pain. But when they say ‘I feel pain’, that doesn’t mean they are in pain.
It might be possible to train systems to more accurately report if they are sentient, via removing any other incentives for saying conscious-sounding things, and training them to report their own mental states. However, this could advance dangerous capabilities like situational awareness, and training on self-reflection might also be what ends up making a system sentient.
by Zvi
Long post gathering examples of Bing Chat’s behavior, general public reactions (eg. in the news), and reactions within the AI Safety community. It’s written for accessibility to those not previously familiar with LessWrong or its concepts.
There are (probably) no superhuman Go AIs: strong human players beat the strongest AIs
by Taran
In March 2016, DeepMind’s AlphaGo beat the plausibly strongest player in the world 4 to 1. Since then, this work has been extended, eg. in KataGo - now a top Go bot.
Last November, Wang et al. adversarially trained a bot to beat KataGo, which it does by playing moves that cause KataGo to make obvious blunders from strong positions. Human novices can beat this adversarial bot (so it’s not strong at Go overall), yet it beats KataGo 72% of the time, and some strong human players have copied its techniques to also beat KataGo.
This suggests that despite Go having quite simple core concepts (liberties, live groups, dead groups), strong Go bots have reached performance sufficient to beat the best human players without robustly learning those concepts.
AI alignment researchers don't (seem to) stack
by So8res
The author argues that new skilled visionaries in alignment research tend to push in different directions than existing ones. They suspect this is because the level of vision required to progress the field demands a strong intuition to follow, there’s a lot of space of possible intuitions, and visionaries can’t easily be redirected. This means adding more skilled talent doesn’t speed up any single alignment path by more than a modest factor (say 2x, eg. from less visionary researchers and ops support).
Stop posting prompt injections on Twitter and calling it "misalignment"
by lc
Argues that exploits of large language models (such as getting them to explain steps to build a bomb) are examples of misuse, not misalignment. “Does not do things its creators dislike even when the user wants it to” is too high a bar for alignment, eg. higher than we ask of kitchenware.
Parametrically retargetable decision-makers tend to seek power
by TurnTrout
Linkpost for this paper, which isolates the key mechanism (retargetability) which enables the results in another paper: Optimal Policies Tend to Seek Power. The author thinks it’s a better paper for communicating concerns about power-seeking to the broader ML world.
Abstract (truncated): If capable AI agents are generally incentivized to seek power in service of the objectives we specify for them, then these systems will pose enormous risks, in addition to enormous benefits. [...] We show that a range of qualitatively dissimilar decision-making procedures incentivize agents to seek power. We demonstrate the flexibility of our results by reasoning about learned policy incentives in Montezuma's Revenge. These results suggest a safety risk: Eventually, retargetable training procedures may train real-world agents which seek power over humans.
What is it like doing AI safety work?
by Kat Woods, peterbarnett
The authors interviewed ten AI safety researchers on their day-to-day experience, and favorite and least favorite parts of the job.
Full Transcript: Eliezer Yudkowsky on the Bankless podcast
by remember, Andrea_Miotti
Transcript of this podcast episode.
EU Food Agency Recommends Banning Cages
by Ben_West
The European Commission requested scientific opinions / recommendations on animal welfare from the European Food Safety Authority (EFSA), ahead of a legislative proposal in the second half of 2023. EFSA’s published recommendations include cage-free housing for birds, avoiding all mutilation and avoiding feed and water restrictions in broiler breeders, and substantially reducing stocking density. This result is partially due to the efforts of EA-affiliated animal welfare organisations.
Announcing the Launch of the Insect Institute
by Dustin Crummett
The Insect Institute is a fiscally-sponsored project of Rethink Priorities which focuses on the rapidly growing use of insects as food and feed. It will work with policymakers, industry, and others to address key uncertainties involving animal welfare, public health, and environmental sustainability. You can sign up for their email list via the option at the bottom of their homepage.
by jefftk
A recent Faunalytics report made the claim that “by some estimates, a Big Mac would cost $13 without subsidies and a pound of ground meat would cost $30”. The author found the claim implausible and thinks it possible the claim (since repeated in many articles) originated in the 2013 book Meatonomics - which, in addition to subsidies, included cruelty, environmental, and health costs in its calculation of the true cost of a Big Mac. That calculation also likely over-estimated Big Macs as a proportion of beef consumption, making the statistic unreliable.
Why I don’t agree with HLI’s estimate of household spillovers from therapy
by JamesSnowden
In its cost-effectiveness estimate of StrongMinds, Happier Lives Institute (HLI) estimates that each household member (~5 in the average household) benefits from the intervention 50% as much as the person receiving therapy. This is partially based on 3 RCTs - 2 of which had interventions specifically targeted to benefit household members (eg. therapy for caregivers of children with nodding syndrome, which included nodding syndrome-specific content) and where only those expected to benefit most were measured. The third was incorrectly read as showing benefits to household members, when the evidence was actually mixed depending on the measure used.
The author argues this means household benefits were significantly overestimated, and speculatively guesses them to be more in the 5 - 25% range. This would reduce the estimated cost-effectiveness of StrongMinds from 9x to 3-6x cash transfers. In the comments HLI has thanked James for this analysis and acknowledged the points as valid, noted the lack of hard evidence in the area, and shared their plans for further analysis using a recent paper from 2022.
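As a rough illustration of why the spillover assumption matters so much, here is a simplified BOTEC (not HLI's or James's actual model; it assumes a household of 5 and holds the cash-transfer benchmark fixed):

```python
def household_benefit(spillover: float, other_members: int = 4) -> float:
    """Total benefit, in units of the direct recipient's benefit."""
    return 1 + other_members * spillover

baseline = household_benefit(0.50)  # HLI's 50% assumption -> 3.0 units
for s in (0.05, 0.25):
    multiple = 9 * household_benefit(s) / baseline
    print(f"spillover {s:.0%}: ~{multiple:.1f}x cash transfers")
# spillover 5%: ~3.6x; spillover 25%: ~6.0x - roughly the 3-6x range above.
```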
Immigration reform: a shallow cause exploration
by JoelMcGuire, Samuel Dupret
Report of a 2-week investigation on the impact of immigration reform on subjective well-being (SWB), including a literature review and BOTECs on the cost-effectiveness of interventions.
The authors find potentially large impacts to SWB from immigrating to countries with higher SWB levels, but are uncertain about the effect size and how it changes over time. All estimates below are highly uncertain best guesses based on their model:
Of the interventions investigated to increase immigration, the most promising was policy advocacy, estimated at 11x the cost-effectiveness of GiveWell cash transfers on SWB.
by Duncan_Sabien
The author suggests one way to be okay with existential dread is to define yourself as someone who does the best they can with what they have, and to treat that in and of itself as victory. This means even horrible outcomes for our species can’t quite cut to the core of who you are. They elaborate with a range of anecdotes, advice, and examples that illustrate how to capture that feeling.
Introducing EASE, a managed directory of EA Organization Service Providers
by Deena Englander, JaimeRV, Markus Amalthea Magnuson, Eva Feldkamp, Mati_Roy, daniel wernstedt, Georg Wind
EASE (EA Services) is a directory of independent agencies and freelancers offering expertise to EA-aligned organisations. Vendors are screened to ensure they’re true experts in their fields, and have experience with EA. If you’d like to join the directory, you can apply for screening here. If you’d like to use the services, you can contact the agencies listed directly, or email info@ea-services.org for suggestions for your needs and available budget.
EA Global in 2022 and plans for 2023
by Eli_Nathan
In 2022 the EAG team ran three EAGs, with 1.3-1.5K attendees each. These events averaged a 9.02 / 10 response to a question on whether participants would recommend EAGs, and led to at least 36K new connections (likely a significant undercount, as most attendees didn’t fill in this feedback).
In 2023 they plan to reduce spending - primarily on travel grants and food - but still run three EAGs. They now have a team of ~4 FTEs and will focus on launching applications earlier and improving response speed, Swapcard, and communications.
The full list of confirmed and provisional EAG and EAGx events is:
EA Global: Bay Area | 24–26 February
EAGxCambridge | 17–19 March
EAGxNordics | 21–23 April
EA Global: London | 19–21 May
EAGxWarsaw | 9–11 June [provisional]
EAGxNYC | July / August [provisional]
EAGxBerlin | Early September [provisional]
EAGxAustralia | Late September [provisional]
EA Global: Boston | Oct 27–Oct 29
EAGxVirtual | November [provisional]
Taking a leave of absence from Open Philanthropy to work on AI safety
by Holden Karnofsky
Holden Karnofsky (co-CEO of Open Philanthropy) is taking a minimum 3-month leave of absence from March 8th to explore working directly on AI safety, particularly AI safety standards. They may end up doing this full-time and joining or starting a new organization. This is because they believe transformative AI could be developed soon and that they can have more impact working on it directly, and because their personal fit leans towards building multiple organisations rather than running one indefinitely.
A statement and an apology by Owen Cotton-Barratt and EV UK board statement on Owen's resignation by EV UK Board
The recent Time article on EA and sexual harassment included a case involving an ‘influential figure in EA’. In this post, Owen Cotton-Barratt confirms that this was him, during an event five years ago. He apologizes and gives full context of what happened from his view, what generalizable mistakes he made that contributed, and what actions he’s taking going forward. This includes resigning from the EV UK board and pausing other activities which may give him power in the community (eg. starting mentoring relationships, organizing events, or recommending projects for funding).
Owen’s behavior was reported to Julia Wise (CEA’s community liaison) in 2021, who shared it with the EV UK board shortly after the Time article came out. In the comments, Julia has also apologized for the handling of the situation and shared the actions taken when the incident was first reported to her, as well as in the time since. The EV UK board is commissioning an external investigation by an independent law firm into both Owen’s behavior and the Community Health team’s response.
Bad Actors are not the Main Issue in EA Governance
by Grayden
Leadership can fail in 4 ways: bad actors, well-intentioned people with low competence, well-intentioned high-competence people with collective blind spots, or the right group of people with bad practices.
The author argues EA focuses too much on the ‘bad actors’ angle, and this incentivizes boards to hire friends or those they know socially to reduce this risk. They suggest we stop this behavior and instead focus on tackling the other three failure modes.
Who is Uncomfortable Critiquing Who, Around EA?
by Ozzie Gooen
Discusses the specific barrier to feedback of some things being uncomfortable to say, and how this affects the availability of criticism between different groups in and around EA.
They suggest EA look at specific gaps in feedback, and from which groups - as opposed to asking ‘are we open to feedback?’ more generally.
Consider not sleeping around within the community
by Patrick Sue Domin
The author suggests considering not “sleeping around” (eg. one night stands, friends with benefits, casual dating with multiple people) within the EA community, due to its tight-knit nature increasing the associated risks. For instance, someone who is pursued and declines may still end up having to interact with the pursuer in professional capacities down the road. They suggest this applies particularly to those with any of the following additional risk factors: being high-status within EA, being a man pursuing a woman, or being socially clumsy. There is a large amount of discussion on both sides in the comments.
by Jeff Kaufman
In response to some of the above posts, there’s been a lot of discussion on how much EA culture did or didn’t contribute. Some of the suggestions (eg. discouraging polyamory or hookups) have caused others to argue that what happens between consenting adults is no one else’s business.
The author argues consent isn’t always enough, particularly in cases of imbalanced power (eg. grantee and grantmaker). Organisations handle these conflicts in ways such as requiring a professor to either resign or not pursue a relationship with a student, or requiring a grantmaker to disclose a relationship with a grantee and recuse themselves from evaluating their grants. Such policies are pretty uncontroversial, which shows the real question is what norms we should have, not whether it’s legitimate to have any norms beyond consent.
On Investigating Conspiracy Theories by Zvi
A selection of posts that don’t meet the karma threshold, but seem important or undervalued.
How can we improve discussions on the Forum?
by Lizka
Fill out this survey with your thoughts on community posts having their own section on the frontpage, what changes to the forum site you’d like to see, what conversations you’d like to see, or any other feedback.
Updates from the Mental Health Funder's Circle
by wtroy
The Mental Health Funder’s Circle supports organisations working on cost-effective and catalytic mental health interventions. It held its first grant round in the Fall/Winter of 2022, and has now distributed $254K total to three organisations (Vida Plena for community mental health in Ecuador, Happier Lives Institute for work on subjective well-being and cause prioritization, and Rethink Wellbeing to support mental health initiatives in the EA community). The next round of funding is now open - apply here by April 1st.
Cyborg Periods: There will be multiple AI transitions
by Jan_Kulveit, rosehadshar
Suggests domains may move through three stages. Current day examples in brackets:
1. Human period - humans more powerful than AI (alignment research, business strategy).
2. Cyborg period - human + AI more powerful than humans or AIs individually (visual art, programming).
3. AI period - AIs more powerful than humans, and approx. equal to human+AI teams (chess, shogi).
Transitions into the cyborg period will be incredibly impactful in some domains, eg. research, human coordination, persuasion, and cultural evolution. Which domains transition first also shapes the threat models and the possible human responses. For instance, moving faster on automating coordination relative to automating power, or on automating AI alignment research relative to AI research in general, could both reduce risk.
They also argue cyborg periods may be brief but pivotal, involving key deployment decisions and existential risk minimization work. To make the best use of them, we’ll need sufficient understanding of AI systems’ strengths and weaknesses, novel ways of factoring cognition, AI systems modified towards cyborg uses, and practice working in human+AI teams in existing cyborg domains.