aog

New funding opportunity on digital minds from Longview

This is a linkpost for Request for Proposals: Research and Applied Work on Digital Minds. The text below was written in the first person by Zach Freitas-Groff, Longview's lead on digital minds. He posted it on the EA Forum, and I'm reposting it for him on LessWrong. I'm glad to...

Jul 6 · 14

Digital sentience funding opportunities: Support for applied work and research

Written by Zach Freitas-Groff and posted at his request. Summary: I’m excited to announce a “Digital Sentience Consortium” hosted by Longview Philanthropy, in collaboration with The Navigation Fund and Macroscopic Ventures, to support research and applied projects focused on the potential consciousness, sentience, moral status, and experiences of artificial intelligence...

May 29, 2025 · 21

Research Priorities for Hardware-Enabled Mechanisms (HEMs)

Longview Philanthropy is launching a new request for proposals on hardware-enabled mechanisms (HEMs). To encourage strong proposals, we’ve written up what we believe to be top priorities for research in this field. We think HEMs are a promising method to enforce export controls, secure model weights, and verify compliance with...

Apr 30, 2025 · 18

aog's Shortform

Apr 19, 2025 · 6

Benchmarking LLM Agents on Kaggle Competitions

tl;dr: I prompted ChatGPT to participate in a Kaggle data science competition. It successfully wrote scripts that trained models to predict housing prices, and ultimately outperformed 71% of human participants. I'm not planning to build a benchmark using Kaggle competitions, but I think a well-executed version could be comparable to...

Mar 22, 2024 · 15

Adversarial Robustness Could Help Prevent Catastrophic Misuse

There have been several discussions about the importance of adversarial robustness for scalable oversight. I’d like to point out that adversarial robustness is also important under a different threat model: catastrophic misuse. For a brief summary of the argument: 1. Misuse could lead to catastrophe. AI-assisted cyberattacks, political persuasion, and...

Dec 11, 2023 · 30

Unsupervised Methods for Concept Discovery in AlphaZero

Using contrast pairs, the authors extract linear directions in the activation space of AlphaZero which correspond to concepts. By observing AlphaZero's play in situations that use these concepts, human grandmasters can improve their own play. This is related to the following recent research: * Burns et al. (2022) found directions...

Oct 26, 2023 · 9

Analysis: US restricts GPU sales to China

Argument against 20% GDP growth from AI within 10 years [Linkpost]

Key Papers in Language Model Safety
