Wiki Contributions


Magic.dev has released an initial evaluation + scaling policy.

It's a bit sparse on details, but it's also essentially a pre-commitment to implement a full RSP once they reach a critical threshold (50% on LiveCodeBench or, alternatively, a "set of private benchmarks" that they use internally).

I think this is a good step forward, and more small labs making high-risk systems like coding agents should have risk evaluation policies in place. 

Also wanted to signal boost that my org, The Midas Project, is running a public awareness campaign against Cognition (another startup making coding agents) asking for a policy along these lines. Please sign the petition if you think this is useful!

Thank you for sharing — I basically share your concerns about OpenAI, and it's good to talk about it openly.

I'd be really excited about a large, coordinated, time-bound boycott of OpenAI products that is (1) led by a well-known organization or individual with a recruitment and media outreach strategy and (2) accompanied by a set of specific grievances like the one you provide. 

I think that something like this would (1) mitigate some of the costs that @Zach Stein-Perlman alludes to since it's time-bound (say only for a month), and (2) retain the majority of the benefits, since I think the signaling value of any boycott (temporary or permanent) will far, far exceed the material value in ever-so-slightly withholding revenue from OpenAI. 

I don't mean to imply that this is opposed to what you're doing — your personal boycott basically makes sense to me, plus I suspect (but correct me if I'm wrong) that you'd also be equally excited about what I describe above. I just wanted to voice this in case others feel similarly or someone who would be well-suited to organizing such a boycott reads this.

Sorry, I'm kinda lumping your meme in with a more general line of criticism I've seen that casts doubt on the whole idea of extrapolating an exponential trend, on account of the fact that we should eventually expect diminishing returns. But such extrapolation can still be quite informative, especially in the short term! If you had done it in 2020 to make guesses about where we'd end up in 2024, it would have served you well.

The sense in which it's straw-manning, in my mind, is that even the trend-extrapolaters admit that we can expect diminishing returns eventually. The question is where exactly we are on the curve, and that much is very uncertain. Given this, assuming the past rate of growth will hold constant for three more years isn't such a bad strategy (especially if you can tell plausible stories about how we'll e.g. get enough power to support the next 3 OOMs).

It's true — we might run into issues with data — but we also might solve those issues. And I don't think this is just mumbling and hand waving. My best guess is that there are real, promising efforts going on to solve this within labs, but we shouldn't expect to hear them in convincing detail because they will be fiercely guarded as a point of comparative advantage over other companies hitting the wall, hence why it just feels like mumbling from the outside.

I'm not sure what you want me to see by looking at your image again. Is it about the position of 2027 on the X-axis of the graph from Leopold's essay? If so, yeah, I assumed that placement wasn't rigorous or intentional and the goal of the meme was to suggest that extrapolating the trend in a linear fashion through 2027 was naive. 


Made with love — I just think a lot of critics of the "straight line go up" thought process are straw manning the argument. The question is really when we start to hit diminishing returns, and I (like the author of this post) don't know if anyone has a good answer to that. I do think the data wall would be a likely cause if it is the case that progress slows in the coming years. But progress continuing in a roughly linear fashion between now and 2027 seems, to me, totally "strikingly plauisble."

Yeah, I think you're kind of right about why scaling seems like a relevant term here. I really like that RSPs are explicit about different tiers of models posing different tiers of risks. I think larger models are just likely to be more dangerous, and dangerous in new and different ways, than the models we have today. And that the safety mitigations that apply to them need to be more rigorous than what we have today. As an example, this framework naturally captures the distinction between "open-sourcing is great today" and "open-sourcing might be very dangerous tomorrow," which is roughly something I believe.

But in the end, I don't actually care what the name is, I just care that there is a specific name for this relatively specific framework to distinguish it from all the other possibilities in the space of voluntary policies. That includes newer and better policies — i.e. even if you are skeptical of the value of RSPs, I think you should be in favor of a specific name for it so you can distinguish it from other, future voluntary safety policies that you are more supportive of.

I do dislike that "responsible" might come off as implying that these policies are sufficient, or that scaling is now safe. I could see "risk-informed" having the same issue, which is why "iterated/tiered scaling policy" seems a bit better to me.

My only concern with "voluntary safety commitments" is that it seems to encompass too much, when the RSPs in question here are a pretty specific framework with unique strengths I wouldn't want overlooked.

I've been using "iterated scaling policy," but I don't think that's perfect. Maybe "evaluation-based scaling policy"? or "tiered scaling policy"? Maybe even "risk-informed scaling policy"?

Sidebar: For what it's worth, I don't argue in my comment that "it's not worth worrying" about nuance. I argue that nuance isn't more important for public advocacy than, for example, in alignment research or policy negotiations — and that the opposite might be true.

Just came across an article from agricultural economist Jayson Lusk, who proposes something just like this. A few quotes:

"Succinctly put, a market for animal welfare would consist of giving farmers property rights over an output called animal well-being units (AWBUs) and providing an institutional structure or market for AWBUs to be bought and sold independent of the market for meat."


"Moreover, a benefit of a market for AWBUs, as compared to process regulations, is that an AWBUs approach provides incentives for producers to improve animal well-being at the lowest possible cost by focusing producer and consumer attention on an outcome (animal well-being) rather than a particular process or technology (i.e., cages, stalls, etc.). Those issues for which animal activists can garner the largest political support (e.g., elimination of cages) may not be those that can provide the largest change in animal well-being per dollar spent. By focusing on outcomes and letting producers worry about how to achieve the outcome, markets could enable innovation in livestock production methods and encourage producers to seek cost-effective means of improving animal well-being."


"Animal rights activists often decry the evils of the capitalist, market-based economy. Greedy corporate farms and agribusinesses have enslaved and mistreated billions of animals just to earn a few more dollars—or so the story goes. Market forces are powerful and no doubt the economic incentives that farmers have faced in the last 50 years have contributed to a reduction in animal well-being. But markets are only a means, not an end. Rather than demonizing the market, animal advocates could harness its power to achieve a worthwhile end."

I have major concerns with this class of ideas, but just wanted to share Lusk's proposal in case anyone looks into it further.