Zach Stein-Perlman

AI forecasting & strategy at AI Impacts. Blog: Not Optional.

My best LW posts are Framing AI strategy and Slowing AI: Foundations. My most useful LW posts are probably AI policy ideas: Reading list and Ideas for AI labs: Reading list.

Sequences

Slowing AI

Wiki Contributions


Comments

I am excited about this. I've also recently been interested in ideas like nudging researchers to write 1–5-page research agendas, then collecting them and advertising the collection.

Possible formats:

  • A huge google doc (maybe based on this post); anyone can comment; there are one or more maintainers; maintainers approve ~all suggestions from researchers about their own research topics and consider suggestions from random people.
  • A directory of google docs on particular agendas; each individual google doc is owned by a relevant researcher, who is responsible for maintaining it; some maintainer-of-the-whole-project occasionally nudges researchers to update their docs and reassigns a topic to someone else if necessary. Random people can make suggestions too.
  • (Alex, I think we can do much better than the best textbooks format in terms of organization, readability, and keeping up to date.)

I am interested in helping make something like this happen. Or if it doesn't happen soon I might try to do it (but I'm not taking responsibility for making this happen). Very interested in suggestions.

(One particular kind-of-suggestion: is there a taxonomy/tree of alignment research directions you like, other than the one in this post? (Note to self: taxonomies have to focus on either methodology or theory of change... probably organize by theory of change and don't hesitate to point to the same directions/methodologies/artifacts in multiple places.))

It's "a unified methodology" but I claim it has two very different uses: (1) determining whether a model is safe (in general or within particular scaffolding) and (2) directly making deployment safer. Or (1) model evals and (2) inference-time safety techniques.

Thanks!

I think there's another agenda: make untrusted models safe but useful by putting them in scaffolding/bureaucracy (filters, classifiers, LMs, humans, etc.) such that at inference time, takeover attempts are less likely to succeed and more likely to be caught; see the sketch later in this comment. See Untrusted smart models and trusted dumb models (Shlegeris 2023). Other relevant work:
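As a concrete illustration of the kind of scaffolding meant here: below is a minimal Python sketch in which an untrusted model generates an answer, a trusted (weaker) monitor scores it for suspicion, and the pipeline decides whether to use, audit, or block the output. All names and thresholds (`untrusted_model.generate`, `trusted_monitor.score_suspicion`, etc.) are illustrative assumptions, not anyone's actual implementation.

```python
# Illustrative trusted-monitoring scaffold; interfaces and thresholds are
# hypothetical, not any lab's real code.
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Decision:
    action: str       # "allow", "escalate", or "block"
    suspicion: float  # monitor's suspicion score in [0, 1]


def run_with_trusted_monitoring(
    task: str,
    untrusted_model,            # capable model we don't fully trust
    trusted_monitor,            # weaker model we do trust, used as a classifier
    escalate_threshold: float = 0.3,
    block_threshold: float = 0.8,
) -> Tuple[Optional[str], Decision]:
    """Generate with the untrusted model, then gate its output on a
    suspicion score from the trusted monitor before the output is used."""
    candidate = untrusted_model.generate(task)
    suspicion = trusted_monitor.score_suspicion(task, candidate)

    if suspicion >= block_threshold:
        # Looks like a possible takeover/sabotage attempt: discard and flag.
        return None, Decision("block", suspicion)
    if suspicion >= escalate_threshold:
        # Ambiguous: hold the output for human (or more expensive) audit.
        return candidate, Decision("escalate", suspicion)
    return candidate, Decision("allow", suspicion)
```

Most of the substance of the agenda is in what the monitor is, how the thresholds are set given a limited auditing budget, and what "escalate" routes to; the sketch is just the skeleton.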

Update: Greg Brockman quit.

Update: Sam and Greg say:

Sam and I are shocked and saddened by what the board did today.

Let us first say thank you to all the incredible people who we have worked with at OpenAI, our customers, our investors, and all of those who have been reaching out.

We too are still trying to figure out exactly what happened. Here is what we know:

- Last night, Sam got a text from Ilya asking to talk at noon Friday. Sam joined a Google Meet and the whole board, except Greg, was there. Ilya told Sam he was being fired and that the news was going out very soon.

- At 12:19pm, Greg got a text from Ilya asking for a quick call. At 12:23pm, Ilya sent a Google Meet link. Greg was told that he was being removed from the board (but was vital to the company and would retain his role) and that Sam had been fired. Around the same time, OpenAI published a blog post.

- As far as we know, the management team was made aware of this shortly after, other than Mira who found out the night prior.

The outpouring of support has been really nice; thank you, but please don’t spend any time being concerned. We will be fine. Greater things coming soon.

Update: three more resignations, including Jakub Pachocki's.

Update:

Sam Altman's firing as OpenAI CEO was not the result of "malfeasance or anything related to our financial, business, safety, or security/privacy practices" but rather a "breakdown in communications between Sam Altman and the board," per an internal memo from chief operating officer Brad Lightcap seen by Axios.

Update: Sam is planning to launch something (no details yet).

Update: Sam may return as OpenAI CEO.

Update: Tigris.

Update: talks with Sam and the board.

Update: Mira wants to hire Sam and Greg in some capacity; board still looking for a permanent CEO.

Update: Emmett Shear is interim CEO; Sam won't return.

Update: lots more resignations (according to an insider).

Update: Sam and Greg leading a new lab in Microsoft.

Update: total chaos.

Has anyone collected their public statements on various AI x-risk topics anywhere?

A bit, not shareable.

Helen is an AI safety person. Tasha is on the Effective Ventures board. Ilya leads superalignment. Adam signed the CAIS statement.

Thanks!

automating the world economy will take longer

I'm curious what fraction-of-2023-tasks-automatable and maybe fraction-of-world-economy-automated you think will occur at e.g. overpower time, and the median year for that. (I sometimes notice people assuming 99%-automatability occurs before all the humans are dead, without realizing they're assuming anything.)

@Daniel Kokotajlo it looks like you expect 1000x-energy 4 years after 99%-automation. I thought we get fast takeoff, all humans die, and 99% automation at around the same time (but probably in that order) and then get massive improvements in technology and massive increases in energy use soon thereafter. What takes 4 years?

(I don't think the part after fast takeoff or all humans dying is decision-relevant, but maybe resolving my confusion about this part of your model would help illuminate other confusions too.)

So why doesn't one of those thousand people run for president and win? (This is a rhetorical question; I know the answer.)

The answer is that there's a coordination problem.

It occurs to me that maybe these things are related. Maybe in a world of monarchies where the dynasty of so-and-so has ruled for generations, supporting someone with zero royal blood is like supporting a third-party candidate in the USA.

Wait, what is it that gave monarchic dynasties momentum, in your view?

In the future, sharing weights will enable misuse. For now, the main effect of sharing weights is boosting research, both capabilities and safety (e.g., the Llama releases definitely did this). The sign of that research-boosting currently seems negative to me, but there's lots of reasonable disagreement.

fwiw my guess is that OP didn't ask its grantees to do open-source LLM biorisk work at all; I think its research grantees generally have lots of freedom.

(I've worked for an OP-funded research org for 1.5 years. I don't think I've ever heard of OP asking us to work on anything specific, nor of us working on something because we thought OP would like it. Sometimes we receive restricted, project-specific grants, but I think those projects were initiated by us. Oh, one exception: Holden's standards-case-studies project.)
