If it's not too costly, it may still be worth filling out the application: we are considering doing one-off mentorships at custom times for exceptional applicants (although, of course, no promises!)
Tracking your attitudes here is pretty important to me, because I respect you a lot and also work for MIRI. Still, it's been kind of hard, because sometimes it looks like you're pleasantly surprised (e.g., about the first two sections of IABIED: "After reading the book, it feels like a shocking oversight that no one wrote it earlier" and "it's hard for me to imagine someone else writing a much better [book for general audiences on x-risk]"), and then other times it looks like that pleasant surprise hasn't propagated through your attitudes more broadly.
"The main thing Eliezer and MIRI have been doing since shifting focus to comms addressed a 'shocking oversight' that it's hard to imagine anyone else doing a better job addressing" (lmk if this doesn't feel like an accurate paraphrase) feels like it reflects a pretty strong positive update in the speaker! (especially having chatted about your views before that)
I guess I was just surprised / confused by the paragraph that starts "I personally would not...", given the trajectory over the past few months of your impressions of MIRI's recent work. Would you have said something much more strongly negative in August? Does IABIED not significantly inform your expectations of future MIRI outputs? Something else?
Do you think rationalists use 'insane' and 'crazy' more than the general population, and/or in a different way than the general population? (e.g. definition 3 when you google 'insane definition')
The answer is somewhat complicated, and I'm not sure 'know' is quite the right bar.
Contractor verification is a properly hard problem for boring bureaucratic reasons: it's very hard to know that someone is who they say they are, and it's very hard to guarantee at scale that you'll extract the value you're asking for ('scalable oversight' is actually a good model for intuitions here). I have:
1. Been part of surveys for services like the above
2. Been a low-level contractor at various mid-sized startups (incl. OAI in 2020)
3. Managed a team of hundreds of contractors doing tens of thousands of tasks per month (it was really just me and one other person watching them)
4. Thought quite a lot about designing better systems for this (very hard!!!)
5. Noted the lack of especially-convincing client-facing documentation / transparency from e.g. Prolific
The kinds of guarantees I would want here are like "We ourselves verify the identities of contractors to make sure they're who they say they are. We ourselves include comprehension-testing questions that are formulated to be difficult to cheat alongside every exit survey. etc etc"
Most services they might pay to do things like this are Bad (they're B2B and mostly exist to provide a certification/assurance to the end user, so the services themselves aren't strongly incentivized to actually be good).
Feel free to ask more questions; it's kind of late and I'm tired; this is the quick-babble version.
EDIT: they're not useless. They're just worse than we all wish they'd be. To the best of my knowledge, this was a major motivator for Palisade in putting together their own message testing pipeline (an experience which hasn't been written about yet because uh... I haven't gotten to it)
Fwiw I don’t take Prolific and similar services to be especially reliable ways to get information about this sort of thing. It’s true that they’re among the best low-to-medium-effort ways to get this information, but the hypothetical at the top of this post implies that they’re 1:1 with natural settings, which is false.
Thanks as always to Zac for continuing to engage on things like this.
Tiny nit for my employer: should probably read “including some* MIRI employees”
Like any org, MIRI is made up of people who have significant disagreements with one another on a wide variety of important matters.
More than once I’ve had it repeated to me that ‘MIRI endorses y’, and tracked the root of the claim to a lack of this kind of qualifier. I know you mean the soft version and don’t take you to be over-claiming; unfortunately, experience has shown it’s worth clarifying, even though for most claims in most contexts I’d take your framing to be sufficiently clear.
I'm struck by how many of your cruxes seem like things that it would actually just be in the hands of the international governing body to control. My guess is that if DARPA has a team of safety researchers, and they go to the international body and say 'we're blocked by this set of experiments* that takes a large amount of compute; can we please have more compute?', and the international body gets some panel of independent researchers to confirm that this is true and that the only solution is more compute for that particular group of researchers, then it commissions a datacenter or something so that the research can continue.
Like, it seems obviously true to me that people (especially in government/military) will continue working on the problem at all, and that access to larger amounts of resources for doing that work is a matter of petitioning the body. It feels like your plan is built around facilitating this kind of carveout, and the MIRI plan is built around treating it as the exception that it is (and prioritizing gaining some centralized control over AI as a field over guaranteeing to-me-implausible rapid progress toward the best possible outcomes).
*which maybe is 'building automated alignment researchers', but better specified and less terrifying
AIs with reliable 1-month time horizons will basically not be time-horizon-limited in any way that humans aren't
In this statement, are you thinking about time horizons as operationalized / investigated in the METR paper, or are you thinking about the True Time Horizon?
A group of researchers has released the Longitudinal Expert AI Panel, soliciting and collating forecasts regarding AI progress, adoption, and regulation from a large pool of both experts and non-experts.
Or if they want to work from a frame that isn't really supported by other orgs (i.e., they're closer to Eliezer's views than to the views/filters enforced at AIFP, RAND, Redwood, and other alternatives). I think people at MIRI think halt/off-switch is a good idea, and want to work on it. Many (but not all) of us think it's Our Best Hope, and would be pretty dissatisfied working on something else.
I agree that AI2027's visible impact so far is greater than IABIED's, but I'm more optimistic than you about IABIED's impact into the future (both because I like IABIED more than you do, and because I'm keeping an eye on ongoing sales, readership, assignment in universities, etc.).
Consider leaving a comment on your review about this if you have the time and inclination in the future; I'm at least curious, and others may be, too.
(probably I bow out now; thanks Buck!)