Philosophy PhD student, worked at AI Impacts, then Center on Long-Term Risk, then OpenAI. Quit OpenAI due to losing confidence that it would behave responsibly around the time of AGI. Not sure what I'll do next yet. Views are my own & do not represent those of my current or former employer(s). I subscribe to Crocker's Rules and am especially interested to hear unsolicited constructive criticism. http://sl4.org/crocker.html
Some of my favorite memes:
(by Rob Wiblin)
My EA Journey, depicted on the whiteboard at CLR:
(h/t Scott Alexander)
Did Shane leave a way for people who didn't attend the talk, such as myself, to contact him? Do you think he would welcome such a thing?
Thanks for sharing!
So, it seems like he is proposing something like AutoGPT except with a carefully thought-through prompt laying out what goals the system should pursue and constraints the system should obey. Probably he thinks it shouldn't just use a base model but should include instruction-tuning at least, and maybe fancier training methods as well.
This is what labs are already doing? This is the standard story for what kind of system will be built, right? Probably 80%+ of the alignment literature is responding at least in part to this proposal. Some problems:
I think most people pushing for a pause are trying to push against a 'selective pause' and for an actual pause that would apply to the big labs who are at the forefront of progress. I agree with you, however, that the current overton window seems unfortunately centered around some combination of evals-and-mitigations that is at IMO high risk of regulatory capture (i.e. resulting in a selective pause that doesn't apply to the big corporations that most need to pause!) My disillusionment about this is part of why I left OpenAI.
Good point, I guess I was thinking in that case about people who care a bunch about a smaller group of humans e.g. their family and friends.
I agree that 0.7% is the number to beat for people who mostly focus on helping present humans and who don't take acausal or simulation argument stuff or cryonics seriously. I think that even if I was much more optimistic about AI alignment, I'd still think that number would be fairly plausibly beaten by a 1-year pause that begins right around the time of AGI.
What are the mechanisms people have given and why are you skeptical of them?
Perhaps. I don't know much about the yields and so forth at the time, nor about the specific plans if any that were made for nuclear combat.
But I'd speculate that dozens of kiloton range fission bombs would have enabled the US and allies to win a war against the USSR. Perhaps by destroying dozens of cities, perhaps by preventing concentrations of defensive force sufficient to stop an armored thrust.
OK, thanks for clarifying.
Personally I think a 1-year pause right around the time of AGI would give us something like 50% of the benefits of a 10-year pause. That's just an initial guess, not super stable. And quantitatively I think it would improve overall chances of AGI going well by double-digit percentage points at least. Such that it makes sense to do a 1-year pause even for the sake of an elderly relative avoiding death from cancer, not to mention all the younger people alive today.
Big +1 to that. Part of why I support (some kinds of) AI regulation is that I think they'll reduce the risk of totalitarianism, not increase it.
So, it sounds like you'd be in favor of a 1-year pause or slowdown then, but not a 10-year?
(Also, I object to your side-swipe at longtermism. Longtermism according to wikipedia is "Longtermism is the ethical view that positively influencing the long-term future is a key moral priority of our time." "A key moral priority" doesn't mean "the only thing that has substantial moral value." If you had instead dunked on classic utilitarianism, I would have agreed.)
I think this is also what I was confused about -- TurnTrout says that AIXI is not a shard-theoretic agent because it just has one utility function, but typically we imagine that the utility function itself decomposes into parts e.g. +10 utility for ice cream, +5 for cookies, etc. So the difference must not be about the decomposition into parts, but the possibility of independent activation? but what does that mean? Perhaps it means: The shards aren't always applied, but rather only in some circumstances does the circuitry fire at all, and there are circumstances in which shard A fires without B and vice versa. (Whereas the utility function always adds up cookies and ice cream, even if there are no cookies and ice cream around?) I still feel like I don't understand this.