Gabriel Mukobi

Stanford Effective Altruism organizer, interested in empirical AI alignment work and improving animal welfare.

I don't know much about how CEOs are selected, but I think the idea is rather that the range of human long-term planning ability, even out to the (small) tails of a normal distribution, is pretty narrow in the grand picture of possible long-term planning abilities, so other factors (including stochasticity) can dominate and make the variation among humans with respect to long-term planning seem insignificant.

If this were true, it would mean the statement "individual humans with much greater than average (on the human scale) information-processing capabilities empirically don't seem to have distinct advantages in jobs such as CEOs and leaders" could be true and yet not preclude the statement "agents with much greater than average (on the universal scale) ... could have distinct advantages in those jobs" from being true (sorry if that was confusingly worded).

I agree that the plausibility and economic competitiveness of long-term planning AIs seems uncertain (especially with chaotic systems) and warrants more investigation, so I'm glad you posted this! I also agree that trying to find ways to incentivize AI to pursue myopic goals generally seems good.

I'm somewhat less confident, however, in the claim that long-term planning has diminishing returns beyond human ability. Intuitively, it seems like human understanding of possible long-term returns diminishes past human ability, but it still seems plausible to me that AI systems could surpass our diminishing returns in this regard. And even if this claim is true and AI systems can't get much farther than human ability at long-term planning (or medium-term planning is what performs best as you suggest), I still think that's sufficient for large-scale deception and power-seeking behavior (e.g. many human AI safety researchers have written about plausible ways in which AIs can slowly manipulate society, and their strategic explanations are human-understandable but still seem to be somewhat likely to win).

I'm also skeptical of the claim that "Future humans will have at their disposal the assistance of short-term AIs." While it's true that past ML training has often focused on short-term objectives, I think it's plausible that certain top AI labs could be incentivized to focus on developing long-term planning AIs (such as in this recent Meta AI paper) which could push long-term AI capabilities ahead of short-term AI capabilities.

The Conjecture Internal Infohazard Policy seems like a good start!

As for why such standards aren't well established within the rationalist AI safety community or publicized on LessWrong, I suspect there may be some unfortunate conflict between truth-seeking as a value and censorship in the form of protecting against infohazards.

Wow, this is a cool concept and video, thanks for making it! As a new person to the field, I'd be really excited for you and other AI safety researchers to do more devlog/livestream content of the form "strap a GoPro on me while I do research!"

Mostly, yes, that's right. The exception is Level 7: Original Experiments, which suggests several resources for forming an inside view and coming up with new research directions, but I think many people could get hired as research engineers before doing that stuff (though maybe they do that stuff while working as a research engineer, and that leads them to come up with new, better research directions).

Interesting, that is the level that feels most like it doesn't have a solid place in a linear progression of skills. I wrote "Level 1 kind of happens all the time" to try to reflect this, but I ultimately decided to put it at the start because I feel that for people just starting out it can be a good way to test their fit for AI safety broadly (do they buy the arguments?) and decide whether they want to go down a more theoretical or empirical path. I just added some language to Level 1 to clarify this.

Thanks, yeah, that's a pretty fair sentiment. I've changed the wording to "at least 100-200 hours," but I guess the idea was more to present a very efficient way of learning things that maybe 80/20's some of the material. This does mean there will be more to learn: rather than these being strictly linear progression levels, I imagine someone continuously coming back to AI safety readings and software/ML engineering skills throughout their journey, as it sounds like you have.

Thanks for actually taking the time to organize all the information here; this is and will be very useful!

For OpenAI, you could also link this recent blog post about their approach to alignment research that reinforces the ideas you already gathered. Though maybe that blog post doesn't go into enough detail or engage with those ideas critically and you've already read it and decided to leave it out?

Thanks for posting this! I'm glad to have more concrete and updated examples of how current AI systems can lead to failure, a concept that seems to often be nebulous to people new to AI safety.

One I recently found that I plan to use as a programmer: (uses GPT-3 to turn your natural language statements into Regex syntax).
