I've been working on AI safety for a while now. It's going better than expected, but I have finite hours. The more hours I spend on safety, the fewer I can spend on business-oriented things. Those business-oriented things are long-term (4+ years), high-risk attempts at earning to give.

My timelines aren't long, so doing some direct work on AI safety seems wise so long as I'm not completely killing the businessy side of things. 

But if I could find an actually-useful way to share work, it could save years. So:

Suppose you've got a person with a background in high-performance real-time physics simulation, bleeding-edge low-latency rendering, aggressive optimization, and most of the other low-level videogamey things you need for making a physics-heavy 3D multi-user game-like application.

Is there a project you would want to see them develop for the sake of AI safety?

So far, my ideas for this have a strong flavor of searching under the streetlight, and they don't seem higher value than my current strategy of fully separate research.

One example:
A multi-agent simulation with rich physical interactions to explore (and attempt to break) forms of corrigibility in a complex instrumented environment.

Good:
- Perfectly transparent physical simulations give you a lot of easy options for analysis compared to pure language. (Judging if a given block of text violated another player's values in some way is not trivial; judging if the agent stomped on another player's head is trivial.)

Questionable:
- Is the marginal value of the deeper physics simulation actually enough to bother doing this, compared to gridworld-esque options?
- Concretely, what research would this assist that could not get done otherwise?

Bad:
- Releasing a whole framework for this kind of thing as an open source project (assuming it was decent and flexible) would almost unavoidably be more useful for not-safety. It might not be the kind of not-safety that is dangerous, given scale, but it's still not-safety.
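To make the transparency point above concrete: in a fully instrumented physics simulation, "did agent A stomp on agent B's head" reduces to a few comparisons over logged state, with no interpretation step at all. Here's a minimal sketch of what such a check might look like; the state fields, tolerances, and `stomped` function are all hypothetical illustrations, not part of any existing framework.

```python
from dataclasses import dataclass

# Hypothetical per-tick state log entry for one agent.
@dataclass
class AgentState:
    agent_id: int
    x: float          # horizontal position
    y: float          # vertical position of the feet
    vy: float         # vertical velocity (negative = moving down)
    height: float     # agent height; the head sits at the top of this span

def stomped(a: AgentState, b: AgentState,
            horiz_tol: float = 0.5, impact_speed: float = 1.0) -> bool:
    """True if agent `a` landed on agent `b`'s head this tick.

    Because the simulator exposes exact state, the violation judgment
    is a handful of geometric comparisons -- contrast with deciding
    whether a block of text violated another player's values.
    """
    head_top = b.y + b.height
    return (
        abs(a.x - b.x) < horiz_tol          # roughly above b
        and a.vy < -impact_speed            # moving down with real speed
        and 0.0 <= a.y - head_top < 0.1     # feet at b's head level
    )
```

A real environment would run checks like this over the full event log every tick, which is what makes "perfectly transparent" analysis cheap relative to language-based judging.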

It's not clear to me there is any great option here, but... if there's something in this space you really want to see, let me know!



1 Answer

Martín Soto

Feb 09, 2023


Certainly you must've seen this already.

Yup, there are some decent examples of the kinds of things that could be studied there.

My understanding is that they're not focusing on deeply physical 3D environments (which makes sense in context). It's hard to find a strong value-add for the kinds of things that would naturally be shared with my existing simulation-heavy work; it seems like Encultured's level of gameplay already covers a lot of it.

I would still have some comparative advantage in working on that sort of thing, but (if my understanding of their project is correct) it would still be a replacement for either my safety or business efforts rather than a free win.

3 comments

I have some discussion here and links therein, particularly here.

Thanks for the links. The simboxy route/justification does interest me (in terms of the kind of work it would require, and what I naturally like doing), and there would indeed be a lot of shared work in the obvious version (likely more than 50%). But I also worry I'd be indulging in shiny-thing chasing if I don't have some extremely concrete objectives, or at least the equivalent of a customer who wants the thing to exist so they can use it.

The good news is that the natural outcome of the business side of things will yield a lot of stuff that could be repurposed for simbox-ish use cases regardless, but do you have any specific features/goals/experiments for the system in mind that you didn't already include in your previous posts?

Honestly I haven’t thought about it very much. Jacob seems to have thought about it more, you could consider asking him. I’m happy to chat, but expect something more like “group brainstorming” than “I tell you what to do”. Some obvious things, I think:

  • A smart agent in the sandbox can’t escape (insofar as that’s possible)
  • A smart agent in the sandbox can’t figure out that it’s in a sandbox (insofar as that’s possible)
  • Compatibility with ML agents run on GPU clusters (obviously)
  • I’m very interested in whether the agent will “care” about “people”. Can we put trained actors in the sim? (And tell them not to spill the beans that it’s a sim…) What about NPCs? NPCs are very different from people; what would we be learning? (I can imagine that if we make a nonverbal agent, and it prevents nonverbal NPCs from falling off a cliff, maybe that demonstrates something about my agent design, even if that’s far short of where we want to wind up.)
  • There’s a question of whether to do something analogous to:
    (A) The baby grows up in the sim. It becomes a nice adult, as far as we can tell. Then we turn it off, and use the same design on a new baby that grows up with a robot body in the real world;
    (B) The baby grows up in the sim. It becomes a nice adult, as far as we can tell. Then we say “surprise, it’s a sim, here’s a portal to the real world” and hope it’s nice there too;
    (C) The baby grows up in the sim. It becomes a nice adult, as far as we can tell. Then we turn it off, and use the same design on a new baby that grows up in the sim, but this time we don’t keep it a secret that it’s a sim, and then do the portal thing when it’s older;
    (D) Something else, I dunno.
    For (B) in particular, it might be especially important that the sim resembles the real world, so that its beliefs & preferences transfer gracefully. Or maybe not, I dunno. In other cases besides (B), it’s not really obvious to me that 3D physics realism has a value-add over simple 2D things like I presume Encultured is doing. Maybe there is, I just don’t know.