Here's an accessible blog post that does a pretty good job discussing this topic.

I think this is true of the original version of AlphaStar, but they have since trained a new version on camera inputs and with stronger limitations on APM (22 actions per 5 seconds). (You might still want some kind of noise applied to the inputs, but I think the current state is much closer to human-like playing conditions.) See:

In other words, we should be telling children 'be careful of roads/cars' (including on Halloween), not 'be careful of Halloween'.

I agree with the post, but I will point out that you really do need to emphasize the utility per micromort here. If you keep your utility constant, it is the total risk that matters. Suppose you were going on a long car ride tomorrow (on safer-than-usual roads, though not safe enough to offset the extra driving), and someone points out that you're much more likely to die than usual. Sure, you could reply 'ah yes, but my chance of dying per mile is lower than usual!', but that's not the right reference point if your utility isn't a function of how much you drive.
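
To make the per-mile vs. total distinction concrete, here is a minimal sketch with made-up numbers (every figure below is hypothetical, not real fatality data). It assumes risks are small enough that total risk is roughly per-mile risk times distance, and that the day's utility is the same whether or not you take the long drive.

```python
from dataclasses import dataclass


@dataclass
class Trip:
    miles: float
    micromorts_per_mile: float  # hypothetical per-mile fatality risk, in micromorts
    utility: float  # hypothetical utility of the day, in arbitrary units

    def total_micromorts(self) -> float:
        # Small-risk approximation: total risk ~ per-mile risk * distance.
        return self.miles * self.micromorts_per_mile

    def utility_per_micromort(self) -> float:
        return self.utility / self.total_micromorts()


# Hypothetical numbers: the long trip is on roads 20% safer per mile,
# but covers 25x the distance, for the same utility.
usual_day = Trip(miles=20, micromorts_per_mile=0.010, utility=100)
long_trip = Trip(miles=500, micromorts_per_mile=0.008, utility=100)

print(long_trip.micromorts_per_mile < usual_day.micromorts_per_mile)  # True: safer per mile
print(long_trip.total_micromorts() > usual_day.total_micromorts())  # True: riskier overall
print(long_trip.utility_per_micromort() < usual_day.utility_per_micromort())  # True: worse deal
```

The per-mile rate improves, but with utility held fixed, the total micromorts (and so the utility per micromort) is what decides whether the trip is a good deal.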

All that said, the total number of deaths is only ~double on Halloween? That feels so insane, roads must be SO much safer than usual.

As you kind of say, there are already (at least decently smart/competent) people trying to do (almost) all of these things. For many of these projects, joining current efforts is probably a better allocation than starting your own, and most of the value to be added comes from people in roughly the 99.5th+ percentile (?) for the skills needed. (Sometimes there just aren't enough people working on a problem, or there's value to add if you're willing to do annoying work other people don't want to do, but both of these are rarer in the current funding regime.)

Something I'd add to this list (or at least to the bottom?), which I've heard a couple of people mention would be useful, is a nonprofit (regranting-like?) org whose primary goal is to hire international independent researchers in the Berkeley area and provide them with visas.

Note that your prediction isn't interesting: each year, conditioned on doomsday not happening, it would be pretty weird for the date(s) not to have moved forward.
Do you instead mean that you expect the date to move forward by more than a year each year, or something like that?

Here are some objections I have to your post:
How are you going to specify the amount of optimization pressure the AI exerts on answering a question/solving a problem? Are you hoping to start by training a weaker AI that you later augment?
If so, I'd be concerned about any distributional shifts in its optimization process that occur during that transition.
If not, it's not clear to me how you make the AI 'be safe' through this training process.

At the point where you, the human, are labeling data to train the AI to identify concepts with measurements/features, you now have a loss function that depends on human feedback and that, once again, you can't specify in terms of the concepts you want the AI to identify. It seems like the AI is pretty incentivized to be deceptive here (or really at any point in the process).
I.e., if it's superintelligent and you accidentally gave it the loss function 'maximize paperclips', but it models humans as potentially not realizing they gave it this loss function, then I think it would act indistinguishably from an AI with the loss function you intended (at least during the stage of training you outline).

Even if, say, it does initially do things that look like what a paperclip maximizer would try to do, instead of what you actually want it to do (label things appropriately), such as trying to get a human user to upload it to the internet, and your safeguards are strong enough to prevent things like this, then I think that as you train away actions like these, you're not just training it to have a better utility function; you're training it to be more effectively deceptive.

I think the miscommunication between you and Adele mostly comes down to under-specifying which features you want your test-AGI to have.

  • If you throttle its ability to optimize for its goals, see EY's and Adele's arguments.

  • If you don't throttle it in this way, you run into goal-specification/constraint-specification issues, instrumental convergence concerns, and everything that goes along with them.

I think most people here will strongly feel that a (computationally) powerful AGI with any incentives is scary, and that any test version should use at most a much less powerful one.

Sorry if I've misunderstood you at all. If you specify the nature/goals/constraints etc. of your test-AI more precisely, maybe I or someone else can give you more specific failure modes.