In my post on incentive structures I gave a potted summary of how to do incentive structures better when what you are trying to achieve is ill defined:

Improve Models using Measures, use the Model to update Targets.

I would add,

Try to hit Targets. Avoid Tests.

In this post I will recap what I mean by the above, then give an abridged history of the field of AI, showing how it has failed to follow this maxim well and what following it might have looked like.

If we want to create safe AI, we need good incentive structures for the people researching it; we need to get good at this.

Recap

A Model is your causal view of the thing you are trying to achieve. A Measure is something you can apply to your system to let you know whether you are going in the right direction. Targets are things you are trying to do in the world. A Test, in this formalism, is something that is a Measure and a Target all in one: you don't need a Model, because if you do well at the Test, you are doing well.

One of the key differences between a Measure and a Test: if you do better than your Model predicts on a Measure, you should change your Model (and maybe change your Target). If you do better than you expect on a Test, there is no need to change anything.

Measure 1: The Turing test

The most famous Measure in AI is the Turing test.

[it] is a test of a machine's ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human. Turing proposed that a human evaluator would judge natural language conversations between a human and a machine designed to generate human-like responses. The evaluator would be aware that one of the two partners in conversation is a machine, and all participants would be separated from one another.

From this we got the Loebner prize. This in turn has led to a profusion of chatbots like this one. It might entertain the judges and give them a moment's pause as they try to figure out whether it is a human or not, but you can't get much real work out of it. It can't do maths, run a business, or teach kids. It is a product of treating the Measure as a Test.

However, it is still a good Measure: if something really passed it, we would expect it to be intelligent.

Measure 2: ImageNet

This is not supposed to be a test of full intelligence, but of how well an algorithm detects objects in images. While it is producing algorithms that do better and better at this all the time, it seems unlikely that they are getting close to how humans do object recognition.

For example, the algorithms aren't being designed to take hints about what is in an image to help them locate the object, because that is not what they are being tested on. Why might you want to do so? Because there are lots of good examples of humans being able to process these hints, and it seems like a useful thing to do. You can experience the power of "hint taking" first hand if you read this slatestarcodex article.

If your Measure and your Target are static, you are just trying to make a slightly better algorithm that does better at these Tests. You are not going to change your Model of image recognition to incorporate other types of data, like "hints".

Breaking the Test cycle

So we are, in general, teaching our systems to the Tests. And when the Tests are iterated upon (because they don't capture what people actually want), the people trying to iterate on them get accused of moving the goalposts. There is even a name for this: the AI effect.

If we are to get safe general AI or IA, we need to break this cycle.
We need a Model of what intelligence is and to iterate on that, so that we can get different Targets, not naively try to meet the Tests better or add more and more Tests.

I obviously think separating Measures and Targets will help us make safe AI. So how can we

Improve Models using the Measure, use the Model to update Targets. Hit Targets, Avoid Tests.

The following is an illustrative alternative history of how the development of AI could have gone if we had better separated our Targets from our Measures. It is described as a series of rounds. Each round has an assumed Model, a Target to try to create based on that Model, the Measure (in this case the Turing test, though you could have more Measures), and the result of having performed that measurement. Each round also has a failure mode to avoid.

Round 1:

Model: For simplicity's sake, let us start with a model of human conversation as a fixed input/output mapping.

Target: Create a mapping of input to output that is nice to talk to.
Measure: Use the Turing test as a Measure.

Result: This isn't very good to talk to. It is the same every time, and humans aren't the same every time. Update the Model of an intelligence so that it isn't seen as a fixed mapping.

Failure mode: Create ever more elaborate mappings from input to output, including previous parts of the conversation in the input.
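Round 1's Model can be sketched in a few lines. This is a toy illustration, not any real chatbot; the replies are made up.

```python
# A toy version of Round 1's Model: conversation as a fixed
# input -> output mapping. All replies here are hypothetical.
REPLIES = {
    "hello": "Hi there!",
    "how are you?": "I'm fine, thanks.",
}

def respond(utterance):
    # The same input always produces the same output, which is
    # exactly why a judge can unmask it by repeating themselves.
    return REPLIES.get(utterance.lower().strip(), "Tell me more.")

print(respond("Hello"))  # Hi there!
print(respond("Hello"))  # identical every time
```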

Round 2

Update Model: Humans seem to learn over time. So let us assume an intelligence also learns.

Target: Create a machine learning system that tries to learn a mapping from input to output from data. Provide the system with the data from previous conversations and how well the judges liked those conversations, so that it can update its mapping from input to output.

Measure: Use the Turing test as a Measure.

Result: It was pleasant to talk to, but one judge tried to teach it simple mathematics (adding two numbers together) and it failed. There is no way our current Model of an intelligence could learn mathematics in a single conversation.

Failure Mode: Throw more and more data and processing power at it, creating ever more complex mappings.
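Round 2's Target can be sketched similarly. The conversation logs and judge ratings below are hypothetical; the point is only that the mapping is now learned from rated data rather than fixed in advance.

```python
# A toy version of Round 2's Target: learn the input -> output
# mapping from judge-rated conversation logs (all data hypothetical).
from collections import defaultdict

# (utterance, reply, judge_rating) triples from past conversations.
logs = [
    ("hello", "Hi there!", 1.0),
    ("hello", "Segmentation fault", -1.0),
    ("how are you?", "I'm fine, thanks.", 0.8),
]

ratings = defaultdict(dict)
for utterance, reply, rating in logs:
    # Remember the judges' rating for each (utterance, reply) pair.
    ratings[utterance][reply] = rating

def respond(utterance):
    candidates = ratings.get(utterance.lower().strip())
    if not candidates:
        return "Tell me more."
    # Choose the reply the judges rated highest so far.
    return max(candidates, key=candidates.get)

print(respond("Hello"))  # Hi there!
```

Note the failure mode is visible here too: no amount of extra rated data lets this system learn arithmetic within a single conversation.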

Round 3

Update Model: Humans seem to be able to treat natural language as programs to be interpreted and compiled. They learn language using large amounts of data, but they also use language to help learn other bits of language.

Target: Create a system that can discern good programs from bad ones, then put programs in it that try to compile human language into novel programs. Also have programs that search for novel patterns in the input language and try to hook them up to code generation.

Measure: Use the Turing test as a Measure.

Result:??

Conclusion

I hope I have shown you how the maxim might be useful for thinking about incentive structures for solving a complex problem where we don't quite understand what we are aiming for. While this is just my initial Model of what we should do about incentive structures for human researchers, it is very important to get right (as it could be used inside an AI, as well as to guide how we build it).
So to move forward on the intelligence problem we need to create the best Model of intelligence we can, so that we can find Targets. I think I have one (Round 3), but I would love to get more input into it.

It would also be interesting to think about when having a Model would be a bad idea and it would be better to just use Tests. There are probably some circumstances.

Comments

I think it may be enlightening to consider what happens if you get super-lucky and your first Model is perfectly accurate. I think what happens is that you end up solving the same problem as you would without the process proposed here and in the previous article, and running into the same problems.

It's possible that doing everything via an inaccurate model may help (by reducing overfitting, in some sense) but I don't think it's obvious that it does (since it will increase underfitting, in the same handwavy sense) and my guess is that it will help and hurt in different cases, and that it will be difficult to guess ahead of time which it will be.

Overfitting is the same thing as Goodhart's law. If using an inaccurate model helps, it's because not trying too hard is necessary to avoid goodharting yourself.

That's ... pretty much what I thought I was saying. Was I unclear, or have I misunderstood you somehow?

Ah, I thought "in some sense" meant you weren't sure if you were using the metaphor correctly.

I do not think that overfitting is "the same thing as" Goodhart's law. I think Goodhart's law is more broad. One of the mechanisms by which it works is similar to overfitting, but there is a lot more to Goodhart's law than overfitting. In particular, I think the standard examples of Goodhart's law include adversaries that are trying to break your proxy in a way that does not show up in overfitting. See also https://agentfoundations.org/item?id=1621

If your model is perfectly accurate, yep this process is not needed.

I'd argue, though, that if you had a perfect Model and a Model of the Measure, you wouldn't need the real Measure either. The Measure is just something to help you search. You would just create the good life or an AI or whatever hard-to-define thing you have a definition for.

My point isn't that if the model is perfectly accurate the process isn't needed.

My point is that if the model is perfectly accurate the process doesn't work (at least in the difficult cases) and that the process involves trying to improve the model all the time, so it's liable to push itself into a situation where it doesn't work.

In other words, I don't see that this process is really effective in saving you from Goodhart's law.

Someone reached out and noted the Loebner prize link is a 404 now, and suggested this link. (I'm not sure if this is accurate, haven't looked into it, but passing it along.)

https://www.aresearchguide.com/the-loebner-prize-in-artificial-intelligence.html

For example, the algorithms aren't being designed to take hints about what is in an image to help them locate the object, because that is not what they are being tested on.

Yes they are, yes it is. Convnets learn to use context very heavily. In general, machine learning is entirely focused on measures.

Maybe I should have expanded on what I mean by a hint; I don't think I was clear. It is not the raw context, it is the ability to be spoilered by words. A hint is non-visual information about the picture which allows you to locate the object in question.

Does that make things clearer?

In general, machine learning is entirely focused on measures.

But what stops them from becoming Targets, per Goodhart's law?

http://www.aclweb.org/anthology/W16-3204

But what stops them from becoming Targets, per Goodhart's law?

This is unavoidable. If you optimize something, you're optimizing the thing. Your post suggesting a separation is an attempt at describing something real, but you misunderstood why there's a separation. You only have one true value function, in the low-level circuitry in your head, and everything else is proxies; so you have to check things against your value function. By saying "give your subordinates goals", you are giving them proxies to optimize, which will be goodharted if your subordinates are competent enough. You need a white-box view of what's happening in order to explore and improve the goals you're giving, until you can find ones that accurately represent your internal values.

In machine learning, this just happens by training a model, trying it on a test set, and giving it a new training objective if it does badly on the thing you actually care about. The training objective is just a proxy, and you check it against a test set to ensure that if it is over-optimizing your proxy (also known as overfitting) you'll notice the discrepancy. In value function learning, you'd still do this: you'd have some out-of-sample ethical problems in your test set, and you'd see how your AI does on them. This would be one of the ingredients in making a safe AI based on ML. See "Concrete Problems in AI Safety", Amodei et al. 2016.
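That train/test check can be sketched with made-up data. Everything here (the sine target, the noise level, the polynomial degrees) is invented for illustration; the point is only that the training objective is a proxy, and the held-out set exposes over-optimization of it.

```python
# Minimal sketch: optimising the training-set proxy too hard
# (high-degree polynomial) looks great on the proxy but the
# held-out test set reveals the discrepancy. All data made up.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 10)
x_test = np.linspace(0.05, 0.95, 10)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, 10)

def mse(coeffs, x, y):
    """Mean squared error of a polynomial fit on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

for degree in (3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    print(degree,
          round(mse(coeffs, x_train, y_train), 4),   # proxy (train error)
          round(mse(coeffs, x_test, y_test), 4))     # held-out check
# The degree-9 fit drives the proxy to ~0 (it interpolates the
# training points exactly) while typically doing worse out of
# sample; the train/test gap is the overfitting signal.
```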

It seems to me that your post can be summarized as "one needs continuous metrics to optimize for, not boolean tests". Does that seem wrong in any way?

That is not what I was trying to say at all... Let's try maths notation.

You have a Model M, which is a function of the state of the world s1 and the action a1 that you take at time 1. The Model gives a predicted state of the world at time 2, ps2.

So ps2 = M(s1, a1)

Let's say you have a Measure U() over states of the world, which gives a utility for each state.

U(s1) = u1

You build up a model Mu of what U is, so that you don't have to hit every state to find the U of that state. So you have

Mu(s1) = pu1

A normal optimisation process just adjusts a1 until pu2 is maximised. So

f(a1) = argmax over a1 of Mu(M(s1, a1))

What I'm arguing is that on a poorly defined problem, if you get a poor u, you want to spend a lot of time adjusting M, not just adjusting a.

If your Model M is inaccurate, that is ps2 != s2, you can improve your pu2 by updating your Model to minimise the squared error between its prediction at time t (given the action taken at t) and the actual state at time t+1, st+1:

g(st) = argmin over M of (M(st, at) - st+1)^2

So in machine learning's case, your actions a are the algorithms or systems you are testing, and M is your model of what an intelligence is. If the Turing test is giving your system a bad u, and you can't easily predict real-world intelligences (e.g. humans) with your model of intelligence (that is, M(st, at) - st+1 is large), update your model of what an intelligence should be doing so that you can better predict it. This will enable you to pick a better a.
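The two loops can be sketched in code. This is a toy: the world dynamics, the utility function U, and the single model parameter k are all invented for illustration, not taken from any real system.

```python
# Toy version of the f() and g() loops above. The true dynamics,
# U, and the Model's parameter k are all hypothetical.

def world(s, a):
    """The true (unknown) dynamics: next state given state and action."""
    return s + 2.0 * a  # the 2.0 is the fact the Model must discover

def utility(s):
    """U: utility of a state, here closeness to a goal state of 10."""
    return -abs(s - 10.0)

def make_model(k):
    """M parameterised by k: predicted next state ps2 = s + k * a."""
    return lambda s, a: s + k * a

def best_action(model, s, actions):
    """f(a) = argmax over a of U(M(s, a)); Mu is just U in this toy."""
    return max(actions, key=lambda a: utility(model(s, a)))

actions = [x / 10.0 for x in range(-50, 51)]  # candidate actions
k = 1.0   # initial (wrong) Model parameter
s = 0.0   # initial state

for step in range(20):
    model = make_model(k)
    a = best_action(model, s, actions)
    predicted = model(s, a)
    s = world(s, a)
    error = s - predicted
    # The g() loop: when predictions are off, fix M rather than
    # just re-optimising the action against a wrong Model.
    if abs(error) > 1e-9 and a != 0:
        k += error / a  # one-shot correction for this linear toy

print(round(k, 6), round(s, 6))  # the Model learns k = 2 and reaches the goal state 10
```

Optimising only over a with the wrong k would keep missing the goal; the prediction error is what tells you to fix M first.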

To give an example less close to home: if you are trying to teach kids, don't just try lots of different teaching styles and see which ones produce the best results on your tests. Instead, have principled reasons for trying a particular teaching style, and try to explain why a teaching style was bad. If you can, create an explicitly bad teaching style and see whether it was as bad as you thought it would be. Once you have a good model of how teaching styles map to test results, then pick the optimal teaching style for principled reasons. Otherwise you could just give the kids the answers beforehand; that would optimise the test results.

Does that clarify the difference?

I'll try and figure out formatting this properly in a bit.

Yeah! I think we're actually saying the same things back at each other.

I was objecting to the continuous vs boolean distinction :). I'd boil the article down to:

It is more important to optimise your model of the world than to act in the world, if your model of the world is bad.

It is lucky that this is a continuous function, though.