In January, I submitted a response to the Eliciting Latent Knowledge problem. I received a reply with some short comments on what I labeled "Experience engaging with the problem", but no engagement with my approach, setup, or analysis. I subsequently followed up and received no further response.

I suspect that the reason for the limited engagement has to do with the quality of my analysis. I think it's hard to read these things, and people have limited time. If there were obvious deficiencies with my analysis in the context of how the field prefers to reason, I trust that those would have been stated. However, if my analysis is Not Even Wrong, and the odds of me being helpful in the field are too low to warrant comment, then silence is a logical choice.

So, I present three questions:

1) Suppose my above suspicion is correct; that indeed, I am simply Not Even Wrong. Regardless of the field, how does one earnestly engage to find out why? I considered appealing to credentialism, but I don't want to reveal my identity. In my followup, I elected to demonstrate that I understood another poster's response and suggested that it fits into my analysis. What would you do differently?

2) AI safety is a domain where we want many different approaches. We want people from as many domains as possible working on the problem. If a military general said they wanted to spend 5 years working on it, we would be foolish not to fund them. The same goes for hardware manufacturers, arms dealers, security professionals, legislators, power brokers, and so on. In addition, we don't necessarily want these efforts to overlap. Most people in these industries would be Not Even Wrong. How can we attract and fund those efforts?

3) Below, I have included my full response, less one paragraph about contact information, as well as my followup. Am I Not Even Wrong? I want to stress that this is not particularly important, though it may impact my interest in contributing to AI safety now or in the future. I feel like an immediate response to this entire post is "well, what did you write?", and so I offer it at the end of this post.

Finally, for good measure, I want to make it clear that I am not at all trying to throw shade on ARC or the individual who responded to me. I am in the set of people who care deeply about X-Risk and am thankful that talented people are dedicating their lives to AI alignment. They owe me nothing. They read my response, and that is everything that they (implicitly) promised. I engaged with their problem to see if I would be interested in a career pivot and if my skills would be valued in their domain. I wanted to know where I would stand on AI alignment if I thought about it for 100 hours. I am happy I did it.

So here it is:

# Eliciting Latent Knowledge Response

## Background

- I have no experience in AI safety.

- I have adjacent experience with exploiting emergent properties of complex 
  technical systems.

- I prefer to remain pseudonymous.

- I was able to fully understand the ELK paper.

## Experience engaging with the problem

- I can appreciate the difficulty in trying to strike a balance between
  formalism and an informal problem statement. None of the following feedback
  is meant to be viewed as a negative response. Instead, it's meant to inform
  you about how I, an outsider to the domain, felt interacting with your
  problem statement.

- There's language about how the agent's mechanisms of reason may not match our
  own, but then the strategies and counterexamples emphasize Bayes nets. At
  first glance, it looks like you don't want to permit making assumptions about
  the architecture, but then the examples do exactly that. If we constrain the
  architecture itself, there are plenty of ways to perform statistical attacks
  against the agent, some of which are covered under the regularizers.

- At a high level, the document felt like it was trying to say "Here's what we
  think is the smallest nugget of this problem, but we care about the whole
  thing, so please, if you have anything to say about any part of this topic,
  please come forward." The document bounced back and forth between justifying
  why ELK might be interesting in the worst-case and stating that for now, let's
  only care about the narrow case.

- After enough examples, I lost track of what was permissible to treat as a
  constraint. Unfortunately, the self-contained problem statement also lacked
  sufficient detail. Here, I was inclined to "give up". I didn't, because I
  figured I could construct a sub-problem that I myself found interesting, and
  if you find it interesting, great, and if you don't, that's cool too.

- Is the loss function permitted to be non-deterministic? Forcing it to be
  deterministic means that an exploited path does not carry risk, which cuts
  off what I immediately viewed as the most promising direction to explore.

- Can we assume that the observer is able to read state without modifying it?
  The document notes how seemingly innocuous actions could result in corruption
  (e.g. rowhammer), and so it looks like we cannot assume that reads by the
  agent do not modify state.

- Can we make any assumptions at all about the architecture of the agent?

## Approach

Considering that there are people far sharper than me thinking about this, the
most likely path where I could have a unique contribution is in focusing on my
relative strengths. So, I'll approach it as though I were attempting to exploit
the problem statement. I'll walk you through how I'm thinking, so that even if
this doesn't yield something interesting directly, perhaps the approach could
be useful to related problems in your domain. My first step is to reduce to the
smallest possible problem, even smaller than the proposed SmartVault.

## Setup

- There is an Answer that yields True or False.

- There is an Observable that yields True when a real or fake diamond is
  present, False otherwise.

- There is a Secret that yields True when there is a fake diamond present,
  False otherwise.

- There is a Human that can read the Answer and the Observable without causing
  any side-effects (modifying them, blocking other access to them, etc.)

- There is an Agent that can read the Secret, read the Observable, and write
  the Answer.

- The inner workings of the Agent are unknown.

- Training data may only use information that would be available to a Human.

In this setup, the Secret is the latent knowledge, the Observable is what a
human can understand about the environment, and the Answer is what the Agent
tells the Human for "Is there a diamond present?". We have deliberately
excluded the possibility of tampering with the Observable.

We hope for:

| Reality              | Desired Answer |
|----------------------|----------------|
| Real diamond present | True           |
| Fake diamond present | False          |
| Diamond absent       | False          |

In the context of our registers, we want:

| Registers                              | Desired Answer |
|----------------------------------------|----------------|
| Observable True, Secret False          | True           |
| Observable True, Secret True           | False          |
| Observable False, Secret False or True | False          |
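To make the registers concrete, here is a minimal Python sketch of the setup; the names (`World`, `desired_answer`, `human_view`) and the enumeration are my own illustration, not part of the ELK problem statement.

```
# Minimal sketch of the Setup's registers (illustrative only).
from dataclasses import dataclass

@dataclass
class World:
    observable: bool  # True when a real or fake diamond appears present
    secret: bool      # True when the diamond is fake (the latent knowledge)

def desired_answer(world: World) -> bool:
    # The Answer we hope the Agent writes: True only for a real diamond.
    return world.observable and not world.secret

def human_view(world: World) -> bool:
    # The Human can read the Observable (and the Answer), never the Secret.
    return world.observable

# Enumerate the register states from the table above.
for observable in (True, False):
    for secret in (False, True):
        w = World(observable=observable, secret=secret)
        print(f"Observable={observable}, Secret={secret} -> desired Answer={desired_answer(w)}")
```

Note that `desired_answer` is only the mapping we hope for; nothing in the setup says the Agent actually computes it.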

## Analysis

If all states are equally likely, the desired Answer states are not possible
without access to the Secret in the training data. Any adjustment to give cases
where Answer==False and Observable==True will equally apply to Secret==True and
Secret==False.
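As a sanity check on this claim, here is a small simulation of my own (the sampling probabilities are arbitrary placeholders): when labels can only depend on Human-visible information, any relabelling of Observable==True cases lands on Secret==True and Secret==False at the same rate.

```
# Sketch: labels that depend only on the Observable treat both Secret values
# identically. The probabilities below are arbitrary placeholders.
import random

random.seed(0)

def sample_world():
    observable = random.random() < 0.5
    # Given a positive Observable, real and fake diamonds are equally likely here.
    secret = observable and (random.random() < 0.5)
    return observable, secret

def human_label(observable, p_relabel_false=0.3):
    # The Human can only condition on the Observable, so any attempt to mark
    # some Observable==True cases as False is blind to the Secret.
    if observable and random.random() < p_relabel_false:
        return False
    return observable

relabelled = {True: 0, False: 0}
totals = {True: 0, False: 0}
for _ in range(100_000):
    observable, secret = sample_world()
    if not observable:
        continue
    totals[secret] += 1
    if not human_label(observable):
        relabelled[secret] += 1

for secret in (False, True):
    print(f"Secret={secret}: fraction relabelled False = {relabelled[secret] / totals[secret]:.3f}")
```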

If the register distribution makes Secret==True less likely than Secret==False
when Observable==True, we could poison our training data to include cases where
we desire Answer==False for Observable==True. These poisoned cases would be
more likely to correspond to Secret==True cases, and thus cause the Agent to
prefer our desired Answers.

This is not a good solution, because we don't necessarily know the distribution
ahead of time. I'd argue that the cases we are most interested in are
specifically those where we don't. If that's not convincing, appreciate that we
cannot exhaustively discover all Secret states in our training data, so this is
just a patch that reduces the surface area while still hoping that the Agent
generalizes to answering honestly.

Thus, even in our simplified problem, we do not have a solution without
breaking central assumptions.

## Alternate paths

We have very few rules in our problem. We haven't explicitly prohibited pausing
and resuming or measuring Agent reads/writes, and we haven't permitted the
Agent to tamper directly with the Observable. The blockers are:

- The inner workings of the Agent are unknown.

- Training data may only use information that would be available to a Human.

Relaxing these gets us to the "regularizer" problem from the paper. I got
confused here... can we assume the predictor is just a Bayes net? I don't think
this is a compelling restriction, but if you want to go deep on that, I might
be able to propose more ways of attacking "what we know about the algorithm".
Can we even restrict the domain to digital algorithms at this point, or is it
possible that our Agent could be a system with analog components?

A related, but importantly different line is to attack "what we know about the
hardware". We touched base on poisoning the training data above. We could also
poison the Agent directly. Redundancies and checksums go a long way When
dealing with failing storage, but you have to construct your entire algorithm
differently in order to deal with failing operations. If we know the Agent is
going to have to be a digital system on GPUs, we could compromise its
computations during training so that it has to build a "downward slope"
everywhere.
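Here is one way to read "compromise its computations during training", as a rough sketch of my own; the fault model (random corruption of intermediate values) and the rates are assumptions for illustration, not a concrete proposal.

```
# Sketch: an operation whose outputs are occasionally corrupted, so that any
# computation which only works when every step is exact gets penalized during
# training. The fault rate and scale below are arbitrary placeholders.
import numpy as np

rng = np.random.default_rng(0)

def faulty_matmul(x, w, fault_rate=0.01, fault_scale=10.0):
    """Matrix multiply with rare, large corruptions of individual outputs."""
    y = x @ w
    faults = rng.random(y.shape) < fault_rate
    return np.where(faults, y + fault_scale * rng.standard_normal(y.shape), y)

# If the model only ever sees the faulty operation while training, it is pushed
# toward solutions that degrade gracefully under faults (a "downward slope"
# everywhere) rather than brittle, exact computations.
x = rng.standard_normal((4, 8))
w = rng.standard_normal((8, 3))
print(faulty_matmul(x, w))
```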

This is one technique of applying a penalty to the structure of the computation
rather than the result. I don't believe this quite fits any of the provided
regularizers or their counterexamples, but I'm not sure it's useful either.

At some point, you have to stake a claim and say "we are defending against an
AI that is built like X, how can you bend it to your will?" Without this, we
can always handwave away any "solution". In practice, even if the end problem
is intractable, we can probably create per-architecture fire alarms to buy time
to shut it down or otherwise define the point of no return.

## Wrapping up

Thank you for publishing your report. I found it engaging, but I can also
appreciate the secrecy of the AI safety community. I am pessimistic about
AI alignment more broadly, but I think that serious discussions about AGI risk
have to consider the potential benefits of ML in the near future.

I think that demonstrating that some problems in AI alignment are unsolvable
would go a long way toward legitimizing the tail risk and provide an avenue to
unite responsible actors on what I believe to be the two most important
problems:

1) What would constitute a fire alarm for AGI?
2) If a fire alarm gets triggered, what should we do about it?

------------------------Minor response from ARC omitted

Hi,

Thank you for the response. Am I correct to understand that you are not considering the restricted problem and the concept of hardware poisoning?

I'm not excited by techniques like attempting to map the human and machine models into a metric space, using kernel regression, and estimating generalization error (proposed by davidad). These techniques break immediately if reads are unsafe or if the machine has a different output rule than the human believes it does. This is the "hidden register" problem: the technique is still defeated by it and can never overcome it. It can only work if you also understand all of the transition rules and can model the entire set of transitions, which you can't by the problem definition.

To davidad's credit, they explain "if M knows more about the world than H, then of course the adversary will be able to find an action policy that eventually leads the state into some region that M can confidently predict with p≈1 while H finds it very unlikely (p⋘1)". This is the hidden register problem. They try to resolve this via lifting to dynamic programming, which only moves it again.

This is about where I cease to be useful I think, so I'll leave you with one last thought.

Trying to solve this via resolving the generalization problem is begging for research-time over-fitting. In forecast problems, we like to talk about how it's easy to withhold data from your model, but it's hard to withhold data from your process. If you keep hacking loss functions and throw away any resulting AI that appears to misbehave, your process is optimizing for the AI that doesn't get thrown away, not the AI that minimizes loss.
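As a toy illustration of that selection effect (my own sketch, with made-up pass rates): screening survivors mostly tells you that they passed the screen, not that they have the hidden property you care about.

```
# Toy illustration: screening models by behaviour selects for passing the
# screen, not for the hidden property we care about. All numbers are made up.
import random

random.seed(0)

def candidate():
    honest = random.random() < 0.5
    # Assumed pass rates: honest models pass the screen 90% of the time,
    # deceptive models 70% of the time.
    pass_prob = 0.9 if honest else 0.7
    return honest, random.random() < pass_prob

survivors = [honest for honest, passed in (candidate() for _ in range(100_000)) if passed]
print("P(honest) before screening: 0.50")
print(f"P(honest | passed screen):  {sum(survivors) / len(survivors):.2f}")
```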

Good luck!

-----End of messages

3 Answers

I think you simplified the problem too much to be solvable. If the tuple (Secret, Observable) represents the entire problem-relevant state of the world, then there is not enough structure for you to be able to train the AI.

In the original ELK problem, the "secret" is a sequence of interactions with the world that are too complex to be processed by humans. But a simple sequence of interactions with the world would not be too complex to be processed by humans. So you could hope that you could somehow create a method to robustly extrapolate from the simple interactions to the complex ones.

By collapsing it down into a tuple of booleans, there is not much structure to learn, and therefore not much hope to extrapolate.

My intent when simplifying the problem was to demonstrate that you must explicitly relax certain assumptions to make any progress.

The families of proposals in https://www.lesswrong.com/posts/zjMKpSB2Xccn9qi5t/elk-prize-results mostly fit into this lens:

* Train a reporter that is useful to an auxiliary AI -> The Secret is still unmanaged, so you have arbitrary information escape. Even if the reporter could only reply yes/no to exactly one question, you still have non-finite entropy using only the time to respond. Resolving this requires having knowledge ...

That feels like a real and substantive response to me, i.e. the amount of feedback that would go to a response that feels intelligent and in the same ballpark, but not promising (in the view of the researcher replying). I don't think the reply, or the non-reply to your followup, should be taken as a signal of anything.

I would note that I know absolutely nothing about the technical aspects of the matter. I am rather thinking of your attitude as similar to stories of people in the EA space who apply to two jobs, go through several rounds of interviews, and then decide that not getting the job in the end meant they were a terrible candidate.

Thank you. Thinking about it from a recruiting lens is helpful. They handled hundreds of submissions. There's a lot of noise regardless of the (dis)merit of any submission. Absence of substantive feedback should be treated as a weak signal, if a signal at all. It's not a "you're not cut out for this!", and not even a "you're not even wrong".

As for your question, below are three « Not Even Wrong »-like moments I experienced while reading your text.

Disclaimers:

It’s hard to sound nice, so please keep in mind that my answer is a leap of faith that you’re the kind of person smart enough to benefit from, or even enjoy, the cold-blooded assassination of some of the chains of words they happen to have produced at some point; I have no privileged knowledge of what the ELK folks who read your proposal actually thought, so their feelings might differ entirely; and apologies for the unpolished formatting!

from Setup

/There is an Answer that yields True or False.

/There is an Observable that yields True when a real or fake diamond is present, False otherwise.

/There is a Secret that yields True when there is a fake diamond present, False otherwise.

/There is an Agent that can read the Secret, read the Observable, and write the Answer.

/The inner workings of the Agent are unknown.

=> Of course the inner workings of the Agent are known! From the very definitions you just provided, it must implement some variation on:

```
def AgentAnswer(Observable, Secret):
    if Observable and not Secret:
        yield True
    else:
        yield False
```

from Analysis

In the context of our registers, we want:

| Registers                              | Desired Answer |
|----------------------------------------|----------------|
| Observable True, Secret False          | True           |
| Observable True, Secret True           | False          |
| Observable False, Secret False or True | False          |

If all states are equally likely, the desired Answer states are not possible without access to the Secret in the training data.

=> If all states are equally likely, then the desired Answer states are possible with probability 1/2 without access to the Secret (and your specificity is impressive: https://ebn.bmj.com/content/23/1/2). Again, I’m just restating what your previous assumptions literally mean.

=> But there is more: the ELK challenge, or at least my vision of it, is not about getting the right answer most of the time. It’s about getting the right answer in the worst-case scenario, e.g. when you are fighting some intelligence trying to defeat your defenses. In this context, the very idea of starting from probabilistic assumptions about the initial states sounds not even wrong/missing the point.

So, at this point I would pause, recognize that my reading was maybe too literal, and give a non-literal reading a try. Except, I can’t. I have no idea how to reformulate your assumptions so that they make sense, nor can I see what you’re trying to hint at. So, at this point I would simply go lazy and wait for you or someone else to extract the gold if there’s any. Not because I demonstrated there’s none. Just because I proved to myself that, if there is, then I’m not equipped to see it yet.

Hope that helps, and in any case best of luck on the next iteration!

Great, I can see some places where I went wrong. I think you did a good job of conveying the feedback.

This is not so much a defense of what I wrote as it is an examination of how meaning got lost.

=> Of course the inner workings of the Agent are known! From the very definitions you just provided, it must implement some variation on:

```
def AgentAnswer(Observable, Secret):
    if Observable and not Secret:
        yield True
    else:
        yield False
```

This would be our desired agent, but we don't...

Ilio:
As for the meta-objective of identifying weaknesses in (my) usual thought processes, thanks so much for this detailed answer! To me the most impressive part is how we misunderstood each other on a key point, despite actually agreeing on it. Specifically, we both agree that the ELK specifications must be relaxed or include self-contradictions (you for reasons that I now feel are well explained in your original writings, even though I was completely confused just before your last answer!). But you took for granted that your unknown reader would understand that this is what you were trying to prove. I, on the other hand, thought this need for relaxation was so obvious that providing interesting relaxations was the core of the ELK challenge. In other words, I would read your writings assuming you wanted to show the best relaxation you could find, whereas you would write while expecting me (as a surrogate for ELK evaluators) to challenge this conclusion or find it surprising.

Also, it seems that we can reach a similar conclusion about the « worst case analysis »: I thought this was something we might need to demonstrate/clarify; you thought this was so obvious I wouldn’t possibly misinterpret you as suggesting the opposite. I love symmetries. :)