On one hand, it does not have to be a black box. If we consider one of the programming formalisms that allow taking linear combinations of programs, one can consider decompositions of programs into series, and one can retain quite a bit of structure this way.
I think that, capability-wise, something close to “glass boxes” (in terms of structure, but not necessarily behavior) can be done.
But the implications for safety are uncertain. On one hand, even very simple dynamical systems can have complex and unpredictable behavior, and even more so when we add self-modification to the mix. So the structure can be transparent, but this might not always translate into transparency of behavior.
On the other hand, the ability to understand these systems is a double-edged sword: it is a strong capability booster, and it makes it much easier to improve those AI systems.
Yes, any “neuromorphic formalism” would do (basically, one considers stream-oriented functional programs, asks for the streams in question to admit linear combinations, and the programs end up being fairly compact high-end neural machines with a small number of weights).
I can point you to a version I’ve done, but when people translate small specialized programs into small custom-synthesized Transformers, that’s in the same spirit. Or when people craft compact neural cellular automata with a small number of parameters, that is also in the same spirit.
Basically, as long as the programs themselves end up being expressible as some kind of sparse connectivity tensors, you can consider their linear combinations and series.
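To make the "sparse connectivity tensors" idea concrete, here is a minimal toy sketch in Python (my own illustration with made-up names and shapes, not any particular published formalism): tiny stream-processing "programs" represented as small weight matrices, which can then be mixed linearly.

```python
import numpy as np

# Toy "stream-oriented programs": each one is a small linear state update,
# fully described by a (mostly zero) connectivity matrix W.
# The program maps an input stream x_1, x_2, ... to an output stream via
#   state <- W @ state + x_t ;  y_t = state
def run_program(W, xs):
    state = np.zeros(W.shape[0])
    outputs = []
    for x in xs:
        state = W @ state + x
        outputs.append(state.copy())
    return outputs

# Two tiny programs with sparse connectivity.
W_a = np.array([[0.0, 1.0],
                [0.0, 0.0]])   # routes the second state component into the first
W_b = np.array([[0.5, 0.0],
                [0.0, 0.5]])   # exponential decay on both components

# Because programs are just tensors, a linear combination of programs
# is again a program of the same kind.
alpha, beta = 0.3, 0.7
W_mix = alpha * W_a + beta * W_b

xs = [np.array([1.0, 1.0])] + [np.zeros(2)] * 4   # an impulse, then silence
for y in run_program(W_mix, xs):
    print(y)
```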
Would an elegant implementation of AGI be safe? What about an elegant and deeply understood implementation?
My impression is that it's possible for an elegant and deeply understood implementation to block particular failure modes. An AI with a tightly bounded probability distribution can't get Pascal-mugged.
Being well understood isn't sufficient. Often it just means you know how the AI will kill you.
And "Safe" implies blocking all failure modes.
Looked up Blum's speedup theorem.
Simplified structure of the proof: consider a listing of all Turing machines, T_1, T_2, ...
The problem to which the speedup theorem applies involves simulating the first N Turing machines, but applying MUCH more compute to those that come first in the listing.
So you're simulating Turing machine i for 2^^(N-i) (note the double up-arrow) steps.
By caching the outputs of the first few Turing machines, you can get a faster algorithm.
But, at some point, you get a Turing machine that never halts, but can't be proved never to halt in ZFC.
At this point, your slower algorithm has to actually run the Turing machine, and your faster algorithm just magically knows that it never halts.
Blum's speedup comes from replacing a long-running computation with a lookup table full of uncomputable data. Computability theory's "there exists an algorithm" allows you to magically know any finite amount of data.
Given Blum's infinite sequence of ever-faster algorithms, there is a fastest algorithm that can be proved to work in ZFC.
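A schematic Python illustration of the caching trick described above (`dummy_simulate` and the table contents are stand-ins; in the real construction the cached facts are not provable in ZFC, so no verifier can hand you the table):

```python
# The "fast" algorithm differs from the "slow" one only by a hard-coded
# lookup table of facts about the first few machines (e.g. "machine 3
# never halts"). Each larger table gives a faster algorithm, but the
# entries themselves may be unprovable, so no single fastest algorithm
# can be exhibited and verified.

KNOWN_NON_HALTERS = {3, 7}   # illustrative hard-coded data, not computed

def slow_answer(i, budget, simulate):
    # Actually run machine i for `budget` steps.
    return simulate(i, budget)

def fast_answer(i, budget, simulate):
    # Consult the lookup table first; otherwise fall back to simulation.
    if i in KNOWN_NON_HALTERS:
        return "does not halt"
    return simulate(i, budget)

def dummy_simulate(i, budget):
    # Stand-in for running Turing machine i for `budget` steps.
    return "ran for %d steps without halting" % budget

print(fast_answer(3, 10**6, dummy_simulate))   # answered from the table
print(fast_answer(4, 10**6, dummy_simulate))   # falls back to simulation
```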
Yeah I don't understand where Eliezer is coming from in that tweet (and he's written similar things elsewhere).
I don't know as much about algorithms as you; my argument centers instead on the complexity of the world.
Like, here's a question: is "tires are usually black", and stuff like that, encoded directly in the source code?
If yes, then (1) that's unrealistic, and (2) even if it happened, the source code would wind up horrifically complicated and inscrutable, because it's a complicated world.
If no, then the source code must be defining a learning algorithm of some sort, which in turn will figure out for itself that tires are usually black. Might this learning algorithm be simple and legible? Yes! But that was true of GPT-3 too, which Eliezer has always put in the inscrutable category.
So what is he talking about? He must have some alternative in mind, but I'm unsure what.
I suppose the learning process could work in a more legible way - for instance, it's currently not clear why neural networks generalize successfully. But this seems more closely related to the theoretical understanding of learning algorithms than to their knowledge representation.
For instance, there is no computable predictor which always learns to do as well as every other computable predictor on all sequences.
What's going on here is quite interesting. For any number N, a version of AIXI with time and length bound N is in some sense optimal for its compute.
So (in the limit), what you want is a really big N.
One way to do that is to pick a probability distribution over Turing machines, then store Chaitin's constant (the halting probability, p(halt)) to some number of digits. Start running more and more Turing machines for longer and longer until the machines that have halted account for that much probability mass; the number of steps taken is your N.
If we take this probability distribution over Turing machines as our prior, then the Turing machines that this algorithm does badly on are the ones that don't halt in time N. And each extra digit of Chaitin's constant, on average, halves the measure of such Turing machines.
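A conceptual sketch of that construction (not genuinely runnable: the digits of Chaitin's constant are uncomputable, and `machines`, `weight`, and `step` are placeholders; it just mirrors the recipe described above):

```python
def time_bound_from_omega(omega_prefix, machines, weight, step):
    """Dovetail the given machines until the prior weight of those that have
    halted exceeds the truncated halting probability; return the step count N.

    omega_prefix: Chaitin's constant truncated to some number of binary digits.
    machines:     an enumeration of machine ids (finite, for this sketch).
    weight(m):    the prior probability assigned to machine m.
    step(m, t):   advance machine m to step t; return True iff it has just halted.
    """
    halted, halted_mass, t = set(), 0.0, 0
    while halted_mass < omega_prefix:
        t += 1
        for m in machines:
            if m not in halted and step(m, t):
                halted.add(m)
                halted_mass += weight(m)
    # Any machine still running at step t either never halts or has measure
    # below the precision of the truncation.
    return t
```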
Epistemic status: intuitive speculation with scattered mathematical justification.
My goal here is to interrogate the dream of writing a beautiful algorithm for intelligence and thereby ensuring safety. For example:
I don't know precisely what alternative he had in mind, but I seem to remember reading only clean functional programs from MIRI, so that is one possibility. Whether or not anyone else endorses it, that is the prototypical picture I have in mind as the holy grail of glass-box AGI implementation. In my mind, it has careful comments explaining the precisely chosen, recursive prior and decision rules that make it go "foom."
Is the "inscrutable" complexity of deep neural networks unavoidable? There is has been some prior discussion of the desire to avoid it as map-territory confusion, and I am not sure if that is true (though I have some fairly subtle suspicions). However, I want to approach this question from a different angle.
And most importantly, I want to ask what this has to do with AI safety. Would an elegant implementation of AGI be safe? What about an elegant and deeply understood implementation?[1]
In a previous essay, I considered the distinction between understanding or discovering core algorithms of intelligence and searching for optimality guarantees. It occurs to me that the former is really a positive, creative exercise while the latter is more negative. Or, perhaps it is more accurate to say that an optimal method must sit at the boundary of the positive and the negative; it is exactly as good as possible, and anything better would be impossible.[2]
Sometimes, there is no meaningfully optimal element in a class. For instance, there is no computable predictor which always learns to do as well as every other computable predictor on all sequences.[3] This is a failure of a relative notion of optimality. But perhaps it is more important to ask for good performance on some sequences than others. I think rationalists tend to (hyper?)focus on consistency-based conditions like "Bayes optimality," which I am not really sure can be viewed as optimality conditions at all (though they are certainly desirable to satisfy, all else held equal). Perhaps it is useful to think of the relative notions as external-facing and the consistency notions as internal-facing. I don't want to be too specific here about what makes for a good optimality condition for intelligence, because I think that if I tried to be very specific, I would probably be wrong. But I think some kind of best-in-class property is probably what we mean by optimality, and I suspect that this ultimately (that is, terminally) wants to be externally oriented.
You can approach an optimal method from below or from above. From below, by inventing better and better methods. Sometimes, the class of methods under consideration has structure that makes it easy to construct an optimal method. Sometimes it does not. For instance, there are many algorithms for SAT, but it is not so easy to find an asymptotically fastest one. Each faster algorithm invented is a demonstration of what is possible. Optimality can also be approached from above, by trying to prove limits on the possible (exercise for the reader: prove that SAT cannot be solved in polynomial time).
Sometimes, the attitudes of the two different directions are sufficiently different that they form fields with different names - for instance, algorithms and complexity. In this case, the two often appear together with an "and" between them, for obvious reasons.
In general, there is no reason to believe that the set of possible things is "closed" - there may be an infinite sequence of slight improvements above any method. The possible and impossible do not need to touch. Existence proof: Blum's speedup theorem.
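For reference, one standard formulation of the theorem (glossing over details of the complexity measure): for every total computable function $f$ there is a computable $0/1$-valued function $g$ such that for every program $i$ computing $g$, there is another program $j$ computing $g$ with

$$ f\bigl(n, C_j(n)\bigr) \le C_i(n) \quad \text{for all but finitely many } n, $$

where $C_i(n)$ is the running time of program $i$ on input $n$ (or any other Blum complexity measure). In other words, above any program for $g$ there is another that is faster by as large a margin as you like, so there is no fastest one.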
Sometimes they do touch, at least in some sense; Solomonoff's universal distribution is an example, optimal among the lower semi-computable distributions. But this is arguably[4] a contrived class of methods.
I think that as intelligent algorithms approach the boundary, they tend to get really complicated. But I don't think they have to start out really complicated - a simple "kernel" should do. I have several independent reasons for believing this. Some of the reasons connect more or less directly / rigorously, and some of them apply more or less widely. Since the relationship between these two virtues seems to be inverse here, I don't think the case is closed.
Deep learning and more generally the bitter lesson. This is more of an intuitive argument. It seems that (large) neural nets become very, very complicated as they learn. In fact if I recall correctly, Neel Nanda recently described them as "cursed" during a talk at LISA[5]. This seems to be reflective of a general trend towards increasing pragmatism (some might say pessimism) about truly understanding neural nets mechanistically.[6] It's notable that the algorithms for transformer training and inference are not actually very complicated (though they do not qualify as elegant either).
No elegant theory of universal prediction. A predictor that can learn sequences up to some complexity must be about that complex itself, as proven by Shane Legg: https://arxiv.org/abs/cs/0606070. It's interesting to interpret this result in light of transformers. The transformer learning algorithm is fairly simple, but we don't actually care about a transformer's performance during training. When trained on a massive amount of text, transformers seem to be able to (learn to) learn a lot of things in context (particularly, during deployment) - in fact, it is their in-context learning which is often compared to Solomonoff induction. But by deployment, it is no longer a raw transformer that is learning; it is an LLM. And an LLM is a highly complicated object.
A maximally compressed object looks like noise, in the sense that it (by definition) cannot be compressed further. This is a standard "result" in algorithmic information theory (see the short compression demo below). Insofar as understanding is compression, I visualize acquiring deep knowledge as packing your brain with white noise - or what looks like it, from the outside. To the right machine, it is executable noise. See this related talk about the singularity by Marcus Hutter.
...and Blum's speedup theorem, as mentioned above.
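To make the "looks like noise" point tangible, here is a toy demo using zlib as a stand-in for an ideal compressor (an illustration, not a proof of anything):

```python
import zlib

# Highly structured input: very compressible.
text = b"the tire is black. " * 1000

once = zlib.compress(text, 9)    # compress the structured text
twice = zlib.compress(once, 9)   # try to compress the compressed bytes again

print(len(text), len(once), len(twice))
# Typical result: the first pass shrinks the data dramatically; the second
# pass gains nothing (and usually adds a few bytes of overhead), because the
# compressed stream already looks like noise to the compressor.
```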
I think it is possible to write a fairly short algorithm for intelligence. I think that algorithm learns and self-modifies, and eventually it looks like noise.
It's not clear that starting from a short algorithm actually buys you anything in terms of safety, once it has self-modified into noise.
Safety must be a design property of the algorithm. Presumably, a safe algorithm has to be optimizing a thing that is safe to optimize. However, it may well be that a complicated algorithm, optimizing the same thing, is about as safe. It's not even clear that the simplicity of an algorithm makes it easier to determine what it is optimizing; there is a sense in which transformers are just optimizing their loss function, which is (during pretraining) exactly log-loss on next-token prediction. But that is not quite right, for reasons that can be roughly summed up as "inner optimization." However, I suspect that the simplest algorithms which eventually unfold into intelligence also have inner optimization problems.
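To be concrete about "exactly log-loss on next-token prediction," here is a minimal numpy sketch of that objective (the random logits stand in for a transformer's outputs; this illustrates the loss itself, not anyone's training code):

```python
import numpy as np

def next_token_log_loss(logits, targets):
    """Average negative log-probability assigned to the true next token.

    logits:  array of shape (seq_len, vocab_size), one row per position.
    targets: array of shape (seq_len,), the actual next token at each position.
    """
    # log-softmax over the vocabulary at each position
    logits = logits - logits.max(axis=-1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    # pick out the log-probability of each true next token
    return -log_probs[np.arange(len(targets)), targets].mean()

# Toy example: 3 positions, vocabulary of 5 tokens.
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 5))
targets = np.array([2, 0, 4])
print(next_token_log_loss(logits, targets))
```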
I think we shouldn't expect to be able to implement an optimal algorithm for intelligence at all, and an elegant core engine for intelligence is not necessarily safe. Still, a deeper understanding of intelligence probably can lead to somewhat more elegant and hopefully much safer algorithms. The former, because some epicycles can be removed, leading to cleaner analysis. The latter, because the optimization target can be chosen wisely.
Still, it is important to be clear that AGI code golf is not inherently productive for AI safety.
[1] Does the answer seem obvious to you? Of the people who think the answer is obvious, I weakly expect disagreement about both the answer and the reason(s). If you have a strong opinion, I'd appreciate comments both before and after reading the rest of the post.
[2] I think I may have read this phrasing somewhere, but I can't recall where.
[3] See Carnap's quest for a universal "confirmation rule" (in the modern context, usually viewed as a sequence prediction method) and Putnam's diagonal argument for impossibility.
[4] See Professor Tom Sterkenburg's excellent thesis: Universal Prediction
[5] I am currently (as of writing) working out of LISA for ARENA 5.0.
[6] I feel slightly vindicated by this, which I have to admit is comforting since I have changed my mind about a lot of other things lately!