Answer by Ilio, Sep 18, 2022

Why were we so sure that strong enough AIs playing Go would develop (what we can describe as) a fear of bad aji (latent potential)?

Well, we weren’t. As far as I know, nobody ever predicted that. But in retrospect we should have, simply because aji is such an important concept for mastering this game.

Similarly, if we’re looking for a generic mechanism that would lead an AI to develop agency, I suspect any task would do, as long as interpreting the data as coming from agency-based behavior helps enough.

First they optimized for human behavior - that’s how they understood agency. Then they evaluated how much agency explains their own behavior - that’s how they noticed that increasing it helps their current tasks. The rest is history.

Thanks, that clarifies your aims a lot. Did you give some thought to how your approach would deal with cases of embodied cognition and the use of external memories?

In what sense is the functional behavior different from the internals/actual computations? Could you provide a few toy examples?

Answer by Ilio, Aug 25, 2022

Here’s one candidate reason why P vs NP is so hard to settle: the hard instances of any NP-complete problem are often the same as the hard instances of any other NP-complete problem, including a (yet to be formalized) problem that would turn out to be equivalent to proving P vs NP itself. Then it’s hard to prove almost by definition.

Love the idea. How efficient! :)

About mental breaks, I guess this might help creativity for the same reason meditation and naps help partial consolidation of memory traces (see below for a recent thesis showing these effects).

https://qspace.library.queensu.ca/bitstream/handle/1974/27576/Dastgheib_Mohammad_202001_MSC.pdf?sequence=3&isAllowed=y

Specifically, I would speculate that consolidation means reorganizing memories, and that reorganizing memories helps make sense of that information.

Love it, and love the general idea of seeing more ML-like interpretations of neuroscience knowledge.

One disagreement (but maybe I should say: one addition to a good first-order approximation) is over local information: I think it includes some global information, such as sympathetic/parasympathetic level through the heartbeat, and that the brain may actually use that to help construct/stabilize long-range networks, such as the default mode network.

Answer by Ilio, May 29, 2022

That’s a subtly complicated question. I’ve been trying to write a blog post about it, but I waver between two ways of addressing it.

First, we could summarize everything in just one sentence: « Deep learning can solve increasingly interesting problems, with less and less manpower (and slightly more and slightly more womenpower), and now is the time to panic. » Then the question reduces to a long list of point-like « Problem solved! » items, plus a warning that the list is about to include the problem of finding increasingly interesting new problems.

A less consensus-friendly but more interesting way is to identify a series of conceptual revolutions that summarize and interpret what we have learned so far. Or, at least, my own subjective and still preliminary take on what we have learned. At this moment I’d count three conceptual revolutions, spread over different works in the last decade or two.

First, we learned how to train deep neural networks and, even more importantly from a conceptual point of view, that the result mimics/emulates human intuitions/prejudices.

Second, we learned how to use self-play and reinforcement learning to best any human player at any board game (the drosophila of AI), which means this type of intelligence is now solved.

Third, we learned that semantics is data compression, and that learning to manipulate semantics with « attention » leads to increasingly impressive performance on new, previously unseen cognitive tasks.
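For concreteness, here is a minimal numpy sketch of the « attention » operation these models are built around (a toy illustration with made-up shapes and names, not any particular implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention: each query position re-reads the whole
    sequence, weighting every value by how well its key matches the query."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)       # (n_queries, n_keys) similarity matrix
    weights = softmax(scores, axis=-1)  # each row is a distribution over positions
    return weights @ v                  # weighted mix of the values

# Toy usage: 4 tokens with 8-dimensional embeddings; in a real transformer,
# q, k and v would be learned linear projections of the token embeddings.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((4, 8))
print(attention(tokens, tokens, tokens).shape)  # (4, 8): one contextualized vector per token
```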

Fourth… but do we really need a fourth? In a way yes: we learned that reaching these milestones is doable without a fully conscious mind. It’s dreaming. For now.

I wish I had been wise enough at your age to post my gut feelings on the internet so that I could better update later. Well, the internet did not exist, but you get the idea.

One question after gwern’s reformulation: do you agree that, in the past, technical progress in ML almost always came first (before fundamental understanding)? In other words, is the crux of your post that we should no longer hope for practical progress without truly understanding why what we do should work?

Answer by Ilio, May 21, 2022

It may be the deepest thing we understand about NNs (though I might get stoned for suggesting we actually know the answer). See lalaithion’s link for one way to see it. My own take is as follows:

First, consider how many n-spheres of radius slightly below 1/2 you can pack into an n-dimensional unit cube. When n is low, « one » is the obvious answer. When n is high, the true answer is different. You can find the demonstration on the internet, and if you’re like me you’ll need some time to accept this strange result. But when you do, you will realize that high-dimensional means damn big, and that’s the key insight.
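To make « damn big » concrete, here is a small numpy sketch of one version of this fact (my own framing, not a formal proof): two balls of radius r both fit inside the unit n-cube exactly when two centers confined to [r, 1-r]^n can sit at distance 2r apart, which gives a threshold radius of sqrt(n) / (2(sqrt(n) + 1)). That threshold is 0.25 in one dimension and creeps up toward 1/2 as n grows, so « radius slightly below 1/2 » eventually stops meaning « only one ball fits ».

```python
import numpy as np

def two_ball_threshold(n):
    """Largest radius r such that two n-balls of radius r fit in the unit cube [0,1]^n:
    both centers must lie in [r, 1-r]^n and be at distance >= 2r, so the diagonal
    (1 - 2r) * sqrt(n) of the shrunken cube must be >= 2r."""
    return np.sqrt(n) / (2 * (np.sqrt(n) + 1))

for n in [1, 2, 10, 100, 10_000, 1_000_000]:
    print(f"n = {n:>9}: two balls of radius up to ~{two_ball_threshold(n):.4f} fit")
# n = 1 gives 0.2500; n = 1,000,000 gives ~0.4995: the threshold approaches 1/2.
```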

Second, consider that training is the same as looking for an n-dimensional point (one dimension for each weight) in a normalized unit cube. Ok, you’ve got it now: gradient descent (kind of) always works in high dimensions because high dimensions mean a damn big number of possible directions and quasi-solutions, so many that, by the pigeonhole principle, you can’t really have dead ends or swamp traps as you would in low dimensions.
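One standard way to make this intuition concrete (an illustration in my own framing, not a proof that gradient descent works): model the Hessian at a random critical point as a random symmetric matrix. The probability that all its eigenvalues are positive - i.e. that the point is a genuine dead end rather than a saddle with an escape direction - collapses as the dimension grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def frac_dead_ends(n, trials=2000):
    """Monte Carlo estimate of the probability that a random symmetric n x n
    'Hessian' is positive definite, i.e. that a random critical point is a
    true local minimum (a trap) rather than a saddle."""
    hits = 0
    for _ in range(trials):
        a = rng.standard_normal((n, n))
        h = (a + a.T) / 2                       # random symmetric matrix
        if np.all(np.linalg.eigvalsh(h) > 0):   # all eigenvalues positive?
            hits += 1
    return hits / trials

for n in [1, 2, 5, 10, 20]:
    print(f"n = {n:>2}: fraction of trap-like critical points ~ {frac_dead_ends(n):.3f}")
# ~0.5 at n = 1, essentially 0 by n ~ 10-20: in high dimensions almost every
# critical point of this toy model has an escape direction.
```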

Third, you understand that’s all wrong and you were right from the start: what we thought were solutions frequently present bizarre statistical properties (think adversarial examples), and you need to rethink what generalization means. But that’s for another reference:

https://dl.acm.org/doi/abs/10.1145/3446776

Pardon the half-sneering tone, but old Nan can’t resist: « Oh, my sweet summer child, what do you know of fearing noob gains? Fear is for AI winter, my little lord, when the vanishing gradient problem was a hundred feet deep and the ice wind came howling out of funding agencies, cutting every budget, dispersing the students, freezing the sparse spared researchers… »

Seriously, three years is just one data point, and you want to draw conclusions about the rate of change! I guess you would agree that 2016-2022 saw more gains than 2010-2016, and not because the latter were boring times. I disagree that finding out what big transformers could do over the last three years was not a big deal, or that it was low-hanging fruit. I guess it was low-hanging fruit for you, because of the tools you had access to, and I interpret your post as a deep and true intuition that the next step will demand different tools (I vote for: « clever inferences from functional neuroscience & neuropsychology »). In any case, welcome to LessWrong and thanks for your valuable input! (even if old Nan was amazed you were expecting even faster progress!)
