Comments

I find this focus on task structure and task decomposition to be incredibly important when thinking about what neural networks are doing, what they could be doing in the future, and how they are doing it. Asking how a system understands/represents/instantiates task structures and puts them in relation to one another is, as far as I can tell, just a more concrete way of asking "what is it that this neural network knows? What cognitive abilities does it have? What abstractions is it making? Under what out-of-distribution inputs will it succeed or fail?" etc.

This comment isn't saying anything that wasn't in the post, just wanted to express happiness and solidarity with this framing!

I do wonder if the tree structure of which-task followed by task algorithm is what we should expect in general. I have nothing super concrete to say here; my feeling is just that the ways a neural network can represent structures and put them in relation to each other may be instantiated differently than a tree (with that specific ordering). The onus is probably on me here though - I should come up with a set of tasks in certain relations that aren't most naturally described with tree structures.

Another question that comes to mind: is there a hard distinction between categorizing which sub-task one is in and the algorithm that carries out the computation for a specific sub-task? Is it all just tasks all the way down?

I think you might need to change permissions on your GitHub repository?

The linked blog post says it's from August. Is there something new I'm missing?

This is so cool! Thanks so much, I plan to go through it in full when I have some time. For now, I was wondering if the red-circled matrix multiplication should actually be reversed, with the vector as a column (i.e. matrix*column instead of row*matrix). I know the end result is equivalent, but for consistency it seems like it should be switched: in every other example, a vector with a leg sticking out leftward is a column vector? Maybe this really doesn't matter, since I can just turn the page upside down and then b would be on the left with a leg sticking out to the right..., but the fact that A·b and bᵀ·Aᵀ are the same numbers, just read as a column vs. a row, is itself an interesting fact.
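
For concreteness, here's a tiny numpy check of the equivalence I mean (made-up values, not the ones in the post):

```python
import numpy as np

# Toy matrix and vector, purely illustrative.
A = np.array([[1., 2.],
              [3., 4.]])
b = np.array([5., 6.])

col = A @ b      # matrix * column: contracts A's second index with b
row = b @ A.T    # row * matrix-transpose: the same contraction, read as a row
assert np.allclose(col, row)
```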

Just to add to Carl Feynman's response, which I thought was good.

Part of the reason these systems are inefficient is that they (effectively) require you to run gradient descent even at inference time, even after training is over. Or you can run the RNN, which is mathematically equivalent, but again you can see where the inefficiency comes in: the value at time t=3 is a function of the value at t=2, which is a function of t=1, and so on, so to get the converged value of the activations you have to compute each timestep one by one, in a loop.

This is in contrast to a feedforward network like a (normal) convnet or transformer, which can run extremely quickly and in parallel on a GPU.
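
A minimal numpy sketch of the contrast I mean (toy shapes and a made-up update rule, not the actual model):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=64)                # toy input
W = rng.normal(size=(64, 64)) * 0.05   # toy recurrent weights

# Iterative inference: each step depends on the previous one, so the steps
# have to run one after another, even after training is over.
h = np.zeros(64)
for t in range(100):                   # loop until (approximate) convergence
    h = np.tanh(W @ h + x)

# Feedforward: a fixed stack of layers, each applied exactly once. Every layer
# is a single matrix multiply, which parallelizes well on a GPU, and there is
# no convergence loop at inference time.
W1 = rng.normal(size=(64, 64)) * 0.05
W2 = rng.normal(size=(64, 64)) * 0.05
y = np.tanh(W2 @ np.tanh(W1 @ x))
```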

Thanks! 

I think your reasoning makes sense, and if, for instance, on every timestep you presented a different image in a stereotyped sequence, or with a certain correlation structure, you would indeed get information about those correlations into the weights. However, this model was designed for the restricted setting where you show a single still image for many timesteps until convergence. In that setting, the weights give you image features for static images (in a hierarchical manner), and priors for low-level features feed back from activations in higher-level areas.
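
Very roughly, the inference I have in mind looks something like this (a toy, single-layer, Rao-and-Ballard-style sketch with made-up shapes and constants, not the model's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.normal(size=(256, 32)) * 0.1   # toy generative weights: features -> image
image = rng.normal(size=256)           # a single still image, held fixed

r = np.zeros(32)                       # latent feature activations
lr = 0.05
for t in range(500):                   # present the same image until convergence
    pred = U @ r                       # top-down prediction of the image
    err = image - pred                 # prediction error
    r += lr * (U.T @ err)              # nudge activations to reduce the error

# After convergence, r holds the inferred features for this static image; in the
# full hierarchical model, higher-level activations predict r in the same way,
# which is where the top-down priors come from.
```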

There are extensions to this model that deal with video, where explicit spatiotemporal expectations are built into the network. You can see one of those networks in this paper: https://arxiv.org/abs/2112.10048

But I've never implemented such a network myself.

First, brains (and biological systems more generally) have many constraints that artificial networks do not. Brains exist in the context of a physically instantiated body, with heavy energy constraints. Further, they exist in specific niches, with particular evolutionary histories, which have enormous effects on structure and function.

Second, biological brains have different types of intelligence from AI systems, at least currently. A bird is able to land fluidly on a thin branch in windy conditions, while GPT-4 can help you code. In general, the intelligences one thinks of in the context of AGI do not totally overlap with the varied, often physical and metabolic, intelligences of biology.

All that being said, who knows what future AI systems will look like.

Thanks so much for this comment (and sorry for taking ~1 year to respond!!). I really liked everything you said.

For 1 and 2, I agree with everything and don't have anything to add.

3. I agree that there is something about the input/output mapping that is meaningful, but it is not everything. It would be great to have a full theory of exactly what the difference is, and of what distinguishes structure that counts as interesting internal computation (not a great descriptor of what I mean, but I can't think of anything better right now) from input/output computation.

4. I also think a great goal would be generalizing and formalizing what an "observer" of a computation is. I have a few ideas, but they are pretty half-baked right now.

5. That is an interesting point. I think it's fair. I do want to be careful to make sure that any "disagreements" are substantive and not just semantic squabbling here. I like your distinction between representation work and computational work. The idea of using vs. performing a computation is also interesting. At the end of the day I am always left craving some formalism where you could really see the nature of these distinctions.

6. Sounds like a good idea!

7. Agreed on all counts.

8. I was trying to ask whether there is anything that tells us the output node is semantically meaningful without reference to, e.g., the input images of cats, or even knowledge of the input data distribution. Interpretability work, both in artificial neural networks and more traditionally in neuroscience, always uses knowledge of input distributions, or even input identity, to correlate the activity of neurons to the input, and in that way assigns semantics to neural activity (e.g., recently, Othello board states, or in neuroscience, Jennifer Aniston neurons or orientation-tuned neurons). But when I'm sitting down with my eyes closed and just thinking, there's no homunculus there with access to input distributions on my retina that can correlate some activity pattern to "cat." So how can the neural states in my brain "represent" (or embody, or whatever word you want to use) the semantic information of "cat" without this process of correlating to some ground-truth data? Where does "cat" come from when there's no cat there in the activity?!
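
To be concrete about what I mean by "correlating activity to the input," here is a toy, hypothetical version of that move (not any particular paper's method):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: one unit's activity across 1000 stimuli, plus a
# ground-truth label saying whether each stimulus was a cat.
is_cat = rng.random(1000) < 0.5
unit_activity = 2.0 * is_cat + rng.normal(size=1000)   # fires more for cats

# The interpretability move: correlate activity with the known input identity,
# then call this a "cat unit". Note that this step needs the labels (the ground
# truth about the inputs), which is exactly what my brain doesn't have access
# to when my eyes are closed.
r = np.corrcoef(unit_activity, is_cat.astype(float))[0, 1]
print(f"correlation with 'cat' label: {r:.2f}")
```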

9. SO WILD

Can you explain what you mean by second or third order dynamics? That sounds interesting. Do you mean e.g. the order of the differential equation or something else?

This is not obvious to me. It seems somewhat likely that the multimodality actually induces more explicit representations and uses of human-level abstract concepts; e.g., a Jennifer Aniston neuron in a human brain is multimodal.
