Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This is a write-up of work I did as an Open Philanthropy intern. However, the conclusions don't necessarily reflect Open Phil's institutional view.

Abstract

This post investigates the biological anchor framework for thinking about AI timelines, as espoused by Ajeya Cotra in her draft report. The basic claim of this framework is that we should base our estimates of the compute required to run a transformative model on our estimates of the compute used by the human brain (although, of course, defining what this means is complicated). This line of argument also implies that current machine learning models, some of which use amounts of compute comparable to that of bee brains, should have task performance similar to that of bees.

In this post, I compare the performance and compute usage of both bees and machine learning models at few-shot image classification tasks. I conclude that the evidence broadly supports the biological anchor framework, and I update slightly towards the hypothesis that the compute usage of a transformative model is lower than that of the human brain.

The full post is viewable in a Google Drive folder here.

Introduction

Ajeya Cotra wrote a draft report on AI timelines (Cotra, 2020) in which she estimates when transformative artificial intelligence might be developed. To do so, she compares the size of a transformative model (defined as the number of FLOP/s required to run it) with the computational power of the human brain, as estimated in this Open Phil report (Carlsmith, 2020)[1]. She argues that a transformative model would use roughly similar amounts of compute as the human brain. As evidence for this, she claims that computer vision models are about as capable as bees in visual tasks, while using a similar amount of compute.[2]

In this post, I (Eleni Shor) investigate this claim. To do so, I focus on the performance of bees at few-shot image classification, one of the most difficult tasks that bees are able to perform. I find that both their task performance and their compute usage are roughly comparable to current machine learning models. This is broadly consistent with Cotra’s claim that transformative models would have compute requirements similar to that of the human brain, as neither bees nor machine learning models are clearly superior at this task.

On the one hand, bees are more sample-efficient than machine learning models: they require orders of magnitude fewer examples to learn novel few-shot tasks, and they generalize impressively, transferring training on one task to perform well on related tasks. This, however, might be explained by bees having more task-relevant background knowledge. On the other hand, machine learning models seem to be more compute-efficient. However, there are many difficulties in usefully comparing the compute usage of biological brains and machine learning models, and, therefore, I am not very confident in this conclusion.

Notwithstanding these caveats, I perform a naïve estimate of the compute required to train a transformative model based on the relative compute-efficiency of bees and plug this estimate into the AI timelines model developed in (Cotra, 2020). I update slightly towards shorter timelines than Cotra’s median estimate of 2050.

This post is structured in the following manner. First, I describe few-shot image classification tasks and argue that they are relevant for estimating the compute usage of a transformative model. I then review the existing literature on the performance of bees and contemporary machine learning models on these tasks and observe that their accuracies are broadly comparable. Subsequently, I estimate the compute usages of both kinds of classifiers. I conclude by using the difference in the computing requirements to inform AI timeline estimates, noting relevant caveats along the way.

Few-shot image classification

In this section, I explain what few-shot image classification is and why this task is relevant to estimating the compute usage of a transformative model.

What is this task?

As the name suggests, few-shot image classification tasks are, well, image classification tasks: given a dataset with many labeled images, our goal is to predict the labels of novel images. What sets few-shot image classification apart is the small number of examples for each label.

Let’s consider a concrete example of a few-shot learning task. Consider the training dataset of animal pictures below, and assume that each image is labeled as a cat, lamb, or pig, as appropriate.

Source.

A classifier receives this dataset as input and, from it, learns a mapping from images to labels. This mapping can then be used to classify novel pictures like the image below.

Source.

Hopefully, the classifier correctly labels the image as a cat.

As mentioned previously, the number of training images for each label is small. In the example above, this number is only two. Additionally, the number of labels is large in comparison to the number of training images. In the example above, there are three labels (“cat”, “lamb”, and “pig”) and six training images. For contrast, in ImageNet, a popular dataset used as an image classification benchmark, there are thousands of training images per label.[3]

A classification problem with n labels and k examples per label is called an n-way k-shot learning problem. For instance, the aforementioned task is a three-way two-shot learning problem, since there are three labels and two example images for each label.

Why is this task relevant for forecasting AI timelines?

The goal of this post is to inform estimates about the size of a transformative model and, with such estimates, to forecast when such a model might be developed. For these purposes, a good task for comparing biological and artificial intelligences is one that tests capabilities that are plausibly relevant to having a transformative impact. To illustrate this, consider a task that doesn’t satisfy this criterion: arithmetic. Humans use a few quadrillion times more compute than your desktop to add numbers,[4] yet this fact doesn’t help anyone figure out how the compute needed to run a transformative model compares to that used by the human brain.

One task that seems relevant to having a transformative impact is meta-learning, the task of efficiently learning how to do other tasks. A model with powerful meta-learning capabilities could simply learn how to do transformative tasks and have a transformative impact that way. 

Few-shot learning can be understood as a meta-learning task. Each instance of a few-shot learning task is just a normal supervised learning task. However, due to the small amount of training data in each instance, learning how to classify the test image using only the examples seen in one instance is difficult. A few-shot classifier must either learn how to learn new tasks from scratch efficiently or how to apply knowledge obtained from related tasks in novel circumstances. In this post, I use performance at few-shot image classification as a proxy for meta-learning capabilities.

However, if a classifier has memorized a large number of labeled images and then is tested at a few-shot image classification task with images similar to the memorized ones, its performance at that task would not be very indicative of its meta-learning capabilities. In general, a classifier’s level of prior knowledge about a classification task strongly influences its performance. Controlling for the classifier’s prior knowledge is reasonably easy for machine learning models, since the creators of the model have full control over the training data. However, this is not the case for animals: it is hard to know to what degree evolution has baked in relevant prior knowledge into the design of animal brains. 

One might attempt to avoid this problem by considering only tasks that animals have likely not been selected to do. Doing this, however, leads us to a different problem: there’s no reason to expect that animals use their brains efficiently to solve tasks that they haven’t been selected to do. Consider the aforementioned example of humans and arithmetic. 

My subjective impression is that the few-shot image classification tasks that I discuss are in a happy medium between the two extremes, being both artificial enough that it seems unlikely that bees have much prior knowledge of the specific image categories used and natural enough that bees shouldn’t be extremely compute-inefficient at doing them; this, however, is debatable.

As the reader can discern from the above, the theoretical case for using few-shot image classification as a benchmark task is not very strong. The main reason I chose this task is practical: few-shot image classification is studied in both machine learning and bee vision, so comparing the performance of bees and AIs at this task is relatively straightforward.

In the remainder of this post, I mostly ignore these concerns, focusing on comparing the  performance and the compute usage of different classifiers at few-shot image classification tasks, regardless of their origin and workings.

 

Task performance

In this section, I describe the performance of bees and current machine learning models in few-shot image classification tasks that are roughly similar. I first broadly discuss the capabilities of bees and search the bee vision literature for studies where such tasks are investigated. I choose (Zhang et al., 2004) as a reference for the performance of bees at such tasks. Based on the specifics of that article, I choose (Lee et al., 2019) as the most comparable article in the machine learning literature. I conclude by briefly comparing the performance of both classifiers.

Bee performance

Bees can learn how to perform binary image classification through operant conditioning. Concretely, this is usually done by associating the images of a given category with a reward like sugar water, while leaving those belonging to the other category either unrewarded or paired with an aversive stimulus.[5] Variations on this basic experimental setup allow scientists to train bees to perform more complicated supervised learning tasks.[6]

Literature review

There is a vast literature on this topic; I have not thoroughly investigated it. Instead, I mainly rely on two literature reviews, (Avarguès-Weber et al., 2012) and (Giurfa, 2013), and I follow citation chains from them. Note that I found these reviews through a citation in (Rigosi et al., 2017), the bee vision paper cited in (Cotra, 2020), rather than through a systematic search.

My general impression from the bee vision literature is that bees are capable of learning surprisingly complicated tasks, ranging from recognizing specific faces (Dyer et al., 2005) to distinguishing between Monet and Picasso paintings (Wu et al., 2013). Bees learn how to do these tasks with a small amount of training data, usually fewer than ten examples, allowing for comparisons with few-shot learning in computer vision. However, many studies in this literature either do not test bees’ learning with novel test data or analyze tasks that are easier than standard machine learning benchmarks, such as detecting symmetry in glyphs (Giurfa et al., 1996).

Out of the articles I’ve looked at, I believe that (Zhang et al., 2004) is the one most relevant to comparing bees and machine learning models. In it, the authors train bees to recognize complex natural categories, like landscapes and plant stems, in a one-shot setting. This is one of the hardest tasks in the bee vision literature,[7] and it can be compared with few-shot image classification on datasets commonly used in machine learning such as ImageNet and CIFAR. I use this article as a benchmark for bee performance in image classification tasks.

I do not explain in detail how other articles compare to (Zhang et al., 2004) in the body of this post. The interested reader can instead see this appendix for details.

In the remainder of this section, I describe the results of the article I chose as a benchmark, (Zhang et al., 2004), and what they imply about the capabilities of bees.

Benchmark article

In (Zhang et al., 2004), bees were trained to recognize complex natural image categories (star-shaped flowers, circular flowers, plant stems, and landscapes) in a one-shot setting. Bees achieved an accuracy of about 60% at this task, which is considerably higher than the 25% accuracy obtained by guessing randomly. 

The experimental setup consists of a three-chambered maze, as depicted in the image below, taken from the paper. Bees enter the maze and reach chamber C1, where they see a sample image on the wall between chambers C1 and C2. They then fly through a small hole in that wall and reach chamber C2, where they observe chamber C3 through the transparent wall between these two chambers, depicted in the diagram below as a dotted line. The back wall of C3 displays four images, each with an associated tube. The tube associated with the image matching the sample leads to a feeder containing a sugary solution, while the others provide no reward to the bees. The bees fly through another small hole, this time between C2 and C3. The tube on which they land first is interpreted as their “prediction” of the reward’s location. This experimental setup is used for both training and testing bees’ image classification skills.

The transparency of the wall between C2 and C3 suggests that the bees make their choice at C2. As the chamber C3 is large, the images in its back wall would appear small to the bees. I believe that this had a significant negative impact on their accuracy, as I will discuss later.

The experiment consisted of three parts: a pre-training period where bees learned to enter and traverse the maze, a training period where bees learned to land on an image that exactly matched the sample, and a testing period where bees had to land on an image that matched the category of the sample. Note that the bees weren’t explicitly trained to do image classification; instead, they were trained to simply match images. Nevertheless, they were able to generalize and match images by category during testing. This makes their success at image classification significantly more impressive. In general, the learning capabilities of bees seem to me more impressive than that of machine learning models, as I will discuss later.

Throughout training and testing, the authors took precautions to ensure that bees were actually doing the matching task. The position of the rewarded image was frequently switched, so that bees couldn’t memorize it. Additionally, a control experiment in which all images matched the example was performed, which showed that bees couldn’t use olfactory cues to locate the reward.

Both training and pre-training took a substantial amount of time, with testing beginning after three days of training, during which each bee entered the maze and was rewarded about 80 times. This large number of training epochs is common in the bee vision literature; however, I’m not sure why. Although this article doesn’t plot learning curves, other articles, such as (Giurfa et al., 1996) and (Dyer et al., 2005), do so, showing that performance increases with more exposure. I’d wager that repeated exposure to the same images is required for bees to understand the task rather than to learn the categories themselves, but I’m not confident about this.

Training consisted of showing bees sample images and rewarding them by landing on the matching image, as mentioned previously. The training set (called “Group 1” in the paper), with one image of every category (star-shaped flowers, circular flowers, plant stems, and landscapes), is shown below.

Note that the star-shaped flower is very similar to the circular flower. 

Although the categories are natural, and bees might therefore have relevant prior knowledge about them, my impression is that the images are seen from a very different perspective than bees would usually see them, which would make this knowledge less relevant. Moreover, bees don’t classify the images with which they should be more familiar (e.g. flowers) more accurately, and their performance on the training set is not that impressive, as will be seen later. For these reasons, I do not think that this objection is that relevant.

After training, the authors measured how frequently bees landed in each image when shown a sample. The results of this learning test are shown in the graph below, taken from the paper.

(Note that the N in the plot above is the number of bees that participated in the learning test, not the number of visits to the maze; the latter is much higher, as 1132 visits took place.)

As the reader can see, bees matched the sample correctly about 60% of the time. Although this is significantly better than chance, this performance is not very impressive, given that this task is so simple. I believe that this poor performance can be explained by the small apparent size of the images.

I investigate the resolution of bee vision in some depth in this appendix, concluding that each eye has a resolution of 100x100 pixels. Since the images only occupy a fraction of the bees’ field of view, the effective resolution of each image is even lower than that. From the diagram of the experimental setup used in (Zhang et al., 2004), I estimate that the images appeared to the bees as having an effective resolution of 20x20 pixels. The details of this calculation can be found in this appendix.
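To convey the gist of that calculation, here is a minimal sketch in Python. The maze dimensions below are illustrative placeholders, not the paper’s actual measurements (the appendix uses the real geometry), but they show how a 100x100-pixel eye resolution gets cut down to an effective ~20x20:

```python
# Sketch of the effective-resolution estimate. The image width, viewing
# distance, and field of view below are illustrative assumptions.
import math

eye_resolution = 100        # pixels across the bee's field of view (from the appendix)
field_of_view_deg = 180.0   # assumed angular extent covered by those pixels

image_width_cm = 20.0       # hypothetical image width
viewing_distance_cm = 30.0  # hypothetical distance from chamber C2 to the back wall

# Angle subtended by the image at the bee's eye.
image_angle_deg = math.degrees(2 * math.atan((image_width_cm / 2) / viewing_distance_cm))

effective_resolution = eye_resolution * image_angle_deg / field_of_view_deg
print(f"~{effective_resolution:.0f} pixels across")  # ~20 with these numbers
```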

To understand intuitively how low this effective resolution is, compare the high-resolution image of a landscape in the left to the same image downscaled to 20x20 pixels on the right.

As the reader can see, the difference in image quality is dramatic. The performance of bees in the image classification task is very similar to their performance in this simple matching task, as will be seen later. I think that these considerations point to low resolution being the cause of poor performance in the learning test, and that this poor performance is not very indicative of weak learning abilities.
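For readers who want to reproduce this side-by-side comparison, a minimal sketch using Pillow is below; “landscape.jpg” is a placeholder for any landscape photo:

```python
# Downscale an image to 20x20 pixels, then blow it back up (without
# smoothing) so it can be viewed next to the original.
from PIL import Image

img = Image.open("landscape.jpg")  # hypothetical filename
low_res = img.resize((20, 20), resample=Image.NEAREST)
low_res_viewable = low_res.resize(img.size, resample=Image.NEAREST)
low_res_viewable.save("landscape_20x20.jpg")
```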

After this learning test, the authors performed two types of transfer tests. Type 1 transfer tests consisted of the same matching-to-sample task, but with novel images, as depicted below.

 

The study finds that bees do as well with the novel images as with the training set, as can be seen in the plots below:

Type 2 transfer tests require that bees match the category of the sample, rather than the exact image. The sample image, which indicates the category to be matched, comes from the training set, while the images to be classified come from a novel group of images, depicted below.

The results of the Type 2 transfer test are shown in the plot below:

As can be seen, the bees classified the images correctly about 60% of the time, which is very similar to the performance of bees in the learning tests. I am impressed by these results, especially since the bees were not trained specifically to do image classification, but instead they were only trained to find exact matches.

Note that the bees were rewarded for landing on the correct image during testing. This is common in such experiments; otherwise, bees might give up on the classification task and stop entering the maze, seeing that their efforts go unrewarded. However, this means that some learning might occur during testing, since each test consists of many visits of each bee to the maze. In the Type 2 transfer test above, for instance, each bee visited the maze around 28 times. In order to minimize learning during testing, the authors broke up testing into short bouts in which each bee visited the maze only twice. These testing bouts were spaced out and interleaved with periods of training on the original matching-to-sample task.

Machine learning model performance

In this section, I review the machine learning literature, searching for a model that can be compared to bees in the experiments of (Zhang et al., 2004). I find the model presented in (Lee et al., 2019) to be the most comparable. I then analyze its performance.

Literature review

How do machine learning models compare to the few-shot learning performance of bees shown in (Zhang et al., 2004)? There is a huge variety of models trained on few-shot learning tasks; the first step in this comparison, therefore, is choosing models that can be reasonably compared to bees. My approach is to choose models that are trained with similar data and that have similar performance to that of bees; the comparison then reduces to seeing whether bees or ML models use more compute.

In order to find such comparable models, I looked at a list of few-shot learning benchmarks on Papers With Code. Each of these benchmarks specifies a few-shot learning task on some dataset. I chose a dataset based on the resolution of the images, as this is very relevant to a model’s compute usage. As mentioned previously, the effective resolution of the images used in (Zhang et al., 2004) is 20x20 pixels; a well-known dataset with a similar resolution is CIFAR-FS, introduced in (Bertinetto et al., 2018), which contains 32x32 images from a hundred different categories ranging from airplanes to cats.

I believe that the images in (Zhang et al., 2004) are a bit harder to classify than those of the CIFAR-FS dataset, since the two flower categories are very similar. Additionally, I have a vague impression that the images in the CIFAR-FS were selected to be distinguishable at low resolution, while I don’t think that was the case in (Zhang et al., 2004), but I’m very uncertain about this.

The most comparable machine learning task, then, is few-shot learning on the CIFAR-FS dataset. However, ML models are usually trained to do five-way classification, that is, to classify an image from one of five possible labels, while bees perform four-way classification in (Zhang et al., 2004). All else equal, four-way classification tasks are easier than five-way ones, since there are fewer wrong options. 

A way of comparing accuracies across n-way tasks with different n would simplify comparing bees and ML models. I’m not aware of a standard way of doing this. In order to do so, therefore, I develop a toy model of a few-shot classifier. I model an n-way classifier as an ensemble of n binary classifiers, one for each of the possible labels. I assume that these binary classifiers have some fixed error rate, which is a measure of the performance of the ensemble independent of n. By relating the accuracy of the ensemble to the error rate of the binary classifiers, one can estimate the accuracy that bees would have in a five-way classification task. The mathematics of this model can be found in this appendix. I estimate that the 60% accuracy in a four-way classification task obtained by bees in (Zhang et al., 2004) corresponds to an accuracy of about 55% in a five-way image classification task.
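To give a flavor of how such a conversion can work, here is a sketch of one possible implementation of the toy model; the appendix’s exact treatment may differ in its details. Each binary classifier “fires” independently (the correct label’s classifier with probability 1 − e, each wrong label’s with probability e), the prediction is drawn uniformly from the classifiers that fire, and the classifier guesses uniformly among all labels if none fire:

```python
# Toy model: an n-way classifier as an ensemble of n binary classifiers
# with a shared error rate e. This is one plausible reading of the model;
# the appendix's exact formulation may differ.
from math import comb

def ensemble_accuracy(n: int, e: float) -> float:
    p = 1 - e  # probability that the correct label's classifier fires
    acc = 0.0
    for k in range(n):  # k = number of wrong-label classifiers that fire
        p_k = comb(n - 1, k) * e**k * (1 - e)**(n - 1 - k)
        acc += p * p_k / (k + 1)  # correct classifier fires and wins the (k+1)-way tie
    acc += (1 - p) * (1 - e)**(n - 1) / n  # no classifier fires: uniform guess
    return acc

def calibrate_error_rate(n: int, target_acc: float) -> float:
    lo, hi = 0.0, 1.0  # accuracy is decreasing in e, so bisect
    for _ in range(60):
        mid = (lo + hi) / 2
        if ensemble_accuracy(n, mid) > target_acc:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

e = calibrate_error_rate(4, 0.60)  # fit e to the bees' 60% four-way accuracy
print(ensemble_accuracy(5, e))     # ~0.54, in line with the ~55% figure above
```

Under these assumptions, the binary error rate that reproduces the bees’ 60% four-way accuracy is about 0.21, which translates into roughly 54% accuracy on a five-way task.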

The results of ML models on five-way one-shot image classification in CIFAR-FS are listed here; accuracies range from about 70% to about 90%, significantly higher than our 55% estimate for bee performance. However, my impression is that the images from different categories in (Zhang et al., 2004) are a bit more similar to each other than those in CIFAR-FS, so the comparison isn’t as clear a win for machine learning models as one might think at first. Litigating this comparison precisely seems difficult and not very fruitful; therefore, I chose the model with accuracy most similar to bees as my benchmark, which turns out to be the model with the lowest accuracy.[8] This is the model found in (Lee et al., 2019), which has an accuracy of 72.8%.

 

Compute usage

In this section, I estimate how much computation the machine learning model of (Lee et al., 2019) uses to classify an image, as well as the compute usage of bees in the experiments of (Zhang et al., 2004). I then compare these two figures. 

Machine learning model compute usage

The architecture of the model presented in (Lee et al., 2019) consists of two parts: a function that maps images to feature vectors and a linear classifier that uses the feature vectors to classify the images. During training, the function is optimized to learn features that generalize well. The first part is implemented as a convolutional neural network, while a linear support-vector machine serves as the linear classifier.

After the model has processed the sample images for a given task, it needs only to calculate the features of the task image and feed them to the linear classifier. The most compute-intensive part of this is calculating the features; this was done using a convolutional neural network with a ResNet-12 architecture. As the authors do not provide an estimate of the compute per forward pass used by their model, I estimate that value by extrapolating the compute estimates for a similar architecture and scaling them to account for differences in network depth and input size. One could also estimate the compute usage directly from the description of the network given in the paper; this would probably lead to a more accurate estimate, but I don’t know how to do this.

I chose the ResNet-18 architecture as a point of comparison. A forward pass through a ResNet-18 model with an input size of 112x112 pixels requires about 5e8 FLOP;[9] since the compute per forward pass should scale linearly with both the network depth and the input size in pixels, the model used in the paper, a ResNet-12 with an input size of 32x32, should take about 5e8 FLOP * (12/18) * (32/112)^2 ≈ 3e7 FLOP per image classified.
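The same estimate, written out as arithmetic (this just reproduces the scaling argument above):

```python
# Scale the ResNet-18 compute estimate down to the paper's ResNet-12,
# assuming compute scales linearly with depth and with input pixel count.
resnet18_flop = 5e8         # FLOP per forward pass at 112x112 input (footnote 9)
depth_ratio = 12 / 18       # ResNet-12 vs. ResNet-18
area_ratio = (32 / 112)**2  # 32x32 vs. 112x112 input

flop_per_image = resnet18_flop * depth_ratio * area_ratio
print(f"{flop_per_image:.0e}")  # ~3e7 FLOP per image classified
```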

Bee compute usage

Estimating the compute usage of bee brains is a somewhat speculative endeavor. It’s not immediately clear what “the number of FLOP/s of a bee brain” means, since brains certainly don’t seem to be doing floating-point arithmetic. However, we can still ask what would be the amount of computational power required to simulate a brain in sufficient detail in order to replicate its task performance. I’ll use this “mechanistic” definition of brain compute throughout the report. The question of how to estimate this “mechanistic” brain compute is investigated in (Carlsmith, 2020) in the context of the human brain; by applying a similar methodology to the bee brain, we can estimate its computational abilities.

To estimate the amount of compute required for such a simulation, I follow the basic approach described in (Carlsmith, 2020). In brief, it seems likely that the most computationally intensive part of emulating a brain is simulating the signaling interactions between neurons.

The amount of compute required to emulate these interactions can be broken down into two parts: that devoted to simulating the synapses (i.e., the connections between neurons) and that devoted to determining when a neuron should fire. The total bee brain compute can then be expressed as the sum of these components. The synaptic transmission compute can be calculated as (number of synapses) * (rate of synaptic firing) * (compute per synaptic firing), whereas the firing decision compute is equal to (number of neurons) * (firing decision compute per neuron per second).

I don’t have a background in neuroscience, but it seems to me that the basic anatomy of the neuron is broadly similar between bees and humans. Therefore, I will assume that the parameters of bee neurons—that is, the rate of synaptic firing, the compute per synaptic firing, and the firing decision compute per neuron—are equal to those of human neurons. I’m very uncertain whether this assumption is reasonable or not.[10] These parameters are estimated in (Carlsmith, 2020), which obtains a range of 0.1-1 Hz for the rate of synaptic firing, 1-100 FLOP per synaptic firing, and 1e2-1e6 FLOP/s for the firing decision compute per neuron per second. All that remains is to plug in estimates for the number of neurons and synapses in the brains of bees and we’ll be able to estimate the total computing power of the bee brain. (Menzel & Giurfa, 2001)[11] state that bees have a million neurons and a billion synapses; this works out to a central estimate of synaptic communication compute of 3e9 FLOP/s[12] and a central estimate of firing decision compute of 1e10 FLOP/s.[13]

Of course, not all of the computational power of the bee’s brain is applied to few-shot image classification tasks. My understanding is that the sections of the bee’s brain mostly responsible for visual processing as well as learning are the mushroom bodies, which occupy about a third of the bee’s brain.[14] It seems likely, therefore, that a better central estimate of the total amount of task-relevant computation per second in the bee’s brain is 4e9 FLOP/s.[15]

In order to turn the bee brain compute value into an estimate of the compute usage per image classified, we need to know how long bees take to perform this task. Unfortunately, (Zhang et al., 2004) don’t specify this explicitly. However, we can estimate the time that bees spend in the experimental setup from some details given in the paper. The authors mention that “[e]ach transfer test was carried out only for a brief period (ten minutes, involving about two visits per bee)”; since the transfer tests involved around ten bees as experimental subjects, I believe that this implies that the time taken per visit is about (10 min) / [(2 visits/bee) * (10 bees)] = 30 s/visit. I assume that there’s some time between each bee’s visit, so the total time each bee spends in the experimental setup should be somewhat smaller than that. The time each bee spends to classify an image must be even smaller than that. Somewhat arbitrarily, I estimate that this time is around five seconds.

The compute employed by bees to classify an image, then, can be found by multiplying the computing power of the bee brain used in this task by the time taken to do so. This works out to (4e9 FLOP/s) * (5 s) = 2e10 FLOP per image classification task.
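Putting the whole bee-side estimate together (a sketch using the central values quoted above):

```python
# Bee-side compute estimate, using central values from (Carlsmith, 2020)
# and the neuron/synapse counts from (Menzel & Giurfa, 2001).
num_synapses = 1e9
synaptic_rate_hz = 0.3         # central estimate of the 0.1-1 Hz range
flop_per_synaptic_event = 10   # central estimate of the 1-100 FLOP range
num_neurons = 1e6
firing_decision_flop_s = 1e4   # FLOP/s per neuron, central estimate

synaptic_compute = num_synapses * synaptic_rate_hz * flop_per_synaptic_event  # 3e9 FLOP/s
firing_compute = num_neurons * firing_decision_flop_s                         # 1e10 FLOP/s
brain_compute = synaptic_compute + firing_compute                             # 1.3e10 FLOP/s

task_fraction = 1 / 3    # mushroom bodies' share of the bee brain
seconds_per_image = 5    # rough guess at time spent classifying one image

flop_per_classification = brain_compute * task_fraction * seconds_per_image
print(f"{flop_per_classification:.0e}")  # ~2e10 FLOP per image classified
```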

It’s clear, however, that a bee’s brain can perform a wide range of tasks besides few-shot image classification, while the machine learning model developed in (Lee et al., 2019) cannot. I do not take this into account in my comparison below; in fact, I am not sure how to do so. This is probably the factor that most biases this comparison against bees.[16]

However, there is some reason to believe that the above estimate is not wildly unfair to bees: in (Carlsmith, 2020)’s discussion of functional estimates of brain compute, the human visual cortex is compared with image classification models trained in a regular (not few-shot) setting.[17] He considers that 0.3%–10% of the visual cortex is used by the human brain to perform image classification and adjusts for the fact that image classifiers are worse than humans at some aspects of image classification. These guesses for visual cortex usage can be taken as guesses for the usage of the mushroom bodies. Since bees appear to be worse than machine learning models at this task, even Carlsmith’s lowest compute estimates suggest that bees are at least one order of magnitude less efficient than machine learning models, which gives me some confidence that this comparison is not totally uninformative.

 

Conclusion

In this report, I show that both bees and computer vision models are able to perform very similar few-shot image classification tasks. The efficiency with which they perform these tasks, however, differs somewhat: my central estimate is that bees use three orders of magnitude more computation to do them. Although this might seem like a very large difference, it is comparable to the uncertainties in the biological compute estimates: (Carlsmith, 2020) posits an uncertainty of two orders of magnitude in his brain compute estimate. 

In addition, the comparison performed in this post is stacked against bees in various ways.  Bees have not evolved to perform few-shot image classification tasks, and, in fact, have not even been trained specifically on this task in (Zhang et al., 2004), while the machine learning model analyzed here was optimized to do just that. The behavioral repertoire of bees is vast, and I have only analyzed one task in this report. Perhaps most importantly, bees seem to be significantly more sample-efficient than machine learning models; that is, they are able to learn how to perform a task with a smaller number of examples. 

My intuition is that within-lifetime learning in bees (as opposed to the “learning” that occurs during evolution) is more analogous to the runtime learning capabilities of some ML models, like GPT-3, than to training-time learning, which is another factor that might lead these estimates to understate the learning prowess of bees. Comparing these capabilities to those of CLIP and Image GPT would be very interesting, but it is outside the scope of this post.

Even with all of these caveats, however, I believe that this analysis is informative for updating the estimates of the compute required for creating a transformative model.

Very naïvely, we can adjust these estimates in the following fashion: since bees are three orders of magnitude less compute-efficient than computer vision models, our prior should be that a transformative model would require roughly three orders of magnitude less compute than the human brain. I don’t think it’s obvious in what direction we should adjust this prior, so I’ll stick to it. As the human brain can perform the equivalent of 1e13-1e17 FLOP/s, we should then expect that a transformative model would require 1e10-1e14 FLOP/s to run. This is somewhat smaller than the central estimate of 1e16 FLOP/s found in (Cotra, 2020).

As mentioned previously, however, this discrepancy is relatively small when compared to the uncertainties involved; therefore, I believe that this investigation somewhat validates the biological anchor approach to investigating AI timelines. After all, if the discrepancy in performance per FLOP were much larger, extrapolations based on comparisons between biological anchors and machine learning models would be much more suspect, but that did not turn out to be the case.

I was somewhat surprised by this result; before doing this investigation, I believed that bees would do better than machine learning models in terms of compute. This has updated me towards shorter timelines. In order to get an idea of how much this should shift my timelines, I plugged in the naïve estimate of transformative model size into Cotra’s TAI timelines model, using her best guesses for all other parameters. The interested reader can browse the resulting distribution and adjust the model parameters here; the main difference, however, is that the median forecast for TAI goes from around 2050 to around 2035, making timelines substantially shorter. I would advise the reader to make a much smaller update to their timelines, since I am very uncertain about the conclusions of this report, as I have hopefully made clear. I would tentatively adjust Cotra’s forecast to 2045, holding constant all other parameters of her model.  Notwithstanding that, very short timelines seem more plausible to me after concluding this investigation.

 

Bibliography

Avarguès-Weber, A., Mota, T., & Giurfa, M. (2012). New vistas on honey bee vision. Apidologie, 43(3), 244-268. https://hal.archives-ouvertes.fr/hal-01003646

Bertinetto, L., Henriques, J., Torr, P. H.S., & Vedaldi, A. (2018). Meta-learning with differentiable closed-form solvers. International Conference on Learning Representations. https://arxiv.org/abs/1805.08136

Carlsmith, J. (2020). How Much Computational Power Does It Take to Match the Human Brain? Open Philanthropy Project. https://www.openphilanthropy.org/brain-computation-report

Cotra, A. (2020). Forecasting TAI with biological anchors. LessWrong. https://www.lesswrong.com/posts/KrJfoZzpSDpnrv9va/draft-report-on-ai-timelines

Dyer, A. G., Neumeyer, C., & Chittka, L. (2005). Honeybee (Apis mellifera) vision can discriminate between and recognise images of human faces. Journal of Experimental Biology, 208, 4709-4714. https://jeb.biologists.org/content/208/24/4709.short

Giurfa, M. (2013). Cognition with few neurons: higher-order learning in insects. Trends in neurosciences, 36(5), 285-294. https://pubmed.ncbi.nlm.nih.gov/23375772/

Giurfa, M., Eichmann, B., & Menzel, R. (1996). Symmetry perception in an insect. Nature, 382, 458–461. https://www.nature.com/articles/382458a0

Lee, K., Manji, S., Ravichadran, A., & Soatto, S. (2019). Meta-Learning with Differentiable Convex Optimization. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 10657-10665. https://openaccess.thecvf.com/content_CVPR_2019/html/Lee_Meta-Learning_With_Differentiable_Convex_Optimization_CVPR_2019_paper.html

Menzel, R., & Giurfa, M. (2001). Cognitive architecture of a mini-brain: the honeybee. Trends in Cognitive Sciences, 5(2), 62-71. https://www.cell.com/trends/cognitive-sciences/fulltext/S1364-6613(00)01601-6?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS1364661300016016%3Fshowall%3Dtrue

Rigosi, E., Wiederman, S. D., & O'Carroll, D. C. (2017). Visual acuity of the honey bee retina and the limits for feature detection. Scientific Reports, 7. https://www.nature.com/articles/srep45972

Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A. C., & Li, F.-F. (2014). ImageNet Large Scale Visual Recognition Challenge. arXiv preprint. https://arxiv.org/abs/1409.0575

Wu, W., Moreno, A. M., Tangen, J. M., & Reinhard, J. (2013). Honeybees can discriminate between Monet and Picasso paintings. Journal of Comparative Physiology A, 199(1), 45-55. https://link.springer.com/article/10.1007/s00359-012-0767-5

Zhang, S., Srinivasan, M. V., Zhu, H., & Wong, J. (2004). Grouping of visual objects by honeybees. Journal of Experimental Biology, 207, 3289-3298. https://jeb.biologists.org/content/207/19/3289.short

 

Appendices

Why (Zhang et al., 2004)?

In my non-comprehensive review of the bee vision literature, I looked at a couple of articles that train bees to do image classification tasks. The literature review in (Avarguès-Weber et al., 2012) cites (Giurfa et al., 1996) and (Zhang et al., 2004), two papers that do so. Searching the papers that cite (Zhang et al., 2004), I found another relevant article, (Wu et al., 2013). I also came across (Dyer et al., 2005), but, unfortunately, I do not recall how. I also came across some papers that did simpler image classification tasks, such as (Srinivasan & Lehrer, 1988), but, since the tasks bees perform in these papers are much easier than those in the aforementioned ones, I did not consider them as possible benchmarks for comparing bee vision and machine learning.

(Giurfa et al., 1996) trains bees to classify abstract glyphs on the basis of their bilateral symmetry or absence thereof. The bees perform very well at this task, with accuracies of about 85%. However, this task is quite simple when compared to few-shot classification in datasets used in ML; therefore, I rejected this task as a possible benchmark.

(Dyer et al., 2005) trains bees to recognize specific human faces. Unfortunately, the paper does not do this in a few-shot learning setting. Although the success of bees in this task is very impressive to me, this success is not as relevant to estimating the size of a transformative model; therefore, I rejected this task as a possible benchmark. The reader interested in the results of this paper should look at this appendix.

(Wu et al., 2013) trains bees to distinguish between Monet and Picasso paintings in a few-shot setting. This article was a very strong candidate for the benchmark paper; I chose (Zhang et al., 2004) instead because the dataset used in the latter paper is more similar to those usually used in ML.

 

Results of (Dyer et al., 2005)

In this article, the authors trained bees to distinguish between different human faces. This task seems quite impressive to me, as I would not have predicted that bees would be capable of such a feat before learning of this paper. The training procedure was also remarkably simple: bees were trained to distinguish between a photograph of a specific person (the rewarded stimulus) and a stylized representation of a face (the punished stimulus), as can be seen in the images below from the article:

Note that the authors didn’t train bees specifically to distinguish between different human faces—and yet they were still able to do it anyway, as can be seen in the plot below of the results from the paper:

A weakness of this paper is that it does not use different photos of the same person to check if bees are just memorizing the images or if they are doing something more clever than that. However, my impression is that, overall, bee memory is not very good, so it seems to me that this would be an impressive memorization feat for a bee.

This paper also provides a learning curve, allowing us to observe the improvement in the performance of bees over time, as can be seen in the below plot. Note that “In a given [foraging] bout, the mean (± s.e.m.) number of choices by the bees was 10.0 ± 1.7”, so the learning curve shown below spans a range of about a hundred choices per bee.

Once again, we observe a high number of epochs required for learning; performance begins to be reasonable after 30 epochs of training on the same data.

 

Footnotes

[1] Strictly speaking, (Carlsmith, 2020) estimates the compute required to simulate the human brain to a level of detail required to replicate its task performance. I will refer to such figures as “brain compute estimates” and use similar language throughout this report.

[2] See (Cotra, 2020, Part 1, pg. 44). 

[3] See (Russakovsky et al., 2014, p. 9, Table 2).

[4] One FLOP suffices for adding two six-digit numbers, but the human brain uses some significant fraction of its computing power for a couple of seconds to do this. (Carlsmith, 2020) estimates that the human brain performs a quadrillion FLOP per second. Therefore, the compute used by the brain to do addition is a couple quadrillion FLOP.

[5] See (Srinivasan & Lehrer, 1998) for a representative example of such an experimental setup.

[6] The term “supervised learning” is common in machine learning; a similar idea in psychology is concept learning. This latter term appears frequently in the bee vision literature.

[7] My impression is that the bee vision literature focuses more on artificial visual stimuli, such as stylized images and gratings. I believe that (Zhang et al., 2004) is the first paper to show that bees are able to classify more natural images: “[s]o far, however, there have been no studies investigating the ability of invertebrates to classify complex, natural objects.” (Zhang et al., 2004, pp. 3289-3290). 

[8] This was true as of December 2020, when I performed this analysis. As of March 2021, there are now a couple of models with performance closer to bees’ than (Lee et al., 2019). I have not revised my analysis to take these models into account.

[9] This was calculated by dividing the compute per batch by the batch size from these estimates for images of size 112x112.

[10] There is evidence (see 1, 2) that smaller animals have greater temporal resolutions than larger ones, which suggests that rate of synaptic firing might be higher in bees than in humans. Thanks to Carl Shulman for pointing this out.

[11] I found this paper from Wikipedia’s list of animals by number of neurons entry for honey bees.

[12] The central estimates are 3e-1 Hz for the rate of synaptic firing, 1e1 FLOP per synaptic firing, and 1e9 synapses. Multiplying these together, we obtain 3e9 FLOP/s due to synaptic communication. 

[13] The central estimates are 1e4 FLOP/s per neuron for the firing decision compute and 1e6 for the number of neurons. Multiplying these together, we obtain 1e10 FLOP/s due to firing decisions.

[14] See this reference, which states that the mushroom bodies in honeybees contain a third of the neurons of their brains. It might be the case that the mushroom bodies have a higher number of synapses per neuron than other regions of the bee brain, leading to an underestimate of task-relevant compute.

[15] This is one-third of the total estimate of bee brain compute (1.3e10 FLOP/s). Of course, probably only  a fraction of the compute of the mushroom bodies is task-relevant; I assume that this fraction is high, so that the estimate given doesn’t overestimate the true value by much.

[16] Thanks to Carl Shulman and Rohin Shah for raising this objection.

[17] Note that this comparison should favor machine learning models more heavily than the comparison done in this post, as presumably a machine learning model trained to distinguish between some set of labels (and only capable of doing that) should be more compute efficient than a model that must learn categories with a small number of examples during “runtime” (as is the case with a few-shot learning model).

Comments

It’s clear, however, that a bee’s brain can perform a wide range of tasks besides few-shot image classification, while the machine learning model developed in (Lee et al., 2019) cannot.

The abstract objection here is “if you choose an ML model that has been trained just for a specific task (few-shot learning), on priors you should expect it to be more efficient than an evolution-designed organism that has been trained for a whole bunch of stuff, of which the task you’re considering is just one example”. This can cash out in several ways:

1. Bees presumably use their eyesight for all sorts of things (e.g. detecting movement, depth estimation, etc) whereas the ML model only needs to use features useful for classification.

2. Even restricting to classification, bees are probably way more general than ML models. I suspect the few-shot ML models you look at would probably not generalize to very different types of images (e.g. stylized images), whereas bees presumably would.

3. I don’t know how well bee training works; it seems plausible to me that sometimes bees just want to do something else and that’s why they don’t land on the right target. So maybe they can classify with 90% accuracy, but only actually go to the correct lever 60% of the time.

You could say that this means that a transformative model will also be way more efficient than a human at a transformative task -- I think this is somewhat true, but mostly depends on whether transformative tasks will require a fairly broad and general understanding of the world + doing various tasks within it. My guess is that while they don’t need as much generality as a human has, the human-TAI ratio will be much smaller than the bee-image classifier ratio.

My rough guess is that I’d change the conclusion by a factor of 50 or so, suggesting that a transformative model would require 20x less compute than the human brain.

I mostly agree with your comment, but I'm actually very unsure about 2 here: I think I recall bees seeming surprisingly narrow and bad at abstract shapes. Guille would know more here.

I think Rohin's second point makes sense. Bees are actually pretty good at classifying abstract shapes (I mention a couple of studies that refer to this in the appendix about my choice of benchmark, such as Giurfa (1996)), so they might plausibly be able to generalize to stylized images.

Hey guicost. This is a great first post, thanks for the fascinating doc!

For ease of readability for users, I've copied the full doc into the post (I'm an admin so I have the power to edit people's posts). If you disprefer that for any reason, please let me know right away and I'll undo the change, or you can just delete everything yourself. I spent <15 mins on it, it's no big deal.

The only edits I made were to put all the footnotes at the bottom (we don't have superscript so footnotes have the slightly uglier [n] styling) and some of the images were a bit funny when I brought them over (e.g. some of them became bigger images with other stuff in them) so I just inserted screenshots instead. Again, it's fully your post, and you are free to make whatever further edits you like.

Hey Ben! Thanks for formatting the doc into the post, it looks great!

You're welcome :)

Aww thanks Ben, that was really nice of you!

Planned summary for the Alignment Newsletter:

The <@biological anchors approach@>(@Draft report on AI timelines@) to forecasting AI timelines estimates the compute needed for transformative AI based on the compute used by animals. One important parameter of the framework is needed to “bridge” between the two: if we find that an animal can do a specific task using X amount of compute, then what should we estimate as the amount of compute needed for an ML model to do the same task? This post aims to better estimate this parameter, by comparing few-shot image classification in bees to the same task in ML models. I won’t go through the details here, but the upshot is that (after various approximations and judgment calls) ML models can reach the same performance as bees on few-shot image classification using 1,000 times less compute.

If we plug this parameter into the biological anchors framework (without changing any of the other parameters), the median year for transformative AI according to the model changes from 2050 to 2035, though the author advises only updating to (say) 2045 since the results of the investigation are so uncertain. The author also sees this as generally validating the biological anchors approach to forecasting timelines.

Planned opinion:

I really liked this post: the problem is important, the approach to tackle it makes sense, and most importantly it’s very easy to follow the reasoning. I don’t think that directly substituting in the 1,000 number into the timelines calculation is the right approach; I think there are a few reasons (explained [here](https://www.alignmentforum.org/posts/yW3Tct2iyBMzYhTw7/how-does-bee-learning-compare-with-machine-learning?commentId=rcJuytMfdQNMb82rR), some of which were mentioned in the post) to think that the comparison was biased in favor of the ML models. I would instead wildly guess that this comparison suggests that a transformative model would use 20x less compute than a human, which still shortens timelines, probably to 2045 or so. (This is before incorporating uncertainty about the conclusions of the report as a whole.)

You compare inference computation, but the majority of resources are applied at training time. I believe the real number for hardware might be closer to the result you got for bees.

I have one query.  How much better is it possible to do on this task?  It bothers me that, with the resolution stripped down and the task given to a being that only knows these training examples, it may simply not be very solvable, making these low accuracies the result of algorithms barely better than chance.

Also note that for ResNet-12 - or other variants of ResNet - there exist numerous techniques for cutting down the computational requirements by at least an order of magnitude with minimal accuracy loss.

The current SOTA models do very well (~90% accuracy) at few-shot learning tasks in the CIFAR-FS dataset [source], which has a comparable resolution to the images seen by bees, so I think that this task is quite solvable. Even bees and the models I discussed seem to do pretty well compared to chance. 

Interesting to learn that compute figures can be brought down so much without accuracy loss! Could you point me to some reading material about this?

Two methods I have personally used:

quantization to int-8

model compression.

A third way is "sparse" networks - many of the weights end up being near zero, and you can simply neglect those, but you need your hardware to support sparse matrix convolution.  

All of these methods have the tradeoff of a small decrease in accuracy for a large decrease in required compute.

And my point about "solvability" is that there is a certain amount of noise - entropy - in the images, such that a perfect classifier trained only on the image set, with infinite compute and the globally maximally performing model, still cannot reach 100%, as the finite set doesn't have enough information.  (And no, you cannot deduce the 'seed' of our universe and play it forward until that moment, as you do not have enough information to do that, even with infinite compute, at least if your only information input is the image set.  You would find too many other universes that match the conditions.  Human beings trying to manually solve the image aren't a fair comparison, because they are bringing in outside information that wasn't in the set.)

So there is some true ceiling for any regression problem, and you would actually expect that a 'good' modern method might be acceptably close to the ceiling, or get there soon.  (if the 'true ceiling' is 97% accuracy a model that is 95% is good enough for engineering purposes)

Or a simple example : for a mostly fair coin, you cannot infer the future outcome of a flip better than the bias of the coin itself.

This is pretty interesting. There is a lot to quibble about here, but overall I think the information about bees here is quite valuable for people thinking about where AI is at right now and trying to extrapolate forward.

A different approach, perhaps more illuminating, would be to ask how much of a bee's behavior we could plausibly emulate today by globbing together a bunch of different ML algorithms into some sort of virtual bee cognitive architecture - if, say, we wanted to make a drone that behaved like a bee a la Black Mirror. Obviously that's a much more complicated question, though.

I feel compelled to mention my friend Logan Thrasher Collins' paper, The case for emulating insect brains using anatomical "wiring diagrams" equipped with biophysical models of neuronal activity. He thinks we may be able to emulate the fruit fly brain in about 20 years at near-full accuracy, and this estimate seems quite plausible.

There were a few sections I skipped, if I have time I'll come back and do a more thorough reading and give some more comments.

The compute comparison seems pretty sketchy to me. A bee's visual cortex can classify many different things, and the part responsible for doing the classification task in the few-shot learning study is probably just a small subset. [I think Rohin made a similar point below.] Deep learning models can be pruned somewhat without losing much accuracy, but generally all the parameters are used. Another wrinkle is that the rate of firing activity in the visual cortex depends on the input, although there is a baseline rate too. The point I'm getting at is that it's sort of an apples-to-oranges comparison. If the bee only had to do the one task in the study to survive, evolution probably would have found a much more economical way of doing it, with far fewer neurons.

My other big quibble is that I would have made it transparent that Cotra's biological anchors method for forecasting TAI assumes that we will know the right algorithm before the hardware becomes available. That is a big, questionable assumption and thus should be stated clearly. Arguably, algorithmic advancement in AI at the level of core algorithms (not ML-ops / dev-ops / GPU coding) is actually quite slow. In any case, it just seems very hard to predict algorithmic advancement. Plausibly a team at DeepMind might discover the key cortical learning algorithm underlying human intelligence tomorrow, but there are other reasons to think it could take decades.

According to https://www.sciencedirect.com/science/article/pii/S0896627321005018?dgcid=coauthor

"Cortical neurons are well approximated by a deep neural network (DNN) with 5–8 layers "

"However, in a full model of an L5 pyramidal neuron consisting of NMDA-based synapses, the complexity of the analogous DNN is significantly increased; we found a good fit to the I/O of this modeled cell when using a TCN (my note: temporally convolutional network) that has five to eight hidden layers "

For best performance, the width was 256.

Since L5 neurons can perform as small neural nets, this might have implications for the computational power of brains.

Huh, that's pretty cool, thanks for sharing.