steve2152's Shortform

by Steven Byrnes, 31st Oct 2019
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Some ultra-short book reviews on cognitive neuroscience

  • On Intelligence by Jeff Hawkins & Sandra Blakeslee (2004)—very good. Focused on the neocortex-thalamus-hippocampus system, how it's arranged, what computations it's doing, what's the relation between the hippocampus and neocortex, etc. More on Jeff Hawkins's more recent work here. (Side note: Jeff Hawkins was also a longtime collaborator with Dileep George, who went on to found Vicarious AI... Dileep has even more interesting things to say about cortical algorithms. I am reading every paper Dileep has ever written and plan to write up a blog post.)

  • I am a strange loop by Hofstadter (2007)—I dunno, I didn't feel like I got very much out of it, although it's possible that I had already internalized some of the ideas from other sources. I mostly agreed with what he said. I probably got more out of watching Hofstadter give a little lecture on analogical reasoning (example) than from this whole book.

  • Consciousness and the brain by Dehaene (2014)—very good. Maybe I could have saved time by just reading Kaj's review, there wasn't that much more to the book beyond that.

  • Conscience by Patricia Churchland (2019)—I hated it. I forget whether I thought it was vague / vacuous, or actually wrong. Apparently I have already blocked the memory!

  • How to Create a Mind by Kurzweil (2012)—Parts of it were redundant with On Intelligence (which I had read earlier), but still worthwhile. His ideas about how brain-computer interfaces are supposed to work (in the context of cortical algorithms) are intriguing; I'm not convinced, and hoping to think about it more.

  • Rethinking Consciousness by Graziano (2019)—A+, see my review here

  • The Accidental Mind by Linden (2008)—Lots of fun facts. The conceit / premise (that the brain is a kludgy accident of evolution) is kinda dumb and overdone—and I disagree with some of the surrounding discussion—but that's not really a big part of the book, just an excuse to talk about lots of fun neuroscience.

  • The Myth of Mirror Neurons by Hickok (2014)—A+, lots of insight about how cognition works, especially the latter half of the book. Prepare to skim some sections of endlessly beating a dead horse (as he debunks seemingly endless lists of bad arguments in favor of some aspect of mirror neurons). As a bonus, you get treated to an eloquent argument for the "intense world" theory of autism, and some aspects of predictive coding.

  • Surfing Uncertainty by Clark (2015)—I liked it. See also SSC review. I think there's still work to do in fleshing out exactly how these types of algorithms work; it's too easy to mix things up and oversimplify when just describing things qualitatively (see my feeble attempt here, which I only claim is a small step in the right direction).

  • Rethinking innateness by Jeffrey Elman, Annette Karmiloff-Smith, Elizabeth Bates, Mark Johnson, Domenico Parisi, and Kim Plunkett (1996)—I liked it. Reading Steven Pinker, you get the idea that connectionists were a bunch of morons who thought that the brain was just a simple feedforward neural net. This book provides a much richer picture.

Dear diary...

[this is an experiment in just posting little progress reports as a self-motivation tool.]

1. I have a growing suspicion that I was wrong to lump the amygdala in with the midbrain. It may be learning by the same reward signal as the neocortex. Or maybe not. It's confusing; I'm still digesting a couple of papers (and the references therein).

2. Speaking of mistakes, I'm also regretting some comments I made a while ago suggesting that the brain doesn't do backpropagation. Maybe that's true in a narrow sense, but Randall O'Reilly has convinced me that the brain definitely does error-driven learning sometimes (I already knew that), and moreover it may well be able to propagate errors through at least one or two layers of a hierarchy, with enough accuracy to converge. No, that doesn't mean that the brain is exactly the same as a PyTorch / TensorFlow default-settings deep neural net.
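To make the "one or two layers" point concrete, here is a generic machine-learning toy (emphatically not a brain model): pushing an error signal back through even a single hidden layer is enough to learn something no one-layer learner can, like XOR.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR is not learnable by a single layer, so solving it shows that
# propagating the error back through even ONE hidden layer buys
# real representational power.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)   # hidden layer
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)   # output layer

def forward(X):
    h = np.tanh(X @ W1 + b1)
    return h, 1 / (1 + np.exp(-(h @ W2 + b2)))    # sigmoid output

_, out_before = forward(X)
for _ in range(10000):
    h, out = forward(X)
    err_out = out - y                             # output-layer error
    err_hid = (err_out @ W2.T) * (1 - h ** 2)     # error pushed back one layer
    W2 -= 0.2 * h.T @ err_out; b2 -= 0.2 * err_out.sum(0)
    W1 -= 0.2 * X.T @ err_hid; b1 -= 0.2 * err_hid.sum(0)

_, out_after = forward(X)
print(np.abs(out_before - y).mean(), "->", np.abs(out_after - y).mean())
```

Whether the brain implements anything like the `err_hid` line above, and with what fidelity, is exactly the open question.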

3. My long work-in-progress post on autism continues to be stuck on the fact that there seem to be two theories of social impairment that are each plausible and totally different. In one theory, social interactions are complex and hard to follow / model for cognitive / predictive-model-building reasons. The evidence I like for that is the role of the cerebellum, which sounds awfully causally implicated in autism. Like, absence of a cerebellum can cause autism, if I'm remembering right. In the other theory, modeling social interactions in the neurotypical way (via empathy) is aversive. The evidence I like for that is people with autism self-reporting that eye contact is aversive, among other things. (This is part of "intense world theory".) Of those two stories, I'm roughly 100% sold that the latter story is right. But the former story doesn't seem obviously wrong, and I don't like having two explanations for the same thing (although it's not impossible; autism involves different symptoms in different people, and they could co-occur for biological reasons rather than computational reasons). I'm hoping that the stories actually come together somehow, and I'm just confused about what the cerebellum and amygdala do. So I'm reading and thinking about that.

4. New theory I'm playing with: the neocortex outputs predictions directly, in addition to motor commands. E.g. "my arm is going to be touched". Then the midbrain knows not to flinch when someone touches the arm. That could explain why the visual cortex talks to the superior colliculus, which I always thought was weird. Jeff Hawkins says those connections are the neocortex sending out eye movement motor commands, but isn't that controlled by the frontal eye fields? Oh, then Randall O'Reilly had this mysterious throwaway comment in a lecture that the frontal eye fields seem to be at the bottom of the visual hierarchy if you look at the connections. (He had a reference, I should read it.) I don't know what the heck is going on.

modeling social interactions in the neurotypical way (via empathy) is aversive

Is it too pessimistic to assume that people mostly model other people in order to manipulate them better? I wonder how much of human mental inconsistency is a defense against modeling. Here on Less Wrong we complain that inconsistent behavior makes you vulnerable to Dutch-booking, but in real life, consistent behavior probably makes you even more vulnerable, because your enemies can easily predict what you do and plan accordingly.

I was just writing about my perspective here; see also Simulation Theory (the opposite of "Theory Theory", believe it or not!). I mean, you could say that "making friends and being nice to them" is a form of manipulation, in some technical sense, blah blah evolutionary game theory blah blah, I guess. That seems like something Robin Hanson would say :-P I think it's a bit too cynical if you mean "manipulation" in the everyday sense involving bad intent. Also, if you want to send out vibes of "Don't mess with me or I will crush you!" to other people—and the ability to make credible threats is advantageous for game-theory reasons—that's all about being predictable and consistent!

Again as I posted just now, I think the lion's share of "modeling", as I'm using the term, is something that happens unconsciously in a fraction of a second, not effortful empathy or modeling.

Hmmm... If I'm trying to impress someone, I do indeed effortfully try to develop a model of what they're impressed by, and then use that model when talking to them. And I tend to succeed! And it's not all that hard! The most obvious strategy tends to work (i.e., go with what has impressed them in the past, or what they say would be impressive, or what impresses similar people). I don't really see any aspect of human nature that is working to make it hard for me to impress someone, like by a person randomly changing what they find impressive. Do you? Are there better examples?

I have low confidence debating this, because it seems to me like many things could be explained in various ways. For example, I agree that a certain amount of predictability is needed to prevent people from messing with you. On the other hand, a certain amount of unpredictability is needed, too -- if people know exactly when you would snap and start crushing them, they will go 5% below the line; but if the exact line depends on what you had for breakfast today, they will be more careful about getting too close to it.

Fair enough :-)

Branding: 3 reasons why I prefer "AGI safety" to "AI alignment"

  1. When engineers, politicians, bureaucrats, military leaders, etc. hear the word "safety", they suddenly perk up and start nodding and smiling. Safety engineering—making sure that systems robustly do what you want them to do—is something that people across society can relate to and appreciate. By contrast, when people hear the term "AI alignment" for the first time, they just don't know what it means or how to contextualize it.

  2. There are a lot of things that people are working on in this space that aren't exactly "alignment"—things like boxing, task-limited AI, myopic AI, impact-limited AI, non-goal-directed AI, AGI strategy & forecasting, etc. It's useful to have a term that includes all those things, and I think that term should be "AGI safety". Then we can reserve "AI alignment" for specifically value alignment.

  3. Actually, I'm not even sure that "value alignment" is exactly the right term for value alignment. The term "value alignment" is naturally read as something like "the AI's values are aligned with human values", which isn't necessarily wrong, but is a bit vague and not necessarily interpreted correctly. For example, if love is a human value, should the AGI adopt that value and start falling in love? No, it should facilitate humans falling in love. When people talk about CIRL, CEV, etc., it seems to be less about "value alignment" and more about "value indirection" (in the C++ sense), i.e. utility functions that involve human goals and values, and which more specifically define those things by pointing at human brains and human behavior.
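The "indirection" idea above can be sketched as a hypothetical toy (all names and values here are made up for illustration): the agent's utility function doesn't contain a copy of human values, it holds a reference to a learnable model of them, so updating the model changes what gets optimized.

```python
# Hypothetical toy sketch of "value indirection": the utility function
# holds a POINTER to a model of human values, not a baked-in copy.
class HumanValueModel:
    """Stand-in for whatever learned model points at human values."""
    def __init__(self):
        self.estimates = {"humans_fall_in_love": 1.0}  # made-up example

    def score(self, outcome_features):
        return sum(self.estimates.get(f, 0.0) for f in outcome_features)

model = HumanValueModel()

def utility(outcome_features, model=model):
    # Indirection: utility is defined *through* the model reference.
    # Note the agent scores outcomes in which HUMANS fall in love; it
    # doesn't "adopt" the value of falling in love itself.
    return model.score(outcome_features)

print(utility(["humans_fall_in_love"]))  # 1.0
# Refining the model changes the utility function without touching it:
model.estimates["humans_fall_in_love"] = 2.0
print(utility(["humans_fall_in_love"]))  # 2.0
```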

A friend in the AI space who visited Washington told me that military leaders distinctly do not like the term "safety".

Because they're interested in weapons and making people distinctly not safe.

Right, for them "alignment" could mean their desired concept, "safe for everyone except our targets".

I'm skeptical that anyone with that level of responsibility and acumen has that kind of juvenile destructive mindset. Can you think of other explanations?

Can you think of other explanations?

There's a difference between people talking about safety in the sense of 1. 'how to handle a firearm safely' and the sense of 2. 'firearms are dangerous, let's ban all guns'. These leaders may understand/be on board with 1, but disagree with 2.

Interesting. I guess I was thinking specifically about DARPA which might or might not be representative, but see Safe Documents, Safe Genes, Safe Autonomy, Safety and security properties of software, etc. etc.

Quick comments on "The case against economic values in the brain" by Benjamin Hayden & Yael Niv :

(I really only skimmed the paper, these are just impressions off the top of my head.)

I agree that "eating this sandwich" doesn't have a reward prediction per se, because there are lots of different ways to think about eating this sandwich, especially what aspects are salient, what associations are salient, what your hormones and mood are, etc. If neuroeconomics is premised on reward predictions being attached to events and objects rather than thoughts, then I don't like neuroeconomics either, at least not as a mechanistic theory of psychology. [I don't know anything about neuroeconomics, maybe that was never the idea anyway.]

But when they float the idea of throwing out rewards altogether, I'm not buying it. The main reason is: I'm trying to understand what the brain does algorithmically, and I feel like I'm making progress towards a coherent picture... and part of that picture is a 1-dimensional signal called reward. If you got rid of that, I just have no idea how to fill in that gap. Doesn't mean it's impossible, but I did try to think it through and failed.

There's also a nice biological story going with the algorithm story: the basal ganglia has a dense web of connections across the frontal lobe, and can just memorize "this meaningless set of neurons firing is associated with that reward, and this meaningless set of neurons firing is associated with that reward, etc. etc." Then it (1) inhibits all but the highest-reward-predicting activity, and (2) updates the reward predictions based on what happens (TD learning). (Again this and everything else is very sketchy and speculative.)
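The two-part story above (memorize reward predictions for arbitrary activity patterns, inhibit all but the best, update by TD) can be caricatured in a few lines of code. This is a cartoon of my speculative sketch, not established neuroscience; the "patterns" and rewards are arbitrary placeholders.

```python
# Caricature of the basal-ganglia story: a lookup table of reward
# predictions over meaningless "activity patterns", with
# (1) winner-take-all selection and (2) TD-style prediction updates.
import random

random.seed(0)

value = {}  # pattern -> predicted reward, learned by rote memorization
true_reward = {"pattern_A": 0.2, "pattern_B": 1.0, "pattern_C": 0.5}
alpha = 0.3  # learning rate

def select(candidates):
    """(1) Inhibit all but the highest-reward-predicting candidate."""
    return max(candidates, key=lambda p: value.get(p, 0.0))

for _ in range(200):
    chosen = select(list(true_reward))       # exploit current predictions
    if random.random() < 0.2:                # occasional exploration
        chosen = random.choice(list(true_reward))
    r = true_reward[chosen]
    # (2) TD-style update: nudge the prediction toward what happened.
    v = value.get(chosen, 0.0)
    value[chosen] = v + alpha * (r - v)

print(select(list(true_reward)))  # settles on "pattern_B"
```

The point of the caricature is that nothing here needs to "understand" the patterns; associating a scalar prediction with each one is enough to drive selection.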

(DeepMind had a paper that says there's a reward prediction probability distribution instead of a reward prediction value, which is fine, that's still consistent with the rest of my story.)

I get how deep neural nets can search for a policy directly. I don't think those methods are consistent with the other things I believe about the brain (or at least the neocortex). In particular I think the brain does seem to have a mechanism for choosing among different possible actions being considered in parallel, as opposed to a direct learned function from sensory input to output. The paper also mentions learning to compare without learning a value, but I don't think that works because there are too many possible comparisons (the square of the number of possible thoughts).
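The scaling problem in that last point is easy to make concrete: storing one learned value per thought grows linearly with the number of thoughts, while learning a separate comparator for every pair of thoughts grows as n(n-1)/2.

```python
# One learned value per thought vs. one learned comparator per PAIR of
# thoughts: the latter grows quadratically in the number of thoughts.
from math import comb

for n in (10, 1_000, 1_000_000):
    print(f"{n:>9} thoughts: {n:>9} values vs {comb(n, 2):>15} pairwise comparators")
```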

In the era of COVID, we should all be doing cardio exercise if possible, and not at a gym. Here's what's been working for me for the past many years. This is not well optimized for perfectly working out every muscle group etc., but it is very highly optimized for convenience, practicality, and sustainability, at least for me personally in my life situation.

(This post is mostly about home cardio exercise, but the last paragraph is about jogging.)

My home exercise routine consists of three simultaneous things: {exercise, YouTube video lectures, RockMyRun}. More on the exercise below. RockMyRun is a site/app that offers music mixes at fixed BPMs—the music helps my energy and the fixed BPM keeps me from gradually slowing down the pace. The video lectures make me motivated to work out, since there's a lot of stuff I desperately want to learn. :)

Previously I've done instead {exercise, movies or TV}. (I still do on rare occasions.) This is motivating when combined with the rule of "no movies or TV unless exercising (or on social special occasions)". I've pretty much followed that rule for years now.

My exercise routine consists of holding a dumbbell in each hand, then doing a sort of simultaneous reverse-lunge while lifting one of the dumbbells, alternating sides, kinda like this picture. Out of numerous things I've tried, this is the one that stuck, because it's compatible with watching TV, compatible with very small spaces including low ceilings, has low risk of injury, doesn't stomp or make noise, doesn't require paying attention (once you get the hang of it), and seems to be a pretty good cardio workout (as judged by being able to break a sweat in a freezing cold room). I also do a few pushups now and then as a break, although that means missing what's on the screen. I've gradually increased the dumbbell weight over the years from 3lbs (1.4kg) to now 15lbs (7kg).

I strongly believe that the top priority for an exercise routine is whatever helps you actually keep doing it perpetually. But beyond that, I've found some factors that give me a more intense workout: Coffee helps slightly (it's a performance-enhancing drug! At least for some people); feeling cold at the beginning / being in a cold room seems to help; awesome action-packed movies or TV are a nice boost, but RockMyRun with boring video lectures is good enough. (My most intense workouts are watching music videos or concert recordings, but I get bored of those after a while.)

In other news, I also occasionally jog. RockMyRun is also a really good idea for that, not just for the obvious reasons (energy, pace), but because, when you set the BPM high, your running form magically and effortlessly improves. This completely solved my jogging knee pain problems, which I had struggled with for years. (I learned that tip from here, where he recommends 160BPM. I personally prefer 180BPM, because I like shorter and more intense runs for my time-crunched schedule.)