I'm Steve Byrnes, a professional physicist in the Boston area. I have a summary of my AGI safety research interests at:


Can you get AGI from a Transformer?

Do you have an opinion on how memory and generative models interact? To jog the discussion, I’d like to bring up Kaj_Sotala’s hypothesis that, “Type 2 processing is a particular way of chaining together the outputs of various Type 1 subagents using working memory.” [1]. Type 1 subagents here are (I think) similar to Society of Mind. Type 2 processing is, informally speaking, the more deliberate, abstract, and memory-intensive of the two types of processing.

Did you see the part where I linked that post? Here's the quote

you can learn a Generative Model X that invokes a Generative Model Y and then invokes either Generative Model Z or Z' depending on some feature of Y. Then Z or Z' can invoke yet different generative models, and so on. That kind of process enables us to create an ad hoc crappy serial computer ... and we call that "deliberation"! See also Kaj’s post.
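To make the chaining idea concrete, here's a toy sketch (entirely my own illustration; all the names and the dispatch rule are made up) of generative models invoking one another to form an ad hoc serial computer:

```python
# Toy sketch: "deliberation" as generative models invoking one another.
# Each "model" is just a function that produces a feature of the shared
# state and may hand control to another model depending on that feature.

def model_y(state):
    # Generative Model Y: computes some feature of the current state.
    state["y_feature"] = state["x"] > 0
    return state

def model_z(state):
    state["result"] = "plan A"
    return state

def model_z_prime(state):
    state["result"] = "plan B"
    return state

def model_x(state):
    # Generative Model X: invokes Y, then dispatches to Z or Z'
    # depending on a feature of Y's output -- a crude serial step.
    state = model_y(state)
    if state["y_feature"]:
        return model_z(state)
    return model_z_prime(state)

print(model_x({"x": 3})["result"])   # "plan A"
print(model_x({"x": -2})["result"])  # "plan B"
```

Obviously the real thing would be learned, probabilistic, and massively parallel at each step; the point is only that conditional chaining of model invocations gives you a (crappy) serial computation.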

I have a 1-sentence take on working memory here. I haven't thought about it beyond that...

Are there contrived (and hopefully simple) tasks that can be run on computers to test the performance of implementations that aim to carry out type 2 processing?

Probably ... that seems like the kind of thing that people would be trying to make benchmarks for. But I don't know.

This post says, “Since generative models are simpler (less information content) than reverse / discriminative models, they can be learned more quickly.” Is this true? I’ve always had the impression that it’s the opposite. It’s easier to tell apart, say, cats and dogs (discriminative model) than it is to draw cats and dogs (generative model). Most children first learn to discriminate between different objects before learning how to draw/create/generate them.

Well, my claim (and the claim of Jeff Hawkins, Yann LeCun, the “predictive coding” people like Friston, etc.) is that the neocortex is constantly predicting what input signals it will receive next, and updating its models when the predictions are wrong. So when humans tell apart cats and dogs, I would call that generative. ResNet image classifiers are discriminative, but they need a lot of samples to learn to do so, so I'm not sure whether that counts as "easy" / "low-information-content".

Drawing on paper is kinda hard for humans (at least for me), but that's not really a fair comparison: the generative part is just imagining a cat you've never seen before, and then the hard part (for me) is copying that imagined image onto a piece of paper. Of course I don't put too much stock in that comparison either. Maybe generative models are hard but feel easy because that's how our brains are designed.

Anyway, I feel good about the math or engineering example: it's low-information-content to be able to answer the question "what would happen if I do steps X, Y, Z?" and higher-information-content to be able to answer the question "what steps do I take in what order to prove the theorem / invent the gadget?". The case of imagery is less obvious, now that you mention it. But it at least seems plausible to me that a probabilistic program to draw any possible eye (combining shape, textures, lighting, certain types of noise, reflections, etc.) has less complexity (fewer bits) than an equally good eye-detecting discriminative model. Note that our discriminative models (like ResNet image classifiers) are not usually "equally good", because they are more prone to failing out-of-distribution—e.g. they will incorrectly classify certain patterns of random static as eyes.

Inner Alignment in Salt-Starved Rats

FYI: In the original version I wrote that the saltwater spray happens when the rats press the lever. It turns out I misunderstood. During training, the saltwater spray happens immediately after the lever appears, whether or not the rats touch the lever. Still, the rats act as if the lever deserves the credit / blame for causing the spray, by nibbling the lever when they like the spray (e.g. with the sugar-water control lever), or staying away from the lever if they don't. Thanks to Kent Berridge for the correction. I've made a few other corrections since posting this but I think I marked all the other ones in the text.

Idealized Factored Cognition

Maybe you said it somewhere, but where do definitions of new terms and concepts fit in? Would that be a type of statement?

Inner alignment in the brain

Ok, thanks for helping me understand. Hmm. I hypothesize the most fundamental part of why I disagree with them is basically what Eliezer talked about here as "explaining" vs. "explaining away". I think they're looking for an explanation, i.e. a thing in the world whose various properties match the properties of consciousness and qualia as they seem to us. I'm much more expecting that there is no thing in the world meeting those criteria. Rather I think that this is a case where our perceptions are not neutrally reporting on things in the world, and thus where "the way things seem to us" is different than the way things are.

Or maybe I just narrowly disagree with QRI's ideas about rhythms and harmony and so on. Not sure. Whenever I try to read QRI stuff it just kinda strikes me as totally off-base, so I haven't spent much time with it, beyond skimming a couple articles and watching a talk on Connectome-Specific Harmonic Waves on YouTube a few months ago. I'm happy to have your help here :-)

As for valence, yes I think that valence is an input to the neocortex subsystem (just as vision is an input), although it's really the neocortex subsystem observing the activity of other parts of the brain, and incidentally those other parts of the brain also depend in part on what the neocortex is doing and has been doing.

Inner alignment in the brain

I noticed someone already linked a QRI talk and you didn't think it was talking about the same thing, so I'm back to being confused.

When I wrote here "Thanks but I don't see the connection between what I wrote and what they wrote", I did not mean that QRI was talking about a different phenomenon than I was talking about. I meant that their explanation is wildly different than mine. Re-reading the conversation, I think I was misinterpreting the comment I was replying to; I just went back to edit.

Needless to say I disagree with the QRI explanation of valence, but there's such a chasm between their thoughts and my thoughts that it would be challenging for me to try to write a direct rebuttal.

Again, I do think they're talking about the same set of phenomena that I'm talking about.

My prior understanding of valence, which is primarily influenced by the Qualia Research Institute, was as the ontologically fundamental utility function of the universe. The claim is that every slice of experience has an objective quantity (its valence) that measures how pleasant or unpleasant it is. This would locate 'believing valence is a thing' as a subset of moral realism.

I don't think there's anything fundamental in the universe besides electrons, quarks, photons, and so on, following their orderly laws as described by the Standard Model of Particle Physics etc. It follows that there should be an answer to the question "why do people describe a certain state as pleasant?" that involves purely neuroscience / psychology and does not involve the philosophy of consciousness or any new ontologically fundamental entities. After all, "describing a certain state as pleasant" is an observable behavioral output of the brain, so it should have a chain of causation that we can trace within the underlying neural algorithms, which in turn follows from the biochemistry of firing neurons and so on, and ultimately from the laws of physics. So, that's what I was trying to do in that blog post: trace a chain of causation from "underlying neural algorithms" to "people describing a state as pleasant".

After we do that (and I have much more confidence that "a solution exists" than "my particular proposal is the 100% correct solution") we can ask: is there a further unresolved question of how that exercise we just did (involving purely neuroscience / psychology) relates to consciousness and qualia and whatnot. My answer would be "No. There is nothing left to explain.", for reasons discussed more at Book Review: Rethinking Consciousness, but I acknowledge that's a bit counterintuitive and can't defend it very eloquently, since I haven't really dived into the philosophical literature.

Inner alignment in the brain

Thanks! No worries! I'm in the process of understanding this stuff too :-P

in the AGI case, we want to make sure the inner objective is as close to the outer objective as possible, whereas in the brain, we want to make sure the outer objective doesn't corrupt the inner objective.

I'm not sure I agree with the second one. Maybe my later discussion in Mesa-Optimizers vs Steered Optimizers is better:

You try a new food, and find it tastes amazing! This wonderful feeling is your subcortex sending a steering signal up to your neocortex. All of a sudden, a new goal has been installed in your mind: eat this food again! This is not your only goal in life, of course, but it is a goal, and you might use your intelligence to construct elaborate plans in pursuit of that goal, like shopping at a different grocery store so you can buy that food again.

It’s a bit creepy, if you think about it!

“You thought you had a solid identity? Ha!! Fool, you are a puppet! If your neocortex gets dopamine at the right times, all of a sudden you would want entirely different things out of life!”

Yes I do take the perspective of the inner optimizer, but I have mixed feelings about my goals changing over time as a result of the outer layer's interventions. Like, if I taste a new food and really like it, that changes my goals, but that's fine, in fact that's a delightful part of my life. Whereas, if I thought that reading nihilistic philosophers would carry a risk of making me stop caring about the future, I would be reluctant to read nihilistic philosophers. Come to think of it, neither of those is a hypothetical!

Are these 'patterns' the same as the generative models?

Yes. Kaj calls (a subset of) them "subagents", I more typically call them "generative models", Kurzweil calls them "patterns", Minsky calls this idea "society of mind", etc.

And does 'randomly generated' mean that, if I learn a new pattern, my neocortex generates a random set of neurons that is then associated with that pattern from that point onward?

Yes, that's my current belief fwiw, although to be clear, I only think it's random on a micro-scale. On the large scale, for example, patterns in raw visual inputs are going to be mainly stored in the part of the brain that receives raw visual inputs, etc. etc.

So, if a meditator says that they have mastered mindfulness to the point that they can experience pain without suffering, your explanation of that (provided you believe the claim) would be, they have reprogrammed their neocortex such that it no longer classifies the generally-pain-like signals from the subcortex as pain?

Sure, but maybe a more everyday example would be a runner pushing through towards the finish line while experiencing runner's high, or a person eating their favorite spicy food, or whatever. It's still the same sensors in your body sending signals, but in those contexts you probably wouldn't describe those signals as "I am in pain right now".

As for valence, I was confused about valence when I wrote this, and it's possible I'm still confused. But I felt less confused after writing Emotional Valence Vs RL Reward: A Video-Game Analogy. I'm still not sure it's right—just the other day I was thinking that I should have said "positive reward prediction error" instead of "reward" throughout that article. I'm going back and forth on that, not sure.

Inner Alignment in Salt-Starved Rats

Most model-based RL algorithms I've seen assume they can evaluate the reward functions in arbitrary states.

Hmm. AlphaZero can evaluate the true reward function in arbitrary states. MuZero can't—it tries to learn the reward function by supervised learning from observations of past rewards (if I understand correctly). I googled "model-based RL Atari" and the first hit was this, which likewise tries to learn the reward function by supervised learning from observations of past rewards (if I understand correctly). I'm not intimately familiar with the deep RL literature, so I wouldn't know what's typical and I'll take your word for it, but it does seem that both possibilities are out there.

Anyway, I don't think the neocortex can evaluate the true reward function in arbitrary states, because it's not a neat mathematical function, it involves messy things like the outputs of millions of pain receptors, hormones sloshing around, the input-output relationships of entire brain subsystems containing tens of millions of neurons, etc. So I presume that the neocortex tries to learn the reward function by supervised learning from observations of past rewards—and that's the whole thing with TD learning and dopamine.
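To illustrate the "learn the reward function from observations of past rewards" story, here's a minimal tabular TD(0) sketch (my own toy illustration; the brain's actual algorithm is surely far messier). The update is driven by the reward prediction error, the quantity often compared to phasic dopamine:

```python
# Tabular TD(0): learn state values from observed rewards, without ever
# being able to query the true reward function directly.

def td_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) step: nudge V[s] toward r + gamma * V[s_next]."""
    # delta is the reward prediction error (the "dopamine-like" signal).
    delta = r + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)
    V[s] = V.get(s, 0.0) + alpha * delta
    return delta

V = {}
# Observe the same transition repeatedly: state "lever" is followed by
# reward 1.0 and a terminal state "end" (whose value stays 0).
for _ in range(200):
    td_update(V, "lever", 1.0, "end")

print(round(V["lever"], 2))  # converges to ~1.0, the observed reward
```

The learner ends up with an estimate of reward-to-come for each state purely from experienced rewards, which is the sense in which the reward function is "learned" rather than "evaluated".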

I added a new sub-bullet at the top to clarify that it's hard to explain by RL unless you assume the planner can query the ground-truth reward function in arbitrary hypothetical states. And then I also added a new paragraph to the "other possible explanations" section at the bottom saying what I said in the paragraph just above. Thank you.

I don't see how you solve this problem in general in a sample-efficient manner otherwise.

Well, the rats are trying to do the rewarding thing after zero samples, so I don't think "sample-efficiency" is quite the right framing.

In ML today, the reward function is typically a function of states and actions, not "thoughts". In a brain, the reward can depend directly on what you're imagining doing or planning to do, or even just what you're thinking about. That's my proposal here.

Well, I guess you could say that this is still a "normal MDP", but where "having thoughts" and "having ideas" etc. are part of the state / action space. But anyway, I think that's a bit different than how most ML people would normally think about things.
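As a toy illustration of that "fold the thoughts into the state" framing (all the names and the specific reward rule here are hypothetical, just my own example):

```python
# Sketch: a reward function that depends on what the agent is imagining,
# expressed by making the current "thought" part of the state.

from typing import NamedTuple

class BrainState(NamedTuple):
    world_state: str   # e.g. the sensed physiological / external situation
    thought: str       # what the agent is currently imagining or planning

def reward(state: BrainState) -> float:
    # In ordinary RL, reward would depend on world_state (and action) only.
    # Here it responds directly to the imagined plan as well.
    if state.world_state == "salt-starved" and state.thought == "imagining salty food":
        return 1.0
    return 0.0

print(reward(BrainState("salt-starved", "imagining salty food")))  # 1.0
print(reward(BrainState("sated", "imagining salty food")))         # 0.0
```

Formally this is still an MDP, but the "state" now includes internal cognitive variables that most ML setups wouldn't treat as part of the environment.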

Final Version Perfected: An Underused Execution Algorithm

This seems useful and sensible, thanks!

asking whether the new task is something you want to do more than the old task

Why not just say "whether the new task is a better thing to do than the old task"? Like, there's the rule to start your day with whatever has the biggest "ugh field", i.e. whatever is most painful. I get that you're using "want to do more than" broadly, but still, it carries the wrong connotation for me. :-)
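For concreteness, here's the FVP selection pass as I understand it (my paraphrase, so details may differ from the post): dot the first task, scan down comparing each later task against the most recently dotted one, then act on the last task dotted.

```python
# Sketch of the FVP selection pass. `prefer(a, b)` stands in for the
# comparison question -- "do I want to do a more than b?" or whatever
# broader phrasing you like.

def fvp_select(tasks, prefer):
    """Return the task to do now, or None if the list is empty."""
    if not tasks:
        return None
    dotted = tasks[0]           # always dot the first task
    for task in tasks[1:]:
        if prefer(task, dotted):
            dotted = task       # this task beats the last dotted one
    return dotted

tasks = ["email", "write report", "go for a run", "taxes"]
# Hypothetical stand-in preference: favor the longest-named task.
print(fvp_select(tasks, lambda a, b: len(a) > len(b)))  # "write report"
```

The whole algorithm is this pass plus a rule for re-scanning after the selected task is done; the point of the comparison question is that it only ever asks for a pairwise judgment, never a full ranking.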

Open & Welcome Thread – November 2020

Also: favorite parenting books for ages 0-2: No Bad Kids, Ferber, Anthropology of Childhood, Oh Crap, King Baby, Bringing Up Bébé

Open & Welcome Thread – November 2020

Age 4-7ish learning resources recommendations! Needless to say, different kids will take to different things. But here's my experience fwiw.

MATH: DragonBox - A+, whole series of games running from basic numeracy through geometry and algebra. Excellent gameplay, well-crafted, kid loves them.

Number Blocks (Netflix) - A+, basic numeracy, addition, concept of multiplication. Kid must have watched each episode 10 times and enthused about it endlessly.

Counting Kingdom - A+, mastering mental addition. Excellent gameplay; fun for adults too. Note: not currently available on iPad; I got it on PC Steam.

Slice Fractions 1 & 2 - A+, teaches fractions. Great gameplay, great pedagogy.

An old-fashioned pocket calculator - A+, an underrated toy.

LITERACY: Explode The Code book - A, been around since at least the 1980s, still good.

Teach Your Monster To Read - B+, gameplay is a bit repetitive & difficulty progresses too quickly, but got a few hours of great learning in there before he lost interest.

Poio - A-, Good gameplay, kid really liked it. Limited scope but great for what it is.

For reading, no individual thing seemed to make a huge difference and none of them kept his interest too long. But it all added up, bit by bit, and now he's over the hump, reading unprompted. Yay!

PROGRAMMING: Scratch Jr - A+, duh
