I'm Steve Byrnes, a professional physicist in the Boston area. I have a summary of my AGI safety research interests at: https://sjbyrnes.com/agi.html
Not Adam, but
Maybe there's a sense in which everyone has already implicitly declared that they don't want to give feedback, because they could have if they wanted to, so it feels like more of an imposition.
Maybe it feels like "I want feedback for my own personal benefit" when it's already posted, as opposed to "I want feedback to improve this document which I will share with the community" when it's not yet posted. So it feels more selfish, instead of part of a community project. For that problem, maybe you'd want to frame it as "I'm planning to rewrite this post / write a follow-up to this post / give a talk based on this post / etc., can you please offer feedback on this post to help me with that?" (Assuming that's in fact the case, of course, but most posts have follow-up posts...)
Were the Clintons and Obamas "raising kids while both were pursuing intense, ambitious careers"? They were raising kids at some point in their lives, and they were both pursuing intense, ambitious careers at some point in their lives, but was it simultaneous? My impression (could be wrong) is that Hillary had an ambitious career before kids, paused it for a decade or two, and then got back into it when her kids were grown up. Michelle "cut back to part time" when her kids were 7 & 10, and quit entirely sometime afterwards, if I'm understanding Wikipedia correctly. (Anne-Marie Slaughter's book says that power couples in USA can get through infancy and early childhood to some extent, but it gets really hard for somewhat older kids and teens.)
My impression is that it's difficult for two people to raise kids while both are pursuing intense, ambitious careers. If it's only one of the two people, no problem. See Anne-Marie Slaughter's writings on this, for example. I'm interested if you know of counterexamples to that.
Reminds me of a quote from this Paul Christiano post: "It's a solution built to last (at most) until all contemporary thinking about AI has been thoroughly obsoleted...I don’t think there is a strong case for thinking much further ahead than that."
Good insight! Haven't seen that one before!
Think about a banging sound at 0dB, 1dB, ..., 90dB. Everyone will be bothered at 90dB. Nobody will notice at 0dB. Somewhere in between, it transitions from effortless-to-ignore to impossible-to-ignore. Where is that point? It's not determined by the laws of physics, it's arbitrary, it's a setting in your brain, and it's set differently in different people. Similar for touch. Neurotypical people will not be able to ignore a woodpecker pecking them in the back, but may find it effortless to ignore a shirt tag. Other people find that the shirt tag keeps drawing their attention, but a gentle enough touch sensation would not rise to attention.
The "intense world theory of autism" (about which I'll finish a blog post draft one of these years...) says that autism happens when empathetic social interactions are so overwhelming that the person learns early in life to just avoid thoughts and situations that solicit those feelings, including deliberately avoiding eye contact etc. Not coincidentally, people with autism are liable to have sensory sensitivity too, i.e. to feel overwhelmed by levels of sound and touch that are far lower than what it takes to overwhelm neurotypical people.
Anyway, long story short, we all have innate reactions to different stimuli, and the thresholds can be set differently. That's just the way it is, I think.
Yeah I was really only thinking about "not yet trust the AGI" as the main concern. Like, I'm somewhat hopeful that we can get the AGI to have a snap negative reaction to the thought of deceiving its operator, but it's bound to have a lot of other motivations too, and some of those might conflict with that. And it seems like a harder task to make sure that the latter motivations will never ever outbid the former, than to just give every snap negative reaction a veto, or something like that, if that's possible.
I don't think "if every option is bad, freeze in place paralyzed forever" is a good strategy for humans :-P and eventually it would be a bad strategy for AGIs too, as you say.
Hmm, maybe I'm confused. Couple more questions, sorry if you've already answered them: (1) What are the differences / advantages / disadvantages between what you're proposing vs "make an off switch but don't tell the AGI about it"? (2) do you expect there to be another copy of the off-switch and its consequences (M) inside the St nodes? If so, is it one of "the arrows which traverse the walls of the node St"? Because I don't see any arrows from M to St.
I imagine an AGI world-model being a bit like a giant souped-up version of a probabilistic graphical model that can be learned from scratch and updated on the fly. I agree that if there's a node that corresponds to "I get turned off", and you know where it is, then you can block any chain of inference that passes through that node, which amounts to the same thing as deleting the node, i.e. "making the agent not know that this is a thing that can happen". Or a different approach would be, you could prevent that node from getting painted with a negative value (= reward prediction), or something like that, which vaguely corresponds to "I kinda like the idea that I can get turned off" if you do it right.
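To make the "blocking a chain of inference" idea concrete, here's a toy sketch (my own illustration, not a real architecture or anyone's actual proposal): a three-node chain of binary variables where inference just propagates conditional probabilities forward. Deleting the "I get turned off" node severs the chain, so downstream conclusions fall back to an uninformative prior. All node names and numbers are made up for the example.

```python
# Toy chain: operator_displeased -> i_get_turned_off -> goals_fail.
# Each entry maps a node to (parent, P(node=True | parent=True),
#                                     P(node=True | parent=False)).
model = {
    "i_get_turned_off": ("operator_displeased", 0.9, 0.01),
    "goals_fail":       ("i_get_turned_off",    0.95, 0.05),
}

def prob(node, evidence, blocked=frozenset()):
    """P(node=True | evidence). Any node in `blocked` is treated as
    deleted from the world-model: inference can't pass through it,
    so we fall back to an uninformative 0.5 prior at that point."""
    if node in evidence:
        return 1.0 if evidence[node] else 0.0
    if node in blocked or node not in model:
        return 0.5  # deleted node, or a root node with no parent info
    parent, p_if_true, p_if_false = model[node]
    p_parent = prob(parent, evidence, blocked)
    return p_parent * p_if_true + (1 - p_parent) * p_if_false

ev = {"operator_displeased": True}
print(prob("goals_fail", ev))                                # chain intact: 0.86
print(prob("goals_fail", ev, blocked={"i_get_turned_off"}))  # node deleted: 0.5
```

With the chain intact, bad news about the operator raises the predicted probability of failure; with the node blocked, that evidence simply can't propagate, which is the sense in which deleting the node "makes the agent not know that this is a thing that can happen."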
The big problem where I'm a bit stumped is how to reliably find the "I get turned off" node in the model. The world-model is going to be learned and changeable (I assume!). If you delete the node, the system could reinvent it. The system could piece together the existence of "I get turned off" as an abstract possibility having never seen it, or come up with four disconnected ways to think about the same thing, and then you need to find all four. I have thoughts but I'm interested in hearing yours. Or do you imagine that the programmer puts in the world-model by hand, or something?
Yeah, Sowell's books say that immediately speaking full sentences is a pretty common pattern, or at least not unheard of. I think Teller was in that category.
In fact this is one reason I've long been skeptical of the people who say you need "embodied cognition" to get AGI. Passive predictive ("self-supervised") learning gets you pretty far by itself, far enough that a kid can learn to speak complete sentences purely from predictive learning, not trial-and-error (or at least minimal trial-and-error).