Martin Randall

It's not consensual. It can't be. Then again, much of parenting isn't. All we can do is apologize, whether it's "I'm sorry that you were born and you wish you weren't" or "I'm sorry we lived in France and you wish we hadn't" or anything else.

I hope I made good choices and I will never know.

Thanks for trying to get to the heart of it.

Do you agree that someone with a sufficiently different architecture - e.g. a human who had a dog's brain implanted somehow - would grow to have different values in some respect?

Yes, according to shard theory. And thank you for switching to a more coherent hypothetical. Two reasons are different reinforcement signals and different capabilities.

Reinforcement signals

Brain -> Reinforcement Signals -> Reinforcement Events -> Value Formation

The main way that brain architecture influences values, according to Quintin Pope and Alex Turner, is via reinforcement signals:

Assumption 3: The brain does reinforcement learning. According to this assumption, the brain has a genetically hard-coded reward system (implemented via certain hard-coded circuits in the brainstem and midbrain). In some fashion, the brain reinforces thoughts and mental subroutines which have led to reward, so that they will be more likely to fire in similar contexts in the future. We suspect that the “base” reinforcement learning algorithm is relatively crude, but that people reliably bootstrap up to smarter credit assignment.
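To make Assumption 3 concrete, here is a minimal toy sketch of a "reinforce whatever was recently active" learner, written under the assumptions above. It is my own illustration, not code from the shard theory posts, and every name and number in it is hypothetical.

```python
# Toy model of Assumption 3: a hard-coded reward signal reinforces
# recently active "thoughts" so they fire more readily in similar
# contexts later. Credit assignment is deliberately crude: everything
# recently active gets credit. (Hypothetical illustration only.)
import random
from collections import defaultdict

class CrudeReinforcementLearner:
    def __init__(self):
        # Propensity for a thought to fire in a given context.
        self.weights = defaultdict(lambda: 1.0)
        self.recently_active = []  # (context, thought) pairs eligible for credit

    def act(self, context, thoughts):
        # Fire one thought, with probability proportional to its weight here.
        w = [self.weights[(context, t)] for t in thoughts]
        thought = random.choices(thoughts, weights=w)[0]
        self.recently_active.append((context, thought))
        return thought

    def reward(self, signal):
        # Crude credit assignment: strengthen everything recently active.
        for key in self.recently_active:
            self.weights[key] *= 1.0 + signal
        self.recently_active.clear()

learner = CrudeReinforcementLearner()
for _ in range(200):
    choice = learner.act("smells_sweet", ["approach", "ignore"])
    # Hard-coded reward circuit: sugar is rewarding.
    learner.reward(0.1 if choice == "approach" else 0.0)
print(learner.weights[("smells_sweet", "approach")])  # grows across trials
```

A dog-brained learner would differ here only in which inputs trigger the hard-coded reward circuit, which is the point of the food example below.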

Dogs and humans have different ideal diets. Dogs have a better sense of smell than humans. Therefore it is likely that dogs and humans have different hard-coded reinforcement signals around the taste and smell of food. This is part of why dog treats smell and taste different from human treats. Human children often develop strong values around certain foods. Our hypothetical dog-human child would likely also develop strong values around food, but predictably different foods.

As human children grow up, they slowly develop values around healthy eating. Our hypothetical dog-human child would do the same. Because they have a human body, these healthy eating values would be more human-like.

Capabilities

Brain -> Capabilities -> Reinforcement Events -> Value Formation

Let's extend the hypothetical further. Suppose that we have a hybrid dog-human with a human body and human reinforcement learning signals, but otherwise a dog brain. Now does shard theory claim they will have the same values as a human?

No. While shard theory is based on the theory of "learning from scratch" in the brain, "Learning-from-scratch is NOT blank-slate". So it's reasonable to suppose, as you do, that the hybrid dog-human will have many cognitive weaknesses compared to humans, especially given its smaller cortex.

These cognitive weaknesses will naturally lead to different experiences, different reinforcement events. Shard theory claims that "reinforcement events shape human value shards". Accordingly, shard theory predicts different values. Perhaps the dog-human never learns to read, and so it does not value reading.

On the other hand, knowing an agent's capabilities largely screens off its brain architecture. For example, to the extent that deafness influences values, the influence is the same whether the deafness is caused by a brain defect or by an ear defect. The dog-human would likely have similar values to a disabled human with similar cognitive abilities.
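In probabilistic terms (my own gloss of "screens off", not notation from the shard theory posts):

P(values | architecture, capabilities) ≈ P(values | capabilities)

That is, once capabilities are held fixed, learning the architecture tells you little more about the values.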

Where is the disagreement?

My current guess is that you agree with most of the above, but disagree with the distilled claim that "value formation is very path dependent and architecture independent".

I think the problem is that this is very "distilled". Pope and Turner claim that some human values are convergent: "We think that many biases are convergently produced artifacts of the human learning process & environment". In other words, these values end up being path independent.

As I discussed above, "architecture independent" is only true if you hold capabilities and reinforcement signals constant. The example given in the distillation is "a sufficiently large transformer and a sufficiently large conv net, given the same training data presented in the same order".

I'm realizing that this distillation is potentially misleading when moved from the AI context to the context of natural intelligence.

I definitely don't want newest first. The magic (new and upvoted) sort seems to work well. The default (top scoring) sort too. A concrete example is this post on "an especially elegant evpsych experiment". It's not the newest, but it is top-scoring.

I think an AI could reasonably convert ancient discussions to light use of threads to preserve most of the conversation flow, where it has value.

While I like the older recommendations, they highlight the problems with the discussion threads on the older (17-year-old) content. Often we have learned things in the last 17 years, and comments that provide relevant updates would ideally be upvoted and be the first thing I read after the article. But the comments are sorted oldest-first by default. I rarely find the oldest comments the most useful. I can change the sort, but I think the default sort discourages comments of the type I wish to read.

Is the algorithm getting a signal on positive vs negative voting, or are all votes treated as a positive signal?

If you retrieve the standard cached deep wisdom for don’t go overboard on admiring science, you will find thoughts like “Science gave us air conditioning, but it also made the hydrogen bomb” ... But the people who originated such thoughts ... Probably they didn’t like something science had to say about their pet beliefs, and sought ways to undermine its authority.

Early critics of science's role in making the hydrogen bomb included Einstein, Oppenheimer, Leo Szilard, and Bertrand Russell. They didn't like the risk of mass death, civilizational collapse, and possible human extinction. They weren't trying to undermine the authority of science.

If someone is so deep into a "happy death spiral" about science that nuclear weapons don't make them blink, that is a severe case. I think it can be an effective argument for milder cases. Certainly my love for science was held in check by reading about AI extinction risk in 1997.

More generally I think that noticing the skulls of your Great Idea is often a cure. If someone is getting a happy death spiral about the USA, it helps if they notice the slavery. If industrialization, notice the global warming. If Christianity, notice the Inquisition. And so on.

Sort of covered here ("along with"):

Why wasn't a concept of "inclusive genetic fitness" programmed into us, along with a library of explicit strategies?

Pointing to the variety of brains and values in animals doesn't persuade me because they also have a wide variety of environments and experiences. Shard theory predicts a wide variety of values as a result (tempered by convergent evolution).

One distinctive prediction is in cases where the brain is the same but the experiences are different. You agree that "between humans, one person may value very different things from the next". I agree and would point to humans throughout history who had very different experiences and values. I think the example of football vs reading understates the differences, which include slavery vs cannibalism.

The other distinctive prediction is in cases where the brain is different but the experiences are the same. So for example, consider humans who grow up unable to walk, either due to a different brain or due to a different body. Shard theory predicts similar values despite these different causes.

The shard theory claim here is as quoted, "value formation is ... relatively architecture independent". This is not a claim about skill formation, eg learning to walk. It's also not a claim that architecture can never be causally upstream of values.

I see shard theory as a correction to Godshatter theory and its "thousand shards of desire". Yudkowsky writes:

And so organisms evolve rewards for eating, and building nests, and scaring off competitors, and helping siblings, and discovering important truths, and forming strong alliances, and arguing persuasively, and of course having sex...

Arguing persuasively is a common human value, but shard theory claims that it's not encoded into brain architecture. Instead it's dependent on the experience of arguing persuasively and having that experience reinforced. This can be the common experience of a child persuading their parent to give them another cookie.

There is that basic dependence of having a reinforcement learning system that is triggered by fat and sugar. But it's a long way from there to here.

Perhaps the majority of spouses think they unconditionally love their partners, and think they are unconditionally loved back, but some are wrong. Prediction is hard.

... if you love me unconditionally, presumably you love everything else as well, since it is only conditions that separate me from the worms.

It's worse than that. If Alice loves Bob unconditionally, everything else loves Bob as well, since it is only conditions that separate Alice from the worms. If Alice loves Bob unconditionally, then Alice also hates Bob, since it is only conditions that separate love from hate. If Alice loves Bob unconditionally, she also loved Bob during the Big Bang, since it is only conditions that separate the present from the past.

Fortunately, this isn't how language works, or the word "unconditional" would be unusable. If Alice loves Bob unconditionally, this means that there are no conditions other than Alice, Bob, and (present tense) love. If Germany surrenders unconditionally to the Allies, it means that there are no conditions other than Germany, the Allies, and surrender. If Target offers unconditional refunds to veterans on purchases of clothing, there are no conditions other than Target, refund, veterans, purchases, and clothing.

Even if we instead assume that by ‘unconditional’, people mean something like ‘resilient to most conditions that might come up for a pair of humans’, ...

Not quite. More like "resilient with very high probability to conditions that might come up for this pair of humans". Unconditional love for a toddler may be resilient to biting and kicking and punching. Unconditional love for a mentally sound adult may not.

The general principle is that when people speak, they don't need to qualify high-confidence statements. I can say something like "Bob is not going to commit treason" without qualifiers, assuming that Bob is not the treasonous type. Similarly, I can say "Alice loves Bob unconditionally" without qualifying that, in a wild conspiracy-theory hypothetical where Bob is committing treason and Alice is about to find out, Alice might soon have different feelings.

If I want to be more precise, I can use qualifiers - "completely unconditional", "effectively unconditional", "mostly unconditional", "somewhat conditional", etc.

... my impression is that [unconditional love] is still too rare to warrant being the main point on the love-conditionality scale that we recognize.

I agree it's rare, but it's an important point to label. It's decision-relevant for marriage - if there is a 50% chance that Bob will get plump and a 50% chance that Alice would stop loving Bob if he got plump, then there is a 25% chance that their love will end for that reason. Depending on context, that may be reason enough not to marry (yet).
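Spelling out that arithmetic, assuming the two events are independent:

P(love ends for this reason) = P(Bob gets plump) × P(Alice stops loving Bob | plump) = 0.5 × 0.5 = 0.25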

It's also decision-relevant when deciding whether to become a parent. Parents are expected to care for their children, unpaid and unconditionally, until adulthood. Providing unpaid care without love is hard. If a prospective parent cannot see themselves loving their children unconditionally, that may be reason enough not to have kids (yet).

Historically these have been key life decisions, so it seems natural that we have a common phrase to describe the desired states for making them.
