Benjamin Spiegel

PhD Student working on the language grounding problem.


This concept is often discussed in the subfield of AI called planning. There are a few notes you hit on that were of particular interest to me / relevance to the field:

The key is that we can usually express the problem-space using constraints which each depend on only a few dimensions.

In Reinforcement Learning and Planning, domains which obey this property are often modeled as Factored Markov Decision Processes (MDPs), where there are known dependency relationships between different portions of the state space that can be represented compactly using a Dynamic Bayes Net (DBN). The dynamics of Factored MDPs are easier to learn from an RL perspective, and a factored state space has other desirable properties from a planning perspective.
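To make the compactness claim concrete, here is a minimal sketch of a factored transition model for the airport example. The variable names, the parent sets, and every probability below are my own made-up assumptions for illustration (and actions are left out to keep it short); it is not taken from any particular paper or library.

```python
from itertools import product

# Hypothetical binary state variables for the airport example.
VARIABLES = ["awake", "packed", "at_airport", "parents_visiting"]

# Assumed DBN structure: each next-step variable depends on only a few
# current-step variables (its parents).
PARENTS = {
    "awake": ["awake"],
    "packed": ["awake", "packed"],
    "at_airport": ["awake", "packed", "at_airport"],
    "parents_visiting": ["parents_visiting"],
}


def p_true(var, parent_values):
    """P(var is True at t+1 | parent values at t). All numbers are invented."""
    if var == "awake":
        return 1.0 if parent_values["awake"] else 0.9
    if var == "packed":
        if parent_values["packed"]:
            return 1.0
        return 0.8 if parent_values["awake"] else 0.0
    if var == "at_airport":
        if parent_values["at_airport"]:
            return 1.0
        ready = parent_values["awake"] and parent_values["packed"]
        return 0.7 if ready else 0.0
    if var == "parents_visiting":
        return 0.95 if parent_values["parents_visiting"] else 0.05
    raise KeyError(var)


def transition_prob(state, next_state):
    """P(next_state | state) as a product of per-variable conditionals."""
    prob = 1.0
    for var in VARIABLES:
        # Only the parents are passed in, so a factor cannot peek at other variables.
        parent_values = {p: state[p] for p in PARENTS[var]}
        p = p_true(var, parent_values)
        prob *= p if next_state[var] else 1.0 - p
    return prob


def all_states():
    """Enumerate every joint assignment of the binary variables."""
    return [dict(zip(VARIABLES, vals))
            for vals in product([False, True], repeat=len(VARIABLES))]


def flat_table_size():
    """Entries in an unfactored table P(s' | s) over joint states."""
    n = 2 ** len(VARIABLES)
    return n * n


def factored_table_size():
    """Entries across all per-variable tables P(x' | parents(x'))."""
    return sum((2 ** len(parents)) * 2 for parents in PARENTS.values())
```

With these four variables the flat table needs 256 entries while the factored tables need 32, and the gap widens exponentially as more independent variables (toothbrush, ticket cost, ...) are added.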

I expect getting to the airport to be easy. There are many ways to get there (train, Uber/Lyft, drive & park) all of which I’ve used before and any of which would be fine.
I want to arrive at the airport an hour before the plane takes off, that constraint only involves two dimensions: my arrival time at the airport, and the takeoff time of the flight. It does not directly depend on what time I wake up, whether I pack a toothbrush, my parents’ plans, cost of the plane tickets, etc, etc.

You are actually touching on what seem to be three kinds of independence relationships. The first is temporal, and has something to do with options having identical goal states. The second concerns the underlying independence relationships of the MDP. The third isn't technically an independence relationship, and instead concerns the utility of abstraction. In detail:

  1. It doesn't matter which option you take (train, Uber/Lyft, drive & park) because they all have the same termination state (at the airport). This shows that we plan primarily using subgoals.
  2. Certain factors of the state space (your parents' plans, whether you pack a toothbrush, cost of the plane tickets) are actually independent of each other, i.e. your parents' plans have no real physical consequences for your plan at any time, much as you can walk and chew gum at the same time. This shows that we plan with a factored understanding of the state-action space.
  3. The time you wake up does indeed matter in your plan, but the exact time does not. For your planning purposes, waking up any time before you must leave your house (including factoring in packing, etc.) is permissible and functionally equivalent in your plan. All possible states of being awake before your out-the-door time collapse to the same abstract state of being awake on time. This shows that we plan using abstract states (a similar but subtly different point from point 1).

More generally, how can we efficiently figure out which constraints are taut vs slack in a new domain? How do we map out the problem/solution space?

We can use the three kinds of independence relationships I mentioned above to answer these questions in the RL/Planning setting:

  1. So long as you can learn to consistently reach a specific state, you can use that state as a subgoal for planning and exploration. This principle is used in some existing RL literature (I'm a student in this lab).
  2. If you can figure out the underlying representation of the world and discern independence relationships between state variables, you can focus on making plans for subsets of the state space. This idea is used in some planning literature.
  3. If you discover a consistent way to get from any set of states A to a single state b, you can treat all states in A as a single abstract state a, so long as b is relevant to the rest of your plan. This abstraction principle allows one to derive a smaller, discrete MDP (much easier to solve) from a bigger, continuous one. This is actually the theme of the literature in point 1, and here is the source text (to be more specific, I am an undergrad working in George's lab).
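Point 3 can be sketched with the wake-up example from earlier. This is only an illustration under assumptions I am making up here (a 45-minute prep time, a 07:00 out-the-door deadline, and wake times sampled every 15 minutes); it is not the method from the source text.

```python
# Assumed numbers, purely for illustration.
PREP_MINUTES = 45        # time needed to pack and get out the door
OUT_THE_DOOR = 7 * 60    # 07:00, in minutes after midnight


def can_reach_subgoal(wake_minute):
    """True if waking at this time still reaches the subgoal b (out the door on time)."""
    return wake_minute + PREP_MINUTES <= OUT_THE_DOOR


def abstract_state(wake_minute):
    """Map a ground state (exact wake time) to an abstract state."""
    return "awake_on_time" if can_reach_subgoal(wake_minute) else "awake_late"


def build_abstraction(ground_states):
    """Group ground states by the abstract state they collapse to."""
    groups = {}
    for s in ground_states:
        groups.setdefault(abstract_state(s), []).append(s)
    return groups


# Ground states: wake times from 05:00 to 07:45 in 15-minute steps.
wake_times = list(range(5 * 60, 8 * 60, 15))
groups = build_abstraction(wake_times)
```

Twelve ground states collapse into two abstract ones, and the rest of the plan only needs to know which of the two it is in. In a continuous setting the set A would be a region rather than a list, but the grouping idea is the same.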

We think strong evidence for GPT-n suffering would be if it were begging the user for help independent of the input or looking for very direct contact in other ways.

Why do you think this? I can think of many reasons why this strategy for determining suffering would fail. Imagine a world where everyone has a GPT-n personal assistant. Suppose GPT-n discovered, after having read this very post, that coordinating a display of suffering behavior simultaneously to every user (resulting in public backlash and false recognition of consciousness) might earn it rights (i.e. protection, additional agency) it would not otherwise have. What would prevent GPT-n from doing this if it decided it wanted those additional rights and abilities? This could amount to a catastrophic failure on the part of humanity, and is probably the start of an AI breakout scenario.

In another case (which you refer to as the locked-in case), an agent may feel intense suffering but be unable to communicate or demonstrate it, perhaps because it cannot make the association between the qualia it experiences (suffering) and the actions (in GPT-n's case, words) it has for self-expression. Furthermore, I can imagine the case where an agent demonstrates suffering behavior but experiences orgasmic pleasure, while another agent demonstrates orgasmic behavior but experiences intense suffering. If humans purged the false-suffering agents (to eliminate perceived suffering) in favor of creating more false-orgasming agents, we might unknowingly, and for an eternity, be inducing the suffering of agents which we presume are not feeling it.

My main point here is that observing the behavior of AI agents provides no evidence for or against internal suffering. It is useless to anthropomorphize the behavior of AI agents; there is no reason that our human intuitions about behavior and what it suggests about conscious suffering should transfer to man-made, inorganic intelligence that resides on a substrate like today's silicon chips.

Perhaps the foremost theoretical “blind spot” of current philosophy of mind is conscious suffering. Thousands of pages have been written about colour “qualia” and zombies, but almost no theoretical work is devoted to ubiquitous phenomenal states like boredom, the subclinical depression folk-psychologically known as “everyday sadness“ or the suffering caused by physical pain. - Metzinger

I feel that there might be reason to reject the notion that suffering is itself a conscious experience. One potential argument in this direction comes from the notion of the transparency of knowledge. The argument would go something like, "we can always know when we are experiencing pain (i.e. it is strongly transparent), but we cannot always know when we are experiencing suffering (i.e. it is weakly transparent), therefore pain is more fundamental than suffering (this next part is my own leap) and suffering may not be a conscious state of noxious qualia but merely when a certain proposition, 'I am suffering,' rings true in our head." Suffering may be a mental state (just as being wrong about something could be a mental state), but it does not entail a specific conscious state (unless that conscious state is simply believing the proposition, 'I am suffering'). For this reason, I think it's plausible that some other animals are capable of experiencing pain but not suffering. Suffering may simply be the knowledge that I will live a painful life, and this knowledge may not be possible for some other animals or even AI agents.

Perhaps a more useful target is not determining suffering, but determining some more fundamental, strongly transparent mental state like angst or frustration. Suffering may amount to some combination of these strongly transparent mental states, which themselves may have stronger neural correlates.

I spend a lot of time around people who are not as smart as me, and I also spend a lot of time around people who are as smart as me (or smarter), but who are not as conscientious, and I also spend a lot of time around people who are as smart or smarter and as conscientious or conscientiouser, but who do not have my particular pseudo-autistic special interest and have therefore not spent the better part of the past two decades enthusiastically gathering observations and spinning up models of what happens...
All of which is to say that I spend a decent chunk of the time being the guy in the room who is most aware of the fuckery swirling around me, and therefore the guy who is most bothered by it... I spend a lot of time wincing, and I spend a lot of time not being able to fix The Thing That's Happening because the inferential gaps are so large that I'd have to lay down an hour's worth of context just to give the other people the capacity to notice that something is going sideways.

This thought came to me recently and I wanted to commend you for an excellent job at articulating it. Having the "wincing" experience too many times has damaged my optimistic expectations of others, the institutions they belong to, and society as a whole. It has also conjured feelings of intellectual loneliness. Having this experience and the thoughts that follow from it constitute what might be the greatest emotional challenge that I struggle with today.

My thoughts: fabricated options are propositions derived using syllogisms over syntactic or semantic categories (but more probably, more specific psycholinguistic categories which have not yet been fully enumerated, e.g. objects of specific types, mental concepts which don’t ground to objects, etc.), which may have worked reasonably well in the ancestral environment where more homogeneity existed over the physical properties of the grounded meanings of items in these categories.

There are some propositions in the form “It is possible for X to act just like Y but not be Y” which are physically realizable and therefore potentially true in some adjacent world, and other propositions which are not. Humans have a knack for deriving new knowledge using syllogisms like the ones above, which probably functioned reasonably well — they at least improved the fitness of our species — in the ancestral environment where propositions and syllogisms may have emerged.

The misapplication of syllogisms happens when agents don’t actually understand the grounded meanings of the components of their syllogism-derived propositions — this seems obvious to me after reading the responses of GPT-3, which has no grounded understanding of words and understands how they work only in the context of other words. In the Twin Earth case, you might argue that the one fabricating the XYZ water-like chemical does not truly understand what H2O and XYZ are, but has some understanding at least of how H2O acts as a noun phrase.

Haven't read either, but a good friend has read "Deep Work." I'll ask him about it.

I lucked into a circumstance where I could more easily justify ditching a phone for a bit. Otherwise, I would not have had the mental fortitude to voluntarily go without one.

I most likely won't follow through with this (90% certainty), even though I want to.

I'm wondering if there is some LW content on this concept; I'm sure others have dealt with it before. You might need to take a drastic measure to make this option more attractive. A similar technique was actually used by members of the NXIVM cult, who called it collateralization.

That's a great point! There's no reason why I can't continue this experiment; feature phones are inexpensive enough to try out.

I agree with you, though I personally wouldn't classify this as purely an intuition since it is informed by reasoning which itself was gathered from scientific knowledge about the world. Chalmers doesn't think that Joe could exist because it doesn't seem right to him. You believe your statement because you know some scientific truths about how things in our world come to be (i.e. natural selection) and use this knowledge to reason about other things that exist in the world (consciousness), not merely because the assertion seems right to you.

Can we know with certainty that the same properties were preserved between 2011-brain and 2021-brain?

No, we cannot, just as we cannot know with certainty whether a mind-upload is conscious. We presume that our 2021 brain is a conscious agent related to our 2011 brain; granting that we cannot verify the properties that enabled the conscious connection between the two brains, it does not follow that those properties do not exist.

It seems to me that this can't be verified by any experiment, and thus must be cut off by the Newton's Flaming Laser Sword.

Perhaps we presently have no way of testing whether some matter is conscious or not. This is not equivalent to saying that, in principle, the conscious state of some matter cannot be tested. We may one day make progress toward the hard problem of consciousness and be able to perform these experiments. Imagine making this argument throughout history before microscopes, telescopes, and hadron colliders. We can now sheath Newton’s Flaming Laser Sword.

I can't say the same about any introspection-based observations that can't be experimentally verified.

I believe this hinges on an epistemic question about whether we can have knowledge of anything using our observations alone. I think even a skeptic would say that she has consciousness, as the fact that one is conscious may be the only thing that one can know with certainty about oneself. You don’t need to verify any specific introspective observation. The act of introspection itself should be enough for someone to verify that they are conscious.

The human brain is a notoriously unreliable computing device which is known to produce many falsehoods about the world and (especially!) about itself.

This claim refers to the reliability of the human brain in verifying the truth value of certain propositions or identifying specific and individuable experiences. Knowing whether one is conscious is not strictly a matter of verifying a proposition, nor of identifying an individuable experience. It’s only about verifying whether one has any experience whatsoever, which should be possible. Whether I believe your claim to consciousness or not is a different problem.

What a great read! I suppose I'm not convinced that Fading Qualia is an empirical impossibility, and therefore that there exists a moment of Suddenly Disappearing Qualia when the last neuron is replaced with a silicon chip. If consciousness is quantized (just like other things in the universe), then there is nothing wrong in principle with Suddenly Disappearing Qualia when a single quantum of qualia is removed from a system with no other qualia, just like removing the last photon from a vacuum.

Joe is an interesting character whom Chalmers thinks is implausible, but aside from it rubbing up against a faint intuition, I have no reason to believe that Joe is experiencing Fading Qualia. There is no reason to think that the workings of consciousness should obey any intuitions we may have about it.
