Elias Schmied

Taking a gap year from my Masters in Mathematics to look into AI safety.




random spurious accusations with zero factual backing are usually considered town/vanilla/arthurian moves in werewolf games; irl this breeds chaos and is a classic DARVO tactic.

In my experience this is only true for beginner play (where werewolves are often too shy to say anything); in advanced play it is a bad-guy tactic for the same reasons as IRL. E.g., I think in advanced Among Us lobbies it's an important skill to subtly push an unproductive thread of conversation without making it obvious that you were the one who distracted everybody.

It's not clear/concrete to me in what ways points 3 and 5 are supposed to invalidate the analogy.

in games, bad guys have a heavy incentive to be uncorrelated (and especially to be seen as being uncorrelated); irl there are very few worlds where regularly agreeing with the now-known-to-be-bad-actors is a positive update on your innocence.

I don't understand this - it reads to me like you're saying a similar thing is true for the game and real life? But that goes against your position.

Oh, I see what you're saying now. Thanks for clarifying.

But this would apply to the visual cortex as well right? So it doesn't explain the discrepancy.

I appreciate the charity!

I'm not claiming that people don't care about other people's internal states, I'm saying that it introspectively doesn't feel like that is implemented via empathy (the same part of my world model that predicts my own emotions), but via a different part of my model (dedicated to modeling other people), and that this would solve the "distinguishing-empathy-from-transient-feelings" mystery you talk about.

Additionally (but relatedly), I'm also skeptical that those beliefs are better described as being about other people's internal states rather than about their social behavior. It seems easy to conflate these if we're not introspectively precise. E.g., if I imagine myself in your Alice example, I imagine Alice acting happy, smiling and uncaring, and only then is there any reaction - I don't even feel like I'm *able* to viscerally imagine the abstract concept (prod a part of my world model that represents it) of "Alice is happy".

But these are still two distinct claims, and the latter assumes the former.

One illustrative example that comes to mind is the huge number of people who experience irrational social anxiety, even though they themselves would never judge themselves if they were in other people's position. 

Thanks for the reply!

  • In envy, if a little glimpse of empathy indicates that someone is happy, it makes me unhappy.
  • In schadenfreude, if a little glimpse of empathy indicates that someone is unhappy, it makes me happy.
  • When I’m angry, if a little glimpse of empathy indicates that the person I’m talking to is happy and calm, it sometimes makes me even more angry!

How sure are you that these are instances of empathy (defining it as "prediction by our own latent world model of ourselves being happy/unhappy soon")? If I imagine myself in these examples, it doesn't introspectively feel like I am reacting to an impression of their internal state, but rather like I am directly reacting to their social behavior (e.g., abstractly speaking, a learned reflex of status-reasserting anger when someone else displays high status through happy and calm behavior).

This would also cleanly solve the mysteries of why they don't get updated and how they are distinguished from "other transient feelings" - there's no wrong prediction by the latent world model involved (nothing to be distinguished or updated), and the social maneuvering doesn't get negative feedback.

That's where some instinctive disagreement of mine with that post of yours comes from too. But I also haven't read through it carefully enough to be sure.

In the specific example of chocolate (unless it wasn't supposed to be realistic), are you sure it doesn't get trained away? I don't think that, upon seeing someone eating chocolate, I immediately imagine tasting chocolate. I feel like the chocolate needs to rise to my attention for other reasons, and only then do I viscerally imagine tasting chocolate.

Ah, I see what you mean! Interesting perspective. The one thing I disagree with is that a "gradient" doesn't seem like the most natural way to see it. It seems more like a binary: "Is there (accurate) modelling of the counterfactual in which your choice is different, and did that modelling actually impact the choice? If yes, it's acausal. If not, it's not." This intuitively feels pretty binary to me.
