I mean, I guess I just treat "there is an obvious solution and everyone is aware of the problem" as a scenario in which there's not a lot else to say - you just don't build the thing. The how (international enforcement etc.) may still be tricky, but the situation would be vastly different.
Just adding to this - I have been rinsing with saline for a long time due to nasal polyps. All I do is boil tap water for about 5 minutes and add the required dose of salt + bicarbonate (NeilMed bags). It's been years now and, well, no brain-eating amoebas yet. This is in the UK, btw; not sure if water elsewhere carries a higher contamination risk, but afaik the amoebas die at around 60-70 °C.
I don't think that's necessarily the case - if we get one or more warning shots, then obviously people start taking the whole AI risk thing quite a bit more seriously. Complacency is still possible, but "an AI tries to kill us all" stops being in the realm of speculation, and generally speaking, pushback and hostility against perceived hostile forces can be quite robust.
I have the sense that rationalists think there's a very important distinction between "literally everyone will die" and, say, "the majority of people will suffer and/or die." I do not share that sense, and to me, the burden of proof set by the title is unreasonably high.
I would say that there is a distinction, but I agree that at those levels of badness it sort of blurs into a single blob of awfulness. Generally speaking, though, I see it as: if someone were told "your whole family will be killed except your youngest son" or "your whole family will be killed, no one survives"... obviously both scenarios are horrifying, but you'd still marginally prefer the first one. I think if people fall into the trap of being so taken by the extinction risk that they brush off a scenario in which, say, 95% of all people die, then they're obviously losing perspective. But I also think it's fair to say that the loss of all of humanity is worse than just the sum total of the loss of each individual in it (the same reason we consider genocide bad in and of itself - it's not just the loss of people, it's the loss of culture, knowledge, memory, on top of the people).
“We can’t tell you how it would win, but we can tell that it would win” is not believable for most people. You might know you’re not a good fighter, but most people don’t really feel it until they get in the ring with a martial arts expert. Then they realize how helpless they are. Normal people will not feel helpless based only on a logical theory.
I wonder how much having someone play a game of their choice against top-level RL agents would help. It would make the complete inability to even see the moves coming feel real.
"We already knew, so why not start working on it before the problem manifested itself in full" sounds very reasonable, but look at how it's going with climate change. Even with COVID if you remember there were a couple of months at the beginning of 2020 when various people were like "eh, maybe it won't come over here", or "maybe it's only in China because their hygiene/healthcare is poor" (which was ridiculous, but I've heard it. I've even heard a variant of it about the UK when the virus started spreading in northern Italy - that apparently the UK's superior health service had nothing on Italy's, so no reason to worry). Then people started dying in the west too and suddenly several governments scrambled to respond. Which to be sure is absolutely more inefficient and less well coordinated than if they had all made a sensible plan back in January, but that's not how political consensus works; you don't get enough support for that stuff unless enough people do have the ability and knowledge to extrapolate the threat to the future with reasonable confidence.
Yudkowsky and Soares seem to be entirely sincere, and they are proposing something that threatens tech company profits. This makes them much more convincing. It is refreshing to read something like this that is not based on hype.
I find it interesting that you see this as fresh, because ironically this was the original form of the existential-risk-from-AI argument. What happened here, I think, is something akin to seeing a bunch of inferior versions of a trope in various movies before seeing the original movie that established the trope (and did it much better).
In practice, it's not that companies made up the existential risk to drum up the potential power of their own AIs, and then someone refined the arguments into something more sensible. Rather, the arguments started out serious, and some of the companies were founded on the premise of doing research to address them. OpenAI was meant to be a non-profit with these goals; Anthropic split off when its founders thought OpenAI was not following that mission properly. But in the end all these companies, being private entities that needed to attract funding, fell to exactly the drives that the "paperclip maximizer" scenario actually points at: not an explicit attempt to destroy the world, but a race to the bottom in which, in order to achieve a goal efficiently and competitively, risks are taken, costs are cut, solutions are rushed, and eventually something might just go a bit too wrong for anyone to fix. And as they did so, they tried to rationalise away the existential risk with ever wonkier arguments.
Why should we assume the AI wants to survive? If it does, then what exactly wants to survive?
Why should we assume that the AI has boundless, coherent drives?
I think these concerns have related answers. I believe they belong to the category where Yudkowsky's argument is indeed weaker, but more in the sense that he treats it as all but certain, while I might think it's only, like, 60-70% likely? Which for the purposes of this question is still a lot.
So generally the concept is: if you were to pick a goal from the infinite space of all possible imaginable goals, then yeah, maybe it would be something completely random. "Successfully commit suicide" is a goal. But more likely, the outcome of bad alignment would be an AI with something like a botched, incomplete version of a human goal. And human goals generally have to do with achieving something in the real world, something material that we enjoy or want more of for whatever reason. Such goals are usually aided by survival - by definition, an AI that stays around can do more of X than an AI that dies and can't do X any more. So in that case survival becomes merely a means to an end.
The general problem here seems to be that even the most psychopathic, most deluded and/or most out-of-touch human still has a lot of what we could call common sense. Virtually no stationery company CEO, no matter how ruthless and cut-throat, would think "strip mine the Earth to make paperclips" is a good idea. But all these things we take for granted aren't necessarily as obvious to an AI whose goals we are building from scratch, via what is essentially just an invitation to guess our real wishes from a bunch of examples ("hey AI, look at this! This is good! But now look at this, this is bad! But this other thing, this is good!" etc. etc., after which we expect it to find a rule that coherently explains all of that). There are still infinite goals that fit those examples just as well, and by sheer dint of entropy, most of them will have something bad about them rather than being neatly aligned with what a human would call good even in the cases we didn't show. For the same reason that if I were handed the pieces of a puzzle and merely arranged them at random, the chance of ending up with the actual picture would be minuscule.
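To put a toy number on "minuscule" (a back-of-the-envelope sketch under my own assumptions: every piece is distinguishable, each goes into one of n slots, and each piece allows 4 rotations - the function name is just illustrative):

```python
from math import factorial, log10

def log10_p_random_assembly(n_pieces: int) -> float:
    """log10 of the probability that a uniformly random arrangement of an
    n-piece jigsaw reproduces the intended picture, assuming n! possible
    orderings of the pieces and 4 possible rotations for each piece."""
    arrangements = factorial(n_pieces) * 4 ** n_pieces
    return -log10(arrangements)

print(log10_p_random_assembly(100))   # ~ -218: about 1 chance in 10^218
print(log10_p_random_assembly(1000))  # ~ -3170
```

The exact modelling doesn't matter; the point is just that "consistent with all the examples we showed" still leaves an astronomically large space of candidate goals, and a random pick from that space is essentially never the one we meant.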
Why should we assume there will be no in between?
This is another one where I'd go from Yudkowsky's certainty to a mere "very likely", but again, not a big gap.
My thinking here would be: if an AI is weaker than us, or at most on par with us, and knows it is, why should it start a fight it could lose? Why not bide its time, grow stronger, and then win? It would only open hostilities in that sort of situation if it badly misjudged its own strength, or if it felt forced to act early (say, because it expected to be shut down before it could grow stronger).
Of course both scenarios could happen, but I don't think they're terribly likely. In the discourse, the resulting failed attacks usually get referred to as "warning shots". In some ways, a future in which we do get a warning shot is probably desirable, given how often it takes that kind of tangible experience of risk for political action to be taken. But of course it could still be very costly. Even a war you win is still a war, and if we could avoid that too, all the better.
Connected to this: Le Guin also wrote "The Lathe of Heaven". I wrote a review of it here on LW. It's a novel that seems to be entirely about how utopia will always have a cost, as a fundamentally karmic payoff, even when there's no obvious reason why - though it's also not always pessimistic about improvements being possible.
The robots didn't open the egg box and put each egg individually into the rack inside the fridge; obviously crap, not buying the hype. /s
It just seems to me like the topics are interconnected:
- EY argues that there is likely no in-between. He does so specifically to argue that a "wait and see" strategy is not feasible: we cannot experiment and hope to glean further evidence past a certain point, and must act on pure theory because that's the best possible knowledge we can hope for before things become deadly;
- dvd is not convinced by this reasoning. Arguably, they're right - while EY's argument has weight, I would consider it far from certain, and it mostly seems built around the assumption of an ASI singleton rather than, say, an ecosystem of evolving AIs in competition, which would also have to worry about each other and about a closing window of opportunity;
- if warning shots are possible, a lot of EY's arguments don't hold as straightforwardly. It becomes less reasonable to take extreme action on pure speculation, because we can afford - albeit with some risk - to wait for a first sign of experimental evidence that the danger is real before going all in and risking paying the costs for nothing.
This is not irrelevant or unrelated IMO. I still think the risk is large but obviously warning shots would change the scenario and the way we approach and evaluate the risks of superintelligence.