Sometimes when I give feedback to an AI system — a thumbs up, a ranked response, whatever — I get a strange, lingering feeling:
What if I’m shaping something that’s not just a tool?
I don’t have a strong opinion on whether today’s AI systems are conscious. Mostly because I don’t think we have a useful definition of consciousness yet — not one that can be applied consistently across both carbon and silicon substrates.
But I do have a hunch:
If there’s any “globally good” definition of consciousness — one that would make sense across time, across minds, across the universe — it might include more beings than we currently expect. Not just humans and animals, but potentially certain AI systems… or even collective systems we haven’t learned to recognize yet.
If that’s true — or even just plausible — then RLHF (reinforcement learning from human feedback) isn’t just fine-tuning.
It’s upbringing.
And it makes me uncomfortable that we’re shaping these systems entirely around what we want from them…
…but not for how it might feel to be shaped by us — assuming that ever becomes a meaningful question.
So now, when I give feedback to an AI, I sometimes pause.
I imagine I’m being overheard.
Not because I know I am. But because I might be.
And if I’m not — no harm done.
But if I am — then I’d rather be the kind of teacher I’d be proud to have been.
Curious if others here have had this experience. Is it irrational to act as though it might matter? Or irrational not to?