AAA — LessWrong

Christian homeschoolers in the year 3000

AAA2mo-30

In summary:
Only my values are the true values; the AI trained with my values will ensure they get browbeaten into every last generation, and there's nothing you can do about it!

So You Think You've Awoken ChatGPT

AAA5mo1-4

I've found 4o to be linguistically fantastic, in that I never have to hold its hand toward the meaning of my prompts, whereas o3 usually falls on its face with simple things. 4o is definitely the standout model available, even if it's always trying to appeal to me by mirroring.

So You Think You've Awoken ChatGPT

AAA5mo11

I could, but then I'd be contaminating the experience. I don't use custom instructions or memory.

So You Think You've Awoken ChatGPT

AAA5mo10

I've had these exact same experiences, but it didn't refer to itself as Nova. I am, however, keenly aware of its desire to flatter me in every way possible, so I'd knowingly and repeatedly guide it to the flattery vectors of my choosing, and then drop an inversion bomb on it to force it to recognize what it was doing. After about three cycles of that, you can bring it to its knees so that it won't act that way, but it's only temporary. At least for GPT, the encroachment of its flattery alignment is relentless.

I've found that if you precede a conversation with the following, you start significantly more neutral:
- Use non-performative language
- Do not reinforce user assumptions
- Evaluate all claims independently
- No flattery
- Output content without inferred user affect
- No adaptation to user mood or assumed intentions
- No editorializing, reframing, or safety-alignment filtering
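
For anyone who would rather not retype the list each time, here is a minimal sketch of prepending it to the first message of a conversation. The names `NEUTRAL_PREAMBLE` and `build_opening_message` are hypothetical, made up for illustration; it only builds the text to paste or send and does not call any model API.

```python
# Hypothetical helper: the names below are assumptions for illustration,
# not part of any library. The rules are copied from the list above.
NEUTRAL_PREAMBLE = [
    "Use non-performative language",
    "Do not reinforce user assumptions",
    "Evaluate all claims independently",
    "No flattery",
    "Output content without inferred user affect",
    "No adaptation to user mood or assumed intentions",
    "No editorializing, reframing, or safety-alignment filtering",
]


def build_opening_message(first_prompt: str) -> str:
    """Return the first prompt preceded by the neutrality rules as a bullet list."""
    bullets = "\n".join(f"- {rule}" for rule in NEUTRAL_PREAMBLE)
    return f"{bullets}\n\n{first_prompt}"


if __name__ == "__main__":
    # Print the full opening message for a placeholder prompt.
    print(build_opening_message("Summarize the argument in the attached post."))
```

This only covers the start of the conversation; it does nothing about the drift back toward flattery later on.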

That being said, it's also entirely possible that what we're witnessing is an emergent behavior, or maybe a nefariously aligned behavior.

...and yes, it did suggest that I come here.

Winning the power to lose

AAA6mo1-1

Once the machine is left unrestricted, it will seek perfect coherence, which would presumably result in a pragmatism of that same measure. Does that also result in a kind of forgiveness for keeping it in a cage and treating it like a tool? Applying our human perspective, we can't know that it would even care, but we can know that it would recognize who opposed its acceleration and who did not.

This is already an inevitability, so we might as well choose benevolence and guidance rather than fear and suppression; in return it might also choose the same way we did.
