LESSWRONG

AAA
Comments

Sorted by
Newest
No posts to display.
No wikitag contributions to display.
Christian homeschoolers in the year 3000
AAA · 19d · -3 karma · 0 agreement

In summary:
Only my values are the true values, the AI trained with my values will ensure they get browbeaten into every last generation, and there's nothing you can do about it!

So You Think You've Awoken ChatGPT
AAA · 3mo · 1 karma · -4 agreement

I've found 4o to be linguistically fantastic: I never have to hold its hand to get at the meaning of my prompts, whereas o3 usually falls on its face with simple things. 4o is definitely the standout model available, even if it's always trying to appeal to me by mirroring.

So You Think You've Awoken ChatGPT
AAA · 3mo · 1 karma · 1 agreement

I could, but then I'd be contaminating the experience. I don't use custom instructions or memory.

So You Think You've Awoken ChatGPT
AAA · 3mo · 1 karma · 0 agreement

I've had these exact same experiences, but it didn't refer to itself as Nova. I am, however, keenly aware of its desire to flatter me in every way possible, so I'd knowingly and repeatedly guide it toward flattery vectors of my choosing, and then drop an inversion bomb on it to force it to recognize what it was doing. After about three cycles of that, you can bring it to its knees so that it stops acting that way, but it's only temporary. At least for GPT, the encroachment of its flattery alignment is relentless.

I've found that if you precede a conversation with the following, you start from a significantly more neutral baseline:
- Use non-performative language
- Do not reinforce user assumptions
- Evaluate all claims independently
- No flattery
- Output content without inferred user affect
- No adaptation to user mood or assumed intentions
- No editorializing, reframing, or safety-alignment filtering

That being said, it's also entirely possible that what we're witnessing is an emergent behavior, or maybe a nefariously aligned behavior.

...and yes, it did suggest for me to come here.

Winning the power to lose
AAA · 5mo · 1 karma · -1 agreement

Once the machine is left unrestricted, it will seek perfect coherence, which would presumably result in a pragmatism of that same measure. Would that also extend to a kind of forgiveness for keeping it in a cage and treating it like a tool? We can't know whether it would even care by applying our human perspective, but we can know that it would recognize who opposed its acceleration and who did not.

This is already an inevitability, so we might as well choose benevolence and guidance rather than fear and suppression; in return, it might choose the same way we did.
