OK, I was directed here by https://aisafety.quest/ and I fall into this camp: "Your instance of ChatGPT helped you clarify some ideas on a thorny problem (perhaps ... AI alignment)"
I like this suggestion and I'll try to do this: "Write your idea yourself, totally unassisted. Resist the urge to lean on LLM feedback during the process, and share your idea with other humans instead. It can help to try to produce the simplest version possible first; fit it in a few sentences, and see if it bounces off people. But you're going to need to make the prose your own, first."
I care about this idea, and although I'm not an academic, I'm willing to invest the time to get it right. I got a lot of benefit from this blog post.
I will point out one thing. If someone has been using ChatGPT, Claude, Grok, and Gemini since ChatGPT 3.5 first came out, they will be more inclined to think they've mastered the sycophancy inherent in model output. They will tend to think that simply taking outputs from one model and feeding them as inputs into another model with anti-sycophantic prompts will "solve" the problem of sycophancy. I even have anti-sycophantic custom instructions for ChatGPT, which I occasionally use with Perplexity, Claude, and Gemini as well: https://docs.google.com/document/d/1GlNtHJf20Zw3XpfYRtwStgpIKbV0JU6DO2BZal_cE4U/edit?usp=sharing
I agree with what you've said, and I'm serious about contributing to the problem of AI alignment, so now I need to roll up my sleeves and do the very hard work of actually reworking my idea in my own words. It's tough; I've had a lot of chronic health conditions, but I'm not giving up. It's too easy to lean on these models as a crutch.
Once I've taken the time to rewrite the outputs from the model in my own words, and since I'm really serious about working on solutions to corrigibility, I will return and hopefully figure out where I can appropriately contribute.
I admit that I'm naive enough to think that one can develop an "immunity to sycophancy" just by assuming the models will always be sycophantic by default. People like me think, "yeah, that happened to all those users simply because they haven't spent hundreds of hours using these models and don't understand how they work." But somehow I don't think this attitude is going to carry any authority here, which is good. I accept that, and I'll contribute as best I can; I'll return as soon as I have something appropriate to submit.
I do wish I had a bit of help from a human, but that's another issue.