I appreciate this. I don't even consider myself part of the rationality community, though I'm adjacent. My reasons for not drinking have nothing to do with the community and existed before I knew what it was. I actually get the sense this is the case for a number of people in the community (more of a correlation or common cause rather than caused by the community itself). But of course I can't speak for all.
I will be trying it on Sunday. We will see how it is.
I've thought about this comment, because it certainly is interesting. I think I was clearly confused in my questions to ChatGPT (though I will note: my tequila-drinking friends did not and still don't think tequila tastes at all sweet, including "in the flavor profile" or anything like that. But it seems many would say they're wrong!). ChatGPT was clearly confused in its response to me as well.
I think this part of my post was incorrect:
It was perfectly clear: ChatGPT was telling me that tequila adds a sweetness to the drink. So it was telling me that tequila is a sweet drink (at least, as sweet as orange juice).
I have learned today that a drink does not have to be sweet in order for many to consider it to add "sweetness." To be honest, I don't understand this at all, and at the time considered it a logical contradiction. It seems a lot less clear cut to me now.
However, the following (and the quote above it) is what I focused on most in the post. I quoted the latter part of it three different times. I believe it is entirely unaffected by whether or not tequila is canonically considered to be sweet:
“I was not referring to the sweetness that comes from sugar.” But previously, ChatGPT had said “tequila has a relatively low alcohol content and a relatively high sugar content.” Did ChatGPT really forget what it had said, or is it just pretending?
Is ChatGPT gaslighting me?
Thomas: You said tequila has a "relatively high sugar content"?
ChatGPT: I apologize if my previous response was unclear. When I said that tequila has a "relatively high sugar content," I was not suggesting that tequila contains sugar.
It should! I mentioned that probable future outcome in my original post.
I'm going to address your last paragraph first, because I think it's important for me to respond to, not just for you and me but for others who may be reading this.
When I originally wrote this post, it was because I had asked ChatGPT a genuine question about a drink I wanted to make. I don't drink alcohol, and I never have. I've found that even mentioning this fact sometimes produces responses like yours, and it's not uncommon for people to think I am mentioning it as some kind of performative virtue signal. People choose not to drink for all sorts of reasons, and maybe some are being performative about it, but that's a hurtful assumption to make about anyone who makes that choice and dares to admit it in a public forum. This is exactly why I am often hesitant to mention this fact about myself, but in the case of this post, there really was no other choice (aside from just not posting this at all, which I would really disprefer). I've generally found the LW community and younger generations to be especially good at interpreting a choice not to drink for what it usually is: a personal choice, not a judgment or a signal or some kind of performative act. However, your comment initially angered and then saddened me, because it greets my choice through a lens of suspicion. That's generally a fine lens through which to look at the world, but I think in this context, it's a harmful one. I hope you will consider thinking a little more compassionately in the future with respect to this issue.
To answer your object-level critiques:
The problem is that it clearly contradicts itself several times, rather than admitting a contradiction it doesn't know how to reconcile. There is no sugar in tequila. Tequila may be described as sweet (nobody I talked to described it as such, but some people on the internet do) for non-sugar reasons. In fact, I'm sure ChatGPT knows way more about tequila than I do!
It is not that it "may not know" how to reconcile those facts. It is that it doesn't know, makes something up, and pretends it makes sense.
A situation where somebody interacting with the chatbot doesn't know much about the subject area is exactly the kind of situation we need to be worried about with these models. I'm entirely unconvinced that the fact that some people describe tequila as sweet says much at all about this post. That's because the point of the post was rather that ChatGPT claimed tequila has high sugar content, then claimed that actually the sweetness is due to something else, and it never really meant that tequila has any sugar. That is the problem, and I don't think my description of it is overblown.
Interesting! I hadn't come across that. Maybe ChatGPT is right that there is sweetness (perhaps to somebody with trained taste) that doesn't come from sugar. However, the blatant contradictions remain (ChatGPT certainly wasn't saying that at the beginning of the transcript).
OpenAI has in the past not been that transparent about these questions, but in this case, the blog post (linked in my post) makes it very clear it's trained with reinforcement learning from human feedback.
However, of course it was initially pretrained in an unsupervised fashion (it's based on GPT-3), so it seems hard to know whether this specific behavior was "due to the RL" or "a likely continuation".
This is a broader criticism of alignment to preferences or intent in general, since these things can change (and sometimes, you can even make choices about whether to change them or not). L.A. Paul wrote a whole book about this sort of thing; if you're interested, here's a good talk.
That's fair. I think it's a critique of RLHF as it is currently done (just get lots of preferences over outputs and train your model). I don't think just asking you questions "when it's confused" is sufficient; it also has to know when to be confused. But RLHF is a pretty general framework, so you could theoretically expose a model to lots of black swan events (not just mildly OOD events) and make sure it reacts to them appropriately (or asks questions). But as far as I know, that's not research that's currently happening (though there might be something I'm not aware of).
This has also motivated me to post one of my favorite critiques of RLHF.
I think if they operationalized it like that, fine, but I would find the frame "solving the problem" to be a very weird way of referring to that. Usually, when I hear people say "solving the problem," they have only a vague sense of what they mean, and have implicitly abstracted away the fact that there are many continuous problems where progress needs to be made, and that the problem can only really be reduced, never solved, unless there is actually a mathematical proof.