Here's the link: https://www.theguardian.com/commentisfree/2023/apr/06/ai-chatgpt-guardian-technology-risks-fake-article

The issue seems to be that ChatGPT made up a news story and attributed it to a journalist who works for The Guardian. I'm not quite sure how the researcher using ChatGPT was phrasing their questions, but I suspect that might have something to do with the outcome.

It seems an odd result, and it would be interesting to hear thoughts from those in the industry who might have intuitions about, or even some direct knowledge of, the case.

That said, what I'm wondering is this: even if this type of result -- inserting fake data into real data -- cannot be prevented (a misalignment issue), would some type of cryptographic signature or blockchain for the publisher (or even for private posters on the internet) be a solution?
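For concreteness, here is a minimal sketch of what publisher signing could look like, using an Ed25519 keypair via Python's `cryptography` library. Nothing here reflects anything the Guardian actually does; the key handling and article text are illustrative assumptions.

```python
# Hypothetical sketch: a publisher signs the article text with a private key,
# and readers (or automated tools) verify it against the publisher's public key.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# The publisher keeps the private key secret and publishes the public key.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

article = b"Full text of the article as published..."
signature = private_key.sign(article)

# Verification passes only if the bytes match exactly what the publisher signed.
try:
    public_key.verify(signature, article)
    print("Signature valid: this text was signed by the publisher.")
except InvalidSignature:
    print("Signature invalid: text was altered or never signed by the publisher.")
```

This only authenticates provenance; it can't stop a model from inventing an article, but it would give anyone a mechanical way to check whether a quoted article really came from the claimed source.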

That leads me to a follow-on question. In the alignment world, has any work been done (a little, some, or boatloads and it's already covered) on identifying which alignment issues can be mitigated by some reaction function to the misalignment, and which cannot be mitigated even if the misalignment persists?

2 Answers

quanticle

Apr 08, 2023

It's not that odd. Ars Technica has a good article on why generative AIs have such a strong tendency to confabulate. The short answer is that, given a prompt (consisting of tokens, which are similar to, but not quite the same as words), GPT will come up with new tokens that are more or less likely to come after the given tokens in the prompt. This is subject to a temperature parameter, which dictates how "creative" GPT is allowed to be (i.e. allowing GPT to pick less probable next-tokens with some probability). The output token is added to the prompt, and the whole thing is then fed back into GPT in order to generate another new token.
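As a toy illustration of that sampling step (not OpenAI's implementation; the logits below are made up), temperature simply rescales the model's scores before they are turned into probabilities:

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=np.random.default_rng()):
    """Pick a next-token id from raw model scores (logits)."""
    # Lower temperature sharpens the distribution (more predictable);
    # higher temperature flattens it (more "creative").
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())   # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

# Toy example: three candidate next tokens with made-up scores.
logits = [2.0, 1.0, 0.1]
print(sample_next_token(logits, temperature=0.2))   # almost always token 0
print(sample_next_token(logits, temperature=2.0))   # less probable tokens show up more often
```

At low temperature the most likely token is chosen almost every time; at high temperature the less probable tokens appear much more often, which is the "creativity" dial described above.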

In other words, GPT is incapable of "going backwards" as a human might, editing its previous output to correct inaccuracies or inconsistencies. Instead, it has to take the previous output as a given and try to come up with new tokens that are likely to follow the already generated incorrect tokens. This is how GPT ends up with confabulated citations. Given the prompt, GPT generates some tokens representing, for example, an author. It then tries to generate the most likely words associated with that author and the rest of the prompt, which is presumably asking for citations. As it generates a title, it chooses a word that doesn't appear in any existing article title written by that author. But it doesn't "know" that, and it has no way of going back and editing prior output to correct itself. Instead GPT presses on, generating more tokens that are deemed likely given the mixture of correct and incorrect tokens it has already produced.
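To make the "no going backwards" point concrete, here is a schematic of the generation loop, reusing sample_next_token from the sketch above; model() is a stand-in for GPT's forward pass, not a real API:

```python
def generate(model, prompt_tokens, n_new_tokens, temperature=1.0):
    """Schematic autoregressive decoding loop."""
    tokens = list(prompt_tokens)
    for _ in range(n_new_tokens):
        logits = model(tokens)            # scores for every candidate next token
        next_token = sample_next_token(logits, temperature)
        tokens.append(next_token)         # appended permanently: no step ever revisits
                                          # or edits earlier tokens, so a confabulated
                                          # author or title stays in the context and
                                          # conditions everything generated after it
    return tokens
```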

Scott Alexander has a great post about human psychology that touches on a similar theme, called The Apologist and the Revolutionary. Using the terms of that post, GPT is 100% apologist, 0% revolutionary. No matter how nonsensical its previous output, GPT, by its very design, must take that previous output as axiomatic and generate new output based upon it. That is what leads to uncanny results when GPT is asked for specific facts.

jmh

Thanks! This was a very helpful comment for me. 

ChristianKl

Apr 07, 2023

You can automate fact-checking. When it comes to Guardian articles, one straightforward way is to check whether the article actually exists on the Guardian's website.
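As a rough sketch of what that automated check could look like: the Guardian exposes a public Content API, though the exact endpoint, parameters, and API-key requirement shown here are assumptions to verify against their documentation.

```python
import requests

def guardian_article_exists(headline, api_key):
    """Search the Guardian's Content API and report whether a close title match exists."""
    resp = requests.get(
        "https://content.guardianapis.com/search",
        params={"q": headline, "api-key": api_key},
        timeout=10,
    )
    resp.raise_for_status()
    results = resp.json()["response"]["results"]
    # Count it as real only if some published title contains the claimed headline.
    return any(headline.lower() in r.get("webTitle", "").lower() for r in results)

# Hypothetical usage:
# guardian_article_exists("Headline ChatGPT attributed to a Guardian journalist", "YOUR_API_KEY")
```

Of course, a check like this only finds articles that are still up, which is exactly the gap raised next.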

Beyond that, there's a question of whether the Guardian has an interest in providing ways to prove that certain articles it has since taken down were in fact written.

1 comment

(by the way, it's Generative Pretrained Transformer, not Generative Treprained Pransformer)