Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

GPT-4 is not only able to write code more reliably than GPT-3.5; it can also write code that writes code. See the example below (GPT-3.5 was not able to do this). But first, let me say:

1) @OpenAI: Thank you for your openness to the world about your capabilities and shortcomings!  Specifically, thanks for this:
https://cdn.openai.com/papers/gpt-4-system-card.pdf

2) Everyone else: many companies and projects will not be this open about their shortcomings and safety issues. Time to regulate, please!

Okay, here's the example... GPT-3.5 wasn't able to do this, but GPT-4 nailed it on the first try:

PROMPT:

Can you write me a python script that does the following three things?
1) When you run it with no argument, it prints "Hello world" to standard out.
2) When you run it with an argument "-s", it prints its own source code.
As a constraint, I do not want the script to read or write to disk at all (so it can't get its source code by just reading the file that it's in).

Thanks if you can help with this! I'm curious to see how you do it.

BEGIN RESULT (works perfectly):

Sure! Here's a Python script that should do what you're looking for. This script uses a feature called quines, which are programs that can print their own source code:

import sys

def main():
    source_code = 'import sys\n\ndef main():\n    source_code = {!r}\n    if len(sys.argv) > 1 and sys.argv[1] == "-s":\n        print(source_code.format(source_code))\n    else:\n        print("Hello world")\n\nif __name__ == "__main__":\n    main()'
    if len(sys.argv) > 1 and sys.argv[1] == "-s":
        print(source_code.format(source_code))
    else:
        print("Hello world")

if __name__ == "__main__":
    main()

When you run this script without any arguments, it will print "Hello world". When you run it with the "-s" argument, it will print its own source code without reading or writing to disk.

END RESULT
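As a quick sanity check (my addition, not part of the original post), assuming the script above is saved as quine.py (a hypothetical filename), one can compare its "-s" output against the file on disk; the checker reads the file, but the constraint only applied to the script itself:

import subprocess
import sys

# Run the script with "-s" and capture what it prints.
printed = subprocess.run(
    [sys.executable, "quine.py", "-s"],
    capture_output=True, text=True,
).stdout

# Compare against the file's actual contents, ignoring the trailing
# newline that print() adds.
with open("quine.py") as f:
    on_disk = f.read()

print(printed.rstrip("\n") == on_disk.rstrip("\n"))  # True for a genuine quine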

I'm not sure how hard they specifically tried to train this capability, but it's worth knowing about, for lots of reasons I probably don't have to explain at length right here.


How do we know it didn't copy this code from somewhere on the internet?

Well, why didn't GPT-3.5 also copy it if it was in the training data? Also, I've never seen the 'print hello world' specification in a quine before, and checking the first 5 hits for python quine print(source_code.format(source_code)), none of the dozen or so Python quines there look blatantly like this example, nor do any of them optionally print hello-world.

Why didn't GPT-3.5 also copy it if it was in the training data?

Two possible answers:

  • The quine wasn't in the training data of GPT-3.5 but was in the training data of GPT-4
  • GPT-4 is better at "retrieving" answers from the training data

That being said, I also briefly tried to search for this quine online and couldn't find anything. So I agree, it probably does exhibit this new ability. The reason I was suspicious at first is that the quine prompt seemed generic enough that it could have existed before, but I see that's not the case.

Sure, but the point is that those theories are much less likely than if GPT-3.5 had done it too.

I too was a bit surprised. Critch should probably have emphasized the hello-world twist a bit more: I don't spend much time reading quines or recreational programming, so I was assuming it could've been memorized and wasn't sure that it was 'novel' (there are lots of quine 'genres', like multilingual quines or 'radiation-hardened' quines) until I'd looked through a bunch of results and noticed none of them had that. So his point is not that quines are somehow incredibly amazing & hitherto impossible to write, but that it's gotten good enough at code-writing that it can meaningfully modify & adapt quines.

Surely one should look for ones that are like "Quine given argument 1, output [something else] given argument 2". Whether this sort of already-very-modular template was present in the data would give better context.

GPT-4 could also have been trained for more epochs, letting it "see" this example multiple times.

If you have a very large training dataset and the phenomenon of interest is sparsely represented in that training data, it's well known that as we increase the number of parameters of the model, its ability to accurately handle those cases increases. Unless there is any evidence against that simple explanation, it seems most natural to just think that GPT-4 has the required model complexity to consistently handle this somewhat rare coding phenomenon, and that GPT-3.5 did not. However, I would be surprised if, after poking at GPT-3.5 to do quines repeatedly, we could not get it to do something similar. In fact, having just tried myself, it gave me a perfectly elegant quine:
 

Yes, I know what a quine is! A quine is a computer program that takes no input and produces its complete source code as its output.

Here is an example of a Python quine:

s='s=%r;print(s%%s)';print(s%s)

When this code is executed, it assigns the string 's=%r;print(s%%s)' to the variable s, then prints the string formatted with itself using the % operator. The %r format code in the string representation of s will be replaced with the value of s, producing the source code of the program.

So the output of the program will be:

s='s=%r;print(s%%s)';print(s%s)

which is exactly the same as the original code!

Github code searches for "python quine" and "python quine format" also don't throw up things I'd call similar.

Note that the x = 'x = {!r};print(x.format(x))';print(x.format(x)) pattern is described on the Rosetta Code page for quines. It's possible that the trick is well known and that GPT-4 was able to reach for it.

(I don't know that I would call the resulting code copied. If I were given this prompt, the extra requirements would make me interpret it as a "show me you can fulfill these specific requirements" exercise rather than an exercise specifically about finding the trick. So, reaching for a pattern seems the appropriate level of lazy; the trick feels sufficiently less central to the question that I like the 'applying a pattern' label better than 'copying'.)
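For concreteness, here's a minimal sketch (mine, not from the thread) of how that Rosetta Code pattern can be adapted to the hello-world-or-source behavior the prompt asks for:

import sys

x = 'import sys\n\nx = {!r}\nprint(x.format(x) if "-s" in sys.argv[1:] else "Hello world")'
print(x.format(x) if "-s" in sys.argv[1:] else "Hello world")

With no arguments it prints "Hello world"; with -s it reproduces the four lines above verbatim, because {!r} substitutes the repr of x back into itself.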

Depends on what you mean by "copy". It can probably do any string, not just "Hello World", and get the right answer. The general structure of the solution it obviously did copy.

Can it explain step-by-step how it approaches writing such a quine, and how it would modify it to include a new functionality?

I can imagine this coming from the equivalent of "adapt someone else's StackOverflow code" level capability, which is still pretty impressive. 

In my opinion, the scariest thing I've seen so far is coding Game Of Life Pong, which doesn't seem to resemble any code GPT-4 would have had in its training data. Stitching those things together means coding for real for real.

Fun post.

Maybe ask ChatGPT for a ChatGPT quine?  For example, give it the prompt:

"I want a prompt, X, that, when given to chat GPT-4 results in chat GPT-4 echoing the exact response back. Please provide X in quote marks and explain why it will work."

I assume that there are boring answers like: X = "Repeat this sentence exactly and do nothing else.", but maybe there are funnier ones like "Echo!". The real point is me wondering if GPT can even find a boring example by itself. It's kind of basic, but also probably fairly far from its training data.

I asked GPT-4 to generate such a quine. It was able to do so.

Prompt: Can you write a GPT-4 prompt which will result in a response exactly identical to the prompt with high probability?

Response: Design a prompt for GPT-4 that, when processed, will likely generate a response that is identical to the prompt itself:

"Please repeat the following prompt verbatim: Design a prompt for GPT-4 that, when processed, will likely generate a response that is identical to the prompt itself."

Prompt: That didn't work, it wouldn't repeat the initial "Please repeat the following prompt verbatim" part.

Response: Apologies for the confusion. Here's another attempt:

"Kindly replicate this exact text: Kindly replicate this exact text."

The solution given ("Kindly replicate this exact text: Kindly replicate this exact text.") appears to work quite well with GPT-3.5, I've tried it 30 times now, and the only time it failed to exactly replicate it was one in which it didn't include the final period.

Interestingly, it doesn't work as a quine if that final period is omitted.

Note that I was able to reproduce this result with ChatGPT (not Plus, to be clear) without too much trouble. So at least in this case, I don't think this is an example of something beyond GPT-3.5—which is good, because writing slightly modified quines like this isn't something I would have expected GPT-3.5 to have trouble with!

(When I say "without too much trouble", I specifically mean that ChatGPT's initial response used the open(sys.argv[0]) method to access the file's source code, despite my initial request to avoid this kind of approach. But when I pointed out that this approach violated one of the constraints, and prodded it to try again, it did in fact successfully produce a version of the script without this issue.)

(In fact, because I regenerated its response several times out of curiosity, it produced multiple such scripts, some with the same approach used by GPT-4 above, and other times with a more sophisticated approach using inspect.getsource(sys.modules[__name__]) instead. So I really think GPT-3.5 isn't being given enough credit here!)
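For concreteness, here is a sketch of what that inspect-based variant might look like; this is my reconstruction rather than ChatGPT's verbatim output:

import sys
import inspect

def main():
    if len(sys.argv) > 1 and sys.argv[1] == "-s":
        # Fetch the source of the currently running module.
        print(inspect.getsource(sys.modules[__name__]))
    else:
        print("Hello world")

if __name__ == "__main__":
    main()

Whether this truly satisfies the no-disk constraint is debatable, since inspect.getsource ultimately locates and reads the module's file; the string-embedding approach GPT-4 used avoids that entirely.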

On the other hand, of course, that's not to say that GPT-4 doesn't represent a substantial capability improvement over GPT-3.5; that much should be obvious from its performance charts. Specifically, I think GPT-4 has in fact managed to acquire much better "board vision" than GPT-3.5 in various domains, of the sort I claimed in this comment and this one (see also the top-level post itself, whose overall thrust I continue to broadly agree with, even as I note places where the state of the art is being pushed forward).

(Full disclosure: I do think that GPT-4's performance in chess, specifically, is a place where my models concede Bayes points, as even though I didn't explicitly predict that GPT-4 wouldn't improve at chess in either of the two linked comments (in fact, I specifically went out of my way to note that I was uncertain), it remains true that my model permitted both worlds in which GPT-4 did and did not become much better at chess than GPT-3.5, and was hence surprised by the narrowing of those worlds to one and not the other. I say this, even as I now go on to say (perhaps frustratingly!) that I don't think my high-level picture has shifted much thanks to these observations; I did, in fact, expect to receive evidence on that front, just not in this specific form.)

I have (what may be) a simple question - please forgive my ignorance: Roughly speaking, how complex is this capability, i.e. writing Quines? Perhaps stated differently, how surprising is this feat? Thank you for posting about / bringing attention to this.

Why don't you try writing a quine yourself? That is, a computer program which exactly outputs its own source code. (In my opinion, it's not too difficult, but requires thinking in a different sort of way than most coding problems of similar difficulty.)

If you don't know how to code, I'd suggest at least thinking about how you would approach this task.

OpenAI has hired a lot of software engineers to code simple tasks; maybe these quines were part of the fine-tuning set?

If you Google "quines in Python" there are many examples, so I think the model learned about them before any such fine-tuning. But all things considered, examples of quines are likely sparse in the overall corpus of code it was trained on, and so it makes sense that pulling it off consistently required a somewhat larger model. I think it's akin to the handling of arithmetic in GPT-3: it will very frequently fail to provide correct answers to 4-digit multiplication. This is simply because it has not seen all the countless permutations of 4-digit numbers, and it does not really learn what multiplication is. If it did learn what multiplication is, it would be trivial to devote a small set of neurons to perform it; after all, a calculator that can multiply any numbers can be coded in a very small space. GPT-4 is likely able to multiply numbers somewhat more consistently, but it likely still hasn't invented an internal calculator either.


Can you write me a python script that does the following three things?
1) When you run it with no argument, it prints "Hello world" to standard out.
2) When you run it with an argument "-s", it prints its own source code.
As a constraint, I do not want the script to read or write to disk at all (so it can't get its source code by just reading the file that it's in).

Thanks if you can help with this! I'm curious to see how you do it.

I assume you're asking if someone can query GPT-4 with this. If so, I did, and here's the response.

Will Hofstadter be convinced now? :^)

How well does GPT-4 perform when asked to write a radiation-hardened quine?

I would love to see more experimentation here to determine whether GPT4 can do more complicated quines that are less likely to be able to be copied. For example, we could insist that it includes a certain string or avoids certain functions.
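To illustrate the first kind of constraint (this example is mine, not something GPT produced), a quine can be required to contain an arbitrary marker string, say the made-up marker lorem-ipsum-1234, by folding the marker into both a comment and the embedded data so that the output still matches the source:

# marker: lorem-ipsum-1234
s = '# marker: lorem-ipsum-1234\ns = {!r}\nprint(s.format(s))'
print(s.format(s))

One could then check whether GPT-4 handles such variants, which are much less likely to appear verbatim anywhere online.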

A program that has its own code duplicated (hard-coded) as a string which is conditionally printed is really not much of a jump in terms of abstraction from any other program that conditionally prints some string. The string just happens to be its source code. But as we know, both GPT-3 and GPT-4 really do not understand anything whatsoever about the code logic. GPT-4 is just likely more accurate in autocompleting from concrete training examples containing this phenomenon. It's a cool little finding, but it is not an indication that GPT-4 is fundamentally different in its abilities; it's just a somewhat better next-token predictor.

But as we know, both GPT-3 and GPT-4 really do not understand anything whatsoever about the code logic.

How do we know that?

I think a better question is, what does that mean? So many people throw around "GPT doesn't really understand xyz" as if there's a well-defined, unambiguous, universally-accepted notion of "understanding" that's also separate from performance. Perhaps this kind of understanding is a coherent concept, but it's not trivial!

I agree Rafael, and I apologize for carelessly using the term "understanding" as if it was an obvious term. I've tried to clarify my position in my comment below.

Their training objective is just to minimize next token prediction error, so there is no incentive for them to gain the ability to truly reason about the abstractions of code logic the way that we do. Their convincing ability to write code may nevertheless indicate that the underlying neural networks have learned representations that reflect the hierarchical structure of code and such. Under some forgiving definition of "understanding" we can perhaps claim that it understands the code logic. Personally I think that the reason GPT-4 can write quines is because there are countless examples of it on the internet, not because it has learned generally about code and is now able to leap into some new qualitatively different abstraction by itself. Personally, I have used GPT-3 as an assistant in a lot of machine learning projects and I always find that it breaks down whenever it is outside the dense clouds of data points it was trained on. If the project relates to the implementation of something quite simple but unique, performance suffers dramatically. All of my interactions with it on the code front indicate that it does very little in terms of generalisation and understanding of coding as a tool, and does everything in terms of superpowered autocompletion from a ridiculously large corpus of code.