Cross posted from New Savanna.

I've just posted a new working paper. Title above, links, abstract, TOC, and two opening sections below:

Abstract: I gave ChatGPT three lists of topics for which it had to propose categories into which the topics could be sorted. Two lists were relatively short, 56 and 53 topics; I asked ChatGPT to propose six organizing categories for each. One list was much longer, 655 topics; I asked ChatGPT to propose 12 categories for it. In all cases the proposed categories were reasonable. ChatGPT explained each proposed category either with a pair of sentences (the short lists) or with characterizing phrases (the long list). These characterizations were reasonable. In a further task, when asked to place topics under the proposed categories, ChatGPT placed many topics under the first two categories and very few under the last two. Though quite different in detail, this task has a rough formal similarity to generating a coherent story that is organized on three levels: 1) the whole story, 2) story segments, 3) sentences in story segments.


Organizing lists of categories into a coherent structure
The categories ChatGPT proposed
Formal similarity with story generation
What happens when ChatGPT lists tags under each category?
How would I have approached these tasks myself?
Propose categories to sort a short “top-level” list of 56 topics
Propose categories to sort a short arbitrary sub-list of 53 topics
Propose categories to sort the full list of 655 topics
Sort a short list and place the topics under the appropriate category

Organizing lists of categories into a coherent structure

Some months ago I decided to see how ChatGPT would react to some associative clusters I had made. Associative cluster? Simple: a list of words I created by free association on some particular topic, for example:

atoms, periodic table, bonds, compounds, elements, molecules, reaction, oxidation, acids and bases, reagents, alchemy, changing liquids from one color to another, distilling, condenser, precipitate

I would present such a cluster to Chatster and see how it would respond. In that case, it informed me, “These are all concepts in the field of chemistry,” which is true. It then went on to tell me something about that field.

This time I decided to tease it with a different list: the category tags for my New Savanna blog, which currently contains 655 items. In prompting ChatGPT with this list I didn’t have anything in particular in mind; I just wanted to see what happened. It seemed, however, a bit extreme to dump the whole 655-item list on ChatGPT. So I started with a subset, two different subsets in fact. Then I gave it the whole list. What happened turned out to be interesting, as is sometimes the case.

In the case of the subsets, after a bit of interaction, I asked ChatGPT to organize each sub-list into a half-dozen categories, which it did. Then I asked it to organize the whole list, all 655 tags, into a dozen categories. No problem. Why is this interesting?

Sorting lists

First of all, note that we’re talking about organizing lists. This is a classic and fundamental problem in computing. Put things in alphabetical order, numerical order, in order by size, by zip code, by weight, date of birth, and so forth. Programs do this all the time. Given a criterion by which to establish a list, we know how to do this computationally.
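To make the contrast concrete, here is a minimal Python sketch (the topic names are illustrative, drawn from the chemistry cluster above): once a criterion is made explicit as a key function, applying it is routine.

```python
# Sorting is easy once the criterion is explicit: hand it to
# sorted() as a key function and the machinery does the rest.
topics = ["oxidation", "alchemy", "bonds", "distilling"]

alphabetical = sorted(topics)        # criterion: spelling
by_length = sorted(topics, key=len)  # criterion: word length

print(alphabetical)  # ['alchemy', 'bonds', 'distilling', 'oxidation']
print(by_length)     # ['bonds', 'alchemy', 'oxidation', 'distilling']
```

The hard part the next paragraph points at is that no such key function is given in advance for "inherent conceptual structure."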

What makes this particular problem interesting is the organizational criterion: inherent conceptual structure. How do you state conceptual structure in computational terms? It is no exaggeration to say that students of database design, artificial intelligence, and computational linguistics have devoted a great deal of effort to that problem.

We can thus see that sorting lists has two aspects: 1) specifying the sort criterion, and 2) applying it to the list. These are different kinds of problems. The first is about what things are; the second is about moving them around. There is an analog to this in linguistics: the first is about paradigmatic structure, and the second is about syntagmatic structure. To borrow terms from the great Russian linguist Roman Jakobson, the first involves the axis of selection and the second the axis of combination.

6 comments

An interesting idea (to use an AI to sort a large number of existing concepts into categories), but come on, this should be like one tweet, not a PDF that requires a registration to see.

I don't know quite how to respond to that. Without having read the piece, which took me, I don't know, say 30-40 hours to write, spread over two or three weeks (including the hour or so I spent with ChatGPT), you're telling me that it couldn't possibly be worth more than a tweet. How do you know that? Have you thought about what the task involves? If you had a list of 50 topics to organize, how would you do it manually? What about 655 topics? How would you do that manually?

How would you do it using a computer? Sure, given well-defined items and clear sort criteria, computers do that kind of thing all the time, and over humongous collections of items. It's a staple process of programming. But these items are not at all well-defined, and the sort criteria, well, ChatGPT has to figure that out for itself.

Yours is not a serious comment.

If I were going to sort a list of 655 topics into a linear order and I didn't have a well-defined hierarchy or pre-existing list to work from, I might use one of two approaches:

  1. for manual sorting along a single axis, I can probably not give any sort of cardinal value but I can do comparisons of the form 'A is more/less X than is B'. Then I can use my resorter utility to lighten the burden of an obvious approach like trying to herd them all in a spreadsheet or text buffer.

  2. if I prefer to automate it, I can embed them (presumably they have text descriptions or titles, or even abstracts if they are things like papers or URLs) with a neural net and then I can 'sort them' by simply picking one to start with, and then finding the 'nearest' by embedding, and repeating until they are all in a giant list. I call this 'sort by magic' or 'sorting by semantic similarity'.

    (Note that this embedding approach, while a lot more work up front than simply tossing a list into a ChatGPT text box, has many advantages beyond just producing the list: it avoids any issues with GPT-4 hallucinating, forgetting, being very expensive to call on long lists, lists not fitting in context, the API being down, etc.)

    This produces some interesting effects: because such lists have contents that naturally cluster, reading through the sorted list will often reveal 'obvious' clusters as the list transitions from cluster to cluster. There is no a priori way to decide how many clusters there 'actually' are, but I found that roughly, k = sqrt(n) worked well to pick out a reasonably evenly populated & meaningful set of clusters.

    Once you have defined k and they are picked out like that, it's easy to grab a cluster and make it a sublist, for example, and to give it a name. (In fact, I even have a feature where I feed a cluster into GPT-4 as a list, and ask it for a descriptive name.) Or you can start editing it to fix up problems, or you can specify where to start to get a better list, or you can feed it into #1 as a starting point.

    So for example, my psychology/smell tag broke down nicely into a 'perfume', a 'human', and an 'animal' tag. I created perfume & human as more precise sub-tags, and left the animal links alone.
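A minimal sketch of the 'sort by magic' chaining in #2, assuming hand-written toy vectors in place of real embeddings (the function names and data are my illustration, not the actual resorter code):

```python
import math

def cosine(a, b):
    """Cosine similarity between two plain-list vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def sort_by_magic(embeddings, start=0):
    """Greedy nearest-neighbor chaining: begin at one item, then
    repeatedly append the most similar remaining item."""
    order = [start]
    remaining = set(range(len(embeddings))) - {start}
    while remaining:
        last = embeddings[order[-1]]
        nxt = max(remaining, key=lambda i: cosine(last, embeddings[i]))
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Two obvious clusters: items 0-1 point one way, items 2-3 the other.
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(sort_by_magic(toy))  # [0, 1, 3, 2]
```

In practice the vectors would come from an embedding model rather than being hand-written, and the abrupt similarity drops along the chain mark the cluster boundaries described above.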

I don't know what these mean: "sort a list of 655 topics into a linear order," "sorting along a single axis." The lists I'm talking about are already in alphabetical order. The idea is to come up with a set of categories which you can use to organize the list into thematically coherent sub-lists. It's like you have a library of 1000 books. How are you going to put them on shelves? You could group them alphabetically by title or author's (last) name. Or you could group them by subject matter. In doing this you know what the subjects are and have a sense of what things you'd like to see on the same shelves. This is what you call 'sorting by semantic similarity.'

The abstract of the paper explains what I was up to. But I wasn't using books; I was using unadorned lists of categories. When I started I didn't know what ChatGPT would do when given a list for which it had to come up with organizing categories. I know how I used those labels, but it knows nothing of that. So I gave it a try and found out what it could do. Things got interesting when I asked it to go beyond coming up with organizing categories and to actually sort list items into those categories.

I've also played around with having ChatGPT respond to clusters of words.

You could do it by embedding the text of each post, and then averaging all the embeddings of each tag's posts into a single 'tag embedding', which summarizes the gestalt of all the posts with a given tag. Then you could do my sort trick, or just use a standard clustering algorithm to cluster the tags into 12 clusters, and ask GPT to label each cluster using the list of titles, say.

This would address your points about GPT being unable to 'plan' or being misled by idiosyncratic uses of words like 'jasmine'. It would also produce a much more even distribution over the 12 clusters, unless there truly was an extremely skewed distribution (as well as the other advantages I mentioned like not forgetting or skipping any entries or confusing item counts or whatever).
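The averaging step described above can be sketched as follows; the data layout (one embedding vector per post, a list of tag names per post) is an assumption for illustration:

```python
from collections import defaultdict

def tag_embeddings(post_embeddings, post_tags):
    """Average each tag's post embeddings into a single 'tag embedding'.

    Hypothetical layout: post_embeddings[i] is the vector for post i,
    post_tags[i] is the list of tag names attached to that post.
    """
    sums = {}                  # tag -> running component-wise sums
    counts = defaultdict(int)  # tag -> number of posts seen
    for vec, tags in zip(post_embeddings, post_tags):
        for t in tags:
            if t not in sums:
                sums[t] = [0.0] * len(vec)
            sums[t] = [s + x for s, x in zip(sums[t], vec)]
            counts[t] += 1
    return {t: [s / counts[t] for s in vec_sum]
            for t, vec_sum in sums.items()}
```

A standard clustering algorithm (k-means with k = 12, say) would then run over the resulting tag vectors, and GPT would only be asked to label the finished clusters.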

Interesting, yes. Sure. But keep in mind that what I was up to in that paper is much simpler. I wasn't really interested in organizing my tag list. That's just a long list that I had available to me. I just wanted to see how ChatGPT would deal with the task of coming up with organizing categories. Could it do it at all? If so, would its suggestions be reasonable ones? Further, since I didn't know what it would do, I decided to start first with a shorter list. It was only when I'd determined that it could do the task in a reasonable way with the shorter lists that I threw the longer list at it.

What I've been up to is coming up with tasks where ChatGPT's performance gives me clues as to what's going on internally. Whereas the mechanistic interpretability folks are reverse engineering from the bottom up, I'm working from the top down. Now, in doing this, I've already got some ideas about how semantics is structured in the brain; that is, I've got some ideas about the device that produces all those text strings. Not only that, but horror of horrors! Those ideas are based in 'classical' symbolic computing. But my particular set of ideas tells me that, yes, it makes sense that ANNs should be able to induce something that approximates what the brain is up to. So I've never for a minute thought the 'stochastic parrots' business was anything more than a rhetorical trick. I wrote that up after I'd worked with GPT-3 a little.

At this point I'm reasonably convinced that in some ways, yes, what's going on internally is like a classical symbolic net, but in other ways, no, it's quite different. I reached that conclusion after working intensively on having ChatGPT generate simple stories. After thinking about that for a while I decided that, no, something's going on that's quite different from a classical symbolic story grammar. But then, what humans do seems to me in some ways not like classical story grammars either.

It's all very complicated and very interesting. In the last month or so I've started working with a machine vision researcher at Goethe University in Frankfurt (Visvanathan Ramesh). We're slowly making progress.