AI safety & alignment researcher
In Rob Bensinger's typology: AGI-wary/alarmed, welfarist, and eventualist.
Public stance: AI companies are doing their best to build ASI (AI much smarter than humans), and have a chance of succeeding. No one currently knows how to build ASI without an unacceptable level of existential risk (> 5%). Therefore, companies should be forbidden from building ASI until we know how to do it safely.
I have signed no contracts or agreements whose existence I cannot mention.
Good point! I hadn't quite realized that, although it seems obvious in retrospect.
Tokenizers are often reused across multiple generations of a model, or at least that was the case a couple of years ago, so I wouldn't expect this to work well as a test.
Maybe! I've talked to a fair number of people (often software engineers, and especially people who have more financial responsibilities) who really want to contribute but don't feel safe making the leap without having some idea of their chances. But I don't think I've talked to anyone who was overconfident about getting funding. That's my own idiosyncratic sample, though, so it's hard to know whether it's representative.
This is really terrific, thank you for doing the unglamorous but incredibly valuable work of keeping these up to date.
One suggestion re: funders[1]: it would be really high-value to track, per funder, 'What percent of applications did you approve in the past year?' I think most people considering entering the field as a researcher worry a lot about how feasible it is to get funded[2], and having this info out there and up to date would go a long way toward addressing that worry. There are various options for more sophisticated versions, but just adding that single data point for each funder, updated at least annually, would be a huge improvement over the status quo.
Inspired by "A plea for more funding shortfall transparency"
(and/or how feasible it is to get a job in the field, but that's a separate issue)
You seem to think that this post poses a single clear puzzle, of the sort that could have a single answer.
The single clear puzzle, in my reading, is 'why have large increases in material wealth failed to create a world where people don't feel obligated to work long hours at jobs they hate?' That may or may not have a single answer, but I think it's a pretty clearly defined puzzle.
The essay gives it in two parts. First, the opening paragraph:
I'm skeptical that Universal Basic Income can get rid of grinding poverty, since somehow humanity's 100-fold productivity increase (since the days of agriculture) didn't eliminate poverty.
But that requires a concrete standard for poverty, which is given a bit lower:
What would it be like for people to not be poor? I reply: You wouldn't see people working 60-hour weeks, at jobs where they have to smile and bear it when their bosses abuse them.
Another big class of implied personas is the authors of all the text the model has encountered.
That also includes the 'authors' of texts that don't actually have an author per se. Some cases I can imagine are
Etc etc.
In some ways this seems like the most central case, since correctly modeling authors is at the heart of the pre-training loss function.
Modeling authors could be seen as a separate category from personas, but I expect that under the hood it's mostly the same thing using the same mechanisms; a persona is just a particular kind of author (or perhaps vice versa).
Many people seem to think the answer to the puzzle posed here is obvious, but they all think it's something different. This has nagged at me since it was posted. It's an issue that more people need to be thinking about, because if we don't understand it we can't fix it, and so the standard approaches to poverty may just fail even as our world becomes richer. Strong upvote for 2024 review.
Complicating that a bit further is the fact that there may be personas which are not present in the training data but which are implied by it. As Gwern rather nicely put it years ago:
It is sometimes argued that models like GPT-3 cannot be “an agent”; it is true, they are not “an” agent, because they are every agent. They are every agent that they have data about, and every hypothetical agent they can generalize to.
Great post! I'm glad to see continued momentum on persona research and related topics. One aspect I'd love to see addressed is some account of which personas matter. The number of latent personas present in the training data is enormous, and studying them all seems intractable. I think we need to understand which personas affect model behavior when, and how strongly. I tried to point in that direction a bit in my functional self post, although I'm not as sold on that framing now as I was then.
Very interesting, thanks! I've been curious about this question for a while but haven't had a chance to investigate. A related question I'm very curious about is the degree to which models learn to place misspellings very close to the correct spelling in the latent space (e.g. whether the token combination [' explicit', 'ely'] activates nearly the same direction as the single token ' explicitly').
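Here's a minimal sketch of one way to poke at that, assuming GPT-2 via Hugging Face transformers (the model choice, the sentences, and the misspelling are just illustrative placeholders; any causal LM whose tokenizer splits the misspelling into multiple tokens would do): compare the final-layer hidden state at the last token of a sentence ending in ' explicitly' with the same sentence ending in the misspelled ' explicitely'.

```python
# Minimal sketch, not a definitive test. Assumes GPT-2 via Hugging Face
# transformers; the sentences and the misspelling are illustrative placeholders.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")
model.eval()

def last_token_state(text: str) -> torch.Tensor:
    """Final-layer hidden state at the last token of `text`."""
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids)
    return out.last_hidden_state[0, -1]

correct = last_token_state("She stated it explicitly")
misspelled = last_token_state("She stated it explicitely")  # typically splits into several tokens

print(tok.tokenize(" explicitely"))  # inspect how the misspelling actually tokenizes
print(torch.cosine_similarity(correct, misspelled, dim=0).item())
```

A raw similarity number only means much relative to a baseline (e.g. the same sentence ending in an unrelated word), so in practice you'd want to run this over a batch of correct/misspelled pairs and controls.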