AI safety & alignment researcher
In Rob Bensinger's typology: AGI-wary/alarmed, welfarist, and eventualist.
Public stance: AI companies are doing their best to build ASI (AI much smarter than humans), and have a chance of succeeding. No one currently knows how to build ASI without an unacceptable level of existential risk (> 5%). Therefore, companies should be forbidden from building ASI until we know how to do it safely.
I have signed no contracts or agreements whose existence I cannot mention.
Maybe! I've talked to a fair number of people (often software engineers, and especially people who have more financial responsibilities) who really want to contribute but don't feel safe making the leap without having some idea of their chances. But I don't think I've talked to anyone who was overconfident about getting funding. That's my own idiosyncratic sample, though, so it's hard to know whether it's representative.
This is really terrific, thank you for doing the unglamorous but incredibly valuable work of keeping these up to date.
One suggestion re: funders[1]: it would be really high-value to track (per-funder) 'What percent of applications did you approve in the past year?' I think most people considering entering the field as a researcher worry a lot about how feasible it is to get funded[2], and having this info out there and up-to-date would go a long way toward addressing that worry. There are various options for more sophisticated versions, but just adding that single byte of info to each funder, updated >= annually, would be a huge improvement over the status quo.
[1] Inspired by A plea for more funding shortfall transparency
[2] (and/or how feasible it is to get a job in the field, but that's a separate issue)
You seem to think that this post poses a single clear puzzle, of the sort that could have a single answer.
The single clear puzzle, in my reading, is 'why have large increases in material wealth failed to create a world where people don't feel obligated to work long hours at jobs they hate?' That may or may not have a single answer, but I think it's a pretty clearly defined puzzle.
The essay gives it in two parts. First, the opening paragraph:
I'm skeptical that Universal Basic Income can get rid of grinding poverty, since somehow humanity's 100-fold productivity increase (since the days of agriculture) didn't eliminate poverty.
But that requires a concrete standard for poverty, which is given a bit further down:
What would it be like for people to not be poor? I reply: You wouldn't see people working 60-hour weeks, at jobs where they have to smile and bear it when their bosses abuse them.
Another big class of implied personas is the authors of all the text that they've encountered.
That also includes the 'authors' of texts that don't actually have an author per se. Some cases I can imagine are
Etc etc.
In some ways this seems like the most central case, since correctly modeling authors is at the heart of the pre-training loss function.
Modeling authors could be seen as a separate category from personas, but I expect that under the hood it's mostly the same thing using the same mechanisms; a persona is just a particular kind of author (or perhaps vice versa).
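To make "the pre-training loss function" concrete, here's a minimal sketch of the standard next-token cross-entropy objective (not any particular lab's code; the shapes, names, and toy sizes are illustrative). The model scores well exactly to the extent that it predicts what the author of each text would write next, which is why it's pushed to model authors.

```python
# Illustrative next-token prediction loss: reward the model for predicting
# whatever the author of the training text actually wrote next.
import torch
import torch.nn.functional as F

def pretraining_loss(logits: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
    # logits:    (batch, seq_len, vocab_size) -- the model's predictions
    # token_ids: (batch, seq_len)             -- the actual text
    return F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),  # predictions at positions 0..n-2
        token_ids[:, 1:].reshape(-1),                 # the tokens that actually came next
    )

# Toy usage with random tensors (hypothetical sizes):
vocab, batch, seq = 100, 2, 16
print(pretraining_loss(torch.randn(batch, seq, vocab),
                       torch.randint(0, vocab, (batch, seq))))  # ~log(100) for random logits
```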
Many people seem to think the answer to the puzzle posed here is obvious, but they all think it's something different. This has nagged at me since it was posted. It's an issue that more people need to be thinking about, because if we don't understand it we can't fix it, and so the standard approaches to poverty may just fail even as our world becomes richer. Strong upvote for 2024 review.
Complicating that a bit further is the fact that there may be personas which are not present in the training data but which are implied by it. As Gwern rather nicely put it years ago:
It is sometimes argued that models like GPT-3 cannot be “an agent”; it is true, they are not “an” agent, because they are every agent. They are every agent that they have data about, and every hypothetical agent they can generalize to.
Great post! I'm glad to see continued momentum on the importance of persona research and related topics. One aspect I'd love to see addressed is some account of which personas matter. The number of latent personas present in the training data is enormous, and it seems likely intractable to study them all. I think we need to understand which personas affect model behavior when, and how strongly. I tried to point in that direction a bit in my functional self post, although I'm not as sold on that framing now as I was then.
(Of course from the inside it doesn't look absurd, but instead feels like moral progress. One example of this that I happened across recently is filial piety in China, which became more and more extreme over time, until someone cutting off a piece of their flesh to prepare a medicinal broth for an ailing parent was held up as a moral exemplar.)
It's not clear whether you're the author of this quoted comment, and I don't know where it's originally from, so I'm responding here.
This moral stance is less extreme than it sounds at first blush. Consider the examples given in '“True Stories” of Filial Piety' from A Reader in Nineteenth-Century Chinese History, and in 'Chinese Filial Cannibalism: A Silk Road Import'.
Modern readers may be put off by the squick factor of cannibalism, but the closest modern equivalent is organ donation. When we read about someone donating an organ to save a sick parent, we see it in very similar terms: as an impressive and supererogatory act of moral virtue. In fact, the practice as it was understood at the time demanded a smaller sacrifice than organ donation: the donor didn't even need to give up an entire organ, just a chunk of flesh, and, unlike organ donation, efficacy was guaranteed.
It might have been medically preferable for donors to have the flesh surgically removed by a doctor, but they seem to have consistently needed to conceal the act, so presumably that wasn't an available option. It's also unclear to me whether, in the medical understanding of the time (which, for one thing, did not include the germ theory of disease), the perceived tradeoffs would have favored professional removal as strongly as they do now.
It may not be obvious at first glance why CCC's response supports Eliezer's point, so to spell it out in more detail: as of now (the night before), I expect that as of 10 am tomorrow there will be
A. a 25% chance that the cable guy has already arrived, ie a 25% chance of 'before noon and before 10'.
B. a 75% chance that the cable guy has not already arrived, times a 33.3% chance that he'll then arrive between 10 am and noon, ie .75 / 3 = a 25% chance of 'before noon and after 10'.
C. a 75% chance that the cable guy has not already arrived, times a 66.7% chance that he'll then arrive after noon, ie .75 * 2 / 3 = a 50% chance of 'after noon'.
This illustrates Eliezer's point nicely. My current credence in 'before noon' is case A (25%) + case B (25%) = 50%. And my current expectation of what I'll believe at 10 am tomorrow is the same: with 25% probability the guy will already have arrived and I'll assign 'before noon' a probability of 1, and with 75% probability he won't have and I'll assign it 33.3%, for an expected belief of .25 * 1 + .75 / 3 = 50%. So we don't expect our confidence in 'before noon' to be higher (or lower) then than it is now.
As of 10 am tomorrow, we will (as CCC says) have changed our probabilities in one direction or another, but in our current expectation of our beliefs at that point, it's still 50:50.
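As a quick sanity check of that arithmetic, here's a minimal Python sketch. The uniform 8-hour arrival window starting at 8 am is my own assumption for concreteness; it's just one setup consistent with the 25% / 25% / 50% split above.

```python
# Monte Carlo check: the expectation (taken tonight) of tomorrow-10am's belief
# in "cable guy arrives before noon" equals tonight's belief of 50%.
# Assumes (hypothetically) a uniform arrival time over 8 am .. 4 pm.
import random

def expected_10am_belief(n_samples: int = 100_000) -> float:
    posteriors = []
    for _ in range(n_samples):
        arrival = 8 + 8 * random.random()   # uniform over [8 am, 4 pm)
        if arrival <= 10:
            # Case A: he's already here, so P(before noon) jumps to 1.
            posteriors.append(1.0)
        else:
            # Cases B/C: he hasn't shown up; of the remaining 6 hours,
            # 2 fall before noon, so P(before noon) drops to 1/3.
            posteriors.append(2 / 6)
    return sum(posteriors) / n_samples

print(expected_10am_belief())  # ~0.5, matching the current 50% belief
```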
Tokenizers are often reused across multiple generations of a model (or at least that was the case a couple of years ago), so I wouldn't expect that to work well as a test.