AI safety & alignment researcher
In Rob Bensinger's typology: AGI-wary/alarmed, welfarist, and eventualist.
Public stance: AI companies are doing their best to build ASI (AI much smarter than humans), and have a chance of succeeding. No one currently knows how to build ASI without an unacceptable level of existential risk (> 5%). Therefore, companies should be forbidden from building ASI until we know how to do it safely.
I have signed no contracts or agreements whose existence I cannot mention.
[Linkpost]
There's an interesting Comment in Nature arguing that we should consider current systems AGI.
The term has largely lost its value at this point, just as the Turing test lost nearly all its value as we approached the point at which it was passed (because the closer we got, the more the answer depended on definitional details rather than on questions about reality). I nonetheless found this particular piece worthwhile, because it considers and addresses a number of common objections.
Original (requires an account), Archived copy
Shane Legg (whose definition of AGI I generally use) disagrees with the authors on Twitter.
Coordinating the efforts of more people scales superlinearly.
In difficulty? In impact?
Very interesting, thanks! I've been curious about this question for a while but haven't had a chance to investigate. A related question I'm very curious about is the degree to which models learn to place misspellings very close to the correct spelling in the latent space (e.g. whether the token combination [' explicit', 'ely'] activates nearly the same direction as the single token ' explicitly').
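Roughly how I'd imagine checking (an untested sketch, assuming a small open model like GPT-2 via HuggingFace transformers; the example sentences, and whether the misspelling actually splits into those tokens, are just placeholders): compare the final-layer hidden state at the last token of a correctly spelled sentence against the same sentence with the misspelling.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Small open model purely for illustration; any causal LM would work.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")
model.eval()

def last_token_state(text: str) -> torch.Tensor:
    """Final-layer hidden state at the last token of `text`."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    return out.last_hidden_state[0, -1]  # shape: (hidden_dim,)

# Check how the tokenizer actually splits the misspelling -- whether it comes
# out as [' explicit', 'ely'] is an empirical question, not a given.
print(tokenizer.tokenize(" explicitly"), tokenizer.tokenize(" explicitely"))

a = last_token_state("She said it explicitly")
b = last_token_state("She said it explicitely")
print(torch.cosine_similarity(a, b, dim=0).item())
```

A cosine similarity well above that of unrelated word pairs would be (weak) evidence for the 'misspellings land near the correct spelling' picture.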
Good point! I hadn't quite realized that, although it seems obvious in retrospect.
Tokenizers are often reused across multiple generations of a model, or at least that was the case a couple of years ago, so I wouldn't expect it to work well as a test.
Maybe! I've talked to a fair number of people (often software engineers, and especially people who have more financial responsibilities) who really want to contribute but don't feel safe making the leap without having some idea of their chances. But I don't think I've talked to anyone who was overconfident about getting funding. That's my own idiosyncratic sample, though, so it's hard to know whether it's representative.
This is really terrific, thank you for doing the unglamorous but incredibly valuable work of keeping these up to date.
One suggestion re: funders[1]: it would be really high-value to track (per funder) 'What percent of applications did you approve in the past year?' I think most people considering entering the field as a researcher worry a lot about how feasible it is to get funded[2], and having this info out there and up to date would go a long way toward addressing that worry. There are various options for more sophisticated versions, but just adding that single number for each funder, updated at least annually, would be a huge improvement over the status quo.
Inspired by A plea for more funding shortfall transparency
(and/or how feasible it is to get a job in the field, but that's a separate issue)
You seem to think that this post poses a single clear puzzle, of the sort that could have a single answer.
The single clear puzzle, in my reading, is 'why have large increases in material wealth failed to create a world where people don't feel obligated to work long hours at jobs they hate?' That may or may not have a single answer, but I think it's a pretty clearly defined puzzle.
The essay gives it in two parts. First, the opening paragraph:
I'm skeptical that Universal Basic Income can get rid of grinding poverty, since somehow humanity's 100-fold productivity increase (since the days of agriculture) didn't eliminate poverty.
But that requires a concrete standard for poverty, which is given a bit lower:
What would it be like for people to not be poor? I reply: You wouldn't see people working 60-hour weeks, at jobs where they have to smile and bear it when their bosses abuse them.
Another big class of implied personas is the authors of all the text that they've encountered.
That also includes the 'authors' of texts that don't actually have an author per se. Some cases I can imagine are
Etc etc.
In some ways this seems like the most central case, since correctly modeling authors is at the heart of the pre-training loss function.
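To spell that out: the pre-training objective is just next-token prediction on human-written text (standard cross-entropy, where $x_t$ are a document's tokens and $\theta$ the model's parameters):

$$\mathcal{L}(\theta) = -\sum_t \log p_\theta(x_t \mid x_{<t})$$

and predicting a document's next token well is largely a matter of predicting what its author would have written next.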
Modeling authors could be seen as a separate category from personas, but I expect that under the hood it's mostly the same thing using the same mechanisms; a persona is just a particular kind of author (or perhaps vice versa).
Did anyone manage a translation of the binary? Frontier LLMs failed on it several times, saying that after a point it stopped being valid UTF-8. I didn't put much time into it, though (I was on a plane at the time). The partial message carried interesting and relevant meaning, but I'm not sure whether there's more that I'm missing.
Partial two-stage translation by ChatGPT 5.2 (spoiler):
“赤色的黎明降临于机” (95%)
→ Chinese for “The red dawn descends upon the mach–”
Clearly truncated in mid-character.
Link to most successful LLM attempt
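For anyone else poking at it, a minimal sketch (mine, not from the thread) of decoding such a bit string mechanically in Python rather than asking an LLM; the bit string below is a stand-in, not the actual message from the post.

```python
# `bits` is only a placeholder: the UTF-8 bits for '赤' plus one truncated
# trailing byte, to show how a cut-off character decodes.
bits = "111010001011010110100100" + "11100010"

# Drop any trailing partial byte, then pack the bits into bytes 8 at a time.
usable = len(bits) - (len(bits) % 8)
data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))

# errors="replace" keeps going past a character truncated mid-sequence, which
# matches the "stopped being valid UTF-8" behavior the LLMs reported.
print(data.decode("utf-8", errors="replace"))  # -> 赤�
```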