Running Lightcone Infrastructure, which runs LessWrong and Lighthaven.space. You can reach me at habryka@lesswrong.com.
(I have signed no contracts or agreements whose existence I cannot mention, which I am mentioning here as a canary)
The scenario really doesn't focus very much on describing what superintelligence looks like! It has like 7 paragraphs on this? Almost all of it is about trends around when powerful AI will arrive.
And then separately, "What superintelligence looks like" is laying claim to a much more important answer space than "I think something big will happen with AI in 2027, and here is a scenario about that".
This is a great comment! IMO would be good as a top-level shortform (or maybe even post).
Yeah, I was a bit worried about that. In case it's not clear, it's mostly a reference to "Toss a Coin to Your Witcher", and it felt funnier to me to say "Bitcoin" instead of "Coin".
I do think there is a real thing where Lightcone largely relies on substantial private donations, not a huge number of very small donations. Last year we had like 600 donors, which is a lot in some sense, but of course in order to fill our $2M budget this still meant the average donation was around $3,000!
So yeah, I do feel a bit bad about the effect of the title in this sense, but I also felt like it captured something not totally invalid, and that makes me feel a bit better about it.
Some thoughts on this:
If you can pay market rate for us to design a high-quality site for browsing [content], that brings us surplus in cash by applying resources that are marginally cheap for us to use
One of the big confusing questions here is "what is market rate?". Design agencies are very much not commoditized, and prices for someone to design a website + brand + interactive media essay range from $5,000 to $100M depending on who you are trying to get.
A different operationalization is something like "what multiple of your employees' hourly salary would you need to be offered in order to take on a project purely on financial grounds?", and my current answer to that is "something between 2x and 4x". I am really not confident in this number, but it's where I've ended up the last 2-3 times I've asked myself this question. I am pretty sure it is not below 2x. For some projects I think this will put us substantially below market rate (because we can offer services you will have a really hard time buying anywhere on the open market without paying an arm and a leg), and for some it will put us above market rate.
Of course, many projects have positive externalities, and those can drive the price a bunch lower. But if a project doesn't help with something like "helping humanity orient sanely towards the future" or "reducing the likelihood of AI existential risk", then the price I would quote usually ends up quite close to the above cost. Impact is very heavy-tailed.
But there are many exceptions! For example, I think it was a good call for people to work on COVID when it was happening, if they were in a strong position to make things go better, because it was really highly leveraged, and it seems like an important part of being cooperative with the broader world. I felt similarly about Deciding To Win. Pharma regulation does not currently meet that bar for me though (it is a grave tragedy how incredibly messed up a lot of that stuff is, but not so much that I think fixing it is a civilizational priority the way COVID and the violation of democratic foundations are).
Hope that helps! I am generally happy to hear pitches for projects in this space and can usually set some rough expectations (we did have a call where we talked about one such project that seemed cool, and I was glad we had it).
I think this would be quite valuable! I don't know how it compares to other things you do, but I definitely know a lot of people who end up duplicating a lot of this research on their own.
Yeah, OK, I do think your two new paragraphs help me understand a lot better where you are coming from.
The way I currently think about the kind of study we are discussing is that it really does feel like it sits approximately at the edge of "how bad do instructions need to be / how competent do steering processes need to be in order to get good results out of current systems", and this is what makes it interesting! Like, if you had asked me in advance what LLMs would do if given the situation in the paper, I wouldn't have given you a super confident answer.
I think the overall answer to the question of "how bad do instructions need to be" is "like, reasonably bad. You can totally trigger specification gaming if you are not careful, but you also don't have to try enormously hard to prevent it for most low-stakes tasks". And the paper is one of the things that has helped me form that impression.
Of course, the things about specification gaming that I care about the most are procedural details around questions like "are there certain domains where the model is much better at instruction following?", "does the model sometimes start substantially optimizing against my interests even if it definitely knows that I would not want that?", and "does the model sometimes sandbag its performance?". The paper helps a bit with some of those, but as far as I can tell, it doesn't take strong stances on how much it informs any of those questions (which I think is OK; I wish it had better analysis around this, but AI papers are generally bad at that kind of conceptual analysis), and where secondary media around the paper does take a stance, I do feel like it's stretching things a bit (but I think the authors are mostly not to blame for that).
To be clear, the URL for "What Superintelligence Looks Like" that was listed in that survey was "superintelligence2027.com", so that one also had the year in the name!
I mean, we review all the decisions manually, so it's the same legibility of criteria as before. What is the great benefit of making the LLM decisions more legible in particular? Ideally we would just get the error close to zero.