Running Lightcone Infrastructure, which runs LessWrong and Lighthaven.space. You can reach me at habryka@lesswrong.com.
(I have signed no contracts or agreements whose existence I cannot mention, which I am mentioning here as a canary)
He is presumably referring to your inline reacts:
(Though maybe the calculus changes if you've got tons of other prospective donors invested with them.)
Yeah, I think that's loading up on risk in a way that requires a pretty substantial adjustment, since I do think that if VARA keeps doing well, then we will probably have an easier time fundraising in the future. Also, we are likely to invest at least some of our assets ourselves this year, maybe even with VARA, or construct a similar portfolio ourselves, and so will also have a bunch of direct exposure to these trends.
Also, I have learned to be very hesitant to believe that people will actually donate more if their assets appreciate, so I would not really be able to plan around such expectations the way I can plan around money in my bank account.
The $1.4M really isn’t a particularly natural milestone. Like, sure, it means we won’t literally all shut down, but it still implies major cuts. Putting a goal at $1.4M would, I think, cause people to index too much on that threshold.
And then I do think giving some kind of intermediate stopping point is helpful, and the round millions seem like the best place for that.
Probably! I will take a look in a bit.
Huh, I think 4o + Sydney Bing (both of which were post-2022) seem like more intense examples of spooky generalization / crazy RLHF hacks. Gemini 3 with its extreme paranoia and (for example) desperate insistence that it's not 2025 seems in the same category.
Like, I am really not very surprised that if you try reasonably hard you can avoid these kinds of failure modes in a supervisable regime, but we've gotten overwhelming evidence that you routinely do get crazy RLHF-hacking. I do think it's a mild positive update that you can avoid these failure modes if you try, but when you look into the details, it also seems kind of obvious to me that we have not found any scalable way of doing so.
The markdown editor is really just a standard text-entry field, so there isn't much we can recover from it. But if you used the WYSIWYG editor then we almost certainly have backups (we take some form of backup every 5 minutes or so).
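(For illustration, the general shape of that kind of periodic backup loop is roughly the sketch below. This is a minimal TypeScript illustration assuming a hypothetical `/api/draftBackups` endpoint and a `getContents` callback; it is not the actual LessWrong implementation.)

```typescript
// Minimal sketch of a periodic draft-backup loop.
// The endpoint and payload shape here are hypothetical stand-ins.
type DraftBackup = {
  documentId: string;
  contents: string; // serialized editor state
  savedAt: string;  // ISO timestamp
};

const BACKUP_INTERVAL_MS = 5 * 60 * 1000; // roughly every 5 minutes

async function saveDraftBackup(backup: DraftBackup): Promise<void> {
  // Hypothetical endpoint; stands in for whatever the server actually exposes.
  await fetch("/api/draftBackups", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(backup),
  });
}

// Starts the loop and returns a function that stops it.
function startBackupLoop(documentId: string, getContents: () => string): () => void {
  const timer = setInterval(() => {
    void saveDraftBackup({
      documentId,
      contents: getContents(),
      savedAt: new Date().toISOString(),
    });
  }, BACKUP_INTERVAL_MS);
  return () => clearInterval(timer);
}
```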
Almost none of our financial assistance was 100%, and we did ask people pretty directly to pay what they could if they couldn't pay the full amount. My guess is we did a pretty good job at reasonable price discrimination here, though it's of course very hard to tell.
I mean, we review all the decisions manually, so it’s the same legibility of criteria as before. What is the great benefit of making the LLM decisions more legible in particular? Ideally we would just get the error rate close to zero.
The scenario really doesn't focus very much on describing what superintelligence looks like! It has like 7 paragraphs on this? Almost all of it is about trends in when powerful AI will arrive.
And then separately, "What superintelligence looks like" is laying claim to a much more important answer space than "I think something big will happen with AI in 2027, and here is a scenario about that".
Yep, indeed, basically those two things.