I'd be very interested to read more about the assumptions of your model, if there's a write-up somewhere.
Fair question. I just did the lazy move of looking up world GDP figures. In fact I don't think that my observers would measure GDP the same way we do. But it would be a measurement of some kind of fundamental sense of "capacity for output (of various important types)". And I'm not sure whether that has been growing faster or slower than real GDP, so the GDP figures seem a not-terrible proxy.
I'd be interested to dig into this claim more. What exactly is the claim, and what is the justification for it? If the claim is something like "For most tasks, the thinking machines seem to need 0 to 3 orders of magnitude more experience on the task before they equal human performance" then I tentatively agree. But if it's instead 6 to 9 OOMs, or even just a solid 3 OOMs, I'd say "citation needed!"
No precise claim, I'm afraid! The whole post was written from a place of "OK but what are my independent impressions on this stuff?", and then setting down the things that felt most true in impression space. I guess I meant something like "IDK, seems like they maybe need 0 to 6 OOMs more", but I just don't think my impressions should be taken as strong evidence on this point.
The general point about the economic viability of automating specialized labour is about more than just data efficiency; there are other ~fixed costs for automating industries which mean small specialized industries will be later to be automated.
(It's maybe worth commenting that the scenarios I describe here are mostly not like "current architecture just scales all the way to human-level and beyond with more compute". If they actually do scale then maybe superhuman generalization happens significantly earlier in the process.)
Interesting, I think there's some kind of analogy (or maybe generalization) here, but I don't fully see it.
I at least don't think it's a direct reinvention because slack (as I understand it) is a thing that agents have, rather than something which determines what's good or bad about a particular decision.
(I do think I'm open to legit accusations of reinvention, but it's more like reinventing alignment issues.)
I'm relatively a fan of their approach (although I haven't spent an enormous amount of time thinking about it). I like starting with problems which are concrete enough to really go at but which are microcosms for things we might eventually want.
I actually kind of think of truthfulness as sitting somewhere on the spectrum between the problem Redwood are working on right now and alignment. Many of the reasons I like truthfulness as a medium-term problem to work on are similar to the reasons I like Redwood's current work.
I think it would be an easier challenge to align 100 small ones (since solutions would quite possibly transfer across).
I think it would be a bigger victory to align the one big one.
I'm not sure from the wording of your question whether I'm supposed to assume success.
To add to what Owain said:
I don't think I'm yet at "here's regulation that I'd just like to see", but I think it's really valuable to try to have discussions about what kind of regulation would be good or bad. At some point there will likely be regulation in this space, and it would be great if that was based on as deep an understanding as possible about possible regulatory levers, and their direct and indirect effects, and ultimate desirability.
I do think it's pretty plausible that regulation about AI and truthfulness could end up being quite positive. But I don't know enough to identify in exactly what circumstances it should apply, and I think we need a bit more groundwork on building and recognising truthful AI systems first. I guess quite a bit of our paper is trying to open the conversation on that.
I kind of want you to get quantitative here? Like pretty much every action we take has some effect on AI timelines, but I think effect-on-AI-timelines is often swamped by other considerations (like effects on attitudes around those who will be developing AI).
Of course it's prima facie more plausible that the most important effect of AI research is the effect on timelines, but I'm actually still kind of sceptical. On my picture, I think a key variable is the length of time between when-we-understand-the-basic-shape-of-things-that-will-get-to-AGI and when-it-reaches-strong-superintelligence. Each doubling of that length of time feels to me like it could be worth on the order of 0.5-1% of the future. Keeping implemented-systems close to the technological-frontier-of-what's-possible could help with this, and may be more affectable than the
Note that I don't think this really factors into an argument in terms of "advancing alignment" vs "aligning capabilities" (I agree that if "alignment" is understood abstractly the work usually doesn't add too much to that). It's more like a differential-technological-development argument about different types of advancing capabilities.
I think it's unfortunate if that strategy looks actively bad on your worldview. But if you want to persuade people not to do it, I think you either need to persuade them of the whole case for your worldview (for which I've appreciated your discussion of the sharp left turn), or to explain not just that you think this is bad, but also how big a deal you think it is. Is this something your model cares about enough to trade for in some kind of grand inter-worldview bargaining? I'm not sure. I kind of think it shouldn't be (that relative to the size of ask it is, you'd get a much bigger benefit from someone starting to work on things you cared about than from stopping this type of capabilities research), but I think it's pretty likely I couldn't pass your ITT here.