Doesn't seem like a genre subversion to me, it's just a bit clever/meta while still centrally being an allegorical AI alignment dialogue. IDK what the target audience is though (but maybe Eliezer just felt inspired to write this).
Probably memory / custom system prompt right?
Kyle Fish (who works on AI welfare at Anthropic) endorses this in his 80k episode. Obviously this isn't a commitment from Anthropic.
I broadly agree (and have advocated for similar), but I think there are some non-trivial downsides around keeping weights around in the event of AI takeover (due to the possibility of torture/threats from future AIs). Maybe the ideal proposal would involve the weights being stored in some way that allows attempting destruction in the event of AI takeover. E.g., give 10 people keys, require >3 keys to decrypt, and instruct people to destroy their keys in the event of AI takeover. (Unclear if this would work tbc.)
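For concreteness, here's a minimal sketch (with made-up parameters, and no claim that this exact construction is right) of the kind of k-of-n threshold scheme I mean, using Shamir secret sharing:

```python
# Minimal sketch (illustrative only, not a worked-out proposal) of a k-of-n
# threshold scheme: Shamir secret sharing with n=10 keyholders where any
# k=4 (i.e. >3) shares reconstruct the decryption key, so destroying 7+
# shares should make recovery impossible.
import secrets

PRIME = 2**127 - 1  # Mersenne prime; the field must be larger than the secret


def make_shares(secret: int, k: int = 4, n: int = 10):
    """Split `secret` into n shares; any k of them reconstruct it."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]

    def f(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME

    return [(x, f(x)) for x in range(1, n + 1)]


def reconstruct(shares):
    """Lagrange interpolation at x=0 recovers the secret from >= k shares."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total


key = secrets.randbelow(PRIME)  # stands in for the weights' encryption key
shares = make_shares(key)
assert reconstruct(shares[:4]) == key  # any 4 keyholders can decrypt
```

The point being that no small coalition can decrypt the weights on its own, while destruction only requires enough individual keyholders to destroy their shares.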
Yes, but you'd naively hope this wouldn't apply to shitty posts, just to mediocre posts. Like, maybe more people would read it, but if the post is actually bad, people would downvote it, etc.
I'm pretty excited about building tools/methods for better understanding of dataset influence, so this intuitively seems like a promising direction! (I'm interested both in better cheap approximations of the effects of leaving some data out and the effects of adding some data in.)
(I haven't looked at the exact method and results in this paper yet.)
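To gesture at what I mean by "cheap approximation" (this is just my gloss, not the method from the paper): a first-order, TracIn-style estimate uses a gradient dot product to guess how removing (or adding) one training example would move the loss on a query example.

```python
# Rough sketch of a first-order influence estimate (my gloss, not this
# paper's method): approximate the effect of removing one training example
# on a query example's loss via a gradient dot product (TracIn-style).
import torch


def grad_vector(model, loss_fn, x, y):
    """Flattened gradient of the per-example loss w.r.t. model parameters."""
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])


def estimated_influence(model, loss_fn, train_example, query_example, lr=1e-3):
    """Positive value ~ removing the training example would raise query loss."""
    g_train = grad_vector(model, loss_fn, *train_example)
    g_query = grad_vector(model, loss_fn, *query_example)
    return lr * torch.dot(g_train, g_query).item()
```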
The exact text is:
Dario: 6 months ago I I made this prediction that, you know, in in 6 months 90% of code would be written by by by AI models. Some people think that prediction is wrong, but within Anthropic and within a number of companies that we work with, that is absolutely true.
Marc: Um now 90 you know so you're saying that 90% of all code at Anthropic being written by the by the model today—
Dario: on on many teams you know not uniformly
I think on some teams at Anthropic 90% of code is written by AIs and on some teams it isn't, so the average is lower than 90%. I say more here.
Yes, I just meant "misaligned ai takeover". Edited to clarify.
E.g., Ryan Greenblatt thinks that spending 5% more resources than is myopically commercially expedient would be enough. AI 2027 also assumes something like this.
TBC, my view isn't that this is sufficient for avoiding takeover risk, it is that this suffices for "you [to] have a reasonable chance of avoiding AI takeover (maybe 50% chance of misaligned AI takeover?)".
(You seem to understand that this is my perspective and I think this is also mostly clear from the context in the box, but I wanted to clarify this given the footnote might be read in isolation or misinterpreted.)
I agree in general, but think this particular example is pretty reasonable because the point is general and just happens to have been triggered by a specific post that 1a3orn thinks is an example of this (presumably this?).
I do think it's usually better practice to list a bunch of examples of the thing you're referring to, but also specific examples can sometimes be distracting/unproductive or cause more tribalism than needed? Like in this case I think it would probably be better if people considered this point in the abstract (decoupled from implications), thought about how much they agreed, and then applied it on a case-by-case basis. (A common tactic that (e.g.) Scott Alexander uses is to first make an abstract argument before applying it, so that people are more likely to properly decouple.)