In context, I guess your claim is: "if the 'compressor' is post-hoc trying a bunch of algorithms and picking the best one, the full complexity of that process should count against the compressor." Totally agree with that as far as epistemology is concerned!
But I don't think the epistemological point carries over to the realm of rational-fic.
It's like trying to compress a file that was generated by a random device —
Gretta: You can't losslessly compress a truly random file.
I don't think this is strictly true. You can't a priori build a compression scheme that will work for an arbitrary random file (No Free Lunch Theorem). But you can ex post identify the particular patterns in a particular random file, and choose a compression scheme that exploits those patterns. You probably end up with a pretty ugly scheme that doesn't generalize, and so is unsatisfactory in some aesthetic sense. Especially if you're going for lossless compression, since there's probably a ton of noise that's just very hard to compress in...
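To make the ex post move concrete, here's a toy sketch (mine, not anything from the original exchange): build a Huffman code from the empirical byte counts of the one file you actually have. It assumes the "random device" happens to produce a skewed byte distribution; for genuinely uniform bytes the savings would be negligible, and a fair accounting would also charge the code table and the scheme-picking process against the compressed size -- which is roughly the ugliness I mean.

```python
import heapq
import random
from collections import Counter

def huffman_code(data: bytes) -> dict[int, str]:
    """Build a Huffman code tailored to the empirical byte frequencies of this one file."""
    heap = [[count, i, [sym, ""]] for i, (sym, count) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)
        hi = heapq.heappop(heap)
        # Prepend a bit to every symbol's code in each merged subtree.
        for pair in lo[2:]:
            pair[1] = "0" + pair[1]
        for pair in hi[2:]:
            pair[1] = "1" + pair[1]
        heapq.heappush(heap, [lo[0] + hi[0], next_id] + lo[2:] + hi[2:])
        next_id += 1
    return {sym: bits for sym, bits in heap[0][2:]}

if __name__ == "__main__":
    random.seed(0)
    # A file from a "random device" that happens to favor a handful of byte values.
    # (For genuinely uniform bytes the savings would be tiny, and an honest accounting
    # would also charge the code table against the output.)
    data = bytes(random.choices(range(256), weights=[10] * 8 + [1] * 248, k=10_000))
    code = huffman_code(data)
    compressed_bits = sum(len(code[b]) for b in data)
    print(f"original: {len(data) * 8} bits, tailored encoding: {compressed_bits} bits")
```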
And especially, maybe you remember how at the time it didn't seem like a flaw to you. You were not going around being like, “And today I shall be a flawed character.”
A therapist once gave me the insight that character weaknesses are strengths taken too far. Harry's energetic, clever, and knowledgeable; he's inspired and energized by competition; and he can meme people into doing things -- and he can also be a know-it-all who assumes first principles & cleverness trump empirics and experience, someone who's unwilling to lose, and irresponsible or annoying in how he leads others.
I was thinking the mask of "person who's read more books than Harry ever will"
We're launching an "AI psychiatry" team as part of interpretability efforts at Anthropic! We'll be researching phenomena like model personas, motivations, and situational awareness, and how they lead to spooky/unhinged behaviors. (x)
"making up types of guy" research is a go?
They're hiring; you might be great for this.
Nice post!
Your trader analogy made me think: you'd ideally want a training period with lots of honeypots and surveillance, so that the trader learns that crime doesn't pay.
This suggests some level-3 ideas I didn't see in your post:
These...
OK, cool, I think I understand where you're coming from much better now. Seems like we basically agree and were just emphasizing different things in our original comments!
I'm in violent agreement that there's a missing mood when people say "AIs will follow the law". I think there's something going on where people are like "but liberalism / decentralized competition have worked so well" while ignoring all the constraints on individual actors that make it so: rule of law, external oversight, difficulty of conspiring with other humans, inefficiencies of gov't that limit its ability to abuse power, etc.
And those constraints might all fall away with the AGI transition. That's for a number of...
Apologies for the scrappiness of the below -- I wanted to respond but I have mostly a scattering of thoughts rather than solid takes.
I like the intelligence curse piece very much -- it's what I meant to reference when I linked the Turing Trap above, but I couldn't remember the title & Claude pointed me to that piece instead. I agree with everything you're saying directionally! But I feel some difference in emphasis or vibe that I'm curious about.
-
One response I notice myself having to your points is: why the focus on value alignment?
"We could use intent alignment / corrigibility to avoid AIs being problematic due to these factors. But all these issues... (read 448 more words →)
Thanks, I love the specificity here!
Prompt: if someone wanted to spend some $ and some expert-time to facilitate research on "inventing different types of guys", what would be especially useful to do? I'm not a technical person or a grantmaker myself, but I know a number of both types of people; I could imagine e.g. Longview or FLF or Open Phil being interested in this stuff.
Invoking Cunningham's law, I'll try to give a wrong answer for you or others to correct! ;)
Technical resources:
You're allowed to care about things besides AI safety
I worry that a lot of AI safety / x-risk people have imbibed a vibe of urgency, impossibility, and overwhelming importance around solving alignment in particular; that this vibe distorts thinking; and that the social sphere around AI x-risk makes it harder for people to update.
Yesterday I talked to an AI safety researcher who said he's pretty sure alignment will be solved by default. But whenever he talks to people about this, they just say "surely you don't think it's >99% likely? shouldn't you just keep working for the sake of that 1% chance?"
Obviously there's something real here - 1% of huge is huge. But...
Refurbishing the classic AI safety argument
My initial exposure to AI safety arguments was via Eliezer posts. My mental model of his logic goes something like:
"0) AI training will eventually yield high-quality agents;
1) high-quality agents will be utility maximizers;
2) utility maximizers will monomaniacally optimize for some world-feature;
3) therefore utility maximizers will seek Omohundro goals;
4) they'll be smarter than us, so this will disempower us;
5) value is fragile, so empowered AIs monomaniacally optimizing for their utility function fucks us over with very high probability"
VNM doesn't do what you want. As folks like @Rohin Shah and @nostalgebraist have pointed out, points 2 (and therefore 3 and 5) don't really follow. A utility function can have...
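One toy way to see it (my gloss of the point, not a quote from those posts): utility functions over whole histories are expressive enough to rationalize any behavior. For an arbitrary policy $\pi$, define

$$u_\pi(h) = \begin{cases} 1 & \text{if every action in history } h \text{ is the one } \pi \text{ would have taken,} \\ 0 & \text{otherwise.} \end{cases}$$

Then $\pi$ maximizes expected $u_\pi$, so "is an expected-utility maximizer" by itself puts no constraint on behavior; the jump to Omohundro goals needs extra assumptions about what the utility function is defined over and what shape it has.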
What's up with incredibly successful geniuses having embarrassing & confusing public meltdowns? What's up with them getting into Nazism in particular?
Components of my model:
I'm pretty uninformed on the object level here (whether anyone is doing this; how easy it would be). But crazy-seeming inefficiencies crop up pretty often in our fallen world, and often what they need is a few competent people who make it their mission to fix them. I also suspect there would be a lot of cool "learning by doing" value involved in trying to scale up this work, and if you published your initial attempts at replication then people would get useful info about whether more of this is needed. Basically, getting funding to do and publish a pilot project seems great. I'd recommend having a lot of clarity about how you'd choose papers to replicate, or maybe just committing to a specific list of papers, so that people don't have to worry that you're cherry-picking results when you publish them :)