All Posts


Friday, June 16th 2023

Shortform
johnswentworth · 8 points · 9h
Consider two claims:

* Any system can be modeled as maximizing some utility function; therefore utility maximization is not a very useful model.
* Corrigibility is possible, but utility maximization is incompatible with corrigibility; therefore we need some non-utility-maximizer kind of agent to achieve corrigibility.

These two claims should probably not both be true! If any system can be modeled as maximizing a utility function, and it is possible to build a corrigible system, then naively the corrigible system can be modeled as maximizing a utility function. I expect that many people's intuitive mental models around utility maximization boil down to "boo utility maximizer models", and they would therefore intuitively expect both of the above claims to be true at first glance. But on examination, the probable incompatibility is fairly obvious, so the two claims might make a useful test for noticing when one is relying on yay/boo reasoning about utilities in an incoherent way.
kuira · 3 points · 10h
Sometimes I have an internal desire to do something different from what I think should be done (for example, I might desire to play a game while also thinking the better choice is to read). I've been experimenting with using randomness to mediate this. I keep a D20 with me, give each side of the dispute odds proportional to the strength of its resolve, and then roll the die. In theory, this means neither side will overpower the other, and even a small resolve still has a chance. I'm not sure how useful this is, but it's fun, and it can sort of give me motivation (I've tried to internalize this kind of roll as a rule not to break without good reason). Also, when I'm merely deciding between some options, sometimes I'll roll more casually with equal odds, and it'll help me realize that I already knew which option I really wanted (if I don't like the roll's outcome).
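The D20 procedure above amounts to a weighted random choice. A minimal sketch, assuming each side's "resolve" is expressed as a number of die faces summing to 20 (the function name and the example weights are hypothetical, not from the comment):

```python
import random

def weighted_d20_choice(options):
    """Pick one option by rolling a D20, where each option claims
    a number of faces proportional to its 'resolve' weight.

    options: dict mapping option name -> number of faces (must sum to 20).
    """
    if sum(options.values()) != 20:
        raise ValueError("face counts must sum to 20 for a D20")
    roll = random.randint(1, 20)  # the die roll, 1..20 inclusive
    cumulative = 0
    for name, faces in options.items():
        cumulative += faces  # this option owns faces (cumulative-faces, cumulative]
        if roll <= cumulative:
            return name

# e.g. "read" feels 14/20 strong, "play a game" 6/20
choice = weighted_d20_choice({"read": 14, "play a game": 6})
```

Giving every side at least one face preserves the comment's point that even a small resolve still has a chance.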
NicholasKross · 2 points · 7h
In response to / inspired by this SSC post [https://astralcodexten.substack.com/p/your-incentives-are-not-the-same]: I was originally going to comment something about "how do I balance this with the need to filter for niche nerds who are like me?", but then I remembered that the post is actually literally about dunks/insults on Twitter. o_0

This, in meta- and object-level ways, got at a core problem I have: I want to do smart and nice things with smart and nice people, yet this (especially the social stuff) requires me to be so careful, and to actually have something like a self-filter. And even trying to practice/exercise that basic self-filtering skill feels physically draining. (ADHD + poor sleep btw, but just pointing these out doesn't do much!)

To expand on this (my initial comment [https://astralcodexten.substack.com/p/your-incentives-are-not-the-same/comment/17376134]): While I love being chill and being around chill people, I also (depending on my emotional state) can find it exhausting to do basic social things like "not saying every thought that I think" and "not framing every sentence I say as a joke". I was once given the "personal social boundaries" talk by some family members. One of them said they were uncomfortable with a certain behavior/conversational-thing I did. (It was probably something between "fully conscious" and "a diagnosable tic".) And I told them flat-out that I would have trouble staying within their boundary (which was extremely basic and reasonable of them to set, mind you!), and that I literally preferred not-interacting-with-them to spending the energy to mask.

Posts like this remind me of how scared of myself I sometimes am, and maybe should be. I'm scared of being either [ostracized by communities I deeply love] or [exhausting myself by "masking" all the time]. And I don't really know how to escape this, except by learned coping mechanisms that are either (to me) "slowly revealing more of myself and being more casual, in proport
Douglas_Knight · 2 points · 13h
Someone just told me that the solution to conflicting experiments is more experiments. Taken literally, this is wrong: more experiments just means more conflict. What we need are fewer experiments; we need to get rid of the bad experiments. Why expect that future experiments will be better? Maybe if the experimenters read the past experiments, they could learn from them. Well, maybe; but maybe if you read the experiments today, you could figure out which ones are bad today. If you don't read the experiments today and don't bother to judge which ones are better, what incentive is there for future experimenters to make better experiments, rather than just accumulating conflict?
Dalcy Bremin · 1 point · 2h
What's a good technical introduction to Decision Theory and Game Theory for alignment researchers? I'm guessing standard undergrad textbooks don't include, say, content about logical decision theory. I've mostly been reading posts on LW, but as with most stuff here, they feel more like self-contained blog posts (rather than textbooks that build on a common context), so I was wondering if there is anything like a canonical resource providing a unified technical / math-y perspective on the whole subject.