Consider two claims:
* Any system can be modeled as maximizing some utility function, therefore
utility maximization is not a very useful model
* Corrigibility is possible, but utility maximization is incompatible with
corrigibility, therefore we need some non-utility-maximizer kind of agent to
achieve corrigibility
These two claims should probably not both be true! If any system can be modeled
as maximizing a utility function, and it is possible to build a corrigible
system, then naively the corrigible system can be modeled as maximizing a
utility function.
I expect that many people's intuitive mental models around utility maximization
boil down to "boo utility maximizer models", and they would therefore
intuitively expect both the above claims to be true at first glance. But on
examination, the probable-incompatibility is fairly obvious, so the two claims
might make a useful test to notice when one is relying on yay/boo reasoning
about utilities in an incoherent way.
kuira · 10h
Sometimes I have an internal desire to do something different from what I think
should be done (for example, I might desire to play a game while also thinking
the better choice is to read). I've been experimenting with using
randomness to mediate this. I keep a D20 with me, give each side of the dispute
some odds proportional to the strength of its resolve, and then roll the die.
In theory, this means neither side will overpower the other, and even a small
resolve still has a chance. I'm not sure how useful this is, but it's fun, and
can sort of give me motivation (I've tried to internalize this kind of roll as a
rule not to break without good reason).
Also, when I'm merely deciding between some options, sometimes I'll roll more
casually with equal odds, and it'll help me realize that I already know which it
is I really wanted to do (if I don't like the roll's outcome).
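The D20 procedure described above is essentially a weighted random choice. A minimal sketch in Python (the `mediate` helper and the example weights are illustrative, not from the comment):

```python
import random

def mediate(options):
    """Roll to decide between competing desires, with odds
    proportional to each side's strength of resolve.

    `options` maps each choice to its weight, e.g. the number
    of D20 sides allotted to it. Hypothetical helper.
    """
    choices = list(options)
    weights = [options[c] for c in choices]
    # random.choices samples with probability proportional to weight,
    # so even a small resolve retains a nonzero chance of winning.
    return random.choices(choices, weights=weights, k=1)[0]

# Example: give "read" 14 of 20 sides and "play game" the other 6.
pick = mediate({"read": 14, "play game": 6})
```

The equal-odds casual roll mentioned afterward is just the same call with all weights set to 1.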
NicholasKross · 7h
In response to / inspired by this SSC post
[https://astralcodexten.substack.com/p/your-incentives-are-not-the-same]:
I was originally going to comment something about "how do I balance this with
the need to filter for niche nerds who are like me?", but then I remembered that
the post is actually literally about dunks/insults on Twitter. o_0
This, in meta- and object-level ways, got to a core problem I have: I want to do
smart and nice things with smart and nice people, yet these (especially the
social stuff) require me to be so careful + actually have anything like a
self-filter. And even trying to practice/exercise that basic self-filtering
skill feels physically draining. (ADHD + poor sleep btw, but just pointing these
out doesn't do much!)
To expand on this (my initial comment
[https://astralcodexten.substack.com/p/your-incentives-are-not-the-same/comment/17376134]):
While I love being chill and being around chill people, I also (depending on my
emotional state) can find it exhausting to do basic social things like "not
saying every thought I think" and "not framing every sentence I say as a
joke".
I was once given the "personal social boundaries" talk by some family members.
One of them said they were uncomfortable with a certain
behavior/conversational-thing I did. (It was probably something between "fully
conscious" and "a diagnosable tic".) And I told them flat-out that I would have
trouble staying in their boundary (which was extremely basic and reasonable of
them to set, mind you!), and that I literally preferred
not-interacting-with-them to spending the energy to mask.
Posts like this remind me of how scared of myself I sometimes am, and maybe
should be? I'm scared of being either [ostracized by communities I deeply
love] or [exhausting myself by "masking" all the time]. And I don't really know
how to escape this, except by learned coping mechanisms that are either (to me)
"slowly revealing more of myself and being more casual, in proport
Douglas_Knight · 13h
Someone just told me that the solution to conflicting experiments is more
experiments. Taken literally this is wrong: more experiments just means more
conflict. What we need are fewer experiments. We need to get rid of the bad
experiments.
Why expect that future experiments will be better? Maybe if the experimenters
read the past experiments, they could learn from them. Well, maybe, but maybe if
you read the experiments today, you could figure out which ones are bad today.
If you don't read the experiments today and don't bother to judge which ones are
better, what incentive is there for future experimenters to make better
experiments, rather than accumulating conflict?
Dalcy Bremin · 2h
What's a good technical introduction to Decision Theory and Game Theory for
alignment researchers? I'm guessing standard undergrad textbooks don't include,
say, content about logical decision theory. I've mostly been reading posts on LW
but as with most stuff here they feel more like self-contained blog posts
(rather than textbooks that build on top of a common context) so I was wondering
if there was anything like a canonical resource providing a unified technical /
math-y perspective on the whole subject.