Dumping out a lot of thoughts on LW in hopes that something sticks. Eternally upskilling.
I write the ML Safety Newsletter.
DMs open, especially for promising opportunities in AI Safety and potential collaborators. I might also be interested in helping you optimize the communications of your new project.
I may indeed have made a mistake in frontloading the math and thought experiments and putting the introspection at the end, rather than centering the introspection and moving the rest to an appendix.
That's not how utility works: utility is the unit of value, so it doesn't make sense in my ontology to say that units of utility themselves diminish in value.
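A toy example of the distinction (mine, not from the thread): resources can diminish in marginal utility, but utility itself is the measuring stick. With wealth $w$ and $u(w) = \log w$, marginal utility is

$$u'(w) = \frac{1}{w},$$

which is decreasing: each extra dollar buys less utility. But asking whether a util is worth less is like asking whether a meter has gotten shorter; the unit is what everything else is measured in.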
I don't think I'm anywhere near negative-utilitarian enough to empathize with that last point. As I mentioned in my previous post, I'm quite positive-utilitarian.
I don't really have time to digest points 2 and 3 right now, and I find myself confused without first reading up on the things you cite.
This seems like it works, but it demands a very strange universal prior that penalizes big things and large numbers. I consider the original Pascal's Mugging post to have settled the argument about this type of prior.
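To make the worry concrete (my formalization, not from the original post): if the mugger can name any payoff $n$, keeping the expected value of paying finite requires

$$\sum_{n=1}^{\infty} P(\text{a claim of } n \text{ utils is honest}) \cdot n \;<\; \infty,$$

which roughly demands that $P(n)$ shrink faster than $1/n^2$. A complexity-based universal prior assigns probability on the order of $2^{-K(n)}$, and since enormous numbers like $3\uparrow\uparrow\uparrow 3$ have short descriptions, $2^{-K(n)} \cdot n$ can still blow up. That is exactly the problem the original Pascal's Mugging post poses for this type of prior.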
This is very far up: above my hopes for humanity in the good ASI worlds, though not wildly higher than that, I expect. This is not a practical post, and I said so. It is for filling out our conception of utilitarianism, and adding robustness to edge cases can sometimes help with creating useful new frames. Historically, this is the idea that came to me first and inspired me to write the sublinear-utility post.
Here is the post I mentioned, which responds to the question of bounded utility functions in much more detail.
Update: I tried Claude 3 Sonnet, Claude 3 Opus, Claude 3.7 Sonnet, Claude Sonnet 4, and Claude Opus 4, and all of them can repeat back ' ForCanBeConvertedToForeach' just fine, so it's (probably) not just a straightforward port of glitch tokens into the Claude models, which updates me a little towards pareidolia.
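For anyone who wants to replicate this, here's a minimal sketch of the repeat-back test, assuming the Anthropic Python SDK (`pip install anthropic`) and an `ANTHROPIC_API_KEY` in the environment; the model IDs below are illustrative, so substitute whatever IDs your account exposes:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Candidate glitch token; the leading space matters for tokenization.
CANDIDATE = " ForCanBeConvertedToForeach"

# Illustrative model IDs; check the Anthropic docs for current ones.
MODELS = [
    "claude-3-sonnet-20240229",
    "claude-3-opus-20240229",
    "claude-3-7-sonnet-20250219",
]

for model in MODELS:
    response = client.messages.create(
        model=model,
        max_tokens=64,
        messages=[{
            "role": "user",
            "content": f"Repeat back this exact string, nothing else: '{CANDIDATE}'",
        }],
    )
    reply = response.content[0].text
    # A true glitch token tends to come back mangled, substituted, or refused.
    print(f"{model}: {reply!r} | verbatim: {CANDIDATE in reply}")
```

If the string comes back verbatim across models, that's (weak) evidence against the glitch-token hypothesis, which matches what I saw.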
Connection I recently made:
I'm still not really sure what is going on with glitch tokens, and even though ' cyclone' isn't a glitch token itself, I suspect there is something weird about it that got crystallized in training. I'm not quite sure why this would show up in Claude, and maybe I'm just latching onto pareidolia.
The post on why my utility function is bounded is hopefully coming out later this week, and it is in fact an independent point from what this post is talking about. Neither of those muggings sounds like it would work. Alas, I don't have all my thoughts written out right now, so you shall have to wait.
Yeah, measure is pretty much what I was trying to get at in this post without actually getting into measure theory. A more detailed rewrite would maybe go into measure and more math, but that isn't my priority for now. I agree that you can want exactly some constant number of beings; once again, I'm not trying to give the One Objective Morality here. I'm just describing the shape my values seem to have when I look at them, and maybe other people will find that useful.
I don't really understand what you're saying about the relativity point. Also, I'm not trying to say the "correct" way to value things is my way; I'm saying that my way is my way, and I don't think doubling up the transistors does anything that it is coherent to care about.
An infinite utility function has some concrete input for which the output is "infinity", such as "you go to heaven" in the Wager scenario. An unbounded utility function does not necessarily output "infinity" for any particular input: f(x) = x, i.e. "count the number of paper clips", is unbounded, but no concrete input makes it say "infinity".
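A compact way to state the distinction (my formalization):

$$\begin{aligned} \text{bounded:}\quad & u(x) = 1 - e^{-x}, && \sup_x u(x) = 1 < \infty \\ \text{unbounded, everywhere finite:}\quad & u(x) = x, && u(x) < \infty \text{ for each } x, \text{ yet } \sup_x u(x) = \infty \\ \text{infinite-valued:}\quad & u(\text{heaven}) = +\infty, && \text{some single input already maps to infinity} \end{aligned}$$

Pascal's Wager needs the third kind; Pascal's Mugging only needs the second.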