ABlue

Is there a better way of discovering strong arguments for a non-expert than asking for them publicly?
Also, it assumes there is a separate module for making predictions that the agent cannot manipulate. This assumption does not seem very plausible to me.
Isn't this a blocker for any discussion of particular utility functions?
If a simple philosophical argument can cut the expected odds of AI doom by an order of magnitude, we might not change our current plans, but it suggests that we have a lot of confusion on the topic that further research might alleviate.
And more generally, "the world where we almost certainly get killed by ASI" and "the world where we have an 80% chance of getting killed by ASI" are different worlds. Ignoring motives to lie for propaganda purposes, if we actually live in the latter, we should not say we live in the former.
I don't think wireheading is "myopic" when it overlaps with self-maintenance. A classic example is painkillers: they do ~nothing but make you "feel good now" (or at least less bad), but sometimes feeling less bad is necessary to function properly and achieve long-term value. I think gratitude journaling is also in this overlap area. That said, I don't know many people's experiences with it, so maybe it's more prone to "abuse" than I expect.
A corrigible AI is one that cooperates with attempts to modify it to bring it more in line with what its creators/users want it to be. Some people think this is a promising direction for alignment research: if an AI could be guaranteed to be corrigible, then even if it ended up with wild/dangerous goals, we could in principle just modify it to not have those goals, and it wouldn't try to stop us.
"Alignment win condition," as far as I know, is a phrase I just made up. I mean it as something that, regardless of whether it "solves" alignment in a specific technical sense, achieves the underlying goal of...
I don't trust a hypothetical arbitrary superintelligence, but I agree that a superintelligence is too much power for any extant organization, which means that "corrigibility" is not an alignment win condition. An AI resisting modification to do bad things (whatever that might mean on reflection) seems like a feature, not a bug.
Do you believe or allow for a distinction between value and ethics? Intuitively it feels like metaethics should take into account the Goodness of Reality principle, but I think my intuition comes from a belief that if there's some objective notion of Good, ethics collapses to "you should do whatever makes the world More Gooder," and I suppose that that's not strictly necessary.
The adulterer, the slave owner and the wartime rapist all have solid evolutionary reasons to engage in behaviors most of us might find immoral. I think their moral blind spots are likely not caused by trapped priors, like an exaggerated fear of dogs is.
I don't think the evopsych and trapped-prior views are incompatible. A selection pressure towards immoral behavior could select for genes/memes that tend to result in certain kinds of trapped prior.
I also suspect something along the lines of "Many (most?) great spiritual leaders were making a good-faith effort to understand the same ground truth with the same psychological equipment, and got significantly farther than most normal people do." But for that to be plausible, you would need a reason why the almost-truths they found are so goddamn antimemetic that the most studied and followed people in history weren't able to make them stick. Some of the selection pressure surely comes down to social dynamics. I'd like to think that people who have grazed some great Truth are less likely to torture and kill infidels than someone who thinks they know...
I read this paragraph as saying ~the same thing as the original post, in a different tone.