Comments

ABlue42

You want to help? Figure out what kind of incremental changes you can begin to introduce in any of them, in order to begin extinguishing the sort of problems you've now elevated to the rank of "saving-worthy" in your own head. Note that, in all likelihood, by extinguishing one you will merrily introduce a whole bunch of others - something you won't get to discover until much later on. Yet that is, realistically, what you can actually go on to accomplish.

I read this paragraph as saying ~the same thing as the original post, just in a different tone.

ABlue30

Is there a better way of discovering strong arguments for a non-expert than asking for them publicly?

ABlue10

Also, it assumes there is a separate module for making predictions, which cannot be manipulated by the agent. This assumption is not very probable in my view.

Isn't this a blocker for any discussion of particular utility functions?

ABlue10

If a simple philosophical argument can cut the expected odds of AI doom by an order of magnitude, we might not change our current plans, but it suggests that we have a lot of confusion on the topic that further research might alleviate.

And more generally, "the world where we almost certainly get killed by ASI" and "the world where we have an 80% chance of getting killed by ASI" are different worlds, and, ignoring motives to lie for propaganda purposes, if we actually live in the latter we should not say we live in the former.

ABlue10

I don't think wireheading is "myopic" when it overlaps with self-maintenance. A classic example would be painkillers; they do ~nothing but make you "feel good now" (or at least less bad), but sometimes feeling less bad is necessary to function properly and achieve long-term value. I think that gratitude journaling is also part of this overlap area. That said, I don't know many people's experiences with it, so maybe it's more prone to "abuse" than I expect.

ABlue10

A corrigible AI is one that cooperates with attempts to modify it to bring it more in line with what its creators/users want it to be. Some people think that this is a promising direction for alignment research, since if an AI could be guaranteed to be corrigible, even if it ends up with wild/dangerous goals, we could in principle just modify it to not have those goals and it wouldn't try to stop us.

"Alignment win condition," as far as I know, is a phrase I just made up. I mean it as something that, regardless of whether it "solves" alignment in a specific technical sense, achieves the underlying goal of alignment research which is "have artificial intelligence which does things we want and doesn't do things we don't want." A superintelligence that is perfectly aligned with its creator's goals would be very interesting technically and mathematically, but if its creator wants it to kill anyone it really isn't any better than an unaligned superintelligence that kills everyone too.

ABlue10

I don't trust a hypothetical arbitrary superintelligence, but I agree that a superintelligence is too much power for any extant organization, which means that "corrigibility" is not an alignment win condition. An AI resisting modification to do bad things (whatever that might mean on reflection) seems like a feature, not a bug.

ABlue90

Do you believe or allow for a distinction between value and ethics? Intuitively, it feels like metaethics should take into account the Goodness of Reality principle, but I think my intuition comes from a belief that if there's some objective notion of Good, ethics collapses to "you should do whatever makes the world More Gooder," and I suppose that that's not strictly necessary.

ABlue138

The adulterer, the slave owner and the wartime rapist all have solid evolutionary reasons to engage in behaviors most of us might find immoral. I think their moral blind spots are likely not caused by trapped priors, like an exaggerated fear of dogs is.

I don't think the evopsych and trapped-prior views are incompatible. A selection pressure towards immoral behavior could select for genes/memes that tend to result in certain kinds of trapped prior.

ABlue128

I also suspect something along the lines of "Many (most?) great spiritual leaders were making a good-faith effort to understand the same ground truth with the same psychological equipment and got significantly farther than most normal people do." But in order for that to be plausible, you would need a reason why the almost-truths they found are so goddamn antimemetic that the most studied and followed people in history weren't able to make them stick. Some of the selection pressure surely comes down to social dynamics. I'd like to think that people who have grazed some great Truth are less likely to torture and kill infidels than people who merely think they know a great truth. Cognitive blind spots could definitely explain things, though.

The problem is that the same thing that would make blind spots good at curbing the spread of enlightenment also makes them tricky to debate as a mechanism for it. They're so slippery that until you've gotten past one yourself, it's hard to believe they exist (especially when the phenomenal experience of knowing-something-that-was-once-utterly-unknowable can also seemingly be explained by developing a delusion). They're also hard to falsify. What you call active blind spots are a bit easier to work with; I think most people can accept the idea of something like "a truth you're afraid to confront" even if they haven't experienced such a thing themselves (or are afraid to confront the fact that they have).

I look forward to reading your next post(s), as well as this site's reaction to them.
