hi! i'm tammy :3
i research the QACI plan for formal-goal AI alignment at orthogonal.
check out my blog and my twitter.
agreed overall.
if one's goal is to minimize the harm per animal conditional on it existing, and one believes that ASI is within reach, the correct focus would seem to be to ignore alignment and focus on capabilities.
IMO aligned AI reduces suffering even more than unaligned AI, because it'll pay alien civilizations (eg the babyeaters) not to do things we'd consider large-scale suffering, in exchange for some of our lightcone; so even people closer to the negative utilitarian side should want to solve alignment.
ratfic (as i'm using the term here) typically showcases characters applying lesswrong rationality well. lesswrong rationality is typically defined as whatever is ultimately instrumental to winning.
yes, this is correct. i believe we should solve alignment before building AI, rather than after. (in fact, alignment should be fundamental to the design of the AI, not a patch you apply after the fact)
oh, so this is a temporary before-AI-inevitably-either-kills-everyone-or-solves-everything thing, not a plan for making the AI-that-solves-everything-including-X-risk?
"human alignment" as you put it seems undesirable to me — i want people to get their values satisfied and then conflicts resolved in some reasonable manner, i don't want to change people's values so they're easier to satisfy-all-at-once. changing other people's values is very rude and, almost always, a violation of their current values.
any idea how you'd envision "making people love their neighbor as themselves"? sounds like modifying everyone on earth like that would be much more difficult than, say, changing the minds of the people who would make the AIs that are gonna kill everyone.
Our alignment philosophy is simple: we cannot align AI's to human values until we know approximately what human values actually are, and we cannot know that until we solve the human alignment problem.
what do you make of coherent extrapolated volition, which is the usual proposal for solving alignment without having a full understanding of our values?
what do you mean by "human alignment problem"? here it seems that you mean "understanding the values of humans", but many people use that term to mean a variety of things (usually they use it to mean "making humans aligned with one another")
that last point is plausible for some, but for most i expect that we're far from the pareto frontier and that there are large positive-sum gains to be made through cooperation (assuming they implement a decision theory that allows such cooperation).
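to make the positive-sum point concrete, here's a minimal toy sketch with made-up numbers (the agents, resources, and valuations are all hypothetical, just for illustration): two agents whose values differ can each trade away what they care about less, and both end up strictly better off than at the status quo, showing that the status quo wasn't on the pareto frontier.

```python
# toy illustration (made-up numbers): two agents with different values,
# where a trade is a strict pareto improvement over the status quo.

# each agent starts with the same bundle of two resources,
# but they value the resources differently (linear utilities).
holdings = {"A": {"x": 10, "y": 10}, "B": {"x": 10, "y": 10}}
values   = {"A": {"x": 3, "y": 1},   # A cares mostly about x
            "B": {"x": 1, "y": 3}}   # B cares mostly about y

def utility(agent, bundle):
    # total utility an agent gets from a bundle of resources
    return sum(values[agent][r] * amount for r, amount in bundle.items())

# status quo: no trade
print(utility("A", holdings["A"]), utility("B", holdings["B"]))  # 40 40

# trade: A gives its y to B, B gives its x to A
after = {"A": {"x": 20, "y": 0}, "B": {"x": 0, "y": 20}}
print(utility("A", after["A"]), utility("B", after["B"]))        # 60 60
```

both agents go from 40 to 60, so the deal is positive-sum even though no new resources were created; the only requirement is that both sides can actually make and honor the trade.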