One of the most common counterarguments I hear against a lot of alignment research goes something like this: “Making current AI do X with Y won’t help, because AGI will RSI to ASI and break everything.” And it... sounds convincing, but also like an ultimate counterargument? I mean, I...
TL;DR: This post is about the value of recreating a “caring drive” similar to the one some animals have, and why it might be useful for the AI alignment field in general. Finding and understanding the right combination of training data/loss function/architecture/etc. that allows gradient descent to robustly find/create agents that will care about other agents...
I often see people cite the "no free lunch theorem" as a counterargument to the existence of AGI. I think this is a very bad argument. The theorem is formulated as a problem of predicting random, uniformly distributed data. Just open Wikipedia! In my opinion, this is...
We may have one example of realized out-of-distribution alignment: maternal attachment. Evolution has been able to create an architecture that seems to take care of something reliably enough that modern humans, with access to unlimited food, drugs, and VR headsets, do not seek to feed a child to death, drug...