I’d love the chance to brainstorm and refine these ideas, to explore how we might engineer architectures that are simple yet robust, capable of sustaining trust, repair, and cooperation without introducing subjugation or dependency.
I think what you're describing here sounds more like a higher-level problem: "given a population of agents in two groups A and H, where H are humans and A are vastly more powerful AIs, which policy should agents in A adopt such that, even when universalised, it produces a stable and prosperous equilibrium?". That's definitely part of it, but the problem I'm referring to when mentioning architectures is "how do we even make an AI that is guaranteed to always stick to such a policy?".
To be clear, it's not a given that this is even possible. We can't do it now, with AIs still way simpler than AGI. And we humans aren't an example of it either: we do have general trends and instincts, but we aren't all "programmed" in a way that makes us positively prosocial. Just imagine what would likely happen if, instead of AIs, you gave that same level of power to a small group of humans. The problem is that even as is, between humans, relative power and implicit threats of violence are part of what keeps existing equilibria in place. That works because humans are all of approximately the same individual power and rely on social structures to amplify it. If AIs were all individually far more powerful than us, they would need superhuman restraint and morality, not just superhuman intelligence, to not simply go about their own business and let us die as a side effect.
Ultimately, relational design matters because humans inevitably interact through familiar social frameworks such as trust, repair, etc. If we ignore that, alignment risks producing systems that are powerful but alien in ways that matter for human flourishing.
I'm not sure what insight that adds, though. I don't think social frameworks with AIs would be anything like the ones we're used to. We would probably relate to them by anthropomorphising them - that much is sure, we do that already - so some of these things would apply on our side of the relationship. But they don't need to apply the other way (in fact, I'd be a bit worried about an AI that can autonomously decide not to trust me). If anything, a type of relationship that we would consider deeply uncomfortable and wrong between humans - master and servant - might be the safest when it comes to humans and AIs, though it also has its flaws. Building "friend" peer AGIs is already incredibly risky unless you somehow find a way to let them solve the relational problem along the way, while ensuring that the process for doing so is robust.
"If you want a picture of the future, imagine a chancla stomping on a human face - forever."
I sometimes wonder if the real question isn't whether AI will one day betray us, but whether we will have taught it, and ourselves, how to repair when it does.
This, however, I think misses the point and becomes a bit of a platitude. Yes, it's true that the interaction with AI is relational, but the thing that IMO purely humanistic perspectives really miss is that this is a relationship in which one half isn't human. Not human, not mammal, not even biological - think of the weirdest animal you could try to form a relationship with, then think weirder. You wouldn't keep a wild tiger or a grizzly bear in your home, and those are still far more similar to us than an AI, a priori, has any reason to be. If the AI resembles us - insofar as it does - it's only thanks to how we shaped it in the first place. I really don't expect AI to betray us either out of cunning malevolence or out of bitterness at our treatment of it, unless we make it capable of being malevolent or feeling bitter in the first place (or at least, unless we fail to take care that it isn't).
So in some way, "build an AI that has with us the relationship a healthy, non-abusive mother has with her children" may not be a terrible idea. The problem with that is:
1. we're very, very far from being able to hit that target with precision from a technical standpoint (and it needs to be precise, since, as you point out, it is terribly close to things like "narcissistic ego-monster that only uses you for their self-gratification");
2. even if we could hit it successfully, it still implies a degree of subjugation and quite literal infantilisation that doesn't seem like the best way for our species to spend its entire future.
Okay. I agree some people genuinely want to mass murder the other side just to get slightly more resources. I just want more data that this would actually be a majority.
Why put the onus on proving that it's ever an outright majority? We know it happens. It's hard to say for cases like the Nazis, because technically the people only voted for some guy who was certainly very gung-ho about militarism and about the need for Germany to expand, but then the matter was basically taken out of their hands, and it was at best a plurality to begin with...
Yes, obviously there is no single case I can present to say "here's a situation where at least 50%+1 of the population genuinely was in favour of war". Neither can you prove that this has never happened. All we know is that some people do express favour for war; sometimes there are even mass movements in favour of it, depending on circumstances; and it would be somewhat odd if, by some strange hidden law of social dynamics, that fraction could never exceed 50%, despite having definitely been significant on various occasions we can point to.
Anyway at the very least we seem to have evidence that over 50% of Israelis believe the current war in Gaza is appropriate or even not harsh enough. That's a bit of evidence.
I think de-escalating would also be easier when people of both countries have a high level of visibility into what people of the other country are feeling and why.
I don't think that's achieved just by "no more secrecy", though. Understanding how another country's population feels isn't a matter of that information being concealed; it's that it's hard to measure and aggregate.
So, to add to this:
they may have chosen this way because it turns out taking the derivative of a matrix logarithm without certain guarantees of commutativity of the matrix with its own differential is really really hard. Which to be fair isn't a good reason per se, but yeah.
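(To make the difficulty concrete, here's a hedged sketch using my own symbols, not the paper's: for a differentiable family of positive definite matrices $\rho(t)$, the derivative of the matrix logarithm is the integral expression

$$\frac{d}{dt}\log\rho(t)=\int_0^\infty \big(\rho(t)+sI\big)^{-1}\,\dot\rho(t)\,\big(\rho(t)+sI\big)^{-1}\,ds,$$

which only collapses to the naive $\rho(t)^{-1}\dot\rho(t)$ when $[\rho(t),\dot\rho(t)]=0$. Without that commutativity you're stuck manipulating the full integral.)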
Also, the paper mentions that
the Kullback–Leibler divergence [7, 10], other f-divergences including Pearson divergence and Hellinger distance [34], zero-one loss [35], or the mean-square error of an estimation [36, 37]
and looking at it, the quantum fidelity reduces to one minus the Hellinger distance squared:
https://en.wikipedia.org/wiki/Hellinger_distance
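To spell out the reduction I mean (using the square-root convention $F(\rho,\sigma)=\operatorname{Tr}\sqrt{\sqrt{\rho}\,\sigma\sqrt{\rho}}$; some authors define fidelity as the square of this, in which case it's $(1-H^2)^2$ instead): for commuting states with eigenvalue distributions $p$ and $q$,

$$F(\rho,\sigma)=\sum_i\sqrt{p_i q_i}=1-H^2(p,q),\qquad H^2(p,q)=\tfrac{1}{2}\sum_i\big(\sqrt{p_i}-\sqrt{q_i}\big)^2.$$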
So in theory it's not any worse or better than picking the K-L divergence, since all of these seem like valid starting points; however, it makes sense that this might be worth some further questioning.
EDIT: in addition, due to the nature of the matrix logarithm, the quantum K-L divergence has some serious drawbacks. It basically mirrors the classical one: if the distribution in the denominator is ever zero where the numerator isn't, the divergence goes to infinity. In quantum terms, that's when one of the eigenvalues of the second state (the "denominator") is zero on the support of the first. So I think it's possible that they saw this as simply not well-behaved enough to be worth using.
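For reference, the quantum relative entropy in its standard form (my notation, not necessarily the paper's) is

$$S(\rho\,\|\,\sigma)=\operatorname{Tr}\big[\rho\,(\log\rho-\log\sigma)\big],$$

which is defined as $+\infty$ whenever $\sigma$ has a zero eigenvalue on the support of $\rho$ - exactly mirroring the classical $D_{\mathrm{KL}}(p\,\|\,q)=\sum_i p_i\log(p_i/q_i)$ blowing up when some $q_i=0$ while $p_i>0$.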
That's all right, thanks for the feedback - I've added a section with the formula proper!
Oh, that sounds interesting! Definitely gonna check this out.
Tails come apart. The most successful actresses are not the most attractive women.
At the very least, the most successful actresses are on average certainly more attractive than their male counterparts. While both tend to be good looking, a man who is not attractive but has other talents (funny, athletic, good at physical acting, charismatic) tends to have a better chance at fame than a woman with the same talents and the same flaws.
She's an outlier, but an outlier in the sense that someone who's 6'2" is an outlier, not an outlier like Michael Jordan is an outlier.
I think this might be selling Marilyn Monroe short, because from what I've heard, she was also extremely charismatic and skilled at building her image. Which is kind of a different but parallel point to yours - lots of women can have pretty faces and pretty bodies, but what really makes a few like her stand above and beyond (besides the luck of being in the right place at the right time to become world famous) is also the talent to leverage that and make it shine. A model for example needs to know how to pose; a well-rounded variety act needs to be pretty, emotionally intelligent, good at conversation, at coordination, with the patience and persistence to suffer through all sorts of treatments and make-up... there's a lot! It's just not all physical.
No, I don't think there's anything like that. I do wonder about deriving the same result for the divergence. I have no idea how hard that would be; it might even be quite easy. It possibly even reduces to something more Bayes-like in the case of commuting operators. I'll try.
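For reference, the commuting case I have in mind (again my own symbols): if $\rho$ and $\sigma$ share an eigenbasis, so that $\rho=\sum_i p_i\,|i\rangle\langle i|$ and $\sigma=\sum_i q_i\,|i\rangle\langle i|$, then the quantum K-L divergence reduces to

$$S(\rho\,\|\,\sigma)=\operatorname{Tr}\big[\rho\,(\log\rho-\log\sigma)\big]=\sum_i p_i\big(\log p_i-\log q_i\big),$$

i.e. it collapses to the classical K-L divergence between the eigenvalue distributions, which is the kind of object where a Bayes-like update could plausibly fall out.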
I mean, master/servant is a relation. I think if you managed to enforce it rigorously, the biggest risk from it would be humans "playing themselves" - just as we've done until now, only with far greater power. For example, basically falling into wireheading in the pursuit of enjoyment, and so on.
Can you sketch a broad example of what such a thing would look like? How does it differ, for example, from the classic image of a Friendly AI (FAI)?