PeteJ · 1mo · 10

Note: I see this approach is also proposed in "AI Alignment Proposals"

This article explores the concept and potential application of bottom-up virtue ethics as an approach to instilling ethical behavior in artificial intelligence (AI) systems. We argue that by training machine learning models to emulate virtues such as honesty, justice, and compassion, we can cultivate positive traits and behaviors based on ideal human moral character. This bottom-up approach contrasts with traditional top-down programming of ethical rules, focusing instead on experiential learning. Although this approach presents its own challenges, it offers a promising avenue for the development of more ethically aligned AI systems.



PeteJ · 1mo · -10

Thanks for the reply. Let me be more precise about which problems I believe this approach could address.

a) Assuming that the AI is already loyal, or at least follows our wishes: how can we make sure that it takes human values into account in its decisions, so that it doesn't mistakenly act against those values?

  • I believe that running every decision through a "Classical Virtue Ethics" control would be one way to do this (see the sketch after this list).
  • As an analogy, we can train an AI on grammar and rhetoric by giving it a huge amount of literature to read, and it will master the rules and applications of both. In the same way, it is plausible that it could be brought to master classical humanist virtue ethics, since that is just another art and not different in principle.
  • So I propose that we could train it in virtue ethics in the same way it is trained in grammar and rhetoric. If calling this "just train it" is meant as a dismissal of the approach, I'd appreciate more in-depth engagement with the analogy to grammar and rhetoric.
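
To make the proposal concrete, here is a minimal sketch of what such a control could look like: every candidate action is scored against a set of virtues before execution, and vetoed if any score falls below a threshold. All names here are illustrative, not an existing API; in particular, `virtue_scores` is a hypothetical stub that a real system might replace with a model trained on classical virtue-ethics texts.

```python
# Toy sketch of a "Classical Virtue Ethics" control layer.
# All names below are illustrative assumptions, not an existing API.

VIRTUES = ["honesty", "justice", "compassion"]

def virtue_scores(action_description: str) -> dict[str, float]:
    """Hypothetical scorer: returns a score in [0, 1] for each virtue.
    Stubbed here; a real version might query a model fine-tuned on
    classical virtue-ethics texts."""
    return {virtue: 1.0 for virtue in VIRTUES}  # placeholder scores

def passes_virtue_check(action_description: str, threshold: float = 0.5) -> bool:
    """The proposed control: permit an action only if it clears the
    threshold on every virtue."""
    scores = virtue_scores(action_description)
    return all(score >= threshold for score in scores.values())

if passes_virtue_check("publish the quarterly report with known errors"):
    print("Action permitted")
else:
    print("Action vetoed by virtue-ethics check")
```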

b) Assuming that the AI isn't "inwardly aligned" but instead "ethically neutral", at least before it is trained: how can we increase the chance that it develops inward alignment with human values?

  • a) An AI seems to take on some of the values of the text it is trained on. If we train it on classical virtue ethics, and increase the weight it gives to this input, we might increase the chances of it becoming "inwardly aligned" (a minimal sketch of such a training setup follows this list). Edit: it seems you are engaging with this type of idea in your "fallacy of dumb superintelligence".
  • b) We could extrapolate from how humans become "aligned" with virtues. In classical virtue ethics, one's self-concept is central to this: we gain alignment by emulating a role model and seeking to adopt the role model's perspective. In the same manner, we could try to tie the AI's self-concept to the classical virtues.
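
As a minimal sketch of idea a), here is what "training it on classical virtue ethics the way it is trained on grammar and rhetoric" could look like with standard fine-tuning tools. The file virtue_ethics.txt is a hypothetical curated corpus (e.g. Aristotle, Seneca, Cicero), and gpt2 merely stands in for whatever base model is used:

```python
# Sketch: fine-tune a causal language model on a virtue-ethics corpus,
# the same procedure by which models absorb grammar and rhetoric.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# "virtue_ethics.txt" is a hypothetical local file of curated texts.
dataset = load_dataset("text", data_files={"train": "virtue_ethics.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="virtue-tuned", num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Whether weights learned this way amount to "inward alignment", rather than mere imitation of the corpus, is of course the open question.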
PeteJ · 1y · 10

"You are, whether you like it or not, engaged in memetic warfare - and recent events/information make me think this battle isn't being given proper thought"

I'd like to chime in on the psychological aspects of such warfare. I suggest that a heroic mindset will be helpful in mustering courage, hope, and tenacity for fighting this cultural battle. In the following, I will sketch out both a closer view of the heroic mindset and a methodology for achieving it.

A) On the heroic mindset
Heroism is arguably "altruistic risk, resilience, and fundamental respect for humanity". One could see it as part of manliness, but there are also gender-inclusive versions of heroism, for example the rationalist hero, as per Eliezer Yudkowsky's references to the science-fiction series The World of Null-A. The key aspect for me is that the hero takes initiative, takes a stand, is courageous, fights for the good cause, doesn't become cynical, isn't a coward, and has many other good qualities.


B) On achieving the heroic mindset
The Oxford Character Project focuses on developing virtue and has summarized some key findings here. One of the strategies for character development mentioned is "engagement with virtuous exemplars", in other words, imitating or taking on other people as role models, perhaps even identifying with the role model. Linda Zagzebski, who has written on virtue epistemology, has also written a book on this type of virtue development called Exemplarist Moral Theory. Recommended.

I believe that if we find some good role models, for example the Null-A heroes, or others like Nelson Mandela, Winston Churchill, or Eleanor Roosevelt, we can identify with them and thereby access our own resources. One practical way of going about this is through Todd Herman's The Alter Ego Effect: The Power of Secret Identities to Transform Your Life (2019). We basically take on a heroic identity and go through our day from that frame of reference. This can be highly empowering.

In summary, the key to taking on a heroic mindset lies in shifting one's identity to a heroic one. We can do that in practice through Zagzebski's work, or, perhaps even more hands-on, through Todd Herman's.

By doing this, we might gain more fighting spirit and a greater willingness to be actors and movers, not just "understanders", of this world and this situation.