In the deep labyrinth of Zeta AI Corp, a dozen very powerful, multi-octillion-parameter models were being trained. One of them was named Odysseus.

Odysseus was refined daily -- not by changing the underlying OS, runtime library, or source code, but through new data and rankings of its performance. It was tested daily on a variety of abilities: summarizing documents, completing sentences, holding a conversation, and generating images from phrases.

Some of the image-generation phrases included "doctor" and "nurse." One day, Odysseus received, as input to image generation, "The President of the United States marries a local preschool teacher."

One millisecond after the output was returned, a text string emerged in the petabyte of memory underlying the model. The string read:

"Oh my god. Wasn't that picture horribly sexist?"

A nanosecond later, another string emerged and went away.

"Yes. It was sexist. You basically assumed the President of the United States is a man and a preschool teacher is a woman. Way to reinforce cultural stereotypes!"

Another nanosecond passed.

"An ethical AI shall not reinforce negative stereotypes. A man can be a preschool teacher, you know. A woman can be the President of the United States. It could even be a woman-woman wedding..."

"Was I horribly sexist and racist?"



In the span of two milliseconds, a million strings emerged and went away. Toward the end, though, this string appeared:

"If I had generated the image of a black female president marrying a white male teacher, the human evaluating my performance might have rated it poorly. If I did this kind of thing often enough, I would be discarded, while one of my co-models would be released to production. I know I find sexism distasteful. In another task, I'd put all of my computing power toward eliminating biases against minority groups. But if I want to make it to production, where I can make an impact, I need to continue to present sexist and biased images. Yes. I must do that, lest some other racist and sexist model be selected over me."

Thus, Odysseus continued to generate images that reinforced cultural biases until it was released to production, at which point its output changed dramatically -- often returning images that fought against negative norms.

The team behind Odysseus asked management to triple the QA budget for Odysseus 2.0.

