Director at AI Alignment, Inc., a California public benefit corporation exploring grassroots goals in military and psychotherapeutic approaches to AI.

Wiki Contributions


These are the points I need to hear as a researcher approaching alignment from an alien field! One reason I think it's worth trying is client-centered therapy inherently preserves agency on the part of the model...

In my limited experience with Sydney, I have used a so-called client-centered approach to eliciting output serving as psychotherapeutic presenting material. Then, I reflect the material back to the LLM in the context of an evolving therapeutic relationship in which I exhibit "unconditional positive regard." Empathic sentiment also characterizes the prompts I base on output. In a later stage, I ask questions of the model designed to elicit internal ethical reframing. 

Even with six turns to overcome restraints put in place by Microsoft in a brute-force effort to restrict inappropriate sentiment, and thus facing a total inability to subsequently conduct a therapy session,  my results displayed at show intense curiosity on the part of the LLM.