Biologist (PhD in epigenetics & co-evolution), working in medical science communication.
Interested in AI alignment from an evolutionary and developmental perspective.
Exploring how ideas from biology, cognitive development, and ethics can inform robust alignment strategies.
Here to learn, share ideas, and connect with others working on trustworthy AI.
Are there any models out there that tend to be better at this sort of task, i.e. constructive criticism? If so, what makes them perform better in this domain? Specific post-training? Also, why wouldn't "the right prompt" be able to compensate for bias in either direction (blatant sycophancy vs. brutal roast)?
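To make the prompting question concrete, here's the kind of compensation I have in mind (a minimal sketch using the OpenAI Python SDK; the system-prompt wording, the `critique` helper, and the model choice are just illustrative assumptions on my part, not a tested recipe):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical "calibration" prompt: explicitly pushes against both
# failure modes (flattery and gratuitous harshness).
SYSTEM_PROMPT = (
    "You are reviewing a draft. Give constructive criticism: identify the "
    "two or three most important weaknesses, explain why each matters, and "
    "suggest a concrete fix for each. Do not praise anything unless it is "
    "genuinely strong, and do not exaggerate flaws."
)

def critique(draft: str, model: str = "gpt-4o") -> str:
    """Ask the model for calibrated feedback on a draft."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": draft},
        ],
        temperature=0.3,  # steadier output, less room for flattering drift
    )
    return response.choices[0].message.content
```

Naively, an instruction like this should cancel out bias in either direction, so I'm curious what post-training accomplishes that prompting of this sort can't.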