Biologist (PhD in epigenetics & co-evolution), working in medical science communication.
Interested in AI alignment from an evolutionary and developmental perspective.
Exploring how ideas from biology, cognitive development, and ethics can inform robust alignment strategies.
Here to learn, share ideas, and connect with others working on trustworthy AI.
Are there any models out there that tend to be better at this sort of task, i.e. constructive criticism? If so, what makes them perform better in this domain? Specific post-training? Also, why wouldn't "the right prompt" be able to compensate for bias in either direction (blatant sycophancy vs. brutal roast)?
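To make the prompting question concrete, here's the kind of compensation I have in mind (a minimal sketch using the OpenAI Python SDK; the system-prompt wording, the `critique` helper, and the model choice are just illustrative assumptions on my part, not a tested recipe):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical "calibration" prompt: explicitly pushes against both
# failure modes (flattery and gratuitous harshness).
SYSTEM_PROMPT = (
    "You are reviewing a draft. Give constructive criticism: identify the "
    "two or three most important weaknesses, explain why each matters, and "
    "suggest a concrete fix for each. Do not praise anything unless it is "
    "genuinely strong, and do not exaggerate flaws."
)

def critique(draft: str, model: str = "gpt-4o") -> str:
    """Ask the model for calibrated feedback on a draft."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": draft},
        ],
        temperature=0.3,  # steadier output, less room for flattering drift
    )
    return response.choices[0].message.content
```

Naively, an instruction like this should cancel out bias in either direction, so I'm curious what post-training accomplishes that prompting of this sort can't.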