rpglover64

PSA: Consider alternatives to AUROC when reporting classifier metrics for alignment

TL;DR: If you’re presenting a classifier that detects misalignment and providing metrics for it, please:

1. report the TPR at FPR = 0.001, 0.01, and 0.05
2. plot the ROC curve on a log-log scale

See https://arxiv.org/abs/2112.03570 for more context on why you might want to do this. ML Background (If all...
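The first recommendation can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not code from the post or the linked paper: it assumes higher scores mean "more likely misaligned", that labels are 0 (benign) or 1 (misaligned), and that an example is flagged when its score is strictly above the threshold. The names `tpr_at_fpr` and `low_fpr_report` are hypothetical.

```python
import math
import random


def tpr_at_fpr(scores, labels, target_fpr):
    """Best empirical TPR whose empirical FPR does not exceed target_fpr.

    Assumption: an example is flagged when score > threshold.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Number of negatives we may flag without exceeding the target FPR.
    # The small epsilon guards against float round-off in the product.
    allowed = math.floor(target_fpr * len(neg) + 1e-9)
    if allowed >= len(neg):
        return 1.0  # every example may be flagged
    # At most `allowed` negatives score strictly above this threshold.
    threshold = neg[allowed]
    return sum(s > threshold for s in pos) / len(pos)


def low_fpr_report(scores, labels, fprs=(0.001, 0.01, 0.05)):
    """TPR at each of the low false-positive rates named in the TL;DR."""
    return {fpr: tpr_at_fpr(scores, labels, fpr) for fpr in fprs}


if __name__ == "__main__":
    # Synthetic scores for illustration only; not real classifier output.
    rng = random.Random(0)
    labels = [0] * 10_000 + [1] * 1_000
    scores = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
    scores += [rng.gauss(2.0, 1.0) for _ in range(1_000)]
    for fpr, tpr in low_fpr_report(scores, labels).items():
        print(f"TPR at FPR={fpr}: {tpr:.3f}")
```

Note that estimating TPR at FPR = 0.001 needs at least a thousand negative examples before even one false positive is allowed, so the low-FPR numbers are only as trustworthy as the negative set is large. For the second recommendation, if you plot with matplotlib, calling `ax.set_xscale("log")` and `ax.set_yscale("log")` on the ROC axes gives the log-log view.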

Jun 24, 2024

What's actually going on in the "mind" of the model when we fine-tune GPT-3 to InstructGPT?

Are LLMs sufficient for AI takeoff?

Image generation and alignment

rpglover64's Shortform