LESSWRONG


Verification

This page is a stub.
Posts tagged Verification
Validating against a misalignment detector is very different to training against one (mattmacdermott, 6mo; 39 karma, 4 comments)
Formal verification, heuristic explanations and surprise accounting (Jacob_Hilton, 1y; 156 karma, 11 comments)
Compact Proofs of Model Performance via Mechanistic Interpretability (LawrenceC, rajashree, Adrià Garriga-alonso, Jason Gross, 1y; 104 karma, 4 comments)
The V&V method - A step towards safer AGI (Yoav Hollander, 2mo; 20 karma, 1 comment)
Making it harder for an AGI to "trick" us, with STVs (Tor Økland Barstad, 3y; 15 karma, 5 comments)
Alignment with argument-networks and assessment-predictions (Tor Økland Barstad, 3y; 10 karma, 5 comments)