LESSWRONG
Fernando Avalos

Wubbalubbaudbdub

Posts

Approximating Human Preferences Using a Multi-Judge Learned System (19 points, 2mo, 0 comments)
[Linkpost] Interpretable Analysis of Features Found in Open-source Sparse Autoencoder (partial replication) (6 points, 1y, 1 comment)
Alignment and Deep Learning
Fernando Avalos · 2y

I got redirected here via AI Safety Ideas. Was this idea ever implemented, or has something similar been attempted? As someone who just got into the field and is looking to test my fit for it, I'm willing to invest time and effort to get an MVP working.
