LESSWRONG
Wilson Wu
Comments

Ambiguous out-of-distribution generalization on an algorithmic task
Wilson Wu, 7mo

For that earlier section, we used smaller models trained on S4 intersect A4×2 (4,000 parameters) instead of S5 intersect A5×2 (80,000 parameters). The only reason was to allow a larger sample of 10,000 models within our compute budget. All subsequent sections use the S5 models.

Ambiguous out-of-distribution generalization on an algorithmic task (83 karma, 7mo, 6 comments)
The slingshot helps with learning (33 karma, 10mo, 0 comments)