LESSWRONG

Wilson Wu


Comments

Ambiguous out-of-distribution generalization on an algorithmic task
Wilson Wu, 5mo

For that earlier section, we used smaller models (4,000 parameters) trained on S4 intersect A4×2 instead of the larger models (80,000 parameters) trained on S5 intersect A5×2. We did this only so that our compute budget could cover a larger sample of 10,000 models. All subsequent sections use the S5 models.

Posts

Ambiguous out-of-distribution generalization on an algorithmic task (5mo)
The slingshot helps with learning (8mo)