AI #69: Nice
gkamradt · 1y

Test set: 51% vs prior SoTA of 34% (human baseline is unknown)

Ryan tested against the public test set and got 51%. The SOTA score reported here was on the private test set.

Scores reported on public data are usually inflated due to overfitting (humans look at the questions and answers, then tailor their model to them).
