Joseph Eisner has not written any posts yet.
Did you use a learning rate schedule? Cosine anneal? If not: you probably should. If so: the loss/perplexity/bpc will always appear to plateau as you finish the schedule, because the learning rate is decaying toward zero by then. That plateau doesn't imply more training wouldn't be beneficial...
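To make the plateau point concrete, here is a minimal sketch of a cosine-annealed learning rate. The function name `cosine_lr` and the values `base_lr=3e-4`, `min_lr=0.0`, and `total_steps=1000` are assumptions picked for illustration, not anything from the run being discussed.

```python
import math


def cosine_lr(step, total_steps, base_lr=3e-4, min_lr=0.0):
    """Cosine-annealed learning rate at `step` of a `total_steps` schedule.

    base_lr and min_lr are illustrative defaults, not recommendations.
    """
    # Fraction of the schedule completed, clamped to 1.0 past the end.
    progress = min(step, total_steps) / total_steps
    # Half-cosine from base_lr (progress = 0) down to min_lr (progress = 1).
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


total_steps = 1000  # hypothetical schedule length
for step in (0, 500, 900, 990, 1000):
    lr = cosine_lr(step, total_steps)
    print(f"step {step:4d}  lr = {lr:.3e}  ({lr / 3e-4:.1%} of peak)")
```

With these settings, the learning rate is under 3% of its peak for the last 10% of steps, so the loss curve will flatten there almost regardless of how much the model could still learn. A flat tail under a schedule like this mostly reflects the step size, not convergence.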