Right. Imagine an agent picking actions in a discrete-time game. Each time-advancing decision is a step. (E.g. in a debate, submitting one argument is a step.) But you typically don't leave it running forever: you occasionally reset the environment to a (potentially random) starting state and let the agent try again. Each such run, from one reset to the next, is an episode.
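To make the step/episode distinction concrete, here is a minimal sketch in Python. The `CoinFlipEnv` class, its `reset`/`step` methods, and the fixed episode length are all hypothetical, made up for illustration rather than taken from any particular library.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: guess a coin flip; episodes have a fixed length."""

    def __init__(self, max_steps=5, seed=0):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        # Start a new episode from the initial state.
        self.t = 0
        return self.t  # observation: the current time index

    def step(self, action):
        # One time-advancing decision, i.e. one environment step.
        coin = self.rng.randint(0, 1)
        reward = 1.0 if action == coin else 0.0
        self.t += 1
        done = self.t >= self.max_steps
        return self.t, reward, done


def run(env, num_episodes):
    total_steps = 0
    episode_returns = []
    for _ in range(num_episodes):      # outer loop: episodes
        env.reset()
        done = False
        ep_return = 0.0
        while not done:                # inner loop: steps within one episode
            action = 0                 # trivial policy: always guess 0
            _, reward, done = env.step(action)
            ep_return += reward
            total_steps += 1
        episode_returns.append(ep_return)
    return total_steps, episode_returns


total_steps, episode_returns = run(CoinFlipEnv(max_steps=5), num_episodes=3)
print(total_steps, len(episode_returns))
```

With 3 episodes of 5 steps each, the agent takes 15 steps in total but only gets 3 fresh starts.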

[-]evhub4y30

This is correct, but at least in the quote above, the most important distinction is that most RL algorithms propagate credit back across steps but not across episodes.
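One common way this shows up, sketched below under my own assumptions (the function name, the `dones` flag convention, and the discount factor are illustrative, not from the quote): discounted returns are accumulated backwards over steps, and the accumulator is zeroed at each episode boundary, so a reward in one episode never contributes to the return of a step in an earlier episode.

```python
def discounted_returns(rewards, dones, gamma=0.9):
    """Return-to-go for each step; dones[i] is True if step i ends an episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for i in reversed(range(len(rewards))):
        if dones[i]:
            # Episode boundary: nothing from the following episode flows back.
            running = 0.0
        running = rewards[i] + gamma * running
        returns[i] = running
    return returns


# Two episodes stored back to back: steps 0-2 and steps 3-4.
rewards = [0.0, 0.0, 1.0, 0.0, 2.0]
dones = [False, False, True, False, True]
returns = discounted_returns(rewards, dones, gamma=0.5)
print(returns)  # [0.25, 0.5, 1.0, 1.0, 2.0]
```

The reward of 2.0 at the end of the second episode raises the return of step 3 (same episode) but leaves steps 0-2 untouched.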


[-]Oliver Sourbut4y30

I agree, except I want to add a caveat.

Sometimes 'step' refers to such atomic environmental interactions. Then this is right.

Other times, 'step' (especially 'training step' or 'gradient step', but not always qualified) refers to a step in a training algorithm. For example, a classic pattern in RL is to collect many episodes or sub-episode trajectory fragments and use them to compute a gradient update. That's also called a 'step'. Outside of RL, this is probably the only (or at least the main) use of the word 'step'.
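A rough sketch of that pattern, keeping three counters apart: environment steps, episodes, and training (gradient) steps. Everything here is an assumption chosen for illustration: the one-parameter Bernoulli policy, the toy reward (1 for choosing action 1), the hyperparameters, and the REINFORCE-style update without a baseline.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(num_training_steps=20, episodes_per_update=8, episode_len=4, lr=0.1, seed=0):
    rng = random.Random(seed)
    theta = 0.0      # single logit: P(action = 1) = sigmoid(theta)
    env_steps = 0
    episodes = 0
    for _ in range(num_training_steps):            # training (gradient) steps
        grad = 0.0
        for _ in range(episodes_per_update):       # episodes collected per update
            ep_return = 0.0
            score = 0.0
            for _ in range(episode_len):           # environment steps
                p = sigmoid(theta)
                action = 1 if rng.random() < p else 0
                ep_return += float(action)         # toy reward: 1 for action 1
                score += action - p                # d/dtheta of log pi(action)
                env_steps += 1
            episodes += 1
            grad += ep_return * score              # REINFORCE-style estimate
        theta += lr * grad / episodes_per_update   # one gradient update
    return theta, env_steps, episodes


theta, env_steps, episodes = train()
print(env_steps, episodes)
```

With these defaults, 20 training steps consume 160 episodes and 640 environment steps, which is why an unqualified 'step' count can be off by orders of magnitude depending on which meaning is intended.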


[-]Evan R. Murphy4y10

Thanks Charlie, Evan H. and Oliver. Your comments definitely help to give me a clearer picture.



[ Question ]

What is a training "step" vs. "episode" in machine learning?


Apr 28, 2022
