MDPs and the Bellman Equation, Intuitively Explained
Imagine you're a selfish taxi driver who takes passengers around between 3 towns. Each trip has a different probability of catastrophe and different earnings potential. You want to find a repeatable path of trips to maximise profit. No, you can’t just greedily pick the best trip from each city -...
Ummmmm yeah what have I done so far. I didn't really get any solid work done this week either. I have decided to extend the project by another two weeks with the other two people involved - we have all been pretty preoccupied with life. Last week on sunday night i didn't really do a solid hour of work. I did manage to summarise the concept of selection theorems and think about the agent type signature - a concept i will be referring to throughout the post, super fundamental. Tonight I will hopefully actually meet with my group. I wanna do like half an hour of work before, and a little bit after too. I want to summarise the good regulator theorem this week as well as turner's post on power seeking.