I'm thinking of designing a reinforcement learning environment based on Conway's Game of Life (GoL). In it, at every timestep, an agent can change the state of some cells.
As is the case with most interesting RL problems, agent behaviour would be determined by the reward function.
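To make the setup concrete, here's a minimal sketch of such an environment in NumPy. The toroidal wrapping, the toggle-then-update order, and the function names are my own assumptions for illustration, not fixed parts of the idea:

```python
import numpy as np

def gol_step(grid):
    """One Game of Life update on a toroidal grid of 0s and 1s."""
    # Count the eight neighbours of every cell via wrapped shifts.
    n = sum(np.roll(np.roll(grid, dy, 0), dx, 1)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            if (dy, dx) != (0, 0))
    # Birth with exactly 3 neighbours; survival with 2 or 3.
    return ((n == 3) | ((grid == 1) & (n == 2))).astype(grid.dtype)

def env_step(grid, toggles):
    """Hypothetical agent interface: flip the given cells, then advance."""
    grid = grid.copy()
    for (y, x) in toggles:
        grid[y, x] ^= 1
    return gol_step(grid)
```

A blinker, for instance, flips between a vertical and a horizontal bar of three cells under `gol_step`.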
In this scenario, I see some issues with simple reward functions:
1) Total life:
Something like this glider gun would yield a technically correct unbounded score: a stream of lonely travellers in an interstellar abyss. Say we want to stick to finite memory.
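As a concrete strawman, this first reward is just a live-cell count (a sketch assuming the grid is a NumPy array of 0s and 1s):

```python
import numpy as np

def total_life_reward(grid):
    # Reward = number of live cells. A Gosper gun emits a fresh
    # five-cell glider every 30 generations, so on an unbounded grid
    # this return grows forever while nothing new happens at the gun.
    return int(grid.sum())
```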
2) Highest density:
This is generation 50 of the Max spacefiller. The agent might find some way of maintaining the striped pattern from the middle. Uninspiring, to say the least; like
***spoiler alert*** being batteries in The Matrix.
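The density variant is equally short; a sketch under the same grid assumption:

```python
import numpy as np

def density_reward(grid):
    # Fraction of live cells, capped at 1.0. A near-static striped
    # fill scores about as well as anything more dynamic, which is
    # exactly the failure mode described above.
    return float(grid.mean())
```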
Other reward functions might consider variation over time.
3) Some function with a penalty for static life:
Consider Karel's p177:
This oscillator has a period of 177 time steps. There's another one with p312, but I couldn't find a good-enough gif of it. Such patterns would likely game this reward specification too.
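One way to write such a penalty is to dock cells that were alive in both of the last two steps. The penalty weight here is an arbitrary illustrative choice:

```python
import numpy as np

def dynamic_life_reward(current, previous, penalty=0.5):
    """Live-cell count minus a penalty for cells alive in both steps.

    `penalty` is an arbitrary weight for illustration, not a tuned value.
    """
    static = (current == 1) & (previous == 1)
    return float(current.sum()) - penalty * float(static.sum())
```

Note that this only compares one step back, so any pattern whose cells flicker step-to-step avoids the penalty entirely; catching a period-k oscillator would require comparing against the state k steps ago for a k you can't know in advance.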
I've thought of a few other simple functions, which all look flawed in some obvious ways.
That's not to say there isn't some ungameable reward function out there. But I wonder: if each cell symbolically represented some small unit of sentience (say a single person, or a family, or a planet), what would flourishing look like?