I've been reading through a lot of your posts and trying to work out how to apply this knowledge to an RL agent, specifically for a contest (MineRL) where you're not allowed to use any hardcoded knowledge besides the architecture, the training algorithm, and broadly applicable heuristics like curiosity. Unfortunately, I keep running into the parts of your model that are hardcoded via evolution. It doesn't seem like the steering subsystem can be replaced with just the raw reward signal, and it also doesn't seem like it can easily be learned via ordinary RL. Do you have any ideas on how to replace those kinds of evolved systems in that kind of environment?
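For concreteness, here is a minimal sketch of the kind of "broadly applicable heuristic" I mean by curiosity: a random-network-distillation-style novelty bonus that gets added to the raw reward. This is my own illustrative toy (linear "networks" in numpy, made-up dimensions), not anything from your posts — a real agent would use learned deep features — but it shows the shape of what I'm allowed to hardcode:

```python
import numpy as np

class RNDBonus:
    """Toy random-network-distillation curiosity bonus.

    A frozen random "target" network maps observations to features;
    a trainable "predictor" learns to imitate it. Prediction error is
    large on novel observations (big bonus) and shrinks as states are
    revisited, so the bonus rewards exploration without any
    task-specific knowledge.
    """

    def __init__(self, obs_dim=8, feat_dim=16, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.W_target = rng.normal(size=(obs_dim, feat_dim))  # frozen, never trained
        self.W_pred = np.zeros((obs_dim, feat_dim))           # trained online
        self.lr = lr
        self.feat_dim = feat_dim

    def __call__(self, obs):
        target = np.tanh(obs @ self.W_target)   # fixed random embedding
        err = obs @ self.W_pred - target        # predictor's error
        bonus = float(np.mean(err ** 2))
        # One SGD step on the squared error, so the bonus decays
        # for observations the agent has already seen many times.
        self.W_pred -= self.lr * (2.0 / self.feat_dim) * np.outer(obs, err)
        return bonus

# Shaped reward the policy would actually be trained on:
#   total_reward = extrinsic_reward + beta * rnd(obs)
# where beta trades off exploration against the raw task reward.
```

The point of the question is that this kind of bonus seems legal under the contest rules, whereas anything resembling your evolved steering subsystem does not.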