For exactly these kinds of needs and ideas, we recently released XLand-MiniGrid, an extremely efficient reimplementation of XLand based on MiniGrid. We rewrote it completely in JAX, so even a simple PPO can reach millions of FPS during training. On 8 A100 GPUs, users can expect to collect 1 trillion steps in under 40 hours. Have fun experimenting!
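
As a quick sanity check on those numbers, here is a back-of-the-envelope calculation of the throughput they imply. It uses only the three figures quoted above (1 trillion steps, 40 hours, 8 GPUs). The per-GPU number assumes the work is split evenly across devices, which is an assumption rather than a measured result, and since the run is quoted as taking "under" 40 hours, both values are lower bounds.

```python
# Back-of-the-envelope throughput implied by the figures quoted above.
TOTAL_STEPS = 1_000_000_000_000  # 1 trillion environment steps
HOURS = 40                       # quoted upper bound on wall-clock time
NUM_GPUS = 8                     # number of A100 GPUs

seconds = HOURS * 3600

# Aggregate throughput across all GPUs (a lower bound, since the run
# is quoted as finishing in *under* 40 hours).
steps_per_second = TOTAL_STEPS / seconds

# Per-GPU throughput, ASSUMING the load is split evenly across devices.
steps_per_second_per_gpu = steps_per_second / NUM_GPUS

print(f"aggregate: {steps_per_second:,.0f} steps/s")          # aggregate: 6,944,444 steps/s
print(f"per GPU:   {steps_per_second_per_gpu:,.0f} steps/s")  # per GPU:   868,056 steps/s
```

So the claim works out to roughly 7 million steps per second in total, or a bit under 0.9 million per GPU, which is consistent with "millions of FPS" for the full 8-GPU setup.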

