arXiv:2606.20679v1 Announce Type: new Abstract: Video-world-model policies learn action-relevant representations by predicting future observations. However, they condition on only a short observation window, which renders long-horizon manipulation non-Markovian when the correct action depends on ea