arXiv:2606.10276v1 Announce Type: new Abstract: For natural human-robot interaction, a robot must understand human intent expressed not only through language but also through nonverbal signals such as gestures and gaze. However, current robot policies rely on language instructions as the sole inter