Drop-Then-Recovery: How Redundant Are Vision-Language-Action Models?

arXiv:2606.27755v1. Abstract: Vision-Language-Action (VLA) models enable instruction-driven robotic manipulation, but they inherit oversized language backbones from pretrained VLMs whose capacity far exceeds what is needed for short robotic instructions. This raises a basic questi
