What Training Robot Policies Taught Me About Emergent Capabilities and Control
I spent six weeks training a humanoid robot to do household tasks. Along the way, my research lead and I started noticing patterns in the robot's particular failure modes that seemed to point to strange architectural vulnerabilities of VLAs as a whole.
Our work was done as part of the Stanford BEHAVIOR-1K Challenge, which involved training Vision-Language-Action (VLA) models that take in camera images and output robot motor commands to complete everyday tasks. Think tidying your bedroom, putting your dishes away, moving your Halloween decorations to storage. Our final score was a modest 1.78%, but most gains happened almost overnight after...