π0: A Vision-Language-Action Flow Model for General Robot Control | Reading Group
Bayes Centre (Bayes Theorem (G.03)), 47 Potterrow, Edinburgh, United Kingdom

In this session, Yuhui will present a paper on building generalist robot policies using a vision-language foundation model. The authors propose a flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge. Trained on a large, varied dataset, the model can follow language instructions, perform tasks in a zero-shot setting, and learn new skills via fine-tuning.