Welcome to the first reading group presentation!
In this session, Yuhui will present a paper on building generalist robot policies with a vision-language foundation model. The authors propose a flow matching architecture built on top of a pre-trained vision-language model (VLM) so that the policy inherits Internet-scale semantic knowledge. Trained on a large, varied dataset, the model can follow language instructions, perform tasks zero-shot, and learn new skills through fine-tuning.
Paper Link: Please find the relevant paper here.
Coming soon!
Yuhui Wan is a Research Associate in Autonomy for Surgical Robots.

