CRAFT uses video diffusion to generate diverse, photorealistic bimanual robot training data, enhancing learning with scalable, action-consistent demonstrations.
Discover DIAL, a novel framework that enhances Vision-Language-Action models by separating intent and action via latent world modeling for robust robot con...