Zero-Shot Video Coding with Stochastic Rectified Flow

Generation Is Compression: Zero-Shot Video Coding via Stochastic Rectified Flow

arXiv:2603.26571v1 (announce type: cross)

Abstract

Existing generative video compression methods mostly use generative models as post-hoc reconstruction modules that sit on top of conventional codecs. To move past that design, we propose the Generative Video Codec (GVC), a zero-shot framework that turns a pretrained video generative model into the codec itself: the transmitted bitstream directly specifies the generative decoding trajectory, and no retraining is required.

Technical Innovations

To achieve this, we convert the deterministic rectified-flow ordinary differential equation (ODE) used in modern video foundation models into an equivalent stochastic differential equation (SDE) at inference time. The conversion exposes a stochastic injection point at every sampling step, and these injection points are what enable codebook-driven compression. On this unified backbone we instantiate three complementary conditioning strategies:

Image-to-Video (I2V): Conditions generation on a static image and applies adaptive tail-frame atom allocation.
Text-to-Video (T2V): Operates with near-zero side information, relying on the pure generative prior to produce video from a text description.
First-Last-Frame-to-Video (FLF2V): Uses boundary-sharing Group of Pictures (GOP) chaining to provide dual-anchor temporal control.
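
The ODE-to-SDE conversion and the codebook idea described above can be illustrated with a toy sketch. It assumes the common rectified-flow convention x_t = (1 - t) x_0 + t ε (data at t = 0, noise at t = 1). Under that convention the score can be written in terms of the velocity v as -(x + (1 - t) v) / t, so one reverse-time step of size Δ with noise level σ becomes x - Δ v - Δ σ² / (2t) (x + (1 - t) v) + σ √Δ z. Everything else is an assumption made for illustration and should not be read as GVC's actual algorithm: `toy_velocity` is an analytic stand-in for a pretrained model (exact only for N(0, I) data), σ is held constant, and the encoder greedily picks a single codebook vector per step by correlation with the current residual.

```python
import math
import random


def toy_velocity(x, t):
    """Analytic rectified-flow velocity for N(0, I) data (stand-in for a pretrained model)."""
    scale = (2.0 * t - 1.0) / ((1.0 - t) ** 2 + t ** 2)
    return [scale * xi for xi in x]


def codebook(seed, step, size, dim):
    """Seeded Gaussian codebook that encoder and decoder can both regenerate."""
    rng = random.Random(seed * 1_000_003 + step)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(size)]


def sde_step(x, t, dt, sigma, z):
    """One reverse-time step t -> t - dt of the SDE that shares the ODE's marginals."""
    v = toy_velocity(x, t)
    c = sigma ** 2 / (2.0 * t)
    return [xi - dt * vi - dt * c * (xi + (1.0 - t) * vi) + sigma * math.sqrt(dt) * zi
            for xi, vi, zi in zip(x, v, z)]


def run(seed, dim, num_steps, size, sigma, target=None, indices=None):
    """Encode if `target` is given (greedy index choice); otherwise decode from `indices`."""
    start_rng = random.Random(seed)
    x = [start_rng.gauss(0.0, 1.0) for _ in range(dim)]  # shared start noise at t = 1
    dt = 1.0 / num_steps
    chosen = []
    for i in range(num_steps):
        t = 1.0 - i * dt
        atoms = codebook(seed, i, size, dim)
        if target is not None:
            # Residual between the target and the current clean estimate x - t * v.
            v = toy_velocity(x, t)
            residual = [ti - (xi - t * vi) for ti, xi, vi in zip(target, x, v)]
            k = max(range(size),
                    key=lambda j: sum(a * r for a, r in zip(atoms[j], residual)))
        else:
            k = indices[i]
        chosen.append(k)
        x = sde_step(x, t, dt, sigma, atoms[k])
    return x, chosen


# Toy usage: a 16-dimensional "video", 50 steps, 256 codebook entries per step.
tgt_rng = random.Random(123)
target = [tgt_rng.gauss(0.0, 1.0) for _ in range(16)]
encoded, idx = run(7, 16, 50, 256, 1.0, target=target)
decoded, _ = run(7, 16, 50, 256, 1.0, indices=idx)
bits = len(idx) * math.log2(256)  # 50 indices at 8 bits each
```

Because the decoder regenerates the same seeded codebooks, only the chosen indices need to be transmitted. In this toy setting that is 50 indices at 8 bits each, or 400 bits in total.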

Trade-Offs in Video Compression

Together, these strategies span a principled trade-off space along three dimensions: spatial fidelity, temporal coherence, and compression efficiency. Which strategy fits best depends on the video content and the application.
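
FLF2V's boundary-sharing GOP chaining gives one concrete view of this trade-off. In the minimal sketch below, adjacent GOPs share their boundary frame, so neighbouring segments meet at a common anchor and each anchor is counted once; shorter GOPs mean more anchors. The function names, the fixed GOP length, and the frame-index representation are illustrative assumptions, not details taken from the paper.

```python
def chained_gops(num_frames, gop_len):
    """Split frame indices into GOPs whose last frame is the next GOP's first frame."""
    if num_frames < 2 or gop_len < 2:
        raise ValueError("need at least 2 frames and a GOP length of at least 2")
    gops = []
    start = 0
    while start < num_frames - 1:
        end = min(start + gop_len - 1, num_frames - 1)
        gops.append((start, end))  # inclusive (first_frame, last_frame)
        start = end                # the boundary frame is shared with the next GOP
    return gops


def anchor_frames(gops):
    """Unique first/last anchor frames across a chain of boundary-sharing GOPs."""
    return [gops[0][0]] + [end for _, end in gops]


gops = chained_gops(9, 5)      # [(0, 4), (4, 8)]
anchors = anchor_frames(gops)  # [0, 4, 8]
```

With sharing, two GOPs need three anchor frames instead of four, since frame 4 serves as the last frame of one GOP and the first frame of the next.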

Experimental Results

Experiments on standard benchmarks show that GVC achieves high-quality video reconstruction at bitrates below 0.002 bits per pixel (bpp). The bitrate can also be adjusted flexibly through a single hyperparameter.
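
To put the 0.002 bpp figure in perspective, the arithmetic can be sketched as follows. Here bpp is taken as total bits divided by frames × height × width. The clip size (81 frames at 480×832) and the index-based bit accounting (steps × atoms per step × log2 of codebook size) are hypothetical choices for illustration, not GVC's reported configuration, and `atoms_per_step` is not necessarily the hyperparameter the paper uses for rate control.

```python
import math


def bits_per_pixel(total_bits, num_frames, height, width):
    """Bitrate in bits per pixel, counting every pixel of every frame."""
    return total_bits / (num_frames * height * width)


def bit_budget(bpp, num_frames, height, width):
    """Total number of bits available to a clip at a given bpp."""
    return bpp * num_frames * height * width


def codebook_bits(num_steps, atoms_per_step, codebook_size):
    """Bits needed to send one codebook index per atom per step (hypothetical accounting)."""
    return num_steps * atoms_per_step * math.log2(codebook_size)


# Hypothetical clip: 81 frames at 480x832.
budget = bit_budget(0.002, 81, 480, 832)            # about 64,696 bits
example_bits = codebook_bits(50, 64, 1024)          # 50 * 64 * 10 = 32,000 bits
example_bpp = bits_per_pixel(example_bits, 81, 480, 832)
```

Under these assumptions, 0.002 bpp leaves about 64,700 bits (roughly 8 KB) for the entire clip, and the example index stream of 32,000 bits comes in at just under 0.001 bpp.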

Conclusion

GVC turns a pretrained video generative model into a codec without any retraining, which makes it a promising approach to video coding at very low bitrates. Future research may refine the framework further and explore additional applications.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
