Zero-Shot Video Coding with Stochastic Rectified Flow

Generation Is Compression: Zero-Shot Video Coding via Stochastic Rectified Flow

Source: arXiv:2603.26571v1 (cross-listed)

Abstract

Existing generative video compression methods mostly use generative models as post-hoc reconstruction modules that sit on top of a conventional codec. We propose Generative Video Codec (GVC), a zero-shot framework that turns a pretrained video generative model into the codec itself: the transmitted bitstream directly specifies the generative decoding trajectory, and no retraining is required.

Technical Innovations

To make this possible, we convert the deterministic rectified-flow ordinary differential equation (ODE) used by modern video foundation models into an equivalent stochastic differential equation (SDE) at inference time. The conversion exposes a stochastic injection point at every sampling step, which enables codebook-driven compression. On this unified backbone we instantiate three complementary conditioning strategies:

  • Image-to-Video (I2V): Generates video from a still image and uses adaptive tail-frame atom allocation.
  • Text-to-Video (T2V): Operates with near-zero side information, relying on the pure generative prior conditioned on a text description.
  • First-Last-Frame-to-Video (FLF2V): Uses boundary-sharing Group of Pictures (GOP) chaining for dual-anchor temporal control.
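
To make the idea of codebook-driven stochastic injection concrete, here is a minimal toy sketch in Python. It is not the paper's algorithm: it assumes a generic shared-seed scheme in which encoder and decoder regenerate the same per-step codebook of Gaussian noise vectors, the encoder greedily picks one vector per sampling step, and only the chosen indices are transmitted. Every name here (`toy_drift`, `step_codebook`, `encode`, `decode`, `bitstream_bits`) is hypothetical, and `toy_drift` is a stand-in for a pretrained model's drift term, not a real video model.

```python
import math
import numpy as np

def initial_state(seed, dim):
    # Starting noise that both sides can regenerate from the shared seed.
    return np.random.default_rng([seed, 0]).standard_normal(dim)

def step_codebook(seed, step, num_atoms, dim):
    # Per-step codebook of Gaussian noise vectors, regenerated from the seed.
    rng = np.random.default_rng([seed, step + 1])
    return rng.standard_normal((num_atoms, dim))

def toy_drift(x, t):
    # Assumption: placeholder drift that pulls toward the origin.
    return -x

def sde_step(x, t, dt, noise, sigma):
    # One Euler-Maruyama step: drift plus an injected noise vector.
    return x + toy_drift(x, t) * dt + sigma * math.sqrt(dt) * noise

def encode(target, seed, num_steps, num_atoms, sigma=1.0):
    dim = target.shape[0]
    dt = 1.0 / num_steps
    x = initial_state(seed, dim)
    indices = []
    for step in range(num_steps):
        t = step * dt
        codebook = step_codebook(seed, step, num_atoms, dim)
        # Try every codebook entry; keep the one that lands closest to the target.
        candidates = sde_step(x, t, dt, codebook, sigma)  # shape (num_atoms, dim)
        idx = int(np.argmin(np.linalg.norm(candidates - target, axis=1)))
        indices.append(idx)
        x = sde_step(x, t, dt, codebook[idx], sigma)
    return indices, x

def decode(indices, seed, dim, num_atoms, sigma=1.0):
    # Replay the transmitted indices against the regenerated codebooks.
    dt = 1.0 / len(indices)
    x = initial_state(seed, dim)
    for step, idx in enumerate(indices):
        codebook = step_codebook(seed, step, num_atoms, dim)
        x = sde_step(x, step * dt, dt, codebook[idx], sigma)
    return x

def bitstream_bits(num_steps, num_atoms):
    # One fixed-length index per step.
    return num_steps * math.ceil(math.log2(num_atoms))
```

In this toy setup, calling `encode(target, seed=7, num_steps=8, num_atoms=16)` and then `decode(indices, seed=7, dim=target.shape[0], num_atoms=16)` reproduces the encoder's trajectory exactly, and the payload is 8 indices of 4 bits each. A deterministic ODE sampler offers no such per-step choice, which is why the stochastic formulation matters for this style of coding.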

Trade-Offs in Video Compression

Together, these strategies span a principled trade-off space along three dimensions: spatial fidelity, temporal coherence, and compression efficiency. Which strategy fits best depends on the video content and the application.

Experimental Results

Experiments on standard benchmarks show that GVC achieves high-quality video reconstruction at bitrates below 0.002 bits per pixel (bpp). The framework also supports flexible bitrate control through a single hyperparameter.
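
For a sense of scale, the short Python sketch below converts a bits-per-pixel figure into a stream bitrate. The 1920x1080 resolution and 30 fps frame rate are illustrative assumptions, not settings reported in the source, and the helper name `bitrate_kbps` is hypothetical. It also assumes bpp is counted per pixel per frame, the usual convention in video coding.

```python
def bitrate_kbps(bpp, width, height, fps):
    """Convert bits per pixel (per frame) into kilobits per second."""
    bits_per_frame = bpp * width * height
    return bits_per_frame * fps / 1000.0

# Assumed example format: 1920x1080 at 30 frames per second.
rate = bitrate_kbps(0.002, 1920, 1080, 30)
print(f"{rate:.1f} kbps")
```

Under these assumptions, 0.002 bpp works out to roughly 4,147 bits per frame, or about 124 kbps for the whole stream.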

Conclusion

GVC shows that a pretrained video generative model can serve directly as a codec, with no retraining. Future research may refine the framework and explore further applications.


Lazarus Omolua, https://richlyai.com/blog