Omni Parsing Framework: Unified Multimodal Parsing Report

Logics-Parsing-Omni Technical Report

Summary: arXiv:2603.09677v3

Abstract

Addressing the challenges of fragmented task definitions and the heterogeneity of unstructured data in multimodal parsing, this paper proposes the Omni Parsing framework. This framework establishes a Unified Taxonomy covering documents, images, and audio-visual streams, introducing a progressive parsing paradigm that bridges perception and cognition.

Framework Overview

The Omni Parsing framework integrates three hierarchical levels:

Holistic Detection: This level achieves precise spatial-temporal grounding of objects or events to establish a geometric baseline for perception.
Fine-grained Recognition: This level performs symbolization (e.g., OCR/ASR) and attribute extraction on the localized objects to complete structured entity parsing.
Multi-level Interpreting: This level constructs a reasoning chain from local semantics to global logic.
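To make the three levels concrete, here is a minimal Python sketch of how one parsed item might be laid out. The report summary does not specify an output schema, so every class and field name below (Detection, Entity, Interpretation, ParseResult, det_id, and so on) is a hypothetical choice for illustration, not the authors' format.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Detection:
    """Level 1 (Holistic Detection): where and/or when something occurs."""
    det_id: str
    label: str
    bbox: Optional[tuple[float, float, float, float]] = None  # (x0, y0, x1, y1), hypothetical convention
    time_span: Optional[tuple[float, float]] = None  # (start_s, end_s), hypothetical convention


@dataclass
class Entity:
    """Level 2 (Fine-grained Recognition): symbols and attributes of a localized region."""
    det_id: str  # points back to the Detection it describes
    text: Optional[str] = None  # e.g. an OCR or ASR transcript
    attributes: dict[str, str] = field(default_factory=dict)


@dataclass
class Interpretation:
    """Level 3 (Multi-level Interpreting): a statement plus the detections that support it."""
    statement: str
    evidence: list[str]  # det_ids of supporting detections
    scope: str = "local"  # "local" or "global" (assumed labels)


@dataclass
class ParseResult:
    detections: list[Detection] = field(default_factory=list)
    entities: list[Entity] = field(default_factory=list)
    interpretations: list[Interpretation] = field(default_factory=list)


# A toy result for a short clip showing a slide while a speaker talks.
result = ParseResult(
    detections=[
        Detection("d1", "slide_title", bbox=(40.0, 20.0, 600.0, 80.0)),
        Detection("d2", "speech", time_span=(0.0, 4.5)),
    ],
    entities=[
        Entity("d1", text="Q3 Revenue", attributes={"font": "bold"}),
        Entity("d2", text="Let's look at third-quarter revenue."),
    ],
    interpretations=[
        Interpretation("The speaker introduces the Q3 revenue slide.", ["d1", "d2"], "global"),
    ],
)
```

Keeping a shared identifier across the three levels is one simple way to let a high-level statement be traced back to a located region or time span.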

Key Advantages

A pivotal advantage of this framework is its evidence anchoring mechanism, which enforces a strict alignment between high-level semantic descriptions and low-level facts. This enables “evidence-based” logical induction, transforming unstructured signals into standardized knowledge that is locatable, enumerable, and traceable.
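As an illustration of what such an alignment could look like in practice, the sketch below checks that every high-level statement cites at least one detection and that every cited identifier actually exists. The function name find_unanchored and the dictionary keys "statement" and "evidence" are assumptions made for this example; the report's own mechanism may work differently.

```python
def find_unanchored(detection_ids, interpretations):
    """Return (statement, missing_ids) pairs for statements that lack valid evidence.

    A statement is flagged when it cites no evidence at all, or when it cites
    an identifier that is not among the known detection ids.
    """
    known = set(detection_ids)
    problems = []
    for item in interpretations:
        evidence = item.get("evidence", [])
        missing = [e for e in evidence if e not in known]
        if not evidence or missing:
            problems.append((item["statement"], missing))
    return problems


# Hypothetical data: two detections and three candidate statements.
detection_ids = ["d1", "d2"]
interpretations = [
    {"statement": "The speaker introduces the revenue slide.", "evidence": ["d1", "d2"]},
    {"statement": "Revenue doubled.", "evidence": ["d9"]},
    {"statement": "The audience is pleased.", "evidence": []},
]

problems = find_unanchored(detection_ids, interpretations)
for statement, missing in problems:
    reason = f"cites unknown ids {missing}" if missing else "cites no evidence"
    print(f"UNANCHORED: {statement!r} {reason}")
```

In this toy data the first statement passes, while the second and third are flagged because one cites a detection that does not exist and the other cites nothing.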

Dataset and Model Release

Building on this foundation, the authors construct a standardized dataset and release the Logics-Parsing-Omni model, which converts complex audio-visual signals into machine-readable structured knowledge.

Experimental Results

According to the authors, experiments demonstrate that fine-grained perception and high-level cognition are synergistic: combining the two enhances model reliability on multimodal parsing tasks.

Evaluation Benchmark

To quantitatively evaluate these capabilities, the authors also introduce OmniParsingBench, a benchmark that assesses model performance across a range of multimodal parsing tasks.

Access to Resources

Code, models, and the benchmark are released in the Logics-Parsing-Omni GitHub repository.

Conclusion

The Omni Parsing framework represents a significant advancement in the field of multimodal parsing, providing a comprehensive solution to the challenges posed by unstructured data and fragmented task definitions. By integrating perception and cognition, the framework paves the way for future research and applications in this rapidly evolving domain.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
