CSPO: Improving Table-to-LaTeX Generation with Component Rewards

CSPO: Alleviating Reward Ambiguity for Structured Table-to-LaTeX Generation

Source: arXiv:2604.10918v1 (Announce Type: new)

Abstract: Tables contain rich structured information, yet when stored as images their contents remain “locked” within pixels. Converting table images into LaTeX code enables faithful digitization and reuse, but current multimodal large language models (MLLMs) often fail to preserve structural, style, or content fidelity. Conventional post-training with reinforcement learning (RL) typically relies on a single aggregated reward, leading to reward ambiguity that conflates multiple behavioral aspects and hinders effective optimization.

We propose Component-Specific Policy Optimization (CSPO), an RL framework that disentangles optimization across LaTeX table components: structure, style, and content. In particular, CSPO assigns component-specific rewards and backpropagates each signal only through the tokens relevant to its component, alleviating reward ambiguity and enabling targeted component-wise optimization. To comprehensively assess performance, we introduce a set of hierarchical evaluation metrics. Extensive experiments demonstrate the effectiveness of CSPO, underscoring the importance of component-specific optimization for reliable structured generation.
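The core mechanism in the abstract, sending each reward only to the tokens of its own component, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the hand-split token list, the hypothetical `label_token` heuristic, and the command sets that decide which LaTeX tokens count as structure, style, or content are all invented for this example. A real system would label the model's tokenizer outputs rather than hand-split LaTeX.

```python
# Minimal sketch (hypothetical): route component-specific rewards to tokens.
# The component vocabularies below are illustrative assumptions, not CSPO's.

STRUCTURE = {"&", "\\\\", r"\multicolumn", r"\multirow"}  # "\\\\" is LaTeX's row break \\
STYLE = {r"\hline", r"\toprule", r"\midrule", r"\bottomrule",
         r"\textbf", r"\textit", "{", "}"}  # braces only wrap formatting in this toy


def label_token(tok: str) -> str:
    """Hypothetical heuristic: map one LaTeX token to a table component."""
    if tok in STRUCTURE or tok.startswith((r"\begin", r"\end")):
        return "structure"
    if tok in STYLE:
        return "style"
    return "content"  # anything else is treated as cell text


def route_rewards(tokens: list[str], rewards: dict[str, float]) -> list[float]:
    """Give each token the reward of its own component only."""
    return [rewards[label_token(tok)] for tok in tokens]


def component_mask(tokens: list[str], component: str) -> list[int]:
    """1 where a token belongs to `component`, else 0 (a per-component loss mask)."""
    return [int(label_token(tok) == component) for tok in tokens]


tokens = [r"\begin{tabular}{lc}", r"\toprule", r"\textbf", "{", "Name", "}",
          "&", "Score", "\\\\", r"\midrule", "Alice", "&", "3", "\\\\",
          r"\bottomrule", r"\end{tabular}"]
rewards = {"structure": 1.0, "style": 0.0, "content": 0.5}  # hypothetical scores

per_token = route_rewards(tokens, rewards)
print(per_token)
print(component_mask(tokens, "content"))
```

In a policy-gradient update, each token's log-probability would then be weighted by its routed value (in practice usually an advantage rather than a raw reward), so a wrong cell value would lower the probability of content tokens only.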

Background

Digitizing tables is central to making data accessible and reusable across fields such as academia, data science, and engineering. An image of a table, however, does not let readers easily edit or analyze its contents. LaTeX, the typesetting system widely used for scientific documents, offers a precise way to represent structured information such as tables, so converting table images into LaTeX code is a natural way to unlock that information.

The Challenge of Reward Ambiguity

Current approaches to post-training MLLMs for table-to-LaTeX generation are often hampered by reward ambiguity. It arises when scores for several aspects of the output (structural integrity, stylistic consistency, and content accuracy) are aggregated into a single reward signal. The resulting scalar conflates these aspects: the model cannot tell which part of its output earned or lost reward, so optimization becomes less effective and the generated tables remain less faithful.
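To make the ambiguity concrete, the toy snippet below averages three component scores into one reward. The scores, the sample names, and the equal-weight average are illustrative assumptions, not values or formulas from the paper.

```python
# Toy illustration (hypothetical scores): a single aggregated reward hides
# which component of the generated table was wrong.

def aggregate(scores: dict[str, float]) -> float:
    """Equal-weight mean of component scores: one scalar per sample."""
    return sum(scores.values()) / len(scores)


broken_layout = {"structure": 0.0, "style": 1.0, "content": 1.0}  # cells misaligned
wrong_cells = {"structure": 1.0, "style": 1.0, "content": 0.0}    # layout fine, text wrong

print(aggregate(broken_layout), aggregate(wrong_cells))  # identical scalars
print(broken_layout == wrong_cells)                      # but different failures
```

Both samples receive the same scalar, so with a sequence-level reward every token in each output gets the same credit, even though one output needs structural fixes and the other needs content fixes.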

Introducing Component-Specific Policy Optimization (CSPO)

CSPO addresses this problem by optimizing each component of a generated table separately. The framework rests on three elements:

Component-Specific Rewards: CSPO assigns separate rewards to structure, style, and content instead of one aggregated score.
Targeted Backpropagation: Each component's reward is backpropagated only through the tokens relevant to that component, so, for example, a content error does not penalize correctly generated structural tokens. This targeted signal is meant to improve fidelity in each aspect of the generated LaTeX tables.
Hierarchical Evaluation Metrics: To assess performance comprehensively, the authors introduce a set of hierarchical evaluation metrics that measure how well a model does across the different components.
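One way to picture separate rewards for structure, style, and content is with three simple scoring functions on a plain `tabular`. The functions (`cells`, `structure_reward`, `style_reward`, `content_reward`) and their definitions (grid-shape match, F1 over style commands, position-wise cell match) are hypothetical stand-ins chosen for illustration; the paper's actual rewards and hierarchical metrics may be defined quite differently. The parser is deliberately naive and handles only flat tables without nested braces or `\multicolumn`.

```python
import re
from collections import Counter

# Hypothetical style vocabulary: rules, \cline{...}, bold and italic.
STYLE_RE = re.compile(
    r"\\(?:hline|toprule|midrule|bottomrule)\b|\\cline\{[^}]*\}|\\text(?:bf|it)\b"
)
ENV_RE = re.compile(r"\\(?:begin|end)\{tabular\}(?:\{[^}]*\})?")


def cells(latex: str) -> list[list[str]]:
    """Split a flat tabular into rows of cell strings (no nesting support)."""
    body = STYLE_RE.sub("", ENV_RE.sub("", latex))
    body = body.replace("{", "").replace("}", "")
    rows = [row for row in body.split(r"\\") if row.strip()]
    return [[cell.strip() for cell in row.split("&")] for row in rows]


def _shape(latex: str) -> list[int]:
    """Number of cells in each row."""
    return [len(row) for row in cells(latex)]


def structure_reward(pred: str, ref: str) -> float:
    """1.0 if row count and per-row cell counts match, else 0.0."""
    return 1.0 if _shape(pred) == _shape(ref) else 0.0


def style_reward(pred: str, ref: str) -> float:
    """F1 between the multisets of style commands."""
    p, g = Counter(STYLE_RE.findall(pred)), Counter(STYLE_RE.findall(ref))
    total = sum(p.values()) + sum(g.values())
    return 1.0 if total == 0 else 2 * sum((p & g).values()) / total


def content_reward(pred: str, ref: str) -> float:
    """Fraction of cells whose text matches the reference at the same position."""
    p = [c for row in cells(pred) for c in row]
    g = [c for row in cells(ref) for c in row]
    if not p and not g:
        return 1.0
    return sum(a == b for a, b in zip(p, g)) / max(len(p), len(g))


ref = r"\begin{tabular}{lc}\toprule \textbf{Name} & \textbf{Score} \\ \midrule Alice & 3 \\ \bottomrule\end{tabular}"
pred = r"\begin{tabular}{lc}\toprule Name & Score \\ \midrule Bob & 4 \\ \bottomrule\end{tabular}"

scores = {
    "structure": structure_reward(pred, ref),
    "style": style_reward(pred, ref),
    "content": content_reward(pred, ref),
}
print(scores)
```

Because the three scores stay separate, they can be routed to different tokens rather than averaged into one number. Here the prediction gets the grid right, drops the bold headers, and gets one row of cell text wrong, and each of those outcomes shows up in its own score.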

Experimental Results

The authors report extensive experiments demonstrating CSPO's effectiveness: models post-trained with component-specific rewards show better structural, stylistic, and content fidelity than those trained with conventional single-reward RL. They take these results as evidence that component-specific optimization matters for reliable structured generation.

Conclusion

CSPO offers a targeted remedy for reward ambiguity in table-to-LaTeX generation. By giving structure, style, and content their own rewards and routing each reward only to the relevant tokens, it aims to make structured generation more reliable, which in turn supports data reuse and accessibility.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
