Pharos-ESG: A Framework for Multimodal Parsing, Contextual Narration, and Hierarchical Labeling of ESG Reports
Summary: arXiv:2511.16417v2
Abstract: Environmental, Social, and Governance (ESG) principles are reshaping the foundations of global financial governance, transforming capital allocation architectures, regulatory frameworks, and systemic risk coordination mechanisms. However, as the core medium for assessing corporate ESG performance, ESG reports present significant challenges for large-scale understanding, due to chaotic reading order from slide-like irregular layouts and implicit hierarchies arising from lengthy, weakly structured content.
To address these challenges, we propose Pharos-ESG, a unified framework that transforms ESG reports into structured representations through multimodal parsing, contextual narration, and hierarchical labeling. The framework integrates three components:
- Reading-Order Modeling Module: Uses layout flow to recover the intended reading sequence from slide-like, irregular page layouts.
- Hierarchy-Aware Segmentation: Uses table-of-contents anchors to split the content into segments while preserving the report's hierarchical structure.
- Multi-Modal Aggregation Pipeline: Contextually transforms visual elements into coherent natural language, so that non-text content can be read alongside the surrounding text.
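As a rough illustration of the first two components, the sketch below orders layout blocks with a simple column heuristic and then cuts the ordered stream at table-of-contents titles. It is not the Pharos-ESG implementation: the `Block` type, the `column_gap` threshold, and the function names `reading_order` and `segment_by_toc` are all hypothetical, and the rule-based heuristic stands in for whatever reading-order model the framework actually uses.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A text block with the top-left corner of its bounding box (hypothetical type)."""
    text: str
    x0: float  # left edge
    y0: float  # top edge


def reading_order(blocks, column_gap=50.0):
    """Group blocks into columns by left edge, then read each column top to bottom.

    Assumption: a block belongs to the current column if its left edge lies
    within `column_gap` of the column's leftmost block.
    """
    columns = []
    for block in sorted(blocks, key=lambda b: b.x0):
        if columns and block.x0 - columns[-1][0].x0 < column_gap:
            columns[-1].append(block)
        else:
            columns.append([block])
    ordered = []
    for column in columns:
        ordered.extend(sorted(column, key=lambda b: b.y0))
    return ordered


def segment_by_toc(ordered_blocks, toc_titles):
    """Start a new segment whenever a block's text matches a table-of-contents title."""
    anchors = {title.strip().lower(): title for title in toc_titles}
    segments = []
    current = {"title": None, "blocks": []}
    for block in ordered_blocks:
        key = block.text.strip().lower()
        if key in anchors:
            # Close the previous segment only if it holds anything.
            if current["title"] is not None or current["blocks"]:
                segments.append(current)
            current = {"title": anchors[key], "blocks": []}
        else:
            current["blocks"].append(block.text)
    if current["title"] is not None or current["blocks"]:
        segments.append(current)
    return segments


# Two-column page whose blocks arrive in scrambled extraction order.
page = [
    Block("Emissions fell.", 20, 160),
    Block("Governance", 300, 40),
    Block("Environment", 20, 40),
    Block("Board is independent.", 300, 100),
    Block("We cut waste.", 20, 100),
]
segments = segment_by_toc(reading_order(page), ["Environment", "Governance"])
for segment in segments:
    print(segment["title"], segment["blocks"])
```

Exact title matching is the simplest possible anchor rule; real reports would likely need fuzzy matching and nested table-of-contents levels.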
Pharos-ESG further enriches its outputs with ESG, Global Reporting Initiative (GRI), and sentiment labels, yielding annotations aligned with the analytical demands of financial research.
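To make the three label types concrete, here is a minimal sketch of what one labeled segment might look like as a record. The schema is an assumption for illustration only: the class name `LabeledSegment`, its field names, and the three-way sentiment set are hypothetical and are not taken from Pharos-ESG or Aurora-ESG. The example GRI code refers to the GRI 305 (Emissions) standard.

```python
import json
from dataclasses import dataclass, asdict

# Assumed label vocabularies; the framework's actual label sets may differ.
ESG_PILLARS = {"Environmental", "Social", "Governance"}
SENTIMENTS = {"positive", "neutral", "negative"}


@dataclass
class LabeledSegment:
    """One report segment with hierarchical position and labels (hypothetical schema)."""
    section_path: list  # titles from the report root down to this segment
    text: str
    esg: str            # one of ESG_PILLARS
    gri: str            # GRI standard code, e.g. "GRI 305"
    sentiment: str      # one of SENTIMENTS

    def __post_init__(self):
        # Reject labels outside the assumed vocabularies.
        if self.esg not in ESG_PILLARS:
            raise ValueError(f"unknown ESG pillar: {self.esg!r}")
        if self.sentiment not in SENTIMENTS:
            raise ValueError(f"unknown sentiment: {self.sentiment!r}")


def to_json(segment):
    """Serialize a labeled segment to a JSON string."""
    return json.dumps(asdict(segment))


example = LabeledSegment(
    section_path=["Environment", "Climate"],
    text="Scope 1 emissions decreased compared with the previous year.",
    esg="Environmental",
    gri="GRI 305",
    sentiment="positive",
)
print(to_json(example))
```

Keeping the section path on every record lets downstream analysis filter by position in the report hierarchy without reloading the full document tree.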
Extensive experiments on annotated benchmarks demonstrate that Pharos-ESG consistently outperforms both dedicated document parsing systems and general-purpose multimodal models at parsing and structuring ESG reports.
In addition to the framework, we release Aurora-ESG, the first large-scale public dataset of ESG reports, spanning Mainland China, Hong Kong, and U.S. markets. It provides unified structured representations of multimodal content, enriched with fine-grained layout and semantic annotations, to support ESG integration in financial governance and decision-making.
In conclusion, by making ESG reports easier to parse and structure at scale, Pharos-ESG makes the information they contain more accessible and supports decision-making in settings where ESG considerations bear on corporate accountability and investment strategy.
