Environmental, Social and Governance Sentiment Analysis on Slovene News: A Novel Dataset and Models
In recent years, Environmental, Social, and Governance (ESG) factors have become increasingly important in evaluating corporate performance and sustainability. However, reliable ESG ratings are often unavailable, particularly for smaller companies and in emerging markets. To help address this gap, researchers have introduced the first publicly available Slovene ESG sentiment dataset, together with a set of models for automatic ESG sentiment detection.
Introduction to the Dataset
The dataset is built from the MaCoCu Slovene news collection. ESG-relevant articles were selected by combining large language model (LLM)-assisted filtering with human annotation, which together aim to keep the extracted ESG information accurate and relevant.
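The two-stage selection described above (automatic filtering, then human labelling) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Article` record, its per-aspect `labels` field, `filter_candidates`, `annotate`, and `keyword_stub` are all hypothetical, and the stub merely stands in for the LLM relevance judgment so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Article:
    """Hypothetical record for one news article from the source collection."""
    doc_id: str
    text: str
    # Per-aspect sentiment labels ("E", "S", "G") added later by human annotators.
    labels: Dict[str, str] = field(default_factory=dict)


def filter_candidates(
    articles: Iterable[Article], is_esg_relevant: Callable[[str], bool]
) -> List[Article]:
    """Stage 1: keep only articles that the relevance judge flags as ESG-related."""
    return [a for a in articles if is_esg_relevant(a.text)]


def keyword_stub(text: str) -> bool:
    """Hypothetical stand-in for an LLM relevance prompt.

    Matches a few Slovene word stems (emissions, environment, workers,
    management) so the example runs without a model.
    """
    stems = ("emisij", "okolj", "delavc", "uprav")
    lowered = text.lower()
    return any(stem in lowered for stem in stems)


def annotate(article: Article, aspect_labels: Dict[str, str]) -> Article:
    """Stage 2: attach the labels a human annotator assigned to a filtered article."""
    article.labels.update(aspect_labels)
    return article


articles = [
    Article("a1", "Podjetje je zmanjšalo emisije CO2."),
    Article("a2", "Tekma se je končala neodločeno."),
]
kept = filter_candidates(articles, keyword_stub)
annotate(kept[0], {"E": "positive"})
print([a.doc_id for a in kept], kept[0].labels)  # ['a1'] {'E': 'positive'}
```

Passing the relevance judge in as a function keeps the filtering step independent of any particular model; in a real pipeline `keyword_stub` would be replaced by a call to an LLM.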
Methodology and Model Evaluation
To evaluate the effectiveness of different models for ESG sentiment detection, several approaches were tested, including:
- Monolingual Models: SloBERTa, tailored specifically for the Slovene language.
- Multilingual Models: XLM-R, capable of processing multiple languages.
- Embedding-based Classifiers: TabPFN, a pretrained classifier for tabular data, applied to text embeddings.
- Hierarchical Ensemble Architectures: Combining various model outputs to improve classification accuracy.
- Large Language Models: General-purpose generative models, such as Gemma3-27B and gpt-oss 20B, applied to the classification task.
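The ensemble idea in the list above can be illustrated with weighted soft voting, one common way of combining classifier outputs. This sketch is a deliberately simple stand-in, not the paper's hierarchical architecture: the three-way label set, the `soft_vote` helper, and the example probabilities are assumptions.

```python
from typing import List, Optional, Sequence

# Assumed label set; the dataset's actual sentiment classes may differ.
LABELS = ("negative", "neutral", "positive")


def soft_vote(
    model_probs: Sequence[Sequence[Sequence[float]]],
    weights: Optional[Sequence[float]] = None,
) -> List[str]:
    """Combine per-model class probabilities by weighted averaging.

    model_probs[m][d][c] is model m's probability of class c for document d.
    Returns the label with the highest averaged probability for each document.
    """
    if weights is None:
        weights = [1.0] * len(model_probs)
    total = sum(weights)
    n_docs = len(model_probs[0])
    predictions = []
    for d in range(n_docs):
        combined = [
            sum(w * probs[d][c] for w, probs in zip(weights, model_probs)) / total
            for c in range(len(LABELS))
        ]
        best = max(range(len(LABELS)), key=lambda c: combined[c])
        predictions.append(LABELS[best])
    return predictions


# Hypothetical outputs of two fine-tuned encoders for two articles.
sloberta = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
xlmr = [[0.2, 0.5, 0.3], [0.1, 0.2, 0.7]]
print(soft_vote([sloberta, xlmr]))              # ['negative', 'positive']
print(soft_vote([sloberta, xlmr], [0.2, 0.8]))  # ['neutral', 'positive']
```

Weights let a stronger model dominate the vote. A hierarchical design would additionally route or stack outputs across several levels, which this flat average does not attempt.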
Key Findings
The evaluation showed that no single approach was best across all three aspects:
- LLMs performed best on the Environmental aspect, where the Gemma3-27B model achieved an F1-macro score of 0.61.
- LLMs also led on the Social aspect, where the gpt-oss 20B model reached an F1-macro score of 0.45.
- For the Governance aspect, the fine-tuned SloBERTa model was the most effective, with an F1-macro score of 0.54.
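All three results are reported as F1-macro: the unweighted mean of the per-class F1 scores, where each class's F1 equals 2TP / (2TP + FP + FN). Because every class counts equally, a model cannot score well simply by predicting the majority class. The plain-Python sketch below computes the metric; scikit-learn's `f1_score(..., average="macro")` yields the same value. The example labels are made up.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Always predicting the majority class: 80% accuracy, but low macro-F1.
y_true = ["neutral"] * 8 + ["positive", "negative"]
y_pred = ["neutral"] * 10
print(round(macro_f1(y_true, y_pred), 3))  # 0.296
```

Here the neutral class scores 8/9 while the two minority classes score 0, so the mean is 8/27. This is why scores in the 0.45 to 0.61 range reflect genuinely imbalanced, difficult classification rather than near-random guessing.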
Case Study Application
To illustrate the practical use of these models, a small case study was conducted with the best-performing classifier, gpt-oss, which was applied to selected companies over an extended period to examine how their ESG-related aspects developed over time. Such an analysis can help stakeholders monitor a company's ESG profile, especially where formal ESG ratings are lacking.
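A case study of this kind amounts to classifying each article about a company and aggregating the predicted labels over time. The sketch below assumes a three-way label set mapped to scores of -1, 0 and +1 and averages them per company, aspect and year. The record layout, the `yearly_sentiment` helper, and the company name are hypothetical, and the paper's own aggregation may differ.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple

# Assumed mapping from predicted sentiment labels to numeric scores.
SCORE = {"negative": -1, "neutral": 0, "positive": 1}

Record = Tuple[str, date, str, str]  # (company, publication date, aspect, label)


def yearly_sentiment(records: Iterable[Record]) -> Dict[Tuple[str, str, int], float]:
    """Average sentiment score per (company, aspect, year), sorted by key."""
    sums: Dict[Tuple[str, str, int], float] = defaultdict(float)
    counts: Dict[Tuple[str, str, int], int] = defaultdict(int)
    for company, published, aspect, label in records:
        key = (company, aspect, published.year)
        sums[key] += SCORE[label]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}


# Hypothetical classifier outputs for one company.
records = [
    ("CompanyA", date(2020, 3, 1), "E", "positive"),
    ("CompanyA", date(2020, 7, 1), "E", "negative"),
    ("CompanyA", date(2021, 1, 5), "E", "positive"),
]
print(yearly_sentiment(records))
# {('CompanyA', 'E', 2020): 0.0, ('CompanyA', 'E', 2021): 1.0}
```

Giving each article equal weight is only one choice. Reporting the article count next to each yearly mean is advisable, since a year covered by a single article can swing to the extremes.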
Conclusion
The Slovene ESG sentiment dataset and the associated models provide a public resource for automatic ESG analysis, which is particularly relevant for smaller companies and emerging markets where ESG ratings are scarce. By combining LLM-assisted filtering with human annotation and comparing a broad range of models, the work lays a foundation for future studies and applications in corporate ESG evaluation. At the same time, the moderate F1-macro scores (0.45 to 0.61) indicate that automatic ESG sentiment detection in Slovene remains a challenging task.
