Assessing Capabilities of Large Language Models in Social Media Analytics: A Multi-task Quest
Summary: arXiv:2604.18955v1 Announce Type: cross
Abstract: In this study, we present the first comprehensive evaluation of modern LLMs – including GPT-4, GPT-4o, GPT-3.5-Turbo, Gemini 1.5 Pro, DeepSeek-V3, Llama 3.2, and BERT – across three core social media analytics tasks on a Twitter (X) dataset:
- (I) Social Media Authorship Verification
- (II) Social Media Post Generation
- (III) User Attribute Inference
For the authorship verification task, we introduce a systematic sampling framework that employs diverse user and post selection strategies. Evaluating on newly collected tweets from January 2024 onward tests generalization and mitigates “seen-data” bias, in which a model is scored on text it may already have encountered during training.
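As a rough illustration of the idea, the sketch below filters posts to a temporal hold-out and samples balanced same-author and different-author pairs. It is not the paper's sampling framework: the record fields (`user`, `text`, `created_at`), the helper names `filter_recent` and `build_pairs`, and the balanced-pair strategy are all assumptions made for this example.

```python
import random
from datetime import datetime, timezone
from itertools import combinations

# Assumed cutoff: only posts from January 2024 onward are evaluated.
CUTOFF = datetime(2024, 1, 1, tzinfo=timezone.utc)

def filter_recent(posts, cutoff=CUTOFF):
    """Keep only posts created on or after the cutoff (hypothetical record layout)."""
    return [p for p in posts if p["created_at"] >= cutoff]

def build_pairs(posts, n_pairs, seed=0):
    """Sample up to n_pairs verification pairs, half same-author (label 1)
    and half different-author (label 0)."""
    rng = random.Random(seed)
    by_user = {}
    for p in posts:
        by_user.setdefault(p["user"], []).append(p["text"])
    # All pairs of posts written by the same user.
    same = [(a, b, 1) for texts in by_user.values()
            for a, b in combinations(texts, 2)]
    # All pairs of posts written by two different users.
    users = sorted(by_user)
    diff = [(a, b, 0) for u, v in combinations(users, 2)
            for a in by_user[u] for b in by_user[v]]
    k = min(n_pairs // 2, len(same), len(diff))
    pairs = rng.sample(same, k) + rng.sample(diff, k)
    rng.shuffle(pairs)
    return pairs
```

Enumerating every candidate pair is only practical for small samples; a larger collection would need pairs drawn lazily, and other user or post selection strategies could replace the uniform sampling used here.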
In the context of social media post generation, we assess the capability of various LLMs to produce authentic, user-like content. This evaluation employs comprehensive metrics designed to capture the nuances of human writing and engagement on social media platforms.
To bridge the tasks of authorship verification and post generation, we conduct a user study that measures real users’ perceptions of LLM-generated posts. Participants compared the generated content against their own writing, providing insights into the authenticity and relatability of the posts produced by the models.
For the user attribute inference task, we annotate interests and occupations using two standardized taxonomies: the IAB Tech Lab 2023 taxonomy and the 2018 U.S. Standard Occupational Classification (SOC). These annotations let us benchmark LLMs against existing baselines on inferring user attributes from social media activity.
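Once predictions are constrained to a closed taxonomy, they can be scored like any multi-label classifier. The sketch below computes micro-averaged F1 over per-user label sets. The choice of micro-F1 and the function name `micro_f1` are assumptions for illustration; the abstract does not state which metrics the study reports.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over per-user label sets.

    gold and pred are parallel sequences; each item is an iterable of
    taxonomy labels (for example, SOC codes) for one user.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g, p = set(g), set(p)
        tp += len(g & p)   # labels predicted and correct
        fp += len(p - g)   # labels predicted but not in the annotation
        fn += len(g - p)   # annotated labels the model missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy data, not from the study: SOC major-group codes for three users.
gold = [{"15-0000"}, {"29-0000"}, {"25-0000"}]
pred = [{"15-0000"}, {"25-0000"}, {"25-0000"}]
score = micro_f1(gold, pred)
```

Micro-averaging pools counts across users, so frequent categories dominate the score; macro-averaging per category would weight rare occupations or interests more heavily.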
Overall, our unified evaluation framework yields new insights into the performance of modern LLMs in various social media analytics tasks. By establishing reproducible benchmarks, we aim to enhance the reliability and applicability of LLM-driven analytics in real-world scenarios.
The code and data utilized in this study are included in the supplementary material and will be made publicly available upon publication, fostering transparency and encouraging further research in the domain of social media analytics using large language models.
Conclusion
This study advances the understanding of the capabilities of large language models in social media analytics. As social media continues to evolve, these models can be leveraged for insights that inform content creation, user engagement, and a more nuanced understanding of online behavior.
