NLP in Brazilian banking results: comparison between traditional topic modeling techniques and LLMs
Date
2025-02-27
Advisor(s)
Souza, Renato Rocha
Abstract
This study evaluates Natural Language Processing (NLP) techniques for analyzing quarterly earnings call transcripts from Brazilian banks. It compares traditional topic modeling methods, Latent Dirichlet Allocation (LDA) and BERTopic, with Large Language Models (LLMs), including GPT-4-turbo, Llama3, and Qwen2. The research is structured into three benchmark tasks: (1) comparing traditional NLP methods with GPT-4-turbo for unstructured topic modeling, (2) benchmarking GPT-4-turbo, Llama3, and Qwen2 in unstructured topic modeling using the "LLM-as-a-Judge" framework, and (3) evaluating LLMs for structured topic modeling and sentiment analysis using labeled datasets. The results reveal the limitations of traditional models in capturing nuanced, domain-specific content, which stem from their reliance on bag-of-words and clustering techniques and are most pronounced in small, homogeneous datasets. In contrast, LLMs performed better, drawing on pre-trained architectures to generate contextually rich and coherent outputs without dataset-specific training. Among the LLMs, GPT-4-turbo consistently outperformed the others across tasks, achieving higher scores in coherence, accuracy, and contextual relevance. Open-source models such as Qwen2 showed promise as resource-efficient alternatives, though with less consistency than GPT-4-turbo. The study also highlights how evaluation methodologies for modern NLP models are evolving, and argues that traditional metrics such as coherence and UMass scores are inadequate for assessing LLM outputs. By combining structured benchmarks, qualitative assessments, and the "LLM-as-a-Judge" framework in a hybrid evaluation approach, this research provides a comprehensive method for model comparison. These findings underline the potential of LLMs in domain-specific applications and suggest pathways for future work on NLP evaluation techniques and model scalability.
