Model risk in credit scoring models with big data applications

Date
2023-06-28

Advisor(s)

Schiozer, Rafael Felipe

Abstract
Large databases and Machine Learning have increased our ability to produce credit scoring models with varying numbers of observations and explanatory variables. Although managers and regulators have concerns about the potential risks associated with algorithms’ discretion in variable selection and model building, and with the lack of causality, insufficient attention has been given to the inappropriate use of high-hit-rate credit scoring models, that is, to credit scoring model risk. This study fills this gap by proposing a novel model risk measure, 𝐶𝑆𝑀𝑅 (Credit Scoring Model Risk), based on the correlation between the dependent variable and the generated predictions. This work empirically tests the 𝐶𝑆𝑀𝑅 in plug-in LASSO credit scoring models and finds that adding loans from different banks to increase the number of observations is not optimal on an in-sample basis, challenging the generally accepted assumption that more data leads to better predictions. However, model performance evaluated on in-sample data may be unstable across out-of-time estimations. Therefore, decision-making (choosing one model among several candidates) based exclusively on in-sample measures may be problematic, because banks’ loan portfolios change over time, and models can be born uncalibrated (that is, not well fitted to the current portfolio) and can behave differently under new macroeconomic conditions or during exogenous and stochastic events. This work also proposes a procedure to forecast the best-performing model in out-of-time datasets. Three complementary approaches help the model user choose between the segmented and full-data models for out-of-time applications by predicting which model tends to have the higher correlation (or lower model risk). The first approach is based on the concept of “shrinkage”; the second uses a Monte Carlo simulation; and the third is a Bayesian estimation of covariances.
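
The abstract does not state the formula for 𝐶𝑆𝑀𝑅, only that it is based on the correlation between the dependent variable and the generated predictions. As a rough illustration of that idea, and not the thesis's actual definition, the sketch below compares a segmented model and a full-data model by the Pearson correlation between observed defaults and predicted scores. The function names, the toy data, and the use of `1 - correlation` as a risk proxy are all assumptions made for this example.

```python
import numpy as np


def prediction_correlation(y_true, y_pred):
    """Pearson correlation between observed outcomes and model predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Correlation is undefined when either series is constant; treat it as 0.
    if y_true.std() == 0 or y_pred.std() == 0:
        return 0.0
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def illustrative_model_risk(y_true, y_pred):
    """Hypothetical risk proxy: higher correlation means lower risk.

    This is an assumption for illustration, not the CSMR formula.
    """
    return 1.0 - prediction_correlation(y_true, y_pred)


def pick_model(y_true, predictions):
    """Return the name of the model with the highest correlation, plus all scores."""
    correlations = {
        name: prediction_correlation(y_true, scores)
        for name, scores in predictions.items()
    }
    best = max(correlations, key=correlations.get)
    return best, correlations


# Toy out-of-time sample: 1 = default, 0 = no default (made-up values).
observed = [0, 0, 1, 0, 1, 1, 0, 1]
predictions = {
    "segmented": [0.1, 0.2, 0.8, 0.3, 0.7, 0.9, 0.2, 0.6],
    "full_data": [0.3, 0.4, 0.5, 0.6, 0.4, 0.7, 0.5, 0.5],
}

best_model, correlations = pick_model(observed, predictions)
risks = {
    name: illustrative_model_risk(observed, scores)
    for name, scores in predictions.items()
}
print(best_model, correlations, risks)
```

On this toy sample the segmented model has the higher correlation and is selected. The thesis goes further by forecasting which model will have the higher out-of-time correlation (via shrinkage, Monte Carlo simulation, and Bayesian covariance estimation); none of that is reproduced here.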