Double machine learning for pitcher management in baseball
Date
2025-07-23
Authors
Advisor(s)
Fernandes, Marcelo
Abstract
This study applies a Double Machine Learning (DML) framework to evaluate pitcher substitution strategies in Major League Baseball (MLB). Using data from the 2020–2024 seasons, the model treats pitcher removal as a binary intervention and estimates its impact on opponent offensive production, measured by weighted On-Base Average (wOBA). The model uses 169 features grouped into four categories: historical performance, fatigue indicators, game context, and pitcher style. To account for observational bias, the methodology incorporates inverse probability weighting through propensity scores. The R-Learner enables flexible estimation of the Conditional Average Treatment Effect (CATE), allowing granular analysis of substitution value across game scenarios. The results support several established pitching management theories, including the times-through-the-order effect and leverage-based substitutions. Moreover, model-generated scores prove to be more efficient decision levers than traditional indicators, offering managers actionable insights beyond conventional heuristics. The study also evaluates a landmark case, the substitution of Blake Snell in the 2020 World Series, demonstrating the framework's ability to assess high-stakes managerial decisions. Overall, the approach provides data-driven insights into optimizing pitcher management and expands the analytical frontier of baseball strategy.
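
The R-Learner idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the study's implementation: the data are synthetic, the two features stand in for the 169 real ones, the nuisance models are plain linear and logistic regressions rather than flexible learners, and every name below (`r_learner_linear`, `fit_logit`, and so on) is hypothetical. The sketch cross-fits an outcome model m(x) and a propensity model e(x), then regresses the outcome residual on the treatment residual times the features, which gives a CATE that is linear in x.

```python
import numpy as np

def add_intercept(X):
    return np.column_stack([np.ones(len(X)), X])

def fit_ols(X, y):
    """Least-squares coefficients, intercept included."""
    coef, *_ = np.linalg.lstsq(add_intercept(X), y, rcond=None)
    return coef

def fit_logit(X, w, n_iter=25, ridge=1e-6):
    """Logistic regression by Newton's method (tiny ridge for stability)."""
    Xb = add_intercept(X)
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        grad = Xb.T @ (w - p) - ridge * beta
        hess = (Xb * (p * (1 - p))[:, None]).T @ Xb + ridge * np.eye(Xb.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def r_learner_linear(X, w, y, n_folds=2, seed=0):
    """R-Learner with cross-fitted nuisances and a linear CATE tau(x) = [1, x] @ theta."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds
    m_hat = np.empty(len(y))  # cross-fitted estimate of E[Y | X]
    e_hat = np.empty(len(y))  # cross-fitted estimate of P(W = 1 | X)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        m_hat[test] = add_intercept(X[test]) @ fit_ols(X[train], y[train])
        logits = add_intercept(X[test]) @ fit_logit(X[train], w[train])
        e_hat[test] = 1.0 / (1.0 + np.exp(-logits))
    e_hat = np.clip(e_hat, 0.01, 0.99)
    y_res = y - m_hat
    w_res = w - e_hat
    # Minimizing sum((y_res - w_res * tau(x))^2) over linear tau is an OLS
    # of y_res on w_res * [1, x].
    Z = add_intercept(X) * w_res[:, None]
    theta, *_ = np.linalg.lstsq(Z, y_res, rcond=None)
    return theta

# Synthetic illustration (assumed data-generating process, not MLB data):
# w = 1 means "pitcher removed", y is a wOBA-like outcome.
rng = np.random.default_rng(42)
n = 20_000
X = rng.normal(size=(n, 2))
e_true = 1.0 / (1.0 + np.exp(-0.8 * X[:, 0]))   # removal depends on feature 0
w = (rng.uniform(size=n) < e_true).astype(float)
tau_true = -0.5 + 0.3 * X[:, 1]                 # effect varies with feature 1
y = 0.2 * X[:, 0] + tau_true * w + rng.normal(size=n)

theta_hat = r_learner_linear(X, w, y)
print("estimated CATE coefficients [intercept, x0, x1]:", theta_hat)
```

In this simulation the true CATE coefficients are -0.5, 0, and 0.3, so the printed estimates should land near those values. Replacing the linear nuisance and CATE models with flexible learners is what would allow the granular, scenario-level analysis the abstract describes.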
