Development, validation, and application of CYBER-BERT: using deep learning for large-scale identification and classification of cybersecurity disclosures in SEC filings
Date
2024-06-04
Advisor(s)
Caldas, Miguel Pinto
Abstract
As cybersecurity events have emerged among the top global risks, providing transparent and timely information about them has become mandatory for firms. This has given rise to a rich stream of research on cybersecurity disclosure. However, some gaps persist in the extant literature, including: (a) the use of small samples or short time spans; (b) binary classification (cybersecurity-related versus non-cybersecurity-related) rather than multiple disclosure categories; (c) the use of a dictionary approach instead of machine learning (ML) or more advanced Large Language Models (LLMs); (d) a scarcity of studies that include timely 8-K filings in addition to annual 10-K filings; and (e) a lack of cybersecurity disclosure studies that thoroughly examine boilerplate patterns. To address these gaps, this paper describes the development and validation of a deep learning model called CYBER-BERT and illustrates its application in locating and categorizing 2.5 million cybersecurity-related phrases contained in all 10-K and 8-K SEC filings over 18 years (2006–2023). Beyond the toolset itself (CYBER-BERT and a Cybersecurity BI), the study contributes results from four illustrations of its use, which showed that (a) firms did not file cybersecurity disclosures as promptly as the SEC intended: 95.5% of disclosures were filed via 10-Ks rather than via the more timely 8-Ks; (b) cybersecurity disclosures increased significantly: total disclosures grew 343% and breach disclosures grew 510%; (c) in terms of content, two cybersecurity categories exhibited high (vulnerability) and medium (action) boilerplate use, and all categories had low readability on two independent measures; and (d) following a breach disclosure, vulnerability and action disclosure activity increased 97.4% and 77.4%, respectively, compared with the previous year. Implications for research and practice are also discussed.
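The headline figures in the abstract are of two kinds: a share of a total (95.5% of disclosures filed via 10-Ks) and a relative change between two periods (343%, 510%, 97.4%, 77.4%). As a minimal sketch, assuming the standard definitions of share and percentage growth (the paper's exact counting rules and period boundaries are not stated in this abstract), the arithmetic looks like this. The function names and the counts in the usage lines are hypothetical and chosen only for illustration; they are not the study's data.

```python
def percent_growth(start: float, end: float) -> float:
    """Relative change from `start` to `end`, in percent (hypothetical helper)."""
    if start == 0:
        raise ValueError("start must be non-zero to compute relative growth")
    return (end - start) / start * 100.0


def percent_share(part: float, total: float) -> float:
    """Share of `part` within `total`, in percent (hypothetical helper)."""
    if total == 0:
        raise ValueError("total must be non-zero to compute a share")
    return part / total * 100.0


if __name__ == "__main__":
    # Hypothetical counts, NOT taken from the study:
    # a count rising from 1,000 to 4,430 corresponds to 343% growth.
    print(round(percent_growth(1_000, 4_430), 1))
    # 955 of 1,000 disclosures filed via 10-Ks corresponds to a 95.5% share.
    print(round(percent_share(955, 1_000), 1))
```

Note that a growth of 343% means the later count is 4.43 times the earlier one, not 3.43 times; the same formula applies to the year-over-year comparisons reported after a breach disclosure.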
