Métodos para detecção de outliers em séries de preços do índice de preços ao consumidor
Título da Revista
ISSN da Revista
Título de Volume
Outliers are observations that appear to be inconsistent with the others. Also called atypical, extreme or aberrant values, these inconsistencies can be caused, for instance, by political changes or economic crises, unexpected cold or heat waves, and measurement or typing errors. Although outliers are not necessarily incorrect values, they can distort the results of an analysis and lead researchers to erroneous conclusions if they are related to measurement or typing errors. The objective of this research is to study and compare different methods for detecting abnormalities in the price series from the Consumer Price Index (Índice de Preços ao Consumidor - IPC), calculated by the Brazilian Institute of Economy (Instituto Brasileiro de Economia - IBRE) from Getulio Vargas Foundation (Fundação Getulio Vargas - FGV). The IPC measures the price variation of a fixed set of goods and services, which are part of customary expenses for families with income levels between 1 and 33 monthly minimum wages and is mainly used as an indice of reference to evaluate the purchasing power of consumer. In addition to the method currently used by price analysts in IBRE, the study also considered variations of the IBRE Method, the Boxplot Method, the SIQR Boxplot Method, the Adjusted Boxplot Method, the Resistant Fences Method, the Quartile Method, the Modified Quartile Method, the Median Absolute Deviation Method and the Tukey Algorithm. These methods wre applied to data of the munucipalities Rio de Janeiro and São Paulo. In order to analyze the performance of each method, it is necessary to know the real extreme values in advance. Therefore, in this study, it was assumed that prices which were discarded or changed by analysts in the critical process were the real outliers. The method from IBRE is correlated with altered or discarded prices by analysts. Thus, the assumption that the changed or discarded prices by the analysts are the real outliers can influence the results, causing the method from IBRE be favored compared to other methods. However, thus, it is possible to compute two measurements by which the methods are evaluated. The first is the method’s accuracy score, which displays the proportion of detected real outliers. The second is the number of false-positive produced by the method, that tells how many values needed to be flagged to detect a real outlier. As higher the hit rate generated by the method and as the lower the amount of false positives produced therefrom, the better the performance of the method. Therefore, it was possible to construct a ranking relative to the performance of the methods, identifying the best among those analyzed. In the municipality of Rio de Janeiro, some of the variations of the method from IBRE showed equal or superior to the original method performances. As for the city of São Paulo, the method from IBRE showed the best performance. It is argued that a method correctly detects an outlier when it signals a real outlier as an extreme value. The method with the highest accuracy score and with smaller number of false-positive was from IBRE. For future investigations, we hope to test the methods in data obtained from simulation and from widely used data bases, so that the assumption related to the discarded or changed prices, during the critical process, does not alter the results.