Detecting and Explaining Exceptional Values in Categorical Data
Angiulli F.; Fassetti F.; Palopoli L.; Serrao C.
2020-01-01
Abstract
In this work we deal with the problem of detecting and explaining exceptionally behaving values in categorical datasets. An attribute value is perceived as anomalous if its frequency occurrence is exceptionally typical or untypical within the distribution of the frequency occurrences of any other attribute value. The notion of frequency occurrence is provided by specialising the Kernel Density Estimation method to the domain of frequency values, and an outlierness measure is defined by leveraging the cumulative distribution function (cdf) of such a density. This measure is able to simultaneously identify two kinds of anomalies, called lower outliers and upper outliers, namely values that are exceptionally infrequent or exceptionally frequent. Moreover, data values labeled as outliers come with an interpretable explanation of their abnormality, which is a desirable feature of any knowledge discovery technique.
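The general idea of scoring a value by the cdf of a density estimated over frequencies can be illustrated with a minimal sketch. It is not the authors' outlierness measure, which the abstract does not specify. The Gaussian kernel, the fixed `bandwidth`, the `low` and `high` thresholds, the function names and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def frequency_cdf(values, bandwidth=1.0):
    """Map each distinct value to a KDE-based cdf evaluated at its frequency.

    Assumption: a Gaussian KDE is placed over the frequencies of the distinct
    values, so its cdf is the average of the per-kernel normal cdfs.
    """
    counts = Counter(values)
    freqs = list(counts.values())
    return {
        value: sum(normal_cdf((f - g) / bandwidth) for g in freqs) / len(freqs)
        for value, f in counts.items()
    }


def label_outliers(values, bandwidth=1.0, low=0.1, high=0.9):
    """Label each distinct value using hypothetical cdf thresholds."""
    labels = {}
    for value, score in frequency_cdf(values, bandwidth).items():
        if score <= low:
            labels[value] = "lower outlier"   # exceptionally infrequent
        elif score >= high:
            labels[value] = "upper outlier"   # exceptionally frequent
        else:
            labels[value] = "inlier"
    return labels


# Toy categorical attribute: five values occur about 50 times each,
# "pink" occurs once and "black" occurs 200 times.
data = (
    ["red"] * 50 + ["blue"] * 48 + ["green"] * 52 + ["teal"] * 49
    + ["grey"] * 51 + ["pink"] * 1 + ["black"] * 200
)

for value, label in sorted(label_outliers(data, bandwidth=2.0).items()):
    print(value, label)
```

In this sketch a cdf score close to 0 means that almost every other value is more frequent, and a score close to 1 means that almost every other value is less frequent. The score together with the value's frequency gives a simple, readable reason for each label.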