Distance function for mixed type data
TARSITANO, Agostino
2007-01-01
Abstract
There are several strategies to cope with the simultaneous presence of different measurement scales. A reasonable option is to compute a separate dissimilarity matrix for each type of variable: binary, categorical, ordinal and metric. A compromise dissimilarity matrix can then be obtained as a convex combination of all the partial matrices ("partial" because each of them is linked to a specific group of indicators and not to the whole set of characteristics recorded for the units). This paper addresses the problem of specifying differential weights for each type of variable in order to reflect their significance, reliability and statistical adequacy. To this end, the Distatis procedure yields a compromise distance matrix between units, which can be analyzed with the usual techniques of cluster analysis. Experiments performed with k-nearest neighbor imputation demonstrate the ability of the proposed method to compensate for missing values when several types of variables occur in the same data set.
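The convex combination described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the per-type dissimilarities (simple matching for binary and categorical variables, range-scaled absolute differences for ordinal ranks and metric values), the toy data, and the equal weights are all assumptions chosen for illustration, and every function name is hypothetical. In the paper the weights are derived by the Distatis procedure, which is not reproduced here.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def pairwise(rows: Sequence[Sequence], dist: Callable) -> Matrix:
    """Build the full n x n dissimilarity matrix for one group of variables."""
    n = len(rows)
    return [[dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]


def mismatch(a: Sequence, b: Sequence) -> float:
    """Simple matching: share of attributes on which two units differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def range_scaled(rows: Sequence[Sequence[float]]) -> Callable:
    """Mean absolute difference, each variable divided by its observed range."""
    ranges = [(max(c) - min(c)) or 1.0 for c in zip(*rows)]  # avoid dividing by 0

    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(abs(x - y) / r for x, y, r in zip(a, b, ranges)) / len(a)

    return dist


def compromise(partials: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Convex combination of partial matrices: weights >= 0 and summing to 1."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(partials[0])
    return [
        [sum(w * d[i][j] for w, d in zip(weights, partials)) for j in range(n)]
        for i in range(n)
    ]


# Toy data (assumed): three units, one block of variables per measurement scale.
binary = [(1, 0), (1, 1), (0, 1)]
categorical = [("a",), ("b",), ("a",)]
ordinal = [(1,), (2,), (3,)]          # ranks of the ordered categories
metric = [(10.0,), (20.0,), (30.0,)]

partials = [
    pairwise(binary, mismatch),
    pairwise(categorical, mismatch),
    pairwise(ordinal, range_scaled(ordinal)),
    pairwise(metric, range_scaled(metric)),
]

# Equal weights are a placeholder; the paper obtains them from Distatis.
weights = [0.25, 0.25, 0.25, 0.25]
D = compromise(partials, weights)
```

The resulting matrix `D` is symmetric with a zero diagonal and can be passed to any clustering or nearest-neighbor routine that accepts precomputed dissimilarities. Changing `weights` shifts the influence of each measurement scale on the compromise, which is the choice the paper sets out to make in a data-driven way.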