Distance function for mixed type data

TARSITANO, Agostino;
2007-01-01

Abstract

There are several strategies for coping with the simultaneous presence of different measurement scales. A reasonable option is to compute a separate dissimilarity matrix for each type of variable: binary, categorical, ordinal and metric. A compromise dissimilarity matrix can then be obtained as a convex combination of the partial matrices ("partial" because each of them is linked to a specific group of indicators rather than to the full set of characteristics recorded on the units). This paper addresses the problem of specifying differential weights for each type of variable in order to reflect their significance, reliability and statistical adequacy. To this end, the Distatis procedure yields a compromise distance matrix between units, which can be analyzed with the usual techniques of cluster analysis. Experiments performed with k-nearest neighbor imputation demonstrate the ability of the proposed method to compensate for missing values when several types of variables occur in the same data set.
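The convex combination of partial matrices described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the toy data, the choice of simple matching for binary and categorical variables, the range-scaled Manhattan distance for ordinal and metric variables, and the equal weights. In the paper the weights are derived with the Distatis procedure rather than fixed by hand.

```python
import numpy as np

def simple_matching(X):
    """Share of variables on which two units disagree (binary or categorical)."""
    X = np.asarray(X)
    return (X[:, None, :] != X[None, :, :]).mean(axis=2)

def range_scaled_manhattan(X):
    """Mean absolute difference after scaling each column to [0, 1]."""
    X = np.asarray(X, dtype=float)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0  # a constant column then contributes zero
    Z = (X - X.min(axis=0)) / span
    return np.abs(Z[:, None, :] - Z[None, :, :]).mean(axis=2)

def compromise(partials, weights):
    """Convex combination of partial dissimilarity matrices."""
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(wk * Dk for wk, Dk in zip(w, partials))

# Hypothetical toy data: three units, one block of columns per variable type.
binary = [[1, 0], [1, 1], [0, 0]]
categorical = [["a"], ["b"], ["a"]]
ordinal = [[1], [2], [3]]          # level codes treated as ranks (assumption)
metric = [[10.0], [20.0], [40.0]]

partials = [
    simple_matching(binary),
    simple_matching(categorical),
    range_scaled_manhattan(ordinal),
    range_scaled_manhattan(metric),
]

# Equal weights are a placeholder; they are not the Distatis weights.
D = compromise(partials, [0.25, 0.25, 0.25, 0.25])
```

Because each partial matrix in this sketch is symmetric, has a zero diagonal and takes values in [0, 1], any convex combination keeps those properties, so `D` can be passed to a clustering or nearest-neighbor routine that accepts a precomputed dissimilarity matrix.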
Year: 2007
ISBN: 978-88-6056-020-9
Keywords: Distatis; Nearest neighbor imputation; LAD regression

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/164551