Distance function for mixed type data


There are several strategies to cope with the simultaneous presence of different measurement scales. A reasonable option is to compute a separate dissimilarity matrix for each type of variable: binary, categorical, ordinal and metric. A compromise dissimilarity matrix can then be obtained as a convex combination of all the partial matrices ("partial" because each of them is linked to a specific group of indicators and not to the full set of characteristics recorded on the units). This paper addresses the problem of specifying differential weights for each type of variable in order to reflect their significance, reliability and statistical adequacy. To this end, the Distatis procedure yields a compromise distance matrix between units, which can be analyzed with standard cluster analysis techniques. Experiments performed with k-nearest neighbor imputation demonstrate the ability of the proposed method to compensate for missing values when several types of variables occur in the same data set.
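The convex combination of partial matrices described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' procedure: the per-type dissimilarities (simple mismatch for binary and categorical variables, span-scaled absolute difference for ordinal and metric variables), the function names, the toy data and the equal weights are all hypothetical. In particular, the sketch does not implement Distatis, which would derive the weights from the data rather than fix them in advance.

```python
from typing import Callable, Dict, List, Sequence

Matrix = List[List[float]]


def pairwise(rows: Sequence[Sequence], d: Callable) -> Matrix:
    """Build a symmetric dissimilarity matrix from a per-pair function d."""
    n = len(rows)
    return [[d(rows[i], rows[j]) for j in range(n)] for i in range(n)]


def mismatch(a: Sequence, b: Sequence) -> float:
    """Share of variables on which two units disagree (assumed choice)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def scaled_abs(rows: Sequence[Sequence[float]]) -> Callable:
    """Mean absolute difference, each variable divided by its observed span."""
    spans = [(max(c) - min(c)) or 1.0 for c in zip(*rows)]

    def d(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(abs(x - y) / s for x, y, s in zip(a, b, spans)) / len(a)

    return d


def compromise(partials: Dict[str, Matrix], weights: Dict[str, float]) -> Matrix:
    """Convex combination: weights must be non-negative and sum to one."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    n = len(next(iter(partials.values())))
    return [
        [sum(weights[k] * partials[k][i][j] for k in partials) for j in range(n)]
        for i in range(n)
    ]


# Toy data for three units (hypothetical values, one block per variable type).
binary = [[1, 0], [1, 1], [0, 1]]
categorical = [["a"], ["b"], ["a"]]
ordinal = [[1], [2], [3]]          # ranks of an ordered scale
metric = [[10.0], [20.0], [30.0]]

partials = {
    "binary": pairwise(binary, mismatch),
    "categorical": pairwise(categorical, mismatch),
    "ordinal": pairwise(ordinal, scaled_abs(ordinal)),
    "metric": pairwise(metric, scaled_abs(metric)),
}

# Equal weights are a placeholder; the paper is about choosing them better.
weights = {name: 0.25 for name in partials}
D = compromise(partials, weights)
```

The resulting matrix `D` could then be passed to a clustering or nearest-neighbor routine that accepts precomputed dissimilarities. Changing the entries of `weights` shows how much each variable type drives the compromise.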


TARSITANO, Agostino; Bonafine, I.

2007-01-01



Year: 2007
ISBN: 978-88-6056-020-9
Keywords: Distatis; Nearest neighbor imputation; LAD regression
Publication type: 4.1 Contribution in conference proceedings


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/164551
