Frequency-based similarity for parameterized sequences: Formal framework, algorithms, and applications

IRIS

Computing sequence similarity is a problem emerging in several areas of research. Current solution algorithms are often based on alignment methods under the assumption that matching symbols, or at least a substitution schema among them, are known in advance. This is natural for sequences defined over the same alphabet of symbols. However, for sequences defined over different alphabets and in absence of an appropriate background knowledge, sequence similarity can be conveniently reconsidered from a different perspective where determining the best substitution schema is also part of the computation problem. The basic idea is that any symbol of a sequence can be correlated with many symbols of another, provided each correlation frequently occurs over the various positions of the alignment. This novel setting is formalized and relevant application domains fitting its peculiarities are illustrated. Moreover, the computational complexity of the alignment problems arising therein is analyzed, and practical solution approaches are proposed and validated over synthetic and real datasets.

Frequency-based similarity for parameterized sequences: Formal framework, algorithms, and applications

GRECO, Gianluigi;TERRACINA, Giorgio

2013-01-01

Abstract

Computing sequence similarity is a problem emerging in several areas of research. Current solution algorithms are often based on alignment methods under the assumption that matching symbols, or at least a substitution schema among them, are known in advance. This is natural for sequences defined over the same alphabet of symbols. However, for sequences defined over different alphabets and in absence of an appropriate background knowledge, sequence similarity can be conveniently reconsidered from a different perspective where determining the best substitution schema is also part of the computation problem. The basic idea is that any symbol of a sequence can be correlated with many symbols of another, provided each correlation frequently occurs over the various positions of the alignment. This novel setting is formalized and relevant application domains fitting its peculiarities are illustrated. Moreover, the computational complexity of the alignment problems arising therein is analyzed, and practical solution approaches are proposed and validated over synthetic and real datasets.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2013
			
	Parole chiave
	
				Alignment problem; Mining technique; Sequence similarity
			
	Appare nelle tipologie:
	
				1.1 Articolo in rivista

File in questo prodotto:

Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.11770/139090

Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni

ND

8

8

social impact