Computing sequence similarity is a problem emerging in several areas of research. Current solution algorithms are often based on alignment methods under the assumption that matching symbols, or at least a substitution schema among them, are known in advance. This is natural for sequences defined over the same alphabet of symbols. However, for sequences defined over different alphabets and in absence of an appropriate background knowledge, sequence similarity can be conveniently reconsidered from a different perspective where determining the best substitution schema is also part of the computation problem. The basic idea is that any symbol of a sequence can be correlated with many symbols of another, provided each correlation frequently occurs over the various positions of the alignment. This novel setting is formalized and relevant application domains fitting its peculiarities are illustrated. Moreover, the computational complexity of the alignment problems arising therein is analyzed, and practical solution approaches are proposed and validated over synthetic and real datasets.

Frequency-based similarity for parameterized sequences: Formal framework, algorithms, and applications

GRECO, Gianluigi;TERRACINA, Giorgio
2013-01-01

Abstract

Computing sequence similarity is a problem emerging in several areas of research. Current solution algorithms are often based on alignment methods under the assumption that matching symbols, or at least a substitution schema among them, are known in advance. This is natural for sequences defined over the same alphabet of symbols. However, for sequences defined over different alphabets and in absence of an appropriate background knowledge, sequence similarity can be conveniently reconsidered from a different perspective where determining the best substitution schema is also part of the computation problem. The basic idea is that any symbol of a sequence can be correlated with many symbols of another, provided each correlation frequently occurs over the various positions of the alignment. This novel setting is formalized and relevant application domains fitting its peculiarities are illustrated. Moreover, the computational complexity of the alignment problems arising therein is analyzed, and practical solution approaches are proposed and validated over synthetic and real datasets.
2013
Alignment problem; Mining technique; Sequence similarity
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.11770/139090
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 8
  • ???jsp.display-item.citation.isi??? 8
social impact