Protection Techniques from Information Extraction

Information extraction technologies meet the market need for automatic tools that extract semi-structured information from web pages. However, pages may change over time for various reasons, ranging from restyling to deliberate modifications introduced in order to confuse Web wrappers. In this paper we deal with the latter scenario, studying the issue of deliberate wrapper spoiling and its relationship to wrapping. We present an architecture and a tool implementing a wrapper spoiling system, and discuss some practical spoiling techniques, which are also tested experimentally.

GRECO, Gianluigi; IANNI, Giovambattista; LIO V; PALOPOLI, Luigi

2007-01-01

Year: 2007
ISBN: 0-7695-2747-7
Appears in categories: 4.1 Contribution in conference proceedings

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/170787
