Information extraction technologies meet the market need for automatic tools for extracting semi-structured information from web pages. However, pages may change over time due to different reasons, ranging from restyling pages to on-purpose modifications brought about into pages in order to puzzle Web wrappers. In this paper we deal with this latter scenario, by studying the issue of on-purpose wrapper spoiling and its relationship to wrapping. We present an architecture and a tool implementing a wrapper spoiling system, and discuss some practical spoiling techniques which are also experimentally tested.
Protection Techniques from Information Extraction
GRECO, Gianluigi;IANNI, Giovambattista;PALOPOLI, Luigi
2007-01-01
Abstract
Information extraction technologies meet the market need for automatic tools for extracting semi-structured information from web pages. However, pages may change over time due to different reasons, ranging from restyling pages to on-purpose modifications brought about into pages in order to puzzle Web wrappers. In this paper we deal with this latter scenario, by studying the issue of on-purpose wrapper spoiling and its relationship to wrapping. We present an architecture and a tool implementing a wrapper spoiling system, and discuss some practical spoiling techniques which are also experimentally tested.File in questo prodotto:
Non ci sono file associati a questo prodotto.
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.