Metodi per la selezione automatica dei documenti (Methods for the automatic selection of documents)

ROVELLA, Anna;GUARASCI, Roberto Franco;
2009-01-01

Abstract

One of the most frequent errors in any transition from paper to digital documents is the belief that, as in the past, a change of document format affects the choice of methodologies governing the different phases of an archive’s life-cycle. Concerning the conservation plan in particular, although digitization has removed one of the basic motives for sorting, namely the need to optimize the use of space (often a costly item), it has also increased redundancy and superfluous information. The traditional theory that treated sorting as an integral part of the reorganization of paper archives rested on the conviction that respecting the relationships between documents involved an intellectual content that required, first, the reorganization of the archive’s overall structure and, second, an evaluation of which elements ought to be kept. Electronic documents, in contrast, do not exist as physical entities, and their storage as electronic signals rarely has any connection with the documents themselves; whether they are displayed on a screen or printed out, the physical relationship loses all meaning. Indeed, this strengthens the logical relationships between documents. Consequently, we are witnessing a change in the parameters and methods that, until now, were paramount for the conservation and selection of archived documents. At the same time, there no longer seems to be the same absolute necessity to reorganize repositories as a preliminary phase to ex post selection, even considering the increasingly ambiguous need to establish ex ante the life-cycle of each document typology.
In this context, methodologies for textual analysis and terminology extraction, including those based on term-frequency and statistical-relevance algorithms applied to digital and digitized archives, can support the development of extremely useful applications that can also cope with the growing production of documents that have not always been correctly classified as archives.
The work reports the results of research aimed at identifying methodologies for building tools that support the selection of documents in a digital environment. In particular, the method developed takes the form of a specific tool, able to assist supervised automatic processes, which exploits entropy maximization to compute, given a defined disposal model, the conditional probability of selecting a document with respect to the set of features that represent it.
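As an illustration only, the conditional probability of selection mentioned above can be sketched with a two-class maximum-entropy model, which is equivalent to logistic regression. The abstract does not describe the tool's features, training data, or optimizer, so everything below is an assumption: the bag-of-words features, the toy `EXAMPLES` corpus, the label convention (1 = selected, 0 = not selected), and the function names `features`, `p_select`, and `train` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical labelled corpus: 1 = selected, 0 = not selected.
EXAMPLES = [
    ("board resolution annual budget", 1),
    ("contract signed supplier", 1),
    ("minutes board meeting", 1),
    ("lunch menu cafeteria", 0),
    ("duplicate copy newsletter", 0),
    ("newsletter cafeteria offers", 0),
]

def features(text):
    """Bag-of-words term counts (an assumed feature set, not the tool's)."""
    return Counter(text.lower().split())

def p_select(weights, bias, feats):
    """P(selected | features) under a two-class maximum-entropy model."""
    score = bias + sum(weights.get(term, 0.0) * n for term, n in feats.items())
    # Numerically stable logistic function.
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)

def train(examples, epochs=200, lr=0.5, l2=0.01):
    """Fit weights by stochastic gradient ascent on the L2-regularized
    conditional log-likelihood (the maximum-entropy solution)."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, label in examples:
            feats = features(text)
            err = label - p_select(weights, bias, feats)
            bias += lr * err
            for term, n in feats.items():
                w = weights.get(term, 0.0)
                weights[term] = w + lr * (err * n - l2 * w)
    return weights, bias

weights, bias = train(EXAMPLES)
print(p_select(weights, bias, features("board meeting budget")))
print(p_select(weights, bias, features("cafeteria newsletter")))
```

In this sketch the learned weights stand in for the "disposal model": a document whose terms resemble the selected examples receives a probability above 0.5, and one resembling the others falls below it. A real deployment would need a far richer feature set and an archivist-validated training set.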
Scarto, Selezione, Documenti; Archives, Records, Selection, Sorting

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/131026