La sélection des documents dans le système numérique
ROVELLA A.; GUARASCI R.; TAVERNITI M.
2009-01-01
Abstract
One of the most frequent errors in any transition from paper to digital documents is to believe that, as in the past, the change of document format affects the choice of methodologies governing the different phases of an archive’s life cycle. Concerning the conservation plan in particular, digitization has removed one of the basic motives for sorting, namely the need to optimize the use of space (often a costly item), but it has also increased redundancy and superfluous information. The traditional theory that treated sorting as an integral part of the reorganization of paper archives rested on the conviction that respecting the relationships between documents involved intellectual work that required, first, the reorganization of the archive’s overall structure and, second, an evaluation of which elements ought to be kept. Electronic documents, in contrast, do not exist as physical entities, and the way they are stored as electronic signals rarely has any connection with the documents themselves; whether they are displayed on a screen or printed out, the physical relationship loses all meaning. Indeed, this phenomenon strengthens the logical relationships between documents. Consequently, we are witnessing a change in the parameters and methods that, until now, were paramount for the conservation and selection of archived documents. At the same time, there no longer seems to be the same absolute necessity to reorganize repositories as a preliminary phase to ex post selection, even considering the increasingly ambiguous need to establish ex ante the life cycle of each document type.
In this context, methods of textual analysis and terminology extraction, including those based on the frequency and statistical relevance of terms and applied to digital and digitized archives, can form the basis of extremely useful applications, which can also cope with the growing production of documents that have not always been correctly classified as archives.
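As a rough illustration of what scoring terms by frequency and statistical relevance can look like, the sketch below computes a plain TF-IDF weight for each term in a small set of documents. TF-IDF is an assumption chosen here for illustration; the abstract does not say which measure the authors use. The function names (`tokenize`, `tf_idf`, `top_terms`) and the sample corpus are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents):
    """Return one {term: score} dict per document.

    tf  = occurrences of the term / number of tokens in the document
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_terms(doc_scores, k=3):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    ranked = sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical sample corpus, for illustration only.
corpus = [
    "invoice for archive storage services",
    "invoice for scanning services",
    "minutes of the board meeting",
]
scores = tf_idf(corpus)
```

In this toy corpus, a term that occurs in only one document (such as "archive") receives a higher weight than one shared by several documents (such as "invoice"), which is the kind of signal that could help flag distinctive or redundant material during selection.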