Towards the Automated Population of Thesauri Using BERT: A Use Case on the Cybersecurity Domain




Elena Cardillo; Alessio Portaro; Maria Taverniti; Claudia Lanza; Raffaele Guarasci

2024-01-01

Abstract

This work explores methodologies that leverage the widely used BERT model to populate and enrich domain-oriented controlled vocabularies such as thesauri. Starting from BERT's embeddings, we extract information from a sample corpus of cybersecurity-related documents and present a novel Natural Language Processing-inspired pipeline that combines neural language models, knowledge graph extraction, and natural language inference to identify implicit relations (adaptable to thesaural relationships) and domain concepts for populating a domain thesaurus. Preliminary results are promising: they indicate that the proposed methodology is effective and, in turn, that LLMs, BERT in particular, can be applied to enrich specialized controlled vocabularies with new knowledge.


Year: 2024
ISBN: 978-3-031-53554-3
Publication type: 2.1 Contribution in volume (chapter or essay)



Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/363503
