Towards a semi-automatic classifier of malware through tweets for early warning threat detection
Claudia Lanza
2024-01-01
Abstract
This paper presents a method for developing a malware ontology structure by detecting malware instances on Twitter. The ontology acts as a semi-automatic classifier fed by data extracted from tweets. The automatic part of the methodology relies on a pattern-based approach to detect trigger expressions that lead to new information about malware, whilst the manual part covers the evaluation of the results by domain experts, who also validate the reliability of the semantic relationships within the ontology framework. We present preliminary results from applying our methodology to tweets extracted from the MalwareBazaar database, showing how analysis of the document collection through Natural Language Processing (NLP) tasks can support knowledge retrieval and document classification for building an early warning system for detected malware. The results, obtained in 2023, refer to Twitter, the previous version of the social network now known as X.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.