Generating Fake Documents using Probabilistic Logic Graphs

Qian Han;Cristian Molinaro;Antonio Picariello;Giancarlo Sperli;Venkatramanan S. Subrahmanian;Yanhai Xiong

2022-01-01

Abstract

Past research has shown that over 8 months may elapse between the time a network is compromised and the time the attack is discovered. During this long gap, attackers can steal valuable intellectual property from the victim. The recent FORGE system suggests that generating fake but believable versions of documents can delay attackers, cost them money, and increase their uncertainty. However, FORGE modifies only the textual component of a document, whereas real-world documents contain many non-textual components such as charts, equations, formulas, diagrams, and tables. We propose the concept of a Probabilistic Logic Graph (PLG) and show that PLGs provide a single, unified framework within which the different parts of a document can be expressed. We then define the problem of generating, for a given PLG representation of a document, a set of fake yet highly believable PLGs (i.e., documents), so that an attacker looking at the original together with the fakes cannot easily identify the original document. We show that the problem of generating fake PLGs is intractable, but we propose an approximation algorithm that solves it efficiently. We evaluate the use of PLGs over a corpus of patents and show that our fakes can effectively deceive an adversary.

Year: 2022

Keywords: cybersecurity, deception, fake documents, intellectual property

Appears in types: 1.1 Journal article

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/328031
