Symbolic Analysis and LLM-Guided Debugging of Digital Twin Models with ASP Chef and DTDL

IRIS

DTDL (Digital Twins Definition Language) provides no mechanism for logical reasoning or constraint checking over digital twin models. We integrate DTDL with ASP Chef, a web-based Answer Set Programming (ASP) platform, via a structured DTDL-to-ASP mapping and three dedicated operations: @DTDL/Parse for fact generation, @DTDL/Analysis for structural metrics, and @DTDL/Debug for symbolic validation with LLM-guided repair. The key design decision is that error detection is symbolic and deterministic within the implemented set of constraint classes; a language model is invoked only after the ASP layer has produced a concrete, grounded diagnostic, keeping the correctness boundary with the symbolic layer. Soundness and completeness guarantees are scoped to these constraint classes; a formal proof is left as future work. We illustrate the framework on two agricultural use cases and report a proof-of-concept assessment on 99 diagnostics spanning 21 error classes across four domains. Three binary metrics are used: json_valid and entity_recall are computed mechanically; fix quality (judge_correct) is assessed by an independent LLM judge (Claude Sonnet 4.6). The complete grounded workflow achieves 90% judge_correct and 86% json_valid; a fair ablation baseline—same LLM and system message, but error type and entity name in natural language without structured diagnostics—achieves 77% and 75%, respectively. The gap is consistent across three independent judges and statistically significant (McNemar (Formula presented.)), but the inter-judge reliability of judge_correct is limited ((Formula presented.) ranging from 0.00 to 0.44), so results should be read as directional evidence rather than precise effect estimates. Excluding the dominant isolated_interface class ((Formula presented.), ceiling score), the conservative estimate is 87% vs. 79% on the remaining 71 diagnostics. These results constitute a preliminary proof-of-concept limited to a small number of models, a few application domains, and a single LLM configuration; results do not generalize beyond this specific setting. The judge_correct metric is assessed by LLM-as-judge and does not carry a perfect inter-annotator agreement.

Symbolic Analysis and LLM-Guided Debugging of Digital Twin Models with ASP Chef and DTDL

Alviano, Mario;Guarasci, Paola

2026-01-01

Abstract

DTDL (Digital Twins Definition Language) provides no mechanism for logical reasoning or constraint checking over digital twin models. We integrate DTDL with ASP Chef, a web-based Answer Set Programming (ASP) platform, via a structured DTDL-to-ASP mapping and three dedicated operations: @DTDL/Parse for fact generation, @DTDL/Analysis for structural metrics, and @DTDL/Debug for symbolic validation with LLM-guided repair. The key design decision is that error detection is symbolic and deterministic within the implemented set of constraint classes; a language model is invoked only after the ASP layer has produced a concrete, grounded diagnostic, keeping the correctness boundary with the symbolic layer. Soundness and completeness guarantees are scoped to these constraint classes; a formal proof is left as future work. We illustrate the framework on two agricultural use cases and report a proof-of-concept assessment on 99 diagnostics spanning 21 error classes across four domains. Three binary metrics are used: json_valid and entity_recall are computed mechanically; fix quality (judge_correct) is assessed by an independent LLM judge (Claude Sonnet 4.6). The complete grounded workflow achieves 90% judge_correct and 86% json_valid; a fair ablation baseline—same LLM and system message, but error type and entity name in natural language without structured diagnostics—achieves 77% and 75%, respectively. The gap is consistent across three independent judges and statistically significant (McNemar (Formula presented.)), but the inter-judge reliability of judge_correct is limited ((Formula presented.) ranging from 0.00 to 0.44), so results should be read as directional evidence rather than precise effect estimates. Excluding the dominant isolated_interface class ((Formula presented.), ceiling score), the conservative estimate is 87% vs. 79% on the remaining 71 diagnostics. These results constitute a preliminary proof-of-concept limited to a small number of models, a few application domains, and a single LLM configuration; results do not generalize beyond this specific setting. The judge_correct metric is assessed by LLM-as-judge and does not carry a perfect inter-annotator agreement.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2026
			
	Parole chiave
	
				Answer Set Programming
ASP Chef
constraint validation
digital twins
DTDL
graph visualization
IoT
knowledge representation
semantic modeling
smart agriculture
			
	Appare nelle tipologie:
	
				1.1 Articolo in rivista

File in questo prodotto:

Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.11770/407937

Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni

ND

0

0

social impact