DTDL (Digital Twins Definition Language) provides no mechanism for logical reasoning or constraint checking over digital twin models. We integrate DTDL with ASP Chef, a web-based Answer Set Programming (ASP) platform, via a structured DTDL-to-ASP mapping and three dedicated operations: @DTDL/Parse for fact generation, @DTDL/Analysis for structural metrics, and @DTDL/Debug for symbolic validation with LLM-guided repair. The key design decision is that error detection is symbolic and deterministic within the implemented set of constraint classes; a language model is invoked only after the ASP layer has produced a concrete, grounded diagnostic, keeping the correctness boundary with the symbolic layer. Soundness and completeness guarantees are scoped to these constraint classes; a formal proof is left as future work. We illustrate the framework on two agricultural use cases and report a proof-of-concept assessment on 99 diagnostics spanning 21 error classes across four domains. Three binary metrics are used: json_valid and entity_recall are computed mechanically; fix quality (judge_correct) is assessed by an independent LLM judge (Claude Sonnet 4.6). The complete grounded workflow achieves 90% judge_correct and 86% json_valid; a fair ablation baseline—same LLM and system message, but error type and entity name in natural language without structured diagnostics—achieves 77% and 75%, respectively. The gap is consistent across three independent judges and statistically significant (McNemar (Formula presented.)), but the inter-judge reliability of judge_correct is limited ((Formula presented.) ranging from 0.00 to 0.44), so results should be read as directional evidence rather than precise effect estimates. Excluding the dominant isolated_interface class ((Formula presented.), ceiling score), the conservative estimate is 87% vs. 79% on the remaining 71 diagnostics. These results constitute a preliminary proof-of-concept limited to a small number of models, a few application domains, and a single LLM configuration; results do not generalize beyond this specific setting. The judge_correct metric is assessed by LLM-as-judge and does not carry a perfect inter-annotator agreement.

Symbolic Analysis and LLM-Guided Debugging of Digital Twin Models with ASP Chef and DTDL

Alviano, Mario;Guarasci, Paola
2026-01-01

Abstract

DTDL (Digital Twins Definition Language) provides no mechanism for logical reasoning or constraint checking over digital twin models. We integrate DTDL with ASP Chef, a web-based Answer Set Programming (ASP) platform, via a structured DTDL-to-ASP mapping and three dedicated operations: @DTDL/Parse for fact generation, @DTDL/Analysis for structural metrics, and @DTDL/Debug for symbolic validation with LLM-guided repair. The key design decision is that error detection is symbolic and deterministic within the implemented set of constraint classes; a language model is invoked only after the ASP layer has produced a concrete, grounded diagnostic, keeping the correctness boundary with the symbolic layer. Soundness and completeness guarantees are scoped to these constraint classes; a formal proof is left as future work. We illustrate the framework on two agricultural use cases and report a proof-of-concept assessment on 99 diagnostics spanning 21 error classes across four domains. Three binary metrics are used: json_valid and entity_recall are computed mechanically; fix quality (judge_correct) is assessed by an independent LLM judge (Claude Sonnet 4.6). The complete grounded workflow achieves 90% judge_correct and 86% json_valid; a fair ablation baseline—same LLM and system message, but error type and entity name in natural language without structured diagnostics—achieves 77% and 75%, respectively. The gap is consistent across three independent judges and statistically significant (McNemar (Formula presented.)), but the inter-judge reliability of judge_correct is limited ((Formula presented.) ranging from 0.00 to 0.44), so results should be read as directional evidence rather than precise effect estimates. Excluding the dominant isolated_interface class ((Formula presented.), ceiling score), the conservative estimate is 87% vs. 79% on the remaining 71 diagnostics. These results constitute a preliminary proof-of-concept limited to a small number of models, a few application domains, and a single LLM configuration; results do not generalize beyond this specific setting. The judge_correct metric is assessed by LLM-as-judge and does not carry a perfect inter-annotator agreement.
2026
Answer Set Programming
ASP Chef
constraint validation
digital twins
DTDL
graph visualization
IoT
knowledge representation
semantic modeling
smart agriculture
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.11770/407937
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? 0
social impact