SmartLLM: Multidimensional Dataset Generation via LLM Simulation in Smart Home

Fortino, Giancarlo
2026-01-01

Abstract

Human activity prediction (HAP) is crucial for enabling intelligent smart home services, yet it is often hindered by the scarcity of high-quality, multidimensional datasets. Existing datasets are typically fragmented, capturing either long-term activity sequences or short-term device interactions, but rarely both in a unified manner. Traditional data collection methods are costly and time-consuming, while conventional simulation techniques struggle to generate diverse and logically coherent behavior sequences. To address these limitations, we propose SmartLLM, a novel large language model (LLM)-based simulation framework for the automated generation of multidimensional smart home datasets. SmartLLM simulates agents with distinct profiles (e.g., old man, remote worker, and holiday maker) performing daily activities within configurable home environments, generating temporally aligned sequences across the activity, device, and sensor dimensions. We generated two months of simulated data for three user profiles and validated its plausibility through activity distribution visualization, statistical perplexity analysis, and case studies. Multidimensional feature validation experiments further demonstrate that the multidimensional data significantly improves the accuracy of activity prediction models compared with single-dimensional features. This work addresses key bottlenecks in smart home data acquisition and provides a scalable, high-quality data foundation for advancing smart home algorithm research. The code is available at: https://github.com/HuankeZheng/SmartLLM.
Human activity prediction (HAP)
large language model (LLM)
multidimensional dataset generation
smart home

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/406318