CardioTRAP: Design of a Retrieval Augmented System (RAG) for Clinical Data in Cardiology
Vizza, Patrizia;Indolfi, Ciro;Veltri, Pierangelo
2025-01-01
Abstract
The development of robust data management and analysis systems built on Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) has significantly improved the ability to extract knowledge from very large datasets. Moreover, integrating established LLMs with custom-designed RAG systems makes it possible to handle heterogeneous, complex, multidimensional data such as biomedical information. Nevertheless, despite these advances, health-related data call for more reliable and precise prediction mechanisms, which in turn requires improved data models and methods. This study defines a framework for managing high-dimensional biomedical data. The implemented system employs advanced indexing techniques to store and retrieve extensive datasets efficiently, addressing the demands of comprehensive cardiology research and analysis. It acquires multidimensional biomedical data and increases their informativeness and utility by combining supervised and unsupervised learning methods, aiming at both high accuracy and practical applicability. Evaluated with state-of-the-art metrics, benchmarks and practical applications, the integration of data management and RAG systems shows its ability to enhance the identification of biomarkers and clinical data in health-related applications, including patient risk stratification and novel biomarker discovery. Finally, the CardioTRAP applications demonstrate the importance of integrating data management and RAG systems for translating biomedical research results into clinical practice.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


