Parallel extraction of Regions-of-Interest from social media data

Belcastro L.;Marozzo F.;Talia D.;Trunfio P.
2020

Abstract

Geotagged data gathered from social media can be used to discover places-of-interest (PoIs) that have attracted many visitors. Since a PoI is generally identified by the geographical coordinates of a single point, it is hard to match it with people's trajectories. Therefore, we define an area, called a region-of-interest (RoI), that represents the boundaries of a PoI. The main goal of this study is to discover RoIs from PoIs using spatial data mining techniques. In this paper, we propose a new parallel method for extracting RoIs from social media datasets. It consists of two main steps: (i) automatic keyword extraction and data grouping, and (ii) parallel RoI extraction. The first step extracts keywords identifying the PoIs; these keywords are used to group social media items according to the places they refer to. The second step uses a Parallel Clustering Approach (ParCA) on spatial datasets to identify RoIs. ParCA runs DBSCAN in parallel on subsets of the data to generate subclusters on each processing node, and then merges overlapping subclusters to form global clusters. ParCA was implemented using the MapReduce model. Experiments performed over a set of PoIs in the city of Rome using social media data show that our approach is highly scalable and reaches an accuracy of 79% in detecting RoIs. On a parallel computer with 50 cores, we obtained a speedup of 52 by processing large datasets divided into 32 splits, compared with the execution time registered when each dataset is not partitioned.
Keywords: parallel clustering; regions-of-interest; RoI mining; scalability; social media analysis
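The split, cluster, and merge idea summarized in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' ParCA implementation: the round-robin split, the bounding-box overlap test, the thread pool standing in for MapReduce nodes, and the names dbscan, bbox, overlaps, and parca_sketch are all assumptions made for illustration. The abstract does not say how the data are partitioned or how overlap between subclusters is tested.

```python
from concurrent.futures import ThreadPoolExecutor
from math import hypot


def dbscan(points, eps, min_pts):
    """Plain O(n^2) DBSCAN; returns clusters as lists of points (noise dropped)."""
    n = len(points)
    labels = [None] * n  # None = unvisited, -1 = noise, k >= 0 = cluster id

    def neighbors(i):
        xi, yi = points[i]
        return [j for j in range(n)
                if hypot(points[j][0] - xi, points[j][1] - yi) <= eps]

    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        labels[i] = cluster_id
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            nb = neighbors(j)
            if len(nb) >= min_pts:  # core point: keep expanding the cluster
                queue.extend(nb)
        cluster_id += 1

    clusters = [[] for _ in range(cluster_id)]
    for p, lab in zip(points, labels):
        if lab >= 0:
            clusters[lab].append(p)
    return clusters


def bbox(cluster):
    xs = [p[0] for p in cluster]
    ys = [p[1] for p in cluster]
    return min(xs), min(ys), max(xs), max(ys)


def overlaps(a, b):
    """Axis-aligned bounding-box intersection (assumed overlap criterion)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def parca_sketch(points, n_splits, eps, min_pts):
    # "Map" phase: cluster each split independently (threads stand in for nodes).
    splits = [points[i::n_splits] for i in range(n_splits)]
    with ThreadPoolExecutor() as pool:
        per_split = list(pool.map(lambda s: dbscan(s, eps, min_pts), splits))
    subclusters = [c for clusters in per_split for c in clusters]

    # "Reduce" phase: union subclusters whose bounding boxes overlap.
    boxes = [bbox(c) for c in subclusters]
    parent = list(range(len(subclusters)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if overlaps(boxes[i], boxes[j]):
                parent[find(i)] = find(j)

    merged = {}
    for i, c in enumerate(subclusters):
        merged.setdefault(find(i), []).extend(c)
    return list(merged.values())
```

Calling parca_sketch(points, n_splits, eps, min_pts) on a list of (x, y) tuples returns one list of points per merged cluster. The sketch compares only the original subcluster boxes and does not recompute a box after a merge, so it may under-merge in some layouts. Because of Python's global interpreter lock, the thread pool shows the structure of the computation but not the speedup reported for the MapReduce implementation.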

Use this identifier to cite or link to this document: http://hdl.handle.net/20.500.11770/315606