Top-k user-specified preferred answers in massive graph databases

IRIS

There are numerous applications where users wish to identify subsets of vertices in a social network or graph database that are of interest to them. They may specify sets of patterns and vertex properties, and each of these confers a score to a subgraph. The users want to find the subgraphs with top-k highest scores. Examples in the real world where such subgraphs involve custom scoring methods include: techniques to identify sets of coordinated influence bots on Twitter, methods to identify suspicious subgraphs of nodes involved in nuclear proliferation networks, and sets of sockpuppet accounts seeking to illicitly influence star ratings on e-commerce platforms. All of these types of applications have numerous custom scoring methods. This motivates the concept of Scoring Queries presented in this paper — unlike past work, an important aspect of scoring queries is that the users get to choose the scoring mechanism, not the system. We present the Advanced top-k (ATK) algorithm and show that it intelligently leverages graph indexes from the past but also presents novel pruning opportunities. We present an implementation of ATK showing that it beats out a baseline algorithm that builds on advanced subgraph matching methods with multiple graph database backends including Jena and GraphDB. We show that ATK scales well on real world graph databases from YouTube, Flickr, IMDb, and CiteSeerX.

Top-k user-specified preferred answers in massive graph databases

Park N.;Pugliese A.;Serra E.;Subrahmanian V. S.

2020-01-01

Abstract

There are numerous applications where users wish to identify subsets of vertices in a social network or graph database that are of interest to them. They may specify sets of patterns and vertex properties, and each of these confers a score to a subgraph. The users want to find the subgraphs with top-k highest scores. Examples in the real world where such subgraphs involve custom scoring methods include: techniques to identify sets of coordinated influence bots on Twitter, methods to identify suspicious subgraphs of nodes involved in nuclear proliferation networks, and sets of sockpuppet accounts seeking to illicitly influence star ratings on e-commerce platforms. All of these types of applications have numerous custom scoring methods. This motivates the concept of Scoring Queries presented in this paper — unlike past work, an important aspect of scoring queries is that the users get to choose the scoring mechanism, not the system. We present the Advanced top-k (ATK) algorithm and show that it intelligently leverages graph indexes from the past but also presents novel pruning opportunities. We present an implementation of ATK showing that it beats out a baseline algorithm that builds on advanced subgraph matching methods with multiple graph database backends including Jena and GraphDB. We show that ATK scales well on real world graph databases from YouTube, Flickr, IMDb, and CiteSeerX.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2020
			
	Parole chiave
	
				Graph databases; Preferred answers; Top-k querying
			
	Appare nelle tipologie:
	
				1.1 Articolo in rivista

File in questo prodotto:

Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.11770/300095

Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni

ND

1

1

social impact