This paper discusses issues in the distribution of bundled workflows across ubiquitous peer-to-peer networks for the application of music information retrieval. The underlying motivation for this work is provided by the DART project, which aims to develop a novel music recommendation system by gathering statistical data using collaborative filtering techniques and the analysis of the audio itsel, in order to create a reliable and comprehensive database of the music that people own and which they listen to. To achieve this, the DART scientists creating the algorithms need the ability to distribute the Triana workflows they create, representing the analysis to be performed, across the network on a regular basis (perhaps even daily) in order to update the network as a whole with new workflows to be executed for the analysis. DART uses a similar approach to BOINC but differs in that the workers receive input data in the form of a bundled Triana workflow, which is executed in order to process any MP3 files that they own on their machine. Once analysed, the results are returned to DART's distributed database that collects and aggregates the resulting information. DART employs the use of package repositories to decentralise the distribution of such workflow bundles and this approach is validated in this paper through simulations that show that suitable scalability is maintained through the system as the number of participants increases. The results clearly illustrate the effectiveness of the approach.
Distributing workflows over a ubiquitous P2P network
TALIA, Domenico;
2007-01-01
Abstract
This paper discusses issues in the distribution of bundled workflows across ubiquitous peer-to-peer networks for the application of music information retrieval. The underlying motivation for this work is provided by the DART project, which aims to develop a novel music recommendation system by gathering statistical data using collaborative filtering techniques and the analysis of the audio itsel, in order to create a reliable and comprehensive database of the music that people own and which they listen to. To achieve this, the DART scientists creating the algorithms need the ability to distribute the Triana workflows they create, representing the analysis to be performed, across the network on a regular basis (perhaps even daily) in order to update the network as a whole with new workflows to be executed for the analysis. DART uses a similar approach to BOINC but differs in that the workers receive input data in the form of a bundled Triana workflow, which is executed in order to process any MP3 files that they own on their machine. Once analysed, the results are returned to DART's distributed database that collects and aggregates the resulting information. DART employs the use of package repositories to decentralise the distribution of such workflow bundles and this approach is validated in this paper through simulations that show that suitable scalability is maintained through the system as the number of participants increases. The results clearly illustrate the effectiveness of the approach.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.