Audio Super-Resolution via Vision Transformer

Nistico' S.; Palopoli L.; Romano A. P.
2022-01-01

Abstract

Audio super-resolution refers to techniques that improve the quality of audio signals, usually through bandwidth extension, in which the enhancement is obtained by expanding the phase and the spectrogram of the input audio tracks. These techniques are therefore particularly significant in all cases where audio tracks lack relevant parts of the audible spectrum. In many cases, the input signal contains the low-band frequencies (the easiest to capture with low-quality recording instruments), whereas the high band must be generated. In this paper, we present a system for bandwidth extension that works on musical tracks and generates the high-band frequencies from the low-band ones. The system, called ViT Super-resolution (ViT-SR), features an architecture based on a Generative Adversarial Network and a Vision Transformer model. The experiments reported in the paper are intended to demonstrate the effectiveness of the presented approach. In particular, our aim was to show that the high-band signal of an audio file can be faithfully reconstructed with only its low-band spectrum available as input, including the harmonics associated with the input track, which are usually difficult to generate synthetically and which contribute significantly to the final perceived sound quality.
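
The problem setup described in the abstract (low band given, high band to be generated) can be illustrated with a minimal sketch. This is not the ViT-SR system: it only shows how one frame's spectrum divides into the band a model would receive and the band it would have to reconstruct. The frame length, the cutoff bin, and the function names `dft`, `split_bands`, and `band_energy` are all hypothetical choices made for illustration.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform, keeping the non-negative
    frequency bins only. Adequate for one short illustrative frame."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def split_bands(spectrum, cutoff_bin):
    """Split a spectrum into (low band, high band) at an assumed cutoff bin."""
    return spectrum[:cutoff_bin], spectrum[cutoff_bin:]


def band_energy(band):
    """Sum of squared magnitudes of the bins in a band."""
    return sum(abs(c) ** 2 for c in band)


# Hypothetical frame: a fundamental at bin 4 plus a weaker harmonic at bin 20.
N = 64
frame = [
    math.sin(2 * math.pi * 4 * t / N) + 0.5 * math.sin(2 * math.pi * 20 * t / N)
    for t in range(N)
]

spectrum = dft(frame)
# Assumed cutoff: bins 0..15 are the available input, bins 16..32 are missing.
low, high = split_bands(spectrum, cutoff_bin=16)

print("low-band bins:", len(low), "energy:", round(band_energy(low), 3))
print("high-band bins:", len(high), "energy:", round(band_energy(high), 3))
```

In this toy frame the harmonic at bin 20 falls entirely in the high band, so a low-band-only recording would lose it; a bandwidth extension system would need to regenerate that content from the low band alone.
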
Year: 2022
ISBN: 978-3-031-16563-4; 978-3-031-16564-1
Keywords: Audio super-resolution; Generative adversarial networks; Music enhancement; Transformers; Vision transformer
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11770/346001