Learning and Detecting Stuttering Disorders
Fassetti F.; Nistico S.
2019-01-01
Abstract
Stuttering is a widespread speech disorder affecting about 5% of the population and 2.5% of children under the age of 5. Much of the literature studies its causes, mechanisms and epidemiology, and much is devoted to treatments, prognosis and diagnosis. Notably, a stuttering evaluation requires the skills of a multidisciplinary team. An expert speech-language therapist conducts a precise evaluation through a series of tests, observations and interviews. During an evaluation, the therapist perceives, records and transcribes the number and types of speech disfluencies that a person produces in different situations. Stuttering varies widely in the number of repeated syllables or words and in the secondary aspects that alter the clinical picture. This work aims to support the difficult task of evaluating stuttering by recognizing occurrences of disfluency episodes such as repetitions and prolongations of sounds, syllables, words or phrases, silent pauses, hesitations, or blocks before speech. In particular, we propose a deep-learning-based approach that automatically detects disfluent production points in speech, supporting early classification of the problem by providing the number of disfluencies and the time intervals in which they occur. A deep learner is built to perform a preliminary evaluation of audio fragments. However, the scenario at hand has some peculiarities that make detection challenging: (i) fragments that are too short lead to ineffective classification, since they cannot capture the whole stuttering episode; and (ii) fragments that are too long also lead to ineffective classification, since a stuttering episode can be very brief and the large amount of fluent speech in the fragment then masks the disfluency.
So, we design an ad hoc segment classifier that, exploiting the output of a deep learner working on fragments that are not too short, classifies each small segment composing an audio fragment by estimating the probability that it contains a disfluency.
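The abstract does not say how segment-level probabilities are derived from the fragment-level output, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that fragments are overlapping sliding windows over equally sized segments, that the deep learner returns one disfluency probability per fragment, and that a segment's probability can be approximated by averaging the probabilities of the fragments that cover it. The names `segment_probabilities`, `disfluent_intervals`, `fragment_len`, `hop` and `threshold` are hypothetical.

```python
from typing import List, Sequence, Tuple


def segment_probabilities(
    fragment_probs: Sequence[float],
    fragment_len: int,
    hop: int,
    n_segments: int,
) -> List[float]:
    """Estimate a disfluency probability for each small segment.

    Assumption (not from the paper): fragment i covers segments
    [i * hop, i * hop + fragment_len), and a segment's probability is the
    mean probability of all fragments covering it.
    """
    sums = [0.0] * n_segments
    counts = [0] * n_segments
    for i, p in enumerate(fragment_probs):
        start = i * hop
        for s in range(start, min(start + fragment_len, n_segments)):
            sums[s] += p
            counts[s] += 1
    # Segments covered by no fragment get probability 0.0.
    return [sums[s] / counts[s] if counts[s] else 0.0 for s in range(n_segments)]


def disfluent_intervals(
    seg_probs: Sequence[float],
    segment_sec: float,
    threshold: float = 0.5,
) -> List[Tuple[float, float]]:
    """Merge consecutive segments at or above the threshold into (start, end) times in seconds."""
    intervals: List[Tuple[float, float]] = []
    start = None
    for s, p in enumerate(seg_probs):
        if p >= threshold and start is None:
            start = s  # a disfluent run begins here
        elif p < threshold and start is not None:
            intervals.append((start * segment_sec, s * segment_sec))
            start = None
    if start is not None:  # run extends to the end of the audio
        intervals.append((start * segment_sec, len(seg_probs) * segment_sec))
    return intervals


if __name__ == "__main__":
    # Hypothetical fragment-level outputs of the deep learner.
    probs = segment_probabilities([0.0, 1.0, 1.0, 0.0], fragment_len=2, hop=1, n_segments=5)
    print(probs)
    print(disfluent_intervals(probs, segment_sec=0.25, threshold=0.75))
```

Averaging over overlapping fragments is one simple way to let fragments long enough to capture an episode still localize it at segment resolution. The number of returned intervals gives a disfluency count, and the intervals themselves give the time spans in which disfluencies occur.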