Training a CNN to Estimate Voice Pathology from Connected Speech Using EGG to Automatically Label the Dataset for Voicing

Howard, Ian

Date

2023-03-30

Abstract

We describe a new system that estimates voice pathology directly from the acoustic speech signal, to assist voice specialists in diagnosing pathological voice conditions. Our main novel contributions are the use of electroglottography (EGG) during neural network training to automatically label acoustic speech signals for voicing, and the generation of running estimates of pathology with high temporal resolution from the acoustic signal alone. These estimates can also be linked to the parts of the speech signal where voice pathology manifests itself most strongly. By operating directly on the acoustic waveform without any pre-processing, we avoid the use of hand-crafted features. We trained and tested a neural network on speech datasets with normal and pathological voicing and found that it provides effective fine-grained indications of pathology. Quantitatively, the network distinguishes well between speakers with normal and pathological voice conditions, achieving a recognition rate of 91%, which compares favorably with results from other studies.
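The abstract does not say how voicing labels are derived from the EGG signal. As a rough illustration only, the sketch below marks a frame as voiced when the RMS of its mean-removed EGG samples exceeds a threshold. The function name `label_voicing`, the 160-sample frame length, and the 0.1 threshold are all assumptions made for this example, not details taken from the paper.

```python
import math

def label_voicing(egg, frame_len=160, threshold=0.1):
    """Return one boolean per non-overlapping frame of an EGG signal.

    A frame is marked voiced (True) when the RMS of its mean-removed
    samples exceeds `threshold`. Both parameters are illustrative
    assumptions, not values from the paper.
    """
    labels = []
    for start in range(0, len(egg) - frame_len + 1, frame_len):
        frame = egg[start:start + frame_len]
        mean = sum(frame) / frame_len
        rms = math.sqrt(sum((x - mean) ** 2 for x in frame) / frame_len)
        labels.append(rms > threshold)
    return labels

# Synthetic EGG at 16 kHz: two silent frames, then two frames of a
# 100 Hz oscillation standing in for vocal fold vibration.
fs = 16000
silence = [0.0] * 320
voiced = [math.sin(2 * math.pi * 100 * n / fs) for n in range(320)]
labels = label_voicing(silence + voiced)
print(labels)
```

Labels of this kind could then be paired with the time-aligned acoustic frames as training targets, so that no manual voicing annotation is needed.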

URI

https://pearl.plymouth.ac.uk/handle/10026.1/21316

Collections

School of Engineering, Computing and Mathematics

Journal

Studientexte zur Sprachkommunikation, Band 105: Elektronische Sprachsignalverarbeitung 2023. Conference proceedings of the 34th conference in München with 32 contributions. ISBN 978-3-95908-303-4

Conference name

ESSV 2023, LMU Munich, Germany

Start date

2023-03-01

Finish date

2023-03-03

Publisher URL

https://www.essv.de/
