Speech Fundamental Period Estimation using a Neural Network

Howard, Ian

Date

2020-03-31

Abstract

Here we extend previous work on estimating the time of excitation (Tx) from the speech signal using a shallow neural network. We make use of a dataset that consists of simultaneously recorded speech and Laryngograph signals from drama students speaking a phonetically balanced passage. We first use the Laryngograph signal to estimate the location of vocal fold closures as a function of time. Then, treating the problem as a supervised learning task, we train a multilayer perceptron to map raw speech samples, selected using a sliding input window, to a single output target sample that represents the presence or absence of an excitation point. We present results across several male speakers and also demonstrate that it is possible to reconstruct the Laryngograph signal directly from the speech signal.
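
The sliding-window framing described in the abstract can be illustrated with a minimal sketch. It only shows how windows of raw speech samples might be paired with a binary excitation target; it does not train a network. The function name, the window length, and the choice to align each window with the target at its centre sample are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def make_training_pairs(speech, closure_indices, window=5):
    """Pair each sliding window of raw speech samples with a 0/1 target.

    Hypothetical helper: the window length and centre alignment are
    illustrative assumptions, not values reported in the paper.
    """
    speech = np.asarray(speech, dtype=np.float64)
    # Binary target track: 1.0 at samples marked as excitation points, else 0.0.
    targets = np.zeros(len(speech))
    targets[np.asarray(closure_indices, dtype=int)] = 1.0
    # Each row of X holds one window of consecutive speech samples.
    X = np.lib.stride_tricks.sliding_window_view(speech, window)
    # Align each window with the target at its centre sample.
    half = window // 2
    y = targets[half:half + len(X)]
    return X, y

# Toy signal: ten samples with a single marked closure at sample 4.
speech = np.arange(10, dtype=float)
X, y = make_training_pairs(speech, closure_indices=[4], window=5)
```

In this toy case the third window covers samples 2 to 6, so its centre is sample 4 and it is the only window with a target of 1. The rows of `X` and the entries of `y` could then be passed to any multilayer perceptron implementation as inputs and targets.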

URI

https://pearl.plymouth.ac.uk/handle/10026.1/21320

Collections

School of Engineering, Computing and Mathematics

Journal

Studientexte zur Sprachkommunikation, Band 95: Elektronische Sprachsignalverarbeitung 2020. Conference proceedings of the 31st conference in Magdeburg, with 38 contributions. ISBN: 978-3-959081-93-1

Conference name

ESSV 2020 Magdeburg

Publisher URL

https://www.essv.de/
