The imitation of spoken stop consonants by an articulatory synthesizer using only general learning principles addresses significant issues in speech inversion and speech acquisition. Stop consonants are relatively large, complex acoustic events resulting from discrete articulations, so inversion based on small time windows, or on the minimisation of average articulatory error across multiple places of articulation, will not provide a satisfactory solution. This paper explores how variation in inversion window size and the use of smoothing constraints affect the quality of imitation of the stops [b], [d] and [g]. However, good results are obtained only when inversion is supplemented by phonetic labelling performed over a large time window. This additional source of phonetic information allows inversion to exploit different discrete gestures for the different places of articulation. The results demonstrate the importance of a phonological layer of perceptual analysis prior to imitation and speech acquisition.

Publication Title

9th European Conference on Speech Communication and Technology

Organisational Unit

School of Engineering, Computing and Mathematics