You Only Hear Once: A YOLO-like Algorithm for Audio Segmentation and Sound Event Detection
dc.contributor.author | Venkatesh, S | |
dc.contributor.author | Moffat, D | |
dc.contributor.author | Miranda, Eduardo | |
dc.date.accessioned | 2022-04-01T18:00:38Z | |
dc.date.available | 2022-04-01T18:00:38Z | |
dc.date.issued | 2022-03-24 | |
dc.identifier.issn | 2076-3417 | |
dc.identifier.other | 3293 | |
dc.identifier.uri | http://hdl.handle.net/10026.1/18986 | |
dc.description.abstract |
Audio segmentation and sound event detection are crucial topics in machine listening that aim to detect acoustic classes and their respective boundaries. They are useful for audio-content analysis, speech recognition, audio indexing, and music information retrieval. In recent years, most research has adopted segmentation-by-classification, a technique that divides audio into small frames and classifies each frame individually. In this paper, we present a novel approach called You Only Hear Once (YOHO), inspired by the YOLO algorithm popularly adopted in computer vision. Instead of frame-based classification, we cast the detection of acoustic boundaries as a regression problem: separate output neurons detect the presence of an audio class and predict its start and end points. Compared to the state-of-the-art Convolutional Recurrent Neural Network, the relative improvement in F-measure of YOHO ranged from 1% to 6% across multiple datasets for audio segmentation and sound event detection. Because the output of YOHO is more end-to-end and has fewer neurons to predict, inference is at least 6 times faster than segmentation-by-classification. In addition, because this approach predicts acoustic boundaries directly, post-processing and smoothing are about 7 times faster. | |
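A minimal sketch of the kind of output encoding the abstract describes, written in Python/NumPy. The window length, number of output bins, and two-class (music/speech) setup below are illustrative assumptions, not the paper's exact configuration; the point is that each output bin carries three neurons per class, [presence, start, end], so boundary detection becomes a regression target rather than a frame-wise label.

import numpy as np

# Hypothetical settings (not taken from the paper): an 8 s audio window
# split into 26 output bins, with 2 acoustic classes (music, speech).
WINDOW_S = 8.0
NUM_BINS = 26
CLASSES = ["music", "speech"]
BIN_S = WINDOW_S / NUM_BINS

def encode_targets(events):
    """Encode (class, start_s, end_s) events as a (NUM_BINS, 3*K) grid.

    For every output bin and class, three values mirror the abstract's
    'separate output neurons': [presence, start, end], with start/end
    expressed relative to the bin's own time span.
    """
    K = len(CLASSES)
    y = np.zeros((NUM_BINS, 3 * K), dtype=np.float32)
    for cls, start_s, end_s in events:
        k = CLASSES.index(cls)
        first = int(start_s // BIN_S)
        last = min(int(np.ceil(end_s / BIN_S)), NUM_BINS) - 1
        for b in range(first, last + 1):
            bin_start = b * BIN_S
            # Clip the event to this bin and normalise offsets to [0, 1].
            s = max(start_s - bin_start, 0.0) / BIN_S
            e = min(end_s - bin_start, BIN_S) / BIN_S
            y[b, 3 * k] = 1.0        # presence neuron (classification)
            y[b, 3 * k + 1] = s      # start neuron (regression)
            y[b, 3 * k + 2] = e      # end neuron (regression)
    return y

# Example: music from 0.0-3.2 s overlapping speech from 2.5-8.0 s.
targets = encode_targets([("music", 0.0, 3.2), ("speech", 2.5, 8.0)])
print(targets.shape)  # (26, 6)

As in YOLO, a training loss would typically penalise the start/end regressions only in bins where the class is actually present; the exact loss weighting used by YOHO is not reproduced here.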
dc.format.extent | 3293-3293 | |
dc.language | en | |
dc.language.iso | en | |
dc.publisher | MDPI | |
dc.subject | audio segmentation | |
dc.subject | sound event detection | |
dc.subject | you only look once | |
dc.subject | deep learning | |
dc.subject | regression | |
dc.subject | convolutional neural network | |
dc.subject | music-speech detection | |
dc.subject | convolutional recurrent neural network | |
dc.subject | radio | |
dc.title | You Only Hear Once: A YOLO-like Algorithm for Audio Segmentation and Sound Event Detection | |
dc.type | Journal Article | |
plymouth.author-url | https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000780536900001&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=11bb513d99f797142bcfeffcc58ea008 | |
plymouth.issue | 7 | |
plymouth.volume | 12 | |
plymouth.publication-status | Published online | |
plymouth.journal | Applied Sciences | |
dc.identifier.doi | 10.3390/app12073293 | |
plymouth.organisational-group | /Plymouth | |
plymouth.organisational-group | /Plymouth/Faculty of Arts, Humanities and Business | |
plymouth.organisational-group | /Plymouth/Faculty of Arts, Humanities and Business/School of Society and Culture | |
plymouth.organisational-group | /Plymouth/REF 2021 Researchers by UoA | |
plymouth.organisational-group | /Plymouth/REF 2021 Researchers by UoA/UoA33 Music, Drama, Dance, Performing Arts, Film and Screen Studies | |
plymouth.organisational-group | /Plymouth/Users by role | |
plymouth.organisational-group | /Plymouth/Users by role/Academics | |
dcterms.dateAccepted | 2022-03-22 | |
dc.rights.embargodate | 2022-04-05 | |
dc.identifier.eissn | 2076-3417 | |
dc.rights.embargoperiod | Not known | |
rioxxterms.funder | Engineering and Physical Sciences Research Council | |
rioxxterms.identifier.project | Radio Me: Real-time Radio Remixing for people with mild to moderate dementia who live alone, incorporating Agitation Reduction, and Reminders | |
rioxxterms.versionofrecord | 10.3390/app12073293 | |
rioxxterms.licenseref.uri | http://www.rioxx.net/licenses/all-rights-reserved | |
rioxxterms.licenseref.startdate | 2022-03-24 | |
rioxxterms.type | Journal Article/Review | |
plymouth.funder | Radio Me: Real-time Radio Remixing for people with mild to moderate dementia who live alone, incorporating Agitation Reduction, and Reminders::Engineering and Physical Sciences Research Council |