You Only Hear Once: A YOLO-like Algorithm for Audio Segmentation and Sound Event Detection
dc.contributor.author | Venkatesh, S | |
dc.contributor.author | Moffat, D | |
dc.contributor.author | Miranda, Eduardo | |
dc.date.accessioned | 2022-04-01T18:00:38Z | |
dc.date.available | 2022-04-01T18:00:38Z | |
dc.date.issued | 2022-03-24 | |
dc.identifier.issn | 2076-3417 | |
dc.identifier.other | 3293 | |
dc.identifier.uri | http://hdl.handle.net/10026.1/18986 | |
dc.description.abstract |
Audio segmentation and sound event detection are crucial topics in machine listening that aim to detect acoustic classes and their respective boundaries. They are useful for audio-content analysis, speech recognition, audio indexing, and music information retrieval. In recent years, most research has adopted segmentation-by-classification, a technique that divides audio into small frames and classifies each frame individually. In this paper, we present a novel approach called You Only Hear Once (YOHO), inspired by the YOLO algorithm popularly adopted in computer vision. Instead of frame-based classification, we cast the detection of acoustic boundaries as a regression problem: separate output neurons detect the presence of an audio class and predict its start and end points. Compared to the state-of-the-art Convolutional Recurrent Neural Network, the relative improvement in F-measure of YOHO ranged from 1% to 6% across multiple datasets for audio segmentation and sound event detection. Because the output of YOHO is more end-to-end and has fewer neurons to predict, inference is at least 6 times faster than segmentation-by-classification. In addition, because this approach predicts acoustic boundaries directly, post-processing and smoothing are about 7 times faster. | |
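A minimal sketch of the kind of output encoding the abstract describes, written in Python/NumPy. The window length, number of output bins, and two-class (music/speech) setup below are illustrative assumptions, not the paper's exact configuration; the point is that each output bin carries three neurons per class, [presence, start, end], so boundary detection becomes a regression target rather than a frame-wise label.

import numpy as np

# Hypothetical settings (not taken from the paper): an 8 s audio window
# split into 26 output bins, with 2 acoustic classes (music, speech).
WINDOW_S = 8.0
NUM_BINS = 26
CLASSES = ["music", "speech"]
BIN_S = WINDOW_S / NUM_BINS

def encode_targets(events):
    """Encode (class, start_s, end_s) events as a (NUM_BINS, 3*K) grid.

    For every output bin and class, three values mirror the abstract's
    'separate output neurons': [presence, start, end], with start/end
    expressed relative to the bin's own time span.
    """
    K = len(CLASSES)
    y = np.zeros((NUM_BINS, 3 * K), dtype=np.float32)
    for cls, start_s, end_s in events:
        k = CLASSES.index(cls)
        first = int(start_s // BIN_S)
        last = min(int(np.ceil(end_s / BIN_S)), NUM_BINS) - 1
        for b in range(first, last + 1):
            bin_start = b * BIN_S
            # Clip the event to this bin and normalise offsets to [0, 1].
            s = max(start_s - bin_start, 0.0) / BIN_S
            e = min(end_s - bin_start, BIN_S) / BIN_S
            y[b, 3 * k] = 1.0        # presence neuron (classification)
            y[b, 3 * k + 1] = s      # start neuron (regression)
            y[b, 3 * k + 2] = e      # end neuron (regression)
    return y

# Example: music from 0.0-3.2 s overlapping speech from 2.5-8.0 s.
targets = encode_targets([("music", 0.0, 3.2), ("speech", 2.5, 8.0)])
print(targets.shape)  # (26, 6)

As in YOLO, a training loss would typically penalise the start/end regressions only in bins where the class is actually present; the exact loss weighting used by YOHO is not reproduced here.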
dc.format.extent | 3293-3293 | |
dc.language | en | |
dc.language.iso | en | |
dc.publisher | MDPI | |
dc.subject | audio segmentation | |
dc.subject | sound event detection | |
dc.subject | you only look once | |
dc.subject | deep learning | |
dc.subject | regression | |
dc.subject | convolutional neural network | |
dc.subject | music-speech detection | |
dc.subject | convolutional recurrent neural network | |
dc.subject | radio | |
dc.title | You Only Hear Once: A YOLO-like Algorithm for Audio Segmentation and Sound Event Detection | |
dc.type | Journal Article | |
plymouth.author-url | https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000780536900001&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=11bb513d99f797142bcfeffcc58ea008 | |
plymouth.issue | 7 | |
plymouth.volume | 12 | |
plymouth.publication-status | Published online | |
plymouth.journal | Applied Sciences | |
dc.identifier.doi | 10.3390/app12073293 | |
plymouth.organisational-group | /Plymouth | |
plymouth.organisational-group | /Plymouth/Faculty of Arts, Humanities and Business | |
plymouth.organisational-group | /Plymouth/Faculty of Arts, Humanities and Business/School of Society and Culture | |
plymouth.organisational-group | /Plymouth/REF 2021 Researchers by UoA | |
plymouth.organisational-group | /Plymouth/REF 2021 Researchers by UoA/UoA33 Music, Drama, Dance, Performing Arts, Film and Screen Studies | |
plymouth.organisational-group | /Plymouth/Users by role | |
plymouth.organisational-group | /Plymouth/Users by role/Academics | |
dcterms.dateAccepted | 2022-03-22 | |
dc.rights.embargodate | 2022-04-05 | |
dc.identifier.eissn | 2076-3417 | |
dc.rights.embargoperiod | Not known | |
rioxxterms.funder | Engineering and Physical Sciences Research Council | |
rioxxterms.identifier.project | Radio Me: Real-time Radio Remixing for people with mild to moderate dementia who live alone, incorporating Agitation Reduction, and Reminders | |
rioxxterms.versionofrecord | 10.3390/app12073293 | |
rioxxterms.licenseref.uri | http://www.rioxx.net/licenses/all-rights-reserved | |
rioxxterms.licenseref.startdate | 2022-03-24 | |
rioxxterms.type | Journal Article/Review | |
plymouth.funder | Radio Me: Real-time Radio Remixing for people with mild to moderate dementia who live alone, incorporating Agitation Reduction, and Reminders::Engineering and Physical Sciences Research Council |