Show simple item record

dc.contributor.author: Venkatesh, S
dc.contributor.author: Moffat, D
dc.contributor.author: Miranda, Eduardo
dc.date.accessioned: 2022-04-01T18:00:38Z
dc.date.available: 2022-04-01T18:00:38Z
dc.date.issued: 2022-03-24
dc.identifier.issn: 2076-3417
dc.identifier.other: 3293
dc.identifier.uri: http://hdl.handle.net/10026.1/18986
dc.description.abstract:

Audio segmentation and sound event detection are crucial topics in machine listening that aim to detect acoustic classes and their respective boundaries. They are useful for audio-content analysis, speech recognition, audio indexing, and music information retrieval. In recent years, most research articles have adopted segmentation-by-classification, a technique that divides audio into small frames and performs classification on each frame individually. In this paper, we present a novel approach called You Only Hear Once (YOHO), inspired by the YOLO algorithm popularly adopted in computer vision. We convert the detection of acoustic boundaries into a regression problem instead of frame-based classification, using separate output neurons to detect the presence of an audio class and to predict its start and end points. The relative improvement in F-measure of YOHO over the state-of-the-art Convolutional Recurrent Neural Network ranged from 1% to 6% across multiple datasets for audio segmentation and sound event detection. Because the output of YOHO is more end-to-end and has fewer neurons to predict, inference is at least 6 times faster than segmentation-by-classification. In addition, because this approach predicts acoustic boundaries directly, post-processing and smoothing are about 7 times faster.
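The abstract's core idea — replacing frame-wise classification with regression of event boundaries — can be illustrated by decoding a YOHO-style output grid into events. This is a minimal sketch under assumed conventions (output shape, bin duration, and field order are illustrative, not the authors' exact implementation): each time bin holds, per class, a presence score and fractional start/end offsets, which are converted to absolute times and merged across adjacent bins.

```python
import numpy as np

def decode_yoho_output(preds, bin_duration=0.5, threshold=0.5):
    """Decode a YOHO-style output grid into (class, start, end) events.

    preds: array of shape (num_bins, num_classes, 3), where the last
    axis holds [presence, start_offset, end_offset]; the offsets are
    fractions of the bin.  These shapes and names are assumptions for
    illustration, not the paper's exact layout.
    """
    events = []
    num_bins, num_classes, _ = preds.shape
    for b in range(num_bins):
        for c in range(num_classes):
            presence, start_frac, end_frac = preds[b, c]
            if presence >= threshold:
                # Regress absolute boundaries from the bin index plus
                # the predicted fractional offsets.
                start = (b + start_frac) * bin_duration
                end = (b + end_frac) * bin_duration
                events.append((c, float(start), float(end)))
    # Merge same-class events whose boundaries touch or overlap, so a
    # sound spanning several bins becomes a single event.
    merged = []
    for cls, start, end in sorted(events):
        if merged and merged[-1][0] == cls and start <= merged[-1][2]:
            merged[-1] = (cls, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((cls, start, end))
    return merged
```

Because the network regresses boundaries directly, this decoding step replaces the per-frame median filtering that segmentation-by-classification systems typically need, which is consistent with the speed-ups the abstract reports.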

dc.format.extent: 3293-3293
dc.language: en
dc.language.iso: en
dc.publisher: MDPI
dc.subject: audio segmentation
dc.subject: sound event detection
dc.subject: you only look once
dc.subject: deep learning
dc.subject: regression
dc.subject: convolutional neural network
dc.subject: music-speech detection
dc.subject: convolutional recurrent neural network
dc.subject: radio
dc.title: You Only Hear Once: A YOLO-like Algorithm for Audio Segmentation and Sound Event Detection
dc.type: journal-article
dc.type: Journal Article
plymouth.author-url: https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000780536900001&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=11bb513d99f797142bcfeffcc58ea008
plymouth.issue: 7
plymouth.volume: 12
plymouth.publication-status: Published online
plymouth.journal: Applied Sciences
dc.identifier.doi: 10.3390/app12073293
plymouth.organisational-group: /Plymouth
plymouth.organisational-group: /Plymouth/Faculty of Arts, Humanities and Business
plymouth.organisational-group: /Plymouth/Faculty of Arts, Humanities and Business/School of Society and Culture
plymouth.organisational-group: /Plymouth/REF 2021 Researchers by UoA
plymouth.organisational-group: /Plymouth/REF 2021 Researchers by UoA/UoA33 Music, Drama, Dance, Performing Arts, Film and Screen Studies
plymouth.organisational-group: /Plymouth/Users by role
plymouth.organisational-group: /Plymouth/Users by role/Academics
dcterms.dateAccepted: 2022-03-22
dc.rights.embargodate: 2022-04-05
dc.identifier.eissn: 2076-3417
dc.rights.embargoperiod: Not known
rioxxterms.funder: Engineering and Physical Sciences Research Council
rioxxterms.identifier.project: Radio Me: Real-time Radio Remixing for people with mild to moderate dementia who live alone, incorporating Agitation Reduction, and Reminders
rioxxterms.versionofrecord: 10.3390/app12073293
rioxxterms.licenseref.uri: http://www.rioxx.net/licenses/all-rights-reserved
rioxxterms.licenseref.startdate: 2022-03-24
rioxxterms.type: Journal Article/Review
plymouth.funder: Radio Me: Real-time Radio Remixing for people with mild to moderate dementia who live alone, incorporating Agitation Reduction, and Reminders::Engineering and Physical Sciences Research Council
Files in this item

This item appears in the following Collection(s)

All items in PEARL are protected by copyright law.
Author manuscripts deposited to comply with open access mandates are made available in accordance with publisher policies. Please cite only the published version using the details provided on the item record or document. In the absence of an open licence (e.g. Creative Commons), permissions for further reuse of content should be sought from the publisher or author.