
dc.contributor.author: Venkatesh, S
dc.contributor.author: Moffat, David
dc.contributor.author: Miranda, Eduardo
dc.contributor.editor: Lerch, A
dc.contributor.editor: Knees, P
dc.date.accessioned: 2021-03-31T11:03:47Z
dc.date.issued: 2021-03-31
dc.identifier.issn: 2079-9292
dc.identifier.other: 827
dc.identifier.uri: http://hdl.handle.net/10026.1/17008
dc.description: Special Issue "Machine Learning Applied to Music/Audio Signal Processing"
dc.description.abstract:

Music and speech detection provides valuable information about the nature of the content in broadcast audio. It helps detect acoustic regions that contain speech, voice over music, only music, or silence. In recent years, machine learning algorithms have been developed to accomplish this task. However, broadcast audio is generally well-mixed and copyrighted, which makes it difficult to share across research groups. In this study, we address the challenges encountered in automatically synthesising data that resembles a radio broadcast. First, we compare state-of-the-art neural network architectures such as CNN, GRU, LSTM, TCN, and CRNN. Second, we investigate how audio ducking of background music affects the precision and recall of the machine learning algorithm. Third, we examine how the quantity of synthetic training data affects the results. Finally, we evaluate the effectiveness of synthesised, real-world, and combined training data, to determine whether the synthetic data adds any value. Among the network architectures, the CRNN performed best. Results also show that the minimum level of audio ducking preferred by the machine learning algorithm was similar to that preferred by human listeners. After testing our model on in-house and public datasets, we observe that our proposed synthesis technique outperforms real-world data in some cases and serves as a promising alternative.

dc.format.extent: 827-827
dc.language: en
dc.language.iso: en
dc.publisher: MDPI
dc.subject: audio classification
dc.subject: audio ducking
dc.subject: audio segmentation
dc.subject: automatic mixing
dc.subject: Convolutional Recurrent Neural Network
dc.subject: deep learning
dc.subject: music information retrieval
dc.subject: music-speech detection
dc.subject: radio
dc.subject: training set synthesis
dc.title: Investigating the Effects of Training Set Synthesis for Audio Segmentation of Radio Broadcast
dc.type: journal-article
dc.type: Journal Article
plymouth.author-url: https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000638313000001&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=11bb513d99f797142bcfeffcc58ea008
plymouth.issue: 7
plymouth.volume: 10
plymouth.publisher-url: https://www.mdpi.com/2079-9292/10/7/827
plymouth.publication-status: Published online
plymouth.journal: Electronics
dc.identifier.doi: 10.3390/electronics10070827
plymouth.organisational-group: /Plymouth
plymouth.organisational-group: /Plymouth/Faculty of Arts, Humanities and Business
plymouth.organisational-group: /Plymouth/Faculty of Arts, Humanities and Business/School of Society and Culture
plymouth.organisational-group: /Plymouth/REF 2021 Researchers by UoA
plymouth.organisational-group: /Plymouth/REF 2021 Researchers by UoA/UoA33 Music, Drama, Dance, Performing Arts, Film and Screen Studies
plymouth.organisational-group: /Plymouth/Users by role
plymouth.organisational-group: /Plymouth/Users by role/Academics
dcterms.dateAccepted: 2021-03-28
dc.rights.embargodate: 2021-04-01
dc.identifier.eissn: 2079-9292
dc.rights.embargoperiod: Not known
rioxxterms.funder: Engineering and Physical Sciences Research Council
rioxxterms.identifier.project: Radio Me: Real-time Radio Remixing for people with mild to moderate dementia who live alone, incorporating Agitation Reduction, and Reminders
rioxxterms.version: Version of Record
rioxxterms.versionofrecord: 10.3390/electronics10070827
rioxxterms.licenseref.uri: http://www.rioxx.net/licenses/all-rights-reserved
rioxxterms.licenseref.startdate: 2021-03-31
rioxxterms.type: Journal Article/Review
plymouth.funder: Radio Me: Real-time Radio Remixing for people with mild to moderate dementia who live alone, incorporating Agitation Reduction, and Reminders::Engineering and Physical Sciences Research Council




All items in PEARL are protected by copyright law.
Author manuscripts deposited to comply with open access mandates are made available in accordance with publisher policies. Please cite only the published version using the details provided on the item record or document. In the absence of an open licence (e.g. Creative Commons), permissions for further reuse of content should be sought from the publisher or author.