Investigating the Effects of Training Set Synthesis for Audio Segmentation of Radio Broadcast
dc.contributor.author | Venkatesh, S | |
dc.contributor.author | Moffat, David | |
dc.contributor.author | Miranda, Eduardo | |
dc.contributor.editor | Lerch A | |
dc.contributor.editor | Knees P | |
dc.date.accessioned | 2021-03-31T11:03:47Z | |
dc.date.issued | 2021-03-31 | |
dc.identifier.issn | 2079-9292 | |
dc.identifier.other | 827 | |
dc.identifier.uri | http://hdl.handle.net/10026.1/17008 | |
dc.description | Special Issue "Machine Learning Applied to Music/Audio Signal Processing" | |
dc.description.abstract |
Music and speech detection provides us with valuable information regarding the nature of content in broadcast audio. It helps detect acoustic regions that contain speech, voice over music, only music, or silence. In recent years, there have been developments in machine learning algorithms to accomplish this task. However, broadcast audio is generally well-mixed and copyrighted, which makes it challenging to share across research groups. In this study, we address the challenges encountered in automatically synthesising data that resembles a radio broadcast. First, we compare state-of-the-art neural network architectures such as CNN, GRU, LSTM, TCN, and CRNN. Second, we investigate how audio ducking of background music impacts the precision and recall of the machine learning algorithm. Third, we examine how the quantity of synthetic training data impacts the results. Finally, we evaluate the effectiveness of synthesised, real-world, and combined approaches for training models, to understand whether the synthetic data presents any additional value. Amongst the network architectures, CRNN was the best-performing network. Results also show that the minimum level of audio ducking preferred by the machine learning algorithm was similar to that preferred by human listeners. After testing our model on in-house and public datasets, we observe that our proposed synthesis technique outperforms real-world data in some cases and serves as a promising alternative. | |
dc.format.extent | 827-827 | |
dc.language | en | |
dc.language.iso | en | |
dc.publisher | MDPI | |
dc.subject | audio classification | |
dc.subject | audio ducking | |
dc.subject | audio segmentation | |
dc.subject | automatic mixing | |
dc.subject | Convolutional Recurrent Neural Network | |
dc.subject | deep learning | |
dc.subject | music information retrieval | |
dc.subject | music-speech detection | |
dc.subject | radio | |
dc.subject | training set synthesis | |
dc.title | Investigating the Effects of Training Set Synthesis for Audio Segmentation of Radio Broadcast | |
dc.type | journal-article | |
dc.type | Journal Article | |
plymouth.author-url | https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000638313000001&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=11bb513d99f797142bcfeffcc58ea008 | |
plymouth.issue | 7 | |
plymouth.volume | 10 | |
plymouth.publisher-url | https://www.mdpi.com/2079-9292/10/7/827 | |
plymouth.publication-status | Published online | |
plymouth.journal | Electronics | |
dc.identifier.doi | 10.3390/electronics10070827 | |
plymouth.organisational-group | /Plymouth | |
plymouth.organisational-group | /Plymouth/Faculty of Arts, Humanities and Business | |
plymouth.organisational-group | /Plymouth/Faculty of Arts, Humanities and Business/School of Society and Culture | |
plymouth.organisational-group | /Plymouth/REF 2021 Researchers by UoA | |
plymouth.organisational-group | /Plymouth/REF 2021 Researchers by UoA/UoA33 Music, Drama, Dance, Performing Arts, Film and Screen Studies | |
plymouth.organisational-group | /Plymouth/Users by role | |
plymouth.organisational-group | /Plymouth/Users by role/Academics | |
dcterms.dateAccepted | 2021-03-28 | |
dc.rights.embargodate | 2021-04-01 | |
dc.identifier.eissn | 2079-9292 | |
dc.rights.embargoperiod | Not known | |
rioxxterms.funder | Engineering and Physical Sciences Research Council | |
rioxxterms.identifier.project | Radio Me: Real-time Radio Remixing for people with mild to moderate dementia who live alone, incorporating Agitation Reduction, and Reminders | |
rioxxterms.version | Version of Record | |
rioxxterms.versionofrecord | 10.3390/electronics10070827 | |
rioxxterms.licenseref.uri | http://www.rioxx.net/licenses/all-rights-reserved | |
rioxxterms.licenseref.startdate | 2021-03-31 | |
rioxxterms.type | Journal Article/Review | |
plymouth.funder | Radio Me: Real-time Radio Remixing for people with mild to moderate dementia who live alone, incorporating Agitation Reduction, and Reminders::Engineering and Physical Sciences Research Council |