Show simple item record

dc.contributor.supervisor: Miranda, Eduardo
dc.contributor.author: Venkatesh, Satvik
dc.contributor.other: School of Society and Culture (en_US)
dc.date.accessioned: 2022-12-19T10:09:16Z
dc.date.available: 2022-12-19T10:09:16Z
dc.date.issued: 2022
dc.identifier: 10577651 (en_US)
dc.identifier.uri: http://hdl.handle.net/10026.1/20092
dc.description.abstract:

Audio segmentation divides an audio signal into homogeneous sections, such as music and speech. It is useful as a preprocessing step for indexing, storing, and modifying audio recordings, radio broadcasts, and TV programmes. Machine learning models for audio segmentation are generally trained on copyrighted material, which cannot be shared across research groups. Furthermore, annotating these datasets is a time-consuming and expensive task. In this thesis, we present a novel approach that artificially synthesises data resembling radio signals. We replicate the workflow of a radio DJ in mixing audio and investigate parameters such as fade curves and audio ducking. Using this approach, we obtained state-of-the-art performance for music-speech detection on in-house and public datasets. After demonstrating the efficacy of training-set synthesis, we investigated how audio ducking of background music affects the precision and recall of the machine learning algorithm. Interestingly, the minimum level of audio ducking preferred by the algorithm was similar to that preferred by human listeners. Furthermore, our proposed synthesis technique outperforms real-world data in some cases and serves as a promising alternative. This project also proposes a novel deep learning system called You Only Hear Once (YOHO), inspired by the YOLO algorithm widely adopted in computer vision. We convert the detection of acoustic boundaries into a regression problem instead of frame-based classification. The relative improvement in F-measure of YOHO, compared to the state-of-the-art Convolutional Recurrent Neural Network, ranged from 1% to 6% across multiple datasets. Because YOHO predicts acoustic boundaries directly, inference and post-processing are 6 times faster than frame-based classification. Furthermore, we investigated domain generalisation methods such as transfer learning and adversarial training, and demonstrated that these methods helped our algorithm perform better in unseen domains.

In addition to audio segmentation, another objective of this project is to explore real-time radio remixing. This is a step towards building a customised radio and, consequently, integrating it with the listener's schedule. The system would remix music from the user's personal playlist and play snippets of diary reminders at appropriate transition points. The intelligent remixing is governed by the underlying audio segmentation and other deep learning methods. We also explore how individuals can communicate with intelligent mixing systems through non-technical language, and demonstrate that word embeddings help in understanding representations of semantic descriptors.
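The DJ-style mixing workflow described in the abstract (overlaying speech on background music with audio ducking and a fade curve) can be sketched in a few lines. This is a minimal illustrative re-implementation of the general idea, not the thesis code; the function name, parameters, and defaults are all assumptions.

```python
import numpy as np

def synthesise_radio_mix(speech, music, sr=22050, duck_db=-15.0, fade_s=0.5):
    """Overlay speech on background music, ducking the music by `duck_db`
    and fading the ducking in over `fade_s` seconds (hypothetical sketch)."""
    n = min(len(speech), len(music))
    speech, music = speech[:n], music[:n]
    duck_gain = 10.0 ** (duck_db / 20.0)        # convert dB attenuation to linear gain
    gain = np.full(n, duck_gain)                # music held at the ducked level
    fade = int(fade_s * sr)
    # linear fade curve from full level down to the ducked level
    gain[:fade] = np.linspace(1.0, duck_gain, fade)
    return speech + gain * music
```

Segment labels for training (speech active, music active) come for free, since the synthesiser knows exactly where each source starts and stops — which is the point of training-set synthesis over manual annotation.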

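YOHO's core idea, per the abstract, is to regress acoustic boundaries directly rather than classify every frame. One common way to realise this (the sketch below is a hedged reconstruction of the idea, not the thesis implementation) is to divide the output timeline into cells and, for each cell and class, predict a presence score plus relative start and stop positions:

```python
import numpy as np

def yoho_targets(events, duration, n_cells, classes=("music", "speech")):
    """Convert (label, onset, offset) annotations in seconds into YOHO-style
    regression targets of shape (n_cells, n_classes, 3):
    [presence, relative start, relative stop] per cell and class.
    Illustrative reconstruction; names and layout are assumptions."""
    cell = duration / n_cells
    y = np.zeros((n_cells, len(classes), 3))
    for label, onset, offset in events:
        k = classes.index(label)
        first = int(onset // cell)
        last = int(np.ceil(offset / cell)) - 1
        for i in range(first, min(last, n_cells - 1) + 1):
            c0, c1 = i * cell, (i + 1) * cell
            y[i, k, 0] = 1.0                           # class is present in this cell
            y[i, k, 1] = (max(onset, c0) - c0) / cell  # start, relative to the cell
            y[i, k, 2] = (min(offset, c1) - c0) / cell # stop, relative to the cell
    return y
```

Because each boundary is read directly off the regression output, post-processing reduces to rescaling a handful of cell-relative values, instead of smoothing and thresholding a dense per-frame classification sequence — consistent with the speed-up reported in the abstract.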
dc.language.iso: en
dc.publisher: University of Plymouth
dc.rights: Attribution 3.0 United States
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/us/
dc.subject: audio segmentation (en_US)
dc.subject: sound event detection (en_US)
dc.subject: deep learning (en_US)
dc.subject: music-speech detection (en_US)
dc.subject: intelligent remixing (en_US)
dc.subject: radio (en_US)
dc.subject: audio classification (en_US)
dc.subject: music information retrieval (en_US)
dc.subject.classification: PhD (en_US)
dc.title: Deep Learning for Audio Segmentation and Intelligent Remixing (en_US)
dc.type: Thesis
plymouth.version: publishable (en_US)
dc.identifier.doi: http://dx.doi.org/10.24382/778
dc.rights.embargoperiod: No embargo (en_US)
dc.type.qualification: Doctorate (en_US)
rioxxterms.funder: Engineering and Physical Sciences Research Council (en_US)
rioxxterms.identifier.project: Radio Me: Real-time Radio Remixing for people with mild to moderate dementia who live alone, incorporating Agitation Reduction, and Reminders (en_US)
rioxxterms.version: NA
plymouth.orcid.id: http://orcid.org/0000-0001-5244-3020 (en_US)


Files in this item


This item appears in the following Collection(s)


Attribution 3.0 United States
Except where otherwise noted, this item's license is described as Attribution 3.0 United States

All items in PEARL are protected by copyright law.
Author manuscripts deposited to comply with open access mandates are made available in accordance with publisher policies. Please cite only the published version using the details provided on the item record or document. In the absence of an open licence (e.g. Creative Commons), permissions for further reuse of content should be sought from the publisher or author.