Erin Browne


Despite current efforts to study deep-sea life, assessing it at the scale and pace required to inform effective management and conservation still depends on technological advances. Data collection platforms such as Remotely Operated Vehicles (ROVs) are routinely applied to studying the deep sea because of their ability to collect large image-based datasets. Data collection in the deep sea is therefore becoming less of a problem than data interpretation: terabytes of video data are collected during expeditions, and manual interpretation is currently the standard procedure. Deep learning (DL), a sub-field of artificial intelligence (AI), has the potential to address this issue thanks to its ability to analyse vast image-based datasets with minimal human interaction, often exceeding human experts in efficiency while achieving near-equivalent classification accuracy. This thesis investigates how DL, in particular Convolutional Neural Networks (CNNs), has progressed to the point of potential application in deep-sea research for the detection and classification of organisms in image-based datasets. Different methodological approaches to training “off-the-shelf” CNNs on ROV datasets are assessed, with the aim of informing marine scientists with little background in computer science of the steps and considerations required when using CNNs for such tasks. In conjunction, a potential pipeline for performing real-time detection and classification during research expeditions is outlined. The research conducted in this thesis suggests that CNN architectures perform differently given different training approaches and training image datasets, each with their own trade-offs. Maximum performance was achieved using the You Only Look Once (YOLO) version 3 architecture and a train-from-scratch (TS) approach, with further improvements when using the pre-processed training image dataset.
This gave 93% recall and 63% precision in detecting areas of presence-absence, and a strong correlation (73%) between estimated counts of Syringammina fragilissima and manual counts. Overall, the results suggest that classifier performance was mostly affected by the architecture type (version 3 vs. version 4) and the pre-processing steps chosen. However, differences in the standard computer vision (CV) metrics assessed were minimal, meaning simpler approaches could be used, streamlining the procedure for non-experts. The pipeline for real-time detection and classification of S. fragilissima on an ROV livestream on board a vessel performed efficiently at 25 frames per second (FPS), requiring no more than 12 GB of video RAM (VRAM) on an NVIDIA GeForce RTX 3090 Graphics Processing Unit (GPU), making it an achievable and cost-effective set-up for scientists on lower budgets. Despite these results, it is noted that even the best-performing classifier still produces false positives and false negatives, meaning a degree of human intervention to check the data is required for reliable ecological metrics. The described pipeline can therefore achieve real-time detection on board a vessel; however, classifier performance depends on how it is trained, making the dataset used to train the CNNs integral to its performance. Understanding the impact of pre-processing training image datasets is a key area for future work on improving “off-the-shelf” CNNs, as these are more user-friendly to implement than designing a personalised CNN architecture. This provides a stepping stone for non-experts in using such advanced analytical tools, and could lead to major increases in data availability for the conservation and management of the deep sea.
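The recall and precision figures quoted above are the standard detection metrics derived from true positives, false positives, and false negatives. As a minimal illustrative sketch of how these are computed (the counts used here are hypothetical examples, not the thesis's actual confusion-matrix data):

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Compute precision and recall from detection counts.

    tp: true positives (correct detections)
    fp: false positives (spurious detections)
    fn: false negatives (missed organisms)
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical counts chosen only to illustrate the formulas:
# 93 correct detections, 55 false alarms, 7 misses.
p, r = precision_recall(tp=93, fp=55, fn=7)
print(f"precision={p:.2f}, recall={r:.2f}")
```

A classifier with high recall but moderate precision, as reported above, misses few organisms but flags spurious detections, which is why a human verification pass remains necessary before deriving ecological metrics.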
