Automated Identification of Digital Evidence across Heterogeneous Data Resources

Mohammed, Hussam J

dc.contributor.supervisor	Clarke, Nathan
dc.contributor.author	Mohammed, Hussam J
dc.contributor.other	School of Engineering, Computing and Mathematics	en_US
dc.date.accessioned	2018-11-19T17:42:39Z
dc.date.issued	2018
dc.identifier	10494170	en_US
dc.identifier.uri	http://hdl.handle.net/10026.1/12839
dc.description.abstract	Digital forensics has become an increasingly important tool in the fight against cyber and computer-assisted crime. However, with an increasing range of technologies at people’s disposal, investigators find themselves having to process and analyse many systems (e.g., PCs, laptops, tablets, and smartphones), each holding large volumes of data, within a single case. Unfortunately, current digital forensic tools operate in an isolated manner, investigating systems and applications individually. The heterogeneity and volume of evidence place time constraints and a significant burden on investigators. Examples of heterogeneity include messaging applications (e.g., iMessenger, Viber, Snapchat, and WhatsApp), web browsers (e.g., Firefox and Google Chrome), and file systems (e.g., NTFS, FAT, and HFS). Being able to analyse and investigate evidence from across devices and applications in a universal and harmonised fashion would enable investigators to query all data at once. In addition, successfully prioritising evidence and reducing the volume of data to be analysed would reduce the time taken and the cognitive load on the investigator. This thesis focuses on the examination and analysis phases of the digital investigation process. It explores the feasibility of handling large, heterogeneous data sources in order to correlate evidence across them in an automated way. To that end, a novel approach was developed that addresses the heterogeneity of big data using three algorithms: harmonisation, clustering, and automated identification of evidence (AIE). The harmonisation algorithm seeks to provide an automated framework that merges similar datasets by characterising similar metadata categories and then harmonising them into a single dataset. This overcomes heterogeneity issues and eases examination and analysis, because evidential artefacts from across devices and applications can be queried at once by category. Working on the merged datasets, the clustering algorithm uses file metadata to identify evidential files and isolate unrelated ones. The AIE algorithm then tries to identify the cluster holding the largest number of evidential artefacts by searching with two kinds of input: criminal profiling activities and some information from the criminals themselves. Related clusters are then identified through timeline analysis and a search for artefacts associated with the files in the first cluster. A series of experiments using real-life forensic datasets was conducted to evaluate the algorithms across five different categories of datasets (i.e., messaging, graphical files, file system, internet history, and emails), each containing data from different applications across different devices. The results of the characterisation and harmonisation process show that the algorithm can merge all fields successfully, with the exception of some binary-based data found within the messaging datasets (contained within Viber and SMS). The error occurred because the characterisation process lacked the information needed to make a useful determination. Further analysis, however, found that the error had minimal impact on the subsequently merged data. The results for the clustering process and the AIE algorithm showed that, working together, the two algorithms can identify more than 92% of evidential files.	en_US
dc.description.sponsorship	HCED Iraq	en_US
dc.language.iso	en
dc.publisher	University of Plymouth
dc.subject	Digital Forensics	en_US
dc.subject	Heterogeneous Data	en_US
dc.subject.classification	PhD	en_US
dc.title	Automated Identification of Digital Evidence across Heterogeneous Data Resources	en_US
dc.type	Thesis
plymouth.version	publishable	en_US
dc.identifier.doi	http://dx.doi.org/10.24382/693
dc.rights.embargodate	2019-11-19T17:42:39Z
dc.rights.embargoperiod	12 months	en_US
dc.type.qualification	Doctorate	en_US
rioxxterms.version	NA
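
The harmonisation step outlined in the abstract (characterising similar metadata categories, then merging them into a single dataset) might look roughly like the sketch below. It is not the thesis's algorithm: the synonym table, the field names, the sample records, and the functions `characterise` and `harmonise` are all hypothetical, chosen only to illustrate how fields from different messaging sources could be mapped to shared categories and queried at once.

```python
from typing import Dict, List, Optional

# Hypothetical synonym table: canonical category -> field names that
# different source applications might use. Not taken from the thesis.
CATEGORY_SYNONYMS: Dict[str, set] = {
    "sender": {"from", "sender", "author"},
    "timestamp": {"date", "time", "timestamp", "sent_at"},
    "body": {"text", "message", "body"},
}


def characterise(field_name: str) -> Optional[str]:
    """Map a source field name to a canonical category, or None if unknown."""
    key = field_name.strip().lower()
    for category, synonyms in CATEGORY_SYNONYMS.items():
        if key in synonyms:
            return category
    return None


def harmonise(datasets: Dict[str, List[dict]]) -> List[dict]:
    """Merge records from several sources into one list with shared field names."""
    merged = []
    for source, records in datasets.items():
        for record in records:
            row = {"source": source}  # keep provenance of each record
            for field_name, value in record.items():
                category = characterise(field_name)
                if category is not None:  # uncharacterised fields are dropped
                    row[category] = value
            merged.append(row)
    return merged


# Illustrative (made-up) records from two messaging sources.
datasets = {
    "whatsapp": [{"From": "alice", "Date": "2018-01-02T10:00:00", "Text": "hi"}],
    "sms": [{"sender": "bob", "sent_at": "2018-01-02T10:05:00", "body": "hello"}],
}
merged = harmonise(datasets)

# One query now spans both sources (ISO 8601 strings sort chronologically).
late = [row for row in merged if row["timestamp"] >= "2018-01-02T10:03:00"]
```

A fixed synonym table is the simplest possible characterisation rule. The abstract notes that some binary-based fields could not be characterised for lack of information; in this sketch such fields would simply be dropped.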

Files in this item

Name: 2018Mohammed10494170PhD.pdf
Size: 2.972Mb
Format: PDF
Description: full version


Name: license.txt
Size: 3.016Kb
Format: Text file


This item appears in the following Collection(s)

01 Research Theses Main Collection
Research Theses Main
