Abstract

Digital forensics has become an increasingly important tool in the fight against cyber and computer-assisted crime. However, with an increasing range of technologies at people’s disposal, investigators must process and analyse many systems holding large volumes of data (e.g., PCs, laptops, tablets, and smartphones) within a single case. Current digital forensic tools operate in an isolated manner, investigating systems and applications individually, so the heterogeneity and volume of evidence place time constraints and a significant burden on investigators. Examples of heterogeneity include messaging applications (e.g., iMessenger, Viber, Snapchat, and WhatsApp), web browsers (e.g., Firefox and Google Chrome), and file systems (e.g., NTFS, FAT, and HFS). Analysing evidence from across devices and applications in a universal and harmonised fashion would enable investigators to query all data at once. In addition, prioritising evidence and reducing the volume of data to be analysed reduces both the time taken and the cognitive load on the investigator.

This thesis focuses on the examination and analysis phases of the digital investigation process. It explores the feasibility of dealing with big and heterogeneous data sources in order to correlate evidence across them in an automated way. To that end, a novel approach was developed that addresses the heterogeneity of big data with three algorithms: harmonisation, clustering, and automated identification of evidence (AIE). The harmonisation algorithm provides an automated framework that merges similar datasets by characterising similar metadata categories and then harmonising them into a single dataset. This overcomes the heterogeneity issues and simplifies examination and analysis, because evidential artefacts from across devices and applications can be queried at once by category. The clustering algorithm then operates on the merged datasets, using file metadata to identify evidential files and isolate unrelated ones. Afterwards, the AIE algorithm tries to identify the cluster holding the largest number of evidential artefacts by searching on two bases: criminal profiling activities and information from the criminals themselves. Related clusters are then identified through timeline analysis and a search for artefacts associated with the files in the first cluster.

A series of experiments using real-life forensic datasets was conducted to evaluate the algorithms across five categories of datasets (messaging, graphical files, file system, internet history, and emails), each containing data from different applications across different devices. The results of the characterisation and harmonisation process show that the algorithm can merge all fields successfully, with the exception of some binary-based data found within the messaging datasets (contained within Viber and SMS). The error occurred because the characterisation process lacked the information needed to make a useful determination. However, further analysis found that the error had a minimal impact on the subsequent merged data. The results of the clustering process and the AIE algorithm show that the two algorithms can work together to identify more than 92% of evidential files.
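The harmonisation step described in the abstract can be pictured with a small sketch. This is not the thesis's algorithm: it assumes the mapping from source fields to shared categories is already known and written by hand, whereas the thesis characterises those categories automatically. The source names (`app_a`, `app_b`), the field names, and the epoch-millisecond timestamps are all hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical, hand-written mappings from each source's field names
# to shared metadata categories (the thesis derives these automatically).
FIELD_MAP = {
    "app_a": {"peer": "contact", "text": "body", "ts_ms": "sent_at"},
    "app_b": {"address": "contact", "body": "body", "date_ms": "sent_at"},
}


def harmonise(source, record):
    """Rename one record's fields to the shared schema and tag its origin."""
    mapping = FIELD_MAP[source]
    unified = {mapping[k]: v for k, v in record.items() if k in mapping}
    # Assumption: both sources store epoch milliseconds; normalise to UTC ISO 8601.
    unified["sent_at"] = datetime.fromtimestamp(
        unified["sent_at"] / 1000, tz=timezone.utc
    ).isoformat()
    unified["source"] = source
    return unified


def merge(datasets):
    """Harmonise every record from every source into one time-ordered list."""
    merged = [
        harmonise(source, record)
        for source, records in datasets.items()
        for record in records
    ]
    return sorted(merged, key=lambda r: datetime.fromisoformat(r["sent_at"]))


if __name__ == "__main__":
    sample = {
        "app_a": [{"peer": "alice", "text": "hi", "ts_ms": 2000}],
        "app_b": [{"address": "bob", "body": "hello", "date_ms": 1000}],
    }
    for row in merge(sample):
        print(row)
```

Once records share one schema, a single query over `contact`, `body`, or `sent_at` covers every source at once, which is the benefit the abstract attributes to harmonisation.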

Awarding Institution(s)

University of Plymouth

Supervisor

Nathan Clarke, Fudong Li

Keywords

Digital Forensics, Heterogeneous Data

Document Type

Thesis

Publication Date

2018

Deposit Date

June 2024

Additional Links

http://dx.doi.org/10.24382/693

Recommended Citation

Mohammed, H. (2018) Automated Identification of Digital Evidence across Heterogeneous Data Resources. Thesis. University of Plymouth. Available at: http://dx.doi.org/10.24382/693

Additional Files

license.txt (3 kB)

School of Engineering, Computing and Mathematics Theses

Automated Identification of Digital Evidence across Heterogeneous Data Resources
