Show simple item record

dc.contributor.supervisor: Clarke, Nathan
dc.contributor.author: Vranopoulos, Georgios
dc.contributor.other: School of Engineering, Computing & Mathematics [en_US]
dc.date.accessioned: 2022-12-21T13:56:12Z
dc.date.issued: 2022
dc.identifier: 10523677 [en_US]
dc.identifier.uri: http://hdl.handle.net/10026.1/20099
dc.description.abstract:

The creation of new knowledge by manipulating and analysing existing knowledge is one of the primary objectives of any cognitive system. Of the Vs that govern Big Data, namely Volume, Velocity, Variety, Veracity, Validity, Volatility and Value, the first three are considered the primary ones. Most Big Data research effort has focussed on Volume and Velocity, while Variety, “the ugly duckling” of Big Data, is often neglected and difficult to solve. A principal challenge with Variety is understanding the data well enough to gain insight from it. Organisations have been investing in analytics that rely on internal and external data to gain a competitive advantage. However, complying with the legal and regulatory acts imposed nationally and internationally has become a challenge.

The approach focuses on the use of self-learning systems that will enable automatic compliance of data against regulatory requirements, along with the capability to generate valuable and readily usable metadata for data classification. For data confidentiality, a framework that utilises algorithmic classification and workflow capabilities is proposed. Such a rule-based system, implementing the corporate data classification policy, will minimise the risk of exposure by helping users to identify the approved guidelines and enforce them quickly.

Two experiments, on confidential data identification and data characterisation, were conducted to evaluate the feasibility of the approach. The experiments aimed to confirm that repetitive manual tasks can be automated, reducing the attention a Data Scientist must give to data identification and freeing more for the extraction and analysis of the data itself. In addition, a survey of subject matter experts, a diverse audience of academics and senior business executives in the fields of security and data management, was conducted, featuring and evaluating a working prototype. The proof-of-concept showcased the model’s capabilities and provided hands-on experience for the experts to better understand the proposal.

The experimental work confirmed that: a) the use of algorithmic techniques contributed to a substantial decrease in false positives in the identification of confidential information; b) the use of a fraction of a data set, along with statistical analysis and supervised learning, is sufficient to identify the structure of the information within it; c) the model for corporate confidentiality is viable and the proposed features of the system are of value.

With this proposal, the difficulty of understanding the nature of data can be mitigated, enabling a greater focus on meaningful interpretation of heterogeneous data, while at the same time organisations can secure their data and confirm data confidentiality and compliance.

[en_US]
dc.language.iso: en
dc.publisher: University of Plymouth
dc.rights: CC0 1.0 Universal [*]
dc.rights.uri: http://creativecommons.org/publicdomain/zero/1.0/ [*]
dc.subject: Big Data [en_US]
dc.subject: Variety [en_US]
dc.subject: Booster Metrics [en_US]
dc.subject: Data Characterisation [en_US]
dc.subject: Data Confidentiality [en_US]
dc.subject: Data Loss Prevention [en_US]
dc.subject.classification: PhD [en_US]
dc.title: Tackling Big Data Variety using Metadata [en_US]
dc.type: Thesis
plymouth.version: publishable [en_US]
dc.identifier.doi: http://dx.doi.org/10.24382/474
dc.rights.embargodate: 2023-12-21T13:56:12Z
dc.rights.embargoperiod: 12 months [en_US]
dc.type.qualification: Doctorate [en_US]
rioxxterms.version: NA
plymouth.orcid_id: 0000-0002-2874-6459 [en_US]


Except where otherwise noted, this item's license is described as CC0 1.0 Universal

All items in PEARL are protected by copyright law.
Author manuscripts deposited to comply with open access mandates are made available in accordance with publisher policies. Please cite only the published version using the details provided on the item record or document. In the absence of an open licence (e.g. Creative Commons), permissions for further reuse of content should be sought from the publisher or author.