
dc.contributor.author: Vranopoulos, G
dc.contributor.author: Clarke, Nathan
dc.contributor.author: Atkinson, Shirley
dc.date.accessioned: 2022-01-10T15:43:38Z
dc.date.available: 2022-01-10T15:43:38Z
dc.date.issued: 2022-01-10
dc.identifier.issn: 2196-1115
dc.identifier.other: 8
dc.identifier.uri: http://hdl.handle.net/10026.1/18537
dc.description.abstract:

The creation of new knowledge from manipulating and analysing existing knowledge is one of the primary objectives of any cognitive system. Most Big Data research effort has focussed on Volume and Velocity, while Variety, “the ugly duckling” of Big Data, is often neglected and difficult to solve. A principal challenge with Variety is understanding the data. This paper proposes and evaluates an automated approach for metadata identification and enrichment in describing Big Data. The paper focuses on the use of self-learning systems that will enable automatic compliance of data with regulatory requirements, along with the capability of generating valuable and readily usable metadata for data classification. Two experiments, on data confidentiality and data identification, were conducted to evaluate the feasibility of the approach. The experiments aimed to confirm that repetitive manual tasks can be automated, reducing a Data Scientist's focus on data identification and freeing more attention for the extraction and analysis of the data itself. The datasets came from Private/Business and Public/Governmental sources and exhibited diverse characteristics in the number and size of files. The experimental work confirmed that: (a) the use of algorithmic techniques contributed to a substantial decrease in false positives in the identification of confidential information; (b) the use of a fraction of a data set, along with statistical analysis and supervised learning, is sufficient to identify the structure of information within it. With this approach, the issues of understanding the nature of data can be mitigated, enabling a greater focus on meaningful interpretation of the heterogeneous data.

dc.format.extent: 8-
dc.language: en
dc.language.iso: en
dc.publisher: SpringerOpen
dc.subject: Big Data
dc.subject: Variety
dc.subject: Data characterization
dc.subject: Data origination
dc.subject: Data Format
dc.subject: Data confidentiality
dc.subject: Delimiter determination
dc.subject: Metadata
dc.subject: Contextual integrity
dc.title: Addressing big data variety using an automated approach for data characterization
dc.type: journal-article
dc.type: Journal Article
plymouth.author-url: https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000741007800001&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=11bb513d99f797142bcfeffcc58ea008
plymouth.issue: 1
plymouth.volume: 9
plymouth.publication-status: Published
plymouth.journal: Journal Of Big Data
dc.identifier.doi: 10.1186/s40537-021-00554-3
plymouth.organisational-group: /Plymouth
plymouth.organisational-group: /Plymouth/Faculty of Science and Engineering
plymouth.organisational-group: /Plymouth/Faculty of Science and Engineering/School of Engineering, Computing and Mathematics
plymouth.organisational-group: /Plymouth/REF 2021 Researchers by UoA
plymouth.organisational-group: /Plymouth/REF 2021 Researchers by UoA/UoA11 Computer Science and Informatics
plymouth.organisational-group: /Plymouth/Users by role
plymouth.organisational-group: /Plymouth/Users by role/Academics
dcterms.dateAccepted: 2021-12-19
dc.rights.embargodate: 2022-1-11
dc.identifier.eissn: 2196-1115
dc.rights.embargoperiod: Not known
rioxxterms.versionofrecord: 10.1186/s40537-021-00554-3
rioxxterms.licenseref.uri: http://www.rioxx.net/licenses/all-rights-reserved
rioxxterms.licenseref.startdate: 2022-01-10
rioxxterms.type: Journal Article/Review




All items in PEARL are protected by copyright law.
Author manuscripts deposited to comply with open access mandates are made available in accordance with publisher policies. Please cite only the published version using the details provided on the item record or document. In the absence of an open licence (e.g. Creative Commons), permissions for further reuse of content should be sought from the publisher or author.