On Internet Traffic Classification: A Two-Phased Machine Learning Approach

Bakhshi, T; Ghita, B

dc.contributor.author	Bakhshi, T
dc.contributor.author	Ghita, B
dc.date.accessioned	2016-09-09T08:28:00Z
dc.date.accessioned	2017-08-11T09:17:52Z
dc.date.available	2016-09-09T08:28:00Z
dc.date.available	2017-08-11T09:17:52Z
dc.date.issued	2016-07-01
dc.identifier.issn	2090-7141
dc.identifier.issn	2090-715X
dc.identifier.other	2048302
dc.identifier.uri	http://hdl.handle.net/10026.1/9762
dc.description.abstract	Traffic classification utilizing flow measurement enables operators to perform essential network management. Flow accounting methods such as NetFlow are, however, considered inadequate for classification, requiring additional packet-level information, host behaviour analysis, and specialized hardware, which limits their practical adoption. This paper aims to overcome these challenges by proposing a two-phased machine learning classification mechanism with NetFlow as input. The individual flow classes are derived per application through k-means and are further used to train a C5.0 decision tree classifier. As part of validation, the initial unsupervised phase used flow records of fifteen popular Internet applications that were collected and independently subjected to k-means clustering to determine the unique flow classes generated per application. The derived flow classes were afterwards used to train and test a supervised C5.0-based decision tree. The resulting classifier reported an average accuracy of 92.37% on approximately 3.4 million test cases, increasing to 96.67% with adaptive boosting. The classifier specificity factor, which accounted for differentiating content-specific from supplementary flows, ranged between 98.37% and 99.57%. Furthermore, the computational performance and accuracy of the proposed methodology in comparison with similar machine learning techniques lead us to recommend its extension to other applications in achieving highly granular real-time traffic classification.
dc.format.extent	0-0
dc.language	en
dc.language.iso	en
dc.publisher	Hindawi Limited
dc.relation.replaces	http://hdl.handle.net/10026.1/5424
dc.relation.replaces	10026.1/5424
dc.title	On Internet Traffic Classification: A Two-Phased Machine Learning Approach
dc.type	journal-article
dc.type	Journal Article
plymouth.volume	2016
plymouth.publication-status	Published
plymouth.journal	Journal of Computer Networks and Communications
dc.identifier.doi	10.1155/2016/2048302
plymouth.organisational-group	/Plymouth
plymouth.organisational-group	/Plymouth/Faculty of Science and Engineering
plymouth.organisational-group	/Plymouth/Faculty of Science and Engineering/School of Engineering, Computing and Mathematics
plymouth.organisational-group	/Plymouth/REF 2021 Researchers by UoA
plymouth.organisational-group	/Plymouth/REF 2021 Researchers by UoA/UoA11 Computer Science and Informatics
plymouth.organisational-group	/Plymouth/Users by role
plymouth.organisational-group	/Plymouth/Users by role/Academics
dcterms.dateAccepted	2016-05-08
dc.identifier.eissn	2090-715X
dc.rights.embargoperiod	Not known
rioxxterms.versionofrecord	10.1155/2016/2048302
rioxxterms.licenseref.uri	http://www.rioxx.net/licenses/all-rights-reserved
rioxxterms.licenseref.startdate	2016-07-01
rioxxterms.type	Journal Article/Review
plymouth.oa-location	http://downloads.hindawi.com/journals/jcnc/2016/2048302.pdf

Files in this item

Name: 2048302 (1).pdf
Size: 2.606Mb
Format: PDF


This item appears in the following Collection(s)

School of Engineering, Computing and Mathematics
