Behavioural Monitoring via Network Communications
It is commonly acknowledged that Internet applications are an integral part of an individual’s everyday life, with more than three billion users now using Internet services across the world, a number that is growing every year. Unfortunately, this rise in Internet use has been accompanied by a rise in cyber-related crime. Whilst significant effort has been expended on protecting systems from outside attack, only more recently have researchers sought to develop countermeasures against insider attack. However, for an organisation, the detection of an attack is merely the start of a process that requires them to investigate and attribute the attack to an individual (or group of individuals).
The investigation of an attack typically revolves around the analysis of network traffic, in order to better understand the nature of the traffic flows and, importantly, to resolve them to the IP address of the insider. However, with mobile computing and the Dynamic Host Configuration Protocol (DHCP), which result in Internet Protocol (IP) addresses changing frequently, it is particularly challenging to resolve the traffic back to a specific individual.
The thesis explores the feasibility of profiling network traffic in a biometric manner in order to identify users independently of the IP address. To preserve privacy and to cope with encryption (which is applied to an increasing volume of network traffic), the proposed approach utilises data derived only from the metadata of packets, not the payload. The research proposed a novel feature extraction approach focussed upon extracting user-oriented application-level features from the wider network traffic. An investigation across nine of the most common web applications (Facebook, Twitter, YouTube, Dropbox, Google, Outlook, Skype, BBC and Wikipedia) was undertaken to determine whether such high-level features could be derived from the low-level network signals. The results showed that whilst some user interactions could not be extracted due to the complexity of the web applications, the majority could.
Having developed a feature extraction process that focussed more upon the user, rather than machine-to-machine traffic, the research sought to use this information to determine whether a behavioural profile could be developed to enable identification of the users. Network traffic of 27 users over 2 months was collected and processed using the aforementioned feature extraction process. Over 140 million packets were collected and processed into 45 user-level interactions across the nine applications. The results from behavioural profiling showed that the system is capable of identifying users, with an average True Positive Identification Rate (TPIR) in the top three applications of 87.4%, 75% and 61.9% respectively.
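As a rough illustration of the metric quoted above, the sketch below computes an identification rate from a list of classifier decisions. It assumes that the TPIR is the fraction of samples whose top-ranked predicted identity matches the true user; the abstract does not define the metric, so this reading, the function name `tpir` and the sample labels are all hypothetical.

```python
def tpir(true_users, predicted_users):
    """Fraction of samples whose predicted user equals the true user.

    Assumption: TPIR is read here as a rank-1 identification rate;
    the thesis may define it differently.
    """
    if len(true_users) != len(predicted_users):
        raise ValueError("label lists must have the same length")
    if not true_users:
        raise ValueError("at least one sample is required")
    correct = sum(t == p for t, p in zip(true_users, predicted_users))
    return correct / len(true_users)


# Hypothetical labels for four interaction samples.
truth = ["u01", "u01", "u02", "u03"]
predicted = ["u01", "u02", "u02", "u03"]
rate = tpir(truth, predicted)
```

With these made-up labels, three of the four samples are attributed correctly.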
Whilst the initial study provided some encouraging results, the research went on to develop further refinements to improve the performance. Two techniques were applied: fusion and timeline analysis. The former sought to fuse the output of the classification stage to better incorporate and manage the variability of the classification and resulting decision phases of the biometric system. The latter sought to capitalise on the fact that whilst the IP address is not reliable over a period of time due to reallocation, over shorter timeframes (e.g. a few minutes) it is likely to be reliable and to map to the same user. The results for fusion across the top three applications were 93.3%, 82.5% and 68.9%. The overall performance with the timeline analysis added (using a 240 second time window), averaged across all applications, was 72.1%.
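The timeline idea can be sketched in a few lines. The example below is only one possible reading: it assumes fixed, non-overlapping 240 second windows per IP address and a simple majority vote over the per-interaction decisions, neither of which is specified in the abstract. The function name `timeline_vote`, the tuple layout and the sample data are hypothetical.

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 240  # window length quoted in the abstract


def timeline_vote(decisions, window=WINDOW_SECONDS):
    """Assign one user to each (IP address, time window) pair.

    decisions: iterable of (timestamp_seconds, ip, predicted_user).
    Assumption: within one window an IP address maps to a single
    user, so the most frequent prediction is taken as the label.
    """
    buckets = defaultdict(Counter)
    for timestamp, ip, user in decisions:
        # Fixed, non-overlapping windows: an assumption of this sketch.
        buckets[(ip, int(timestamp // window))][user] += 1
    return {key: votes.most_common(1)[0][0] for key, votes in buckets.items()}


# Hypothetical classifier output for one IP address.
sample = [
    (0, "10.0.0.5", "alice"),
    (30, "10.0.0.5", "alice"),
    (100, "10.0.0.5", "bob"),
    (250, "10.0.0.5", "bob"),
]
labels = timeline_vote(sample)
```

In this made-up sample the isolated "bob" decision at 100 seconds is outvoted within the first window, while the decision at 250 seconds falls into the next window and is labelled on its own.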
Whilst 72.1% is not outstanding in terms of biometric identification in the normal sense, its use within this problem of attributing misuse to an individual provides the investigator with an enormous advantage over existing approaches. At best, it will provide them with a user’s specific traffic and at worst allow them to significantly reduce the volume of traffic to be analysed.