In this research, we use supervised and unsupervised machine learning techniques to detect anomalies in NetFlow data. We aim to create a system for home or small-business use in which the user is in control. We use WEKA for the machine learning models and feature selection, and the UGR’16 dataset to train and test the models. For each method, we create three models, each trained on one day and tested on another. We find that supervised models perform better than unsupervised ones. Random Forest has the highest F1-score (0.9165) of the supervised models. However, Random Forest is statistically similar to 6 out of 8 different classifiers according to the Friedman and Nemenyi tests. One challenge with supervised methods is that a third party is needed to keep the models updated for new attacks. Nevertheless, it seems feasible to create a network monitoring system for home use. Finally, we argue that machine learning research should, in some cases, take a different direction. Instead of chasing the highest accuracy, we should look at which factors allow a user to work with a system. A set of rules is much easier to interpret than a complex model. Especially when models are statistically similar, we could look at factors other than a single metric. After these results, we look at presenting warning messages to end-users. We want to motivate users to take action after they read a message. To this end, we first create a theoretical framework based on behaviour, motivation and uncertainties. We want to find out whether uncertainties can be used to group users, which would allow a semi-personal approach. Based on this theoretical framework, we create different versions of a warning message that focus on different uncertainties.
In 22 interviews, we asked participants to rank the different versions and asked about their stance on network security and their preferences for warning messages. We found that uncertainties cannot be used to group users, but that they do influence them. Including a solution and the steps to take offers users peace of mind, even if they are not able to complete the steps. The results indicate that a more personal approach is necessary, in which every person can customize the message to fit their preferences.