Jobfeed alarm system

Applying change point detection and a particle filter to a random walk


Abstract

Jobfeed is an online database containing all vacancies posted on the internet. The database obtains this data through a process called spidering: web pages are visited and vacancies are extracted from them using machine learning techniques. Errors can occur during spidering. To ensure the quality of the Jobfeed data, an alarm system is in place to detect possible errors in the spidering process; it triggers an alarm whenever the spidering of a website seems to be malfunctioning. This old alarm system performs poorly, with a precision of 0.028 and a recall of 0.18. The goal of this thesis is to develop a new alarm system that triggers alarms more accurately. To this end, the data has been modelled as a hidden Markov model. The core of the alarm system is a particle filter, a sequential Monte Carlo method that is able to judge whether the process is showing behaviour that indicates something is wrong. A series of methods to detect a change in the distribution of a time series, or change point, have been developed and tested on detecting a change in a random walk with drift. The initial values for the particle filter were partly estimated by applying the best-performing change point detection method to find the last stable segment in the historical data, since the data in this segment can be used to estimate initial values for the system. Sequential decision making theory is used to decide whether an alarm should be triggered. The new alarm system has a precision of 0.74 and a recall of 0.96, which is a big improvement over the old system.
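The particle filter idea described above can be illustrated with a minimal bootstrap particle filter. This is a sketch under stated assumptions, not the thesis implementation: it assumes the observed series is a hidden random walk with drift plus Gaussian observation noise, and the function name `bootstrap_particle_filter`, its parameters (`drift`, `sigma_x`, `sigma_y`, `x0_mean`, `x0_std`) and the alarm threshold of -15 are all hypothetical. An observation is flagged when its log predictive likelihood log p(y_t | y_{1:t-1}) is very low; this simple threshold rule stands in for the sequential decision rule used in the thesis, which the abstract does not specify.

```python
import numpy as np


def bootstrap_particle_filter(y, n_particles=1000, drift=0.0, sigma_x=1.0,
                              sigma_y=1.0, x0_mean=0.0, x0_std=1.0, seed=0):
    """Bootstrap particle filter for a noisy random walk with drift.

    Hypothetical model (an assumption for illustration):
        x_t = x_{t-1} + drift + N(0, sigma_x^2)   hidden state
        y_t = x_t + N(0, sigma_y^2)               observation
    Returns the filtered state means and log p(y_t | y_{1:t-1}) per step.
    """
    rng = np.random.default_rng(seed)
    particles = rng.normal(x0_mean, x0_std, n_particles)  # samples of x_0
    means = np.empty(len(y))
    log_lik = np.empty(len(y))
    log_norm = np.log(sigma_y * np.sqrt(2.0 * np.pi))
    for t, obs in enumerate(y):
        # Propagate each particle through the random-walk transition.
        particles = particles + drift + rng.normal(0.0, sigma_x, n_particles)
        # Log weights from the Gaussian observation density.
        log_w = -0.5 * ((obs - particles) / sigma_y) ** 2 - log_norm
        max_log_w = log_w.max()
        w = np.exp(log_w - max_log_w)  # shifted for numerical stability
        # The mean unnormalised weight estimates p(y_t | y_{1:t-1}).
        log_lik[t] = max_log_w + np.log(w.mean())
        w /= w.sum()
        means[t] = np.dot(w, particles)
        # Multinomial resampling counters weight degeneracy.
        particles = particles[rng.choice(n_particles, size=n_particles, p=w)]
    return means, log_lik


# Synthetic example: a random walk with drift 0.5 whose observations
# drop by 30 from t = 70 onwards (a simulated spidering failure).
sim = np.random.default_rng(1)
x = np.cumsum(0.5 + sim.normal(0.0, 1.0, 100))
y = x + sim.normal(0.0, 1.0, 100)
y[70:] -= 30.0

means, log_lik = bootstrap_particle_filter(y, drift=0.5)
alarms = np.flatnonzero(log_lik < -15.0)  # hypothetical alarm threshold
print("alarm at time steps:", alarms)
```

The sudden drop should produce a run of consecutive alarms starting at t = 70, which stops once the resampled particles have caught up with the new level. In a real setting the drift and noise parameters would be estimated from historical data rather than fixed by hand.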
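The search for the last stable segment in the historical data can be sketched as follows. The abstract does not say which change point method performed best, so this example swaps in binary segmentation with a Gaussian change-in-mean-and-variance cost, applied to the increments of the series: for a random walk with drift these increments are independent draws whose mean equals the drift. The helper names (`segment_cost`, `binary_segmentation`, `last_stable_segment`), `min_size` and the penalty value are hypothetical choices for illustration.

```python
import numpy as np


def segment_cost(x):
    # -2 * Gaussian log-likelihood up to constants, with the segment's own
    # mean and variance: n * log(variance). The epsilon guards against log(0).
    return len(x) * np.log(np.var(x) + 1e-12)


def binary_segmentation(x, penalty, min_size=5):
    """Return sorted change point indices of x found by binary segmentation."""
    change_points = []
    stack = [(0, len(x))]
    while stack:
        start, end = stack.pop()
        seg = x[start:end]
        if len(seg) < 2 * min_size:
            continue
        full_cost = segment_cost(seg)
        best_gain, best_k = 0.0, None
        for k in range(min_size, len(seg) - min_size + 1):
            gain = full_cost - segment_cost(seg[:k]) - segment_cost(seg[k:])
            if gain > best_gain:
                best_gain, best_k = gain, k
        # Split only if the cost reduction beats the penalty.
        if best_k is not None and best_gain > penalty:
            cp = start + best_k
            change_points.append(cp)
            stack.extend([(start, cp), (cp, end)])
    return sorted(change_points)


def last_stable_segment(y, penalty, min_size=5):
    """Estimate drift and noise level from the last stable part of y.

    Returns (start, drift, sigma) where y[start:] is the last stable segment.
    """
    increments = np.diff(y)  # iid N(drift, sigma^2) for a random walk with drift
    cps = binary_segmentation(increments, penalty, min_size)
    start = cps[-1] if cps else 0
    stable = increments[start:]  # equals np.diff(y[start:])
    return start, stable.mean(), stable.std(ddof=1)


# Synthetic history: the drift and noise level change after 150 steps.
rng = np.random.default_rng(0)
steps = np.concatenate([rng.normal(0.2, 1.0, 150), rng.normal(1.5, 3.0, 100)])
history = np.concatenate([[0.0], np.cumsum(steps)])

start, drift_hat, sigma_hat = last_stable_segment(history, penalty=40.0)
print(start, round(drift_hat, 2), round(sigma_hat, 2))
```

The estimated drift and noise level of the last stable segment are the kind of quantities that could serve as initial values for a particle filter. The penalty of 40 is hand-picked for this synthetic example; an information criterion such as BIC would be a more principled default.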