Google timeline accuracy assessment and error prediction

Abstract

Google Location Timeline, once activated, allows devices to be tracked and their locations to be saved. These data may in the future serve as evidence in investigations, in which case the court would be interested in their reliability. Each position is presented as a pair of coordinates and a radius, so the estimated area for the tracked device is enclosed by a circle. This research focuses on assessing the accuracy of the locations given by Google Location History Timeline, on identifying which variables affect this accuracy, and on the initial steps towards a linear multivariate model that can potentially predict the actual error with respect to the true location from environmental variables. The potentially influential variables (configuration of mobile device connectivity, speed of movement and environment) were determined through a series of experiments in which the true position of the device was recorded with a reference Global Positioning System (GPS) device of a higher order of accuracy. Accuracy was assessed by measuring the distance between the position provided by Google and the true position, referred to below as the Google error. If the Google error is less than the radius provided, we define the observation as a hit. The configuration with the highest hit rate is the one in which the mobile device has GPS available, at 52%. The 3G and 2G connections follow with hit rates of 38% and 33%, respectively. The Wi-Fi connection has a hit rate of only 7%. Regarding the means of transport, when the connection is 2G or 3G, the worst results are obtained in Still, with a hit rate of 9%, and the best in Car, with 57%. Regarding the prediction model, the distances and angles from the position of the device to the three nearest cell towers, together with the categorical (non-numerical) variables Environment and means of transport, were taken as input variables in this initial study.
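The definitions of Google error and hit given above can be illustrated with a minimal sketch. It assumes that coordinates are given in decimal degrees, that the radius is given in metres, and that a great-circle (haversine) distance on a spherical Earth is an acceptable approximation; the abstract does not state how the distance was computed, and the function names are hypothetical.

```python
import math

# Assumption: mean Earth radius in metres (spherical approximation).
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_hit(google_lat, google_lon, radius_m, true_lat, true_lon):
    """A hit: the Google error is less than the radius reported by Google."""
    google_error_m = haversine_m(google_lat, google_lon, true_lat, true_lon)
    return google_error_m < radius_m


def hit_rate(records):
    """Fraction of hits over (google_lat, google_lon, radius_m, true_lat, true_lon) tuples."""
    records = list(records)
    if not records:
        raise ValueError("hit rate is undefined for an empty set of records")
    return sum(is_hit(*r) for r in records) / len(records)
```

Grouping the records by connectivity configuration or means of transport before calling `hit_rate` would give per-group rates of the kind reported above.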
To evaluate the usability of a model, a Model hit is defined as an actual observation that falls within the 95% confidence interval provided by the model. Of the models developed, the one with the best results predicted the accuracy when the network used was 2G, with 76% Model hits. The second-best model, with the mobile network set to 3G, achieved only 23%.
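The Model hit criterion can be sketched in the same way. The sketch assumes that each model already supplies the lower and upper bounds of its 95% interval for every observation and that the bounds are inclusive; how the interval is derived is not described in the abstract, and the function names are hypothetical.

```python
def is_model_hit(observed_error_m, lower_m, upper_m):
    """A Model hit: the observed Google error lies inside the model's 95% interval.

    Assumption: the interval bounds are treated as inclusive.
    """
    return lower_m <= observed_error_m <= upper_m


def model_hit_rate(rows):
    """Fraction of Model hits over (observed_error_m, lower_m, upper_m) tuples."""
    rows = list(rows)
    if not rows:
        raise ValueError("Model hit rate is undefined for an empty set of rows")
    return sum(is_model_hit(*row) for row in rows) / len(rows)
```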