Call Detail Record (CDR) data enables the analysis of human behaviour on a large scale, and the information it contains is promising. Not only does it allow us to track the movements of many individuals over time, it also uncovers patterns in a person's decision-making process that can potentially tell us a lot about the effects of different interventions. The opportunity to find new information on human behaviour has been noticed in several research fields, but every researcher eventually runs into the same obstacle: privacy. The data represents a detailed trace of individuals, so either these individuals must give their approval (almost certainly lowering the amount of data that can be collected), or the data must be aggregated to the point that user anonymity is guaranteed. As a consequence of aggregation, potentially important information can be lost, especially when both the location and the time dimension are aggregated, as these two can be considered the essence of CDR data. There are, however, techniques that restore a finer level of detail by de-aggregating the data. Naive Bayes classification has been shown to be a functioning Machine Learning method for de-aggregating a dataset that incorporates information on at least one of the two essential dimensions, in this case location. By describing the administrative areas within the country with the same variables that describe the rows of the data, Naive Bayes classification can find the area that is most likely to fit each row. Matching the variables of the areas to the variables within the displacement dataset forms the backbone of the process, as the de-aggregation is driven by the closeness of data points between the two datasets.
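The matching step described above can be illustrated with a minimal Gaussian Naive Bayes sketch. It is not the implementation used in this work: the area names, the two variables (`pop_density`, `night_calls`), their statistics and the priors are hypothetical placeholders, and the Gaussian likelihood is an assumption made for the example. Each area is summarised by a mean and standard deviation per variable, and an aggregated row is assigned to the area with the highest posterior.

```python
import math

# Hypothetical area profiles: a prior and per-variable (mean, std) for each
# administrative area. All names and numbers are invented for illustration.
AREAS = {
    "area_A": {"prior": 0.5, "stats": {"pop_density": (1200.0, 150.0), "night_calls": (0.30, 0.05)}},
    "area_B": {"prior": 0.3, "stats": {"pop_density": (300.0, 80.0), "night_calls": (0.55, 0.08)}},
    "area_C": {"prior": 0.2, "stats": {"pop_density": (50.0, 20.0), "night_calls": (0.70, 0.10)}},
}


def gaussian_log_pdf(x, mean, std):
    """Log density of a normal distribution (assumed likelihood model)."""
    var = std * std
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def log_posteriors(row, areas):
    """Unnormalised log posterior of every area for one aggregated row."""
    scores = {}
    for name, area in areas.items():
        score = math.log(area["prior"])
        # Naive Bayes assumption: variables are independent given the area,
        # so their log likelihoods simply add up.
        for var, (mean, std) in area["stats"].items():
            score += gaussian_log_pdf(row[var], mean, std)
        scores[name] = score
    return scores


def most_likely_area(row, areas=AREAS):
    """Return the area whose profile lies closest to the row."""
    scores = log_posteriors(row, areas)
    return max(scores, key=scores.get)


# A row from the aggregated dataset, described by the same variables.
row = {"pop_density": 280.0, "night_calls": 0.50}
print(most_likely_area(row))  # prints area_B
```

Working in log space avoids numerical underflow when many variables are multiplied, and the row is assigned to the area whose variable profile it most closely resembles, which is the sense in which closeness between the two datasets drives the de-aggregation.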