Effectiveness of propensity score methods with density estimation in identifying overlap for causal inference

Abstract

Causal inference requires sufficient overlap between the treatment and control groups. Propensity scores, together with the positivity assumption, can be used to check that overlap is present. However, positivity alone is not enough to identify the region of overlap; for this, propensity scores need to be combined with density estimation. This project evaluates that combined method and examines the scenarios in which it succeeds or fails at identifying the region of overlap: specifically, how it scales with the number of features and the share of outliers, and how the choice of classifier affects performance. The method was tested on samples from a simulated dataset, and the predicted overlap was compared with the true overlap of the known distributions.
In the experiments, the method performs best when the treatment and control groups share a single region of overlap; in this case, logistic regression works best among the classifiers tested. Overall performance drops when the two groups have multiple regions of overlap, where the random forest classifier performs best instead. Across all scenarios, performance drops with increasing dimensionality. A small percentage of outliers affects the method only slightly; with more outliers, logistic regression is the only classifier that is further affected.
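
As a rough illustration of the approach described in this abstract, the following is a minimal, dependency-free Python sketch on a one-dimensional simulated example. It is not the project's implementation: the Gaussian group distributions, the hand-written logistic regression, the kernel bandwidth, the thresholds `ALPHA` and `TAU`, and the rule used to combine propensity scores with density are all assumptions chosen for illustration.

```python
import math
import random

random.seed(0)

# Assumed simulation: one covariate, control ~ N(0, 1), treated ~ N(2, 1).
N = 300
control = [random.gauss(0.0, 1.0) for _ in range(N)]
treated = [random.gauss(2.0, 1.0) for _ in range(N)]
xs = control + treated
ts = [0] * N + [1] * N

ALPHA = 0.1      # hypothetical bounds: keep propensity scores in [ALPHA, 1 - ALPHA]
TAU = 0.05       # hypothetical minimum density for a point to count as supported
BANDWIDTH = 0.4  # hypothetical kernel bandwidth


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ts, lr=0.5, steps=1000):
    """Fit e(x) = sigmoid(w * x + b) by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    m = len(xs)
    for _ in range(steps):
        gw = gb = 0.0
        for x, t in zip(xs, ts):
            err = sigmoid(w * x + b) - t
            gw += err * x
            gb += err
        w -= lr * gw / m
        b -= lr * gb / m
    return w, b


def gaussian_pdf(x, mean, sd=1.0):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def kde(samples, x, bandwidth=BANDWIDTH):
    """Gaussian kernel density estimate of the samples, evaluated at x."""
    return sum(gaussian_pdf(x, s, bandwidth) for s in samples) / len(samples)


w, b = fit_logistic(xs, ts)


def predicted_overlap(x):
    """Assumed rule: bounded propensity score AND enough pooled density at x."""
    score = sigmoid(w * x + b)
    return ALPHA <= score <= 1.0 - ALPHA and kde(xs, x) >= TAU


def true_overlap(x):
    """Assumed ground truth: both known group densities exceed TAU at x."""
    return gaussian_pdf(x, 0.0) >= TAU and gaussian_pdf(x, 2.0) >= TAU


# Compare predicted and true overlap on a grid, as a simple agreement score.
grid = [-4.0 + 0.1 * i for i in range(101)]
agreement = sum(predicted_overlap(x) == true_overlap(x) for x in grid) / len(grid)
print(f"fitted w={w:.2f}, b={b:.2f}, agreement on grid={agreement:.2f}")
```

In practice a library classifier and density estimator would likely replace the hand-written pieces, and the agreement score here is only one possible way to compare predicted overlap with the true overlap of known distributions.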