Low-cost air quality sensors can fill the gaps between the sparse measurements made by high-quality national monitoring grids and can contribute to a more complete understanding of air pollution in an urban area. However, as long as there is no agreement on what degree of sensor accuracy is acceptable, the quality of the sensor data should be validated before governmental bodies use it as input for decision-making (Lewis2017).
This research proposes a method to assess and improve the data quality of low-cost air quality sensors measuring Particulate Matter (PM). The research question is: "How can accuracy and precision of Particulate Matter measurement results from a low-cost outdoor sensor network be improved by using a correction model, using data from reference sensors and additional sensors measuring inferencing phenomena?" To answer it, an experimental setup with sensors operating under real-world conditions is used.
Two low-cost sensor nodes, each containing a microcontroller, two low-cost PM sensors, and a temperature and humidity sensor, are placed at two locations in the city of Rotterdam, each next to a high-quality air quality monitoring station of the environmental agency of Rotterdam. These monitoring stations provide benchmark data for the low-cost sensor nodes. A third data source provides data on air pressure and wind speed for the whole city of Rotterdam.
The data from the sensor nodes and the monitoring stations are matched and correlated with each other. Subsequently, the measurements from the low-cost sensor nodes are evaluated. Correlations and cross inferences of PM with independent variables such as humidity, ambient temperature, wind speed and air pressure are investigated. Thereafter, using the Stepwise Multiple Linear Regression (MLR) method, correction models are created that take various combinations of external variables into account. The correction models differ in the number of external environmental variables they include and in their polynomial degree. From all candidate correction models, the best one per location is selected by evaluating the Root Mean Square Error (RMSE) of the corrected dataset.
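The selection procedure described above can be sketched as follows. This is a minimal illustration on synthetic data, not the implementation used in this research: the function names, the `min_gain` stopping threshold, the choice of forward (rather than backward or bidirectional) selection, and the simulated humidity effect are all assumptions made for the example.

```python
import numpy as np

def design_matrix(columns, degree):
    """Intercept plus powers 1..degree of every column (no cross terms)."""
    parts = [np.ones(len(columns[0]))]
    for col in columns:
        for power in range(1, degree + 1):
            parts.append(np.asarray(col, dtype=float) ** power)
    return np.column_stack(parts)

def fit_rmse(columns, target, degree):
    """Ordinary least squares fit; returns the RMSE of the fitted values."""
    X = design_matrix(columns, degree)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    residuals = target - X @ coef
    return float(np.sqrt(np.mean(residuals ** 2)))

def forward_stepwise(pm_raw, extras, target, degree=1, min_gain=0.01):
    """Start from the raw PM signal only, then repeatedly add the extra
    variable that lowers the RMSE most, until the relative improvement
    falls below min_gain (a hypothetical stopping rule)."""
    selected = []
    best = fit_rmse([pm_raw], target, degree)
    remaining = list(extras)
    while remaining:
        trials = {
            name: fit_rmse(
                [pm_raw] + [extras[s] for s in selected] + [extras[name]],
                target, degree)
            for name in remaining
        }
        name = min(trials, key=trials.get)
        if best - trials[name] < min_gain * best:
            break
        selected.append(name)
        best = trials[name]
        remaining.remove(name)
    return selected, best

# Synthetic example: the low-cost sensor over-reads and is biased by humidity.
rng = np.random.default_rng(0)
n = 500
reference = rng.uniform(5, 50, n)      # benchmark PM values
humidity = rng.uniform(30, 95, n)      # relative humidity in percent
temperature = rng.uniform(0, 25, n)    # unrelated to the simulated error
pm_raw = 1.6 * reference + 0.15 * humidity + rng.normal(0, 1, n)

base_rmse = fit_rmse([pm_raw], reference, degree=1)
selected, best = forward_stepwise(
    pm_raw, {"humidity": humidity, "temperature": temperature}, reference)
print(selected, base_rmse, best)
```

Note that the RMSE here is computed on the same data the model was fitted to; a held-out validation period would give a more honest estimate of the correction quality.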
Finally, the results of the chosen correction models are validated. The best-performing correction models turn out to be those that include only the original PM data; the effect of adding more independent variables is limited. The best correction models for the four low-cost PM sensors decrease the RMSE of the observations: the original normalized RMSE ranged from 0.0918 to 0.1249, while the corrected normalized RMSE ranges from 0.03110 to 0.03759. It is therefore possible to improve the data quality of low-cost PM sensors with the stepwise MLR method and the setup shown in this research. However, including parameters for the independent variables humidity, temperature, air pressure or wind speed does not improve the data quality significantly.
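As an illustration of how a normalized RMSE can be compared before and after correction, consider the sketch below. The text does not state which normalization is used; dividing by the range of the reference values is an assumption made here, and the numbers are made up for the example rather than taken from the experiment.

```python
import numpy as np

def nrmse(observed, reference):
    """RMSE divided by the range of the reference values.
    Assumption: range normalization; other conventions divide by the mean."""
    observed = np.asarray(observed, dtype=float)
    reference = np.asarray(reference, dtype=float)
    rmse = np.sqrt(np.mean((observed - reference) ** 2))
    return float(rmse / (reference.max() - reference.min()))

# Made-up values: the raw sensor reads 6 units too high everywhere,
# and a hypothetical correction removes most of that offset.
reference = np.array([10.0, 20.0, 30.0, 40.0])
raw = np.array([16.0, 26.0, 36.0, 46.0])
corrected = raw - 5.0

nrmse_raw = nrmse(raw, reference)              # 6 / 30
nrmse_corrected = nrmse(corrected, reference)  # 1 / 30
print(nrmse_raw, nrmse_corrected)
```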
Furthermore, when an extra sensor node is added to an air quality monitoring network of the type described in this research, a correction model must be created for that specific sensor, because a different correction model was obtained for each low-cost PM sensor in the network. This agrees with Castell (2017) and Mukherjee (2017), who also found that each individual low-cost sensor needs to be calibrated before it is added to such a network.