Characterizing data ecosystems to support official statistics with open mapping data for reporting on sustainable development goals

Abstract

Reporting on the Sustainable Development Goals (SDGs) is complex given the wide variety of governmental and NGO actors involved in development projects as well as the increased number of targets and indicators. Nevertheless, data on these indicators must be collected regularly and robustly, and must be comparable across and within countries and at different administrative and disaggregated levels, for adequate decision making to take place. Traditional census and household survey data is not enough. The increase in Small and Big Data streams has the potential to complement official statistics. The purpose of this research is to develop and evaluate a framework to characterize a data ecosystem in a developing country in its totality and to show how this can be used to identify data, outside the official statistics realm, that enriches the reporting on SDG indicators. Our method consisted of a literature study and an interpretative case study (two workshops with 60 and 35 participants, including two questionnaires; over 20 consultations; and desk research). We focused on SDG indicator 6.1.1 (proportion of population using safely managed drinking water services) in rural Malawi. We propose a framework with five dimensions: actors, data supply, data infrastructure, data demand, and data ecosystem governance. Results showed that many governmental and NGO actors are involved in water supply projects with different funding sources and little overall governance. There is a large variety of geospatial data sharing platforms and online-accessible information management systems, but adoption is low due to limited internet connectivity and low data literacy. Much data is still not open. Together, these factors result in an immature data ecosystem. Characterizing the data ecosystem with the framework proves useful, as it unveils gaps in data at the geographical level and in dimensionality (attributes per water point), as well as collaboration gaps.
The data supply dimension of the framework allows identification of those datasets that have the right quality and lowest cost of data extraction to enrich official statistics. Overall, our analysis of the Malawian case study illustrated the complexities involved in achieving self-regulation through interaction, feedback and networked relationships. Additional complexities, typical of developing countries, include fragmentation, a divide between governmental and non-governmental data activities, complex funding relationships, and a data-poor context.