GOV-LLM

Using Large Language Models for Bench-Marking GovTech Innovation


Abstract

Governments increasingly depend on GovTech, the technology that facilitates processes in the public sector. Governments benchmark the state of GovTech to gain indispensable insights, which they use to optimise resource utilisation, identify areas for improvement, and support evidence-based policy prioritisation. Current benchmarking efforts are resource-intensive, time-intensive, and limited in scope, which makes the assessment of GovTech innovation inefficient.

This thesis explores the potential of Large Language Models (LLMs) to overcome the practical limitations of existing GovTech benchmarking methods, using the World Bank's GovTech Maturity Index as a case study. The usability of an LLM as an artefact for GovTech benchmarking is assessed by applying state-of-the-art techniques, including fine-tuning, Retrieval-Augmented Generation, and prompt engineering.

The results show that the best-performing model achieves higher accuracy than random chance. This indicates that the LLM not only understands the questions and the data format but also contains the information needed to answer the benchmark questions correctly. The research concludes that the created artefact has the potential to improve GovTech benchmarking, leading to better-informed policy decision-making.

The thesis contributes to the GovTech research field by making GovTech benchmarking more efficient, which enables a better analysis of the current GovTech market, and by contributing to the academic debate on GovTech market analysis. Furthermore, by streamlining the Dutch benchmarking process and thereby producing more accurate insights, the study contributes to the advancement of GovTech solutions within the Netherlands, ultimately benefiting society as a whole.

The artefact created in this research has significant policy relevance: more efficient benchmarking allows policymakers to optimise resource allocation, identify key areas for investment, and make better evidence-based policy changes regarding GovTech.

The limitations of this research include model inaccuracies, difficulties in handling long contexts, and the potential for incorrect answers. An ethical analysis was also performed; it concludes that the relevance of the information used by the models is the most apparent ethical concern.

Future research may focus on enabling the models to handle long contexts better, reducing inaccuracies, and improving overall performance in populating GovTech benchmarks, for example by incorporating additional data sources. The research design is further limited by the absence of environmental considerations in the process and by its narrow scope, which covers a single benchmark for one country.

As next steps, this research proposes a roadmap that includes continued development of the artefact, an extended ethical analysis, trials of the artefact, and the start of its wider application.

Files

Thesis_zonder_bijlage_C.pdf
(pdf | 4.62 Mb)
Unknown license