PDFA Distillation via String Probability Queries

Baumgartner, R.; Verwer, S.E.

Other (2024)

Authors

R. Baumgartner (Algorithmics)

S.E. Verwer (Algorithmics)

Research Group

Algorithmics (TU Delft)

To reference this document use:

http://resolver.tudelft.nl/uuid:70dff61f-bc53-4c65-b6d1-210d5954f46d

Published Date

2024

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Faculty

Electrical Engineering, Mathematics and Computer Science

Department

Software Technology

Abstract

Probabilistic deterministic finite automata (PDFA) are discrete event systems that model conditional probabilities over languages: given a sequence of tokens already seen, they return the probability that each token of interest appears next. Such models have gained interest in explainable machine learning, where they serve as surrogate models for neural networks trained as language models. In this work, we present an algorithm to distill PDFA from neural networks. Our algorithm is a derivative of the L# algorithm and can learn PDFA from a new type of query, in which the algorithm infers conditional probabilities from the probability that the queried string occurs. We show its effectiveness on a recent public dataset by distilling PDFA from a set of trained neural networks.
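The step from string probabilities to conditional probabilities can be illustrated with a minimal sketch. It assumes that a string probability query returns the probability that a generated string starts with the queried prefix, so that P(symbol | prefix) = P(prefix + symbol) / P(prefix). The abstract does not spell out the query semantics, so this is an assumption; the toy automaton, the end marker "$", and all function names below are hypothetical and not taken from the paper.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical toy PDFA over the alphabet {"a", "b"}; "$" marks end of string.
# TRANSITIONS[state][symbol] = (next_state, probability of symbol in that state).
TRANSITIONS: Dict[int, Dict[str, Tuple[int, float]]] = {
    0: {"a": (1, 0.5), "b": (0, 0.25), "$": (0, 0.25)},
    1: {"a": (1, 0.25), "b": (0, 0.5), "$": (1, 0.25)},
}


def prefix_probability(prefix: Sequence[str]) -> float:
    """Stand-in for the queried model: probability that a string starts with `prefix`.

    Assumption: this plays the role of the string probability query. In a
    distillation setting it would be answered by the neural network instead.
    """
    state, prob = 0, 1.0
    for symbol in prefix:
        state, p = TRANSITIONS[state][symbol]
        prob *= p
    return prob


def conditional_from_queries(
    query: Callable[[List[str]], float], prefix: Sequence[str], symbol: str
) -> float:
    """Recover P(symbol | prefix) as the ratio of two string probability queries."""
    denominator = query(list(prefix))
    if denominator == 0.0:
        raise ValueError("prefix has probability zero; the conditional is undefined")
    return query(list(prefix) + [symbol]) / denominator


if __name__ == "__main__":
    # After reading "a", "b" the toy automaton is back in state 0.
    print(conditional_from_queries(prefix_probability, ["a", "b"], "a"))
```

Because the learner only sees products of transition probabilities, a prefix with probability zero yields no information about what follows it; the sketch raises an error in that case rather than guessing a value.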