Transforming strings to vector spaces using prototype selection

Abstract

A common way of expressing string similarity in structural pattern recognition is the edit distance. It allows one to apply the kNN rule in order to classify a set of strings. However, compared to the wide range of elaborate classifiers known from statistical pattern recognition, this is only a very basic method. In the present paper we propose a method for transforming strings into n-dimensional real vector spaces based on prototype selection. This allows us to subsequently classify the transformed strings with more sophisticated classifiers, such as support vector machines and other kernel-based methods. In a number of experiments, we show that the recognition rate can be significantly improved by means of this procedure.
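
The transformation can be illustrated with a minimal sketch. It assumes that each string is mapped to the vector of its edit distances to n prototype strings, and that the edit distance is the Levenshtein distance with unit costs. The abstract states neither the cost function nor the prototype selection strategy, so the prototypes below are simply given by hand, and the names `edit_distance` and `embed` are hypothetical.

```python
from typing import List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs (an assumption, see above)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if ca == cb else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def embed(s: str, prototypes: Sequence[str]) -> List[float]:
    """Map a string to an n-dimensional real vector, one coordinate per prototype."""
    return [float(edit_distance(s, p)) for p in prototypes]


# Hand-picked prototypes, for illustration only.
prototypes = ["kitten", "flaw"]
vectors = [embed(s, prototypes) for s in ["sitting", "lawn"]]
```

The resulting vectors have one dimension per prototype and can be passed to any classifier that operates on real vectors, such as a support vector machine.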