Pattern matching is the act of checking if a value is in the set of values described by a pattern. Many programming languages provide constructs to pattern match on program values. Pattern matching constructs appear in different variants. Stratego, a term rewriting language, feat
...
Pattern matching is the act of checking if a value is in the set of values described by a pattern. Many programming languages provide constructs to pattern match on program values. Pattern matching constructs appear in different variants. Stratego, a term rewriting language, features first-class pattern matching, which attempts to match a pattern with a value. If the match is successful, variables in the pattern are bound. If the pattern does not match, the matching expression fails. Using choice and sequence operators, one can build expressions that attempt to match multiple patterns until a match succeeds, evaluating the expression that corresponds to that pattern. This is a common way of branching in Stratego programs. The Stratego compiler handles each match attempt independently, disregarding the context.
In this thesis, we research context-aware compilation of first-class matches, aiming to speed up pattern matching operations without requiring changes to programs. We research the resemblance of Stratego pattern matching with match cases, a pattern matching construct that is ubiquitous in functional-style languages. Previous research shows that match cases can be compiled efficiently by considering all cases together, instead of a single alternative at the time, during compilation. We develop behaviour-preserving translations from first-class matches to match cases and from match cases to decision tree automata. We efficiently represent these automata in Java, the target language of the Stratego compiler. All these changes integrate in the latest upstream version of the Stratego compiler.
We evaluate the performance of our altered compiler and observe an average speed-up of 4x on programs that rely heavily on pattern matching, at the cost of a 20% increase in compilation time and space.