Conclusion

8 Conclusion

To the best of our knowledge, there is no literature on the analysis of pattern matching. the works closest to ours are Sekar et al. [1992], Maranget [1992], which give some algorithms for computing directions (also called indices). Those algorithms are in fact very similar to our algorithm U. Indeed, our present work can be seen as a thorough exploration of how to apply the initial algorithm to pattern matching diagnostics, while directions are confined to the compilation of Laville’s semantics.

Works on the compilation of pattern matching to decision trees sometimes recall that this compilation technique yields “non-exhaustive match” and “useless clause” warnings as by-products — see for instance Baudinet and MacQueen [1985], Aitken [1992], Sestoft [1996]. However there are good reasons to adopt backtracking automata: a potentially exponential behavior is avoided, and, in practice, output code size is somehow reduced. In our opinion, designers that wish to adopt backtracking automata should not be prevented to do so by the question of pattern matching diagnostics. Hence, we consider diagnostics directly. As a consequence, our algorithm for producing diagnostics is not only independent from any compilation strategy, but is also proved correct with respect to several semantics.

We now examine how a few compilers perform pattern matching diagnostics. The SML/NJ compiler Appel and MacQueen [1991] compiles pattern matching by following the decision tree approach, thereby naturally producing diagnostics. However, it flags “useless patterns” as “useless clauses”, which can confuse programmers. It should be noticed that for the sake of precise warnings for “useless patterns” (i.e. useless arguments in or-patterns), we defined some partial expansion of or-patterns, whereas compilation calls for a complete expansion. From our understanding of its code and architecture, the SML/NJ compiler performs such a complete expansion before the compilation of pattern matching, and then ignores the existence of or-patterns. Additionally, the SML/NJ takes exponential time (and produce code of exponential size) in experiments V and S.

The Glasgow Haskell compiler ghc Peyton Jones et al. [1993] also carries out pattern matching checks. Surprisingly, by default, the “useless clause” diagnostic is enabled, while the “non-exhaustive match” diagnostic is disabled. As ghc compiles pattern matching to backtracking automata Wadler [1987], Augustsson [1985], the source of the compiler contains specific code for producing pattern matching diagnostics. This code apparently proceeds exactly along the lines of compilation to decision trees. Of course, no tree is produced, instead the analyzer computes the leaves of the tree. This approach results in producing several examples of non-matching values in the case of non-exhaustive matches. Unfortunately, it also results in exponential running times and excessive memory consumption in examples T, V and S.

Those comparisons demonstrate that our algorithm, though inspired by compilation to decision trees, is more efficient. Such efficiency stems from several causes. First, our algorithm does not build any tree data-structure of potentially exponential size. More significantly, the compilation scheme can be seen as the search of all matching values, while, our algorithm only searches for one matching value. The benefits of this approach are numerous: the search is stopped as soon as one matching value is found; in some important and frequent case (inductive step 2-(b) in Section 3.1) our algorithm performs one recursive call, where compilation performs two or more; and finally, some simplification on the input patterns that we perform are inappropriate to compilation.

In our opinion, pattern matching analysis has been somehow neglected. For instance, although the Definition of Standard ML Harper et al. [1991] requires compliant implementation to flag “redundant” and “non-exhaustive” matches, the Haskell Report Hudak et al. [1998] does not mention any similar requirement. It can certainly be considered that such a question is of minor importance in the context of a language definition. But we believe that providing warnings against statically checkable, common, programming errors is an important feature of any mature compiler. And of course, we also believe useless clauses and non-exhaustive matches to be such errors.

Finally, our approach of studying pattern matching anomalies on the semantical level results in more adequate and precise warnings, tailored to various programming situations. Additionally, our technique is applicable to both ML and Haskell; and the cost of our implementation seems to be under control.