3 The useful clause problem

Useless clauses are sometimes called redundant. In our opinion, “useless” is more precise, since it conveys the semantical nature of the concept better.

Example 2 Let P and Q be the following two pattern matrices.

$$
P = \begin{pmatrix}
\texttt{Nil} & \_ \\
\_ & \texttt{Nil}
\end{pmatrix}
$$

$$
Q = \begin{pmatrix}
\texttt{Nil} & \_ \\
\_ & \texttt{Nil} \\
\texttt{One(\_)} & \_ \\
\_ & \texttt{One(\_)} \\
\texttt{Cons(\_,\_)} & \_ \\
\_ & \texttt{Cons(\_,\_)}
\end{pmatrix}
$$

Matrix P is not exhaustive, since, for instance, vector v^→ = (One (0) One (0)) does not match any row of P.

By contrast, matrix Q is exhaustive. Let us consider any value vector v^→ of the appropriate type. Then, v₁ and v₂ are instances of the patterns Nil, One(_), or Cons(_,_). That is, we may partition values into nine sets denoted by nine different pattern vectors. It turns out that this partition is precise enough to apply Definition 2.

If v^→ is an instance of…	then, v^→ matches row number…
(`Nil` `Nil`) (`Nil` `One(_)`) (`Nil` `Cons(_,_)`)	1
(`One(_)` `Nil`) (`Cons(_,_)` `Nil`)	2
(`One(_)` `One(_)`) (`One(_)` `Cons(_,_)`)	3
(`Cons(_,_)` `One(_)`)	4
(`Cons(_,_)` `Cons(_,_)`)	5

As another consequence, one may observe that row number 6 of matrix Q is useless.
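The case analysis of Example 2 can be replayed mechanically. The Python sketch below is an illustration only and not part of the original development. It assumes that every argument pattern is a wildcard (as in P and Q), so that a value is fully characterised by its head constructor. The helper names `first_match`, `unmatched`, and `useless_rows` are hypothetical.

```python
from itertools import product

WILD = "_"
CONSTRUCTORS = ["Nil", "One", "Cons"]  # head constructors of the list type

# Rows of P and Q, written with head constructors only (arguments are wildcards).
P = [("Nil", WILD), (WILD, "Nil")]
Q = [("Nil", WILD), (WILD, "Nil"),
     ("One", WILD), (WILD, "One"),
     ("Cons", WILD), (WILD, "Cons")]


def row_matches(row, heads):
    """A row matches when each pattern is a wildcard or the same constructor."""
    return all(p == WILD or p == h for p, h in zip(row, heads))


def first_match(matrix, heads):
    """Return the 1-based number of the first matching row, or None."""
    for i, row in enumerate(matrix, start=1):
        if row_matches(row, heads):
            return i
    return None


def unmatched(matrix):
    """Classes of value vectors (the nine pattern vectors) matched by no row."""
    return [h for h in product(CONSTRUCTORS, repeat=2)
            if first_match(matrix, h) is None]


def useless_rows(matrix):
    """Rows that are never the first matching row for any class."""
    used = {first_match(matrix, h) for h in product(CONSTRUCTORS, repeat=2)}
    return [i for i in range(1, len(matrix) + 1) if i not in used]
```

Under these assumptions, `unmatched(P)` contains `("One", "One")`, `unmatched(Q)` is empty, and `useless_rows(Q)` reports row 6 only. Enumerating classes in this way is exponential in the number of columns, so it serves as a check of the example rather than as an algorithm.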

Because we use the ML definition of pattern matching (Definition 2), we claim that the two definitions above express what is generally understood by “an exhaustive match” and “a useless clause”. However, it is intuitively clear that the two questions are quite similar, and in fact they can be expressed using the following definition.

Proof: Corollary of definitions.□

Our framework of two separate definitions 2 and 3 exposes that, as far as pattern matching anomalies are concerned, the matching predicate can be simplified. More precisely, it is important to notice that v^→ matches some row in P (Definition 2) is equivalent to P ≼ v^→ (Definition 3). In other words, the order of rows in P is irrelevant while computing U(P,q^→ ).

3.1 Solving the useful clause problem

In this section we compute U recursively. We proceed by first defining a recursive function U_rec and then showing U = U_rec. The definition of U_rec owes much to the traditional compilation of ML pattern matching to decision trees — Pettersson [1992] gives a modern presentation of this quite ancient compilation scheme.

Let thus P be a pattern matrix of size m × n and q^→ be a pattern vector of size n. Induction proceeds by decomposing P and q^→ along their first column.

We now establish a few “key” properties of matrix specialization (1 below) and of the default matrix (2 to 4 below). Basically, the key property of specialization expresses that matching by P and S(c, P) are equivalent for value vectors whose first component admits c as a root constructor; while the key properties of the default matrix express the equivalence of matching by P and D(P) in more detailed situations.

Lemma 1 (Key properties) For any matrix P, constructor c, and value vector v^→ such that v₁ = c(w₁, … , w_a) (all being of the appropriate types), we have:

P ⋠ v^→ ⇐⇒ S(c, P) ⋠ S(c, v^→). (1)

Additionally, for any value vector v^→, we have:

P ⋠(v₁ v₂⋯v_n) =⇒ D(P) ⋠(v₂⋯v_n). (2)

Furthermore, given any matrix P, let Σ be the set of the root constructors of P’s first column. If Σ is not empty, then for any constructor c not in Σ and any value vector (w₁⋯ w_a v₂⋯ v_n), we have:

D(P) ⋠(v₂⋯v_n) =⇒ P ⋠(c(w₁, … , w_a) v₂⋯v_n). (3)

If Σ is empty, then, for any value vector v^→, we have instead:

D(P) ⋠(v₂⋯v_n) =⇒ P ⋠(v₁ v₂⋯v_n). (4)

Proof: Mechanical application of definitions.□
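Properties (1) and (2) can be checked on small inputs. The following Python sketch is a hedged illustration: it assumes the usual reading of the two operations, namely that S(c, P) keeps the rows whose first pattern is headed by c (exposing its arguments) or is a wildcard (replaced by as many wildcards as c has arguments), and that D(P) keeps the rows whose first pattern is a wildcard. Or-patterns are left out, the integer 0 is modelled as a nullary constructor, and all function names are hypothetical.

```python
WILD = "_"


def con(name, *args):
    """Build a constructor pattern or value, e.g. con("One", WILD)."""
    return (name, tuple(args))


def specialize(c, arity, matrix):
    """Assumed S(c, P): rows headed by c expose c's arguments, wildcard rows
    receive `arity` wildcards, rows headed by another constructor are dropped."""
    out = []
    for row in matrix:
        head, rest = row[0], row[1:]
        if head == WILD:
            out.append((WILD,) * arity + rest)
        elif head[0] == c:
            out.append(head[1] + rest)
    return out


def default(matrix):
    """Assumed D(P): keep the tails of rows whose first pattern is a wildcard."""
    return [row[1:] for row in matrix if row[0] == WILD]


def specialize_value(vec):
    """S(c, v) for a value vector whose first component is c(w1, ..., wa)."""
    return vec[0][1] + vec[1:]


def instance(pattern, value):
    """Instance relation between one pattern and one value."""
    if pattern == WILD:
        return True
    return pattern[0] == value[0] and all(
        instance(p, v) for p, v in zip(pattern[1], value[1]))


def matches(matrix, vec):
    """True when some row of the matrix matches the value vector."""
    return any(all(instance(p, v) for p, v in zip(row, vec)) for row in matrix)


NIL = con("Nil")
ONE_0 = con("One", con("0"))  # One(0), with 0 modelled as a nullary constructor
P = [(NIL, WILD), (WILD, NIL)]

for vec in [(ONE_0, NIL), (ONE_0, ONE_0)]:
    # Property (1): matching by P and by S(One, P) agree on such vectors.
    assert matches(P, vec) == matches(specialize("One", 1, P),
                                      specialize_value(vec))
    # Property (2): if P does not match, neither does D(P) on the tail.
    if not matches(P, vec):
        assert not matches(default(P), vec[1:])
```

The loop only exercises two value vectors from Example 2; it illustrates the statements of the lemma and does not replace its proof.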

We could of course have formulated the key properties by reversing implications and by using ≼ in place of ⋠. However, we adopt the negated formulation, to match Definition 2. Nevertheless, we shall also consider (1) when P has exactly one row. In that case, for any value vector v^→ such that v₁ = c(w₁, … , w_a), we write more directly:

Proof: Base cases are easy. Let first q^→ be the empty pattern vector, written (). The set of q^→ instances consists of the unique empty value vector, also written (). If P’s rows exist and are empty, then P’s first row filters the value vector ().

Hence, writing P for this matrix of empty rows, we have:

M(P, ()) = ∅.

Moreover, if P has no rows, then it cannot filter any value. We have:

M(∅, q^→) = { v^→ ∣ q^→ ≼ v^→ }.

And we conclude, since q^→ has at least one instance for any q^→.

To prove inductive cases, it suffices to show that U meets the equations that define U_rec.

If q₁ = c(r₁, … , r_a) for some constructor c, then we need to prove:
U(P, q^→) = U(S(c, P), S(c, q^→)).

However, by (1) applied to both P and q^→, we have the stronger result:
M(P, q^→) = { v^→ ∣ S(c, v^→) ∈ M(S(c, P), S(c, q^→)) }.

Namely, remember that U(P, q^→ ) means that the set M(P, q^→) of matching values is not empty (Definition 6).
If q₁ is a wildcard, then let Σ = {c₁, …, c_z} be as in the definition of U_rec.
1. Suppose Σ is a complete signature. For any c_k in Σ, we define the set M_k:
  M_k = M(S(c_k, P), S(c_k, q^→)).
  
  By typing, for any value v₁ of the appropriate type, we have q₁ ≼ v₁ if and only if there exists a constructor c_k in Σ and values w₁, …, w_{a_k} such that v₁ = c_k(w₁, … , w_{a_k}). Thus, by property (1), one easily shows:
  M(P, q^→) = ⋃_{k=1}^{z} { v^→ ∣ S(c_k, v^→) ∈ M_k }.
  
  And we can conclude:
  U(P, q^→) = ⋁_{k=1}^{z} U(S(c_k, P), S(c_k, q^→)).
2. In all situations, we have (by (2)):
  M(P, q^→) ⊆ { v^→ ∣ (v₂⋯v_n) ∈ M(D(P), (q₂⋯q_n)) }.
  
  In the case where Σ is empty, the reverse inclusion holds, by (4). And we can conclude, by the “types are not empty” axiom.
  It is worth noticing that the reverse inclusion does not hold when Σ is non-empty. Namely, when considering sets of matching values M, we have to take all possible values into account. Anyway, by the inclusion above, we have: U(P, q^→ ) =⇒ U(D(P), (q₂⋯q_n)).
  Conversely, assume U(D(P), (q₂⋯q_n)) = True, and let t be the type of the first component of tested value vectors. Then, there exists (v₂⋯v_n) such that D(P) ⋠(v₂⋯v_n) and (q₂⋯q_n) ≼ (v₂⋯v_n). Furthermore, by the hypothesis “Σ does not hold all the constructors of type t” we know that there exists some constructor c of type t₁ × ⋯ × t_a → t such that c ∉Σ. Thus, by our axiom “types are not empty”, there exist values w₁,… , w_a of respective types t₁,…, t_a. Then, vector v^→ = (c(w₁, … , w_a) v₂⋯v_n) is a witness of the validity of U(P, q^→ ), by (3) and q₁ = _ ≼ v₁.
If q₁ is an or-pattern (r₁∣r₂), then, by definition of ≼ for or-patterns, we have:
M(P, ((r₁∣r₂) q₂⋯q_n)) = M(P, (r₁ q₂⋯q_n)) ⋃ M(P, (r₂ q₂⋯q_n)).

□
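The equations established in the proof can be read as a recursive procedure. The Python sketch below follows them case by case: empty vector, constructor pattern, wildcard with a complete or incomplete Σ, and or-pattern. It is a minimal illustration under stated assumptions, not the author's implementation. It assumes a single data type with constructors `Nil`, `One`, and `Cons` (the table `ARITY` is hypothetical), it assumes that arguments of non-list type are only ever matched by wildcards, and it uses the usual reading of S and D, with or-patterns in the first column of P expanded into separate rows.

```python
WILD, OR = "_", "|"
ARITY = {"Nil": 0, "One": 1, "Cons": 2}  # assumed complete signature


def con(name, *args):
    """Constructor pattern, e.g. con("Cons", WILD, WILD)."""
    return (name, tuple(args))


def alt(r1, r2):
    """Or-pattern (r1 | r2)."""
    return (OR, (r1, r2))


def expand_or(matrix):
    """Split rows whose first pattern is an or-pattern, one row per alternative."""
    out = []
    for row in matrix:
        if row[0] != WILD and row[0][0] == OR:
            r1, r2 = row[0][1]
            out.extend(expand_or([(r1,) + row[1:], (r2,) + row[1:]]))
        else:
            out.append(row)
    return out


def specialize(c, arity, matrix):
    """Assumed S(c, P)."""
    out = []
    for row in expand_or(matrix):
        head, rest = row[0], row[1:]
        if head == WILD:
            out.append((WILD,) * arity + rest)
        elif head[0] == c:
            out.append(head[1] + rest)
    return out


def default(matrix):
    """Assumed D(P)."""
    return [row[1:] for row in expand_or(matrix) if row[0] == WILD]


def useful(matrix, q):
    """Sketch of U_rec: is q useful with respect to the rows of matrix?"""
    if not q:                      # n = 0: useful exactly when P has no rows
        return not matrix
    head, rest = q[0], q[1:]
    if head != WILD and head[0] == OR:          # or-pattern case
        r1, r2 = head[1]
        return useful(matrix, (r1,) + rest) or useful(matrix, (r2,) + rest)
    if head != WILD:                            # constructor case
        c, args = head
        return useful(specialize(c, len(args), matrix), args + rest)
    # Wildcard case: collect the root constructors of the first column.
    sigma = {row[0][0] for row in expand_or(matrix) if row[0] != WILD}
    if sigma == set(ARITY):                     # complete signature
        return any(useful(specialize(c, ARITY[c], matrix),
                          (WILD,) * ARITY[c] + rest) for c in sigma)
    return useful(default(matrix), rest)        # incomplete or empty signature


NIL, ONE, CONS = con("Nil"), con("One", WILD), con("Cons", WILD, WILD)
P = [(NIL, WILD), (WILD, NIL)]
Q = P + [(ONE, WILD), (WILD, ONE), (CONS, WILD), (WILD, CONS)]
```

On the matrices of Example 2, `useful(P, (WILD, WILD))` holds (some value vector escapes P), `useful(Q, (WILD, WILD))` does not, and `useful(Q[:5], Q[5])` does not, which agrees with the earlier observation that row 6 of Q is useless. A realistic implementation would derive the signature from type information per column instead of a global table.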

3.2 Detecting the anomalies

Since we know how to compute U, we can detect pattern matching anomalies. Given some expression match … with p₁ -> e₁ | p₂ -> e₂ | … | p_m -> e_m, exhaustiveness is checked by computing:
