Further features of association rules

There are further interesting features of association rules.

Classical definability

We say that the association rule is classically definable if it can be expressed by means of classical predicate calculus (i.e. predicates, variables, classical quantifiers ∀, ∃, Boolean connectives and the predicate of equality).

One solution of this problem is presented in [Ha 78]. The problem of classical definability of association rules is a special case of a more general problem of classical definability of generalized quantifiers in monadic observational predicate calculi. The criterion of classical definability is given the Tharp's theorem [Ha 78].

The Tharp's theorem is but to general from the point of view of association rule. It can be shown that there is a more intuitive criterion of classical definability of association rules. The criterion of definability of the association rule φ ≈ ψ is based on simple properties of the function F_≈ associated to the 4ft-quantifier ≈, see [Ra 86], [Ra 03].

Tables of critical frequencies

Verification of some 4ft-quantifiers requires complex computation, examples are quantifiers lower critical implication ⇒^!_p,
αBase, upper critical implication ⇒^?_p,
αBase, double lower critical implication ⇔^!_p,
αBase, lower critical equivalence ≡^!_p,
αBase and Fisher’s quantifier ≈_α
Base.

It can be shown [Ha 78] that for each implicational quantifier ⇒* there is an unary function Tb_⇒* such that

⇒*(a,b,c,d) = 1 if and only if b < Tb_⇒*(a).

The function Tb_⇒*(a) is called the table of maximal b for implicational quantifier ⇒*. It can reduce even very complex computation to one simple test of inequality.

Similar functions can be defined also for other classes of association rules [Ra 86], [Ra 98A], [Ra 98B], [Ra 98C]. These functions are called tables of critical frequencies.

Calculi with incomplete information

A typical feature of data mining is dealing with missing information. It means that we have to deal with data matrices containing special symbol X for missing information. It results in fact that the attributes can have values from {0,1,X}. The symbol “X” means that the truth-value of the attribute is not known.

It means that we have to deal with nine-fold tables instead of with four-fold tables, see Tab. 1.

Tab. 1 Nine-fold contingency table of φ and ψ in `M`
`M`	φ	Xψ	¬ψ
φ	f_{1, 1}	f_{1, X}	f_{1, 0}
Xφ	f_{X, 1}	f_{X, X}	f_{X, 0}
¬φ	f_{0, 1}	f_{0, X}	f_{0, 0}

Here f_{1, 1} is the number of objects satisfying both φ and ψ, f_{1, X} is the number of objects satisfying φ with a not-known truth value for ψ, etc.

The logical calculi concerning data with missing values are defined and studied in [Ha 78]. Some results concern association rules. Some further results concerning association rules with missing information are in [Ra 86], [Ra 98B].