KDD procedures – GUHA



GUHA is an original Czech method of exploratory data analysis whose development started in the 1960s. The goal of the GUHA method is to offer all interesting facts that follow from the analysed data and are relevant to the given problem. GUHA is realised by GUHA procedures.

A GUHA procedure is a computer program whose input consists of the analysed data and a few parameters defining a very large set of potentially relevant patterns. The output of a GUHA procedure consists of all prime patterns. A pattern is prime if it is true in the analysed data and does not immediately follow from other, simpler output patterns.
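The input/output contract described above can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration and is not taken from [Ha 78]: the function name `prime_patterns`, the toy pattern class ("every row satisfies at least one attribute from the set S"), and the reading of "immediately follows" as "a proper subset of S is already in the output". Real GUHA procedures work with far richer patterns and deduction rules.

```python
from itertools import combinations


def prime_patterns(rows, attributes, max_len):
    """Toy GUHA-style procedure (illustrative assumption, not ASSOC).

    Pattern: a set S of attributes, read as "every row satisfies at least
    one attribute in S". If a proper subset of S is already true, then S
    follows from it immediately, so S is not prime.
    """
    prime = []
    # Enumerate candidates from simple to complex, so that simpler
    # patterns are always decided before the patterns that follow from them.
    for size in range(1, max_len + 1):
        for combo in combinations(attributes, size):
            candidate = frozenset(combo)
            # Condition 1: the pattern must be true in the analysed data.
            if not all(any(row[a] for a in candidate) for row in rows):
                continue
            # Condition 2: it must not follow from a simpler output pattern.
            if any(p < candidate for p in prime):
                continue
            prime.append(candidate)
    return prime


if __name__ == "__main__":
    data = [
        {"A": True, "B": False, "C": True},
        {"A": False, "B": True, "C": True},
    ]
    for pattern in prime_patterns(data, ["A", "B", "C"], max_len=3):
        print(sorted(pattern))
```

In this toy data, {C} and {A, B} are prime, whereas {A, C}, {B, C} and {A, B, C} are true but omitted because they follow from {C}.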

The best-known GUHA procedure is ASSOC, defined in [Ha 78]. It was designed to mine for patterns in the form of a certain type of generalized association rule. These rules express not only the classical relation of two Boolean attributes based on confidence and support, but also further relations corresponding, among others, to statistical hypothesis tests.
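To make the two kinds of relation concrete, the following Python sketch evaluates a rule from the four frequencies of its contingency table: a (antecedent and succedent both true), b (antecedent only), c (succedent only) and d (neither). The class and function names are hypothetical, and the one-sided Fisher exact test is shown only as one possible example of a test-based relation; the exact definitions used by ASSOC are given in [Ha 78] and are not reproduced here.

```python
from dataclasses import dataclass
from math import comb


@dataclass(frozen=True)
class FourFoldTable:
    a: int  # rows satisfying both antecedent and succedent
    b: int  # rows satisfying the antecedent only
    c: int  # rows satisfying the succedent only
    d: int  # rows satisfying neither


def confidence(t):
    """Share of antecedent rows that also satisfy the succedent."""
    return t.a / (t.a + t.b) if (t.a + t.b) else 0.0


def support(t):
    """Share of all rows that satisfy both antecedent and succedent."""
    n = t.a + t.b + t.c + t.d
    return t.a / n if n else 0.0


def classical_rule_holds(t, min_conf, min_supp):
    """Classical association rule: thresholds on confidence and support."""
    return confidence(t) >= min_conf and support(t) >= min_supp


def fisher_p_value(t):
    """One-sided Fisher exact test: probability of at least `a` joint
    occurrences under independence, with the table margins held fixed."""
    r = t.a + t.b          # rows satisfying the antecedent
    k = t.a + t.c          # rows satisfying the succedent
    n = r + t.c + t.d      # all rows
    tail = sum(comb(k, i) * comb(n - k, r - i)
               for i in range(t.a, min(r, k) + 1))
    return tail / comb(n, r)


if __name__ == "__main__":
    table = FourFoldTable(a=8, b=2, c=10, d=80)
    print(confidence(table), support(table))
    print(classical_rule_holds(table, min_conf=0.7, min_supp=0.05))
    print(fisher_p_value(table))
```

A rule can then be accepted either by the threshold check or by requiring the p-value to fall below a chosen significance level.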

The GUHA procedure ASSOC has been implemented several times [Ra 78], [HSZ 95], [GH +]. The procedure 4ft-Miner is its newest implementation. It has some new, previously unimplemented features, namely support for conditional association rules and very fine-grained options for defining the set of potentially relevant rules.

The implementations of the GUHA procedure ASSOC are based on a bit-string representation of the analysed data, see e.g. [Ra 78], [RS 02], [Si 03]. This approach has been further developed and is used in all data mining procedures of the LISp-Miner system, as well as in its machine learning procedure KEX.
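The idea behind the bit-string representation can be sketched as follows: each Boolean attribute is stored as one string of bits, one bit per row, so that conjunctions become bitwise AND and frequencies become bit counts. The Python sketch below uses arbitrary-precision integers as bit strings; the function names are hypothetical, and the sketch does not reflect the data structures of any actual implementation cited above.

```python
def to_bitstring(column):
    """Pack a column of truthy/falsy values into an int; bit i = row i."""
    bits = 0
    for i, value in enumerate(column):
        if value:
            bits |= 1 << i
    return bits


def popcount(bits):
    """Number of set bits, i.e. number of rows where the attribute holds."""
    return bin(bits).count("1")


def four_fold_counts(antecedent, succedent, n_rows):
    """Return the frequencies (a, b, c, d) for two attribute bit strings."""
    all_rows = (1 << n_rows) - 1          # mask with one bit set per row
    not_antecedent = all_rows ^ antecedent
    not_succedent = all_rows ^ succedent
    a = popcount(antecedent & succedent)
    b = popcount(antecedent & not_succedent)
    c = popcount(not_antecedent & succedent)
    d = popcount(not_antecedent & not_succedent)
    return a, b, c, d


if __name__ == "__main__":
    smoker = to_bitstring([1, 1, 1, 0, 0, 1])
    cough = to_bitstring([1, 0, 1, 1, 0, 1])
    print(four_fold_counts(smoker, cough, n_rows=6))
```

Because a derived attribute such as a conjunction is itself a bit string, it can be reused when longer conjunctions are built, which is one reason this representation suits the enumeration of very large sets of patterns.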

Research on the GUHA method has led to a specific theory covering its explicit logical and statistical foundations; see the monograph [Ha 78]. The related logical theory is being further developed, also in relation to LISp-Miner; see the research activities related to the LISp-Miner project.

Note that the procedures KL-Miner, CF-Miner, SDKL-Miner, SD4ft-Miner and SDCF-Miner are also GUHA procedures in the sense of [Ha 78], even though they are not mentioned there. That is, the input of each of these procedures consists of the analysed data and a few parameters defining a very large set of potentially relevant patterns, and the output consists of all prime patterns. Moreover, the implementation of these procedures is based on the same bit-string technique as the procedure 4ft-Miner.