4ft-Miner procedure

Author(s):

Milan Šimůnek, Jan Rauch

Responsibility:

Jan Rauch (theory), Milan Šimůnek (software), Martin Kejkula (help)

Description:
The data mining procedure 4ft-Miner mines for association rules of the form Ant ≈ Suc and for conditional association rules of the form Ant ≈ Suc / Cond [RS 02]. The procedure works with data matrices: the Boolean attributes Ant, Suc and Cond are derived from the columns of the analysed data matrix.

The intuitive meaning of Ant ≈ Suc is that the Boolean attributes Ant and Suc are associated in the way given by the symbol ≈. The intuitive meaning of Ant ≈ Suc / Cond is that Ant and Suc are associated in the way given by ≈ when the condition defined by the Boolean attribute Cond is satisfied, that is, on the rows of the data matrix that satisfy Cond.

The symbol ≈ is called a 4ft-quantifier. It denotes a condition concerning the four-fold contingency table of Ant and Suc, whose four frequencies are a (rows satisfying both Ant and Suc), b (Ant but not Suc), c (Suc but not Ant) and d (neither). There are 16 classes of 4ft-quantifiers corresponding to various types of association rules (e.g. implication, equivalence, rules based on statistical hypothesis tests). Conjunctions of several 4ft-quantifiers can also be used.
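
The following Python sketch shows how a four-fold contingency table can be counted from two Boolean columns, optionally restricted to the rows satisfying Cond, and how one 4ft-quantifier is then evaluated on it. As the example quantifier it assumes founded implication from the GUHA literature, with parameters p and Base, true when a/(a+b) ≥ p and a ≥ Base. The function names and the data are hypothetical, and the sketch is not taken from LISp-Miner.

```python
from typing import Optional, Sequence, Tuple

# (a, b, c, d) frequencies of the four-fold contingency table
FourFold = Tuple[int, int, int, int]


def four_fold_table(ant: Sequence[bool], suc: Sequence[bool],
                    cond: Optional[Sequence[bool]] = None) -> FourFold:
    """Count the four-fold table of Ant and Suc (hypothetical helper).

    a: rows with Ant and Suc      b: rows with Ant, not Suc
    c: rows with Suc, not Ant     d: rows with neither
    If cond is given, only rows satisfying Cond are counted,
    which gives the table for the conditional rule Ant ≈ Suc / Cond.
    """
    a = b = c = d = 0
    for i, (x, y) in enumerate(zip(ant, suc)):
        if cond is not None and not cond[i]:
            continue  # row lies outside the condition
        if x and y:
            a += 1
        elif x:
            b += 1
        elif y:
            c += 1
        else:
            d += 1
    return a, b, c, d


def founded_implication(table: FourFold, p: float, base: int) -> bool:
    """Assumed example quantifier: true if a/(a+b) >= p and a >= base."""
    a, b, _, _ = table
    return a + b > 0 and a / (a + b) >= p and a >= base


ant = [True, True, True, False, False]
suc = [True, True, False, True, False]
cond = [True, True, True, False, True]

print(four_fold_table(ant, suc))        # (2, 1, 1, 1)
print(four_fold_table(ant, suc, cond))  # (2, 1, 0, 1)
print(founded_implication(four_fold_table(ant, suc), p=0.6, base=2))  # True
```

Other quantifiers would be further functions of the same (a, b, c, d) tuple, and a conjunction of quantifiers is simply the logical AND of their results.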

The association rules Ant ≈ Suc are substantially more general than the “classical” association rules X → Y, where X and Y are sets of items [Ag 96]. The intuitive meaning of X → Y is that transactions containing the set X of items tend to contain the set Y of items. Two measures of the intensity of an association rule X → Y are used: confidence and support. The “classical” association rule discovery task is to find all association rules X → Y whose support is at least minsup and whose confidence is at least minconf, where minsup and minconf are user-defined thresholds. The a-priori algorithm [Ag 96] is usually used for this task.
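
To make the relationship concrete, here is a small Python sketch of the classical measures, assuming the usual definitions: support is the fraction of all transactions containing X ∪ Y, and confidence is the fraction of transactions containing X that also contain Y. Reading Ant as "contains X" and Suc as "contains Y", these are a/(a+b+c+d) and a/(a+b) of the four-fold table, so a classical rule corresponds to one particular choice of 4ft-quantifier. The function names and the transactions are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Tuple


def support_confidence(transactions: List[FrozenSet[str]],
                       x: Iterable[str], y: Iterable[str]) -> Tuple[float, float]:
    """Support and confidence of the classical rule X -> Y.

    With Ant = "contains X" and Suc = "contains Y":
    support = a / n and confidence = a / (a + b).
    """
    x_set, y_set = frozenset(x), frozenset(y)
    n = len(transactions)
    n_x = sum(1 for t in transactions if x_set <= t)             # a + b
    n_xy = sum(1 for t in transactions if (x_set | y_set) <= t)  # a
    support = n_xy / n if n else 0.0
    confidence = n_xy / n_x if n_x else 0.0
    return support, confidence


def is_classical_rule(transactions: List[FrozenSet[str]], x: Iterable[str],
                      y: Iterable[str], minsup: float, minconf: float) -> bool:
    """True if X -> Y reaches both user-defined thresholds."""
    support, confidence = support_confidence(transactions, x, y)
    return support >= minsup and confidence >= minconf


transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
# bread -> milk: support 2/4, confidence 2/3
print(support_confidence(transactions, {"bread"}, {"milk"}))
print(is_classical_rule(transactions, {"bread"}, {"milk"}, minsup=0.5, minconf=0.6))
```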

The 4ft-Miner procedure does not use the a-priori algorithm. Instead, it uses an algorithm based on representing the analysed data by suitable strings of bits [RS 02]. This makes it easy to compute the four-fold contingency tables needed to verify Ant ≈ Suc and Ant ≈ Suc / Cond. The approach also makes it possible to work with more complex literals than usual.
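
A minimal sketch of the bit-string idea follows, assuming Python integers serve as the bit strings (bit i stands for row i of the data matrix). Once Ant, Suc and Cond are bit strings, each cell of the four-fold table is the number of 1-bits in a bitwise conjunction. The function names are hypothetical; this illustrates the principle only and is not the algorithm of [RS 02] or the LISp-Miner code.

```python
from typing import Optional, Sequence, Tuple


def to_bits(column: Sequence[bool]) -> int:
    """Pack a Boolean column into an int: bit i is set iff row i is true."""
    bits = 0
    for i, value in enumerate(column):
        if value:
            bits |= 1 << i
    return bits


def popcount(bits: int) -> int:
    """Number of 1-bits in a non-negative int."""
    return bin(bits).count("1")


def four_fold_from_bits(ant: int, suc: int, n_rows: int,
                        cond: Optional[int] = None) -> Tuple[int, int, int, int]:
    """Four-fold table (a, b, c, d) computed with bitwise AND/NOT only."""
    scope = (1 << n_rows) - 1   # all rows of the data matrix
    if cond is not None:
        scope &= cond           # Ant ≈ Suc / Cond: keep rows where Cond holds
    not_ant, not_suc = ~ant & scope, ~suc & scope
    return (popcount(ant & suc & scope),
            popcount(ant & not_suc),
            popcount(not_ant & suc),
            popcount(not_ant & not_suc))


ant = to_bits([True, True, True, False, False])
suc = to_bits([True, True, False, True, False])
cond = to_bits([True, True, True, False, True])
print(four_fold_from_bits(ant, suc, 5))        # (2, 1, 1, 1)
print(four_fold_from_bits(ant, suc, 5, cond))  # (2, 1, 0, 1)
```

In this sketch each cell comes from a few bitwise operations on whole columns followed by a bit count, rather than from a row-by-row loop with per-row branching.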

The Boolean attributes Ant, Suc and Cond are conjunctions of literals. An example of such a Boolean attribute is the conjunction A(a1) ∧ B(b1, b2, b3) of the literals A(a1) and B(b1, b2, b3). Here A is an attribute and a1 is one of its possible values. Analogously, b1, b2, b3 are some of the possible values of the attribute B, and the literal B(b1, b2, b3) is true for a row of the data matrix if the value of B in that row is one of b1, b2, b3.
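
The following Python sketch evaluates the example conjunction A(a1) ∧ B(b1, b2, b3) row by row. It assumes a hypothetical data matrix stored as a list of dictionaries (attribute name to value); the values a2 and b4 are invented only to give rows on which a literal is false, and the helper names are illustrative.

```python
from typing import Dict, List, Sequence, Set

Row = Dict[str, str]  # one row of the data matrix: attribute -> value


def literal(rows: Sequence[Row], attribute: str, values: Set[str]) -> List[bool]:
    """Literal attribute(values): true where the row's value is in `values`."""
    return [row[attribute] in values for row in rows]


def conjunction(*columns: List[bool]) -> List[bool]:
    """Row-wise conjunction of Boolean columns (literals)."""
    return [all(row_values) for row_values in zip(*columns)]


rows = [
    {"A": "a1", "B": "b2"},
    {"A": "a1", "B": "b4"},
    {"A": "a2", "B": "b1"},
    {"A": "a1", "B": "b3"},
]
a_lit = literal(rows, "A", {"a1"})              # A(a1)
b_lit = literal(rows, "B", {"b1", "b2", "b3"})  # B(b1, b2, b3)
print(conjunction(a_lit, b_lit))                # [True, False, False, True]
```

The resulting Boolean column is what would then play the role of Ant, Suc or Cond when the four-fold table is computed.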

Further information on literals and a demonstration of the main features of the 4ft-Miner procedure are also available.

Files to download:
LISp-Miner.Core.OldUI.zip (33.45 MB, August 13, 2014)
Legacy LISp-Miner system core files, separated into modules for each GUHA procedure. The archive also contains the legacy modules LMAdmin and LMDataSource.

History:

Development of the first version of the 4ft-Miner procedure started in 1996. The project was carried out by J. Rauch [Ra 97A], [Ra 97B]. It built on the representation of analysed data by suitable strings of bits and on long-term experience with implementing the GUHA method [Ra 71], [Ra 78], [Ra 81]. Several data transformation tools were embedded in the first version.

Both the first and the later versions of the 4ft-Miner procedure were implemented by M. Šimůnek, who is also a co-author of the 4ft-Miner algorithm.

A new conception of 4ft-Miner was created by J. Rauch and M. Šimůnek in 1999. The data transformation tools were separated out, and M. Šimůnek implemented an independent transformation and exploration procedure, DataSource. The DataSource procedure became the basis of the Elementary subsystem.

The new version was widely used both in teaching and in applications. Various enhancements were devised and implemented in 2001–2004; the most important are partial cedents and the possibility of using several 4ft-quantifiers in one task.

Several user-tailored interfaces to the 4ft-Miner procedure were implemented. One example is the experiment with on-line analysis of the atherosclerosis data within the STULONG project.