Assume we have 3 rules:
[1] {A,B,D} -> {C}
[2] {A,B} -> {C}
[3] Whatever it is
Rule [2] is a subset of rule [1] (because rule [1] contains all the items in rule [2]), so rule [1] should be eliminated: it is too specific, and its information is already included in rule [2].
I searched the internet, and everyone seems to use this code to remove redundant rules:
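To make the subset relation concrete, here is a small base R check. The vectors rule1.items and rule2.items are hand-typed stand-ins for the items of rules [1] and [2] (left-hand side plus right-hand side); they are not arules objects, so this only illustrates what I mean by "subset".

```r
# Hypothetical stand-ins for the items of each rule (lhs and rhs together).
rule1.items <- c("A", "B", "D", "C")  # [1] {A,B,D} -> {C}
rule2.items <- c("A", "B", "C")       # [2] {A,B}   -> {C}

# Rule [2] is a subset of rule [1] if every item of [2] also occurs in [1].
rule2.in.rule1 <- all(rule2.items %in% rule1.items)
rule1.in.rule2 <- all(rule1.items %in% rule2.items)

print(rule2.in.rule1)  # TRUE
print(rule1.in.rule2)  # FALSE, because D is missing from rule [2]
```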
subset.matrix <- is.subset(rules.sorted, rules.sorted)
subset.matrix[lower.tri(subset.matrix, diag=T)] <- NA
redundant <- colSums(subset.matrix, na.rm=T) >= 1
which(redundant)
rules.pruned <- rules.sorted[!redundant]
I don't understand how this code works.
After line 2 of the code, subset.matrix becomes:
     [,1] [,2] [,3]
[1,]   NA    1    0
[2,]   NA   NA    0
[3,]   NA   NA   NA
The cells in the lower triangle are set to NA, and since rule [2] is a subset of rule [1], the corresponding cell is set to 1. So I have 2 questions:
1. Why do we have to set the lower triangle to NA? If we do, how can we check whether rule [2] is a subset of rule [3] or not? (That cell has been set to NA.)
2. In our case, rule [1] should be the one to be eliminated, but this code eliminates rule [2] instead of rule [1]. (The first cell in column 2 is 1, so according to line 3 of the code the column sum of column 2 is >= 1, and rule [2] is therefore treated as redundant.)
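To show exactly how I am reading lines 2 and 3, here is a small base R sketch that starts from the matrix in my example instead of calling is.subset. The matrix is typed in by hand: only the upper triangle comes from the example above, and the diagonal and lower triangle are placeholders I made up, since line 2 overwrites them with NA anyway. The rule.labels vector is also just a stand-in for rules.sorted.

```r
# Hand-typed stand-in for is.subset(rules.sorted, rules.sorted).
# Upper triangle: values from the example; everything else is a placeholder.
subset.matrix <- matrix(
  c(1, 1, 0,
    0, 1, 0,
    0, 0, 1),
  nrow = 3, byrow = TRUE
)

# Line 2 of the code: blank out the diagonal and the lower triangle.
subset.matrix[lower.tri(subset.matrix, diag = TRUE)] <- NA
print(subset.matrix)

# Line 3 of the code: a column with at least one 1 is flagged as redundant.
redundant <- colSums(subset.matrix, na.rm = TRUE) >= 1
flagged <- which(redundant)
print(flagged)  # 2

# Stand-in for rules.sorted[!redundant]: the rules that survive pruning.
rule.labels <- c("[1]", "[2]", "[3]")
kept <- rule.labels[!redundant]
print(kept)  # "[1]" "[3]"
```

With this input, column 2 is the only one whose sum reaches 1, so rule [2] is dropped and rule [1] is kept, which is the opposite of what I expected.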
Any help would be appreciated!