Question? Leave a message!




Association Rule Mining

Association Rule Mining
Association Rule Mining www.ThesisScientist.comMining Association Rules in Large Databases  Association rule mining  Algorithms Apriori and FPGrowth  Max and closed patterns  Mining various kinds of association/correlation rules www.ThesisScientist.comMaxpatterns Closepatterns  If there are frequent patterns with many items, enumerating all of them is costly.  We may be interested in finding the ‘boundary’ frequent patterns.  Two types… www.ThesisScientist.comMaxpatterns 1  Frequent pattern a , …, a  ( ) + 1 100 100 2 1 0 0 100 30 ( ) + … + ( ) = 2 1 = 1.2710 100 1 0 0 frequent subpatterns  Maxpattern: frequent patterns without proper frequent super pattern  BCDE, ACD are maxpatterns Tid Items  BCD is not a maxpattern 10 A,B,C,D,E 20 B,C,D,E, Minsup=2 30 A,C,D,F www.ThesisScientist.comMaximal Frequent Itemset An itemset is maximal frequent if none of its immediate supersets is frequent null Maximal A B C D E Itemsets AB AC AD AE BC BD BE CD CE DE ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE ABCD ABCE ABDE ACDE BCDE Infrequent Itemsets Border ABCD E www.ThesisScientist.comClosed Itemset  An itemset is closed if none of its immediate supersets has the same support as the itemset Itemset Support A 4 TID Items Itemset Support B 5 1 A,B A,B,C 2 C 3 2 B,C,D A,B,D 3 D 4 3 A,B,C,D A,C,D 2 A,B 4 4 A,B,D B,C,D 3 A,C 2 5 A,B,C,D A,B,C,D 2 A,D 3 B,C 3 B,D 4 C,D 3 www.ThesisScientist.comMaximal vs Closed Itemsets Transaction Ids null 124 123 1234 245 345 TID Items A B C D E 1 ABC 2 ABCD 12 124 24 123 4 2 3 24 34 3 BCE 45 AB AC AD AE BC BD BE CD CE DE 4 ACDE 5 DE 12 24 2 2 4 4 3 4 ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE 4 2 ABCD ABCE ABDE ACDE BCDE Not supported by any transactions ABCDE www.ThesisScientist.comMaximal vs Closed Frequent Itemsets Closed but null Minimum support = 2 not maximal 124 123 1234 245 345 A B C D E Closed and maximal 12 124 24 123 4 2 3 24 34 45 AB AC AD AE BC BD BE CD CE DE 12 24 2 2 4 4 3 4 ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE 4 2 ABCD ABCE ABDE ACDE BCDE Closed = 9 Maximal = 4 ABCDE www.ThesisScientist.comMaximal vs Closed Itemsets Frequent Itemsets Closed Frequent Itemsets Maximal Frequent Itemsets www.ThesisScientist.comMaxMiner: Mining Maxpatterns  Idea: generate the complete set enumeration tree one level at a time, while prune if applicable.  (ABCD) A (BCD) B (CD) C (D) D () AB (CD) AC (D) AD () BC (D) BD () CD () ABC (C) ABD () ACD () BCD () www.ThesisScientist.com ABCD ()Local Pruning Techniques (e.g. at node A) Check the frequency of ABCD and AB, AC, AD.  If ABCD is frequent, prune the whole subtree.  If AC is NOT frequent, remove C from the parenthesis before expanding.  (ABCD) A (BCD) B (CD) C (D) D () AB (CD) AC (D) AD () BC (D) BD () CD () ABC (C) ABD () ACD () BCD () www.ThesisScientist.com ABCD ()Algorithm MaxMiner  Initially, generate one node N= ,  (ABCD) where h(N)= and t(N)=A,B,C,D.  Consider expanding N,  If h(N)t(N) is frequent, do not expand N.  If for some it(N), h(N)i is NOT frequent, remove i from t(N) before expanding N.  Apply global pruning techniques… www.ThesisScientist.comGlobal Pruning Technique (across subtrees)  When a max pattern is identified (e.g. ABCD), prune all nodes (e.g. B, C and D) where h(N)t(N) is a subset of it (e.g. ABCD).  (ABCD) A (BCD) B (CD) C (D) D () AB (CD) AC (D) AD () BC (D) BD () CD () ABC (C) ABD () ACD () BCD () www.ThesisScientist.com ABCD ()Example Tid Items  (ABCDEF) 10 A,B,C,D,E 20 B,C,D,E, A (BCDE) B (CDE) C (DE) D (E) E () 30 A,C,D,F Items Frequency Minsup=2 ABCDEF 0 A 2 Max patterns: B 2 C 3 D 3 E 2 F 1 www.ThesisScientist.comExample Tid Items  (ABCDEF) 10 A,B,C,D,E 20 B,C,D,E, A (BCDE) B (CDE) C (DE) D (E) E () 30 A,C,D,F AC (D) AD () Minsup=2 Node A Items Frequency Max patterns: ABCDE 1 AB 1 AC 2 AD 2 AE 1 www.ThesisScientist.comExample Tid Items  (ABCDEF) 10 A,B,C,D,E 20 B,C,D,E, A (BCDE) B (CDE) C (DE) D (E) E () 30 A,C,D,F AC (D) AD () Minsup=2 Node B Items Frequency Max patterns: BCDE 2 BCDE BC BD BE www.ThesisScientist.comExample Tid Items  (ABCDEF) 10 A,B,C,D,E 20 B,C,D,E, A (BCDE) B (CDE) C (DE) D (E) E () 30 A,C,D,F AC (D) AD () Minsup=2 Node AC Items Frequency Max patterns: ACD 2 BCDE ACD www.ThesisScientist.com