The attached document has all the information.
No plagiarism. 1000 words.
ITS632: Data Mining
Research Paper
Note: This is an individual submission. Each student must submit their own work.
Due: Feb 17th, 2020.
Submission Instructions: Submit on iLearn/Blackboard at the placeholder of the “Research Paper”
Assignment. Submission format: Word or PDF.
Write a minimum 2.5-page paper that describes the basic concepts of Association Analysis. Also
describe market basket analysis with examples. Use Times New Roman 12-point font, double spacing, and 1-inch margins.
In addition to the 2.5 pages write-up, add answers to the following to your paper.
For each of the following questions, provide an example of an association rule from the market basket
domain that satisfies the following conditions. Also, describe whether such rules are subjectively
interesting. Your answers should have detailed responses, failing which several points will be deducted.
(a) A rule that has high support and high confidence.
(b) A rule that has reasonably high support but low confidence.
(c) A rule that has low support and low confidence.
(d) A rule that has low support and high confidence.
Note-1: for citations, use the following format:
https://owl.purdue.edu/owl/research_and_citation/resources.html
If any work is not cited or is plagiarized, the student will receive zero points for the research paper.
Note-2: If the font size, margin, and paper length instructions are not followed, several points will be
deducted.
Data Mining
Association Analysis: Basic Concepts
and Algorithms
Lecture Notes
Introduction to Data Mining
by
Tan, Steinbach, Kumar
© Tan, Steinbach, Kumar, Introduction to Data Mining, 4/18/2004
Association Rule Mining
Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction
Market-Basket transactions
Example of Association Rules
{Diaper} → {Beer},
{Milk, Bread} → {Eggs, Coke},
{Beer, Bread} → {Milk}
Implication means co-occurrence, not causality!
TID   Items
1     Bread, Milk
2     Bread, Diaper, Beer, Eggs
3     Milk, Diaper, Beer, Coke
4     Bread, Milk, Diaper, Beer
5     Bread, Milk, Diaper, Coke
Definition: Frequent Itemset
Itemset
A collection of one or more items
Example: {Milk, Bread, Diaper}
k-itemset
An itemset that contains k items
Support count (σ)
Frequency of occurrence of an itemset
E.g. σ({Milk, Bread, Diaper}) = 2
Support
Fraction of transactions that contain an itemset
E.g. s({Milk, Bread, Diaper}) = 2/5
Frequent Itemset
An itemset whose support is greater than or equal to a minsup threshold
(Same market-basket table as above.)
Definition: Association Rule
Association Rule
An implication expression of the form X → Y, where X and Y are itemsets
Example:
{Milk, Diaper} → {Beer}
Rule Evaluation Metrics
Support (s)
Fraction of transactions that contain both X and Y
Confidence (c)
Measures how often items in Y appear in transactions that contain X
Example: {Milk, Diaper} → {Beer}
s = σ({Milk, Diaper, Beer}) / |T| = 2/5 = 0.4
c = σ({Milk, Diaper, Beer}) / σ({Milk, Diaper}) = 2/3 = 0.67
(computed over the market-basket table above)
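To make the definitions concrete, here is a minimal Python sketch (ours, not from the lecture notes) that computes both metrics over the five transactions above:

    # Support and confidence for {Milk, Diaper} -> {Beer}
    transactions = [
        {"Bread", "Milk"},
        {"Bread", "Diaper", "Beer", "Eggs"},
        {"Milk", "Diaper", "Beer", "Coke"},
        {"Bread", "Milk", "Diaper", "Beer"},
        {"Bread", "Milk", "Diaper", "Coke"},
    ]

    def support_count(itemset, transactions):
        """sigma(itemset): number of transactions containing the itemset."""
        return sum(1 for t in transactions if itemset <= t)

    X, Y = {"Milk", "Diaper"}, {"Beer"}
    s = support_count(X | Y, transactions) / len(transactions)              # 2/5 = 0.4
    c = support_count(X | Y, transactions) / support_count(X, transactions) # 2/3 = 0.67
    print(f"support={s:.2f}, confidence={c:.2f}")
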
Association Rule Mining Task
Given a set of transactions T, the goal of association rule mining is to find all rules having
support ≥ minsup threshold
confidence ≥ minconf threshold
Brute-force approach:
List all possible association rules
Compute the support and confidence for each rule
Prune rules that fail the minsup and minconf thresholds
Computationally prohibitive!
Mining Association Rules
Example of Rules:
{Milk, Diaper} → {Beer} (s=0.4, c=0.67)
{Milk, Beer} → {Diaper} (s=0.4, c=1.0)
{Diaper, Beer} → {Milk} (s=0.4, c=0.67)
{Beer} → {Milk, Diaper} (s=0.4, c=0.67)
{Diaper} → {Milk, Beer} (s=0.4, c=0.5)
{Milk} → {Diaper, Beer} (s=0.4, c=0.5)
Observations:
All the above rules are binary partitions of the same itemset:
{Milk, Diaper, Beer}
Rules originating from the same itemset have identical support but
can have different confidence
Thus, we may decouple the support and confidence requirements
(Same market-basket table as above.)
Mining Association Rules
Two-step approach:
Frequent Itemset Generation
Generate all itemsets whose support ≥ minsup
Rule Generation
Generate high confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset
Frequent itemset generation is still computationally expensive
Frequent Itemset Generation
Given d items, there are 2^d possible candidate itemsets
Frequent Itemset Generation
Brute-force approach:
Each itemset in the lattice is a candidate frequent itemset
Count the support of each candidate by scanning the database
Match each transaction against every candidate
Complexity ~ O(NMw) ⇒ expensive since M = 2^d !
(Figure: the N transactions, each of width w, are matched against the list of M candidates.)
Computational Complexity
Given d unique items:
Total number of itemsets = 2^d
Total number of possible association rules:
R = Σ_{k=1}^{d−1} [ dCk × Σ_{j=1}^{d−k} (d−k)Cj ] = 3^d − 2^(d+1) + 1
If d = 6, R = 602 rules
Frequent Itemset Generation Strategies
Reduce the number of candidates (M)
Complete search: M = 2^d
Use pruning techniques to reduce M
Reduce the number of transactions (N)
Reduce size of N as the size of itemset increases
Used by DHP and vertical-based mining algorithms
Reduce the number of comparisons (NM)
Use efficient data structures to store the candidates or transactions
No need to match every candidate against every transaction
Reducing Number of Candidates
Apriori principle:
If an itemset is frequent, then all of its subsets must also be frequent
Apriori principle holds due to the following property of the support measure:
∀X, Y: (X ⊆ Y) ⇒ s(X) ≥ s(Y)
The support of an itemset never exceeds the support of its subsets; this is known as the anti-monotone property of support
Illustrating Apriori Principle
Found to be Infrequent
Pruned supersets
Illustrating Apriori Principle
Items (1-itemsets)
Pairs (2-itemsets)
(No need to generate
candidates involving Coke
or Eggs)
Triplets (3-itemsets)
Minimum Support = 3
If every subset is considered,
6C1 + 6C2 + 6C3 = 41
With support-based pruning,
6 + 6 + 1 = 13
Items (1-itemsets):
Item     Count
Bread    4
Coke     2
Milk     4
Beer     3
Diaper   4
Eggs     1

Pairs (2-itemsets):
Itemset           Count
{Bread, Milk}     3
{Bread, Beer}     2
{Bread, Diaper}   3
{Milk, Beer}      2
{Milk, Diaper}    3
{Beer, Diaper}    3

Triplets (3-itemsets):
Itemset                  Count
{Bread, Milk, Diaper}    3
Apriori Algorithm
Method:
Let k=1
Generate frequent itemsets of length 1
Repeat until no new frequent itemsets are identified
Generate length (k+1) candidate itemsets from length k frequent itemsets
Prune candidate itemsets containing subsets of length k that are infrequent
Count the support of each candidate by scanning the DB
Eliminate candidates that are infrequent, leaving only those that are frequent
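The loop above can be sketched in Python as follows; this is an illustrative in-memory version (function and variable names are our own, not from any library):

    from itertools import combinations

    def apriori(transactions, minsup_count):
        """Return all frequent itemsets (as frozensets) with support >= minsup_count."""
        items = {i for t in transactions for i in t}
        # Frequent 1-itemsets
        Lk = {frozenset([i]) for i in items
              if sum(1 for t in transactions if i in t) >= minsup_count}
        frequent, k = set(Lk), 1
        while Lk:
            # Generate length (k+1) candidate itemsets from length k frequent itemsets
            candidates = {a | b for a in Lk for b in Lk if len(a | b) == k + 1}
            # Apriori pruning: drop candidates containing an infrequent k-subset
            candidates = {c for c in candidates
                          if all(frozenset(s) in Lk for s in combinations(c, k))}
            # Count support by scanning the DB; eliminate infrequent candidates
            Lk = {c for c in candidates
                  if sum(1 for t in transactions if c <= t) >= minsup_count}
            frequent |= Lk
            k += 1
        return frequent

    # Reusing the `transactions` list from the earlier sketch, apriori(transactions, 3)
    # finds, among others, {Bread, Milk, Diaper} with support count 3.
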
Reducing Number of Comparisons
Candidate counting:
Scan the database of transactions to determine the support of each candidate itemset
To reduce the number of comparisons, store the candidates in a hash structure
Instead of matching each transaction against every candidate, match it against candidates contained in the hashed buckets
(Figure: the N transactions are matched against candidates stored in a hash structure of buckets, hashing k items at a time.)
Generate Hash Tree
Suppose you have 15 candidate itemsets of length 3:
{1 4 5}, {1 2 4}, {4 5 7}, {1 2 5}, {4 5 8}, {1 5 9}, {1 3 6}, {2 3 4}, {5 6 7}, {3 4 5}, {3 5 6}, {3 5 7}, {6 8 9}, {3 6 7}, {3 6 8}
You need:
Hash function
Max leaf size: max number of itemsets stored in a leaf node (if number of candidate itemsets exceeds max leaf size, split the node)
(Figure: the candidate hash tree; the hash function sends items 1, 4, 7 to the left branch, 2, 5, 8 to the middle branch, and 3, 6, 9 to the right branch, and the 15 candidate 3-itemsets are distributed over the leaves.)
Association Rule Discovery: Hash Tree
(Figures: three views of the candidate hash tree, hashing at the root on 1, 4 or 7, on 2, 5 or 8, and on 3, 6 or 9 respectively; each hash value selects the subtree in which the matching candidates are stored.)
Subset Operation
Given a transaction t, what are the possible subsets of size 3?
Given transaction t = {1, 2, 3, 5, 6}, the ten subsets of size 3 are:
{1,2,3}, {1,2,5}, {1,2,6}, {1,3,5}, {1,3,6}, {1,5,6}, {2,3,5}, {2,3,6}, {2,5,6}, {3,5,6}
(Figure: the subsets enumerated level by level, fixing the first item at level 1, the second at level 2, and the third at level 3.)
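In Python the same enumeration can be obtained with itertools.combinations, assuming the items of the transaction are kept sorted:

    from itertools import combinations

    t = [1, 2, 3, 5, 6]
    subsets = list(combinations(t, 3))
    # [(1,2,3), (1,2,5), (1,2,6), (1,3,5), (1,3,6), (1,5,6),
    #  (2,3,5), (2,3,6), (2,5,6), (3,5,6)]
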
Subset Operation Using Hash Tree
(Figure: transaction {1, 2, 3, 5, 6} is split at the root into 1 + {2 3 5 6}, 2 + {3 5 6}, and 3 + {5 6}, each branch following the hash function 1,4,7 / 2,5,8 / 3,6,9.)
Subset Operation Using Hash Tree (continued)
(Figure: at the second level, the branch 1 + {2 3 5 6} is split further into 1 2 + {3 5 6}, 1 3 + {5 6}, and 1 5 + {6}.)
Subset Operation Using Hash Tree (continued)
(Figure: following the branches down to the leaves, the transaction is matched against 11 out of the 15 candidates.)
Factors Affecting Complexity
Choice of minimum support threshold
lowering support threshold results in more frequent itemsets
this may increase number of candidates and max length of frequent itemsets
Dimensionality (number of items) of the data set
more space is needed to store support count of each item
if number of frequent items also increases, both computation and I/O costs may also increase
Size of database
since Apriori makes multiple passes, run time of algorithm may increase with number of transactions
Average transaction width
transaction width increases with denser data sets
This may increase max length of frequent itemsets and traversals of hash tree (number of subsets in a transaction increases with its width)
Compact Representation of Frequent Itemsets
Some itemsets are redundant because they have the same support as their supersets
Number of frequent itemsets in the example data set = 3 × Σ_{k=1}^{10} 10Ck
Need a compact representation
Maximal Frequent Itemset
An itemset is maximal frequent if none of its immediate supersets is frequent
(Figure: the itemset lattice from null to ABCDE; a border separates the frequent from the infrequent itemsets, and the maximal frequent itemsets sit immediately below the border. Supersets of infrequent itemsets are pruned.)
Closed Itemset
An itemset is closed if none of its immediate supersets has the same support as the itemset
TID   Items
1     {A,B}
2     {B,C,D}
3     {A,B,C,D}
4     {A,B,D}
5     {A,B,C,D}

Itemset   Support
{A}       4
{B}       5
{C}       3
{D}       4
{A,B}     4
{A,C}     2
{A,D}     3
{B,C}     3
{B,D}     4
{C,D}     3

Itemset     Support
{A,B,C}     2
{A,B,D}     3
{A,C,D}     2
{B,C,D}     3
{A,B,C,D}   2
Maximal vs Closed Itemsets
TID   Items
1     ABC
2     ABCD
3     BCE
4     ACDE
5     DE
(Figure: each itemset in the lattice annotated with the transaction ids that support it; itemsets not supported by any transaction are marked.)
Maximal vs Closed Frequent Itemsets
Minimum support = 2
# Closed = 9
# Maximal = 4
(Figure: the lattice with closed-and-maximal itemsets distinguished from closed-but-not-maximal itemsets.)
Maximal vs Closed Itemsets
(Figure: maximal frequent itemsets ⊆ closed frequent itemsets ⊆ frequent itemsets.)
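As an illustration of the two definitions, the following hypothetical helper (a sketch, not a library routine) classifies frequent itemsets as closed and/or maximal given a support table keyed by frozensets:

    def classify(supports, minsup):
        """supports: dict frozenset -> support count. Returns (closed, maximal)."""
        frequent = {s: c for s, c in supports.items() if c >= minsup}
        closed, maximal = set(), set()
        for itemset, count in frequent.items():
            # Immediate frequent supersets; an infrequent immediate superset
            # necessarily has support < minsup <= count, so it cannot tie.
            supersets = [t for t in frequent
                         if itemset < t and len(t) == len(itemset) + 1]
            if all(frequent[t] < count for t in supersets):
                closed.add(itemset)    # no immediate superset with the same support
            if not supersets:
                maximal.add(itemset)   # no frequent immediate superset at all
        return closed, maximal

    # e.g. supports = {frozenset({"A"}): 4, frozenset({"B"}): 5, ...,
    #                  frozenset("ABCD"): 2}; classify(supports, minsup=2)
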
Alternative Methods for Frequent Itemset Generation
Traversal of Itemset Lattice
General-to-specific vs Specific-to-general
(Figures: (a) general-to-specific search grows itemsets from null toward {a1, a2, …, an}; (b) specific-to-general works from the largest itemset back toward null; (c) bidirectional search approaches the frequent itemset border from both sides.)
Alternative Methods for Frequent Itemset Generation
Traversal of Itemset Lattice
Equivalent Classes
(Figures: (a) prefix tree and (b) suffix tree groupings of the itemset lattice into equivalence classes.)
Alternative Methods for Frequent Itemset Generation
Traversal of Itemset Lattice
Breadth-first vs Depth-first
(Figures: (a) breadth-first and (b) depth-first traversal of the itemset lattice.)
Alternative Methods for Frequent Itemset Generation
Representation of Database
horizontal vs vertical data layout
(Figure: the horizontal data layout lists the items of each transaction; the vertical data layout lists the transaction ids of each item.)
FP-growth Algorithm
Use a compressed representation of the database using an FP-tree
Once an FP-tree has been constructed, it uses a recursive divide-and-conquer approach to mine the frequent itemsets
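A compact sketch of the construction step is shown below; it builds the tree with two passes over the data but omits the header-table pointers and the recursive mining step (all names are our own):

    from collections import defaultdict

    class FPNode:
        def __init__(self, item, parent):
            self.item, self.parent = item, parent
            self.count = 0
            self.children = {}

    def build_fptree(transactions, minsup_count):
        # First pass: count item frequencies
        freq = defaultdict(int)
        for t in transactions:
            for item in t:
                freq[item] += 1
        # Second pass: insert each transaction with its frequent items
        # sorted by descending frequency (ties broken alphabetically)
        root = FPNode(None, None)
        for t in transactions:
            items = sorted((i for i in t if freq[i] >= minsup_count),
                           key=lambda i: (-freq[i], i))
            node = root
            for item in items:
                node = node.children.setdefault(item, FPNode(item, node))
                node.count += 1
        return root
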
FP-tree construction
After reading TID=1: null → A:1 → B:1
After reading TID=2: a second branch null → B:1 → C:1 → D:1 is added
TID   Items
1     {A,B}
2     {B,C,D}
3     {A,C,D,E}
4     {A,D,E}
5     {A,B,C}
6     {A,B,C,D}
7     {B,C}
8     {A,B,C}
9     {A,B,D}
10    {B,C,E}
FP-Tree Construction
(Figure: the completed FP-tree for the transaction database above, with node counts such as A:7 and B:5 on the branches from the root.)
Pointers are used to assist frequent itemset generation: a header table with one entry per item (A, B, C, D, E) links together all tree nodes carrying that item.
FP-growth
(Figure: the subtree of FP-tree paths ending in D, with counts adjusted to D's counts.)
Conditional pattern base for D:
P = {(A:1, B:1, C:1), (A:1, B:1), (A:1, C:1), (A:1), (B:1, C:1)}
Recursively apply FP-growth on P
Frequent itemsets found (with sup > 1): AD, BD, CD, ACD, BCD
Tree Projection
Set enumeration tree: (Figure: the itemset lattice drawn as a set-enumeration tree over the items.)
Possible Extension: E(A) = {B,C,D,E}
Possible Extension: E(ABC) = {D,E}
Tree Projection
Items are listed in lexicographic order
Each node P stores the following information:
Itemset for node P
List of possible lexicographic extensions of P: E(P)
Pointer to projected database of its ancestor node
Bitvector containing information about which transactions in the projected database contain the itemset
Projected Database
For each transaction T, the projected transaction at node A is T ∩ E(A).

Original database:
TID   Items
1     {A,B}
2     {B,C,D}
3     {A,C,D,E}
4     {A,D,E}
5     {A,B,C}
6     {A,B,C,D}
7     {B,C}
8     {A,B,C}
9     {A,B,D}
10    {B,C,E}

Projected database for node A:
TID   Items
1     {B}
2     {}
3     {C,D,E}
4     {D,E}
5     {B,C}
6     {B,C,D}
7     {}
8     {B,C}
9     {B,D}
10    {}
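The projection itself is a one-liner; the sketch below assumes the ten transactions above are stored as Python sets:

    transactions = [{"A","B"}, {"B","C","D"}, {"A","C","D","E"}, {"A","D","E"},
                    {"A","B","C"}, {"A","B","C","D"}, {"B","C"}, {"A","B","C"},
                    {"A","B","D"}, {"B","C","E"}]
    E_A = {"B", "C", "D", "E"}   # lexicographic extensions of node A
    # T ∩ E(A) for transactions containing A; empty otherwise
    projected = [t & E_A if "A" in t else set() for t in transactions]
    # [{'B'}, set(), {'C','D','E'}, {'D','E'}, {'B','C'}, {'B','C','D'},
    #  set(), {'B','C'}, {'B','D'}, set()]
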
ECLAT
For each item, store a list of transaction ids (tids)
TID-list
ECLAT
Determine support of any k-itemset by intersecting tid-lists of two of its (k-1) subsets.
3 traversal approaches:
top-down, bottom-up and hybrid
Advantage: very fast support counting
Disadvantage: intermediate tid-lists may become too large for memory
Horizontal data layout:
TID   Items
1     A,B,E
2     B,C,D
3     C,E
4     A,C,D
5     A,B,C,D
6     A,E
7     A,B
8     A,B,C
9     A,C,D
10    B

Vertical data layout (tid-lists):
A: 1, 4, 5, 6, 7, 8, 9
B: 1, 2, 5, 7, 8, 10
C: 2, 3, 4, 5, 8, 9
D: 2, 4, 5, 9
E: 1, 3, 6

Intersecting the tid-lists of A and B gives the tid-list of AB: 1, 5, 7, 8 (support 4).
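A minimal sketch of this intersection step, using the tid-lists of A and B from the table above:

    # ECLAT-style support counting: the tid-list of a k-itemset is the
    # intersection of the tid-lists of two of its (k-1)-subsets.
    tidlists = {
        "A": {1, 4, 5, 6, 7, 8, 9},
        "B": {1, 2, 5, 7, 8, 10},
    }
    tid_AB = tidlists["A"] & tidlists["B"]   # {1, 5, 7, 8}
    support_AB = len(tid_AB)                 # 4
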
Rule Generation
Given a frequent itemset L, find all non-empty subsets f ⊂ L such that f → L − f satisfies the minimum confidence requirement
If {A,B,C,D} is a frequent itemset, candidate rules:
ABC → D, ABD → C, ACD → B, BCD → A,
A → BCD, B → ACD, C → ABD, D → ABC,
AB → CD, AC → BD, AD → BC, BC → AD,
BD → AC, CD → AB
If |L| = k, then there are 2^k − 2 candidate association rules (ignoring L → ∅ and ∅ → L)
Rule Generation
How to efficiently generate rules from frequent itemsets?
In general, confidence does not have an anti-monotone property
c(ABC → D) can be larger or smaller than c(AB → D)
But confidence of rules generated from the same itemset has an anti-monotone property
e.g., L = {A,B,C,D}:
c(ABC → D) ≥ c(AB → CD) ≥ c(A → BCD)
Confidence is anti-monotone w.r.t. number of items on the RHS of the rule
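A sketch of level-wise rule generation that exploits this property, growing consequents only from rules that passed the confidence check (helper and variable names are hypothetical, not from any library):

    def rules_from_itemset(L, support, minconf):
        """L: frozenset; support: dict frozenset -> support count."""
        rules = []
        # Start with 1-item consequents and grow the RHS level by level
        consequents = [frozenset([i]) for i in L]
        while consequents:
            next_level = set()
            for Y in consequents:
                X = L - Y
                if not X:
                    continue
                conf = support[L] / support[X]
                if conf >= minconf:
                    rules.append((X, Y, conf))
                    # Only extend consequents of rules that passed; by the
                    # anti-monotone property, supersets of a failing Y also fail.
                    next_level |= {Y | frozenset([i]) for i in L - Y}
            consequents = next_level
        return rules

    # e.g. with the support counts produced by the Apriori sketch above:
    # rules_from_itemset(frozenset("ABCD"), support, minconf=0.7)
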
Rule Generation for Apriori Algorithm
Lattice of rules
(Figure: the lattice of rules from ABCD ⇒ {} down to the single-antecedent rules A ⇒ BCD, …, D ⇒ ABC; when a rule such as CD ⇒ AB is found to have low confidence, all rules below it in the lattice, e.g. D ⇒ ABC, are pruned.)
Rule Generation for Apriori Algorithm
A candidate rule is generated by merging two rules that share the same prefix in the rule consequent.
Example: join(CD ⇒ AB, BD ⇒ AC) would produce the candidate rule D ⇒ ABC.
Prune rule D ⇒ ABC if its subset AD ⇒ BC does not have high confidence.
Effect of Support Distribution
Many real data sets have skewed support distribution
(Figure: support distribution of a retail data set, showing a highly skewed distribution.)
Effect of Support Distribution
How to set the appropriate minsup threshold?
If minsup is set too high, we could miss itemsets involving interesting rare items (e.g., expensive products)
If minsup is set too low, it is computationally expensive and the number of itemsets is very large
Using a single minimum support threshold may not be effective
Multiple Minimum Support
How to apply multiple minimum supports?
MS(i): minimum support for item i
e.g.: MS(Milk)=5%, MS(Coke) = 3%,
MS(Broccoli)=0.1%, MS(Salmon)=0.5%
MS({Milk, Broccoli}) = min (MS(Milk), MS(Broccoli))
= 0.1%
Challenge: Support is no longer anti-monotone
Suppose: Support(Milk, Coke) = 1.5% and
Support(Milk, Coke, Broccoli) = 0.5%
{Milk,Coke} is infrequent but {Milk,Coke,Broccoli} is frequent
Multiple Minimum Support
Item    MS(I)    Sup(I)
A       0.10%    0.25%
B       0.20%    0.26%
C       0.30%    0.29%
D       0.50%    0.05%
E       3%       4.20%
(Figures: the itemset lattice over items A–E, highlighting which itemsets are frequent under these item-specific thresholds.)
Multiple Minimum Support (Liu 1999)
Order the items according to their minimum support (in ascending order)
e.g.: MS(Milk)=5%, MS(Coke) = 3%,
MS(Broccoli)=0.1%, MS(Salmon)=0.5%
Ordering: Broccoli, Salmon, Coke, Milk
Need to modify Apriori such that:
L1: set of frequent items
F1: set of items whose support is ≥ MS(1), where MS(1) = min_i(MS(i))
C2: candidate itemsets of size 2 are generated from F1 instead of L1
Multiple Minimum Support (Liu 1999)
Modifications to Apriori:
In traditional Apriori, a candidate (k+1)-itemset is generated by merging two frequent itemsets of size k; the candidate is pruned if it contains any infrequent subsets of size k.
The pruning step has to be modified: prune only if the subset contains the first item.
E.g.: Candidate = {Broccoli, Coke, Milk} (ordered according to minimum support). {Broccoli, Coke} and {Broccoli, Milk} are frequent but {Coke, Milk} is infrequent. The candidate is not pruned because {Coke, Milk} does not contain the first item, i.e., Broccoli.
Pattern Evaluation
Association rule algorithms tend to produce too many rules
many of them are uninteresting or redundant
Redundant if {A,B,C} → {D} and {A,B} → {D} have the same support & confidence
Interestingness measures can be used to prune/rank the derived patterns
In the original formulation of association rules, support & confidence are the only measures used
Application of Interestingness Measure
Interestingness Measures
Computing Interestingness Measure
Given a rule X → Y, information needed to compute rule interestingness can be obtained from a contingency table
Contingency table for X → Y:
Used to define various measures
support, confidence, lift, Gini,
J-measure, etc.
        Y      ¬Y
X      f11    f10    f1+
¬X     f01    f00    f0+
       f+1    f+0    |T|

f11: support of X and Y
f10: support of X and ¬Y
f01: support of ¬X and Y
f00: support of ¬X and ¬Y
Drawback of Confidence
          Coffee   ¬Coffee
Tea         15        5       20
¬Tea        75        5       80
            90       10      100

Association Rule: Tea → Coffee
Confidence = P(Coffee|Tea) = 15/20 = 0.75
but P(Coffee) = 0.9
Although confidence is high, the rule is misleading:
P(Coffee|¬Tea) = 75/80 = 0.9375
Statistical Independence
Population of 1000 students
600 students know how to swim (S)
700 students know how to bike (B)
420 students know how to swim and bike (S,B)
P(S∧B) = 420/1000 = 0.42
P(S) × P(B) = 0.6 × 0.7 = 0.42
P(S∧B) = P(S) × P(B) ⇒ statistical independence
P(S∧B) > P(S) × P(B) ⇒ positively correlated
P(S∧B) < P(S) × P(B) ⇒ negatively correlated
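The same arithmetic as a quick check:

    # Swim/bike example: 0.42 == 0.6 * 0.7, so S and B are independent
    p_s, p_b, p_sb = 600 / 1000, 700 / 1000, 420 / 1000
    assert abs(p_sb - p_s * p_b) < 1e-12
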
Statistical-based Measures
Measures that take into account statistical dependence, e.g.:
Lift = P(Y|X) / P(Y)
Interest = P(X,Y) / (P(X) P(Y))
PS = P(X,Y) − P(X) P(Y)
φ-coefficient = (P(X,Y) − P(X) P(Y)) / √(P(X)[1 − P(X)] P(Y)[1 − P(Y)])

Association Rule: Tea → Coffee
Confidence = P(Coffee|Tea) = 0.75
but P(Coffee) = 0.9
Lift = 0.75/0.9 = 0.8333 (< 1, therefore negatively associated)
(Same Tea/Coffee contingency table as above.)
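Computing the lift directly from the contingency table:

    # Lift for Tea -> Coffee from the table above
    f11, f10, f01, f00 = 15, 5, 75, 5
    n = f11 + f10 + f01 + f00            # 100
    conf = f11 / (f11 + f10)             # P(Coffee|Tea) = 0.75
    lift = conf / ((f11 + f01) / n)      # 0.75 / 0.9 = 0.8333 < 1
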
Drawback of Lift & Interest
Statistical independence:
If P(X,Y)=P(X)P(Y) => Lift = 1
      Y    ¬Y
X     10    0    10
¬X     0   90    90
      10   90   100

Lift = 0.1 / (0.1 × 0.1) = 10

      Y    ¬Y
X     90    0    90
¬X     0   10    10
      90   10   100

Lift = 0.9 / (0.9 × 0.9) = 1.11
There are lots of measures proposed in the literature
Some measures are good for certain applications, but not for others
What criteria should we use to determine whether a measure is good or bad?
What about Apriori-style support based pruning? How does it affect these measures?
Properties of A Good Measure
Piatetsky-Shapiro:
3 properties a good measure M must satisfy:
M(A,B) = 0 if A and B are statistically independent
M(A,B) increases monotonically with P(A,B) when P(A) and P(B) remain unchanged
M(A,B) decreases monotonically with P(A) [or P(B)] when P(A,B) and P(B) [or P(A)] remain unchanged
Comparing Different Measures
10 examples of contingency tables:
Rankings of contingency tables using various measures:
Example   f11    f10    f01    f00
E1        8123     83    424   1370
E2        8330      2    622   1046
E3        9481     94    127    298
E4        3954   3080      5   2961
E5        2886   1363   1320   4431
E6        1500   2000    500   6000
E7        4000   2000   1000   3000
E8        4000   2000   2000   2000
E9        1720   7121      5   1154
E10         61   2483      4   7452
Property under Variable Permutation
Does M(A,B) = M(B,A)?
Symmetric measures:
support, lift, collective strength, cosine, Jaccard, etc
Asymmetric measures:
confidence, conviction, Laplace, J-measure, etc
Property under Row/Column Scaling
Grade-Gender example (Mosteller, 1968):

        Male   Female
High      2       3      5
Low       1       4      5
          3       7     10

        Male   Female
High      4      30     34
Low       2      40     42
          6      70     76

(The second table scales the male column by 2 and the female column by 10.)

Mosteller: the underlying association should be independent of the relative number of male and female students in the samples.
Property under Inversion Operation
(Figure: inversion flips every bit of the item vectors across transactions 1…N; a measure is inversion-invariant if it is unchanged when both vectors are inverted.)
Example: φ-Coefficient
The φ-coefficient is analogous to the correlation coefficient for continuous variables.

      Y    ¬Y
X     60   10    70
¬X    10   20    30
      70   30   100

φ = (60×20 − 10×10) / √(70×30×70×30) = 0.5238

      Y    ¬Y
X     20   10    30
¬X    10   60    70
      30   70   100

φ = (20×60 − 10×10) / √(30×70×30×70) = 0.5238

The coefficient is the same for both tables.
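A small sketch of the φ computation from the four cell counts, reproducing the 0.5238 value for both tables:

    from math import sqrt

    def phi(f11, f10, f01, f00):
        r1, r0 = f11 + f10, f01 + f00     # row totals
        c1, c0 = f11 + f01, f10 + f00     # column totals
        return (f11 * f00 - f10 * f01) / sqrt(r1 * r0 * c1 * c0)

    print(phi(60, 10, 10, 20))   # 0.5238...
    print(phi(20, 10, 10, 60))   # 0.5238...
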
Property under Null Addition
Invariant measures:
support, cosine, Jaccard, etc
Non-invariant measures:
correlation, Gini, mutual information, odds ratio, etc
Different Measures have Different Properties
Symbol   Measure               Range               P1    P2   P3   O1     O2   O3    O3'   O4
φ        Correlation           -1 … 0 … 1          Yes   Yes  Yes  Yes    No   Yes   Yes   No
λ        Lambda                0 … 1               Yes   No   No   Yes    No   No*   Yes   No
α        Odds ratio            0 … 1 … ∞           Yes*  Yes  Yes  Yes    Yes  Yes*  Yes   No
Q        Yule's Q              -1 … 0 … 1          Yes   Yes  Yes  Yes    Yes  Yes   Yes   No
Y        Yule's Y              -1 … 0 … 1          Yes   Yes  Yes  Yes    Yes  Yes   Yes   No
κ        Cohen's               -1 … 0 … 1          Yes   Yes  Yes  Yes    No   No    Yes   No
M        Mutual Information    0 … 1               Yes   Yes  Yes  Yes    No   No*   Yes   No
J        J-Measure             0 … 1               Yes   No   No   No     No   No    No    No
G        Gini Index            0 … 1               Yes   No   No   No     No   No*   Yes   No
s        Support               0 … 1               No    Yes  No   Yes    No   No    No    No
c        Confidence            0 … 1               No    Yes  No   Yes    No   No    No    Yes
L        Laplace               0 … 1               No    Yes  No   Yes    No   No    No    No
V        Conviction            0.5 … 1 … ∞         No    Yes  No   Yes**  No   No    Yes   No
I        Interest              0 … 1 … ∞           Yes*  Yes  Yes  Yes    No   No    No    No
IS       IS (cosine)           0 … 1               No    Yes  Yes  Yes    No   No    No    Yes
PS       Piatetsky-Shapiro's   -0.25 … 0 … 0.25    Yes   Yes  Yes  Yes    No   Yes   Yes   No
F        Certainty factor      -1 … 0 … 1          Yes   Yes  Yes  No     No   No    Yes   No
AV       Added value           0.5 … 1 … 1         Yes   Yes  Yes  No     No   No    No    No
S        Collective strength   0 … 1 … ∞           No    Yes  Yes  Yes    No   Yes*  Yes   No
ζ        Jaccard               0 … 1               No    Yes  Yes  Yes    No   No    No    Yes
K        Klosgen's             —                   Yes   Yes  Yes  No     No   No    No    No

where:
O1: symmetry under variable permutation
O2: marginal invariance
O3: antisymmetry under row or column permutation
O3': inversion invariance
O4: null invariance
Yes*: yes if measure is normalized
Yes**: yes if measure is symmetrized by taking max(M(A,B), M(B,A))
No*: symmetry under row or column permutation
Support-based Pruning
Most of the association rule mining algorithms use support measure to prune rules and itemsets
Study effect of support pruning on correlation of itemsets
Generate 10000 random contingency tables
Compute support and pairwise correlation for each table
Apply support-based pruning and examine the tables that are removed
Effect of Support-based Pruning
(Figures: histograms of pairwise correlation for all item pairs and for pairs filtered by support. The underlying counts per correlation bin:)

Correlation   All    >0.9   >0.7   >0.5   <0.01   <0.03   <0.05
-1.0           13      0      7     13      10      13      13
-0.9           95      0     29     92      21      71      91
-0.8          172      4     70    164      22      68     104
-0.7          282      7     95    265      18      66     106
-0.6          359      9    150    327      28      67     107
-0.5          503      7    182    428      28      84     128
-0.4          648     25    267    542      46      94     141
-0.3          716     35    296    572      40      93     151
-0.2          830     42    336    640      33      93     161
-0.1          867     86    383    665      25      77     136
 0.0          908     92    375    664      21      56      99
 0.1          915     75    370    678       7      30      67
 0.2          881     54    332    645       2      14      45
 0.3          759     28    272    540       1       5      15
 0.4          602     14    202    406       0       3       8
 0.5          532     16    161    355       0       0       4
 0.6          370      3     89    237       0       0       2
 0.7          259      5     49    139       0       0       1
 0.8          194      0     41    118       0       0       0
 0.9           84      0     17     51       0       0       0
 1.0           11      0      4      5       0       0       0

(Column headers give the support filter, e.g. ">0.9" = item pairs with support > 0.9, "<0.01" = pairs with support < 0.01.)
Effect of Support-based Pruning
Support-based pruning eliminates mostly negatively correlated itemsets
(Histogram data as in the table above; note that the low-support columns have almost no pairs with positive correlation, so pruning low-support pairs removes mostly negatively correlated itemsets.)
Effect of Support-based Pruning
Investigate how support-based pruning affects other measures
Steps:
Generate 10000 contingency tables
Rank each table according to the different measures
Compute the pair-wise correlation between the measures
Effect of Support-based Pruning
Without support pruning (all pairs):
Red cells indicate correlation between the pair of measures > 0.85.
40.14% of pairs have correlation > 0.85.
(Figure: scatter plot between the correlation and Jaccard measures.)
Effect of Support-based Pruning
0.5% ≤ support ≤ 50%:
61.45% of pairs have correlation > 0.85.
(Figure: scatter plot between the correlation and Jaccard measures.)
Effect of Support-based Pruning
0.5% ≤ support ≤ 30%:
76.42% of pairs have correlation > 0.85.
(Figure: scatter plot between the correlation and Jaccard measures.)
Subjective Interestingness Measure
Objective measure:
Rank patterns based on statistics computed from data
e.g., 21 measures of association (support, confidence, Laplace, Gini, mutual information, Jaccard, etc).
Subjective measure:
Rank patterns according to user’s interpretation
A pattern is subjectively interesting if it contradicts the expectation of a user (Silberschatz & Tuzhilin)
A pattern is subjectively interesting if it is actionable (Silberschatz & Tuzhilin)
Interestingness via Unexpectedness
Need to model expectation of users (domain knowledge)
Need to combine expectation of users with evidence from data (i.e., extracted patterns)
(Figure: a grid comparing expectation with evidence: '+' = pattern expected/found to be frequent, '−' = pattern expected/found to be infrequent. Patterns where expectation and data agree are expected patterns; patterns where they disagree are unexpected patterns.)
Interestingness via Unexpectedness
Web Data (Cooley et al 2001)
Domain knowledge in the form of site structure
Given an itemset F = {X1, X2, …, Xk} (Xi: Web pages):
L: number of links connecting the pages
lfactor = L / (k × (k − 1))
cfactor = 1 (if the graph is connected), 0 (if disconnected)
Structure evidence = cfactor × lfactor
Usage evidence = P(X1 ∩ X2 ∩ … ∩ Xk) / P(X1 ∪ X2 ∪ … ∪ Xk)
Use Dempster-Shafer theory to combine domain knowledge and evidence from data