The attached document has all the information.
No plagiarism. 1000 words.
ITS632: Data Mining
Research Paper
Note: This is an individual submission. Each student must submit their own work.
Due: Feb 17th, 2020.
Submission Instructions: Submit on iLearn/Blackboard at the placeholder of the “Research Paper”
Assignment. Submission format: Word or PDF.
Write a minimum 2.5-page paper that describes the basic concepts of Association Analysis. Also
describe market basket analysis with examples. Use Times New Roman 12-point font, double spacing, and 1-inch margins.
In addition to the 2.5 pages write-up, add answers to the following to your paper.
For each of the following questions, provide an example of an association rule from the market basket
domain that satisfies the following conditions. Also, describe whether such rules are subjectively
interesting. Your answers should have detailed responses, failing which several points will be deducted.
(a) A rule that has high support and high confidence.
(b) A rule that has reasonably high support but low confidence.
(c) A rule that has low support and low confidence.
(d) A rule that has low support and high confidence.
Note-1: for citations, use the following format:
https://owl.purdue.edu/owl/research_and_citation/resources.html
If any work is not cited or is plagiarized, the student will receive zero points for the research paper.
Note-2: If the font size, margin, and paper length instructions are not followed, several points will be
deducted.
Data Mining
Association Analysis: Basic Concepts
and Algorithms
Lecture Notes
Introduction to Data Mining
by
Tan, Steinbach, Kumar
© Tan, Steinbach, Kumar, Introduction to Data Mining, 4/18/2004
Association Rule Mining
Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction
Market-Basket transactions
Example of Association Rules
{Diaper} → {Beer},
{Milk, Bread} → {Eggs, Coke},
{Beer, Bread} → {Milk}
Implication means co-occurrence, not causality!
TID   Items
1     Bread, Milk
2     Bread, Diaper, Beer, Eggs
3     Milk, Diaper, Beer, Coke
4     Bread, Milk, Diaper, Beer
5     Bread, Milk, Diaper, Coke
Definition: Frequent Itemset
Itemset
A collection of one or more items
Example: {Milk, Bread, Diaper}
k-itemset
An itemset that contains k items
Support count (σ)
Frequency of occurrence of an itemset
E.g. σ({Milk, Bread, Diaper}) = 2
Support
Fraction of transactions that contain an itemset
E.g. s({Milk, Bread, Diaper}) = 2/5
Frequent Itemset
An itemset whose support is greater than or equal to a minsup threshold
(Same market-basket table as above.)
Definition: Association Rule
Association Rule
An implication expression of the form X → Y, where X and Y are itemsets
Example:
{Milk, Diaper} → {Beer}
Rule Evaluation Metrics
Support (s)
Fraction of transactions that contain both X and Y
Confidence (c)
Measures how often items in Y appear in transactions that contain X
Example: {Milk, Diaper} → {Beer}
s = σ({Milk, Diaper, Beer}) / |T| = 2/5 = 0.4
c = σ({Milk, Diaper, Beer}) / σ({Milk, Diaper}) = 2/3 = 0.67
(computed over the market-basket table above)
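To make the definitions concrete, here is a minimal Python sketch (ours, not from the lecture notes) that computes both metrics over the five transactions above:

    # Support and confidence for {Milk, Diaper} -> {Beer}
    transactions = [
        {"Bread", "Milk"},
        {"Bread", "Diaper", "Beer", "Eggs"},
        {"Milk", "Diaper", "Beer", "Coke"},
        {"Bread", "Milk", "Diaper", "Beer"},
        {"Bread", "Milk", "Diaper", "Coke"},
    ]

    def support_count(itemset, transactions):
        """sigma(itemset): number of transactions containing the itemset."""
        return sum(1 for t in transactions if itemset <= t)

    X, Y = {"Milk", "Diaper"}, {"Beer"}
    s = support_count(X | Y, transactions) / len(transactions)              # 2/5 = 0.4
    c = support_count(X | Y, transactions) / support_count(X, transactions) # 2/3 = 0.67
    print(f"support={s:.2f}, confidence={c:.2f}")
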
Association Rule Mining Task
Given a set of transactions T, the goal of association rule mining is to find all rules having
support ≥ minsup threshold
confidence ≥ minconf threshold
Brute-force approach:
List all possible association rules
Compute the support and confidence for each rule
Prune rules that fail the minsup and minconf thresholds
Computationally prohibitive!
Mining Association Rules
Example of Rules:
{Milk, Diaper} → {Beer} (s=0.4, c=0.67)
{Milk, Beer} → {Diaper} (s=0.4, c=1.0)
{Diaper, Beer} → {Milk} (s=0.4, c=0.67)
{Beer} → {Milk, Diaper} (s=0.4, c=0.67)
{Diaper} → {Milk, Beer} (s=0.4, c=0.5)
{Milk} → {Diaper, Beer} (s=0.4, c=0.5)
Observations:
All the above rules are binary partitions of the same itemset:
{Milk, Diaper, Beer}
Rules originating from the same itemset have identical support but
can have different confidence
Thus, we may decouple the support and confidence requirements
(Same market-basket table as above.)
Mining Association Rules
Two-step approach:
Frequent Itemset Generation
Generate all itemsets whose support ≥ minsup
Rule Generation
Generate high confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset
Frequent itemset generation is still computationally expensive
Frequent Itemset Generation
Given d items, there are 2^d possible candidate itemsets
Frequent Itemset Generation
Brute-force approach:
Each itemset in the lattice is a candidate frequent itemset
Count the support of each candidate by scanning the database
Match each transaction against every candidate
Complexity ~ O(NMw) ⇒ expensive since M = 2^d !
(Figure: the N transactions, each of width w, are matched against the list of M candidates.)
Computational Complexity
Given d unique items:
Total number of itemsets = 2^d
Total number of possible association rules:
R = Σ_{k=1}^{d−1} [ dCk × Σ_{j=1}^{d−k} (d−k)Cj ] = 3^d − 2^(d+1) + 1
If d = 6, R = 602 rules
Frequent Itemset Generation Strategies
Reduce the number of candidates (M)
Complete search: M = 2^d
Use pruning techniques to reduce M
Reduce the number of transactions (N)
Reduce size of N as the size of itemset increases
Used by DHP and vertical-based mining algorithms
Reduce the number of comparisons (NM)
Use efficient data structures to store the candidates or transactions
No need to match every candidate against every transaction
Reducing Number of Candidates
Apriori principle:
If an itemset is frequent, then all of its subsets must also be frequent
Apriori principle holds due to the following property of the support measure:
∀X, Y: (X ⊆ Y) ⇒ s(X) ≥ s(Y)
The support of an itemset never exceeds the support of its subsets; this is known as the anti-monotone property of support
Illustrating Apriori Principle
Found to be Infrequent
Pruned supersets
Illustrating Apriori Principle
Items (1-itemsets)
Pairs (2-itemsets)
(No need to generate
candidates involving Coke
or Eggs)
Triplets (3-itemsets)
Minimum Support = 3
If every subset is considered,
6C1 + 6C2 + 6C3 = 41
With support-based pruning,
6 + 6 + 1 = 13
Items (1-itemsets):
Item     Count
Bread    4
Coke     2
Milk     4
Beer     3
Diaper   4
Eggs     1

Pairs (2-itemsets):
Itemset           Count
{Bread, Milk}     3
{Bread, Beer}     2
{Bread, Diaper}   3
{Milk, Beer}      2
{Milk, Diaper}    3
{Beer, Diaper}    3

Triplets (3-itemsets):
Itemset                  Count
{Bread, Milk, Diaper}    3
Apriori Algorithm
Method:
Let k=1
Generate frequent itemsets of length 1
Repeat until no new frequent itemsets are identified
Generate length (k+1) candidate itemsets from length k frequent itemsets
Prune candidate itemsets containing subsets of length k that are infrequent
Count the support of each candidate by scanning the DB
Eliminate candidates that are infrequent, leaving only those that are frequent
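The loop above can be sketched in Python as follows; this is an illustrative in-memory version (function and variable names are our own, not from any library):

    from itertools import combinations

    def apriori(transactions, minsup_count):
        """Return all frequent itemsets (as frozensets) with support >= minsup_count."""
        items = {i for t in transactions for i in t}
        # Frequent 1-itemsets
        Lk = {frozenset([i]) for i in items
              if sum(1 for t in transactions if i in t) >= minsup_count}
        frequent, k = set(Lk), 1
        while Lk:
            # Generate length (k+1) candidate itemsets from length k frequent itemsets
            candidates = {a | b for a in Lk for b in Lk if len(a | b) == k + 1}
            # Apriori pruning: drop candidates containing an infrequent k-subset
            candidates = {c for c in candidates
                          if all(frozenset(s) in Lk for s in combinations(c, k))}
            # Count support by scanning the DB; eliminate infrequent candidates
            Lk = {c for c in candidates
                  if sum(1 for t in transactions if c <= t) >= minsup_count}
            frequent |= Lk
            k += 1
        return frequent

    # Reusing the `transactions` list from the earlier sketch, apriori(transactions, 3)
    # finds, among others, {Bread, Milk, Diaper} with support count 3.
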
Reducing Number of Comparisons
Candidate counting:
Scan the database of transactions to determine the support of each candidate itemset
To reduce the number of comparisons, store the candidates in a hash structure
Instead of matching each transaction against every candidate, match it against candidates contained in the hashed buckets
(Figure: the N transactions are matched against candidates stored in a hash structure of buckets, hashing k items at a time.)
Generate Hash Tree
Suppose you have 15 candidate itemsets of length 3:
{1 4 5}, {1 2 4}, {4 5 7}, {1 2 5}, {4 5 8}, {1 5 9}, {1 3 6}, {2 3 4}, {5 6 7}, {3 4 5}, {3 5 6}, {3 5 7}, {6 8 9}, {3 6 7}, {3 6 8}
You need:
Hash function
Max leaf size: max number of itemsets stored in a leaf node (if number of candidate itemsets exceeds max leaf size, split the node)
(Figure: the candidate hash tree; the hash function sends items 1, 4, 7 to the left branch, 2, 5, 8 to the middle branch, and 3, 6, 9 to the right branch, and the 15 candidate 3-itemsets are distributed over the leaves.)
Association Rule Discovery: Hash Tree
(Figures: three views of the candidate hash tree, hashing at the root on 1, 4 or 7, on 2, 5 or 8, and on 3, 6 or 9 respectively; each hash value selects the subtree in which the matching candidates are stored.)
Subset Operation
Given a transaction t, what are the possible subsets of size 3?
Given transaction t = {1, 2, 3, 5, 6}, the ten subsets of size 3 are:
{1,2,3}, {1,2,5}, {1,2,6}, {1,3,5}, {1,3,6}, {1,5,6}, {2,3,5}, {2,3,6}, {2,5,6}, {3,5,6}
(Figure: the subsets enumerated level by level, fixing the first item at level 1, the second at level 2, and the third at level 3.)
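In Python the same enumeration can be obtained with itertools.combinations, assuming the items of the transaction are kept sorted:

    from itertools import combinations

    t = [1, 2, 3, 5, 6]
    subsets = list(combinations(t, 3))
    # [(1,2,3), (1,2,5), (1,2,6), (1,3,5), (1,3,6), (1,5,6),
    #  (2,3,5), (2,3,6), (2,5,6), (3,5,6)]
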
Subset Operation Using Hash Tree
(Figure: transaction {1, 2, 3, 5, 6} is split at the root into 1 + {2 3 5 6}, 2 + {3 5 6}, and 3 + {5 6}, each branch following the hash function 1,4,7 / 2,5,8 / 3,6,9.)
Subset Operation Using Hash Tree (continued)
(Figure: at the second level, the branch 1 + {2 3 5 6} is split further into 1 2 + {3 5 6}, 1 3 + {5 6}, and 1 5 + {6}.)
Subset Operation Using Hash Tree (continued)
(Figure: following the branches down to the leaves, the transaction is matched against 11 out of the 15 candidates.)
Factors Affecting Complexity
Choice of minimum support threshold
lowering support threshold results in more frequent itemsets
this may increase number of candidates and max length of frequent itemsets
Dimensionality (number of items) of the data set
more space is needed to store support count of each item
if number of frequent items also increases, both computation and I/O costs may also increase
Size of database
since Apriori makes multiple passes, run time of algorithm may increase with number of transactions
Average transaction width
transaction width increases with denser data sets
This may increase max length of frequent itemsets and traversals of hash tree (number of subsets in a transaction increases with its width)
Compact Representation of Frequent Itemsets
Some itemsets are redundant because they have the same support as their supersets
Number of frequent itemsets in the example data set = 3 × Σ_{k=1}^{10} 10Ck
Need a compact representation
Maximal Frequent Itemset
An itemset is maximal frequent if none of its immediate supersets is frequent
(Figure: the itemset lattice from null to ABCDE; a border separates the frequent from the infrequent itemsets, and the maximal frequent itemsets sit immediately below the border. Supersets of infrequent itemsets are pruned.)
Closed Itemset
An itemset is closed if none of its immediate supersets has the same support as the itemset
TID   Items
1     {A,B}
2     {B,C,D}
3     {A,B,C,D}
4     {A,B,D}
5     {A,B,C,D}

Itemset   Support
{A}       4
{B}       5
{C}       3
{D}       4
{A,B}     4
{A,C}     2
{A,D}     3
{B,C}     3
{B,D}     4
{C,D}     3

Itemset     Support
{A,B,C}     2
{A,B,D}     3
{A,C,D}     2
{B,C,D}     3
{A,B,C,D}   2
Maximal vs Closed Itemsets
TID   Items
1     ABC
2     ABCD
3     BCE
4     ACDE
5     DE
(Figure: each itemset in the lattice annotated with the transaction ids that support it; itemsets not supported by any transaction are marked.)
Maximal vs Closed Frequent Itemsets
Minimum support = 2
# Closed = 9
# Maximal = 4
(Figure: the lattice with closed-and-maximal itemsets distinguished from closed-but-not-maximal itemsets.)
Maximal vs Closed Itemsets
(Figure: maximal frequent itemsets ⊆ closed frequent itemsets ⊆ frequent itemsets.)
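As an illustration of the two definitions, the following hypothetical helper (a sketch, not a library routine) classifies frequent itemsets as closed and/or maximal given a support table keyed by frozensets:

    def classify(supports, minsup):
        """supports: dict frozenset -> support count. Returns (closed, maximal)."""
        frequent = {s: c for s, c in supports.items() if c >= minsup}
        closed, maximal = set(), set()
        for itemset, count in frequent.items():
            # Immediate frequent supersets; an infrequent immediate superset
            # necessarily has support < minsup <= count, so it cannot tie.
            supersets = [t for t in frequent
                         if itemset < t and len(t) == len(itemset) + 1]
            if all(frequent[t] < count for t in supersets):
                closed.add(itemset)    # no immediate superset with the same support
            if not supersets:
                maximal.add(itemset)   # no frequent immediate superset at all
        return closed, maximal

    # e.g. supports = {frozenset({"A"}): 4, frozenset({"B"}): 5, ...,
    #                  frozenset("ABCD"): 2}; classify(supports, minsup=2)
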
Alternative Methods for Frequent Itemset Generation
Traversal of Itemset Lattice
General-to-specific vs Specific-to-general
(Figures: (a) general-to-specific search grows itemsets from null toward {a1, a2, …, an}; (b) specific-to-general works from the largest itemset back toward null; (c) bidirectional search approaches the frequent itemset border from both sides.)
Alternative Methods for Frequent Itemset Generation
Traversal of Itemset Lattice
Equivalent Classes
(Figures: (a) prefix tree and (b) suffix tree groupings of the itemset lattice into equivalence classes.)
Alternative Methods for Frequent Itemset Generation
Traversal of Itemset Lattice
Breadth-first vs Depth-first
(Figures: (a) breadth-first and (b) depth-first traversal of the itemset lattice.)
Alternative Methods for Frequent Itemset Generation
Representation of Database
horizontal vs vertical data layout
(Figure: the horizontal data layout lists the items of each transaction; the vertical data layout lists the transaction ids of each item.)
FP-growth Algorithm
Use a compressed representation of the database using an FP-tree
Once an FP-tree has been constructed, it uses a recursive divide-and-conquer approach to mine the frequent itemsets
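A compact sketch of the construction step is shown below; it builds the tree with two passes over the data but omits the header-table pointers and the recursive mining step (all names are our own):

    from collections import defaultdict

    class FPNode:
        def __init__(self, item, parent):
            self.item, self.parent = item, parent
            self.count = 0
            self.children = {}

    def build_fptree(transactions, minsup_count):
        # First pass: count item frequencies
        freq = defaultdict(int)
        for t in transactions:
            for item in t:
                freq[item] += 1
        # Second pass: insert each transaction with its frequent items
        # sorted by descending frequency (ties broken alphabetically)
        root = FPNode(None, None)
        for t in transactions:
            items = sorted((i for i in t if freq[i] >= minsup_count),
                           key=lambda i: (-freq[i], i))
            node = root
            for item in items:
                node = node.children.setdefault(item, FPNode(item, node))
                node.count += 1
        return root
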
FP-tree construction
After reading TID=1: null → A:1 → B:1
After reading TID=2: a second branch null → B:1 → C:1 → D:1 is added
TID   Items
1     {A,B}
2     {B,C,D}
3     {A,C,D,E}
4     {A,D,E}
5     {A,B,C}
6     {A,B,C,D}
7     {B,C}
8     {A,B,C}
9     {A,B,D}
10    {B,C,E}
FP-Tree Construction
(Figure: the completed FP-tree for the transaction database above, with node counts such as A:7 and B:5 on the branches from the root.)
Pointers are used to assist frequent itemset generation: a header table with one entry per item (A, B, C, D, E) links together all tree nodes carrying that item.
FP-growth
(Figure: the subtree of FP-tree paths ending in D, with counts adjusted to D's counts.)
Conditional pattern base for D:
P = {(A:1, B:1, C:1), (A:1, B:1), (A:1, C:1), (A:1), (B:1, C:1)}
Recursively apply FP-growth on P
Frequent itemsets found (with sup > 1): AD, BD, CD, ACD, BCD
Tree Projection
Set enumeration tree: (Figure: the itemset lattice drawn as a set-enumeration tree over the items.)
Possible Extension: E(A) = {B,C,D,E}
Possible Extension: E(ABC) = {D,E}
Tree Projection
Items are listed in lexicographic order
Each node P stores the following information:
Itemset for node P
List of possible lexicographic extensions of P: E(P)
Pointer to projected database of its ancestor node
Bitvector containing information about which transactions in the projected database contain the itemset
Projected Database
For each transaction T, the projected transaction at node A is T ∩ E(A).

Original database:
TID   Items
1     {A,B}
2     {B,C,D}
3     {A,C,D,E}
4     {A,D,E}
5     {A,B,C}
6     {A,B,C,D}
7     {B,C}
8     {A,B,C}
9     {A,B,D}
10    {B,C,E}

Projected database for node A:
TID   Items
1     {B}
2     {}
3     {C,D,E}
4     {D,E}
5     {B,C}
6     {B,C,D}
7     {}
8     {B,C}
9     {B,D}
10    {}
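The projection itself is a one-liner; the sketch below assumes the ten transactions above are stored as Python sets:

    transactions = [{"A","B"}, {"B","C","D"}, {"A","C","D","E"}, {"A","D","E"},
                    {"A","B","C"}, {"A","B","C","D"}, {"B","C"}, {"A","B","C"},
                    {"A","B","D"}, {"B","C","E"}]
    E_A = {"B", "C", "D", "E"}   # lexicographic extensions of node A
    # T ∩ E(A) for transactions containing A; empty otherwise
    projected = [t & E_A if "A" in t else set() for t in transactions]
    # [{'B'}, set(), {'C','D','E'}, {'D','E'}, {'B','C'}, {'B','C','D'},
    #  set(), {'B','C'}, {'B','D'}, set()]
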
ECLAT
For each item, store a list of transaction ids (tids)
TID-list
ECLAT
Determine support of any k-itemset by intersecting tid-lists of two of its (k-1) subsets.
3 traversal approaches:
top-down, bottom-up and hybrid
Advantage: very fast support counting
Disadvantage: intermediate tid-lists may become too large for memory
Horizontal data layout:
TID   Items
1     A,B,E
2     B,C,D
3     C,E
4     A,C,D
5     A,B,C,D
6     A,E
7     A,B
8     A,B,C
9     A,C,D
10    B

Vertical data layout (tid-lists):
A: 1, 4, 5, 6, 7, 8, 9
B: 1, 2, 5, 7, 8, 10
C: 2, 3, 4, 5, 8, 9
D: 2, 4, 5, 9
E: 1, 3, 6

Intersecting the tid-lists of A and B gives the tid-list of AB: 1, 5, 7, 8 (support 4).
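A minimal sketch of this intersection step, using the tid-lists of A and B from the table above:

    # ECLAT-style support counting: the tid-list of a k-itemset is the
    # intersection of the tid-lists of two of its (k-1)-subsets.
    tidlists = {
        "A": {1, 4, 5, 6, 7, 8, 9},
        "B": {1, 2, 5, 7, 8, 10},
    }
    tid_AB = tidlists["A"] & tidlists["B"]   # {1, 5, 7, 8}
    support_AB = len(tid_AB)                 # 4
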
Rule Generation
Given a frequent itemset L, find all non-empty subsets f ⊂ L such that f → L − f satisfies the minimum confidence requirement
If {A,B,C,D} is a frequent itemset, candidate rules:
ABC → D, ABD → C, ACD → B, BCD → A,
A → BCD, B → ACD, C → ABD, D → ABC,
AB → CD, AC → BD, AD → BC, BC → AD,
BD → AC, CD → AB
If |L| = k, then there are 2^k − 2 candidate association rules (ignoring L → ∅ and ∅ → L)
Rule Generation
How to efficiently generate rules from frequent itemsets?
In general, confidence does not have an anti-monotone property
c(ABC → D) can be larger or smaller than c(AB → D)
But confidence of rules generated from the same itemset has an anti-monotone property
e.g., L = {A,B,C,D}:
c(ABC → D) ≥ c(AB → CD) ≥ c(A → BCD)
Confidence is anti-monotone w.r.t. number of items on the RHS of the rule
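A sketch of level-wise rule generation that exploits this property, growing consequents only from rules that passed the confidence check (helper and variable names are hypothetical, not from any library):

    def rules_from_itemset(L, support, minconf):
        """L: frozenset; support: dict frozenset -> support count."""
        rules = []
        # Start with 1-item consequents and grow the RHS level by level
        consequents = [frozenset([i]) for i in L]
        while consequents:
            next_level = set()
            for Y in consequents:
                X = L - Y
                if not X:
                    continue
                conf = support[L] / support[X]
                if conf >= minconf:
                    rules.append((X, Y, conf))
                    # Only extend consequents of rules that passed; by the
                    # anti-monotone property, supersets of a failing Y also fail.
                    next_level |= {Y | frozenset([i]) for i in L - Y}
            consequents = next_level
        return rules

    # e.g. with the support counts produced by the Apriori sketch above:
    # rules_from_itemset(frozenset("ABCD"), support, minconf=0.7)
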
Rule Generation for Apriori Algorithm
Lattice of rules
(Figure: the lattice of rules from ABCD ⇒ {} down to the single-antecedent rules A ⇒ BCD, …, D ⇒ ABC; when a rule such as CD ⇒ AB is found to have low confidence, all rules below it in the lattice, e.g. D ⇒ ABC, are pruned.)
Rule Generation for Apriori Algorithm
A candidate rule is generated by merging two rules that share the same prefix in the rule consequent.
Example: join(CD ⇒ AB, BD ⇒ AC) would produce the candidate rule D ⇒ ABC.
Prune rule D ⇒ ABC if its subset AD ⇒ BC does not have high confidence.
Effect of Support Distribution
Many real data sets have skewed support distribution
(Figure: support distribution of a retail data set, showing a highly skewed distribution.)
Effect of Support Distribution
How to set the appropriate minsup threshold?
If minsup is set too high, we could miss itemsets involving interesting rare items (e.g., expensive products)
If minsup is set too low, it is computationally expensive and the number of itemsets is very large
Using a single minimum support threshold may not be effective
Multiple Minimum Support
How to apply multiple minimum supports?
MS(i): minimum support for item i
e.g.: MS(Milk)=5%, MS(Coke) = 3%,
MS(Broccoli)=0.1%, MS(Salmon)=0.5%
MS({Milk, Broccoli}) = min (MS(Milk), MS(Broccoli))
= 0.1%
Challenge: Support is no longer anti-monotone
Suppose: Support(Milk, Coke) = 1.5% and
Support(Milk, Coke, Broccoli) = 0.5%
{Milk,Coke} is infrequent but {Milk,Coke,Broccoli} is frequent
Multiple Minimum Support
Item    MS(I)    Sup(I)
A       0.10%    0.25%
B       0.20%    0.26%
C       0.30%    0.29%
D       0.50%    0.05%
E       3%       4.20%
(Figures: the itemset lattice over items A–E, highlighting which itemsets are frequent under these item-specific thresholds.)
Multiple Minimum Support (Liu 1999)
Order the items according to their minimum support (in ascending order)
e.g.: MS(Milk)=5%, MS(Coke) = 3%,
MS(Broccoli)=0.1%, MS(Salmon)=0.5%
Ordering: Broccoli, Salmon, Coke, Milk
Need to modify Apriori such that:
L1: set of frequent items
F1: set of items whose support is ≥ MS(1), where MS(1) = min_i(MS(i))
C2: candidate itemsets of size 2 are generated from F1 instead of L1
Multiple Minimum Support (Liu 1999)
Modifications to Apriori:
In traditional Apriori, a candidate (k+1)-itemset is generated by merging two frequent itemsets of size k; the candidate is pruned if it contains any infrequent subsets of size k.
The pruning step has to be modified: prune only if the subset contains the first item.
E.g.: Candidate = {Broccoli, Coke, Milk} (ordered according to minimum support). {Broccoli, Coke} and {Broccoli, Milk} are frequent but {Coke, Milk} is infrequent. The candidate is not pruned because {Coke, Milk} does not contain the first item, i.e., Broccoli.
Pattern Evaluation
Association rule algorithms tend to produce too many rules
many of them are uninteresting or redundant
Redundant if {A,B,C} → {D} and {A,B} → {D} have the same support & confidence
Interestingness measures can be used to prune/rank the derived patterns
In the original formulation of association rules, support & confidence are the only measures used
Application of Interestingness Measure
Interestingness Measures
Computing Interestingness Measure
Given a rule X → Y, information needed to compute rule interestingness can be obtained from a contingency table
Contingency table for X → Y:
Used to define various measures
support, confidence, lift, Gini,
J-measure, etc.
        Y      ¬Y
X      f11    f10    f1+
¬X     f01    f00    f0+
       f+1    f+0    |T|

f11: support of X and Y
f10: support of X and ¬Y
f01: support of ¬X and Y
f00: support of ¬X and ¬Y
Drawback of Confidence
          Coffee   ¬Coffee
Tea         15        5       20
¬Tea        75        5       80
            90       10      100

Association Rule: Tea → Coffee
Confidence = P(Coffee|Tea) = 15/20 = 0.75
but P(Coffee) = 0.9
Although confidence is high, the rule is misleading:
P(Coffee|¬Tea) = 75/80 = 0.9375
Statistical Independence
Population of 1000 students
600 students know how to swim (S)
700 students know how to bike (B)
420 students know how to swim and bike (S,B)
P(S∧B) = 420/1000 = 0.42
P(S) × P(B) = 0.6 × 0.7 = 0.42
P(S∧B) = P(S) × P(B) ⇒ statistical independence
P(S∧B) > P(S) × P(B) ⇒ positively correlated
P(S∧B) < P(S) × P(B) ⇒ negatively correlated
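The same arithmetic as a quick check:

    # Swim/bike example: 0.42 == 0.6 * 0.7, so S and B are independent
    p_s, p_b, p_sb = 600 / 1000, 700 / 1000, 420 / 1000
    assert abs(p_sb - p_s * p_b) < 1e-12
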
Statistical-based Measures
Measures that take into account statistical dependence, e.g.:
Lift = P(Y|X) / P(Y)
Interest = P(X,Y) / (P(X) P(Y))
PS = P(X,Y) − P(X) P(Y)
φ-coefficient = (P(X,Y) − P(X) P(Y)) / √(P(X)[1 − P(X)] P(Y)[1 − P(Y)])

Association Rule: Tea → Coffee
Confidence = P(Coffee|Tea) = 0.75
but P(Coffee) = 0.9
Lift = 0.75/0.9 = 0.8333 (< 1, therefore negatively associated)
(Same Tea/Coffee contingency table as above.)
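Computing the lift directly from the contingency table:

    # Lift for Tea -> Coffee from the table above
    f11, f10, f01, f00 = 15, 5, 75, 5
    n = f11 + f10 + f01 + f00            # 100
    conf = f11 / (f11 + f10)             # P(Coffee|Tea) = 0.75
    lift = conf / ((f11 + f01) / n)      # 0.75 / 0.9 = 0.8333 < 1
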
Drawback of Lift & Interest
Statistical independence:
If P(X,Y)=P(X)P(Y) => Lift = 1
      Y    ¬Y
X     10    0    10
¬X     0   90    90
      10   90   100

Lift = 0.1 / (0.1 × 0.1) = 10

      Y    ¬Y
X     90    0    90
¬X     0   10    10
      90   10   100

Lift = 0.9 / (0.9 × 0.9) = 1.11
There are lots of measures proposed in the literature
Some measures are good for certain applications, but not for others
What criteria should we use to determine whether a measure is good or bad?
What about Apriori-style support based pruning? How does it affect these measures?
Properties of A Good Measure
Piatetsky-Shapiro:
3 properties a good measure M must satisfy:
M(A,B) = 0 if A and B are statistically independent
M(A,B) increases monotonically with P(A,B) when P(A) and P(B) remain unchanged
M(A,B) decreases monotonically with P(A) [or P(B)] when P(A,B) and P(B) [or P(A)] remain unchanged
Comparing Different Measures
10 examples of contingency tables:
Rankings of contingency tables using various measures:
Example   f11    f10    f01    f00
E1        8123     83    424   1370
E2        8330      2    622   1046
E3        9481     94    127    298
E4        3954   3080      5   2961
E5        2886   1363   1320   4431
E6        1500   2000    500   6000
E7        4000   2000   1000   3000
E8        4000   2000   2000   2000
E9        1720   7121      5   1154
E10         61   2483      4   7452
Property under Variable Permutation
Does M(A,B) = M(B,A)?
Symmetric measures:
support, lift, collective strength, cosine, Jaccard, etc
Asymmetric measures:
confidence, conviction, Laplace, J-measure, etc
Property under Row/Column Scaling
Grade-Gender example (Mosteller, 1968):

        Male   Female
High      2       3      5
Low       1       4      5
          3       7     10

        Male   Female
High      4      30     34
Low       2      40     42
          6      70     76

(The second table scales the male column by 2 and the female column by 10.)

Mosteller: the underlying association should be independent of the relative number of male and female students in the samples.
Property under Inversion Operation
(Figure: inversion flips every bit of the item vectors across transactions 1…N; a measure is inversion-invariant if it is unchanged when both vectors are inverted.)
Example: φ-Coefficient
The φ-coefficient is analogous to the correlation coefficient for continuous variables.

      Y    ¬Y
X     60   10    70
¬X    10   20    30
      70   30   100

φ = (60×20 − 10×10) / √(70×30×70×30) = 0.5238

      Y    ¬Y
X     20   10    30
¬X    10   60    70
      30   70   100

φ = (20×60 − 10×10) / √(30×70×30×70) = 0.5238

The coefficient is the same for both tables.
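A small sketch of the φ computation from the four cell counts, reproducing the 0.5238 value for both tables:

    from math import sqrt

    def phi(f11, f10, f01, f00):
        r1, r0 = f11 + f10, f01 + f00     # row totals
        c1, c0 = f11 + f01, f10 + f00     # column totals
        return (f11 * f00 - f10 * f01) / sqrt(r1 * r0 * c1 * c0)

    print(phi(60, 10, 10, 20))   # 0.5238...
    print(phi(20, 10, 10, 60))   # 0.5238...
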
Property under Null Addition
Invariant measures:
support, cosine, Jaccard, etc
Non-invariant measures:
correlation, Gini, mutual information, odds ratio, etc
Different Measures have Different Properties
Symbol   Measure               Range               P1    P2   P3   O1     O2   O3    O3'   O4
φ        Correlation           -1 … 0 … 1          Yes   Yes  Yes  Yes    No   Yes   Yes   No
λ        Lambda                0 … 1               Yes   No   No   Yes    No   No*   Yes   No
α        Odds ratio            0 … 1 … ∞           Yes*  Yes  Yes  Yes    Yes  Yes*  Yes   No
Q        Yule's Q              -1 … 0 … 1          Yes   Yes  Yes  Yes    Yes  Yes   Yes   No
Y        Yule's Y              -1 … 0 … 1          Yes   Yes  Yes  Yes    Yes  Yes   Yes   No
κ        Cohen's               -1 … 0 … 1          Yes   Yes  Yes  Yes    No   No    Yes   No
M        Mutual Information    0 … 1               Yes   Yes  Yes  Yes    No   No*   Yes   No
J        J-Measure             0 … 1               Yes   No   No   No     No   No    No    No
G        Gini Index            0 … 1               Yes   No   No   No     No   No*   Yes   No
s        Support               0 … 1               No    Yes  No   Yes    No   No    No    No
c        Confidence            0 … 1               No    Yes  No   Yes    No   No    No    Yes
L        Laplace               0 … 1               No    Yes  No   Yes    No   No    No    No
V        Conviction            0.5 … 1 … ∞         No    Yes  No   Yes**  No   No    Yes   No
I        Interest              0 … 1 … ∞           Yes*  Yes  Yes  Yes    No   No    No    No
IS       IS (cosine)           0 … 1               No    Yes  Yes  Yes    No   No    No    Yes
PS       Piatetsky-Shapiro's   -0.25 … 0 … 0.25    Yes   Yes  Yes  Yes    No   Yes   Yes   No
F        Certainty factor      -1 … 0 … 1          Yes   Yes  Yes  No     No   No    Yes   No
AV       Added value           0.5 … 1 … 1         Yes   Yes  Yes  No     No   No    No    No
S        Collective strength   0 … 1 … ∞           No    Yes  Yes  Yes    No   Yes*  Yes   No
ζ        Jaccard               0 … 1               No    Yes  Yes  Yes    No   No    No    Yes
K        Klosgen's             —                   Yes   Yes  Yes  No     No   No    No    No

where:
O1: symmetry under variable permutation
O2: marginal invariance
O3: antisymmetry under row or column permutation
O3': inversion invariance
O4: null invariance
Yes*: yes if measure is normalized
Yes**: yes if measure is symmetrized by taking max(M(A,B), M(B,A))
No*: symmetry under row or column permutation
Support-based Pruning
Most of the association rule mining algorithms use support measure to prune rules and itemsets
Study effect of support pruning on correlation of itemsets
Generate 10000 random contingency tables
Compute support and pairwise correlation for each table
Apply support-based pruning and examine the tables that are removed
Effect of Support-based Pruning
(Figures: histograms of pairwise correlation for all item pairs and for pairs filtered by support. The underlying counts per correlation bin:)

Correlation   All    >0.9   >0.7   >0.5   <0.01   <0.03   <0.05
-1.0           13      0      7     13      10      13      13
-0.9           95      0     29     92      21      71      91
-0.8          172      4     70    164      22      68     104
-0.7          282      7     95    265      18      66     106
-0.6          359      9    150    327      28      67     107
-0.5          503      7    182    428      28      84     128
-0.4          648     25    267    542      46      94     141
-0.3          716     35    296    572      40      93     151
-0.2          830     42    336    640      33      93     161
-0.1          867     86    383    665      25      77     136
 0.0          908     92    375    664      21      56      99
 0.1          915     75    370    678       7      30      67
 0.2          881     54    332    645       2      14      45
 0.3          759     28    272    540       1       5      15
 0.4          602     14    202    406       0       3       8
 0.5          532     16    161    355       0       0       4
 0.6          370      3     89    237       0       0       2
 0.7          259      5     49    139       0       0       1
 0.8          194      0     41    118       0       0       0
 0.9           84      0     17     51       0       0       0
 1.0           11      0      4      5       0       0       0

(Column headers give the support filter, e.g. ">0.9" = item pairs with support > 0.9, "<0.01" = pairs with support < 0.01.)
Effect of Support-based Pruning
Support-based pruning eliminates mostly negatively correlated itemsets
(Histogram data as in the table above; note that the low-support columns have almost no pairs with positive correlation, so pruning low-support pairs removes mostly negatively correlated itemsets.)
Effect of Support-based Pruning
Investigate how support-based pruning affects other measures
Steps:
Generate 10000 contingency tables
Rank each table according to the different measures
Compute the pair-wise correlation between the measures
Effect of Support-based Pruning
Without support pruning (all pairs):
Red cells indicate correlation between the pair of measures > 0.85.
40.14% of pairs have correlation > 0.85.
(Figure: scatter plot between the correlation and Jaccard measures.)
Effect of Support-based Pruning
0.5% ≤ support ≤ 50%:
61.45% of pairs have correlation > 0.85.
(Figure: scatter plot between the correlation and Jaccard measures.)
Effect of Support-based Pruning
0.5% ≤ support ≤ 30%:
76.42% of pairs have correlation > 0.85.
(Figure: scatter plot between the correlation and Jaccard measures.)
Subjective Interestingness Measure
Objective measure:
Rank patterns based on statistics computed from data
e.g., 21 measures of association (support, confidence, Laplace, Gini, mutual information, Jaccard, etc).
Subjective measure:
Rank patterns according to user’s interpretation
A pattern is subjectively interesting if it contradicts the expectation of a user (Silberschatz & Tuzhilin)
A pattern is subjectively interesting if it is actionable (Silberschatz & Tuzhilin)
Interestingness via Unexpectedness
Need to model expectation of users (domain knowledge)
Need to combine expectation of users with evidence from data (i.e., extracted patterns)
(Figure: a grid comparing expectation with evidence: '+' = pattern expected/found to be frequent, '−' = pattern expected/found to be infrequent. Patterns where expectation and data agree are expected patterns; patterns where they disagree are unexpected patterns.)
Interestingness via Unexpectedness
Web Data (Cooley et al 2001)
Domain knowledge in the form of site structure
Given an itemset F = {X1, X2, …, Xk} (Xi: Web pages):
L: number of links connecting the pages
lfactor = L / (k × (k − 1))
cfactor = 1 (if the graph is connected), 0 (if disconnected)
Structure evidence = cfactor × lfactor
Usage evidence = P(X1 ∩ X2 ∩ … ∩ Xk) / P(X1 ∪ X2 ∪ … ∪ Xk)
Use Dempster-Shafer theory to combine domain knowledge and evidence from data