Association Rule Mining with R 
Yanchang Zhao 
http://www.RDataMining.com 
30 September 2014 
Outline 
Introduction 
Association Rule Mining 
Removing Redundancy 
Interpreting Rules 
Visualizing Association Rules 
Further Readings and Online Resources 
Association Rule Mining with R [1]
- basic concepts of association rules
- association rule mining with R
- pruning redundant rules
- interpreting and visualizing association rules
- recommended readings
[1] Chapter 9: Association Rules, R and Data Mining: Examples and Case Studies. http://www.rdatamining.com/docs/RDataMining.pdf
Association Rules 
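Association rules are rules presenting association or correlation between itemsets.
$$\mathrm{support}(A \Rightarrow B) = P(A \cup B)$$
$$\mathrm{confidence}(A \Rightarrow B) = P(B \mid A) = \frac{P(A \cup B)}{P(A)}$$
$$\mathrm{lift}(A \Rightarrow B) = \frac{\mathrm{confidence}(A \Rightarrow B)}{P(B)} = \frac{P(A \cup B)}{P(A)\,P(B)}$$
where P(A) is the percentage (or probability) of cases containing A, and P(A ∪ B) is the percentage of cases containing both A and B (that is, the whole itemset A ∪ B).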
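To make the three formulas concrete, here is a minimal base-R sketch that evaluates them directly on the passenger data. It assumes the one-row-per-person data can be rebuilt from R's built-in `Titanic` table (introduced on the following slides), and the helpers `covers()` and `rule_measures()` are hypothetical names used only for illustration; they are not part of arules.

```r
# Rebuild one row per passenger from the built-in Titanic table (datasets package)
tab <- as.data.frame(Titanic)
titanic.raw <- tab[rep(seq_len(nrow(tab)), tab$Freq), c("Class", "Sex", "Age", "Survived")]

# Hypothetical helper: TRUE for every row matching all "variable = value" conditions
covers <- function(data, items) {
  ok <- rep(TRUE, nrow(data))
  for (v in names(items)) ok <- ok & (data[[v]] == items[[v]])
  ok
}

# Hypothetical helper: support, confidence and lift of lhs => rhs
rule_measures <- function(data, lhs, rhs) {
  p.a  <- mean(covers(data, lhs))          # P(A)
  p.b  <- mean(covers(data, rhs))          # P(B)
  p.ab <- mean(covers(data, c(lhs, rhs)))  # P(A u B): rows containing both A and B
  c(support = p.ab, confidence = p.ab / p.a, lift = p.ab / (p.a * p.b))
}

m <- rule_measures(titanic.raw, lhs = list(Class = "Crew"), rhs = list(Sex = "Male"))
round(m, 4)
```

For the rule {Class=Crew} => {Sex=Male} this should give about 0.3916, 0.9740 and 1.2385, the same values apriori() reports for that rule in the inspect(rules.all) output later on.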
Association Rule Mining Algorithms in R 
- APRIORI
  - a level-wise, breadth-first algorithm which counts transactions to find frequent itemsets and then derives association rules from them
  - apriori() in package arules
- ECLAT
  - finds frequent itemsets with equivalence classes, depth-first search and set intersection instead of counting
  - eclat() in the same package
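The level-wise idea behind APRIORI (join frequent k-itemsets into (k+1)-candidates, prune candidates with an infrequent subset, then count) can be sketched in base R as below. This is an illustrative re-implementation, not the arules code: it uses a simplified join that unions any two frequent k-itemsets, and `apriori_sketch()` and `supp()` are hypothetical names. The transactions are assumed to be rebuilt from the built-in `Titanic` table.

```r
# Rebuild one row per passenger from the built-in Titanic table (datasets package)
tab <- as.data.frame(Titanic)
titanic.raw <- tab[rep(seq_len(nrow(tab)), tab$Freq), c("Class", "Sex", "Age", "Survived")]

# Transaction-by-item logical matrix, one column per item "variable=value"
items <- do.call(cbind, lapply(names(titanic.raw), function(v) {
  lv <- levels(titanic.raw[[v]])
  m <- vapply(lv, function(l) titanic.raw[[v]] == l, logical(nrow(titanic.raw)))
  colnames(m) <- paste0(v, "=", lv)
  m
}))

# Fraction of transactions that contain every item of `set`
supp <- function(set) mean(rowSums(items[, set, drop = FALSE]) == length(set))

apriori_sketch <- function(minsupp) {
  # Level 1: frequent single items
  level <- as.list(colnames(items)[colMeans(items) >= minsupp])
  frequent <- level
  while (length(level) > 1) {
    k <- length(level[[1]])
    keys <- vapply(level, paste, character(1), collapse = ",")
    # Join: union of two frequent k-itemsets that has exactly k + 1 items
    cands <- list()
    for (i in seq_len(length(level) - 1)) {
      for (j in (i + 1):length(level)) {
        u <- sort(union(level[[i]], level[[j]]))
        if (length(u) == k + 1) cands[[length(cands) + 1]] <- u
      }
    }
    cands <- unique(cands)
    # Prune: every k-subset of a candidate must already be frequent
    cands <- Filter(function(cand) all(vapply(seq_along(cand), function(d)
      paste(cand[-d], collapse = ",") %in% keys, logical(1))), cands)
    # Count: keep candidates that meet the minimum support
    level <- Filter(function(cand) supp(cand) >= minsupp, cands)
    frequent <- c(frequent, level)
  }
  frequent
}

fi <- apriori_sketch(minsupp = 0.1)
table(lengths(fi))  # number of frequent itemsets of each size
```

With minsupp = 0.1 the first level keeps 9 of the 10 items (Age=Child is too rare), which agrees with the "[9 item(s)]" line in the apriori() log shown later. ECLAT would instead store, for each itemset, the set of transaction ids containing it and obtain supports by intersecting those sets.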
Association Rule Mining
The Titanic Dataset 
- The Titanic dataset in the datasets package is a 4-dimensional table with summarized information on the fate of passengers on the Titanic according to social class, sex, age and survival.
- To make it suitable for association rule mining, we reconstruct the raw data as titanic.raw, where each row represents a person.
- The reconstructed raw data can also be downloaded at http://www.rdatamining.com/data/titanic.raw.rdata.
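Instead of downloading the file, titanic.raw can plausibly be rebuilt from the built-in table with a few lines of base R, by repeating each cell of the table as many times as its frequency. This is a sketch and not necessarily how the downloadable file was produced; in particular, the factor levels here follow the table's own order (Male before Female, Child before Adult), so summary() lists them in a different order than in the output below.

```r
# Expand the 4-D contingency table into one row per person
tab <- as.data.frame(Titanic)  # columns: Class, Sex, Age, Survived, Freq
titanic.raw <- tab[rep(seq_len(nrow(tab)), tab$Freq), c("Class", "Sex", "Age", "Survived")]
rownames(titanic.raw) <- NULL
dim(titanic.raw)
summary(titanic.raw)
```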
load("./data/titanic.raw.rdata") 
## draw a sample of 5 records 
idx <- sample(1:nrow(titanic.raw), 5) 
titanic.raw[idx, ] 
## Class Sex Age Survived 
## 950 Crew Male Adult No 
## 2176 3rd Female Adult Yes 
## 1716 Crew Male Adult Yes 
## 1001 Crew Male Adult No 
## 48 3rd Female Child No 
summary(titanic.raw) 
## Class Sex Age Survived 
## 1st :325 Female: 470 Adult:2092 No :1490 
## 2nd :285 Male :1731 Child: 109 Yes: 711 
## 3rd :706 
## Crew:885 
Function apriori() 
Mine frequent itemsets, association rules or association hyperedges 
using the Apriori algorithm. The Apriori algorithm employs 
level-wise search for frequent itemsets. 
Default settings: 
- minimum support: supp=0.1
- minimum confidence: conf=0.8
- maximum length of rules: maxlen=10
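To see what these default thresholds keep, the following base-R sketch brute-forces every rule with a single item on each side and applies supp >= 0.1 and conf >= 0.8. It is an illustration only, not how apriori() searches, and it assumes titanic.raw is rebuilt from the built-in `Titanic` table. It omits the empty-lhs rule {} => {Age=Adult} that apriori() also reports, so it should reproduce rules 2 to 11 of the inspect(rules.all) output on a later slide.

```r
tab <- as.data.frame(Titanic)
titanic.raw <- tab[rep(seq_len(nrow(tab)), tab$Freq), c("Class", "Sex", "Age", "Survived")]

supp.min <- 0.1  # default minimum support of apriori()
conf.min <- 0.8  # default minimum confidence of apriori()

# All items as (variable, value) pairs
items <- do.call(rbind, lapply(names(titanic.raw), function(v)
  data.frame(var = v, val = levels(titanic.raw[[v]]), stringsAsFactors = FALSE)))

found <- list()
for (a in seq_len(nrow(items))) {
  for (b in seq_len(nrow(items))) {
    if (items$var[a] == items$var[b]) next  # lhs and rhs must use different variables
    in.a <- titanic.raw[[items$var[a]]] == items$val[a]
    in.b <- titanic.raw[[items$var[b]]] == items$val[b]
    support <- mean(in.a & in.b)
    confidence <- support / mean(in.a)
    if (support >= supp.min && confidence >= conf.min) {
      found[[length(found) + 1]] <- data.frame(
        lhs = paste0(items$var[a], "=", items$val[a]),
        rhs = paste0(items$var[b], "=", items$val[b]),
        support = support, confidence = confidence,
        lift = confidence / mean(in.b), stringsAsFactors = FALSE)
    }
  }
}
rules1 <- do.call(rbind, found)
rules1[order(rules1$support), ]
```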
library(arules) 
rules.all <- apriori(titanic.raw) 
## 
## parameter specification: 
## confidence minval smax arem aval originalSupport support 
## 0.8 0.1 1 none FALSE TRUE 0.1 
## minlen maxlen target ext 
## 1 10 rules FALSE 
## 
## algorithmic control: 
## filter tree heap memopt load sort verbose 
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE 
## 
## apriori - find association rules with the apriori algorithm 
## version 4.21 (2004.05.09) (c) 1996-2004 Christian ... 
## set item appearances ...[0 item(s)] done [0.00s]. 
## set transactions ...[10 item(s), 2201 transaction(s)] done ... 
## sorting and recoding items ... [9 item(s)] done [0.00s]. 
## creating transaction tree ... done [0.00s]. 
## checking subsets of size 1 2 3 4 done [0.00s]. 
## writing ... [27 rule(s)] done [0.00s]. 
## creating S4 object ... done [0.00s]. 
inspect(rules.all) 
## lhs rhs support confidence lift 
## 1 {} => {Age=Adult} 0.9505 0.9505 1.0000 
## 2 {Class=2nd} => {Age=Adult} 0.1186 0.9158 0.9635 
## 3 {Class=1st} => {Age=Adult} 0.1449 0.9815 1.0327 
## 4 {Sex=Female} => {Age=Adult} 0.1931 0.9043 0.9514 
## 5 {Class=3rd} => {Age=Adult} 0.2849 0.8881 0.9344 
## 6 {Survived=Yes} => {Age=Adult} 0.2971 0.9198 0.9678 
## 7 {Class=Crew} => {Sex=Male} 0.3916 0.9740 1.2385 
## 8 {Class=Crew} => {Age=Adult} 0.4021 1.0000 1.0521 
## 9 {Survived=No} => {Sex=Male} 0.6197 0.9154 1.1640 
## 10 {Survived=No} => {Age=Adult} 0.6533 0.9651 1.0154 
## 11 {Sex=Male} => {Age=Adult} 0.7574 0.9630 1.0132 
## 12 {Sex=Female, 
## Survived=Yes} => {Age=Adult} 0.1436 0.9186 0.9665 
## 13 {Class=3rd, 
## Sex=Male} => {Survived=No} 0.1917 0.8275 1.2223 
## 14 {Class=3rd, 
## Survived=No} => {Age=Adult} 0.2163 0.9015 0.9485 
## 15 {Class=3rd, 
## Sex=Male} => {Age=Adult} 0.2099 0.9059 0.9531 
## 16 {Sex=Male, 
## Survived=Yes} => {Age=Adult} 0.1536 0.9210 0.9690 
# rules with rhs containing "Survived" only 
rules <- apriori(titanic.raw, 
control = list(verbose=F), 
parameter = list(minlen=2, supp=0.005, conf=0.8), 
appearance = list(rhs=c("Survived=No", 
"Survived=Yes"), 
default="lhs")) 
## keep three decimal places 
quality(rules) <- round(quality(rules), digits=3) 
## order rules by lift 
rules.sorted <- sort(rules, by="lift") 
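For intuition about what this constrained call returns, the sketch below enumerates every candidate rule by brute force in base R rather than running Apriori: the lhs is any combination of Class, Sex and Age values, the rhs is restricted to Survived=No or Survived=Yes, and the thresholds mirror minlen=2, supp=0.005 and conf=0.8. It assumes titanic.raw is rebuilt from the built-in `Titanic` table; exhaustive enumeration is only feasible here because there are so few items.

```r
tab <- as.data.frame(Titanic)
titanic.raw <- tab[rep(seq_len(nrow(tab)), tab$Freq), c("Class", "Sex", "Age", "Survived")]

# Every candidate lhs: one value (or NA = "variable not used") for Class, Sex and Age
lhs.grid <- expand.grid(Class = c(NA, levels(titanic.raw$Class)),
                        Sex   = c(NA, levels(titanic.raw$Sex)),
                        Age   = c(NA, levels(titanic.raw$Age)),
                        stringsAsFactors = FALSE)
# minlen = 2 with a one-item rhs means the lhs needs at least one item
lhs.grid <- lhs.grid[rowSums(!is.na(lhs.grid)) >= 1, ]

found <- list()
for (r in seq_len(nrow(lhs.grid))) {
  vals <- unlist(lhs.grid[r, ])
  vals <- vals[!is.na(vals)]  # e.g. c(Class = "2nd", Age = "Child")
  in.a <- rep(TRUE, nrow(titanic.raw))
  for (v in names(vals)) in.a <- in.a & titanic.raw[[v]] == vals[[v]]
  for (y in levels(titanic.raw$Survived)) {  # rhs is Survived=No or Survived=Yes
    in.b <- titanic.raw$Survived == y
    support <- mean(in.a & in.b)
    if (support < 0.005) next  # supp = 0.005
    confidence <- support / mean(in.a)
    if (confidence >= 0.8) {   # conf = 0.8
      found[[length(found) + 1]] <- data.frame(
        lhs = paste0("{", paste(names(vals), vals, sep = "=", collapse = ","), "}"),
        rhs = paste0("{Survived=", y, "}"),
        support = support, confidence = confidence,
        lift = confidence / mean(in.b), stringsAsFactors = FALSE)
    }
  }
}
rules.bf <- do.call(rbind, found)
rules.bf <- rules.bf[order(rules.bf$lift, decreasing = TRUE), ]
print(rules.bf, digits = 3, row.names = FALSE)
```

This should yield 12 rules: the eight that survive redundancy pruning later on plus the four redundant ones. Note that {Class=2nd, Sex=Male, Age=Child} is absent: those 11 passengers all survived, but 11/2201 is just below supp = 0.005.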
inspect(rules.sorted) 
## lhs rhs support confidence lift 
## 1 {Class=2nd, 
## Age=Child} => {Survived=Yes} 0.011 1.000 3.096 
## 2 {Class=2nd, 
## Sex=Female, 
## Age=Child} => {Survived=Yes} 0.006 1.000 3.096 
## 3 {Class=1st, 
## Sex=Female} => {Survived=Yes} 0.064 0.972 3.010 
## 4 {Class=1st, 
## Sex=Female, 
## Age=Adult} => {Survived=Yes} 0.064 0.972 3.010 
## 5 {Class=2nd, 
## Sex=Female} => {Survived=Yes} 0.042 0.877 2.716 
## 6 {Class=Crew, 
## Sex=Female} => {Survived=Yes} 0.009 0.870 2.692 
## 7 {Class=Crew, 
## Sex=Female, 
## Age=Adult} => {Survived=Yes} 0.009 0.870 2.692 
## 8 {Class=2nd, 
## Sex=Female, 
## Age=Adult} => {Survived=Yes} 0.036 0.860 2.663 
## 9 {Class=2nd, 
Removing Redundancy
Redundant Rules 
inspect(rules.sorted[1:2]) 
## lhs rhs support confidence lift 
## 1 {Class=2nd, 
## Age=Child} => {Survived=Yes} 0.011 1 3.096 
## 2 {Class=2nd, 
## Sex=Female, 
## Age=Child} => {Survived=Yes} 0.006 1 3.096 
- Rule #2 provides no extra knowledge in addition to rule #1, since rule #1 already tells us that all 2nd-class children survived.
- When a rule (such as #2) is a super rule of another rule (#1) and the former has the same or a lower lift, the former rule (#2) is considered to be redundant.
- Other redundant rules in the above result are rules #4, #7 and #8, compared respectively with #3, #6 and #5.
Remove Redundant Rules 
## find redundant rules
## (as.matrix() guards against a sparse matrix, which newer arules versions may return)
subset.matrix <- as.matrix(is.subset(rules.sorted, rules.sorted))
subset.matrix[lower.tri(subset.matrix, diag = TRUE)] <- NA
redundant <- colSums(subset.matrix, na.rm = TRUE) >= 1
## which rules are redundant
which(redundant)
## [1] 2 4 7 8
## remove redundant rules
rules.pruned <- rules.sorted[!redundant]
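The subset test above can be reproduced without arules. In this base-R sketch, each of the first eight rules of rules.sorted (copied from the inspect() output, already in decreasing lift order) is written as its full itemset lhs ∪ rhs; entry [i, j] of the matrix is TRUE when rule i's itemset is contained in rule j's, and only rules ranked earlier (same or higher lift) are compared, mirroring the lower.tri() step.

```r
# Itemsets (lhs plus rhs) of rules 1-8 of rules.sorted, in decreasing lift order
rules <- list(
  c("Class=2nd", "Age=Child", "Survived=Yes"),
  c("Class=2nd", "Sex=Female", "Age=Child", "Survived=Yes"),
  c("Class=1st", "Sex=Female", "Survived=Yes"),
  c("Class=1st", "Sex=Female", "Age=Adult", "Survived=Yes"),
  c("Class=2nd", "Sex=Female", "Survived=Yes"),
  c("Class=Crew", "Sex=Female", "Survived=Yes"),
  c("Class=Crew", "Sex=Female", "Age=Adult", "Survived=Yes"),
  c("Class=2nd", "Sex=Female", "Age=Adult", "Survived=Yes")
)
n <- length(rules)
# subset.matrix[i, j] is TRUE when rule i's itemset is contained in rule j's
subset.matrix <- outer(seq_len(n), seq_len(n),
                       Vectorize(function(i, j) all(rules[[i]] %in% rules[[j]])))
# Keep only comparisons with rules ranked earlier (same or higher lift)
subset.matrix[lower.tri(subset.matrix, diag = TRUE)] <- NA
redundant <- colSums(subset.matrix, na.rm = TRUE) >= 1
which(redundant)
```

It flags rules 2, 4, 7 and 8, the same rules identified on the previous slide.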
Remaining Rules 
inspect(rules.pruned) 
## lhs rhs support confidence lift 
## 1 {Class=2nd, 
## Age=Child} => {Survived=Yes} 0.011 1.000 3.096 
## 2 {Class=1st, 
## Sex=Female} => {Survived=Yes} 0.064 0.972 3.010 
## 3 {Class=2nd, 
## Sex=Female} => {Survived=Yes} 0.042 0.877 2.716 
## 4 {Class=Crew, 
## Sex=Female} => {Survived=Yes} 0.009 0.870 2.692 
## 5 {Class=2nd, 
## Sex=Male, 
## Age=Adult} => {Survived=No} 0.070 0.917 1.354 
## 6 {Class=2nd, 
## Sex=Male} => {Survived=No} 0.070 0.860 1.271 
## 7 {Class=3rd, 
## Sex=Male, 
## Age=Adult} => {Survived=No} 0.176 0.838 1.237 
## 8 {Class=3rd, 
## Sex=Male} => {Survived=No} 0.192 0.827 1.222 
Interpreting Rules
inspect(rules.pruned[1]) 
## lhs rhs support confidence lift 
## 1 {Class=2nd, 
## Age=Child} => {Survived=Yes} 0.011 1 3.096 
Did children of the 2nd class have a higher survival rate than other 
children? 
The rule states only that all children in the 2nd class survived, but it provides no information at all for comparing the survival rates of different classes.
Rules about Children 
rules <- apriori(titanic.raw, control = list(verbose=F),
parameter = list(minlen=3, supp=0.002, conf=0.2),
appearance = list(default="none", rhs=c("Survived=Yes"),
lhs=c("Class=1st", "Class=2nd", "Class=3rd",
"Age=Child", "Age=Adult")))
rules.sorted <- sort(rules, by="confidence")
inspect(rules.sorted)
## lhs rhs support confidence lift
## 1 {Class=2nd,
## Age=Child} => {Survived=Yes} 0.010904 1.0000 3.0956
## 2 {Class=1st,
## Age=Child} => {Survived=Yes} 0.002726 1.0000 3.0956
## 3 {Class=1st,
## Age=Adult} => {Survived=Yes} 0.089505 0.6176 1.9117
## 4 {Class=2nd,
## Age=Adult} => {Survived=Yes} 0.042708 0.3602 1.1149
## 5 {Class=3rd,
## Age=Child} => {Survived=Yes} 0.012267 0.3418 1.0580
## 6 {Class=3rd,
## Age=Adult} => {Survived=Yes} 0.068605 0.2408 0.7455
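The confidences above are simply survival rates within each Class/Age group, so a plain cross-tabulation in base R shows them side by side, together with the number of people behind each rate. This sketch assumes titanic.raw is rebuilt from the built-in `Titanic` table; the small group sizes (for example, very few first-class children) are worth checking before reading much into a confidence of 1.

```r
tab <- as.data.frame(Titanic)
titanic.raw <- tab[rep(seq_len(nrow(tab)), tab$Freq), c("Class", "Sex", "Age", "Survived")]

# Share of survivors in each Class x Age cell, i.e. the confidence of
# {Class, Age} => {Survived=Yes}; empty cells (e.g. crew children) give NA
surv.rate <- with(titanic.raw, tapply(Survived == "Yes", list(Class, Age), mean))
round(surv.rate, 4)

# Number of people in each cell, to judge how much each rate can be trusted
with(titanic.raw, table(Class, Age))
```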
Visualizing Association Rules
library(arulesViz) 
plot(rules.all) 
[Figure: Scatter plot for 27 rules, with support on the x-axis, confidence on the y-axis and lift shown by color.]
plot(rules.all, method = "grouped")
[Figure: Grouped matrix for 27 rules. LHS groups are plotted against the RHS items {Age=Adult}, {Survived=No} and {Sex=Male}; circle size shows support and color shows lift.]

Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 

Dernier (20)

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 

Association Rule Mining with R

  • 1. Association Rule Mining with R
    Yanchang Zhao
    http://www.RDataMining.com
    30 September 2014
  • 2. Outline
    - Introduction
    - Association Rule Mining
    - Removing Redundancy
    - Interpreting Rules
    - Visualizing Association Rules
    - Further Readings and Online Resources
  • 3. Association Rule Mining with R [1]
    - basic concepts of association rules
    - association rule mining with R
    - pruning redundant rules
    - interpreting and visualizing association rules
    - recommended readings
    [1] Chapter 9: Association Rules, R and Data Mining: Examples and Case Studies. http://www.rdatamining.com/docs/RDataMining.pdf
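  • 4. Association Rules
    Association rules are rules presenting association or correlation between itemsets.
    $$\mathrm{support}(A \Rightarrow B) = P(A \cup B)$$
    $$\mathrm{confidence}(A \Rightarrow B) = P(B \mid A) = \frac{P(A \cup B)}{P(A)}$$
    $$\mathrm{lift}(A \Rightarrow B) = \frac{\mathrm{confidence}(A \Rightarrow B)}{P(B)} = \frac{P(A \cup B)}{P(A)\,P(B)}$$
    where P(A) is the percentage (or probability) of cases containing A, and P(A ∪ B) is accordingly the percentage of cases containing every item of both A and B.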
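A minimal base-R sketch of the three formulas, using the Titanic table from the datasets package (the data used later in this deck) instead of arules. The rule {Class=Crew} => {Sex=Male} is picked only for illustration; the values should agree with rule 7 of the inspect(rules.all) output further on (support 0.3916, confidence 0.9740, lift 1.2385).

```r
# Support, confidence and lift of {Class=Crew} => {Sex=Male}, computed by hand
# from the Titanic table in base R's datasets package (no arules needed).
tab <- as.data.frame(Titanic)   # one row per table cell, with a Freq column
n   <- sum(tab$Freq)            # number of cases (persons)

crew <- tab$Class == "Crew"     # cells matching itemset A = {Class=Crew}
male <- tab$Sex == "Male"       # cells matching itemset B = {Sex=Male}

p_A  <- sum(tab$Freq[crew]) / n          # P(A)
p_B  <- sum(tab$Freq[male]) / n          # P(B)
p_AB <- sum(tab$Freq[crew & male]) / n   # P(A u B): cases containing both A and B

supp <- p_AB
conf <- p_AB / p_A
lift <- conf / p_B
round(c(support = supp, confidence = conf, lift = lift), 4)
```

Lift above 1 here says crew members were male more often than passengers and crew overall.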
  • 5. Association Rule Mining Algorithms in R
    - APRIORI
      - a level-wise, breadth-first algorithm which counts transactions to find frequent itemsets and then derives association rules from them
      - apriori() in package arules
    - ECLAT
      - finds frequent itemsets with equivalence classes, depth-first search and set intersection instead of counting
      - eclat() in the same package
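The difference between the two strategies can be sketched in a few lines of base R. The functions apriori_sketch(), tidsets() and tid_support() are hypothetical teaching helpers, not part of arules, and the join step is simplified: it unions any two frequent k-itemsets that differ in one item and removes duplicates, rather than joining only itemsets that share a (k-1)-prefix. Apriori obtains each candidate's support by scanning the transactions, while the ECLAT-style helper obtains it by intersecting per-item lists of transaction ids.

```r
# Hypothetical teaching sketch (not the arules implementation).
# Transactions are character vectors of item labels.

# Apriori idea: level-wise, breadth-first; supports come from counting transactions.
apriori_sketch <- function(trans, minsupp) {
  supp <- function(s) mean(vapply(trans, function(t) all(s %in% t), logical(1)))
  items <- sort(unique(unlist(trans)))
  level <- Filter(function(s) supp(s) >= minsupp, as.list(items))  # frequent 1-itemsets
  result <- level
  while (length(level) > 1) {
    k <- length(level[[1]])
    keys <- vapply(level, paste, character(1), collapse = ",")
    # Join: unions of two frequent k-itemsets that contain exactly k + 1 items
    cand <- list()
    for (i in seq_along(level)) for (j in seq_along(level)) {
      u <- sort(union(level[[i]], level[[j]]))
      if (i < j && length(u) == k + 1) cand[[length(cand) + 1]] <- u
    }
    cand <- unique(cand)
    # Prune: every k-subset of a candidate must itself be frequent
    cand <- Filter(function(s) all(vapply(seq_along(s), function(d)
      paste(s[-d], collapse = ",") %in% keys, logical(1))), cand)
    # Count: keep candidates whose support reaches the threshold
    level <- Filter(function(s) supp(s) >= minsupp, cand)
    result <- c(result, level)
  }
  result
}

# ECLAT idea: for each item, store the ids of the transactions containing it;
# the support of an itemset is the size of the intersection of these id sets.
tidsets <- function(trans) {
  items <- sort(unique(unlist(trans)))
  setNames(lapply(items, function(i)
    which(vapply(trans, function(t) i %in% t, logical(1)))), items)
}
tid_support <- function(tids, itemset, n) length(Reduce(intersect, tids[itemset])) / n

# Usage on four toy transactions
trans <- list(c("bread", "milk"), c("bread", "butter"),
              c("bread", "milk", "butter"), "milk")
freq <- apriori_sketch(trans, minsupp = 0.5)
vapply(freq, paste, character(1), collapse = ",")
tid_support(tidsets(trans), c("bread", "milk"), n = length(trans))  # 0.5
```

On these toy transactions, {butter, milk} has support 0.25 and is dropped at level 2, which in turn prunes the only 3-item candidate before it is ever counted.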
  • 6. Outline (section: Association Rule Mining)
  • 7. The Titanic Dataset
    - The Titanic dataset in the datasets package is a 4-dimensional table with summarized information on the fate of passengers on the Titanic according to social class, sex, age and survival.
    - To make it suitable for association rule mining, we reconstruct the raw data as titanic.raw, where each row represents a person.
    - The reconstructed raw data can also be downloaded at http://www.rdatamining.com/data/titanic.raw.rdata.
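As an alternative to downloading titanic.raw.rdata, a person-level data frame can be rebuilt from the Titanic table itself. This is one possible base-R reconstruction, not necessarily the one the author used. Re-creating the factors from character strings gives alphabetical levels (e.g. Female before Male), which is the order shown by summary(titanic.raw) below; the row order may differ from the downloadable file, so sampled row numbers will not match.

```r
# Expand the 4-D Titanic table (datasets package) into one row per person.
tab <- as.data.frame(Titanic)   # 32 cells: Class, Sex, Age, Survived, Freq
titanic.raw <- tab[rep(seq_len(nrow(tab)), times = tab$Freq),
                   c("Class", "Sex", "Age", "Survived")]
rownames(titanic.raw) <- NULL

# Rebuild each factor from its labels so that levels are sorted alphabetically
titanic.raw[] <- lapply(titanic.raw, function(x) factor(as.character(x)))
summary(titanic.raw)
```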
  • 8.
      load("./data/titanic.raw.rdata")
      ## draw a sample of 5 records
      idx <- sample(1:nrow(titanic.raw), 5)
      titanic.raw[idx, ]
      ##      Class    Sex   Age Survived
      ## 950   Crew   Male Adult       No
      ## 2176   3rd Female Adult      Yes
      ## 1716  Crew   Male Adult      Yes
      ## 1001  Crew   Male Adult       No
      ## 48     3rd Female Child       No
      summary(titanic.raw)
      ##   Class         Sex          Age        Survived
      ##  1st :325   Female: 470   Adult:2092   No :1490
      ##  2nd :285   Male  :1731   Child: 109   Yes: 711
      ##  3rd :706
      ##  Crew:885
  • 9. Function apriori()
    Mine frequent itemsets, association rules or association hyperedges using the Apriori algorithm. The Apriori algorithm employs level-wise search for frequent itemsets.
    Default settings:
    - minimum support: supp=0.1
    - minimum confidence: conf=0.8
    - maximum length of rules: maxlen=10
  • 10.
      library(arules)
      rules.all <- apriori(titanic.raw)
      ##
      ## parameter specification:
      ##  confidence minval smax arem  aval originalSupport support
      ##         0.8    0.1    1 none FALSE            TRUE     0.1
      ##  minlen maxlen target   ext
      ##       1     10  rules FALSE
      ##
      ## algorithmic control:
      ##  filter tree heap memopt load sort verbose
      ##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
      ##
      ## apriori - find association rules with the apriori algorithm
      ## version 4.21 (2004.05.09)        (c) 1996-2004   Christian ...
      ## set item appearances ...[0 item(s)] done [0.00s].
      ## set transactions ...[10 item(s), 2201 transaction(s)] done ...
      ## sorting and recoding items ... [9 item(s)] done [0.00s].
      ## creating transaction tree ... done [0.00s].
      ## checking subsets of size 1 2 3 4 done [0.00s].
      ## writing ... [27 rule(s)] done [0.00s].
      ## creating S4 object  ... done [0.00s].
  • 11.
      inspect(rules.all)
      ##    lhs               rhs            support confidence   lift
      ## 1  {}             => {Age=Adult}     0.9505     0.9505 1.0000
      ## 2  {Class=2nd}    => {Age=Adult}     0.1186     0.9158 0.9635
      ## 3  {Class=1st}    => {Age=Adult}     0.1449     0.9815 1.0327
      ## 4  {Sex=Female}   => {Age=Adult}     0.1931     0.9043 0.9514
      ## 5  {Class=3rd}    => {Age=Adult}     0.2849     0.8881 0.9344
      ## 6  {Survived=Yes} => {Age=Adult}     0.2971     0.9198 0.9678
      ## 7  {Class=Crew}   => {Sex=Male}      0.3916     0.9740 1.2385
      ## 8  {Class=Crew}   => {Age=Adult}     0.4021     1.0000 1.0521
      ## 9  {Survived=No}  => {Sex=Male}      0.6197     0.9154 1.1640
      ## 10 {Survived=No}  => {Age=Adult}     0.6533     0.9651 1.0154
      ## 11 {Sex=Male}     => {Age=Adult}     0.7574     0.9630 1.0132
      ## 12 {Sex=Female,
      ##     Survived=Yes} => {Age=Adult}     0.1436     0.9186 0.9665
      ## 13 {Class=3rd,
      ##     Sex=Male}     => {Survived=No}   0.1917     0.8275 1.2223
      ## 14 {Class=3rd,
      ##     Survived=No}  => {Age=Adult}     0.2163     0.9015 0.9485
      ## 15 {Class=3rd,
      ##     Sex=Male}     => {Age=Adult}     0.2099     0.9059 0.9531
      ## 16 {Sex=Male,
      ##     Survived=Yes} => {Age=Adult}     0.1536     0.9210 0.9690
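Rule 1 in this output has an empty left-hand side, which the default minlen=1 allows. Every case contains the empty itemset, so P(∅) = 1, and the formulas for support, confidence and lift reduce as follows:

$$\mathrm{support}(\emptyset \Rightarrow B) = P(\emptyset \cup B) = P(B)$$
$$\mathrm{confidence}(\emptyset \Rightarrow B) = \frac{P(\emptyset \cup B)}{P(\emptyset)} = \frac{P(B)}{1} = P(B)$$
$$\mathrm{lift}(\emptyset \Rightarrow B) = \frac{\mathrm{confidence}(\emptyset \Rightarrow B)}{P(B)} = 1$$

With B = {Age=Adult} and 2092 adults among 2201 persons (see summary(titanic.raw)), support and confidence are both 2092/2201 ≈ 0.9505 and lift is exactly 1, as in rule 1. Setting minlen=2, as the next query does, excludes rules with an empty left-hand side.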
  • 12.
      # rules with rhs containing "Survived" only
      rules <- apriori(titanic.raw, control = list(verbose = FALSE),
                       parameter = list(minlen = 2, supp = 0.005, conf = 0.8),
                       appearance = list(rhs = c("Survived=No", "Survived=Yes"),
                                         default = "lhs"))
      ## keep three decimal places
      quality(rules) <- round(quality(rules), digits = 3)
      ## order rules by lift
      rules.sorted <- sort(rules, by = "lift")
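To see what the appearance and parameter settings above restrict, the same search can be brute-forced in base R on the Titanic table. This is a hypothetical sketch, not how apriori() works internally: it enumerates every left-hand side built from Class, Sex and Age (at most one value per attribute), pairs it with Survived=No or Survived=Yes, and then applies supp = 0.005 and conf = 0.8. Working the counts by hand, it should keep 12 rules, 8 of them with Survived=Yes, and its top rows should agree with the inspect(rules.sorted) output that follows.

```r
# Hypothetical brute-force sketch: enumerate every rule {lhs} => {Survived=...}
# whose LHS uses Class, Sex and/or Age, then apply the same thresholds as the
# apriori() call above (supp = 0.005, conf = 0.8, minlen = 2).
tab   <- as.data.frame(Titanic)
n     <- sum(tab$Freq)
attrs <- c("Class", "Sex", "Age")                 # attributes allowed on the LHS
count <- function(cond) sum(tab$Freq[cond])       # persons in the matching cells

rows <- list()
for (k in seq_along(attrs)) {
  for (vars in combn(attrs, k, simplify = FALSE)) {
    # every combination of values for the chosen LHS attributes
    grid <- expand.grid(lapply(tab[vars], levels), stringsAsFactors = FALSE)
    for (g in seq_len(nrow(grid))) {
      vals   <- unlist(grid[g, , drop = FALSE])
      in_lhs <- Reduce(`&`, Map(function(v, val) tab[[v]] == val, vars, vals))
      n_lhs  <- count(in_lhs)
      if (n_lhs == 0) next                        # e.g. no children in the crew
      for (s in c("No", "Yes")) {
        n_both <- count(in_lhs & tab$Survived == s)
        conf   <- n_both / n_lhs
        rows[[length(rows) + 1]] <- data.frame(
          lhs = paste0(vars, "=", vals, collapse = ","),
          rhs = paste0("Survived=", s),
          support = n_both / n, confidence = conf,
          lift = conf / (count(tab$Survived == s) / n))
      }
    }
  }
}
rule_tab <- do.call(rbind, rows)
rule_tab <- rule_tab[rule_tab$support >= 0.005 & rule_tab$confidence >= 0.8, ]
rule_tab <- rule_tab[order(-rule_tab$lift), ]     # order by lift, like sort(by = "lift")
rownames(rule_tab) <- NULL
head(rule_tab)
```

Brute force is only feasible because the Titanic data has three LHS attributes; Apriori avoids this enumeration by never extending an itemset that is already infrequent.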
  • 13.
      inspect(rules.sorted)
      ##   lhs             rhs            support confidence  lift
      ## 1 {Class=2nd,
      ##    Age=Child}  => {Survived=Yes}   0.011      1.000 3.096
      ## 2 {Class=2nd,
      ##    Sex=Female,
      ##    Age=Child}  => {Survived=Yes}   0.006      1.000 3.096
      ## 3 {Class=1st,
      ##    Sex=Female} => {Survived=Yes}   0.064      0.972 3.010
      ## 4 {Class=1st,
      ##    Sex=Female,
      ##    Age=Adult}  => {Survived=Yes}   0.064      0.972 3.010
      ## 5 {Class=2nd,
      ##    Sex=Female} => {Survived=Yes}   0.042      0.877 2.716
      ## 6 {Class=Crew,
      ##    Sex=Female} => {Survived=Yes}   0.009      0.870 2.692
      ## 7 {Class=Crew,
      ##    Sex=Female,
      ##    Age=Adult}  => {Survived=Yes}   0.009      0.870 2.692
      ## 8 {Class=2nd,
      ##    Sex=Female,
      ##    Age=Adult}  => {Survived=Yes}   0.036      0.860 2.663
      ## 9 {Class=2nd,
  • 14. Outline (section: Removing Redundancy)
  • 15. Redundant Rules
      inspect(rules.sorted[1:2])
      ##   lhs            rhs            support confidence  lift
      ## 1 {Class=2nd,
      ##    Age=Child} => {Survived=Yes}   0.011          1 3.096
      ## 2 {Class=2nd,
      ##    Sex=Female,
      ##    Age=Child} => {Survived=Yes}   0.006          1 3.096
    - Rule #2 provides no extra knowledge beyond rule #1, since rule #1 already tells us that all 2nd-class children survived.
    - When a rule (such as #2) is a super rule of another rule (#1), i.e. its LHS contains every LHS item of the other rule and both have the same RHS, and the former has the same or a lower lift, the former rule (#2) is considered to be redundant.
    - Other redundant rules in the above result are rules #4, #7 and #8, compared respectively with #3, #6 and #5.
  • 16. Remove Redundant Rules
      ## find redundant rules
      ## (as.matrix() yields an ordinary logical matrix; more recent arules
      ## versions return a sparse matrix from is.subset())
      subset.matrix <- as.matrix(is.subset(rules.sorted, rules.sorted))
      subset.matrix[lower.tri(subset.matrix, diag = TRUE)] <- NA
      redundant <- colSums(subset.matrix, na.rm = TRUE) >= 1
      ## which rules are redundant
      which(redundant)
      ## [1] 2 4 7 8
      ## remove redundant rules
      rules.pruned <- rules.sorted[!redundant]
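The subset-matrix logic can be checked without arules. The sketch below is a hypothetical reimplementation on plain character vectors, not arules objects: it writes rules 1 to 8 of the sorted output as full itemsets (LHS plus RHS) in lift order and applies the same upper-triangle and colSums() steps. For example, rule 8 contains every item of rule 5 and ranks below it, so column 8 receives a count.

```r
# Rules 1-8 of the sorted output, each as its full itemset (LHS items plus RHS),
# listed in the same order as rules.sorted, i.e. by decreasing lift.
itemsets <- list(
  c("Class=2nd", "Age=Child", "Survived=Yes"),                 # lift 3.096
  c("Class=2nd", "Sex=Female", "Age=Child", "Survived=Yes"),   # lift 3.096
  c("Class=1st", "Sex=Female", "Survived=Yes"),                # lift 3.010
  c("Class=1st", "Sex=Female", "Age=Adult", "Survived=Yes"),   # lift 3.010
  c("Class=2nd", "Sex=Female", "Survived=Yes"),                # lift 2.716
  c("Class=Crew", "Sex=Female", "Survived=Yes"),               # lift 2.692
  c("Class=Crew", "Sex=Female", "Age=Adult", "Survived=Yes"),  # lift 2.692
  c("Class=2nd", "Sex=Female", "Age=Adult", "Survived=Yes"))   # lift 2.663
n <- length(itemsets)

# subset.matrix[i, j] is TRUE when the itemset of rule i is contained in rule j's
subset.matrix <- outer(seq_len(n), seq_len(n),
  Vectorize(function(i, j) all(itemsets[[i]] %in% itemsets[[j]])))
# keep only pairs where rule i ranks above rule j (lift at least as high)
subset.matrix[lower.tri(subset.matrix, diag = TRUE)] <- NA
# rule j is redundant if some higher-ranked rule is a subset of it
redundant <- colSums(subset.matrix, na.rm = TRUE) >= 1
which(redundant)   # 2 4 7 8
```

Because the rules are sorted by lift beforehand, "ranks above" stands in for "has the same or a higher lift"; rules with equal lift are resolved by their position in the sorted list.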
  • 17. Remaining Rules
      inspect(rules.pruned)
      ##   lhs             rhs            support confidence  lift
      ## 1 {Class=2nd,
      ##    Age=Child}  => {Survived=Yes}   0.011      1.000 3.096
      ## 2 {Class=1st,
      ##    Sex=Female} => {Survived=Yes}   0.064      0.972 3.010
      ## 3 {Class=2nd,
      ##    Sex=Female} => {Survived=Yes}   0.042      0.877 2.716
      ## 4 {Class=Crew,
      ##    Sex=Female} => {Survived=Yes}   0.009      0.870 2.692
      ## 5 {Class=2nd,
      ##    Sex=Male,
      ##    Age=Adult}  => {Survived=No}    0.070      0.917 1.354
      ## 6 {Class=2nd,
      ##    Sex=Male}   => {Survived=No}    0.070      0.860 1.271
      ## 7 {Class=3rd,
      ##    Sex=Male,
      ##    Age=Adult}  => {Survived=No}    0.176      0.838 1.237
      ## 8 {Class=3rd,
      ##    Sex=Male}   => {Survived=No}    0.192      0.827 1.222
  • 18. Outline (section: Interpreting Rules)
  • 19.
      inspect(rules.pruned[1])
      ##   lhs            rhs            support confidence  lift
      ## 1 {Class=2nd,
      ##    Age=Child} => {Survived=Yes}   0.011          1 3.096
    Did children of the 2nd class have a higher survival rate than other children?
    The rule states only that all children of the 2nd class survived; it provides no information at all for comparing the survival rates of different classes.
  • 20. Rules about Children
      rules <- apriori(titanic.raw, control = list(verbose = FALSE),
                       parameter = list(minlen = 3, supp = 0.002, conf = 0.2),
                       appearance = list(default = "none", rhs = c("Survived=Yes"),
                                         lhs = c("Class=1st", "Class=2nd", "Class=3rd",
                                                 "Age=Child", "Age=Adult")))
      rules.sorted <- sort(rules, by = "confidence")
      inspect(rules.sorted)
      ##   lhs            rhs             support confidence   lift
      ## 1 {Class=2nd,
      ##    Age=Child} => {Survived=Yes} 0.010904     1.0000 3.0956
      ## 2 {Class=1st,
      ##    Age=Child} => {Survived=Yes} 0.002726     1.0000 3.0956
      ## 3 {Class=1st,
      ##    Age=Adult} => {Survived=Yes} 0.089505     0.6176 1.9117
      ## 4 {Class=2nd,
      ##    Age=Adult} => {Survived=Yes} 0.042708     0.3602 1.1149
      ## 5 {Class=3rd,
      ##    Age=Child} => {Survived=Yes} 0.012267     0.3418 1.0580
      ## 6 {Class=3rd,
      ##    Age=Adult} => {Survived=Yes} 0.068605     0.2408 0.7455
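The confidences above are simply the survival rates of each Class and Age group. A base-R cross-check on the Titanic table from the datasets package (an illustration outside the arules workflow) computes all of them at once, including groups the rule query leaves out, such as adult crew members (about 0.24). The crew included no children, so that cell comes out as NaN.

```r
# Survival rate of every Class x Age group, straight from the Titanic table.
surv <- apply(Titanic, c(1, 3, 4), sum)    # sum over Sex: Class x Age x Survived
rate <- surv[, , "Yes"] / (surv[, , "Yes"] + surv[, , "No"])
round(rate, 4)                             # NaN where a group has no members
```

Reading down the Child column answers the question of the previous slide: 1st- and 2nd-class children all survived, while about a third of 3rd-class children did.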
  • 21. Outline (section: Visualizing Association Rules)
  • 22.
      library(arulesViz)
      plot(rules.all)
    [Figure: scatter plot for 27 rules; x-axis: support, y-axis: confidence, colour: lift]
  • 23.
      plot(rules.all, method = "grouped")
    [Figure: grouped matrix for 27 rules; columns: LHS groups such as (Class=Crew +2) and (Sex=Female +1); rows: RHS {Age=Adult}, {Survived=No}, {Sex=Male}; size: support, colour: lift]
  • 24.
      plot(rules.all, method = "graph")
    [Figure: graph for 27 rules, with itemsets such as {Class=3rd,Sex=Male,Age=Adult}, {Class=Crew} and {Survived=No} as vertices; width: support (0.119 to 0.95), colour: lift (0.934 to 1.266)]
  • 25.
      plot(rules.all, method = "graph", control = list(type = "items"))
    [Figure: graph for 27 rules with items as vertices: Class=1st, Class=2nd, Class=3rd, Class=Crew, Sex=Female, Sex=Male, Age=Adult, Survived=Yes, Survived=No; size: support (0.119 to 0.95), colour: lift (0.934 to 1.266)]
  • 26.
      plot(rules.all, method = "paracoord", control = list(reorder = TRUE))
    [Figure: parallel coordinates plot for 27 rules; x-axis: position (3, 2, 1, rhs); items: Class=1st, Class=2nd, Class=3rd, Class=Crew, Sex=Male, Sex=Female, Age=Adult, Survived=No, Survived=Yes]
  • 27. Outline (section: Further Readings and Online Resources)
  • 28. Further Readings
    - More than 20 interestingness measures, such as chi-square, conviction, gini and leverage
      Tan, P.-N., Kumar, V., and Srivastava, J. (2002). Selecting the right interestingness measure for association patterns. In Proc. of KDD '02, pages 32-41, New York, NY, USA. ACM Press.
    - Post mining of association rules, such as selecting interesting association rules, visualization of association rules and using association rules for classification
      Yanchang Zhao, Chengqi Zhang and Longbing Cao (Eds.). Post-Mining of Association Rules: Techniques for Effective Knowledge Extraction, ISBN 978-1-60566-404-0, May 2009. Information Science Reference.
    - Package arulesSequences: mining sequential patterns
      http://cran.r-project.org/web/packages/arulesSequences/
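Two of the measures named above can be computed by hand. This base-R sketch uses the standard definitions leverage(A => B) = P(A ∪ B) - P(A)P(B) and conviction(A => B) = (1 - P(B)) / (1 - confidence(A => B)), and reuses the rule {Class=Crew} => {Sex=Male} purely as an illustration. In practice, arules provides interestMeasure() for such measures; the manual version only makes the formulas concrete.

```r
# Leverage and conviction of {Class=Crew} => {Sex=Male} from the Titanic table.
tab  <- as.data.frame(Titanic)
n    <- sum(tab$Freq)
crew <- tab$Class == "Crew"
male <- tab$Sex == "Male"
p_A  <- sum(tab$Freq[crew]) / n
p_B  <- sum(tab$Freq[male]) / n
p_AB <- sum(tab$Freq[crew & male]) / n
conf <- p_AB / p_A

leverage   <- p_AB - p_A * p_B          # 0 if A and B were independent
conviction <- (1 - p_B) / (1 - conf)    # 1 if independent; Inf in R when conf = 1
round(c(leverage = leverage, conviction = conviction), 4)
```

This gives a leverage of about 0.075 and a conviction of about 8.2, both pointing the same way as the lift of 1.2385 for this rule.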
  • 29. Online Resources
    - Chapter 9: Association Rules, in the book R and Data Mining: Examples and Case Studies
      http://www.rdatamining.com/docs/RDataMining.pdf
    - R Reference Card for Data Mining
      http://www.rdatamining.com/docs/R-refcard-data-mining.pdf
    - Free online courses and documents
      http://www.rdatamining.com/resources/
    - RDataMining Group on LinkedIn (7,000+ members)
      http://group.rdatamining.com
    - RDataMining on Twitter (1,700+ followers)
      @RDataMining
  • 30. The End
    Thanks!
    Email: yanchang(at)rdatamining.com