2. Data Mining Definition
There are several definitions for Data Mining:
• Mining is a term characterizing the process of finding a small amount of valuable knowledge in a great deal of raw material.
• Knowledge mining from data
• Knowledge extraction
• Data/pattern analysis
Data mining is an essential step in the process of Knowledge Discovery from Data (KDD).
mmouf@2017
3. Knowledge Discovery from Data (KDD)
The knowledge discovery (KDD) process is an iterative sequence of the
following steps:
1. Data cleaning
2. Data integration
3. Data selection
4. Data transformation
5. Data mining
6. Pattern evaluation
7. Knowledge presentation
Steps 1 through 4 are different forms of data preprocessing, where data
are prepared for mining.
4. What Kinds of Data Can Be Mined?
• Database data:
Searching for trends or data patterns; detecting deviations.
• Data warehouses:
Although data warehouse tools help support data analysis, additional data mining tools are often needed for in-depth analysis.
• Transactional data:
Market basket analysis.
5. What Kinds of Patterns Can Be Mined?
• Data mining functionalities include:
• Characterization and discrimination
• Mining of frequent patterns, associations, and correlations
• Classification and regression
• Cluster analysis and outlier analysis
Data mining functionality can be classified into two categories:
• Descriptive mining tasks characterize properties of the data in a target
data set.
• Predictive mining tasks perform induction on the current data in order
to make predictions.
6. What Kinds of Patterns Can Be Mined?
Data mining tasks fall under two branches:
• Descriptive: association rules, clustering, summarization
• Predictive: classification, regression, time series
7. Descriptive Data mining
• Descriptive mining is used to generate correlations, frequencies, and cross tabulations.
• It can be used to discover regularities in the data and to uncover
patterns.
• It is also used to find subgroups in the bulk of the data.
8. Association Rules:
What is an association rule?
Association rule mining is a method for discovering interesting relations
between variables in large databases. It is intended to identify strong
rules discovered in databases.
To select interesting rules, constraints on various measures of
significance are used.
The best-known constraints are minimum thresholds on support and
confidence.
10. Association Rules
Support:
The support of X with respect to T is defined as the proportion of
transactions in the database that contain the itemset X:
Supp(X) = (number of transactions containing X) / (total number of transactions)
Example: X = {Bread, Butter}
Transactions containing X: Transaction 2 and Transaction 4
Total number of transactions = 5
Supp(X) = 2/5 = 0.4
This means 40% of all transactions contain the itemset X.
11. Association Rules
Confidence:
The confidence of a rule X => Y, with respect to a set of
transactions T, is the proportion of the transactions containing X
that also contain Y:
Conf(X => Y) = Supp(X ∪ Y) / Supp(X)
Example: X ∪ Y = {Bread, Butter, Milk}
Supp(X ∪ Y) = 1/5 = 0.2
Conf(X => Y) = 0.2 / 0.4 = 0.5
This means 50% of the transactions containing Bread and Butter also contain
Milk.
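The two measures can be computed directly from their definitions. A minimal Python sketch, using a five-transaction table reconstructed to be consistent with the example's numbers (the original table is not shown on this slide):

```python
# Support and confidence computed from their definitions. The five
# transactions are a reconstruction consistent with the example's numbers
# (the original table is not shown on this slide).
transactions = [
    {"Milk", "Bread"},
    {"Bread", "Butter"},          # contains X = {Bread, Butter}
    {"Milk"},
    {"Bread", "Butter", "Milk"},  # contains X and Y
    {"Butter"},
]

def supp(itemset, T):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(itemset <= t for t in T) / len(T)

def conf(X, Y, T):
    """Confidence of X => Y: supp(X union Y) / supp(X)."""
    return supp(X | Y, T) / supp(X, T)

X, Y = {"Bread", "Butter"}, {"Milk"}
print(supp(X, transactions))      # 0.4
print(conf(X, Y, transactions))   # 0.5
```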
12. Association Rules:
Mining one level Association (Apriori)
Apriori is an algorithm for frequent item set mining and association rule
learning over transactional databases.
It proceeds by identifying the frequent individual items in the database
and extending them to larger and larger item sets as long as those item
sets appear sufficiently often in the database.
The frequent item sets determined by Apriori can be used to determine
association rules which highlight general trends in the database.
This has applications in domains such as market basket analysis.
13. Association Rules:
Mining one level Association (Apriori)
Example:
Assume the following transaction database,
with minimum support = 0.5 (a count of 2):
Transaction Items
T1 Milk, Bread, Cookies, Juice
T2 Milk, Juice
T3 Milk, Egg
T4 Bread, Cookies, Coffee
14. Association Rules:
Mining one level Association (Apriori)
Solution:
Step 1: Create the level-1 itemsets
Item Support
Milk 3
Bread 2
Cookies 2
Juice 2
Egg 1 (rejected: below the minimum support)
Coffee 1 (rejected: below the minimum support)
15. Association Rules:
Mining one level Association (Apriori)
Solution:
Step 2: Create the level-2 itemsets
Items Support
Milk, Bread 1 (rejected: below the minimum support)
Milk, Cookies 1 (rejected: below the minimum support)
Milk, Juice 2
Bread, Cookies 2
Bread, Juice 1 (rejected: below the minimum support)
Cookies, Juice 1 (rejected: below the minimum support)
16. Association Rules:
Mining one level Association (Apriori)
Solution:
Step 3: Create the level-3 itemsets
Items Support
Milk, Juice, Bread 1 (rejected: below the minimum support)
Milk, Juice, Cookies 1 (rejected: below the minimum support)
Milk, Bread, Cookies 1 (rejected: below the minimum support)
Juice, Bread, Cookies 1 (rejected: below the minimum support)
There is no frequent itemset at the 3rd level.
17. Association Rules:
Mining one level Association (Apriori)
Solution:
We stop combining itemsets in one of two cases:
• All itemsets at the last level are rejected because they are below the
minimum support
• We reach a level whose itemsets contain all the items
Last step: generate the association rules
Milk => Juice [support = 0.5, confidence = 0.67]
Juice => Milk [support = 0.5, confidence = 1]
Bread => Cookies [support = 0.5, confidence = 1]
Cookies => Bread [support = 0.5, confidence = 1]
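The level-wise search above can be sketched in a few lines of Python. This is a simplified Apriori (candidates are generated by brute-force combination rather than the classical join-and-prune step) run on the example's four transactions with a minimum support count of 2:

```python
from itertools import combinations

# Simplified level-wise Apriori on the example's four transactions.
transactions = [
    {"Milk", "Bread", "Cookies", "Juice"},
    {"Milk", "Juice"},
    {"Milk", "Egg"},
    {"Bread", "Cookies", "Coffee"},
]
MINSUP = 2  # minimum support count (0.5 * 4 transactions)

def support(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(set(itemset) <= t for t in transactions)

# Level 1: frequent single items
items = sorted({i for t in transactions for i in t})
frequent = [{frozenset([i]) for i in items if support([i]) >= MINSUP}]

# Extend to larger itemsets until no candidate survives
k = 2
while True:
    universe = sorted({i for s in frequent[-1] for i in s})
    level = {frozenset(c) for c in combinations(universe, k)
             if support(c) >= MINSUP}
    if not level:
        break
    frequent.append(level)
    k += 1

print([tuple(sorted(s)) for s in sorted(frequent[1], key=sorted)])
# [('Bread', 'Cookies'), ('Juice', 'Milk')]
```

The two surviving level-2 itemsets are exactly {Milk, Juice} and {Bread, Cookies}, from which the four rules above follow.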
18. Association Rules
Mining Multilevel Associations
Usually the data form a hierarchy, so a strong association
discovered at a high level may be commonsense knowledge rather than a real finding.
We may want to drill down to find novel patterns at more detailed
levels.
On the other hand, there could be too many scattered patterns at low or
primitive abstraction levels, some of which are just trivial
specializations of patterns at higher levels.
Association rules generated from mining data at multiple abstraction
levels are called multiple-level or multilevel association rules.
Multilevel association rules can be mined efficiently using concept
hierarchies under a support-confidence framework.
19. Association Rules
Mining Multilevel Associations
In general, a top-down strategy is used, where counts are accumulated
for the calculation of frequent itemsets at each concept level, starting at
concept level 1 and working downward in the hierarchy toward the
more specific concept levels, until no more frequent itemsets can be
found.
For each level, any algorithm for discovering frequent itemsets may be
used, such as Apriori.
Only the descendants of the frequent items at level 1 (L[1, 1]) are
considered as candidates for the level-2 frequent 1-itemsets.
20. Food Concept Hierarchy
[Figure: a concept hierarchy for food. Level 1: milk, bread, jam, juice. Level 2: varieties of each item, e.g. skim, 2%, 4% for milk; bran, white for bread; cherry, plum for jam; apple, grape, prune for juice. Level 3: brand names, e.g. Dairy Land, Foremost, Wonder.]
21. Association Rules
Mining Multilevel Associations
Example:
Assume the following encoded transaction table.
Each item is encoded by its position in the hierarchy.
Example: for item 111, the first digit (hundreds) represents the level-1 concept "milk",
the second digit (tens) represents the level-2 concept "2%", and
the third digit (units) represents the level-3 concept "Dairy Land".
Minimum support = 4 (for level 1): minsup[1] = 4
TID Items
T1 {111, 121, 211, 221}
T2 {111, 211, 222, 323}
T3 {112, 122, 221, 411}
T4 {111, 121}
T5 {111, 122, 211, 221, 413}
T6 {211, 323, 524}
T7 {323, 411, 524, 713}
22. Association Rules
Mining Multilevel Associations
Level 1: (minsupp[1] = 4)
• Level 1 frequent 1-itemsets: L[1, 1]
ItemSet Support
{1**} 5
{2**} 5
{3**} 3 (rejected: below the minimum support)
{4**} 3 (rejected: below the minimum support)
{5**} 2 (rejected: below the minimum support)
{7**} 1 (rejected: below the minimum support)
23. Association Rules
Mining Multilevel Associations
Level 1: (minsupp[1] = 4)
• Level 1 frequent 2-itemsets: L[1, 2]
• Produce the filtered transaction table:
Remove any infrequent item from each transaction.
Remove any transaction that contains only infrequent items.
An infrequent itemset is one whose support is less than minsup[1].
ItemSet Support
{1**, 2**} 4
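The filtering step can be sketched in Python over the encoded transaction table from the example; the hundreds digit of each item code identifies its level-1 concept:

```python
from collections import Counter

# The encoded transactions from the example; the hundreds digit of each
# item code is its level-1 concept (1 = milk, 2 = bread, ...).
transactions = {
    "T1": [111, 121, 211, 221],
    "T2": [111, 211, 222, 323],
    "T3": [112, 122, 221, 411],
    "T4": [111, 121],
    "T5": [111, 122, 211, 221, 413],
    "T6": [211, 323, 524],
    "T7": [323, 411, 524, 713],
}
MINSUP_L1 = 4

def level1(item):
    return item // 100  # hundreds digit = level-1 concept

# Count level-1 support (each concept counts once per transaction)
counts = Counter()
for items in transactions.values():
    counts.update({level1(i) for i in items})

frequent_l1 = {c for c, n in counts.items() if n >= MINSUP_L1}

# Filtered table: drop infrequent items, then drop emptied transactions
filtered = {tid: [i for i in items if level1(i) in frequent_l1]
            for tid, items in transactions.items()}
filtered = {tid: items for tid, items in filtered.items() if items}

print(sorted(frequent_l1))   # [1, 2]
print("T7" in filtered)      # False: T7 held only infrequent items
```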
29. Association Rules
Mining Multilevel Associations
Level 3: (minsup[3] = 3)
• Level 3 frequent 2-itemsets: L[3, 2]
ItemSet Support
{111, 211} 3
{111, 221} 2 (rejected: below the minimum support)
{211, 221} 2 (rejected: below the minimum support)
We stop at this level; the only frequent itemset is {111, 211}, which yields the rules:
111 => 211 [support = 0.43, confidence = 0.75]
211 => 111 [support = 0.43, confidence = 0.75]
30. Association Rules
Mining Multilevel Associations
Low-level minsup:
There are 2 approaches taken to identify the low-level minimum
support:
Uniform support: The same minimum support threshold is used when
mining at each abstraction level.
The method is simple in that users are required to specify only one
minimum support threshold.
If the minimum support threshold is set too high, it could miss some
meaningful associations occurring at low abstraction levels.
If the threshold is set too low, it may generate many uninteresting
associations occurring at high abstraction levels.
31. Association Rules
Mining Multilevel Associations
Low-level minsup:
There are 2 approaches taken to identify the low-level minimum
support:
Reduced support: Each abstraction level has its own minimum support
threshold. The deeper the abstraction level, the smaller the
corresponding threshold.
33. Association Rules
Mining Multidimensional Associations
The data may be multidimensional, as in a data warehouse, rather than
two-dimensional or multilevel.
In multidimensional databases, we refer to each distinct predicate in a rule as
a dimension.
Ex: buys(X, "digital camera") => buys(X, "HP Printer")
We can refer to this as a single-dimensional or intradimensional association rule:
a single distinct predicate (buys) with multiple occurrences.
In a multidimensional data representation, in addition to keeping track of the
items purchased in sales transactions, a relational database may record other
attributes associated with the items and/or transactions, such as the item
description or the branch location of the sale.
Ex: age(X, "20..29") Ʌ occupation(X, "Student") => buys(X, "laptop")
34. Association Rules
Mining Multidimensional Associations
Ex: age(X, “20..29”) Ʌoccupation(X, “Student”)=>buys(X, “laptop”)
This is referred to as a multidimensional association rule; it contains three
predicates (age, occupation, and buys), each of which occurs only once
in the rule (no repeated predicates).
Multidimensional association rules with no repeated predicates are
called interdimensional association rules.
Ex: age(X, "20..29") Ʌ buys(X, "laptop") => buys(X, "HP Printer")
Multidimensional association rules with repeated predicates, which
contain multiple occurrences of some predicates (here, buys is repeated),
are called hybrid-dimensional association rules.
35. Clustering:
Clustering is the process of grouping a set of data objects into multiple
groups or clusters so that objects within a cluster have high similarity,
but are very dissimilar to objects in other clusters.
Cluster analysis or simply clustering is the process of partitioning a set
of data objects (or observations) into subsets. Each subset is a cluster,
such that objects in a cluster are similar to one another, yet dissimilar to
objects in other clusters.
Different clustering methods may generate different clusterings on the
same data set.
Clustering is useful in that it can lead to the discovery of previously
unknown groups within the data.
36. Clustering:
Cluster analysis can be used as a standalone tool to gain insight into the
distribution of data, to observe the characteristics of each cluster, and to
focus on a particular set of clusters for further analysis.
It may serve as a preprocessing step for other algorithms, such as
characterization, attribute subset selection, and classification, which
would then operate on the detected clusters and the selected attributes or
features.
37. Clustering:
k-means cluster
k-means is one of the simplest unsupervised learning algorithms that
solve the well-known clustering problem.
The procedure follows a simple and easy way to classify a given data set
through a certain number of clusters (assume k clusters).
The main idea is to define k centers, one for each cluster.
These centers should be placed carefully, because different
locations cause different results.
So, the better choice is to place them as far away from each other
as possible.
The next step is to take each point belonging to the given data set and
associate it with the nearest center.
38. Clustering:
k-means cluster
When no point is pending, the first step is completed and an early
grouping is done.
At this point we need to recalculate k new centroids as the barycenters of
the clusters resulting from the previous step.
After we have these k new centroids, a new binding has to be done
between the same data set points and the nearest new center.
A loop has been generated.
As a result of this loop we may notice that the k centers change their
location step by step until no more changes are made; in other words,
the centers do not move any more.
40. Clustering:
k-means cluster
Solution:
Choose 2 points to be the centers of the clusters (selected randomly):
"A, C"
Step 1: Calculate the distance between each point and the 2 selected
points:
length = √((X1 − X2)² + (Y1 − Y2)²)
i A (Cluster 1) C (Cluster 2)
A 0 1.4
B 1 2.2
C 1.4 0
D 3.2 2.8
E 4.5 4.2
41. Clustering:
k-means cluster
Compare the distances between each point and the 2 selected centers.
A point belongs to the cluster whose center has the smallest distance to it.
Point B belongs to the cluster of point "A" (1 less than 2.2)
Point D belongs to the cluster of point "C" (2.8 less than 3.2)
Point E belongs to the cluster of point "C" (4.2 less than 4.5)
i X Y Cluster
A 1 1 1
B 1 0 1
C 0 2 2
D 2 4 2
E 3 5 2
42. Clustering:
k-means cluster
Calculate the mean of Cluster 1:
X = (1 + 1) / 2 = 1
Y = (1 + 0) / 2 = 0.5
Mean Cluster1 (1, 0.5)
Calculate the mean of Cluster 2:
X = (0 + 2 + 3) / 3 = 1.7
Y = (2 + 4 + 5) / 3 = 3.7
Mean Cluster2 (1.7, 3.7)
43. Clustering:
k-means cluster
Step 2: Recalculate the distance from each point to the cluster means.
Compare the distances between each point and the 2 cluster means; the
point belongs to the cluster with the smallest distance.
Point A belongs to Cluster 1 (0.5 less than 2.7)
Point B belongs to Cluster 1 (0.5 less than 3.7)
I Cluster 1 Cluster 2
A 0.5 2.7
B 0.5 3.7
C 1.8 2.4
D 3.6 0.5
E 4.9 1.9
44. Clustering:
k-means cluster
Point C belongs to Cluster 1 (1.8 less than 2.4)
Point D belongs to Cluster 2 (0.5 less than 3.6)
Point E belongs to Cluster 2 (1.9 less than 4.9)
i X Y Cluster
A 1 1 1
B 1 0 1
C 0 2 1
D 2 4 2
E 3 5 2
45. Clustering:
k-means cluster
Calculate the mean of Cluster 1: (0.7, 1)
Calculate the mean of Cluster 2: (2.5, 4.5)
Step 3: Recalculate the distance from each point to the cluster means. In
this example we find no change, so this is the final solution.
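The whole procedure can be sketched in Python on the example's five points, starting from the same initial centers A and C; it reproduces the final means (0.7, 1) and (2.5, 4.5):

```python
import math

# Lloyd's algorithm (k-means) on the example's five points, k = 2,
# starting from the same initial centers A and C as the worked solution.
points = {"A": (1, 1), "B": (1, 0), "C": (0, 2), "D": (2, 4), "E": (3, 5)}
centers = [points["A"], points["C"]]

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

while True:
    # Assignment step: each point joins its nearest center
    clusters = [[] for _ in centers]
    for p in points.values():
        nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        clusters[nearest].append(p)
    # Update step: move each center to the mean of its cluster
    new_centers = [(sum(x for x, _ in c) / len(c),
                    sum(y for _, y in c) / len(c)) for c in clusters]
    if new_centers == centers:   # centers stopped moving: done
        break
    centers = new_centers

print(centers)   # [(0.666..., 1.0), (2.5, 4.5)] -- the means from the example
```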
46. Predictive Data mining
• The purpose of a predictive mining model is mainly to predict future
outcomes rather than current behavior.
• The prediction output can be a numeric value or in categorized form.
Predictive models are supervised learning functions that
predict the target value.
47. Classification:
Among predictive data mining techniques, the classification model is
considered the best-understood technique of all data mining
approaches.
The common characteristics of classification tasks are supervised
learning, a categorical dependent variable, and assigning new data to one of
a set of well-defined classes.
Classification is used in customer segmentation, business modeling,
credit analysis, and many other applications.
In a classification technique, you typically have historical data called
labeled examples and new examples. Each labeled example consists of
multiple predictor attributes and one target attribute, which is a class label.
48. Classification:
The goal of classification is to construct a model from historical
data that accurately predicts the class of new examples.
A classification task begins with build data (also known as
training data), for which the target values are known.
Different classification algorithms use different
techniques for finding relations between the predictor attribute
values and the target values in the build data.
These relations are summarized in a model, which can then be applied to
new cases with unknown target values to predict those target values.
49. Classification:
Classification is grouping the data into classes, where each class
contains similar data.
It is supervised learning: part of the available data
(whose class is known, the "labeled data") is used for training.
Then we use the second part of the data for testing the classifier.
Example:
An "unlabeled" record, Age: 56, Income: 45K, is passed to the classifier,
which outputs Budget Spender.
Training Data
Age Income Class label
27 28K Budget Spender
35 36K Big Spender
65 45K Budget Spender
50. Classification:
Classification steps:
Classification passes through 2 steps:
Step 1: Model construction (learning step, training step)
Step 2: Model usage
Before using the model, we need to test its accuracy.
We have some data acting as test data (labeled data); we pass them to
the classifier model and compare its results with the labels we already have.
The accuracy rate is the percentage of test set samples that are correctly
classified by the model.
If the classifier's accuracy is acceptable, then we use the model for unlabeled
data.
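The accuracy rate amounts to one comparison per test sample. A tiny sketch; the labels below are illustrative, not taken from the slides:

```python
# Accuracy rate = correctly classified test samples / total test samples.
# The labels below are illustrative, not taken from the slides.
actual    = ["Big Spender", "Budget Spender", "Budget Spender", "Big Spender"]
predicted = ["Big Spender", "Budget Spender", "Big Spender",    "Big Spender"]

correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)
print(accuracy)   # 0.75 -- 3 of the 4 test samples were classified correctly
```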
51. Classification:
Decision Tree
A decision tree is a structure that includes a root node, branches, and
leaf nodes.
Each internal node denotes a test on an attribute, each branch denotes
the outcome of a test, and each leaf node holds a class label.
The topmost node in the tree is the root node.
The following decision tree is for the concept buy_computer that
indicates whether a customer at a company is likely to buy a computer
or not.
Each internal node represents a test on an attribute.
Each leaf node represents a class.
53. Classification:
Decision Tree
Why are decision tree classifiers so popular?
• The construction of a decision tree does not require any domain
knowledge or parameter setting
• They can handle high dimensional data
• Intuitive representation that is easily understood by humans
• Learning and classification are simple and fast
• They have a good accuracy
54. Classification:
Decision Tree
Example:
Draw the Decision Tree
RID Age Student Credit-rating Class:buyComputer
1 Youth Yes Fair Yes
2 Youth Yes Fair Yes
3 Youth Yes Fair No
4 Youth No Fair No
5 Middle No Excellent Yes
6 Senior Yes Fair No
7 Senior Yes Excellent Yes
56. Classification:
Decision Tree
Solution:
The middle branch has only one record; this means all middle-aged
customers get the decision of buying a computer.
Splitting on Age:
Youth: RID 1 (Yes), RID 2 (Yes), RID 3 (No), RID 4 (No)
Middle: leaf node "Yes"
Senior: RID 6 (No), RID 7 (Yes)
57. Classification:
Decision Tree
Solution:
The youth branch has some records that buy and some that do not.
So, we need another attribute. Here, the 4 records have the same
credit rating, so that attribute is not effective, while the student
attribute does differ.
The same applies to the senior branch, but there the student attribute is not
effective, while the credit rating changes.
When the attribute Age = youth and the attribute Student = yes, there are
3 records (2 Yes and 1 No). The remaining attribute is the same for these
3 records, so we assign the leaf to the majority class.
59. Classification:
Decision Tree
Solution:
Usage: find the class of the following record.
Start at the root (age = youth) and go down that branch; then (student =
no), so this record is assigned to class No.
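The finished tree can be written as a nested dictionary and walked from the root; this sketch reproduces the classification of the record above:

```python
# The decision tree from the example as a nested dict: an internal node
# maps an attribute to its branches; a leaf is just the class label.
tree = {"age": {
    "youth":  {"student": {"yes": "Yes", "no": "No"}},
    "middle": "Yes",
    "senior": {"credit": {"fair": "No", "excellent": "Yes"}},
}}

def classify(node, record):
    """Walk from the root, following the branch matching the record."""
    while isinstance(node, dict):
        attribute, branches = next(iter(node.items()))
        node = branches[record[attribute]]
    return node

# RID 8: age = youth, student = no, credit-rating = fair
print(classify(tree, {"age": "youth", "student": "no", "credit": "fair"}))
# No
```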
RID age student Credit-rating Class:buyComputer
8 youth no fair ?
60. Classification:
Naïve Bayes
The Bayes classifier is based on the Bayes theorem for conditional
probabilities.
This theorem quantifies the conditional probability of a random variable
(class variable), given known observations about the value of another
set of random variables (feature variables).
The Bayes theorem is used widely in probability and statistics.
In a Bayesian classifier, the learning agent builds a probabilistic model
of the features and uses that model to predict the classification of a new
example.
61. Classification:
Naïve Bayes
Example:
The tuple to classify is:
X = (age = youth, income = medium, student = yes, credit = fair); choose the class Ci that maximizes P(X|Ci) P(Ci)
RID age Income student Credit-rating Class: buyComputer
1 Youth High No fair No
2 Youth High No Excellent No
3 Middle High No Fair Yes
4 Senior Medium No Fair Yes
5 Senior Low Yes Fair Yes
6 Senior Low Yes Excellent No
7 Middle Low Yes Excellent Yes
8 Youth Medium No Fair No
9 Youth Low Yes Fair Yes
10 Senior Medium Yes Fair Yes
11 Youth Medium Yes Excellent Yes
12 Middle Medium No Excellent Yes
13 Middle High Yes Fair Yes
14 senior Medium no Excellent No
62. Classification:
Naïve Bayes
Solution:
Step 1: P(Ci)
P(buyComputer = Yes) = number of “Yes” / Total number
= 9 / 14 = 0.643
P(buyComputer = No) = number of “No” / Total number
= 5 / 14 = 0.357
Step 2: P(X|Ci)
Calculate the probability of X for each class. Rather than computing the whole
X at once, we compute the probability of each attribute given each class:
P(age = youth | buyComputer = yes) = 2 / 9 = 0.222
P(age = youth | buyComputer = no) = 3 / 5 = 0.6
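The remaining attribute probabilities follow the same pattern. This sketch completes the calculation over the 14-record table and picks the class maximizing P(X|Ci) P(Ci):

```python
# The 14 training records from the table above.
data = [
    ("youth",  "high",   "no",  "fair",      "no"),
    ("youth",  "high",   "no",  "excellent", "no"),
    ("middle", "high",   "no",  "fair",      "yes"),
    ("senior", "medium", "no",  "fair",      "yes"),
    ("senior", "low",    "yes", "fair",      "yes"),
    ("senior", "low",    "yes", "excellent", "no"),
    ("middle", "low",    "yes", "excellent", "yes"),
    ("youth",  "medium", "no",  "fair",      "no"),
    ("youth",  "low",    "yes", "fair",      "yes"),
    ("senior", "medium", "yes", "fair",      "yes"),
    ("youth",  "medium", "yes", "excellent", "yes"),
    ("middle", "medium", "no",  "excellent", "yes"),
    ("middle", "high",   "yes", "fair",      "yes"),
    ("senior", "medium", "no",  "excellent", "no"),
]
x = ("youth", "medium", "yes", "fair")   # the tuple to classify

scores = {}
for c in ("yes", "no"):
    rows = [r for r in data if r[4] == c]
    p = len(rows) / len(data)               # prior P(Ci)
    for i, value in enumerate(x):           # times each P(attribute | Ci)
        p *= sum(r[i] == value for r in rows) / len(rows)
    scores[c] = p

print(round(scores["yes"], 4), round(scores["no"], 4))   # 0.0282 0.0069
print(max(scores, key=scores.get))                       # yes
```

Since P(X|yes) P(yes) ≈ 0.0282 exceeds P(X|no) P(no) ≈ 0.0069, the Naïve Bayes classifier predicts buyComputer = yes for X.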