Machine Learning for Language Technology 2015
http://stp.lingfil.uu.se/~santinim/ml/2015/ml4lt_2015.htm

Machine Learning in Practice (1)

Marina Santini
santinim@stp.lingfil.uu.se

Department of Linguistics and Philology
Uppsala University, Uppsala, Sweden

Autumn 2015

Acknowledgements

• Weka's slides
• Witten et al. (2011): Ch 5 (pp. 156–180)
• Daumé III (2015): Ch 4, pp. 65–67

Lecture 8: ML in Practice (1)

Outline

• Comparing schemes: the t-test
• Predicting probabilities
• Cost-sensitive measures
• Occam's razor

Comparing data mining schemes

• Frequent question: which of two learning schemes performs better?
• Note: this is domain dependent!
• Obvious way: compare 10-fold CV estimates
  ◦ Generally sufficient in applications (we don't lose much if the chosen method is not truly better)
• However, what about machine learning research?
  ◦ Need to show convincingly that a particular method works better

Comparing schemes II

• Want to show that scheme A is better than scheme B in a particular domain
  ◦ For a given amount of training data
  ◦ On average, across all possible training sets
• Let's assume we have an infinite amount of data from the domain:
  ◦ Sample infinitely many datasets of specified size
  ◦ Obtain cross-validation estimate on each dataset for each scheme
  ◦ Check if mean accuracy for scheme A is better than mean accuracy for scheme B

Paired t-test

• In practice we have limited data and a limited number of estimates for computing the mean
• Student's t-test tells whether the means of two samples are significantly different
• In our case the samples are cross-validation estimates for different datasets from the domain
• Use a paired t-test because the individual samples are paired
  ◦ The same CV is applied twice

William Gosset
Born: 1876 in Canterbury; Died: 1937 in Beaconsfield, England
Obtained a post as a chemist in the Guinness brewery in Dublin in 1899. Invented the t-test to handle small samples for quality control in brewing. Wrote under the name "Student".
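
As a concrete illustration (not part of the original slides), the paired t-test on fold-wise CV accuracies can be sketched directly from its definition; the accuracy numbers below are made up:

```python
import math
from statistics import mean, stdev

# Hypothetical fold-wise CV accuracies for schemes A and B (illustrative numbers).
acc_a = [0.81, 0.79, 0.84, 0.80, 0.82, 0.78, 0.83, 0.81, 0.80, 0.82]
acc_b = [0.78, 0.77, 0.80, 0.79, 0.80, 0.76, 0.81, 0.78, 0.77, 0.79]

# Paired t-test: work on the fold-wise differences d_i = x_i - y_i.
d = [x - y for x, y in zip(acc_a, acc_b)]
k = len(d)
m_d = mean(d)                      # mean difference
s_d = stdev(d)                     # sample standard deviation of the differences
t = m_d / (s_d / math.sqrt(k))     # t-statistic with k-1 degrees of freedom

print(round(t, 2))

# Compare |t| against the critical value for k-1 = 9 degrees of freedom
# (roughly 2.26 for a two-tailed test at the 5% level).
significant = abs(t) > 2.26
print(significant)
```

With these made-up numbers the t-statistic is large, so the difference would be declared significant; scipy's `stats.ttest_rel` computes the same test and also returns a p-value.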
Distribution of the means

• x1, x2, …, xk and y1, y2, …, yk are the two samples of estimates
• mx and my are the means
• With enough samples, the mean of a set of independent samples is normally distributed
• Estimated variances of the means are σx²/k and σy²/k
• If µx and µy are the true means, then

    (mx − µx) / √(σx²/k)   and   (my − µy) / √(σy²/k)

  are approximately normally distributed with mean 0, variance 1

Student's distribution

• With small samples (k < 100) the mean follows Student's distribution with k–1 degrees of freedom
• Confidence limits (assuming we have 10 estimates, i.e. 9 degrees of freedom):

  9 degrees of freedom          normal distribution
  Pr[X ≥ z]      z              Pr[X ≥ z]      z
  0.1%           4.30           0.1%           3.09
  0.5%           3.25           0.5%           2.58
  1%             2.82           1%             2.33
  5%             1.83           5%             1.65
  10%            1.38           10%            1.28
  20%            0.88           20%            0.84

Distribution of the differences

• Let md = mx – my
• The difference of the means (md) also has a Student's distribution with k–1 degrees of freedom
• The standardized version of md is called the t-statistic:

    t = md / √(σd²/k)

  where σd² is the variance of the difference samples
• We use t to perform the t-test

Performing the test

• Fix a significance level α
  ◦ If a difference is significant at the α% level, there is a (100–α)% chance that the true means differ
• Divide the significance level by two because the test is two-tailed
  ◦ i.e. the true difference can be positive or negative
• Look up the value z that corresponds to α/2
• If t ≤ –z or t ≥ z then the difference is significant
  ◦ i.e. the null hypothesis (that the difference is zero) can be rejected

Unpaired observations

• If the CV estimates are from different datasets, they are no longer paired (or maybe we have k estimates for one scheme, and j estimates for the other one)
• Then we have to use an unpaired t-test with min(k, j) – 1 degrees of freedom
• The estimate of the variance of the difference of the means becomes:

    σx²/k + σy²/j

Predicting probabilities

• Performance measure so far: success rate
• Also called the 0-1 loss function:

    ∑i (0 if prediction i is correct, 1 if prediction i is incorrect)

• Most classifiers produce class probabilities
• Depending on the application, we might want to check the accuracy of the probability estimates
• 0-1 loss is not the right thing to use in those cases

Quadratic loss function

• p1 … pk are probability estimates for an instance
• c is the index of the instance's actual class
• a1 … ak = 0, except for ac, which is 1
• Quadratic loss is:

    ∑j (pj – aj)²

• We want to minimize its expected value

Informational loss function

• The informational loss function is –log₂(pc), where c is the index of the instance's actual class
• Let p1* … pk* be the true class probabilities
• Then the expected value for the loss function is:

    –p1* log₂(p1) – … – pk* log₂(pk)
Discussion

• Which loss function to choose?
  ◦ Quadratic loss takes into account all class probability estimates for an instance; it is bounded by 1 + ∑j pj², so it can never exceed 2
  ◦ Informational loss focuses only on the probability estimate for the actual class

The kappa statistic

• Two confusion matrices for a 3-class problem: actual predictions (left) vs. random predictions (right)
  (The two matrices are shown as tables on the original slide.)
• Number of successes: sum of entries in the diagonal (D)
• Kappa statistic:

    κ = (D_observed – D_random) / (D_perfect – D_random)

  measures relative improvement over random predictions

K statistic: Calculations

• Proportion of class "a" = 0.5 (i.e. 100 instances out of 200 → 50% → 0.5)
• Proportion of class "b" = 0.3 (i.e. 60 instances out of 200 → 30% → 0.3)
• Proportion of class "c" = 0.2 (i.e. 40 instances out of 200 → 20% → 0.2)

Both classifiers (see the matrices above) return 120 a's, 60 b's and 20 c's, but one classifier is random. How much does the actual classifier improve on the random classifier?

A classifier guessing randomly would return the predictions in the table on the right-hand side:
0.5 × 120 = 60; 0.3 × 60 = 18; 0.2 × 20 = 4 → 60 + 18 + 4 = 82

The actual classifier returns the predictions in the table on the left-hand side: 140 correct predictions (see the diagonal), i.e. a 70% success rate. However:

    k statistic = (140 – 82) / (200 – 82) = 58/118 = 0.49 = 49%

• So the actual success rate of 70% represents an improvement of 49% on random guessing!
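
The kappa calculation from the worked example can be sketched directly (the numbers below are the slide's: 200 test instances, 140 correct, 82 expected by chance):

```python
# Kappa statistic for the worked example: relative improvement over
# random predictions.
n = 200
d_observed = 140   # correct predictions of the actual classifier
d_random = 82      # expected correct predictions of the random classifier
d_perfect = n      # a perfect classifier gets everything right

kappa = (d_observed - d_random) / (d_perfect - d_random)
print(round(kappa, 2))  # → 0.49
```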
In summary

• A kappa statistic of 100% (or 1) implies a perfect classifier.
• A kappa statistic of 0 implies that the classifier provides no information and behaves as if it were guessing randomly.
• The kappa statistic measures the agreement between predicted and observed categorizations of a dataset, correcting for the agreement that occurs by chance.
• Weka reports the kappa statistic to assess the success rate beyond chance.

Quiz 1: k statistic

Our classifier predicts Red 41 times, Green 29 times and Blue 30 times. The actual numbers for the sample are: 40 Red, 30 Green and 30 Blue.

Overall, our classifier is right 70% of the time.

Suppose these predictions had been random guesses. Our classifier would have been randomly right: 0.4 × 41 + 0.3 × 29 + 0.3 × 30 = 34.1 (random guess)

So the actual success rate of 70% represents an improvement of 35.9% on random guessing.

What is the k statistic for our classifier?

1. 0.54
2. 0.60
3. 0.70

Counting the cost

• In practice, different types of classification errors often incur different costs
• Examples:
  ◦ Promotional mailing
  ◦ Terrorist profiling
    - "Not a terrorist" is correct 99.99% of the time, but if you miss the 0.01% the cost will be very high
  ◦ Loan decisions
  ◦ etc.
• There are many other types of cost!
  ◦ E.g.: the cost of collecting training data

Counting the cost

• The confusion matrix:

                        Predicted class
                        Yes              No
  Actual class   Yes    True positive    False negative
                 No     False positive   True negative

Classification with costs

• Two cost matrices (shown as tables on the original slide)
• Success rate is replaced by average cost per prediction
  ◦ Cost is given by the appropriate entry in the cost matrix

Cost-sensitive classification

• Can take costs into account when making predictions
  ◦ Basic idea: only predict the high-cost class when very confident about the prediction
• Given: predicted class probabilities
  ◦ Normally we just predict the most likely class
  ◦ Here, we should make the prediction that minimizes the expected cost
• Expected cost: the dot product of the vector of class probabilities and the appropriate column in the cost matrix
• Choose the column (class) that minimizes the expected cost
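
The minimum-expected-cost rule can be sketched as follows (not from the slides; the probabilities and cost matrix are made up):

```python
# cost[i][j] = cost of predicting class j when the actual class is i.
probs = [0.7, 0.3]              # hypothetical predicted class probabilities
cost = [[0, 1],                 # actual class 0: predicting 1 costs 1
        [10, 0]]                # actual class 1: predicting 0 costs 10 (high-cost error)

def expected_cost(j):
    # Dot product of the probability vector with column j of the cost matrix.
    return sum(probs[i] * cost[i][j] for i in range(len(probs)))

costs = [expected_cost(j) for j in range(len(cost[0]))]
prediction = min(range(len(costs)), key=lambda j: costs[j])
print(costs, prediction)  # → [3.0, 0.7] 1
```

Note that although class 0 is the most likely (p = 0.7), the rule predicts class 1, because misclassifying an actual class-1 instance is assumed to be ten times as costly.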
  
	
  
	
  
Cost-sensitive learning

• So far we haven't taken costs into account at training time
• Most learning schemes do not perform cost-sensitive learning
  ◦ They generate the same classifier no matter what costs are assigned to the different classes
  ◦ Example: a standard decision tree learner
• Simple methods for cost-sensitive learning:
  ◦ Resampling of instances according to costs
  ◦ Weighting of instances according to costs
• Some schemes can take costs into account by varying a parameter, e.g. naïve Bayes

Lift charts

• In practice, costs are rarely known
• Decisions are usually made by comparing possible scenarios
• Example: promotional mailout to 1,000,000 households
  ◦ Mail to all; 0.1% respond (1000)
  ◦ A data mining tool identifies the subset of the 100,000 most promising households; 0.4% of these respond (400)
    - 40% of the responses for 10% of the cost may pay off
  ◦ Identify the subset of the 400,000 most promising; 0.2% respond (800)
• A lift chart allows a visual comparison

Data for a lift chart

(Table of example instances with predicted probabilities, shown on the original slide.)

Generating a lift chart

• Sort instances according to predicted probability of being positive:

  Rank   Predicted probability   Actual class
  1      0.95                    Yes
  2      0.93                    Yes
  3      0.93                    No
  4      0.88                    Yes
  …      …                       …

• x axis is sample size; y axis is number of true positives
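
The sorting step and the chart's points can be sketched as follows (not from the slides; it uses the four example rows above):

```python
# Building the points of a lift chart from ranked predictions.
preds = [(0.88, True), (0.93, True), (0.93, False), (0.95, True)]  # (probability, is_positive)

# Sort by predicted probability of being positive, descending.
ranked = sorted(preds, key=lambda t: t[0], reverse=True)

# Walk down the ranking: x = sample size, y = cumulative true positives.
points, tp = [], 0
for size, (_prob, is_pos) in enumerate(ranked, start=1):
    tp += is_pos
    points.append((size, tp))

print(points)  # → [(1, 1), (2, 2), (3, 2), (4, 3)]
```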
A hypothetical lift chart

(Chart annotations on the slide: "40% of responses for 10% of cost"; "80% of responses for 40% of cost".)

ROC curves

• ROC curves are similar to lift charts
  ◦ ROC stands for "receiver operating characteristic"
  ◦ Used in signal detection to show the tradeoff between hit rate and false alarm rate over a noisy channel
• Differences to lift charts:
  ◦ y axis shows the percentage of true positives in the sample rather than the absolute number
  ◦ x axis shows the percentage of false positives in the sample rather than the sample size

A sample ROC curve

• Jagged curve: one set of test data
• Smooth curve: use cross-validation

Cross-validation and ROC curves

• Simple method of getting a ROC curve using cross-validation:
  ◦ Collect probabilities for instances in the test folds
  ◦ Sort instances according to probabilities
• This method is implemented in WEKA
• However, this is just one possibility
  ◦ Another possibility is to generate an ROC curve for each fold and average them
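
The pooled-probabilities method can be sketched as a threshold sweep (not from the slides; the (score, label) pairs are made-up pooled test-fold outputs):

```python
# ROC points from pooled cross-validation probabilities.
pooled = [(0.95, True), (0.85, True), (0.70, False), (0.60, True), (0.40, False)]

pos = sum(1 for _, y in pooled if y)
neg = len(pooled) - pos

# Sort by score descending and sweep the threshold: each prefix of the
# ranking yields one (false-positive rate, true-positive rate) point.
ranked = sorted(pooled, key=lambda t: t[0], reverse=True)
tp = fp = 0
roc = [(0.0, 0.0)]
for _score, y in ranked:
    tp += y
    fp += not y
    roc.append((fp / neg, tp / pos))

print(roc)
```

scikit-learn's `roc_curve` produces the same kind of (FPR, TPR) points from scores and labels.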
ROC curves for two schemes

• For a small, focused sample, use method A
• For a larger one, use method B
• In between, choose between A and B with appropriate probabilities

Recall-Precision Curves

• Percentage of retrieved documents that are relevant: precision = TP/(TP+FP)
• Percentage of relevant documents that are returned: recall = TP/(TP+FN)
• Precision/recall curves have a hyperbolic shape
• Summary measures: average precision at 20%, 50% and 80% recall (three-point average recall)
• F-measure = (2 × recall × precision)/(recall + precision)
• sensitivity × specificity = (TP/(TP+FN)) × (TN/(FP+TN))
• Area under the ROC curve (AUC): the probability that a randomly chosen positive instance is ranked above a randomly chosen negative one
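
These measures follow directly from confusion-matrix counts; a minimal sketch with made-up counts:

```python
# Precision, recall, F-measure and sensitivity x specificity
# from hypothetical confusion-matrix entries.
tp, fp, fn, tn = 40, 10, 20, 30

precision = tp / (tp + fp)                            # 40/50
recall = tp / (tp + fn)                               # 40/60
f_measure = 2 * recall * precision / (recall + precision)
sens_spec = (tp / (tp + fn)) * (tn / (fp + tn))

print(round(precision, 2), round(recall, 2), round(f_measure, 2))
```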
Model selection criteria

• Model selection criteria attempt to find a good compromise between:
  ◦ the complexity of a model
  ◦ its prediction accuracy on the training data
• Reasoning: a good model is a simple model that achieves high accuracy on the given data
• Also known as Occam's Razor: the best theory is the smallest one that describes all the facts

William of Ockham, born in the village of Ockham in Surrey (England) about 1285, was the most influential philosopher of the 14th century and a controversial theologian.

Elegance vs. errors

• Model 1: a very simple, elegant model that accounts for the data almost perfectly
• Model 2: a significantly more complex model that reproduces the data without mistakes
• Model 1 is probably preferable.

The End

IE: Named Entity Recognition (NER)
 
Lecture: Vector Semantics (aka Distributional Semantics)
Lecture: Vector Semantics (aka Distributional Semantics)Lecture: Vector Semantics (aka Distributional Semantics)
Lecture: Vector Semantics (aka Distributional Semantics)
 
Lecture: Word Sense Disambiguation
Lecture: Word Sense DisambiguationLecture: Word Sense Disambiguation
Lecture: Word Sense Disambiguation
 
Lecture: Word Senses
Lecture: Word SensesLecture: Word Senses
Lecture: Word Senses
 
Sentiment Analysis
Sentiment AnalysisSentiment Analysis
Sentiment Analysis
 
Semantic Role Labeling
Semantic Role LabelingSemantic Role Labeling
Semantic Role Labeling
 
Semantics and Computational Semantics
Semantics and Computational SemanticsSemantics and Computational Semantics
Semantics and Computational Semantics
 
Lecture 5: Interval Estimation
Lecture 5: Interval Estimation Lecture 5: Interval Estimation
Lecture 5: Interval Estimation
 
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioLecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
 
Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?
 
Lecture 1: Introduction to the Course (Practical Information)
Lecture 1: Introduction to the Course (Practical Information)Lecture 1: Introduction to the Course (Practical Information)
Lecture 1: Introduction to the Course (Practical Information)
 

Dernier

INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxMaryGraceBautista27
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxDr.Ibrahim Hassaan
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxChelloAnnAsuncion2
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 

Dernier (20)

INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptx
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptx
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 

Lecture 8: Machine Learning in Practice (1)

  • 6. Paired t-test
l  In practice we have limited data and a limited number of estimates for computing the mean
l  Student's t-test tells whether the means of two samples are significantly different
l  In our case the samples are cross-validation estimates for different datasets from the domain
l  Use a paired t-test because the individual samples are paired
♦  The same CV is applied twice
William Gosset. Born: 1876 in Canterbury; died: 1937 in Beaconsfield, England. Obtained a post as a chemist in the Guinness brewery in Dublin in 1899. Invented the t-test to handle small samples for quality control in brewing. Wrote under the name "Student".
  • 7. Distribution of the means
l  x1 x2 … xk and y1 y2 … yk are the cross-validation estimates for the two schemes
l  mx and my are the means
l  With enough samples, the mean of a set of independent samples is normally distributed
l  Estimated variances of the means are σx²/k and σy²/k
l  If µx and µy are the true means, then (mx − µx)/√(σx²/k) and (my − µy)/√(σy²/k) are approximately normally distributed with mean 0, variance 1
  • 8. Student's distribution
l  With small samples (k < 100) the mean follows Student's distribution with k–1 degrees of freedom
l  Confidence limits (assuming we have 10 estimates):

    9 degrees of freedom        normal distribution
    Pr[X ≥ z]      z            Pr[X ≥ z]      z
    0.1%           4.30         0.1%           3.09
    0.5%           3.25         0.5%           2.58
    1%             2.82         1%             2.33
    5%             1.83         5%             1.65
    10%            1.38         10%            1.28
    20%            0.88         20%            0.84
  • 9. Distribution of the differences
l  Let md = mx – my
l  The difference of the means (md) also has a Student's distribution with k–1 degrees of freedom
l  σd² = the variance of the difference samples
l  The standardized version of md is called the t-statistic: t = md / √(σd²/k)
l  We use t to perform the t-test
  • 10. Performing the test
•  Fix a significance level
•  If a difference is significant at the α% level, there is a (100–α)% chance that the true means differ
•  Divide the significance level by two because the test is two-tailed
•  i.e. the true difference can be positive or negative
•  Look up the value for z that corresponds to α/2
•  If t ≤ –z or t ≥ z then the difference is significant
•  i.e. the null hypothesis (that the difference is zero) can be rejected
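The test described above can be sketched in a few lines. The two accuracy lists below are invented 10-fold CV estimates for two hypothetical schemes A and B; 2.26 is the two-tailed 5% critical value for Student's distribution with 9 degrees of freedom.

```python
# A minimal sketch of the paired t-test over 10-fold CV accuracy
# estimates for two schemes. All accuracy values are invented.
import math

acc_a = [0.81, 0.79, 0.84, 0.80, 0.83, 0.78, 0.82, 0.85, 0.80, 0.81]
acc_b = [0.78, 0.77, 0.80, 0.79, 0.80, 0.76, 0.79, 0.81, 0.78, 0.78]

k = len(acc_a)
d = [a - b for a, b in zip(acc_a, acc_b)]          # paired differences
m_d = sum(d) / k                                   # mean difference
var_d = sum((x - m_d) ** 2 for x in d) / (k - 1)   # sample variance of differences
t = m_d / math.sqrt(var_d / k)                     # the t-statistic

critical = 2.26  # Student's distribution, 9 DoF, two-tailed 5% level
significant = abs(t) >= critical
```

With these made-up numbers t lands far above the critical value, so the difference between the two schemes would be judged significant at the 5% level.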
  • 11. Unpaired observations
l  If the CV estimates are from different datasets, they are no longer paired (or maybe we have k estimates for one scheme, and j estimates for the other one)
l  Then we have to use an unpaired t-test with min(k, j) – 1 degrees of freedom
l  The estimate of the variance of the difference of the means becomes: σx²/k + σy²/j
  • 12. Predicting probabilities
l  Performance measure so far: success rate
l  Also called 0-1 loss function:
    Σi { 0 if prediction i is correct, 1 if prediction i is incorrect }
l  Most classifiers produce class probabilities
l  Depending on the application, we might want to check the accuracy of the probability estimates
l  0-1 loss is not the right thing to use in those cases
  • 13. Quadratic loss function
l  p1 … pk are probability estimates for an instance
l  c is the index of the instance's actual class
l  a1 … ak = 0, except for ac which is 1
l  Quadratic loss is: Σj (pj – aj)²
l  Want to minimize: E[Σj (pj – aj)²]
  • 14. Informational loss function
l  The informational loss function is –log(pc), where c is the index of the instance's actual class
l  Let p1* … pk* be the true class probabilities
l  Then the expected value for the loss function is: –p1* log(p1) – … – pk* log(pk)
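The two loss functions can be sketched for a single instance. The probability vector and true class index below are invented for illustration, and the informational loss uses log base 2 (bits).

```python
# A minimal sketch comparing quadratic and informational loss on one
# instance. The probability estimates are invented.
import math

p = [0.7, 0.2, 0.1]   # classifier's class probability estimates
c = 0                 # index of the instance's actual class
a = [1 if j == c else 0 for j in range(len(p))]  # 0/1 target vector

quadratic_loss = sum((pj - aj) ** 2 for pj, aj in zip(p, a))
informational_loss = -math.log2(p[c])
```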
  • 15. Discussion
l  Which loss function to choose?
♦  Quadratic loss function takes into account all class probability estimates for an instance; it is bounded by 1 + Σj pj² and can never exceed 2
♦  Informational loss focuses only on the probability estimate for the actual class
  • 16. The kappa statistic
l  Two confusion matrices for a 3-class problem: actual predictions (left) vs. random predictions (right)
l  Number of successes: sum of entries in diagonal (D)
l  Kappa statistic:
    (Dobserved – Drandom) / (Dperfect – Drandom)
    measures relative improvement over random predictions
  • 17. Kappa statistic: calculations
•  Proportion of the class "a" = 0.5 (i.e. 100 instances out of 200 → 50% → 0.5)
•  Proportion of the class "b" = 0.3 (i.e. 60 instances out of 200 → 30% → 0.3)
•  Proportion of the class "c" = 0.2 (i.e. 40 instances out of 200 → 20% → 0.2)
Both classifiers (see below) return 120 a's, 60 b's and 20 c's, but one classifier is random. How much does the actual classifier improve on the random classifier?
A classifier randomly guessing would return the predictions in the table on the RHS: 0.5 × 120 = 60; 0.3 × 60 = 18; 0.2 × 20 = 4 → 60 + 18 + 4 = 82
The actual classifier returns the predictions in the table on the LHS, 140 correct predictions (see diagonal), i.e. 70% success rate. However: kappa statistic = (140 – 82)/(200 – 82) = 58/118 = 0.49 = 49%
•  So the actual success rate of 70% represents an improvement of 49% on random guessing!
    kappa = (Dobserved – Drandom) / (Dperfect – Drandom)
[Figure: actual predictions (left) vs. random predictions (right)]
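The calculation above can be reproduced in a few lines, using only the counts quoted on the slide (200 instances, class proportions 0.5/0.3/0.2, both classifiers predicting 120 a's, 60 b's, 20 c's, and 140 correct predictions on the diagonal):

```python
# A minimal sketch of the kappa calculation from the slide's counts.
proportions = {"a": 0.5, "b": 0.3, "c": 0.2}
predicted_counts = {"a": 120, "b": 60, "c": 20}

d_observed = 140   # diagonal of the actual classifier's confusion matrix
d_perfect = 200    # total number of instances
d_random = sum(proportions[cls] * n for cls, n in predicted_counts.items())

kappa = (d_observed - d_random) / (d_perfect - d_random)
```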
  • 18. In summary
•  A kappa statistic of 100% (or 1) implies a perfect classifier.
•  A kappa statistic of 0 implies that the classifier provides no information and behaves as if it were guessing randomly.
•  The kappa statistic is used to measure the agreement between predicted and observed categorizations of a dataset, and corrects for agreement that occurs by chance.
•  Weka provides the kappa statistic value to assess the success rate beyond chance.
  • 19. Quiz 1: kappa statistic
Our classifier predicts Red 41 times, Green 29 times and Blue 30 times. The actual numbers for the sample are: 40 Red, 30 Green and 30 Blue.
Overall, our classifier is right 70% of the time.
Suppose these predictions had been random guesses. Our classifier would have been randomly right: 0.4 × 41 + 0.3 × 29 + 0.3 × 30 = 34.1 times (random guess)
So the actual success rate of 70% represents an improvement of 35.9% on random guessing.
What is the kappa statistic for our classifier?
1. 0.54
2. 0.60
3. 0.70
  • 20. Counting the cost
l  In practice, different types of classification errors often incur different costs
l  Examples:
♦  Promotional mailing
♦  Terrorist profiling
    l  "Not a terrorist" is correct 99.99% of the time, but if you miss the 0.01% the cost will be very high
♦  Loan decisions
♦  etc.
l  There are many other types of cost!
    l  E.g.: cost of collecting training data
  • 21. Counting the cost
l  The confusion matrix:

                          Predicted class
                          Yes                No
    Actual class  Yes     True positive      False negative
                  No      False positive     True negative
  • 22. Classification with costs
l  Two cost matrices:
l  Success rate is replaced by average cost per prediction
♦  Cost is given by the appropriate entry in the cost matrix
  • 23. Cost-sensitive classification
l  Can take costs into account when making predictions
♦  Basic idea: only predict the high-cost class when very confident about the prediction
l  Given: predicted class probabilities
♦  Normally we just predict the most likely class
♦  Here, we should make the prediction that minimizes the expected cost
l  Expected cost: dot product of the vector of class probabilities and the appropriate column in the cost matrix
l  Choose the column (class) that minimizes expected cost
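The expected-cost rule can be sketched as follows. The probabilities and cost matrix below are invented for illustration; `cost[i][j]` is the cost of predicting class j when the true class is i.

```python
# A minimal sketch of cost-sensitive prediction: pick the class whose
# expected cost (probabilities · cost-matrix column) is smallest.
probs = [0.6, 0.4]          # P(true class = 0), P(true class = 1)
cost = [[0, 1],             # true class 0: predicting 0 costs 0, predicting 1 costs 1
        [10, 0]]            # true class 1: predicting 0 costs 10, predicting 1 costs 0

expected = [sum(probs[i] * cost[i][j] for i in range(len(probs)))
            for j in range(len(cost[0]))]

best = min(range(len(expected)), key=expected.__getitem__)
```

Here class 0 is the more likely class, yet predicting class 1 minimizes the expected cost; that is exactly the point of the rule.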
  • 24. Cost-sensitive learning
l  So far we haven't taken costs into account at training time
l  Most learning schemes do not perform cost-sensitive learning
l  They generate the same classifier no matter what costs are assigned to the different classes
l  Example: standard decision tree learner
l  Simple methods for cost-sensitive learning:
♦  Resampling of instances according to costs
♦  Weighting of instances according to costs
l  Some schemes can take costs into account by varying a parameter, e.g. naïve Bayes
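Weighting of instances according to costs can be sketched like this. The per-class misclassification costs and the label list are invented for illustration; each instance receives its class's cost as a weight, so errors on the expensive class count proportionally more during training.

```python
# A minimal sketch of cost-based instance weighting (invented costs).
costs = {"yes": 5.0, "no": 1.0}   # cost of misclassifying each class
labels = ["yes", "no", "no", "yes", "no"]

weights = [costs[y] for y in labels]

# Normalize so the total weight equals the number of instances
total = sum(weights)
weights = [w * len(labels) / total for w in weights]
```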
  • 25. Lift charts
l  In practice, costs are rarely known
l  Decisions are usually made by comparing possible scenarios
l  Example: promotional mailout to 1,000,000 households
•  Mail to all; 0.1% respond (1000)
•  Data mining tool identifies a subset of the 100,000 most promising; 0.4% of these respond (400): 40% of responses for 10% of cost may pay off
•  Identify a subset of the 400,000 most promising; 0.2% respond (800)
l  A lift chart allows a visual comparison
  • 26. Data for a lift chart
  • 27. Generating a lift chart
l  Sort instances according to predicted probability of being positive:

    Rank   Predicted probability   Actual class
    1      0.95                    Yes
    2      0.93                    Yes
    3      0.93                    No
    4      0.88                    Yes
    …      …                       …

l  x axis is sample size; y axis is number of true positives
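The sort-and-count procedure can be sketched as follows. The score/label pairs extend the slide's table with invented values.

```python
# A minimal sketch of computing lift-chart points: sort by predicted
# probability of the positive class, then record cumulative true
# positives at each sample size.
data = [(0.95, "Yes"), (0.93, "Yes"), (0.93, "No"),
        (0.88, "Yes"), (0.80, "No"), (0.75, "Yes")]

data.sort(key=lambda t: t[0], reverse=True)

points = []          # (sample size, cumulative true positives)
tp = 0
for size, (prob, actual) in enumerate(data, start=1):
    if actual == "Yes":
        tp += 1
    points.append((size, tp))
```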
  • 28. A hypothetical lift chart
l  40% of responses for 10% of cost; 80% of responses for 40% of cost
  • 29. ROC curves
l  ROC curves are similar to lift charts
♦  Stands for "receiver operating characteristic"
♦  Used in signal detection to show the tradeoff between hit rate and false alarm rate over a noisy channel
l  Differences to the lift chart:
♦  y axis shows percentage of true positives in the sample rather than the absolute number
♦  x axis shows percentage of false positives in the sample rather than the sample size
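A ROC curve can be traced with the same sort-and-sweep idea as the lift chart, this time recording rates instead of counts. The scores and 0/1 labels below are invented for illustration.

```python
# A minimal sketch of tracing ROC points: sort by predicted
# probability, sweep the threshold down the ranking, and record
# (false positive rate, true positive rate) pairs.
data = [(0.95, 1), (0.93, 1), (0.93, 0), (0.88, 1), (0.80, 0), (0.75, 1)]
data.sort(key=lambda t: t[0], reverse=True)

pos = sum(label for _, label in data)   # number of positives
neg = len(data) - pos                   # number of negatives

roc = [(0.0, 0.0)]
tp = fp = 0
for prob, label in data:
    if label == 1:
        tp += 1
    else:
        fp += 1
    roc.append((fp / neg, tp / pos))
```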
  • 30. A sample ROC curve
l  Jagged curve: one set of test data
l  Smooth curve: use cross-validation
  • 31. Cross-validation and ROC curves
l  Simple method of getting a ROC curve using cross-validation:
♦  Collect probabilities for instances in the test folds
♦  Sort instances according to probabilities
l  This method is implemented in WEKA
l  However, this is just one possibility
♦  Another possibility is to generate a ROC curve for each fold and average them
  • 32. ROC curves for two schemes
l  For a small, focused sample, use method A
l  For a larger one, use method B
l  In between, choose between A and B with appropriate probabilities
  • 33. Recall-precision curves
l  Percentage of retrieved documents that are relevant: precision = TP/(TP+FP)
l  Percentage of relevant documents that are returned: recall = TP/(TP+FN)
l  Precision/recall curves have a hyperbolic shape
l  Summary measures: average precision at 20%, 50% and 80% recall (three-point average recall)
l  F-measure = (2 × recall × precision)/(recall + precision)
l  sensitivity × specificity = (TP/(TP+FN)) × (TN/(FP+TN))
l  Area under the ROC curve (AUC): probability that a randomly chosen positive instance is ranked above a randomly chosen negative one
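The measures above can be checked on a small invented confusion matrix (TP = 40, FP = 10, FN = 20, TN = 30):

```python
# A minimal sketch computing the summary measures from the slide on
# invented confusion-matrix counts.
tp, fp, fn, tn = 40, 10, 20, 30

precision = tp / (tp + fp)                       # 40/50 = 0.8
recall = tp / (tp + fn)                          # 40/60 ≈ 0.667
f_measure = 2 * recall * precision / (recall + precision)
sensitivity_specificity = (tp / (tp + fn)) * (tn / (fp + tn))
```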
  • 34. Model selection criteria
l  Model selection criteria attempt to find a good compromise between:
♦  The complexity of a model
♦  Its prediction accuracy on the training data
l  Reasoning: a good model is a simple model that achieves high accuracy on the given data
l  Also known as Occam's razor: the best theory is the smallest one that describes all the facts
William of Ockham, born in the village of Ockham in Surrey (England) about 1285, was the most influential philosopher of the 14th century and a controversial theologian.
  • 35. Elegance vs. errors
l  Model 1: very simple, elegant model that accounts for the data almost perfectly
l  Model 2: significantly more complex model that reproduces the data without mistakes
l  Model 1 is probably preferable.
  • 36. The End