Support Vector Machine without tears
Ankit Sharma | Digg Data | www.diggdata.in
Content
SVM and its applications
Basic SVM
• Hyperplane
• Understanding the basics
• Optimization
Soft-margin SVM
Non-linear decision boundary
SVMs in “loss + penalty” form
Kernel method
• Gaussian kernel
SVM usage beyond classification
SVM: Support Vector Machine
• In machine learning, support vector machines are supervised learning models, with associated learning algorithms, that analyze data and recognize patterns; they are used for classification and regression analysis.
• Properties of SVM:
  Duality
  Kernels
  Margin
  Convexity
  Sparseness
Applications of SVM
• Time series analysis
• Classification
• Anomaly detection
• Regression
• Machine vision
• Text categorization
Basic concept of SVM
Find a linear decision surface (a “hyperplane”) that separates the classes and has the largest distance (i.e., the largest “gap” or “margin”) to the borderline instances (the “support vectors”).
Hyperplane as a Decision boundary
• A hyperplane is a linear decision surface that splits the space into two parts;
• it therefore acts naturally as a binary classifier.
Equation of a hyperplane
The equation of a hyperplane is defined by a point (P0) on the plane and a vector (w) perpendicular to the plane at that point.
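The equation itself appears only as an image on the original slide; writing it out in the notation above (with x a generic point on the plane):

$$ w \cdot (x - P_0) = 0 \quad\Longleftrightarrow\quad w \cdot x + b = 0, \qquad \text{where } b = -\,w \cdot P_0 . $$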
Understanding the basics
• g(x) = w·x + b is a linear function; the set of points with w·x + b = 0 is a hyperplane in the feature space.
• The hyperplane splits the feature space into the two half-spaces w·x + b > 0 and w·x + b < 0.
• The (unit-length) normal vector of the hyperplane is n = w / ‖w‖.
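A minimal numeric sketch of this decision rule; the values of w and b below are arbitrary and purely illustrative:

```python
import numpy as np

# Arbitrary, illustrative hyperplane parameters (not taken from the slides).
w = np.array([2.0, -1.0])
b = -3.0

def classify(x):
    """Return +1 or -1 depending on which side of the hyperplane w.x + b = 0 the point x lies."""
    return 1 if w @ x + b > 0 else -1

print(classify(np.array([3.0, 1.0])))  # w.x + b =  2.0 -> +1
print(classify(np.array([0.0, 0.0])))  # w.x + b = -3.0 -> -1
```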
Understanding the basics
How do we classify these points (class +1 vs. class −1) using a linear discriminant function so as to minimize the error rate?
• There are infinitely many possible answers.
• Which one is the best?
Understanding the basics
• The linear discriminant function (classifier) with the maximum margin is the best: it leaves the widest “safe zone” around the decision boundary.
• The margin is defined as the width by which the boundary could be increased before hitting a data point.
• Why is it the best? It is robust to outliers and therefore has strong generalization ability.
Understanding the basics
• Given a set of data points {(xᵢ, yᵢ)}, i = 1, 2, …, n, where
  for yᵢ = +1: w·xᵢ + b > 0
  for yᵢ = −1: w·xᵢ + b < 0
• With a scale transformation on both w and b, the above is equivalent to
  for yᵢ = +1: w·xᵢ + b ≥ +1
  for yᵢ = −1: w·xᵢ + b ≤ −1
Understanding the basics
• For the support vectors x⁺ and x⁻ on the margin boundaries we know that
  w·x⁺ + b = +1
  w·x⁻ + b = −1
• The margin width is therefore

$$ M = (x^{+} - x^{-}) \cdot n = (x^{+} - x^{-}) \cdot \frac{w}{\lVert w \rVert} = \frac{2}{\lVert w \rVert}. $$
Understanding the basics
• Formulation:

$$ \text{maximize } \frac{2}{\lVert w \rVert} \quad \text{such that} \quad \begin{cases} w \cdot x_i + b \ge +1 & \text{for } y_i = +1 \\ w \cdot x_i + b \le -1 & \text{for } y_i = -1 \end{cases} $$
Understanding the basics
• Equivalent formulation (maximizing 2/‖w‖ is the same as minimizing ½‖w‖²):

$$ \text{minimize } \frac{1}{2}\lVert w \rVert^{2} \quad \text{such that} \quad \begin{cases} w \cdot x_i + b \ge +1 & \text{for } y_i = +1 \\ w \cdot x_i + b \le -1 & \text{for } y_i = -1 \end{cases} $$
Understanding the basics
• Compact formulation (the two constraints combine into one):

$$ \text{minimize } \frac{1}{2}\lVert w \rVert^{2} \quad \text{such that} \quad y_i\,(w \cdot x_i + b) \ge 1 \quad \text{for all } i. $$
Basics of optimization: Convex functions
• A function is called convex if, for any two points in an interval of its domain, its graph lies below the straight line segment connecting the corresponding points on the graph.
• Property: Any local minimum is a global minimum!
Basics of optimization: Quadratic programming
• Quadratic programming (QP) is a special optimization problem: the function to
optimize (“objective”) is quadratic, subject to linear constraints.
• Convex QP problems have convex objective functions.
• Such problems can be solved easily and efficiently by greedy, descent-based algorithms, because every local minimum is also a global minimum.
SVM optimization problem: Primal formulation
• The optimization problem derived above (minimize ½‖w‖² subject to yᵢ(w·xᵢ + b) ≥ 1) is called the “primal formulation” of linear SVMs.
• It is a convex quadratic programming (QP) optimization problem with n variables (wᵢ, i = 1, …, n, plus the bias b), where n is the number of features in the dataset.
SVM optimization problem: Dual formulation
• The previous problem can be recast in the so-called “dual form” giving rise to
“dual formulation of linear SVMs”.
• Apply the method of Lagrange multipliers, introducing a multiplier αᵢ for each constraint.
• We need to minimize the resulting Lagrangian with respect to w and b, i.e., require that the derivatives with respect to w and b vanish, all subject to the constraints
αᵢ ≥ 0
SVM optimization problem: Dual formulation (contd.)
It is also a convex quadratic programming problem but with N variables (αi, i= 1,…,N), where N is
the number of samples.
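The dual problem itself appears on the slide only as an image; the standard form, consistent with the derivation above, is:

$$ \max_{\alpha}\; \sum_{i=1}^{N} \alpha_i \;-\; \frac{1}{2} \sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j\, y_i y_j\, (x_i \cdot x_j) \quad \text{subject to} \quad \alpha_i \ge 0, \;\; \sum_{i=1}^{N} \alpha_i y_i = 0. $$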
SVM optimization problem: Benefits of using
dual formulation
1) There is no need to access the original data; only the dot products between data points are required (this is what later makes the kernel trick possible).
2) The number of free parameters is bounded by the number of support vectors and not by the number of variables (beneficial for high-dimensional problems).
Non-linearly separable data: “soft-margin” linear SVM
Assign a “slack variable” ξᵢ ≥ 0 to each instance; it can be thought of as the distance from the separating hyperplane if an instance is misclassified, and is 0 otherwise.
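The primal and dual formulations appear on the original slide only as images; the standard soft-margin forms, consistent with the notation above, are:

Primal formulation:
$$ \text{minimize } \frac{1}{2}\lVert w \rVert^{2} + C\sum_{i=1}^{N}\xi_i \quad \text{such that} \quad y_i\,(w \cdot x_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0. $$

Dual formulation:
$$ \max_{\alpha}\; \sum_{i}\alpha_i - \frac{1}{2}\sum_{i}\sum_{j}\alpha_i\alpha_j\, y_i y_j\, (x_i \cdot x_j) \quad \text{subject to} \quad 0 \le \alpha_i \le C, \;\; \sum_i \alpha_i y_i = 0. $$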
• When C is very large, the soft-margin SVM is equivalent to the hard-margin SVM;
• When C is very small, we admit misclassifications in the training data in exchange for a w-vector with small norm;
• C has to be selected for the data distribution at hand, typically by cross-validation.
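A minimal scikit-learn sketch of the effect of C; the synthetic dataset and the C values are illustrative, not taken from the slides:

```python
# Soft-margin linear SVMs for a few values of C.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, flip_y=0.1, random_state=0)

for C in (0.01, 1.0, 100.0):  # small C -> wide margin, more slack; large C -> close to hard margin
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C:6}: support vectors={clf.support_vectors_.shape[0]}, "
          f"train accuracy={clf.score(X, y):.2f}")
```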
SVMs in “loss + penalty” form
• Many statistical learning algorithms (including SVMs) search for a decision function by solving the
following optimization problem:
Minimize (Loss+ λ Penalty)
– Loss measures error of fitting the data
– Penalty penalizes complexity of the learned function
– λ is a regularization parameter that balances Loss and Penalty
• Overfitting → poor generalization (this is what the Penalty term guards against)
The soft-margin linear SVM can also be stated in this form, as shown below.
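The formula itself is not reproduced on the slide; in standard hinge-loss notation (the hinge loss playing the role of Loss and ‖w‖² the role of Penalty), the soft-margin linear SVM reads:

$$ \min_{w,\,b}\; \sum_{i=1}^{N} \max\bigl(0,\; 1 - y_i\,(w \cdot x_i + b)\bigr) \;+\; \lambda\,\lVert w \rVert^{2}. $$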
Nonlinear decision boundary
A non-linear decision boundary in the original input space is handled with the kernel method, described next.
Kernel method
• Kernel methods involve
– Nonlinear transformation of data to a higher dimensional feature space induced by a Mercer kernel
– Detection of optimal linear solutions in the kernel feature space
• Transformation to a higher dimensional space is expected to be helpful in conversion of nonlinear relations
into linear relations (Cover’s theorem)
– Nonlinearly separable patterns to linearly separable patterns
– Nonlinear regression to linear regression
– Nonlinear separation of clusters to linear separation of clusters
• Pattern analysis methods are implemented in such a way that the kernel feature space representation is never required explicitly; they involve computation of pair-wise inner products only.
• These pair-wise inner products are computed efficiently, directly from the original representation of the data, using a kernel function (the “kernel trick”; see the sketch below).
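A minimal sketch of the kernel trick; the explicit feature map φ below is the one induced by the homogeneous quadratic kernel in two dimensions and is chosen purely for illustration:

```python
import numpy as np

def phi(x):
    """Explicit feature map for the quadratic kernel in 2-D: phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def k(x, z):
    """Same inner product, computed directly in the original space (the kernel trick)."""
    return (x @ z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print(phi(x) @ phi(z))  # inner product in the explicit feature space -> 1.0
print(k(x, z))          # identical value, without ever forming phi   -> 1.0
```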
Kernel trick
Not every function K: ℝᴺ × ℝᴺ → ℝ can be a valid kernel; it has to satisfy the so-called Mercer conditions (in practice, the kernel matrix it produces must be symmetric positive semi-definite). Otherwise, the underlying quadratic program may not be solvable.
Popular kernels
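The table of kernels appears only as an image on the original slide; kernels commonly listed in this context include:
• Linear: K(x, z) = x · z
• Polynomial (degree d): K(x, z) = (x · z + c)ᵈ
• Gaussian (RBF): K(x, z) = exp(−‖x − z‖² / 2σ²)
• Sigmoid: K(x, z) = tanh(κ x · z + θ)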
Gaussian kernel
Consider the Gaussian kernel, written out below. Geometrically, each kernel term is a “bump” or “cavity” centered at a training data point xⱼ.
The resulting mapping function is a
combination of bumps and cavities.
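The kernel itself is shown only as an image on the slide; in standard notation it is

$$ K(x, x_j) = \exp\!\left( -\,\frac{\lVert x - x_j \rVert^{2}}{2\sigma^{2}} \right), $$

so the resulting decision function f(x) = Σⱼ αⱼ yⱼ K(x, xⱼ) + b is a weighted combination of such bumps and cavities.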
SVM usage beyond classification
• Regression analysis (ε-support vector regression)
• Anomaly detection (one-class SVM)
• Clustering analysis (Support Vector Domain Description)
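A minimal scikit-learn sketch of two of these uses; the synthetic data and parameter values are illustrative only:

```python
# SVM-based regression (epsilon-SVR) and anomaly detection (one-class SVM).
import numpy as np
from sklearn.svm import SVR, OneClassSVM

rng = np.random.default_rng(0)

# epsilon-support vector regression on a noisy sine curve
X = np.sort(rng.uniform(0, 6, size=(100, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(100)
svr = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
print("SVR R^2 on training data:", round(svr.score(X, y), 3))

# one-class SVM: fit on "normal" points, then flag far-away points as anomalies
normal = rng.normal(0, 1, size=(200, 2))
outliers = rng.uniform(-6, 6, size=(10, 2))
oc = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(normal)
print("outliers flagged as anomalous:", int((oc.predict(outliers) == -1).sum()), "of", len(outliers))
```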
Thank you