Machine Learning for Breast Cancer Diagnosis
A Proof of Concept
P. K. SHARMA
Email: from_pramod@yahoo.com
Introduction
 Machine learning is a branch of data science that incorporates a large set of statistical techniques.
 These techniques enable data scientists to create a model that learns from past data and detects
patterns in massive, noisy and complex data sets.
 Researchers use machine learning for cancer prediction and prognosis.
 Machine learning allows inferences or decisions that otherwise cannot be made using conventional
statistical methodologies.
 With a robustly validated machine learning model, the chances of a correct diagnosis improve.
 It especially helps in interpreting results for borderline cases.
Breast Cancer: An overview
 The most common cancer in women worldwide.
 The principal cause of death from cancer among women globally.
 Early detection is the most effective way to reduce breast cancer deaths.
 Early diagnosis requires an accurate and reliable procedure to distinguish benign breast tumors
from malignant ones.
 Breast cancer types: there are three types of breast tumors: benign breast tumors, in-situ cancers, and invasive
cancers.
 The majority of breast tumors detected by mammography are benign.
 They are non-cancerous growths and cannot spread outside of the breast to other organs.
 In some cases, it is difficult to distinguish certain benign masses from malignant lesions with mammography.
 If the malignant cells have not gone through the basal membrane but are completely contained in the lobule or the
ducts, the cancer is called in-situ or noninvasive.
 If the cancer has broken through the basal membrane and spread into the surrounding tissue, it is called invasive.
 This analysis assists in differentiating between benign and malignant tumors.
Data Source
https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
 The data used for this POC is from the University of Wisconsin.
 Citation: This breast cancer database was obtained from
the University of Wisconsin Hospitals, Madison, from Dr.
William H. Wolberg.
 Reference :
o O. L. Mangasarian and W. H. Wolberg: "Cancer diagnosis via
linear programming", SIAM News, Volume 23, Number 5,
September 1990, pp 1 & 18.
o William H. Wolberg and O.L. Mangasarian: "Multisurface
method of pattern separation for medical diagnosis applied to
breast cytology", Proceedings of the National Academy of
Sciences, U.S.A., Volume 87, December 1990, pp 9193-9196.
o O. L. Mangasarian, R. Setiono, and W.H. Wolberg: "Pattern
recognition via linear programming: Theory and application to
medical diagnosis", in: "Large-scale numerical optimization",
Thomas F. Coleman and Yuying Li, editors, SIAM Publications,
Philadelphia 1990, pp 22-30.
o K. P. Bennett & O. L. Mangasarian: "Robust linear programming
discrimination of two linearly inseparable sets", Optimization
Methods and Software 1, 1992, 23-34 (Gordon & Breach Science
Publishers).
Data Files
Data File Name                 Description File Name                                           # of records   # of attributes
breast-cancer-wisconsin.data   breast-cancer-wisconsin.names                                   699            11
unformatted-data               Data file with comments based on breast-cancer-wisconsin.data   699            11
wdbc.data                      wdbc.names                                                      569            32
wpbc.data                      wpbc.names                                                      198            34
This case study analyzes breast-cancer-wisconsin.data and wdbc.data.
Data Sets
The data is in CSV format without any column headers. Columns are interpreted from the associated “names”
files.
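As a minimal sketch of reading such a header-less file, the snippet below attaches column names by hand with pandas. The snake_case names are our own spelling of the attributes listed in breast-cancer-wisconsin.names, the helper load_headerless_csv is hypothetical, and the two rows are made up purely to show the 11-column layout.

```python
import io

import pandas as pd

# Column names follow the attribute list in breast-cancer-wisconsin.names.
# The snake_case spellings are an assumption made for this sketch.
BCW_COLUMNS = [
    "sample_code_number", "clump_thickness", "uniformity_of_cell_size",
    "uniformity_of_cell_shape", "marginal_adhesion",
    "single_epithelial_cell_size", "bare_nuclei", "bland_chromatin",
    "normal_nucleoli", "mitoses", "class",
]


def load_headerless_csv(source, columns):
    """Read a CSV that has no header row and attach the given column names."""
    return pd.read_csv(source, header=None, names=columns)


# Two made-up rows in the same layout, standing in for the real file.
sample = io.StringIO(
    "1000001,5,1,1,1,2,1,3,1,1,2\n"
    "1000002,8,10,10,8,7,10,9,7,1,4\n"
)
df = load_headerless_csv(sample, BCW_COLUMNS)
print(df.shape)
```

With the real file, the StringIO object would be replaced by the path to the downloaded data file.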
Flow of Data
Biopsy Procedure -> Measurements -> Reports -> Evaluation -> Diagnosis
Analysis of measurements -> Preparation of ML Models -> Predictions and validation
Analysis
Classifier Params.
 min_samples_leaf
 n_estimators
 min_samples_split
 max_features
Data Preparation
 Address missing data
 Split into training, testing and validation data
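The parameter search outlined above could look roughly like the following sketch. It uses scikit-learn's bundled copy of the Wisconsin diagnostic data (load_breast_cancer) so that it runs without downloading files. The grid values, the 20% hold-out and the random seeds are illustrative assumptions; the slides do not state the values that were actually searched.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# scikit-learn's bundled Wisconsin diagnostic data: 569 rows, 30 features.
X, y = load_breast_cancer(return_X_y=True)

# Hold out 20% for testing; stratify so both classes keep their proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Illustrative grid over the four parameters named on the slide (assumed values).
param_grid = {
    "n_estimators": [25, 50],
    "min_samples_leaf": [1, 3],
    "min_samples_split": [2, 4],
    "max_features": ["sqrt", 0.5],
}

# Every parameter combination is scored with 5-fold stratified cross-validation.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
)
search.fit(X_train, y_train)

print(search.best_params_)
# Accuracy of the refitted best model on the held-out test set.
test_accuracy = search.best_estimator_.score(X_test, y_test)
print(round(test_accuracy, 4))
```

The exact best parameters and accuracy depend on the seeds and the grid, so they need not match the figures reported in the results slides.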
Lab Setup
Components
 Environment: Python, Linux
 Libraries: scikit-learn, Pandas, SciPy, NumPy, IPython, Matplotlib, seaborn
 Classes and functions: RandomForestClassifier, StratifiedKFold, train_test_split, GridSearchCV, learning_curve, pyplot, interp
Input Files
 wdbc.data
 breast-cancer-wisconsin.data
Outputs
 Trained Classifier
 Predictions
Data Visualization
Data Description : wdbc.data
1. ID number
2. Diagnosis (M = malignant, B = benign)
3-32. Ten real-valued features are computed for each cell nucleus:
a) radius (mean of distances from center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter^2 / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension ("coastline approximation" - 1)
 Features are computed from a digitized image of a fine needle aspirate
(FNA) of a breast mass.
 They describe characteristics of the cell nuclei present in the image.
 The mean, standard error, and "worst" or largest (mean of the three largest
values) of these features were computed for each image, resulting in 30
features.
 For instance, field 3 is Mean Radius, field 13 is Radius SE, field 23 is Worst Radius.
 All feature values are recorded with four significant digits.
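The field layout described above (ten measurements, each reported as mean, standard error and worst) can be sketched as a small name generator. The column spellings and the helper wdbc_columns are our own; only the ordering comes from the description.

```python
# Base measurements in the order given by the description (a through j).
BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]
# Each measurement appears three times: mean, standard error, worst.
STATISTICS = ["mean", "se", "worst"]


def wdbc_columns():
    """Return the 32 column names of wdbc.data in file order."""
    features = [f"{stat}_{base}" for stat in STATISTICS for base in BASE_FEATURES]
    return ["id", "diagnosis"] + features


columns = wdbc_columns()
# Fields are numbered from 1 in the description, so field k is columns[k - 1].
print(columns[3 - 1], columns[13 - 1], columns[23 - 1])
```

Fields 3, 13 and 23 come out as the mean, standard error and worst value of the radius, which agrees with the example given in the description.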
wdbc.data
 Mean Radius, Mean Perimeter and Mean Area appear to be helpful in classification.
 The higher the value of each parameter, the greater the chance that the tumor is malignant.
wdbc.data
 Mean Concavity, Mean Concave Points, and Mean Compactness appear to be helpful in classification.
 The higher the value of each parameter, the greater the chance that the tumor is malignant.
wdbc.data
 Mean Smoothness, Mean Texture, Mean Fractal Dimension, Mean Symmetry and Mean Compactness do not appear to influence classification.
 Both types of cases are spread across the full range of values.
Data Description : breast-cancer-wisconsin.data
 Missing attribute values: 16
 There are 16 instances in Groups 1 to 6 that
contain a single missing (i.e., unavailable)
attribute value, now denoted by "?".
#     Attribute                     Domain
1.    Sample code number            id number
2.    Clump Thickness               1 - 10
3.    Uniformity of Cell Size       1 - 10
4.    Uniformity of Cell Shape      1 - 10
5.    Marginal Adhesion             1 - 10
6.    Single Epithelial Cell Size   1 - 10
7.    Bare Nuclei                   1 - 10
8.    Bland Chromatin               1 - 10
9.    Normal Nucleoli               1 - 10
10.   Mitoses                       1 - 10
11.   Class                         2 for benign, 4 for malignant
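The "?" markers have to be handled before training. The sketch below shows two common options with pandas: dropping incomplete rows or filling the gap with the column median. The slides do not say which strategy was used, so both are assumptions; the reported test set of 140 records would match a 20% split of all 699 rows, which hints that the rows were kept rather than dropped, but that is only an inference. The three rows are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical snake_case names for the 11 attributes listed above.
COLUMNS = [
    "sample_code_number", "clump_thickness", "uniformity_of_cell_size",
    "uniformity_of_cell_shape", "marginal_adhesion",
    "single_epithelial_cell_size", "bare_nuclei", "bland_chromatin",
    "normal_nucleoli", "mitoses", "class",
]

# Made-up rows; the second one has "?" in the bare_nuclei column.
raw = io.StringIO(
    "1000001,5,1,1,1,2,1,3,1,1,2\n"
    "1000002,8,4,5,1,2,?,7,3,1,4\n"
    "1000003,3,1,1,1,2,2,3,1,1,2\n"
)

# na_values="?" turns the marker into NaN so the column stays numeric.
df = pd.read_csv(raw, header=None, names=COLUMNS, na_values="?")
n_missing = int(df.isna().sum().sum())

# Option 1: drop every row that has a missing value.
dropped = df.dropna()

# Option 2: keep all rows and fill each gap with the column median.
imputed = df.fillna(df.median(numeric_only=True))

print(n_missing, len(dropped), len(imputed))
```

Dropping is simpler but loses records; imputation keeps the sample size at the cost of inventing a plausible value.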
breast-cancer-wisconsin.data
 The features distinguish between benign and malignant cases fairly well.
breast-cancer-wisconsin.data
 The feature seems to distinguish between benign and malignant cases fairly well.
breast-cancer-wisconsin.data
 The feature seems to distinguish between benign and malignant cases fairly well.
Results
WDBC.DATA
Analysis: wdbc.data
 Training data is divided into 5 folds.
 Test data has 114 records
 Accuracy Score: 0.9561
Confusion Matrix:
                 Predicted Benign   Predicted Malignant
True Benign      69                 2
True Malignant   3                  40
Classification Report:
              Precision   Recall   f1-score   Support
0             0.96        0.97     0.97       71
1             0.95        0.93     0.94       43
avg / total   0.96        0.96     0.96       114
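The accuracy score and the per-class figures in the report follow directly from the four cells of the confusion matrix. A small sketch of that arithmetic, using a hypothetical helper class_metrics:

```python
def class_metrics(tp, fp, fn):
    """Precision, recall and F1 for one class from its confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Rows are true classes, columns are predicted classes (benign, malignant).
confusion = [[69, 2],
             [3, 40]]

total = sum(sum(row) for row in confusion)
# Correct predictions sit on the diagonal.
accuracy = (confusion[0][0] + confusion[1][1]) / total

# Class 0 (benign): false positives are malignant cases predicted benign.
benign = class_metrics(tp=confusion[0][0], fp=confusion[1][0], fn=confusion[0][1])
# Class 1 (malignant): false negatives are malignant cases predicted benign.
malignant = class_metrics(tp=confusion[1][1], fp=confusion[0][1], fn=confusion[1][0])

print(round(accuracy, 4))
print([round(v, 2) for v in benign])
print([round(v, 2) for v in malignant])
```

For a diagnostic aid, the recall of the malignant class is the figure to watch, because each missed malignant case is a false negative.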
Three cases, although malignant, are predicted as benign.
• High accuracy.
• Supports the diagnosis.
The model performs equally well on both the test and training sets.
A two-dimensional plot shows excellent separation of benign and malignant cases.
Plotting three cases: factors influencing predictions
Plotting three cases: factors having no influence on predictions
Plotting two features at a time
 We also analyzed the case where only two of the features are available.
 The classifier was trained on two features at a time and the decision boundary was plotted.
 The model could predict the cases with reasonable accuracy.
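A rough sketch of the two-features-at-a-time experiment is shown below, again on scikit-learn's bundled copy of the diagnostic data. The three feature names, the forest size, the split and the seeds are arbitrary choices made to keep the run short; they are not the settings used for the slides. The grid prediction at the end is what a decision-boundary plot would be drawn from.

```python
from itertools import combinations

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
names = list(data.feature_names)

# An arbitrary handful of features (assumption), to keep the number of pairs small.
chosen = ["mean radius", "mean texture", "mean concavity"]

X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, stratify=data.target, random_state=0
)

# Train one classifier per pair of features and record its test accuracy.
pair_accuracy = {}
for a, b in combinations(chosen, 2):
    cols = [names.index(a), names.index(b)]
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit(X_train[:, cols], y_train)
    pair_accuracy[(a, b)] = clf.score(X_test[:, cols], y_test)

# Decision surface for the last pair: predict the class on a regular grid.
xs = np.linspace(X_train[:, cols[0]].min(), X_train[:, cols[0]].max(), 100)
ys = np.linspace(X_train[:, cols[1]].min(), X_train[:, cols[1]].max(), 100)
xx, yy = np.meshgrid(xs, ys)
zz = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

for pair, score in pair_accuracy.items():
    print(pair, round(score, 3))
```

Passing xx, yy and zz to matplotlib's pyplot.contourf would draw the decision regions, with the training points scattered on top.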
Results
BREAST-CANCER-WISCONSIN.DATA
Analysis: breast-cancer-wisconsin.data
 Training data is divided into 5 folds.
 Test data has 140 records
 Accuracy Score: 0.9643
Confusion Matrix:
                 Predicted Benign   Predicted Malignant
True Benign      92                 3
True Malignant   2                  43
Classification Report:
              Precision   Recall   f1-score   Support
0             0.98        0.97     0.97       95
1             0.93        0.96     0.95       45
avg / total   0.96        0.96     0.96       140
Two cases, although malignant, are predicted as benign.
The model performs equally well on both training and test data.
• High accuracy.
• Supports the diagnosis.
A two-dimensional plot shows excellent separation of benign and malignant cases.
Plotting two cases…
Plotting three cases…
Factors influencing predictions.
Plotting two features at a time
 The classifier was trained on two features at a time and the decision boundary was plotted.
 As expected, the classifier needs more than just two parameters to give accurate predictions.

Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 

ML Cancer Diagnosis

  • 1. Machine Learning for Breast Cancer Diagnosis A Proof of Concept P. K. SHARMA Email: from_pramod @yahoo.com
  • 2. Introduction
      - Machine learning is a branch of Data Science which incorporates a large set of statistical techniques.
      - These techniques enable data scientists to create a model which can learn from past data and detect patterns in massive, noisy and complex data sets.
      - Researchers use machine learning for cancer prediction and prognosis.
      - Machine learning allows inferences or decisions that otherwise cannot be made using conventional statistical methodologies.
      - With a robustly validated machine learning model, the chances of a correct diagnosis improve.
      - It especially helps in the interpretation of results for borderline cases.
  • 3. Breast Cancer: An overview
      - The most common cancer in women worldwide.
      - The principal cause of death from cancer among women globally.
      - Early detection is the most effective way to reduce breast cancer deaths.
      - Early diagnosis requires an accurate and reliable procedure to distinguish benign breast tumors from malignant ones.
      - Breast cancer types: there are three types of breast tumors: benign breast tumors, in-situ cancers, and invasive cancers.
      - The majority of breast tumors detected by mammography are benign.
      - They are non-cancerous growths and cannot spread outside of the breast to other organs.
      - In some cases, it is difficult to distinguish certain benign masses from malignant lesions with mammography.
      - If the malignant cells have not gone through the basal membrane but are completely contained in the lobule or the ducts, the cancer is called in-situ or noninvasive.
      - If the cancer has broken through the basal membrane and spread into the surrounding tissue, it is called invasive.
      - This analysis assists in differentiating between benign and malignant tumors.
  • 4. Data Source
      https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
      - The data used for this POC is from the University of Wisconsin.
      - Citation: This breast cancer database was obtained from the University of Wisconsin Hospitals, Madison, from Dr. William H. Wolberg.
      - References:
          o O. L. Mangasarian and W. H. Wolberg: "Cancer diagnosis via linear programming", SIAM News, Volume 23, Number 5, September 1990, pp 1 & 18.
          o William H. Wolberg and O. L. Mangasarian: "Multisurface method of pattern separation for medical diagnosis applied to breast cytology", Proceedings of the National Academy of Sciences, U.S.A., Volume 87, December 1990, pp 9193-9196.
          o O. L. Mangasarian, R. Setiono, and W. H. Wolberg: "Pattern recognition via linear programming: Theory and application to medical diagnosis", in: "Large-scale numerical optimization", Thomas F. Coleman and Yuying Li, editors, SIAM Publications, Philadelphia 1990, pp 22-30.
          o K. P. Bennett and O. L. Mangasarian: "Robust linear programming discrimination of two linearly inseparable sets", Optimization Methods and Software 1, 1992, 23-34 (Gordon & Breach Science Publishers).
  • 5. Data Files
      Data File Name                 Description File Name                                           # of records   # of attributes
      breast-cancer-wisconsin.data   breast-cancer-wisconsin.names                                   699            11
      unformatted-data               Data file with comments based on breast-cancer-wisconsin.data   699            11
      wdbc.data                      wdbc.names                                                      569            32
      wpbc.data                      wpbc.names                                                      198            34
      This case study analyzes breast-cancer-wisconsin.data and wdbc.data.
  • 6. Data Sets The data is in CSV format without any column headers. Columns are interpreted from the associated “names” files.
  • 7. Flow of Data
      Diagram labels: Biopsy Procedure; Measurements; Reports; Evaluation; Diagnosis; Analysis of measurements; Preparation of ML Models; Predictions and validation.
  • 8. Analysis
      - Input files: wdbc.data, breast-cancer-wisconsin.data
      - Data preparation: address missing data; split into training, testing and validation data
      - Classifier parameters: min_samples_leaf, n_estimators, min_samples_split, max_features
      - Lab setup:
          o Environment: Python, Linux
          o Libraries: scikit-learn, Pandas, SciPy, NumPy, IPython, Matplotlib, seaborn
          o Components: RandomForestClassifier, StratifiedKFold, train_test_split, GridSearchCV, learning_curve, pyplot, interp
      - Outputs: trained classifier, predictions
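The setup on slide 8 can be sketched roughly as follows. This is a minimal illustration, not the author's actual code: it assumes scikit-learn's bundled copy of the WDBC data (`load_breast_cancer`) in place of the raw `wdbc.data` file, and the grid values, the 80/20 split and the `random_state` seeds are all assumptions chosen for the example. Only the component names (`RandomForestClassifier`, `GridSearchCV`, `StratifiedKFold`, `train_test_split`) and the four tuned parameters come from the slide.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Assumption: scikit-learn's bundled WDBC copy stands in for wdbc.data.
# Note that it encodes 0 = malignant, 1 = benign, the reverse of the slides.
X, y = load_breast_cancer(return_X_y=True)

# Assumed 80/20 stratified split; on 569 records this holds out 114 for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# The four parameters named on the slide; the candidate values are illustrative.
param_grid = {
    "n_estimators": [50, 100],
    "max_features": ["sqrt", 0.5],
    "min_samples_leaf": [1, 3],
    "min_samples_split": [2],
}

# 5-fold stratified cross-validation over the training data.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=cv,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Evaluate the best estimator once on the held-out test set.
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print("best parameters:", search.best_params_)
print("cross-validated accuracy:", round(search.best_score_, 4))
print("test accuracy:", round(test_accuracy, 4))
```

Keeping the test set out of the grid search means the final accuracy is measured on records the tuning never saw.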
  • 10. Data Description: wdbc.data
      1. ID number
      2. Diagnosis (M = malignant, B = benign)
      3-32. Ten real-valued features are computed for each cell nucleus:
          a) radius (mean of distances from center to points on the perimeter)
          b) texture (standard deviation of gray-scale values)
          c) perimeter
          d) area
          e) smoothness (local variation in radius lengths)
          f) compactness (perimeter^2 / area - 1.0)
          g) concavity (severity of concave portions of the contour)
          h) concave points (number of concave portions of the contour)
          i) symmetry
          j) fractal dimension ("coastline approximation" - 1)
      - Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass.
      - They describe characteristics of the cell nuclei present in the image.
      - The mean, standard error, and "worst" or largest (mean of the three largest values) of these features were computed for each image, resulting in 30 features.
      - For instance, field 3 is Mean Radius, field 13 is Radius SE, and field 23 is Worst Radius.
      - All feature values are recorded with four significant digits.
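Because wdbc.data has no header row, the 32 column names have to be built from the field layout described above. A minimal sketch using only the standard library is shown below; the snake_case names and the helper functions `wdbc_column_names` and `parse_wdbc` are hypothetical, and the sample row is synthetic (placeholder numbers, not a real patient record).

```python
import csv
import io

# The ten per-nucleus measurements, in the order given in the data description.
BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]
# Each measurement appears three times: mean, standard error, worst.
STATISTICS = ["mean", "se", "worst"]


def wdbc_column_names():
    """Return the 32 column names: id, diagnosis, then 3 x 10 features."""
    names = ["id", "diagnosis"]
    for stat in STATISTICS:
        names.extend(f"{feature}_{stat}" for feature in BASE_FEATURES)
    return names


COLUMNS = wdbc_column_names()


def parse_wdbc(text):
    """Parse headerless CSV text into a list of dicts keyed by column name."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        record = dict(zip(COLUMNS, row))
        for name in COLUMNS[2:]:
            record[name] = float(record[name])
        records.append(record)
    return records


# Synthetic row for illustration only: the 30 feature values are 0.0 .. 29.0.
sample = "1,M," + ",".join(str(float(i)) for i in range(30)) + "\n"
records = parse_wdbc(sample)

# Fields 3, 13 and 23 (1-based) are the mean, SE and worst radius.
print(COLUMNS[2], COLUMNS[12], COLUMNS[22])
```

To read the real file, the same `COLUMNS` list could be passed to a CSV reader of choice in place of the synthetic sample.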
  • 11. wdbc.data
      - Mean Radius, Mean Perimeter and Mean Area appear to be helpful in classification.
      - The higher the value of each parameter, the greater the chance that the case is malignant.
  • 12. wdbc.data
      - Mean Concavity, Mean Concave Points, and Mean Compactness appear to be helpful in classification.
      - The higher the value of each parameter, the greater the chance that the case is malignant.
  • 13. wdbc.data
      - Mean Smoothness, Mean Texture, Mean Fractal Dimension, Mean Symmetry and Mean Compactness do not appear to influence classification.
      - Both types of cases are spread across the range of values.
  • 14. Data Description: breast-cancer-wisconsin.data
      - Missing attribute values: 16
      - There are 16 instances in Groups 1 to 6 that contain a single missing (i.e., unavailable) attribute value, now denoted by "?".
      #    Attribute                     Domain
      1.   Sample code number            id number
      2.   Clump Thickness               1 - 10
      3.   Uniformity of Cell Size       1 - 10
      4.   Uniformity of Cell Shape      1 - 10
      5.   Marginal Adhesion             1 - 10
      6.   Single Epithelial Cell Size   1 - 10
      7.   Bare Nuclei                   1 - 10
      8.   Bland Chromatin               1 - 10
      9.   Normal Nucleoli               1 - 10
      10.  Mitoses                       1 - 10
      11.  Class                         2 for benign, 4 for malignant
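One way to handle the "?" markers when reading this file is sketched below. The slides only say that missing data is addressed, not how, so dropping incomplete rows here is an assumption (imputation would be an equally valid choice). The column names, the `load_records` helper and the three sample rows are hypothetical and synthetic.

```python
import csv
import io

# Hypothetical snake_case names for the 11 attributes listed above.
COLUMNS = [
    "sample_code_number", "clump_thickness", "uniformity_of_cell_size",
    "uniformity_of_cell_shape", "marginal_adhesion",
    "single_epithelial_cell_size", "bare_nuclei", "bland_chromatin",
    "normal_nucleoli", "mitoses", "class",
]
# Class coding from the data description.
CLASS_LABELS = {"2": "benign", "4": "malignant"}


def load_records(text):
    """Split headerless CSV rows into complete and incomplete records.

    Assumption: rows containing "?" are set aside rather than imputed.
    """
    complete, incomplete = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        record = dict(zip(COLUMNS, row))
        if "?" in row:
            incomplete.append(record)
            continue
        parsed = {
            name: int(value)
            for name, value in record.items()
            if name not in ("sample_code_number", "class")
        }
        parsed["sample_code_number"] = record["sample_code_number"]
        parsed["class"] = CLASS_LABELS[record["class"]]
        complete.append(parsed)
    return complete, incomplete


# Synthetic rows for illustration only; the third has a missing value.
sample = (
    "1001,5,1,1,1,2,1,3,1,1,2\n"
    "1002,8,10,10,8,7,10,9,7,1,4\n"
    "1003,6,8,8,1,3,?,3,7,1,2\n"
)
complete, incomplete = load_records(sample)
print(len(complete), "complete rows,", len(incomplete), "with missing values")
```

With 16 of 699 records affected, dropping costs little data, but the choice should be revisited if the missing values turn out to be concentrated in one class.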
  • 15. breast-cancer-wisconsin.data
      - The features distinguish between benign and malignant cases fairly well.
  • 16. breast-cancer-wisconsin.data
      - The feature seems to distinguish between benign and malignant cases fairly well.
  • 17. breast-cancer-wisconsin.data
      - The feature seems to distinguish between benign and malignant cases fairly well.
  • 19. Analysis: wdbc.data
      - Training data is divided into 5 folds.
      - Test data has 114 records.
      - Accuracy score: 0.9561
      Confusion matrix:
                          Predicted Benign   Predicted Malignant
        True Benign       69                 2
        True Malignant    3                  40
      Classification report (0 = benign, 1 = malignant):
                      Precision   Recall   f1-score   Support
        0             0.96        0.97     0.97       71
        1             0.95        0.93     0.94       43
        avg / total   0.96        0.96     0.96       114
      - Three cases, although malignant, are predicted as benign.
      - High accuracy.
      - Supports the diagnosis.
      - The model performs equally well on both the test and training sets.
      - A two-dimensional plot shows excellent separation of benign and malignant cases.
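The accuracy, precision and recall figures reported for wdbc.data follow directly from the four confusion-matrix counts (69, 2, 3, 40). The sketch below recomputes them in plain Python; the function name `classification_metrics` is hypothetical, and malignant is treated as the positive class.

```python
def classification_metrics(tn, fp, fn, tp):
    """Derive per-class metrics from binary confusion-matrix counts.

    tn: benign predicted benign       fp: benign predicted malignant
    fn: malignant predicted benign    tp: malignant predicted malignant
    """
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "precision_benign": tn / (tn + fn),
        "recall_benign": tn / (tn + fp),
        "f1_benign": 2 * tn / (2 * tn + fn + fp),
        "precision_malignant": tp / (tp + fp),
        "recall_malignant": tp / (tp + fn),
        "f1_malignant": 2 * tp / (2 * tp + fp + fn),
    }


# Counts taken from the wdbc.data confusion matrix on the slide.
metrics = classification_metrics(tn=69, fp=2, fn=3, tp=40)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For a diagnostic setting, recall on the malignant class (40 of 43 here) is the figure to watch, since each miss is a malignant case reported as benign.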
  • 20. Plotting three cases… Factors influencing predictions.
  • 21. Plotting three cases: Factors having no influence on predictions…
  • 22. Plotting two features at a time
      - Cases where only two of the features are available were also analyzed.
      - The classifier was trained on two features at a time and the decision boundary was plotted.
      - The model could predict the cases with reasonable accuracy.
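Training on two features at a time can be sketched as below. This is an illustration under assumptions, not the slide's actual experiment: it uses scikit-learn's bundled WDBC copy, three arbitrarily chosen mean features, 50 trees and 5-fold cross-validated accuracy, and it omits the decision-boundary plot.

```python
from itertools import combinations

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Assumption: scikit-learn's bundled WDBC copy stands in for wdbc.data.
data = load_breast_cancer()
feature_names = list(data.feature_names)

# Illustrative choice of features; any subset of the 30 could be used.
chosen = ["mean radius", "mean concavity", "mean texture"]
column_index = {name: feature_names.index(name) for name in chosen}

pair_scores = {}
for first, second in combinations(chosen, 2):
    # Keep only the two selected columns.
    X_pair = data.data[:, [column_index[first], column_index[second]]]
    classifier = RandomForestClassifier(n_estimators=50, random_state=0)
    # Mean accuracy over 5 cross-validation folds.
    scores = cross_val_score(classifier, X_pair, data.target, cv=5)
    pair_scores[(first, second)] = scores.mean()

for pair, score in sorted(pair_scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 4))
```

Ranking the pairs this way gives a quick indication of which two measurements carry the most signal when the full feature set is unavailable.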
  • 24. Analysis: breast-cancer-wisconsin.data
      - Training data is divided into 5 folds.
      - Test data has 140 records.
      - Accuracy score: 0.9643
      Confusion matrix:
                          Predicted Benign   Predicted Malignant
        True Benign       92                 3
        True Malignant    2                  43
      Classification report (0 = benign, 1 = malignant):
                      Precision   Recall   f1-score   Support
        0             0.98        0.97     0.97       95
        1             0.93        0.96     0.95       45
        avg / total   0.96        0.96     0.96       140
      - Two cases, although malignant, are predicted as benign.
      - The model performs equally well on both the training and test data.
      - High accuracy.
      - Supports the diagnosis.
      - A two-dimensional plot shows excellent separation of benign and malignant cases.
  • 26. Plotting three cases… Factors influencing predictions.
  • 27. Plotting two features at a time
      - The classifier was trained on two features at a time and the decision boundary was plotted.
      - As expected, the classifier needs more than just two parameters to give accurate predictions.

Editor's notes

  1. A random forest classifier is used to build the model.