SlideShare une entreprise Scribd logo
1  sur  16
Leveraging Machine Learning for
Breast Cancer Prediction Presented by : Arwa Marfatia
Introduction
• Machine Learning technologies has a wide range of potential uses
in healthcare from improving patient data, medical research,
diagnosis and treatment, to reducing costs and making patient
safety more efficient.
• Breast Cancer is considered one of the most common cancers in
women caused by various clinical, lifestyle, social and economic
factors.
• Machine learning, with its predictive capabilities, offers a
transformative approach to understanding and predicting breast
cancer in patients.
Through data-driven insights and predictive modeling, this presentation aims to showcase my
Machine Learning Capstone Project focused on predicting breast cancer in the Healthcare
Sector.
Why Healthcare
Domain?
Machine learning provides an exciting opportunity in healthcare to improve the
accuracy of diagnoses, personalize healthcare, and find novel solutions to decades-
old problems.
Application of Machine Learning in Healthcare:
• Improve trauma-care response: By creating sensors and devices that can send a
patient’s vital information to the hospital before they arrive via ambulance or
other emergency transport, there is less time between when the patient arrives
and when they are able to receive life-saving treatment.
• Disease prediction: You can use machine learning to find trends, create
connections, and make conclusions based on large data sets. This can include
predicting disease outbreaks in communities and tracking habits leading to
patient disease.
• Visualization of biomedical data: You can use machine learning to create three-
dimensional visualisations of biomedical data such as RNA sequences, protein
structure, and genomic profiles.
• Improved diagnosis and disease identification: Identify previously
unrecognisable symptom patterns and compare them with larger data sets to
diagnose diseases earlier in their development.
Project’s Significance and
its Benefits to Healthcare
• Early Diagnosis: Combining multiple risk factors in modeling for breast cancer
prediction could help the early diagnosis of the disease with necessary care plans.
• Collection, storage, and management: of different data and intelligent systems based
on multiple factors for predicting breast cancer are effective in disease management.
• Visualization of biomedical data: You can use machine learning to create three-
dimensional visualizations of biomedical data such as RNA sequences, protein
structure, and genomic profiles.
• Improved diagnosis and disease identification: Identify previously unrecognisable
symptom patterns and compare them with larger data sets to diagnose diseases
earlier in their development.
Dataset
Information
Here are the key details about the dataset used in this project:
• Number of records: Our dataset comprises of a comparatively smaller collection
of data, consisting of 569 records. Each record represents a unique entry,
contributing to the richness and depth of our analysis.
• Features/Columns: The dataset is characterized by a diverse set of features.
Features are computed from a digitized image of a fine needle aspirate (FNA) of
a breast mass. They describe characteristics of the cell nuclei present in the
image. In total, there are 30 features/columns that form the basis of our
predictive modeling.
• Source of the Data: The dataset is sourced from Kaggle, ensuring reliability
and relevance. The data's origin plays a crucial role in shaping the context and
ensuring that our analysis is grounded in real-world scenarios and industry
dynamics.
Exploratory Data Analysis (EDA)
• Exploring the data allowed us to gain a comprehensive overview of the
data's structure. It uncovered potential patterns, helped us identify key
trends and get essential insights from the dataset.
• Throughout the EDA process, we analyzed the distribution of individual
features, investigated correlations, and explored any inherent
relationships between variables.
• Visualizations also played a crucial role in providing a clear
representation of the data, offering insights into breast cancer
prediction.
• First, we made sure there were no Null values and Duplicates in the dataset. There was only one
column with null values which was dropped since it only had null values. Our dataset was clean
to begin with.
• Then, we checked our columns to see if they were providing any useful information for us to
work with. We found out that columns like “ID” and “Unnamed 32” weren't contributing much
to the predictions. Hence, we decided to drop them during preprocessing.
• Some columns were highly correlated and could lead to overfitting and hence were dropped.
• To ensure consistent scales for numerical features, we decided to employ Standard Scaler
during preprocessing.
Exploratory Data Analysis (EDA)
Visualization
s
Our target variable ‘Diagnosis’ has 357 Benign
(Negative cases) and 212 Malignant (Positive
cases).
Upon inspecting the heatmap, we can see that there is multicollinearity observed among the
columns. As a result, some columns will be dropped.
Preprocessing
• First, “ID” and “Unnamed 32”columns were dropped as they didn’t provide any useful
information for our predictions.
• Since there is multicollinearity, columns with high correlations with other were
dropped.
• Then, we encoded the Categorical data into Numerical data with the help of Mapping.
It assigns binary numeric values to each unique class present in column with
categorical data.
Splitting the data into X and
y• In this step, we partitioned the dataset into two components: X and y.
• The variable X encompasses all independent variables, representing the features
that contribute to our predictions.
• On the other hand, y encapsulates the dependent variable or target variable,
serving as the outcome we aim to predict.
Train-Test Split
• We then split the dataset into training data and testing data.
• We did an 80:20 split, meaning 80% of our data is Training Data and 20% of our data is
Testing Data. So, our test size was set to 0.2.
• We took Random State as 40. This guaranteed the reproducibility of our results across
different runs.
• We also used Stratify = y to ensure that our Target Variable (y) is distributed
proportionally.
Standard Scaler
• We used Standard Scaler to standardize the features of the dataset.
• This ensured that the consistency between the features of the dataset was maintained.
• Standardization is crucial for certain machine learning algorithms, promoting optimal
model performance by mitigating the influence of varying magnitudes among features.
Applying Machine
Learning Algorithms
The, Breast Cancer Prediction problem, is a Binary Classification problem.
Models used:
• Logistic Regression : Logistic Regression is a powerful tool in binary classification. Its very good at
modeling the probability of an event occurring, making it suitable for scenarios where understanding the
likelihood of breast cancer cells is essential.
• Random Forest : It is based on the concept of ensemble learning, which is a process of combining multiple
classifiers to solve a complex problem and to improve the performance of the model.
• Decision Tree : A decision tree is a supervised learning algorithm that models decisions based on input
features.
• Support Vector Machine (SVC) : Support Vector Classification is a robust algorithm employed for
classification tasks, especially when there's a need for clear separation between classes.
• Naive Bayes : Naive Bayes is a probabilistic classification algorithm known for its simplicity and efficiency.
It assumes that features are independent, making calculations easier. Its often used when simplicity and
speed are crucial.
Model Selection and Considerations
• SVC outperforms Logistic Regression, Random Forest, Decision Tree and Naive Bayes in
all metrics, demonstrating higher Accuracy, Precision, Recall, and F1-Score. It seems to
be a promising model for our task.
• Based on the provided metrics, SVC stands out as the best-performing model overall. It
achieves a good balance between precision and recall, making it suitable for our Breast
Cancer prediction task.
• Hence, we will go with Support Vector Classification as our final model as it is quite
evident that it performs best for our Breast Cancer problem.
Conclusion
• With the help of several insights, patterns and trends in our data, we’ve used Machine
Learning to address the intricate challenge of predicting Breast Cancer.
• This project offers significant benefits to banks:
 Combining multiple risk factors in modeling for breast cancer prediction could help
the early diagnosis of the disease with necessary care plans.
 Collection, storage, and management of different data and intelligent systems
based on multiple factors for predicting breast cancer are effective in disease
management.
 The proposed machine-learning approaches could predict breast cancer as the
early detection of this disease could help slow down the progress of the disease and
reduce the mortality rate through appropriate therapeutic interventions at the
right time.
Thank You !

Contenu connexe

Similaire à Breast Cancer Prediction - Arwa Marfatia.pptx

Data Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationData Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationBoston Institute of Analytics
 
IRJET- Breast Cancer Relapse Prognosis by Classic and Modern Structures o...
IRJET-  	  Breast Cancer Relapse Prognosis by Classic and Modern Structures o...IRJET-  	  Breast Cancer Relapse Prognosis by Classic and Modern Structures o...
IRJET- Breast Cancer Relapse Prognosis by Classic and Modern Structures o...IRJET Journal
 
brain tumor presentation.pptxbraintumorpresentationonbraintumor
brain tumor presentation.pptxbraintumorpresentationonbraintumorbrain tumor presentation.pptxbraintumorpresentationonbraintumor
brain tumor presentation.pptxbraintumorpresentationonbraintumorNagavelliMadhavi
 
DataMining Techniques in BreastCancer.pptx
DataMining Techniques in BreastCancer.pptxDataMining Techniques in BreastCancer.pptx
DataMining Techniques in BreastCancer.pptxMaligireddyTanujaRed1
 
Breast Cancer Diagnostics with Bayesian Networks
Breast Cancer Diagnostics with Bayesian NetworksBreast Cancer Diagnostics with Bayesian Networks
Breast Cancer Diagnostics with Bayesian NetworksBayesia USA
 
HEALTH PREDICTION ANALYSIS USING DATA MINING
HEALTH PREDICTION ANALYSIS USING DATA  MININGHEALTH PREDICTION ANALYSIS USING DATA  MINING
HEALTH PREDICTION ANALYSIS USING DATA MININGAshish Salve
 
Breast Cancer Detection Using Machine Learning
Breast Cancer Detection Using Machine LearningBreast Cancer Detection Using Machine Learning
Breast Cancer Detection Using Machine LearningIRJET Journal
 
A Review on Breast Cancer Detection
A Review on Breast Cancer DetectionA Review on Breast Cancer Detection
A Review on Breast Cancer DetectionIRJET Journal
 
Classification AlgorithmBased Analysis of Breast Cancer Data
Classification AlgorithmBased Analysis of Breast Cancer DataClassification AlgorithmBased Analysis of Breast Cancer Data
Classification AlgorithmBased Analysis of Breast Cancer DataIIRindia
 
Breast Cancer Prediction
Breast Cancer PredictionBreast Cancer Prediction
Breast Cancer PredictionIRJET Journal
 
Simplified Knowledge Prediction: Application of Machine Learning in Real Life
Simplified Knowledge Prediction: Application of Machine Learning in Real LifeSimplified Knowledge Prediction: Application of Machine Learning in Real Life
Simplified Knowledge Prediction: Application of Machine Learning in Real LifePeea Bal Chakraborty
 
Breast Cancer Prediction using Machine Learning
Breast Cancer Prediction using Machine LearningBreast Cancer Prediction using Machine Learning
Breast Cancer Prediction using Machine LearningIRJET Journal
 
Classification of Breast Cancer Tissues using Decision Tree Algorithms
Classification of Breast Cancer Tissues using Decision Tree AlgorithmsClassification of Breast Cancer Tissues using Decision Tree Algorithms
Classification of Breast Cancer Tissues using Decision Tree AlgorithmsLovely Professional University
 
Classification of Breast Cancer Diseases using Data Mining Techniques
Classification of Breast Cancer Diseases using Data Mining TechniquesClassification of Breast Cancer Diseases using Data Mining Techniques
Classification of Breast Cancer Diseases using Data Mining Techniquesinventionjournals
 
first review.pptxgghggggvvvvbbvvvvvhhjjjbbvvvvbbbbbhhhhhhhhhbbh
first review.pptxgghggggvvvvbbvvvvvhhjjjbbvvvvbbbbbhhhhhhhhhbbhfirst review.pptxgghggggvvvvbbvvvvvhhjjjbbvvvvbbbbbhhhhhhhhhbbh
first review.pptxgghggggvvvvbbvvvvvhhjjjbbvvvvbbbbbhhhhhhhhhbbhmithun302002
 
Key Insights Of Using Deep Learning To Analyze Healthcare Data | Workshop Fro...
Key Insights Of Using Deep Learning To Analyze Healthcare Data | Workshop Fro...Key Insights Of Using Deep Learning To Analyze Healthcare Data | Workshop Fro...
Key Insights Of Using Deep Learning To Analyze Healthcare Data | Workshop Fro...Michael Batavia
 
IRJET - A Conceptual Method for Breast Tumor Classification using SHAP Values ...
IRJET - A Conceptual Method for Breast Tumor Classification using SHAP Values ...IRJET - A Conceptual Method for Breast Tumor Classification using SHAP Values ...
IRJET - A Conceptual Method for Breast Tumor Classification using SHAP Values ...IRJET Journal
 
IRJET- Breast Cancer Disease Prediction : Using Machine Learning Approach
IRJET- Breast Cancer Disease Prediction : Using Machine Learning ApproachIRJET- Breast Cancer Disease Prediction : Using Machine Learning Approach
IRJET- Breast Cancer Disease Prediction : Using Machine Learning ApproachIRJET Journal
 
Statistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptxStatistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptxrajalakshmi5921
 

Similaire à Breast Cancer Prediction - Arwa Marfatia.pptx (20)

Data Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationData Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health Classification
 
IRJET- Breast Cancer Relapse Prognosis by Classic and Modern Structures o...
IRJET-  	  Breast Cancer Relapse Prognosis by Classic and Modern Structures o...IRJET-  	  Breast Cancer Relapse Prognosis by Classic and Modern Structures o...
IRJET- Breast Cancer Relapse Prognosis by Classic and Modern Structures o...
 
brain tumor presentation.pptxbraintumorpresentationonbraintumor
brain tumor presentation.pptxbraintumorpresentationonbraintumorbrain tumor presentation.pptxbraintumorpresentationonbraintumor
brain tumor presentation.pptxbraintumorpresentationonbraintumor
 
DataMining Techniques in BreastCancer.pptx
DataMining Techniques in BreastCancer.pptxDataMining Techniques in BreastCancer.pptx
DataMining Techniques in BreastCancer.pptx
 
Breast Cancer Diagnostics with Bayesian Networks
Breast Cancer Diagnostics with Bayesian NetworksBreast Cancer Diagnostics with Bayesian Networks
Breast Cancer Diagnostics with Bayesian Networks
 
HEALTH PREDICTION ANALYSIS USING DATA MINING
HEALTH PREDICTION ANALYSIS USING DATA  MININGHEALTH PREDICTION ANALYSIS USING DATA  MINING
HEALTH PREDICTION ANALYSIS USING DATA MINING
 
Breast Cancer Detection Using Machine Learning
Breast Cancer Detection Using Machine LearningBreast Cancer Detection Using Machine Learning
Breast Cancer Detection Using Machine Learning
 
A Review on Breast Cancer Detection
A Review on Breast Cancer DetectionA Review on Breast Cancer Detection
A Review on Breast Cancer Detection
 
Comparison of breast cancer classification models on Wisconsin dataset
Comparison of breast cancer classification models on Wisconsin  datasetComparison of breast cancer classification models on Wisconsin  dataset
Comparison of breast cancer classification models on Wisconsin dataset
 
Classification AlgorithmBased Analysis of Breast Cancer Data
Classification AlgorithmBased Analysis of Breast Cancer DataClassification AlgorithmBased Analysis of Breast Cancer Data
Classification AlgorithmBased Analysis of Breast Cancer Data
 
Breast Cancer Prediction
Breast Cancer PredictionBreast Cancer Prediction
Breast Cancer Prediction
 
Simplified Knowledge Prediction: Application of Machine Learning in Real Life
Simplified Knowledge Prediction: Application of Machine Learning in Real LifeSimplified Knowledge Prediction: Application of Machine Learning in Real Life
Simplified Knowledge Prediction: Application of Machine Learning in Real Life
 
Breast Cancer Prediction using Machine Learning
Breast Cancer Prediction using Machine LearningBreast Cancer Prediction using Machine Learning
Breast Cancer Prediction using Machine Learning
 
Classification of Breast Cancer Tissues using Decision Tree Algorithms
Classification of Breast Cancer Tissues using Decision Tree AlgorithmsClassification of Breast Cancer Tissues using Decision Tree Algorithms
Classification of Breast Cancer Tissues using Decision Tree Algorithms
 
Classification of Breast Cancer Diseases using Data Mining Techniques
Classification of Breast Cancer Diseases using Data Mining TechniquesClassification of Breast Cancer Diseases using Data Mining Techniques
Classification of Breast Cancer Diseases using Data Mining Techniques
 
first review.pptxgghggggvvvvbbvvvvvhhjjjbbvvvvbbbbbhhhhhhhhhbbh
first review.pptxgghggggvvvvbbvvvvvhhjjjbbvvvvbbbbbhhhhhhhhhbbhfirst review.pptxgghggggvvvvbbvvvvvhhjjjbbvvvvbbbbbhhhhhhhhhbbh
first review.pptxgghggggvvvvbbvvvvvhhjjjbbvvvvbbbbbhhhhhhhhhbbh
 
Key Insights Of Using Deep Learning To Analyze Healthcare Data | Workshop Fro...
Key Insights Of Using Deep Learning To Analyze Healthcare Data | Workshop Fro...Key Insights Of Using Deep Learning To Analyze Healthcare Data | Workshop Fro...
Key Insights Of Using Deep Learning To Analyze Healthcare Data | Workshop Fro...
 
IRJET - A Conceptual Method for Breast Tumor Classification using SHAP Values ...
IRJET - A Conceptual Method for Breast Tumor Classification using SHAP Values ...IRJET - A Conceptual Method for Breast Tumor Classification using SHAP Values ...
IRJET - A Conceptual Method for Breast Tumor Classification using SHAP Values ...
 
IRJET- Breast Cancer Disease Prediction : Using Machine Learning Approach
IRJET- Breast Cancer Disease Prediction : Using Machine Learning ApproachIRJET- Breast Cancer Disease Prediction : Using Machine Learning Approach
IRJET- Breast Cancer Disease Prediction : Using Machine Learning Approach
 
Statistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptxStatistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptx
 

Plus de Boston Institute of Analytics

Decoding Loan Approval with Predictive Modeling in Action Discovering Weaknes...
Decoding Loan Approval with Predictive Modeling in Action Discovering Weaknes...Decoding Loan Approval with Predictive Modeling in Action Discovering Weaknes...
Decoding Loan Approval with Predictive Modeling in Action Discovering Weaknes...Boston Institute of Analytics
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
NLP Based project presentation: Analyzing Automobile Prices
NLP Based project presentation: Analyzing Automobile PricesNLP Based project presentation: Analyzing Automobile Prices
NLP Based project presentation: Analyzing Automobile PricesBoston Institute of Analytics
 
Combating Fraudulent Transactions: A Deep Dive into Credit Card Fraud Detection
Combating Fraudulent Transactions: A Deep Dive into Credit Card Fraud DetectionCombating Fraudulent Transactions: A Deep Dive into Credit Card Fraud Detection
Combating Fraudulent Transactions: A Deep Dive into Credit Card Fraud DetectionBoston Institute of Analytics
 
Predicting Liver Disease in India: A Machine Learning Approach
Predicting Liver Disease in India: A Machine Learning ApproachPredicting Liver Disease in India: A Machine Learning Approach
Predicting Liver Disease in India: A Machine Learning ApproachBoston Institute of Analytics
 
Employee Churn Prediction: Artificial Intelligence Project Presentation
Employee Churn Prediction: Artificial Intelligence Project PresentationEmployee Churn Prediction: Artificial Intelligence Project Presentation
Employee Churn Prediction: Artificial Intelligence Project PresentationBoston Institute of Analytics
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationBoston Institute of Analytics
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
Nmap project presentation : Unlocking Network Secrets: Mastering Port Scannin...
Nmap project presentation : Unlocking Network Secrets: Mastering Port Scannin...Nmap project presentation : Unlocking Network Secrets: Mastering Port Scannin...
Nmap project presentation : Unlocking Network Secrets: Mastering Port Scannin...Boston Institute of Analytics
 
Cyber Security Project Presentation : Essential Reconnaissance Tools and Tech...
Cyber Security Project Presentation : Essential Reconnaissance Tools and Tech...Cyber Security Project Presentation : Essential Reconnaissance Tools and Tech...
Cyber Security Project Presentation : Essential Reconnaissance Tools and Tech...Boston Institute of Analytics
 
Identifying and Eradicating Web Application Vulnerabilities : Cyber Security ...
Identifying and Eradicating Web Application Vulnerabilities : Cyber Security ...Identifying and Eradicating Web Application Vulnerabilities : Cyber Security ...
Identifying and Eradicating Web Application Vulnerabilities : Cyber Security ...Boston Institute of Analytics
 
Cyber Security Project Presentation: Unveiling Reconnaissance Tools and Techn...
Cyber Security Project Presentation: Unveiling Reconnaissance Tools and Techn...Cyber Security Project Presentation: Unveiling Reconnaissance Tools and Techn...
Cyber Security Project Presentation: Unveiling Reconnaissance Tools and Techn...Boston Institute of Analytics
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 

Plus de Boston Institute of Analytics (20)

Decoding Loan Approval with Predictive Modeling in Action Discovering Weaknes...
Decoding Loan Approval with Predictive Modeling in Action Discovering Weaknes...Decoding Loan Approval with Predictive Modeling in Action Discovering Weaknes...
Decoding Loan Approval with Predictive Modeling in Action Discovering Weaknes...
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
NLP Based project presentation: Analyzing Automobile Prices
NLP Based project presentation: Analyzing Automobile PricesNLP Based project presentation: Analyzing Automobile Prices
NLP Based project presentation: Analyzing Automobile Prices
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
Analyzing Movie Reviews : Machine learning project
Analyzing Movie Reviews : Machine learning projectAnalyzing Movie Reviews : Machine learning project
Analyzing Movie Reviews : Machine learning project
 
Combating Fraudulent Transactions: A Deep Dive into Credit Card Fraud Detection
Combating Fraudulent Transactions: A Deep Dive into Credit Card Fraud DetectionCombating Fraudulent Transactions: A Deep Dive into Credit Card Fraud Detection
Combating Fraudulent Transactions: A Deep Dive into Credit Card Fraud Detection
 
Predicting Liver Disease in India: A Machine Learning Approach
Predicting Liver Disease in India: A Machine Learning ApproachPredicting Liver Disease in India: A Machine Learning Approach
Predicting Liver Disease in India: A Machine Learning Approach
 
Employee Churn Prediction: Artificial Intelligence Project Presentation
Employee Churn Prediction: Artificial Intelligence Project PresentationEmployee Churn Prediction: Artificial Intelligence Project Presentation
Employee Churn Prediction: Artificial Intelligence Project Presentation
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Nmap project presentation : Unlocking Network Secrets: Mastering Port Scannin...
Nmap project presentation : Unlocking Network Secrets: Mastering Port Scannin...Nmap project presentation : Unlocking Network Secrets: Mastering Port Scannin...
Nmap project presentation : Unlocking Network Secrets: Mastering Port Scannin...
 
Cyber Security Project Presentation : Essential Reconnaissance Tools and Tech...
Cyber Security Project Presentation : Essential Reconnaissance Tools and Tech...Cyber Security Project Presentation : Essential Reconnaissance Tools and Tech...
Cyber Security Project Presentation : Essential Reconnaissance Tools and Tech...
 
Identifying and Eradicating Web Application Vulnerabilities : Cyber Security ...
Identifying and Eradicating Web Application Vulnerabilities : Cyber Security ...Identifying and Eradicating Web Application Vulnerabilities : Cyber Security ...
Identifying and Eradicating Web Application Vulnerabilities : Cyber Security ...
 
Cyber Security Project Presentation: Unveiling Reconnaissance Tools and Techn...
Cyber Security Project Presentation: Unveiling Reconnaissance Tools and Techn...Cyber Security Project Presentation: Unveiling Reconnaissance Tools and Techn...
Cyber Security Project Presentation: Unveiling Reconnaissance Tools and Techn...
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 

Dernier

Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerunnathinaik
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsKarinaGenton
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaVirag Sontakke
 

Dernier (20)

Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developer
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
 

Breast Cancer Prediction - Arwa Marfatia.pptx

  • 1.
  • 2. Leveraging Machine Learning for Breast Cancer Prediction Presented by : Arwa Marfatia
  • 3. Introduction • Machine Learning technologies has a wide range of potential uses in healthcare from improving patient data, medical research, diagnosis and treatment, to reducing costs and making patient safety more efficient. • Breast Cancer is considered one of the most common cancers in women caused by various clinical, lifestyle, social and economic factors. • Machine learning, with its predictive capabilities, offers a transformative approach to understanding and predicting breast cancer in patients. Through data-driven insights and predictive modeling, this presentation aims to showcase my Machine Learning Capstone Project focused on predicting breast cancer in the Healthcare Sector.
  • 4. Why Healthcare Domain? Machine learning provides an exciting opportunity in healthcare to improve the accuracy of diagnoses, personalize healthcare, and find novel solutions to decades- old problems. Application of Machine Learning in Healthcare: • Improve trauma-care response: By creating sensors and devices that can send a patient’s vital information to the hospital before they arrive via ambulance or other emergency transport, there is less time between when the patient arrives and when they are able to receive life-saving treatment. • Disease prediction: You can use machine learning to find trends, create connections, and make conclusions based on large data sets. This can include predicting disease outbreaks in communities and tracking habits leading to patient disease. • Visualization of biomedical data: You can use machine learning to create three- dimensional visualisations of biomedical data such as RNA sequences, protein structure, and genomic profiles. • Improved diagnosis and disease identification: Identify previously unrecognisable symptom patterns and compare them with larger data sets to diagnose diseases earlier in their development.
  • 5. Project’s Significance and its Benefits to Healthcare • Early Diagnosis: Combining multiple risk factors in modeling for breast cancer prediction could help the early diagnosis of the disease with necessary care plans. • Collection, storage, and management: of different data and intelligent systems based on multiple factors for predicting breast cancer are effective in disease management. • Visualization of biomedical data: You can use machine learning to create three- dimensional visualizations of biomedical data such as RNA sequences, protein structure, and genomic profiles. • Improved diagnosis and disease identification: Identify previously unrecognisable symptom patterns and compare them with larger data sets to diagnose diseases earlier in their development.
  • 6. Dataset Information Here are the key details about the dataset used in this project: • Number of records: Our dataset comprises of a comparatively smaller collection of data, consisting of 569 records. Each record represents a unique entry, contributing to the richness and depth of our analysis. • Features/Columns: The dataset is characterized by a diverse set of features. Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image. In total, there are 30 features/columns that form the basis of our predictive modeling. • Source of the Data: The dataset is sourced from Kaggle, ensuring reliability and relevance. The data's origin plays a crucial role in shaping the context and ensuring that our analysis is grounded in real-world scenarios and industry dynamics.
  • 7. Exploratory Data Analysis (EDA) • Exploring the data allowed us to gain a comprehensive overview of the data's structure. It uncovered potential patterns, helped us identify key trends and get essential insights from the dataset. • Throughout the EDA process, we analyzed the distribution of individual features, investigated correlations, and explored any inherent relationships between variables. • Visualizations also played a crucial role in providing a clear representation of the data, offering insights into breast cancer prediction.
  • 8. • First, we made sure there were no Null values and Duplicates in the dataset. There was only one column with null values which was dropped since it only had null values. Our dataset was clean to begin with. • Then, we checked our columns to see if they were providing any useful information for us to work with. We found out that columns like “ID” and “Unnamed 32” weren't contributing much to the predictions. Hence, we decided to drop them during preprocessing. • Some columns were highly correlated and could lead to overfitting and hence were dropped. • To ensure consistent scales for numerical features, we decided to employ Standard Scaler during preprocessing. Exploratory Data Analysis (EDA)
  • 9. Visualization s Our target variable ‘Diagnosis’ has 357 Benign (Negative cases) and 212 Malignant (Positive cases).
  • 10. Upon inspecting the heatmap, we can see that there is multicollinearity observed among the columns. As a result, some columns will be dropped.
  • 11. Preprocessing • First, “ID” and “Unnamed 32”columns were dropped as they didn’t provide any useful information for our predictions. • Since there is multicollinearity, columns with high correlations with other were dropped. • Then, we encoded the Categorical data into Numerical data with the help of Mapping. It assigns binary numeric values to each unique class present in column with categorical data. Splitting the data into X and y• In this step, we partitioned the dataset into two components: X and y. • The variable X encompasses all independent variables, representing the features that contribute to our predictions. • On the other hand, y encapsulates the dependent variable or target variable, serving as the outcome we aim to predict.
  • 12. Train-Test Split • We then split the dataset into training data and testing data. • We did an 80:20 split, meaning 80% of our data is Training Data and 20% of our data is Testing Data. So, our test size was set to 0.2. • We took Random State as 40. This guaranteed the reproducibility of our results across different runs. • We also used Stratify = y to ensure that our Target Variable (y) is distributed proportionally. Standard Scaler • We used Standard Scaler to standardize the features of the dataset. • This ensured that the consistency between the features of the dataset was maintained. • Standardization is crucial for certain machine learning algorithms, promoting optimal model performance by mitigating the influence of varying magnitudes among features.
  • 13. Applying Machine Learning Algorithms The, Breast Cancer Prediction problem, is a Binary Classification problem. Models used: • Logistic Regression : Logistic Regression is a powerful tool in binary classification. Its very good at modeling the probability of an event occurring, making it suitable for scenarios where understanding the likelihood of breast cancer cells is essential. • Random Forest : It is based on the concept of ensemble learning, which is a process of combining multiple classifiers to solve a complex problem and to improve the performance of the model. • Decision Tree : A decision tree is a supervised learning algorithm that models decisions based on input features. • Support Vector Machine (SVC) : Support Vector Classification is a robust algorithm employed for classification tasks, especially when there's a need for clear separation between classes. • Naive Bayes : Naive Bayes is a probabilistic classification algorithm known for its simplicity and efficiency. It assumes that features are independent, making calculations easier. Its often used when simplicity and speed are crucial.
  • 14. Model Selection and Considerations • SVC outperforms Logistic Regression, Random Forest, Decision Tree and Naive Bayes in all metrics, demonstrating higher Accuracy, Precision, Recall, and F1-Score. It seems to be a promising model for our task. • Based on the provided metrics, SVC stands out as the best-performing model overall. It achieves a good balance between precision and recall, making it suitable for our Breast Cancer prediction task. • Hence, we will go with Support Vector Classification as our final model as it is quite evident that it performs best for our Breast Cancer problem.
  • 15. Conclusion • With the help of several insights, patterns and trends in our data, we’ve used Machine Learning to address the intricate challenge of predicting Breast Cancer. • This project offers significant benefits to banks:  Combining multiple risk factors in modeling for breast cancer prediction could help the early diagnosis of the disease with necessary care plans.  Collection, storage, and management of different data and intelligent systems based on multiple factors for predicting breast cancer are effective in disease management.  The proposed machine-learning approaches could predict breast cancer as the early detection of this disease could help slow down the progress of the disease and reduce the mortality rate through appropriate therapeutic interventions at the right time.