Delve into the world of e-commerce order prediction and discover how data science is revolutionizing inventory management and customer satisfaction. Learn how predictive analytics can forecast future orders, optimize inventory levels, and enhance the overall shopping experience.
E-Commerce Order Prediction
By Shraddha Kamble
What is Churn Analysis?
• Customer churn is the rate at which customers leave a platform or service, and churn analysis is the study of that rate. There are usually two kinds of churn.
• Voluntary churn: the customer chooses to stop subscribing, for example because they got a better deal elsewhere or had a dissatisfying experience.
• Involuntary churn: the customer leaves the platform unintentionally, for example through a payment failure because a credit card is maxed out.
• Some churn is expected, but high churn can seriously hurt a company's bottom line. Churn also reveals the net consumer perception of the company, which is an important factor in its long-term growth and sustainability. It is often intuitively clear whether churn is significant, but without key data points to draw actionable insights from, it remains a guessing game. Several factors influence churn, and understanding each gives a solid idea of what to do next.
Project Contents
• Importing libraries, loading and understanding the data
• Data Preprocessing
• Data Visualization
• Model building
• Logistic Regression
• Random Forest Classifier
• Decision Tree Classifier
• XGBoost Classifier
• Conclusion
Importing libraries, loading and understanding the data
• We will be using the following libraries:
1) pandas
2) NumPy
3) seaborn
4) matplotlib.pyplot
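In code, these are typically pulled in with their conventional aliases:

```python
# Core data-analysis stack used throughout the project
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
```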
• Loading the Dataset
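A minimal sketch of loading the data with pandas. The filename and column names below are stand-ins for the actual dataset; a tiny inline sample is used here so the snippet is self-contained:

```python
import io
import pandas as pd

# In the project this would be pd.read_csv("ecommerce_churn.csv");
# a small inline sample with hypothetical columns stands in for the file.
csv_data = io.StringIO(
    "CustomerID,Tenure,OrderCount,Churn\n"
    "1,10,5,0\n"
    "2,1,1,1\n"
)
df = pd.read_csv(csv_data)
print(df.shape)   # (rows, columns)
print(df.head())  # first rows for a quick look
```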
Exploratory Data Analysis & Pre-processing
• info() –
The info() method prints a concise summary of the DataFrame: each column's name, its non-null count, and its dtype.
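For example, on a small frame with a missing value (column names are illustrative):

```python
import pandas as pd

# Small illustrative frame; column names are hypothetical
df = pd.DataFrame({"Tenure": [10.0, 1.0, None], "Churn": [0, 1, 0]})
df.info()  # prints column names, non-null counts, and dtypes
```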
• Statistical summary –
Count: the number of non-null values in each column.
Mean: the average value of each column.
Standard deviation (std): how much individual data points deviate from the mean.
Minimum (min): the smallest value in each column.
25th percentile (25%): also known as the first quartile, the value below which 25% of the data falls.
Median (50%): also known as the second quartile, the middle value when the data is sorted; it represents the central tendency.
75th percentile (75%): also known as the third quartile, the value below which 75% of the data falls.
Maximum (max): the largest value in each column.
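These statistics come from pandas' describe() method; a small worked example:

```python
import pandas as pd

df = pd.DataFrame({"Tenure": [1, 5, 10, 20]})
summary = df.describe()
print(summary)
# count 4, mean 9.0, min 1, median (50%) 7.5, max 20
```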
Correlation Heatmap -
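A heatmap of pairwise correlations can be drawn with seaborn; the column names below are placeholders for the dataset's numeric features:

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend; use plt.show() in a notebook
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 3)),
                  columns=["Tenure", "OrderCount", "CashbackAmount"])
corr = df.corr()                            # pairwise Pearson correlations
sns.heatmap(corr, annot=True, cmap="coolwarm")
plt.title("Correlation Heatmap")
plt.close()
```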
Checking for Class Imbalance
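Class balance of the target is usually checked with value_counts(); the Churn column below is a synthetic stand-in:

```python
import pandas as pd

# Hypothetical binary target: 1 = churned, 0 = retained
y = pd.Series([0] * 83 + [1] * 17, name="Churn")
print(y.value_counts())                # 0: 83, 1: 17
print(y.value_counts(normalize=True))  # 0: 0.83, 1: 0.17 -> imbalanced
```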
Handling Outliers
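One common convention (not the only one) is to cap values lying outside 1.5 × IQR of the quartiles:

```python
import pandas as pd

def cap_outliers_iqr(s: pd.Series) -> pd.Series:
    """Clip values outside 1.5*IQR beyond the quartiles."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return s.clip(lower, upper)

s = pd.Series([1, 2, 2, 3, 3, 4, 100])  # 100 is an obvious outlier
capped = cap_outliers_iqr(s)
print(capped.max())  # 5.75 -- the outlier is pulled back to the upper fence
```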
Univariate Analysis
Bivariate Analysis
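Bivariate analysis relates two variables, e.g. a boxplot of a feature against the churn label (synthetic data again):

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(2)
df = pd.DataFrame({
    "Churn": rng.integers(0, 2, 200),   # synthetic target
    "Tenure": rng.integers(0, 36, 200), # synthetic feature
})
sns.boxplot(x="Churn", y="Tenure", data=df)  # feature split by class
plt.close()
```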
Standardization
Standardization is a form of feature scaling (sometimes loosely called normalization), a preprocessing technique commonly used in machine learning to bring all features to a similar scale. This helps algorithms perform better by ensuring that no single feature dominates the learning process merely because of its larger magnitude. Standardization is particularly important for algorithms that rely on distances or gradients, such as k-nearest neighbors, support vector machines, and gradient-descent-based optimization algorithms.
The goal of standardization is to transform the features so that they have a mean of 0 and a standard deviation of 1. This transformation does not change the shape of the data's distribution; it simply scales and shifts the data to make it more suitable for modeling.
Data Splitting
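A sketch of both steps with scikit-learn, on synthetic stand-in data. Note that the scaler is fit on the training split only, to avoid leaking test-set statistics:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X = rng.normal(loc=50, scale=10, size=(100, 2))  # stand-in features
y = rng.integers(0, 2, 100)                      # stand-in churn labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Fit the scaler on the training split only, then apply to both splits
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0).round(6))  # ~[0, 0]
print(X_train_scaled.std(axis=0).round(6))   # ~[1, 1]
```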
Model Building
• We will now build the following models:
• Logistic Regression
• Random Forest Classifier
• Decision Tree Classifier
• XGBoost Classifier
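A minimal comparison loop on synthetic data (the real project would use the preprocessed churn features). XGBoost lives in the separate `xgboost` package, so it is left as a comment here:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # synthetic, nearly separable target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    # XGBoost requires the xgboost package:
    # from xgboost import XGBClassifier; "XGBoost": XGBClassifier()
}
for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {acc:.3f}")
```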
Feature Importance
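Tree ensembles expose a `feature_importances_` attribute; with a synthetic target driven by one feature, that feature should dominate (the names below are hypothetical):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(5)
feature_names = ["Tenure", "OrderCount", "CashbackAmount", "Complain"]  # hypothetical
X = rng.normal(size=(300, 4))
y = (X[:, 0] > 0).astype(int)  # only the first feature drives the target

model = RandomForestClassifier(random_state=42).fit(X, y)
importances = pd.Series(model.feature_importances_, index=feature_names)
print(importances.sort_values(ascending=False))  # "Tenure" should rank first
```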
Conclusion
• E-Commerce Order Prediction is easily one of the most practical and widespread use cases of machine learning in everyday businesses. Being able to analyse why customers leave, and to predict which customers are likely to leave, makes decision making much easier.
• We explored and analysed an e-commerce dataset, ran several classification algorithms with scikit-learn's Pipeline, and used GridSearchCV for hyperparameter tuning to find the best algorithm with the best set of parameters. Finally, we identified the features that had the most influence on the prediction. So, this was all about E-Commerce Order Prediction.
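The Pipeline-plus-GridSearchCV pattern mentioned above can be sketched as follows (synthetic data and an illustrative parameter grid):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 3))
y = (X[:, 0] - X[:, 2] > 0).astype(int)  # synthetic target

# Scaling and the classifier are chained, so cross-validation
# refits the scaler on each training fold (no leakage).
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
param_grid = {"clf__C": [0.01, 0.1, 1, 10]}  # illustrative grid
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```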