Classification of Spotify song popularity
A PROJECT ON MACHINE LEARNING
Contents
Problem Statement
Dataset Description
Methodology
Key Findings
Conclusion
Problem Statement
Spotify is one of the most popular platforms for listening to songs and podcasts.
Spotify uses a recommendation engine to recommend tracks in the Discover Weekly section according to the listener's preferences and track popularity.
Of these two factors, track popularity is the more important one for the recommendation engine, because it also reflects the broader preferences of listeners, which depend on several variables and on the track's heyday period.
Dataset Description
114,000 rows and 21 columns
Target Variable
Popularity – This measure differs between older and recently released songs because Spotify reshuffles it according to monthly listeners. It is a multiclass variable with 3 categories.
Only the variables that affect the target variable are used; the other variables are not taken into consideration.
Key variables
Independent variables (continuous and categorical):
• Valence% – The positiveness of the song. Higher values sound cheerful and euphoric; lower values sound depressing and sad.
• Danceability% – How suitable the song is for dancing.
• Energy% – The amount of energy the song has.
• Acousticness% – The extent to which the track uses natural (acoustic) instruments rather than electronically made sounds.
• Key – The musical key of the track, encoded as 0 = C, 1 = C#, and so on. There are 12 keys in total.
• Tempo – The speed of the song. The higher the tempo, the faster the song, and vice versa.
• Duration – The length of the song in seconds.
• Speechiness – The amount of vocals/voices present in the song.
Methodology
• Cleaning and preparing the data.
• Feature engineering.
• Primary model building.
• Reconsidering the variables and tuning the model.
• Building the model using different algorithms to check the stability of the prediction.
EDA Report
• Duplicate values, null values and typographical errors were present in the data.
• The data contained extreme outliers, which were treated by converting the values into categories while keeping the classes balanced.
• Feature engineering such as clubbing (merging categories), binning and rounding was applied to reduce the number of classes in the data.
• The target variable “Popularity” was initially a percentage (0-100%). However, the original data description says that this is a classification problem, so the target was converted from a regression target to a multiclass one.
• The target variable was imbalanced. An oversampling technique was used to balance the classes.
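The slides do not say which oversampling technique was used. As a minimal sketch, assuming plain random oversampling (duplicating minority-class rows until every class matches the largest one), the balancing step might look like this. The function name `random_oversample` and the toy rows are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes are the same size.

    Assumption: simple random oversampling; the deck does not name its technique.
    """
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is grown to the size of the largest class.
    target_size = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target_size - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy example with made-up feature rows (not taken from the dataset).
rows = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6]]
labels = ["high", "low", "low", "low", "zero", "zero"]
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

In practice, oversampling is usually applied to the training split only, so that duplicated rows do not also appear in the test data.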
Conversion of the target variable from a percentage into three categories
The histogram of the target column shows a peak at 0, which represents tracks with no popularity. Zero is therefore assigned its own class, because it would otherwise affect the accuracy of the model. The new classes are ‘zero popularity’, ‘low popularity’ and ‘high popularity’.
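The conversion described above can be sketched as a small mapping function. The slides give the three class names but not the cutoff between low and high popularity, so `high_cutoff=50` below is a purely hypothetical value chosen for illustration.

```python
def popularity_class(score, high_cutoff=50):
    """Map a 0-100 popularity score to one of the three classes.

    high_cutoff is a hypothetical threshold; the deck does not state
    where 'low popularity' ends and 'high popularity' begins.
    """
    if not 0 <= score <= 100:
        raise ValueError("popularity score must be between 0 and 100")
    if score == 0:
        # The peak at 0 gets its own class.
        return "zero popularity"
    if score >= high_cutoff:
        return "high popularity"
    return "low popularity"


# Made-up scores for illustration.
for score in (0, 23, 71):
    print(score, "->", popularity_class(score))
```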
Algorithms report
Accuracy does not fluctuate much across the different algorithms, which indicates that the prediction is stable.
Highest accuracy = 85.94%
Lowest accuracy = 79.7%
Algorithm-wise accuracy:
• Random Forest Classifier = 85.94%
• Decision Tree Classifier = 79.7%
• CatBoost Classifier = 80.3%
• XGBoost Classifier = 82.28%
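One way to make the stability claim concrete is to look at the spread between the best and worst reported accuracies. The short sketch below uses only the four figures listed above; treating the max-min spread as the stability measure is an assumption, not something the slides specify.

```python
# Accuracies (in percent) as reported in the slide.
accuracies = {
    "Random Forest": 85.94,
    "Decision Tree": 79.7,
    "CatBoost": 80.3,
    "XGBoost": 82.28,
}

best = max(accuracies, key=accuracies.get)
worst = min(accuracies, key=accuracies.get)

# Spread between the best and worst model, in percentage points.
spread = accuracies[best] - accuracies[worst]
mean_accuracy = sum(accuracies.values()) / len(accuracies)

print(f"best: {best}, worst: {worst}")
print(f"spread: {spread:.2f} percentage points")
print(f"mean accuracy: {mean_accuracy:.2f}%")
```

The four models fall within roughly six percentage points of each other.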
Key Findings
• The data contained 20 independent variables, but only 8 of them affected the popularity of the song.
• Valence, danceability and energy together account for almost 50% of the feature importance for popularity.
• Song genre is one of the most important factors in an individual's musical preference or taste, which the recommendation engine considers. The most popular genre is Country-Specific, which consists of songs in each country's own language; this indicates that people love songs in their mother tongue. The next most popular genre is EDM (Electronic Dance Music), because of its high valence, danceability and energy.
• The medium tempo range, 100-140 bpm, is about twice as popular as any other tempo range. This tempo is used in EDM, rock and pop music, which are the most popular genres.
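The tempo finding relies on binning tempo into ranges. A minimal sketch of such a binning follows; the 100-140 bpm medium range comes from the slide, while the labels "slow" and "fast" and the handling of the boundary values are assumptions.

```python
def tempo_range(bpm, medium_low=100.0, medium_high=140.0):
    """Bin a tempo in bpm into a coarse range.

    The 100-140 bpm 'medium' range is taken from the slide; the 'slow' and
    'fast' labels and the inclusive boundaries are hypothetical.
    """
    if bpm < medium_low:
        return "slow"
    if bpm <= medium_high:
        return "medium"
    return "fast"


# Made-up tempos for illustration.
for bpm in (85.0, 120.0, 174.0):
    print(bpm, "->", tempo_range(bpm))
```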
The importance of each column for popularity
The feature-importance figure shows how much each feature affects the target column.
Valence + danceability + energy
16.7% + 16.0% + 15.5% = 48.2%
Using only the first 8 columns gives the same accuracy as using all 20 columns.
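The arithmetic above can be checked in a few lines. Only the three importances given on the slide are used; the remainder is computed under the assumption that the importances are normalised to sum to 100%, which the slides do not state explicitly.

```python
# Feature importances (in percent) as reported on the slide.
top_three = {"valence": 16.7, "danceability": 16.0, "energy": 15.5}

combined = sum(top_three.values())

# Assumption: importances are normalised to 100%, so the rest is
# shared by the remaining features.
remaining = 100.0 - combined

print(f"top three combined: {combined:.1f}%")
print(f"left for all other features: {remaining:.1f}%")
```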
Conclusion
The dataset was somewhat complicated because it was difficult to establish the relationship between the target variable and the independent variables. However, with some cleaning and feature engineering, the final model was stable and achieved high accuracy.
The hardest part was that popularity was affected by the release date of the song, and the release date was not available in the data, so it looked like a case of endogeneity. Nonetheless, separating the target variable into classes resolved this.
As a data scientist, I conclude that the model trained on this dataset predicts accurately and is ready for deployment in the Spotify recommendation engine, to predict popularity and so recommend the right tracks to listeners in the future.
