SlideShare une entreprise Scribd logo
1  sur  36
Télécharger pour lire hors ligne
How to start your journey as a
Data Scientist?
PARVANEH SHAFIEI
WOMAN IN DATA SCIENCE – TURIN 26TH FEBRUARY
Who I am
 Past Web & Software developer
 Master: computer science in Polimi
• Founded July 2017
• Near 250 Rladiers
Where are Rladies
now?
+20 COUNTRIES
+70 CHAPTERS
14% R package developers are female
20% Of all tech startups across the world are founded by women
26% Only make up data professionals
Twitter followers by each chapter
Made with 💜 by Daniela Vázquez
‘’A data scientist is
better at statistic
than a software
engineer and is
better at software
engineering than a
statistician“
Who ia Data scientist?
 Passionate about data
 Thinker
 A listener to understand business problem
 A good communicator and storyteller
 Have technical skills for coding and analyzing data
 Obsess with solving problems
Data scientists Statistician Data engineer
• Use analytics & technical skills to
extract, analyze and model data
• Understand statistic and apply it
on real problems
• Responsible for architecture of
data
• Ensure the flow of data within
servers & applications
Be ready
for the
journey!
Learning in data science is not a linear
path!
 Data science is not about just one specific skills
 You must know all things and be expert in one of them
 It is a fast evolving field
 It does not need to be expert in the domain
Where to start learning or add more skill
sets to your toolbox?
Online courses Datasets for practices
There is a starting point but no end road!
6
5
3
2
4
1
Basic
programming &
introduction to
data science
Statistic &
Math
Data mining &
visualization
SQL &
databases
Machine learning
Advanced
Machine learning
Beginners steps
Basic programming & introduction to data science
• Benefits & potential of the languages
• Answer possible questions in the field
• How to code and use the basics such as libraries, functions,..
• How to load & read & manipulate data
• R
• Python
Beginners steps
Statistic & Math
• Learn fundamental concepts of statistics such as p-values, variance, correlation, statistical hypothesis..
• Evaluate various types of data and how to interpret their structure
• How to apply various statistical methods on the data
Beginners steps
Data mining & visualization
• Handle anomalies in data such as missing values, outliers,..
• Explore correlations among variables
• Apply feature engineering on the data
• Create graph & visualization to demonstrate findings in the data
Beginners steps
SQL & databases
• Understand how relational databases are working
• How to interact with databases for fetching, saving and manipulation of data
Beginners steps
Machine learning
• Learn various algorithms such as regression, random forest, classification tree, etc. and their concepts and
understand where to use them
• How to apply predictive modeling on set of data
• Learn how to train various model and what are the metrics of trained models and how to compare them
Intermediate & advanced steps
Advanced machine learning
• Learn how to manipulate unstructured data
• Learn and understand advanced topics such as deep learning, social network analysis, text mining, time series
processing,..
Being specialist in one field?
Other topics
• Depends on the field, business and type of problems
Kaggle learning
If you have to read
just one single book
/ or watch just 15
hours videos
Free books
Websites
You don’t have any experience?
 Do the action!
 Apply what you learn on some use cases from Kaggle or any other data set that
you can
 Do the competitions in Kaggle, share the code in blogs, GitHub account,
LinkedIn,...
 You can find internship, asking companies directly
 Ask mentors to follow you in the way
Ok, I’m all set, but how can I tackle a real
problem?
Collect data
Preprocess &
clean data
Exploratory
analysis
Train data
Test & validate
models
Result
interpretations
Define business
problem
Deployment
Define the business problem
 Listen to business’s
need
 You must be a good
listener
 The business you are facing
do not know their problem
yet? That is even fine too!
 A lot of time, they don’t
have any idea where to
start
Collect data
Structured
data
Unstructured
data
Data are in a
unstructured format
such as text, images,
audio and video
Data are in a
structured format
such as database, csv
file,...
Near 80% of
data in
organization
are
unstructured
Preprocessing & cleaning data
 Never trust your data
 Check consistency & structure of the data to be sure about its quality
 Fill missing values
 Be aware of noisy data and treat them well!
 Check outliers & remove them
 Apply data transformation such as normalization or aggregation
Exploratory analysis
 Understand and summarize the data
 Explore for insights & hidden patterns
 Investigate the correlations among various variables
 Create some hypothesis based on the findings
 Understand which model & technique to use for analyzing the data
 Feature engineering can happen in this stage too
Think about
the questions
that you are
going to
answer
Model training, testing & validation
Training Validation
Algorithms
Model 1
Testing
Model 2 Model 3 .....
60%, 20%, 20%
Decision tree
Density based
Hierarchy
PAM
K-means
PCA
Random forest
Logistic regression
Linear regression
Algorithms?
Classification /
supervised
learning
Clustering/
unsupervised
learning
Deployment
API
New data Output
Model
New data Output
Batch
Model
Real-time
streaming
"More data beats better models.
Better data beats more data." —
Riley Newman
“
”
Enjoy your
journey!

Contenu connexe

Tendances

Become a Data Analyst
Become a Data Analyst Become a Data Analyst
Become a Data Analyst Aaron Lamphere
 
Data Science
Data ScienceData Science
Data ScienceRabin BK
 
Career in Data Science
Career in Data ScienceCareer in Data Science
Career in Data ScienceActonRoy
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data scienceSampath Kumar
 
Data Science Introduction
Data Science IntroductionData Science Introduction
Data Science IntroductionGang Tao
 
Introduction of Data Science
Introduction of Data ScienceIntroduction of Data Science
Introduction of Data ScienceJason Geng
 
Data science | What is Data science
Data science | What is Data scienceData science | What is Data science
Data science | What is Data scienceShilpaKrishna6
 
Data Analyst Roles & Responsibilities | Edureka
Data Analyst Roles & Responsibilities | EdurekaData Analyst Roles & Responsibilities | Edureka
Data Analyst Roles & Responsibilities | EdurekaEdureka!
 
Data Analyst vs Data Engineer vs Data Scientist | Data Analytics Masters Prog...
Data Analyst vs Data Engineer vs Data Scientist | Data Analytics Masters Prog...Data Analyst vs Data Engineer vs Data Scientist | Data Analytics Masters Prog...
Data Analyst vs Data Engineer vs Data Scientist | Data Analytics Masters Prog...Edureka!
 
introduction to data science
introduction to data scienceintroduction to data science
introduction to data sciencebhavesh lande
 
Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science Venkata Reddy Konasani
 
How to become a Data Scientist?
How to become a Data Scientist? How to become a Data Scientist?
How to become a Data Scientist? HackerEarth
 
Data Scientist Roles and Responsibilities | Data Scientist Career | Data Scie...
Data Scientist Roles and Responsibilities | Data Scientist Career | Data Scie...Data Scientist Roles and Responsibilities | Data Scientist Career | Data Scie...
Data Scientist Roles and Responsibilities | Data Scientist Career | Data Scie...Edureka!
 
Data science & data scientist
Data science & data scientistData science & data scientist
Data science & data scientistVijayMohan Vasu
 
Introduction to data science.pptx
Introduction to data science.pptxIntroduction to data science.pptx
Introduction to data science.pptxSadhanaParameswaran
 
How to Become a Data Analyst? | Data Analyst Skills | Data Analyst Training |...
How to Become a Data Analyst? | Data Analyst Skills | Data Analyst Training |...How to Become a Data Analyst? | Data Analyst Skills | Data Analyst Training |...
How to Become a Data Analyst? | Data Analyst Skills | Data Analyst Training |...Edureka!
 

Tendances (20)

Data science
Data scienceData science
Data science
 
Become a Data Analyst
Become a Data Analyst Become a Data Analyst
Become a Data Analyst
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
Data Science
Data ScienceData Science
Data Science
 
Career in Data Science
Career in Data ScienceCareer in Data Science
Career in Data Science
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
Data Science Introduction
Data Science IntroductionData Science Introduction
Data Science Introduction
 
Introduction of Data Science
Introduction of Data ScienceIntroduction of Data Science
Introduction of Data Science
 
Data science | What is Data science
Data science | What is Data scienceData science | What is Data science
Data science | What is Data science
 
Data Analyst Roles & Responsibilities | Edureka
Data Analyst Roles & Responsibilities | EdurekaData Analyst Roles & Responsibilities | Edureka
Data Analyst Roles & Responsibilities | Edureka
 
Data Analyst vs Data Engineer vs Data Scientist | Data Analytics Masters Prog...
Data Analyst vs Data Engineer vs Data Scientist | Data Analytics Masters Prog...Data Analyst vs Data Engineer vs Data Scientist | Data Analytics Masters Prog...
Data Analyst vs Data Engineer vs Data Scientist | Data Analytics Masters Prog...
 
introduction to data science
introduction to data scienceintroduction to data science
introduction to data science
 
Data analytics
Data analyticsData analytics
Data analytics
 
Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science
 
How to become a Data Scientist?
How to become a Data Scientist? How to become a Data Scientist?
How to become a Data Scientist?
 
Data Scientist Roles and Responsibilities | Data Scientist Career | Data Scie...
Data Scientist Roles and Responsibilities | Data Scientist Career | Data Scie...Data Scientist Roles and Responsibilities | Data Scientist Career | Data Scie...
Data Scientist Roles and Responsibilities | Data Scientist Career | Data Scie...
 
Data science & data scientist
Data science & data scientistData science & data scientist
Data science & data scientist
 
Introduction to data science.pptx
Introduction to data science.pptxIntroduction to data science.pptx
Introduction to data science.pptx
 
8 Steps to Creating a Data Strategy
8 Steps to Creating a Data Strategy8 Steps to Creating a Data Strategy
8 Steps to Creating a Data Strategy
 
How to Become a Data Analyst? | Data Analyst Skills | Data Analyst Training |...
How to Become a Data Analyst? | Data Analyst Skills | Data Analyst Training |...How to Become a Data Analyst? | Data Analyst Skills | Data Analyst Training |...
How to Become a Data Analyst? | Data Analyst Skills | Data Analyst Training |...
 

Similaire à Start Your Data Science Journey

Data Science Training in Chandigarh h
Data Science Training in Chandigarh    hData Science Training in Chandigarh    h
Data Science Training in Chandigarh hasmeerana605
 
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...Rohit Dubey
 
Barga Data Science lecture 2
Barga Data Science lecture 2Barga Data Science lecture 2
Barga Data Science lecture 2Roger Barga
 
Data science-retreat-how it works plus advice for upcoming data scientists
Data science-retreat-how it works plus advice for upcoming data scientistsData science-retreat-how it works plus advice for upcoming data scientists
Data science-retreat-how it works plus advice for upcoming data scientistsJose Quesada
 
Utiva Presesentation-Shamsudeen Suleiman.pptx
Utiva Presesentation-Shamsudeen Suleiman.pptxUtiva Presesentation-Shamsudeen Suleiman.pptx
Utiva Presesentation-Shamsudeen Suleiman.pptxSuleimanBashirShamsu
 
How to start your data career
How to start your data careerHow to start your data career
How to start your data careerAdwait Bhave
 
Introduction to Business Analytics Part 1
Introduction to Business Analytics Part 1Introduction to Business Analytics Part 1
Introduction to Business Analytics Part 1Beamsync
 
Career in Data Using Tableau
Career in Data Using TableauCareer in Data Using Tableau
Career in Data Using TableauJen Vaughan
 
Data science training in hyd ppt converted (1)
Data science training in hyd ppt converted (1)Data science training in hyd ppt converted (1)
Data science training in hyd ppt converted (1)SayyedYusufali
 
Data science training in hyd pdf converted (1)
Data science training in hyd pdf converted (1)Data science training in hyd pdf converted (1)
Data science training in hyd pdf converted (1)SayyedYusufali
 
Data science training in hydpdf converted (1)
Data science training in hydpdf  converted (1)Data science training in hydpdf  converted (1)
Data science training in hydpdf converted (1)SayyedYusufali
 
Which institute is best for data science?
Which institute is best for data science?Which institute is best for data science?
Which institute is best for data science?DIGITALSAI1
 
Best Selenium certification course
Best Selenium certification courseBest Selenium certification course
Best Selenium certification courseKumarNaik21
 
Data science training in hyd ppt (1)
Data science training in hyd ppt (1)Data science training in hyd ppt (1)
Data science training in hyd ppt (1)SayyedYusufali
 
Data science training institute in hyderabad
Data science training institute in hyderabadData science training institute in hyderabad
Data science training institute in hyderabadVamsiNihal
 
Data science training in Hyderabad
Data science  training in HyderabadData science  training in Hyderabad
Data science training in Hyderabadsaitejavella
 
Data science training Hyderabad
Data science training HyderabadData science training Hyderabad
Data science training HyderabadNithinsunil1
 
Data science online training in hyderabad
Data science online training in hyderabadData science online training in hyderabad
Data science online training in hyderabadVamsiNihal
 

Similaire à Start Your Data Science Journey (20)

Lean Analytics: How to get more out of your data science team
Lean Analytics: How to get more out of your data science teamLean Analytics: How to get more out of your data science team
Lean Analytics: How to get more out of your data science team
 
Data Science Training in Chandigarh h
Data Science Training in Chandigarh    hData Science Training in Chandigarh    h
Data Science Training in Chandigarh h
 
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
 
Barga Data Science lecture 2
Barga Data Science lecture 2Barga Data Science lecture 2
Barga Data Science lecture 2
 
Data science-retreat-how it works plus advice for upcoming data scientists
Data science-retreat-how it works plus advice for upcoming data scientistsData science-retreat-how it works plus advice for upcoming data scientists
Data science-retreat-how it works plus advice for upcoming data scientists
 
So you want to be a Data Scientist?
So you want to be a Data Scientist?So you want to be a Data Scientist?
So you want to be a Data Scientist?
 
Utiva Presesentation-Shamsudeen Suleiman.pptx
Utiva Presesentation-Shamsudeen Suleiman.pptxUtiva Presesentation-Shamsudeen Suleiman.pptx
Utiva Presesentation-Shamsudeen Suleiman.pptx
 
How to start your data career
How to start your data careerHow to start your data career
How to start your data career
 
Introduction to Business Analytics Part 1
Introduction to Business Analytics Part 1Introduction to Business Analytics Part 1
Introduction to Business Analytics Part 1
 
Career in Data Using Tableau
Career in Data Using TableauCareer in Data Using Tableau
Career in Data Using Tableau
 
Data science training in hyd ppt converted (1)
Data science training in hyd ppt converted (1)Data science training in hyd ppt converted (1)
Data science training in hyd ppt converted (1)
 
Data science training in hyd pdf converted (1)
Data science training in hyd pdf converted (1)Data science training in hyd pdf converted (1)
Data science training in hyd pdf converted (1)
 
Data science training in hydpdf converted (1)
Data science training in hydpdf  converted (1)Data science training in hydpdf  converted (1)
Data science training in hydpdf converted (1)
 
Which institute is best for data science?
Which institute is best for data science?Which institute is best for data science?
Which institute is best for data science?
 
Best Selenium certification course
Best Selenium certification courseBest Selenium certification course
Best Selenium certification course
 
Data science training in hyd ppt (1)
Data science training in hyd ppt (1)Data science training in hyd ppt (1)
Data science training in hyd ppt (1)
 
Data science training institute in hyderabad
Data science training institute in hyderabadData science training institute in hyderabad
Data science training institute in hyderabad
 
Data science training in Hyderabad
Data science  training in HyderabadData science  training in Hyderabad
Data science training in Hyderabad
 
Data science training Hyderabad
Data science training HyderabadData science training Hyderabad
Data science training Hyderabad
 
Data science online training in hyderabad
Data science online training in hyderabadData science online training in hyderabad
Data science online training in hyderabad
 

Dernier

Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...GQ Research
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.pptamreenkhanum0307
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in collegessuser7a7cd61
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 

Dernier (20)

Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.ppt
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in college
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 

Start Your Data Science Journey

  • 1. How to start your journey as a Data Scientist? PARVANEH SHAFIEI WOMAN IN DATA SCIENCE – TURIN 26TH FEBRUARY
  • 2. Who I am  Past Web & Software developer  Master: computer science in Polimi
  • 3. • Founded July 2017 • Near 250 Rladiers Where are Rladies now? +20 COUNTRIES +70 CHAPTERS
  • 4. 14% R package developers are female 20% Of all tech startups across the world are founded by women 26% Only make up data professionals
  • 5. Twitter followers by each chapter Made with 💜 by Daniela Vázquez
  • 6. ‘’A data scientist is better at statistic than a software engineer and is better at software engineering than a statistician“
  • 7. Who ia Data scientist?  Passionate about data  Thinker  A listener to understand business problem  A good communicator and storyteller  Have technical skills for coding and analyzing data  Obsess with solving problems
  • 8. Data scientists Statistician Data engineer • Use analytics & technical skills to extract, analyze and model data • Understand statistic and apply it on real problems • Responsible for architecture of data • Ensure the flow of data within servers & applications
  • 10. Learning in data science is not a linear path!  Data science is not about just one specific skills  You must know all things and be expert in one of them  It is a fast evolving field  It does not need to be expert in the domain
  • 11. Where to start learning or add more skill sets to your toolbox? Online courses Datasets for practices
  • 12. There is a starting point but no end road! 6 5 3 2 4 1 Basic programming & introduction to data science Statistic & Math Data mining & visualization SQL & databases Machine learning Advanced Machine learning
  • 13. Beginners steps Basic programming & introduction to data science • Benefits & potential of the languages • Answer possible questions in the field • How to code and use the basics such as libraries, functions,.. • How to load & read & manipulate data • R • Python
  • 14. Beginners steps Statistic & Math • Learn fundamental concepts of statistics such as p-values, variance, correlation, statistical hypothesis.. • Evaluate various types of data and how to interpret their structure • How to apply various statistical methods on the data
  • 15. Beginners steps Data mining & visualization • Handle anomalies in data such as missing values, outliers,.. • Explore correlations among variables • Apply feature engineering on the data • Create graph & visualization to demonstrate findings in the data
  • 16. Beginners steps SQL & databases • Understand how relational databases are working • How to interact with databases for fetching, saving and manipulation of data
  • 17. Beginners steps Machine learning • Learn various algorithms such as regression, random forest, classification tree, etc. and their concepts and understand where to use them • How to apply predictive modeling on set of data • Learn how to train various model and what are the metrics of trained models and how to compare them
  • 18. Intermediate & advanced steps Advanced machine learning • Learn how to manipulate unstructured data • Learn and understand advanced topics such as deep learning, social network analysis, text mining, time series processing,..
  • 19. Being specialist in one field? Other topics • Depends on the field, business and type of problems
  • 21. If you have to read just one single book / or watch just 15 hours videos
  • 24. You don’t have any experience?  Do the action!  Apply what you learn on some use cases from Kaggle or any other data set that you can  Do the competitions in Kaggle, share the code in blogs, GitHub account, LinkedIn,...  You can find internship, asking companies directly  Ask mentors to follow you in the way
  • 25.
  • 26. Ok, I’m all set, but how can I tackle a real problem? Collect data Preprocess & clean data Exploratory analysis Train data Test & validate models Result interpretations Define business problem Deployment
  • 27. Define the business problem  Listen to business’s need  You must be a good listener  The business you are facing do not know their problem yet? That is even fine too!  A lot of time, they don’t have any idea where to start
  • 28. Collect data Structured data Unstructured data Data are in a unstructured format such as text, images, audio and video Data are in a structured format such as database, csv file,... Near 80% of data in organization are unstructured
  • 29. Preprocessing & cleaning data  Never trust your data  Check consistency & structure of the data to be sure about its quality  Fill missing values  Be aware of noisy data and treat them well!  Check outliers & remove them  Apply data transformation such as normalization or aggregation
  • 30. Exploratory analysis  Understand and summarize the data  Explore for insights & hidden patterns  Investigate the correlations among various variables  Create some hypothesis based on the findings  Understand which model & technique to use for analyzing the data  Feature engineering can happen in this stage too Think about the questions that you are going to answer
  • 31. Model training, testing & validation Training Validation Algorithms Model 1 Testing Model 2 Model 3 ..... 60%, 20%, 20%
  • 32. Decision tree Density based Hierarchy PAM K-means PCA Random forest Logistic regression Linear regression Algorithms? Classification / supervised learning Clustering/ unsupervised learning
  • 33.
  • 34. Deployment API New data Output Model New data Output Batch Model Real-time streaming
  • 35. "More data beats better models. Better data beats more data." — Riley Newman