The Practice of Data Science:
People, Processes and Tools
Bob E. Hayes, PhD
bob@businessoverbroadway.com
@bobehayes
Presented at Metis’ Demystifying Data Science: A FREE
Online Conference for Aspiring Data Scientists – Sept 27,
2017
Bob E. Hayes, PhD
Email: bob@businessoverbroadway.com
Web: www.businessoverbroadway.com
Twitter: @bobehayes
• Author of three books on customer experience
management and analytics
• PhD in industrial-organizational psychology
• #6 blogger overall on CustomerThink
(http://customerthink.com/author/bobehayes/)
• #3 blogger on the topic of customer analytics
(http://customerthink.com/top-authors-category/)
• Top expert in Big Data and Data Science
• https://www.maptive.com/the-top-100-big-data-experts/
• http://www.kdnuggets.com/2015/02/top-big-data-influencers-brands.html
3
Outline
• Why now?
• Definition of Data Science
• The People: Data Science Skills
• The Process: From Data to Insight
• The Tools
• Education Requirements
• Gender Diversity
4
Data and Our Ability to Process it
Analytics Skills Gap is Huge*
* From PwC: Investing in America’s Data Science and Analytics Talent
6
Data Science Defined
Data science is a way of extracting
insights from data using the powers of
computer science and statistics applied to
data from a specific field of study.
7
Data Science Defined
The People
8
Job Roles in Data Science
*Researcher (e.g., researcher, scientist, statistician); Business Management (e.g., leader, business person, entrepreneur); Creative
(e.g., jack of all trades, artist, hacker); Developer (e.g., developer, engineer)
9
Three Skill Domains of Data Science
Domain
Knowledge
Math /
Statistics
Technology /
Programming
10
25 Data Science Skills
Top 10 Data Science
Skills
1. Communication
2. Managing structured data
3. Data mining and visualization tools
4. Science / Scientific method
5. Math
6. Project management
7. Data management
8. Statistics and statistical modeling
9. Product design and development
10. Business development
Data are based on responses to the AnalyticsWeek and Business Over Broadway Data Science Survey, September 2015.
11
Skill Proficiency Varies by Data Science Role
[Figure: proficiency (axis 0 to 80) on 25 data science skills for three roles (Domain Expert, Developer, Researcher), shown against a proficiency standard. Skills are grouped into three domains: Math / Statistics, Tech / Programming, and Domain Knowledge.]
Skills shown: Business development, Budgeting, Governance and Compliance, Optimization, Math, Graphical Models, Algorithms, Bayesian Statistics, Machine Learning, Data Mining and Viz Tools, Statistics and statistical modeling, Science/Scientific Method, Communication, Unstructured data, Structured data, NLP and text mining, Data Management, Big and distributed data, Systems Administration, Database Administration, Cloud Management, Back-end Programming, Front-end Programming, Product Design, Project management.
Data are based on responses to the AnalyticsWeek and Business Over Broadway Data Science Survey, September 2015.
12
In Search of the Data Science Unicorn
I wish I knew
some Python.
Data are based on responses to AnalyticsWeek and Business Over Broadway Data Science Survey. From 2015.
14
Analytics, Data Mining and Data Science Methods
S = Start with Strategy
M = Measure Metrics and Data
A = Apply Analytics
R = Report Results
T = Transform your Business
From “CRISP-DM, still the top methodology for analytics, data mining, or data science projects“
http://www.kdnuggets.com/2014/10/crisp-dm-top-methodology-analytics-data-mining-data-science-projects.html
15
From Data to Insight
• Cross Industry Standard Process for Data Mining (CRISP-DM) (IBM, Teradata, Daimler AG, NCR Corporation and OHRA)
• Knowledge Discovery in Databases (KDD)
• SEMMA (SAS)
For more information on these methods, see: https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining;
https://en.wikipedia.org/wiki/SEMMA; https://en.wikipedia.org/wiki/Data_mining
16
Getting Insight from Data: The Scientific Method
1. Formulate questions
• Start with a problem statement.
2. Generate hypothesis / hunch
• What are your hunches / hypotheses?
• Be sure your hypotheses are testable.
3. Gather / generate data
• You can use an experimental or observational approach to analyzing data.
• Integrate your data silos to ask bigger questions; connect the dots and get a 360-degree view of the phenomenon you’re studying.
4. Analyze data / test hypothesis
• Employ predictive analytics / inferential statistics to test hypotheses.
• Employ machine learning to quickly surface insights.
5. Take action / communicate results
• Implement your findings; inform decision-makers; optimize algorithms.
• Use prescriptive analytics to guide course of action.
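The "analyze data / test hypothesis" step can be illustrated with a small sketch. It is not taken from the talk: the permutation test is just one of many inferential techniques, and the function name `permutation_test` and the `control` / `treatment` numbers are hypothetical, chosen only to show a testable hypothesis ("the treatment changes the average score") being checked against data.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=5_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference mean(a) - mean(b) and an
    estimated p-value under the null hypothesis of no difference.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are exchangeable,
        # so reshuffle them and recompute the difference.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction keeps the estimate strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical measurements, for illustration only.
control = [3.1, 2.9, 3.4, 3.0, 3.2, 2.8, 3.3, 3.1]
treatment = [3.6, 3.8, 3.5, 3.9, 3.4, 3.7, 3.6, 3.8]

observed, p_value = permutation_test(control, treatment)
print(f"difference in means: {observed:.3f}, p-value: {p_value:.4f}")
```

A small p-value suggests the observed difference would be unlikely if the two groups were really the same; what to do about it belongs to step 5.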
17
Iterative Process of Discovery
Image from Netflix Tech Blog: https://medium.com/netflix-techblog/a-b-testing-and-beyond-improving-the-netflix-streaming-experience-with-experimentation-and-data-5b0ae9295bdf
18
Scientific Method and Data Science Skills
19
The Tools
20
Top Data Science Tools
Rexer Analytics Data
Science Survey 2015
For a comprehensive overview of different data science tools,
please see: http://r4stats.com/articles/popularity/
21
Data Science Ecosystem
[Figures: Gartner Magic Quadrant (2017) and Forrester Wave]
Leaders: IBM, SAS, RapidMiner, KNIME
For a good review of data science platforms, please see:
https://thomaswdinsmore.com/2017/02/28/gartner-looks-at-data-science-platforms/
22
Extra
Important Skills, Role of Formal Education, Gender Diversity
23
Importance of Data Science Skills by Job Role
24
What skills are linked to project success?
25
Highest Level of Education Attained
26
Education and Data Science Skills
Data are based on responses to AnalyticsWeek and Business Over Broadway Data Science Survey. From 2015.
27
Lack of Gender Diversity
28
Job Roles in Data Science by Gender
29
Gender Diversity – Other Science Roles
30
Gender Comparison of Proficiency across Skills
31
Advice for Data Scientists
• Be specific when talking about “data scientists”
  • There are different types – defined by what they do and the skills they possess
• Work with other data professionals who have complementary skills. Teamwork is key to successful data science projects.
• Learn to use data mining and visualization tools
  • R, Python, SPSS, SAS, graphics, mapping, web-based data visualization
• Be an advocate for women in the field of data science

Editor's notes

  1. Data science involves the collection, analysis and interpretation of data to extract empirically-based insights that augment and enhance human decisions and algorithms.
  2. SEMMA is an acronym that stands for Sample, Explore, Modify, Model, and Assess. It is a list of sequential steps developed by SAS Institute. The only other data mining approach named in these polls was SEMMA. However, SAS Institute clearly states that SEMMA is not a data mining methodology, but rather a "logical organization of the functional tool set of SAS Enterprise Miner." The term Knowledge Discovery in Databases, or KDD for short, refers to the broad process of finding knowledge in data, and emphasizes the "high-level" application of particular data mining methods.
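The five SEMMA steps named in the note above can be sketched as a tiny pipeline. This is only an illustration of the sequence (Sample, Explore, Modify, Model, Assess), not SAS Enterprise Miner functionality: the function names, the synthetic data, and the choice of a least-squares line as the "model" are all assumptions made for the example.

```python
import random
from statistics import mean


def sample(rows, fraction, rng):
    """Sample: draw a random subset of the rows to work with."""
    k = max(2, int(len(rows) * fraction))
    return rng.sample(rows, k)


def explore(rows):
    """Explore: summarize the sample, including how much is missing."""
    ys = [y for _, y in rows if y is not None]
    return {"n": len(rows), "missing_y": len(rows) - len(ys), "mean_y": mean(ys)}


def modify(rows):
    """Modify: clean the data; here, simply drop rows with a missing y."""
    return [(x, y) for x, y in rows if y is not None]


def model(rows):
    """Model: fit y = slope * x + intercept by ordinary least squares."""
    mx = mean(x for x, _ in rows)
    my = mean(y for _, y in rows)
    sxx = sum((x - mx) ** 2 for x, _ in rows)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    slope = sxy / sxx
    return slope, my - slope * mx


def assess(rows, slope, intercept):
    """Assess: mean absolute error of the fitted line on the given rows."""
    return mean(abs(y - (slope * x + intercept)) for x, y in rows)


# Synthetic data (hypothetical): y is roughly 2x + 1, with every tenth y missing.
rng = random.Random(42)
data = [(x, 2 * x + 1 + rng.gauss(0, 0.5)) for x in range(100)]
data = [(x, None) if x % 10 == 0 else (x, y) for x, y in data]

train = sample(data, 0.5, rng)
summary = explore(train)
train_clean = modify(train)
slope, intercept = model(train_clean)

# Assess on the rows that were not sampled, after the same cleaning step.
train_x = {x for x, _ in train}
holdout = modify([row for row in data if row[0] not in train_x])
mae = assess(holdout, slope, intercept)

print(summary)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, holdout MAE={mae:.3f}")
```

In practice each step would be far richer, and the process is usually iterated rather than run once.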