A Hitchhiker’s Guide to
Data Science
Sudeep Das
Senior Machine Learning Researcher
@datamusing
My Journey
Ph. D. Astrophysics
Cosmic Microwave Background
Gravitational Lensing
Beats Music
Core Recommendation Systems Group
What do I do?
The Grand Innovation Workflow
● Identify Problem
○ Understand what is important to the business
○ Deep data dives
○ Visualizations
○ Communicate to stakeholders
○ Sometimes top-down, sometimes ground-up idea generation
● Prepare Data
○ Slice/dice/massage data
○ Work with data teams to ensure data integrity
○ Make sure the data tables/feeds that you need are stood up
○ Offline/online data integrity
● Build Models
○ Prototype features
○ Modeling extremes: from out-of-the-box Logistic Regression and GBMs to adapting an emergent idea from a recent paper!
○ Set up offline training pipeline
○ Monitor offline metrics
● Implement in Production
○ Integrate your models with the production systems (code review, load testing)
○ Hook up with the testing platform
● Test Hypotheses
○ Design the experiment/hypothesis/cell structure
○ Read results of experiments to determine significance
○ Slice and dice the online data to determine if your test affected the intended audience
○ If results are flat, rinse and repeat!
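The "read results of experiments to determine significance" step can be illustrated with a minimal sketch. It assumes a simple two-cell test on a conversion-style metric and uses a pooled two-proportion z-test; the function name, the counts, and the choice of test are illustrative assumptions, not the method used by any particular testing platform.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate between two cells."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both cells convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; the rates are degenerate")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts, made up purely for illustration.
z, p_value = two_proportion_z_test(conv_a=480, n_a=10_000, conv_b=540, n_b=10_000)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

A result close to the chosen significance threshold is usually a cue to look at the sliced online data before drawing a conclusion, rather than to declare a winner.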
In some companies, this is a data scientist
In some other companies, this is a data scientist
Yet in some other companies, this is a data scientist
At Netflix, this is broadly what I do
Tools of the trade
SQL, Spark (Scala), PySpark, Python-Pandas, Hive, AWS S3
Matplotlib, Tableau, Vega, Plotly, custom JavaScript (D3)
Hive, S3, APIs in Flask/Django/Java
Python, scikit-learn, Jupyter notebooks, TensorFlow/Keras, XGBoost, SparkML/Scala, Zeppelin ...
Docker, company-specific platforms
Java, Scala, in some cases Python, company specific
Types of Problems
● Personalization
● Search
● Object recognition
● Voice/speech recognition
● Pattern recognition
● Natural Language Processing
● Trend prediction
● Segmentation/clustering
● Dynamic Pricing
● Optimization
● Outlier Detection
At Netflix, we do a bit of everything
Emergent Trends
● Probabilistic Graphical Models - Bayes Nets
● Deep Learning
● Causal Inference
● (Deep) Reinforcement Learning
What academia prepares you for
● Perseverance
● Ability to pick up new technical skills
● Presentation skills
● Some quantitative visualization skills
● Ability to distil technical research in related areas and adapt it to the problem at hand
● If you are from a quantitative and experimental field:
○ Mathematical abilities
○ Knowledge of Basic Statistics - error analysis, experiment design
○ Some exposure to parameter estimation and Bayesian inference
○ Some ability to write code
○ Some exposure to general machine learning
● Learning from failure: Most A/B tests fail - so do experiments in academia
● Writing papers/ technical blogs etc.
What academia doesn’t prepare you for
● Being a good listener
● Asking questions
● Understanding and articulating the business value of your technical pursuit
● Writing clean, maintainable code with documentation and unit tests
● Ability to collaborate across teams and cultures - cross-functionally
● Admitting that “Good enough” is better than perfect
● Coping with quick project timelines
● Documenting, sharing, getting early input on projects
● Dealing with live, large, and exceptionally dirty datasets.
● Understanding that research in Industry is results driven and not publication driven.
● Stepping out of your focus area and seeing your problem in the bigger context of where your company is headed.
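To make "clean, maintainable code with documentation and unit tests" concrete, here is a minimal sketch using only the Python standard library. The `precision_at_k` helper is a hypothetical example of an offline metric chosen for illustration; it is not taken from the talk or from any library.

```python
import unittest


def precision_at_k(recommended, relevant, k):
    """Return the fraction of the top-k recommended items that are relevant.

    recommended: ranked list of item ids, best first.
    relevant: collection of item ids the user actually engaged with.
    k: number of top-ranked items to evaluate; must be positive.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    relevant_set = set(relevant)
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant_set)
    # Divide by k (not len(top_k)) so that short lists are penalized.
    return hits / k


class PrecisionAtKTest(unittest.TestCase):
    def test_partial_overlap(self):
        score = precision_at_k(["a", "b", "c", "d"], {"a", "c"}, k=4)
        self.assertAlmostEqual(score, 0.5)

    def test_only_top_k_counts(self):
        self.assertEqual(precision_at_k(["a", "b", "c"], {"c"}, k=2), 0.0)

    def test_rejects_non_positive_k(self):
        with self.assertRaises(ValueError):
            precision_at_k(["a"], {"a"}, k=0)


# Run the tests programmatically so the example also works inside a notebook.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(PrecisionAtKTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The point is the shape rather than the metric: a small function with one job, a docstring that states its contract, and tests that pin down the edge cases a reviewer would ask about.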
Marketing Yourself
Landing the First Job!
● Fill in your basic skills gaps
○ Databases, SQL, Spark familiarity
○ Data Structures, Algo/CS 101
○ Get really strong in one language - highly recommend Python - pandas, scikit ecosystem
○ Good coding practices - documentation, modular code, unit tests
● Amp up your ML Knowledge
○ Your friends: Online courses and open datasets!
○ Do mini projects on ML, esp. Deep Learning, Reinforcement Learning. Get creative!
○ Get a rock solid foundation in basic stats.
○ Kaggle Competitions
● Create an Online Presence
○ GitHub repo so recruiters can look at your code.
○ Put your hobby projects online
○ Write a blog post on something new you learned
○ Follow/contribute to Stack Overflow
● Improve soft skills
○ Identify weakness in communication skills and work on them.
○ Pick up speaking engagements at meetups, at your university, and conferences such as PyData
○ Do collaborative projects with people who are also transitioning
● Interview Prep
○ Practise whiteboarding, collaborative coding on CoderPad
○ Standard books like Cracking the Coding Interview, Glassdoor
○ Go for some “dry run” interviews.
○ Do background research on the company - be inquisitive, ask questions
Keep at it!
@datamusing

AICTE PPT slide of Engineering college kr pete
 
Storytelling, Ethics and Workflow in Documentary Photography
Storytelling, Ethics and Workflow in Documentary PhotographyStorytelling, Ethics and Workflow in Documentary Photography
Storytelling, Ethics and Workflow in Documentary Photography
 
Spanish Classes Online In India With Tutor At Affordable Price
Spanish Classes Online In India With Tutor At Affordable PriceSpanish Classes Online In India With Tutor At Affordable Price
Spanish Classes Online In India With Tutor At Affordable Price
 
格里菲斯大学毕业证(Griffith毕业证)#文凭成绩单#真实留信学历认证永久存档
格里菲斯大学毕业证(Griffith毕业证)#文凭成绩单#真实留信学历认证永久存档格里菲斯大学毕业证(Griffith毕业证)#文凭成绩单#真实留信学历认证永久存档
格里菲斯大学毕业证(Griffith毕业证)#文凭成绩单#真实留信学历认证永久存档
 
定制(SCU毕业证书)南十字星大学毕业证成绩单原版一比一
定制(SCU毕业证书)南十字星大学毕业证成绩单原版一比一定制(SCU毕业证书)南十字星大学毕业证成绩单原版一比一
定制(SCU毕业证书)南十字星大学毕业证成绩单原版一比一
 
办理老道明大学毕业证成绩单|购买美国ODU文凭证书
办理老道明大学毕业证成绩单|购买美国ODU文凭证书办理老道明大学毕业证成绩单|购买美国ODU文凭证书
办理老道明大学毕业证成绩单|购买美国ODU文凭证书
 
Application deck- Cyril Caudroy-2024.pdf
Application deck- Cyril Caudroy-2024.pdfApplication deck- Cyril Caudroy-2024.pdf
Application deck- Cyril Caudroy-2024.pdf
 
Pharmacoepidemiology... Pharmacovigilan e
Pharmacoepidemiology... Pharmacovigilan ePharmacoepidemiology... Pharmacovigilan e
Pharmacoepidemiology... Pharmacovigilan e
 
LinkedIn for Your Job Search in April 2024
LinkedIn for Your Job Search in April 2024LinkedIn for Your Job Search in April 2024
LinkedIn for Your Job Search in April 2024
 
Escort Service Andheri WhatsApp:+91-9833363713
Escort Service Andheri WhatsApp:+91-9833363713Escort Service Andheri WhatsApp:+91-9833363713
Escort Service Andheri WhatsApp:+91-9833363713
 
Thomas Calculus 12th Edition Textbook and helping material
Thomas Calculus 12th Edition Textbook and helping materialThomas Calculus 12th Edition Textbook and helping material
Thomas Calculus 12th Edition Textbook and helping material
 
Abanoub Ghobrial, Planning Team Leader.pdf
Abanoub Ghobrial, Planning Team Leader.pdfAbanoub Ghobrial, Planning Team Leader.pdf
Abanoub Ghobrial, Planning Team Leader.pdf
 
加拿大MUN学位证,纽芬兰纪念大学毕业证书1:1制作
加拿大MUN学位证,纽芬兰纪念大学毕业证书1:1制作加拿大MUN学位证,纽芬兰纪念大学毕业证书1:1制作
加拿大MUN学位证,纽芬兰纪念大学毕业证书1:1制作
 
Nathan_Baughman_Resume_copywriter_and_editor
Nathan_Baughman_Resume_copywriter_and_editorNathan_Baughman_Resume_copywriter_and_editor
Nathan_Baughman_Resume_copywriter_and_editor
 
原版定制卡尔加里大学毕业证(UC毕业证)留信学历认证
原版定制卡尔加里大学毕业证(UC毕业证)留信学历认证原版定制卡尔加里大学毕业证(UC毕业证)留信学历认证
原版定制卡尔加里大学毕业证(UC毕业证)留信学历认证
 
Banged Dubai Call Girls O525547819 Call Girls Dubai
Banged Dubai Call Girls O525547819 Call Girls DubaiBanged Dubai Call Girls O525547819 Call Girls Dubai
Banged Dubai Call Girls O525547819 Call Girls Dubai
 

Academia to Data Science - A Hitchhiker's Guide

  • 1. A Hitchhiker’s Guide to Data Science. Sudeep Das, Senior Machine Learning Researcher, @datamusing
  • 3. Ph.D. in Astrophysics: Cosmic Microwave Background, Gravitational Lensing
  • 5. What do I do?
  • 6. The Grand Innovation Workflow. Identify Problem: understand what is important to the business; deep data dives; visualizations; communicate to stakeholders; idea generation is sometimes top-down, sometimes ground-up. Prepare Data: slice/dice/massage data; work with data teams to ensure data integrity; make sure the data tables/feeds that you need are stood up; offline/online data integrity. Build Models: prototype features; modeling extremes, from out-of-the-box Logistic Regression and GBMs to adapting an emergent idea from a recent paper; set up offline training pipeline; monitor offline metrics. Implement in Production: design the experiment/hypothesis/cell structure; integrate your models with the production systems (code review, load testing); hook up with the testing platform. Test Hypotheses: read results of experiments to determine significance; slice and dice the online data to determine if your test affected the intended audience; if results are flat, rinse and repeat!
  • 7. Workflow from slide 6, captioned: In some companies, this is a data scientist
  • 8. Workflow from slide 6, captioned: In some other companies, this is a data scientist
  • 9. Workflow from slide 6, captioned: Yet in some other companies, this is a data scientist
  • 10. Workflow from slide 6, captioned: At Netflix, this is broadly what I do
  • 11. Tools of the trade
  • 12. Workflow from slide 6, with tools: SQL, Spark (Scala), PySpark, Python-Pandas, Hive, AWS-S3
  • 13. Workflow from slide 6, with tools: Matplotlib, Tableau, Vega, Plotly, custom JavaScript (d3)
  • 14. Workflow from slide 6, with tools: Hive, S3, APIs in Flask/Django/Java
  • 15. Workflow from slide 6, with tools: Python, scikit-learn, Jupyter notebooks, TensorFlow/Keras, XGBoost, SparkML/Scala, Zeppelin ...
  • 16. Workflow from slide 6, with tools: Docker, company-specific platforms
  • 17. Workflow from slide 6, with tools: Java, Scala, in some cases Python, company-specific
  • 19. ● Personalization ● Search ● Object recognition ● Voice/speech recognition ● Pattern recognition ● Natural Language Processing ● Trend prediction ● Segmentation/clustering ● Dynamic Pricing ● Optimization ● Outlier Detection At Netflix, we do a bit of everything
  • 21. Probabilistic Graphical Models (Bayes Nets); Deep Learning; Causal Inference; (Deep) Reinforcement Learning
  • 23. ● Perseverance ● Ability to pick up new technical skills ● Presentation skills ● Some quantitative visualization skills ● Ability to distil technical research in related areas and adapt it to the problem at hand ● If you are from a quantitative and experimental field: ○ Mathematical abilities ○ Knowledge of basic statistics (error analysis, experiment design) ○ Some exposure to parameter estimation and Bayesian inference ○ Some ability to write code ○ Some exposure to general machine learning ● Learning from failure: most A/B tests fail, and so do experiments in academia ● Writing papers, technical blogs, etc.
  • 24. What academia doesn’t prepare you for
  • 25. ● Being a good listener ● Asking questions ● Understanding and articulating the business value of your technical pursuit ● Writing clean, maintainable code with documentation and unit tests ● Ability to collaborate across teams and cultures, cross-functionally ● Admitting that “good enough” is better than perfect ● Coping with quick project timelines ● Documenting, sharing, getting early input on projects ● Dealing with live, large, and exceptionally dirty datasets ● Understanding that research in industry is results-driven, not publication-driven ● Stepping out of your focus area and seeing your problem in the bigger context of where your company is headed
  • 27. Landing the First Job! Fill in your basic skills gaps: databases, SQL, Spark familiarity; data structures, algorithms/CS 101; get really strong in one language (Python highly recommended, with the pandas and scikit ecosystem); good coding practices (documentation, modular code, unit tests). Amp up your ML knowledge: your friends are online courses and open datasets; do mini projects on ML, especially Deep Learning and Reinforcement Learning, and get creative; get a rock-solid foundation in basic stats; Kaggle competitions. Create an online presence: a GitHub repo so recruiters can look at your code; put your hobby projects online; write a blog post on something new you learned; follow/contribute to Stack Overflow. Improve soft skills: identify weaknesses in communication skills and work on them; pick up speaking engagements at meetups, at your university, and at conferences such as PyData; do collaborative projects with people who are also transitioning. Interview prep: practise whiteboarding and collaborative coding on CoderPad; standard books like Cracking the Coding Interview; Glassdoor; go for some “dry run” interviews; do background research on the company, be inquisitive, ask questions. Keep at it!
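
The workflow step "Read results of experiments to determine significance" can be illustrated with a small example. The slides do not say which test or tooling is used, so the following is only a minimal sketch under assumptions: a two-cell A/B test with a binary outcome, compared with a pooled two-proportion z-test. The function name `two_proportion_z_test` and the sample counts are hypothetical, chosen for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided pooled z-test for a difference in conversion rates.

    Illustrative helper (not from the slides): conv_* are success
    counts, n_* are the number of users in each cell.
    Returns (z, p_value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both cells share one rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0 or all 1")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Made-up counts: control cell A versus treatment cell B.
    z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between cells is unlikely to be noise alone; a large one corresponds to the "results are flat, rinse and repeat" case. Real experimentation platforms typically add more on top of this, such as corrections for multiple metrics and segment-level breakdowns.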