2. WHO AM I?
• 3+ years in Data Science
• MS in Applied Mathematics
• Professional interests: recommender systems, natural language
processing, scalable data science solutions
• Author of two blogs: energyfirefox.blogspot.com,
datascientistdiary.blogspot.com
• Fan of online education (20+ finished MOOCs)
3. AGENDA
• What is Data Science and why do we need it?
• Data Scientists: who are they and what do they do?
• How to start?
• Practical case
21. TYPES OF DATA SCIENTISTS
A - Analysis
B - Building
Robert Chang
22. DS TYPE “A” - ANALYSIS
• making sense of data or working with it in a fairly static way.
• very similar to a statistician (and may be one)
• knows all the practical details of working with data that
aren’t taught in the statistics curriculum: data cleaning,
methods for dealing with very large data sets, visualization,
deep knowledge of a particular domain, writing well
about data
23. DS TYPE “B” - BUILDING
• share some statistical background with Type A
• very strong coders and may be trained software
engineers
• mainly interested in using data “in production.”
• build models which interact with users, often serving
recommendations (products, people you may know, ads,
movies, search results).
26. TYPICAL DATA SCIENCE WORKFLOW
• Preparing to run a model (Gathering, cleaning,
transformation)
• Running the model
• Interpreting the results
“80% of work” - Aaron Kimball
“Other 80% of the work”
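
The three workflow stages above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy records in RAW_ROWS, the function names, and the choice of a one-feature least-squares line as "the model" are hypothetical and do not come from the talk.

```python
from statistics import mean

# Hypothetical raw records: (x, y) pairs stored as strings, some malformed.
RAW_ROWS = [
    ("1", "3"), ("2", "5"), ("", "7"), ("3", "7"), ("4", "n/a"), ("4", "9"),
]

def gather():
    """Stage 1a, gathering: here the data is just an in-memory list."""
    return list(RAW_ROWS)

def clean(rows):
    """Stage 1b, cleaning: drop rows whose fields are not numeric."""
    cleaned = []
    for x, y in rows:
        try:
            cleaned.append((float(x), float(y)))
        except ValueError:
            continue  # skip malformed rows such as ("", "7")
    return cleaned

def transform(rows):
    """Stage 1c, transformation: split pairs into feature and target columns."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    return xs, ys

def fit_line(xs, ys):
    """Stage 2, running the model: ordinary least squares with one feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def interpret(slope, intercept):
    """Stage 3, interpreting the results: turn numbers into a plain statement."""
    return (f"Each extra unit of x is associated with {slope:.2f} more "
            f"units of y (baseline {intercept:.2f}).")

xs, ys = transform(clean(gather()))
slope, intercept = fit_line(xs, ys)
summary = interpret(slope, intercept)
print(summary)
```

Even in this toy version, most of the code is preparation (gather, clean, transform) rather than modeling, which is the point the "80% of work" quote makes.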
28. DOMAIN KNOWLEDGE AND SOFT SKILLS
• Passionate about the business
• Curious about data
• Influence without authority
• Hacker mindset
• Problem solver
• Strategic, proactive, creative, innovative and collaborative
30. PROGRAMMING AND DATABASES
• Computer science fundamentals
• Scripting language
• Statistical computing language
• Databases
• Relational algebra
• Distributed computations
31. COMMUNICATION AND VISUALIZATION
• Ability to engage with senior management
• Storytelling skills
• Visual art design
• Knowledge of a visualization tool
• Translate data-driven insights into decisions and actions