Developing Data Products
Uber Tech Talk
Pete Skomoroch @peteskomoroch
December 5 2012


©2012 LinkedIn Corporation. All Rights Reserved.
Examples, Techniques, & Lessons Learned

Developing Data Products
Our Mission
Connect the world’s professionals to make them more productive and successful.

Our Vision
Create economic opportunity for every professional in the world.


Members First!
LinkedIn is the leading professional network site
187M+ LinkedIn members
640M+ professionals worldwide
3,300M+ worldwide workforce
LinkedIn profiles represent our professional identity




187M members
187M member profiles


We have a lot of data.
  And (like everyone else), we store it in Hadoop.
  And people build awesome things with that data.




What do we mean by data products?
Building products from data at LinkedIn

A few examples:

    People You May Know
    Skills and Endorsements
    Year in Review
    Network Updates Digest
    InMaps
    Who’s viewed my profile
    Collaborative Filtering
    Groups You May Like
    and more…




Collaborative Filtering: LinkedIn Skill Pages
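Skill pages that list related skills can be driven by item-to-item collaborative filtering: two skills are considered related when many of the same members list both. The sketch below is a minimal illustration under stated assumptions, not LinkedIn's production method; the toy `MEMBER_SKILLS` data, the `related_skills` helper, and the use of cosine similarity over member sets are all hypothetical choices.

```python
import math
from collections import defaultdict

# Hypothetical toy data: member -> set of listed skills (not real LinkedIn data).
MEMBER_SKILLS = {
    "m1": {"hadoop", "java", "pig"},
    "m2": {"hadoop", "pig", "hive"},
    "m3": {"java", "spring"},
    "m4": {"hadoop", "hive"},
}


def skill_to_members(member_skills):
    """Invert member -> skills into skill -> members."""
    index = defaultdict(set)
    for member, skills in member_skills.items():
        for skill in skills:
            index[skill].add(member)
    return index


def cosine(a, b):
    """Cosine similarity of two sets viewed as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def related_skills(skill, member_skills, k=3):
    """Rank other skills by cosine similarity of their member sets."""
    index = skill_to_members(member_skills)
    target = index.get(skill, set())
    scores = [
        (other, cosine(target, members))
        for other, members in index.items()
        if other != skill
    ]
    scores = [pair for pair in scores if pair[1] > 0]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))  # ties broken alphabetically
    return scores[:k]


if __name__ == "__main__":
    print(related_skills("hadoop", MEMBER_SKILLS))
```

Cosine similarity divides the overlap by the size of each skill's member set, so very popular skills do not dominate purely by volume; Jaccard similarity or lift would be equally reasonable choices.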




Classification: giving structure to unstructured data
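Classification here means mapping free text, such as a typed job title, onto a fixed set of labels. As one hedged illustration, the sketch below trains a tiny multinomial Naive Bayes classifier with add-one smoothing; the training examples, the category names, and the `NaiveBayes` class are invented, and the slide does not say which model LinkedIn used.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled examples: free-text job title -> job function.
TRAINING = [
    ("senior software engineer", "engineering"),
    ("software developer", "engineering"),
    ("data engineer", "engineering"),
    ("account executive", "sales"),
    ("regional sales manager", "sales"),
    ("sales associate", "sales"),
]


def tokenize(text):
    """Lowercase whitespace tokenization (deliberately simple)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, examples):
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in examples:
            self.class_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text):
        """Return the label with the highest log posterior score."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                # Counter returns 0 for unseen words; +1 smooths them.
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    model = NaiveBayes().fit(TRAINING)
    print(model.predict("software engineer"), model.predict("sales manager"))
```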








Clustering & Disambiguation
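Disambiguation decides which meaning of an ambiguous phrase is intended, using the surrounding context. The sketch below uses a simplified Lesk-style approach: pick the sense whose signature words overlap most with the context terms. The sense inventory, `SENSES`, and `disambiguate` are hypothetical, and the slide does not specify the method LinkedIn used.

```python
# Hypothetical sense inventory: ambiguous term -> sense -> signature words.
SENSES = {
    "python": {
        "python (programming language)": {"django", "numpy", "pandas", "flask", "programming"},
        "python (snake)": {"reptile", "zoology", "herpetology", "snake"},
    },
}


def disambiguate(term, context_terms, senses=SENSES):
    """Pick the sense whose signature shares the most words with the context.

    Returns None when the term is unknown or no sense overlaps the context.
    """
    candidates = senses.get(term.lower())
    if not candidates:
        return None
    context = {t.lower() for t in context_terms}
    best_sense, best_overlap = None, 0
    for sense, signature in sorted(candidates.items()):
        overlap = len(signature & context)
        if overlap > best_overlap:  # strict: ties keep the alphabetically first sense
            best_sense, best_overlap = sense, overlap
    return best_sense


if __name__ == "__main__":
    print(disambiguate("Python", ["Django", "SQL", "pandas"]))
```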




De-duplication and Normalization
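Normalization maps spelling and formatting variants of the same entity to one canonical form so that duplicates can be collapsed. A minimal sketch, assuming a hand-built synonym table (`CANONICAL`) and simple punctuation rules; a real system would need a much larger, curated mapping.

```python
import re
import unicodedata

# Hypothetical synonym table: normalized variant -> canonical skill name.
CANONICAL = {
    "ml": "machine learning",
    "js": "javascript",
    "java script": "javascript",
}


def normalize(raw):
    """Unicode-normalize, lowercase, strip punctuation, then apply synonyms."""
    text = unicodedata.normalize("NFKC", raw).lower()
    text = re.sub(r"[^\w+#]+", " ", text)  # keep + and # for names like c++ and c#
    text = " ".join(text.split())  # collapse runs of whitespace
    return CANONICAL.get(text, text)


def dedupe(raw_skills):
    """Normalize each entry and keep the first occurrence of each canonical form."""
    seen = set()
    result = []
    for raw in raw_skills:
        skill = normalize(raw)
        if skill and skill not in seen:
            seen.add(skill)
            result.append(skill)
    return result


if __name__ == "__main__":
    print(dedupe(["Machine-Learning", "ML", "JS", "JavaScript ", "C++", "c++"]))
```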




Network Algorithms: Relevance & Ranking
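Relevance in a social network is often seeded from graph structure. As an illustrative, swapped-in technique (not necessarily what LinkedIn used), the sketch below ranks connection suggestions by the number of shared connections, i.e. how many triangles adding each edge would close. The `CONNECTIONS` graph is hypothetical and assumed to be symmetric.

```python
from collections import Counter

# Hypothetical undirected connection graph (each edge listed in both directions).
CONNECTIONS = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol", "dave"},
    "carol": {"alice", "bob", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}


def suggest_connections(member, graph, k=3):
    """Rank members not yet connected to `member` by shared-connection count."""
    direct = graph.get(member, set())
    counts = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            if candidate != member and candidate not in direct:
                counts[candidate] += 1
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:k]


if __name__ == "__main__":
    print(suggest_connections("alice", CONNECTIONS))
```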




Prediction: Personalized Skill Recommendations
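Personalized skill recommendations can be framed as prediction: given the skills a member already lists, estimate which others they are likely to have. The sketch below scores each candidate skill by summing estimated conditional probabilities P(candidate | owned skill) taken from co-occurrence counts. This is a simple heuristic chosen for illustration, not the model described in the talk; the data and function names are hypothetical.

```python
from collections import Counter
from itertools import permutations

# Hypothetical toy data: member -> set of listed skills.
MEMBER_SKILLS = {
    "m1": {"hadoop", "pig"},
    "m2": {"hadoop", "hive", "pig"},
    "m3": {"hadoop", "hive"},
    "m4": {"java", "spring"},
    "m5": {"hadoop", "java"},
}


def cooccurrence(member_skills):
    """Count members per skill and per ordered (skill, other_skill) pair."""
    skill_count = Counter()
    pair_count = Counter()
    for skills in member_skills.values():
        skill_count.update(skills)
        pair_count.update(permutations(skills, 2))
    return skill_count, pair_count


def recommend(member, member_skills, k=3):
    """Score unowned skills by the sum of P(candidate | owned) estimates."""
    skill_count, pair_count = cooccurrence(member_skills)
    owned = member_skills.get(member, set())
    scores = Counter()
    for have in owned:
        for (a, b), n in pair_count.items():
            if a == have and b not in owned:
                scores[b] += n / skill_count[have]  # estimate of P(b | have)
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:k]


if __name__ == "__main__":
    print(recommend("m3", MEMBER_SKILLS))
```

Summing conditional probabilities rewards candidates that co-occur with several of the member's skills; a production system would likely calibrate these scores with a trained model.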




Skill Endorsements




Social Proof and the Skill Endorsement Graph
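Endorsements naturally form a graph whose edges connect an endorser to an endorsee for a particular skill. A minimal sketch of one social-proof signal, the number of distinct endorsers per skill, assuming hypothetical `(endorser, endorsee, skill)` tuples and ignoring repeat and self-endorsements; the ranking LinkedIn actually used is not described on the slide.

```python
from collections import defaultdict

# Hypothetical endorsement edges: (endorser, endorsee, skill).
ENDORSEMENTS = [
    ("bob", "alice", "hadoop"),
    ("carol", "alice", "hadoop"),
    ("carol", "alice", "hadoop"),  # repeat endorsement, counted once below
    ("dave", "alice", "java"),
    ("alice", "bob", "pig"),
    ("bob", "bob", "pig"),  # self-endorsement, ignored below
]


def build_graph(edges):
    """Map endorsee -> skill -> set of distinct endorsers."""
    graph = defaultdict(lambda: defaultdict(set))
    for endorser, endorsee, skill in edges:
        if endorser != endorsee:
            graph[endorsee][skill].add(endorser)
    return graph


def top_skills(member, graph):
    """Rank a member's skills by distinct-endorser count (ties alphabetical)."""
    skills = graph.get(member, {})
    ranked = [(skill, len(endorsers)) for skill, endorsers in skills.items()]
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


if __name__ == "__main__":
    print(top_skills("alice", build_graph(ENDORSEMENTS)))
```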




The Economic Graph: Skills, Jobs, People, Locations…







Lessons learned developing data products
Collect the right data at the right time
Large amounts of data can reveal new patterns
[Figure: probability of job title vs. months since graduation]
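One simple way a curve like this could be estimated from raw profile data is to bucket observations by months since graduation and normalize title counts within each bucket, giving an empirical P(title | time bucket). The records, the 12-month bucket width, and `title_probabilities` are hypothetical; the talk does not describe how its chart was computed.

```python
from collections import Counter, defaultdict

# Hypothetical (title, months_since_graduation) observations.
OBSERVATIONS = [
    ("analyst", 3), ("analyst", 5), ("associate", 4),
    ("analyst", 14), ("associate", 15), ("manager", 20),
    ("manager", 22), ("associate", 23),
]


def title_probabilities(observations, bucket_months=12):
    """Estimate P(title | time bucket) from raw counts.

    Bucket b covers months [b * bucket_months, (b + 1) * bucket_months).
    """
    by_bucket = defaultdict(Counter)
    for title, months in observations:
        by_bucket[months // bucket_months][title] += 1
    probs = {}
    for bucket, counts in sorted(by_bucket.items()):
        total = sum(counts.values())
        probs[bucket] = {title: n / total for title, n in counts.items()}
    return probs


if __name__ == "__main__":
    for bucket, dist in title_probabilities(OBSERVATIONS).items():
        print(bucket, dist)
```

With millions of records instead of eight, each bucket's distribution becomes stable enough to plot as smooth curves, which is what makes patterns like career progression visible.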
Be wary of “black-box” approaches




Look at your data
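One low-tech way to follow this advice is to pull a small, reproducible random sample of raw records and read them before trusting any summary. A minimal sketch; `sample_records`, its defaults, and the example rows are illustrative assumptions.

```python
import random


def sample_records(records, n=5, seed=0):
    """Return a reproducible random sample of raw records to inspect by eye.

    `records` must be a sequence (e.g. a list); returns a copy of all
    records when there are n or fewer.
    """
    if len(records) <= n:
        return list(records)
    rng = random.Random(seed)  # fixed seed so teammates see the same sample
    return rng.sample(records, n)


if __name__ == "__main__":
    # Hypothetical raw profile rows, including the messy ones worth noticing.
    rows = [
        {"title": "Sr. Software Engr", "skills": "java, hadoop"},
        {"title": "", "skills": "ML;Machine Learning"},
        {"title": "CEO / Founder", "skills": None},
        {"title": "data scientist", "skills": "python"},
        {"title": "Ninja", "skills": "everything"},
        {"title": "Analyst", "skills": "excel"},
    ]
    for row in sample_records(rows, n=3):
        print(row)
```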




Aggregate statistics can be misleading
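A small illustration of why a single aggregate can hide structure: the two hypothetical samples below share a mean of 5 but have completely different distributions, which only shows up once you look at the spread or at the raw values themselves.

```python
import statistics

# Two hypothetical samples with the same mean but very different shapes.
steady = [5, 5, 5, 5, 5, 5]
bimodal = [0, 0, 0, 10, 10, 10]


def summarize(values):
    """A few summary statistics; the mean alone would hide the difference."""
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.pstdev(values),  # population standard deviation
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    for name, sample in [("steady", steady), ("bimodal", bimodal)]:
        print(name, summarize(sample))
```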





Build a viewer app, “micro-listen”




Algorithmic intuition: include data geeks in design




OODA: Think like a jet fighter




OODA: Observe, Orient, Decide, Act




OODA: The speed at which you can move determines victory




Red teaming: what can go wrong likely will




Error data is super valuable: analyze it and adapt
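A minimal sketch of turning error data into a prioritized worklist: count failures by (stage, reason) and address the most common ones first. The log format, field names, and `top_error_reasons` are hypothetical.

```python
from collections import Counter

# Hypothetical error records emitted by a data pipeline.
ERRORS = [
    {"stage": "parse", "reason": "bad date"},
    {"stage": "normalize", "reason": "unknown skill"},
    {"stage": "parse", "reason": "bad date"},
    {"stage": "join", "reason": "missing member id"},
    {"stage": "normalize", "reason": "unknown skill"},
    {"stage": "parse", "reason": "bad date"},
]


def top_error_reasons(errors, k=3):
    """Count errors by (stage, reason) and return the k most frequent."""
    counts = Counter((e["stage"], e["reason"]) for e in errors)
    return counts.most_common(k)


if __name__ == "__main__":
    total = len(ERRORS)
    for (stage, reason), n in top_error_reasons(ERRORS):
        print(f"{stage:<10} {reason:<20} {n} ({n / total:.0%})")
```

Grouping by stage as well as reason points each fix at a specific part of the pipeline; the "unknown skill" bucket, for example, is a ready-made list of candidates for the normalization table.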




Conclusion: tips for developing data products

    Collect the right data at the right time
    Large amounts of data can reveal new patterns
    Be wary of “black-box” approaches
    Look at your raw data
    Aggregate statistics can be misleading
    Build and use viewer apps
    Include data geeks in design process
    OODA: Think like a jet fighter
    Red teaming: anticipate edge cases
    Find opportunity in your error data




Questions?

More info: data.linkedin.com
@peteskomoroch




Speaker notes

  1. Mission: For us, fundamentally changing the way the world works begins with our mission statement: to connect the world’s professionals to make them more productive and successful. This means not only helping people find their dream jobs, but also enabling them to be great at the jobs they already have. Vision: But we’re just getting started. By our measure, there are more than 640 million professionals in the world, and roughly 3.3 billion people in the global workforce. Ultimately, our vision is to create economic opportunity for every professional, which we believe is an especially crucial objective in light of current macroeconomic trends. Our most important core value is that members come first.