SlideShare une entreprise Scribd logo
1  sur  33
Télécharger pour lire hors ligne
Daniel




Scale, Structure, and
Semantics
Daniel Tunkelang
Principal Data Scientist at LinkedIn

      Recruiting Solutions                      1
Take-Aways




  Communication trumps knowledge representation.


   Communication is the problem and the solution.




                                                    2
Overview

1.  Knowledge representation is overrated.



2.  Computation is underrated.



3.  We have a communication problem.




                                             3
The Bad News

1.  Knowledge representation is overrated.



2.  Computation is underrated.



3.  We have a communication problem.




                                             4
AI: a dream deferred.




                        5
Memex: the Computer Science Version




                                      6
Cyc




      7
Freebase




           8
Wolfram Alpha




                9
Knowledge representation is overrated.

Today’s knowledge repositories are:
§  incomplete
§  inconsistent
§  inscrutable

§  and not sustained by economic incentives.



1986 estimate of effort to complete Cyc:
§  250,000 rules + 350 person-years


                                                10
The Good News

1.  Knowledge representation is overrated.



2.  Computation is underrated.



3.  We have a communication problem.




                                             11
Deep Blue




            vs.


                  12
Watson




         13
Plain Old Search Engines are Pretty Good Too




  http://blog.stephenwolfram.com/2011/01/jeopardy-ibm-and-wolframalpha/


                                                                          14
The Unreasonable Effectiveness of Data

§  simple models + lots of data >>
                              elaborate models + less data

§  machine translation: parallel corpora >>
        elaborate rules for syntactic and semantic patterns

§  semantic web formalism just means semantic
    interpretation on shorter strings between angle brackets



Alon Halevy, Peter Norvig, and Fernando Pereira (2009)

                                                               15
Today’s Challenge

1.  Knowledge representation is overrated.



2.  Computation is underrated.



3.  We have a communication problem.




                                             16
Semi-structured Data




         Michael K. Bergman, http://www.mkbergman.com/




                                                         17
Semi-structured Data at LinkedIn


Summary                            <person>
I lead a data science                 <id>
team at LinkedIn, which               <first-name />
analyzes terabytes of                 <last-name />
data to produce products              <location>
and insights that serve                    <name>
LinkedIn’s members.                        <country>
Prior to LinkedIn, I led a                     <code>
local search quality team                  </country>
at Google and was a                   </location>
founding employee of                  <industry>
faceted search pioneer                …
Endeca (acquired by                </person>
Oracle in 2010), where…
Semi-structured Search is a Killer App




                                         19
Another Example: Helping a Friend

Dear Daniel,

I'm attaching the resume of an old friend who just moved up
to the Bay Area.

He has a very strong background in:
§  mobile / wireless applications
§  start-ups and new product launches
§  international expansion

Best regards,
XXX

                                                          20
Company Search




                 21
Semi-structured Data Empowers Users




                                      22
Data-Driven Recommendations




                              23
Data-Driven Computation Serves Communication




  for i in [1..n]!
    s ← w 1 w 2 … w i!
    if Pc(s) > 0!
      a ← new Segment()!
      a.segs ← {s}!
      a.prob ← Pc(s)!
      B[i] ← {a}!
    for j in [1..i-1]!
       for b in B[j]!
         s ← wj wj+1 … wi!
         if Pc(s) > 0!
            a ← new Segment()!
            a.segs ← b.segs U {s}!
            a.prob ← b.prob * Pc(s)!
            B[i] ← B[i] U {a}!
     sort B[i] by prob!
     truncate B[i] to size k!



                                               24
Recommendations Leverage Semi-structured Data
               Job                                            Corpus Stats
                                           Matching   Transition probabilities
                                                      Connectivity
                                   Binary             yrs of experience to reach title
title          industry       …
                                     Exact matches:   education needed for this title
geo            description                            …
company        functional area       geo, industry,
                                     …

          User Base                Soft                              Similarity
                                                        (candidate expertise, job description)
                                     transition
           Filtered                                                    0.56
                                     probabilities,
                                                                     Similarity
          Candidate                  similarity,       (candidate specialties, job description)
                                     …                                  0.2
                                                               Transition probability
                                   Text                   (candidate industry, job industry)
 General       Current Position                                        0.43
 expertise     title
 specialties   summary                                              Title Similarity

 education     tenure length                                            0.8
 headline      industry
                                                              Similarity (headline, title)
 geo           functional area
 experience    …                                                        0.7
                                                                          .
                      derive
                               d
                                                                          .
                                                                          .
                                                                                             25
Skills: A Practical Knowledge Representation




                                               26
Data-Driven Query Expansion for Recall




                                         27
Data-Driven Query Refinement for Precision




                                             28
There is no perfect schema or vocabulary.

§  And even if there were, not everyone would use it.

§  Knowledge representation has only succeeded within
    narrow scope.

§  Brute force is surprisingly effective but does not leverage
    the user as an intelligent partner.




                                                                  29
Communication is the problem and the solution.

§  Rich communication channel fills gaps in system’s
    knowledge representation and in user’s knowledge.

§  Use data science to make the system smart, but be
    humble and empower the human user.



      You've got the brawn
      I've got the brains
      Let's make lots of money
      Pet Shop Boys, “Opportunities”

                                                        30
The Future is Upon Us




                        31
One More Thing

     “More data beats clever algorithms
      but better data beats more data.”

        Monica Rogati @ Strata 2012




                                          32
Thank You!

                  Questions?


                    Contact:

             dtunkelang@linkedin.com


                  We’re Hiring!




                                       33

Contenu connexe

Tendances

Fairness, Transparency, and Privacy in AI @LinkedIn
Fairness, Transparency, and Privacy in AI @LinkedInFairness, Transparency, and Privacy in AI @LinkedIn
Fairness, Transparency, and Privacy in AI @LinkedInC4Media
 
AI, Search, and the Disruption of Knowledge Management
AI, Search, and the Disruption of Knowledge ManagementAI, Search, and the Disruption of Knowledge Management
AI, Search, and the Disruption of Knowledge ManagementTrey Grainger
 
From services to cogs and journey to cognitive bpm print version
From services to cogs and journey to cognitive bpm   print versionFrom services to cogs and journey to cognitive bpm   print version
From services to cogs and journey to cognitive bpm print versionHamid Motahari
 
Measuring Relevance in the Negative Space
Measuring Relevance in the Negative SpaceMeasuring Relevance in the Negative Space
Measuring Relevance in the Negative SpaceTrey Grainger
 
Fairness and Privacy in AI/ML Systems
Fairness and Privacy in AI/ML SystemsFairness and Privacy in AI/ML Systems
Fairness and Privacy in AI/ML SystemsKrishnaram Kenthapadi
 
From Services to Cogs and Journey to Cognitive BPM
From Services to Cogs and Journey to Cognitive BPMFrom Services to Cogs and Journey to Cognitive BPM
From Services to Cogs and Journey to Cognitive BPMHamid Motahari
 
Cognitive Enterprise Services
Cognitive Enterprise ServicesCognitive Enterprise Services
Cognitive Enterprise ServicesHamid Motahari
 
Chatbots in HR: Improving the Employee Experience
Chatbots in HR: Improving the Employee ExperienceChatbots in HR: Improving the Employee Experience
Chatbots in HR: Improving the Employee ExperienceAmy Kong
 
Building Knowledge Graphs in DIG
Building Knowledge Graphs in DIGBuilding Knowledge Graphs in DIG
Building Knowledge Graphs in DIGPalak Modi
 
Strategy Report for NextGen BI
Strategy Report for NextGen BIStrategy Report for NextGen BI
Strategy Report for NextGen BINeil Raden
 
Reading Group 2013 (DERI NUIG)
Reading Group 2013 (DERI NUIG)Reading Group 2013 (DERI NUIG)
Reading Group 2013 (DERI NUIG)Bianca Pereira
 
Privacy in AI/ML Systems: Practical Challenges and Lessons Learned
Privacy in AI/ML Systems: Practical Challenges and Lessons LearnedPrivacy in AI/ML Systems: Practical Challenges and Lessons Learned
Privacy in AI/ML Systems: Practical Challenges and Lessons LearnedKrishnaram Kenthapadi
 
Imagine a World Without Strangers
Imagine a World Without StrangersImagine a World Without Strangers
Imagine a World Without StrangersCrick Waters
 
Let's Talk: fundamentals of conversational design
Let's Talk: fundamentals of conversational designLet's Talk: fundamentals of conversational design
Let's Talk: fundamentals of conversational designNikita Lukianets
 
How new ai based analytics ignite a productivity revolution in e discovery-final
How new ai based analytics ignite a productivity revolution in e discovery-finalHow new ai based analytics ignite a productivity revolution in e discovery-final
How new ai based analytics ignite a productivity revolution in e discovery-finaljcscholtes
 
Fairness and Privacy in AI/ML Systems
Fairness and Privacy in AI/ML SystemsFairness and Privacy in AI/ML Systems
Fairness and Privacy in AI/ML SystemsKrishnaram Kenthapadi
 
A Capability Requirements Approach for Predicting Worker Performance in Crowd...
A Capability Requirements Approach for Predicting Worker Performance in Crowd...A Capability Requirements Approach for Predicting Worker Performance in Crowd...
A Capability Requirements Approach for Predicting Worker Performance in Crowd...Umair ul Hassan
 

Tendances (19)

Fairness, Transparency, and Privacy in AI @LinkedIn
Fairness, Transparency, and Privacy in AI @LinkedInFairness, Transparency, and Privacy in AI @LinkedIn
Fairness, Transparency, and Privacy in AI @LinkedIn
 
AI, Search, and the Disruption of Knowledge Management
AI, Search, and the Disruption of Knowledge ManagementAI, Search, and the Disruption of Knowledge Management
AI, Search, and the Disruption of Knowledge Management
 
From services to cogs and journey to cognitive bpm print version
From services to cogs and journey to cognitive bpm   print versionFrom services to cogs and journey to cognitive bpm   print version
From services to cogs and journey to cognitive bpm print version
 
Measuring Relevance in the Negative Space
Measuring Relevance in the Negative SpaceMeasuring Relevance in the Negative Space
Measuring Relevance in the Negative Space
 
Fairness and Privacy in AI/ML Systems
Fairness and Privacy in AI/ML SystemsFairness and Privacy in AI/ML Systems
Fairness and Privacy in AI/ML Systems
 
From Services to Cogs and Journey to Cognitive BPM
From Services to Cogs and Journey to Cognitive BPMFrom Services to Cogs and Journey to Cognitive BPM
From Services to Cogs and Journey to Cognitive BPM
 
Cognitive Enterprise Services
Cognitive Enterprise ServicesCognitive Enterprise Services
Cognitive Enterprise Services
 
Chatbots in HR: Improving the Employee Experience
Chatbots in HR: Improving the Employee ExperienceChatbots in HR: Improving the Employee Experience
Chatbots in HR: Improving the Employee Experience
 
Building Knowledge Graphs in DIG
Building Knowledge Graphs in DIGBuilding Knowledge Graphs in DIG
Building Knowledge Graphs in DIG
 
Strategy Report for NextGen BI
Strategy Report for NextGen BIStrategy Report for NextGen BI
Strategy Report for NextGen BI
 
Reading Group 2013 (DERI NUIG)
Reading Group 2013 (DERI NUIG)Reading Group 2013 (DERI NUIG)
Reading Group 2013 (DERI NUIG)
 
Privacy in AI/ML Systems: Practical Challenges and Lessons Learned
Privacy in AI/ML Systems: Practical Challenges and Lessons LearnedPrivacy in AI/ML Systems: Practical Challenges and Lessons Learned
Privacy in AI/ML Systems: Practical Challenges and Lessons Learned
 
Imagine a World Without Strangers
Imagine a World Without StrangersImagine a World Without Strangers
Imagine a World Without Strangers
 
Let's Talk: fundamentals of conversational design
Let's Talk: fundamentals of conversational designLet's Talk: fundamentals of conversational design
Let's Talk: fundamentals of conversational design
 
How new ai based analytics ignite a productivity revolution in e discovery-final
How new ai based analytics ignite a productivity revolution in e discovery-finalHow new ai based analytics ignite a productivity revolution in e discovery-final
How new ai based analytics ignite a productivity revolution in e discovery-final
 
Enterprise Systems - MS809
Enterprise Systems -   MS809Enterprise Systems -   MS809
Enterprise Systems - MS809
 
Fairness and Privacy in AI/ML Systems
Fairness and Privacy in AI/ML SystemsFairness and Privacy in AI/ML Systems
Fairness and Privacy in AI/ML Systems
 
A Capability Requirements Approach for Predicting Worker Performance in Crowd...
A Capability Requirements Approach for Predicting Worker Performance in Crowd...A Capability Requirements Approach for Predicting Worker Performance in Crowd...
A Capability Requirements Approach for Predicting Worker Performance in Crowd...
 
547 551
547 551547 551
547 551
 

Similaire à Scale, Structure, and Semantics

Data By The People, For The People
Data By The People, For The PeopleData By The People, For The People
Data By The People, For The PeopleDaniel Tunkelang
 
Big Data and Data Standardization at LinkedIn
Big Data and Data Standardization at LinkedInBig Data and Data Standardization at LinkedIn
Big Data and Data Standardization at LinkedInAlexis Baird
 
Connecting Talent to Opportunity.. at scale @ LinkedIn
Connecting Talent to Opportunity.. at scale @ LinkedInConnecting Talent to Opportunity.. at scale @ LinkedIn
Connecting Talent to Opportunity.. at scale @ LinkedInAnmol Bhasin
 
The Semantic Knowledge Graph
The Semantic Knowledge GraphThe Semantic Knowledge Graph
The Semantic Knowledge GraphTrey Grainger
 
Middeware2012 crowd
Middeware2012 crowdMiddeware2012 crowd
Middeware2012 crowdmjfrankli
 
Machine Learning for Recommender Systems in the Job Market
Machine Learning for Recommender Systems in the Job MarketMachine Learning for Recommender Systems in the Job Market
Machine Learning for Recommender Systems in the Job MarketFabian Abel
 
How to Build Data Science Teams
How to Build Data Science TeamsHow to Build Data Science Teams
How to Build Data Science TeamsGanes Kesari
 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Data Science London
 
empirical software engineering, v2.0
empirical software engineering, v2.0empirical software engineering, v2.0
empirical software engineering, v2.0CS, NcState
 
Understanding the New World of Cognitive Computing
Understanding the New World of Cognitive ComputingUnderstanding the New World of Cognitive Computing
Understanding the New World of Cognitive ComputingDATAVERSITY
 
Ml pluss ejan2013
Ml pluss ejan2013Ml pluss ejan2013
Ml pluss ejan2013CS, NcState
 
IIPGH Webinar 1: Getting Started With Data Science
IIPGH Webinar 1: Getting Started With Data ScienceIIPGH Webinar 1: Getting Started With Data Science
IIPGH Webinar 1: Getting Started With Data Scienceds4good
 
Unlocking value in your (big) data
Unlocking value in your (big) dataUnlocking value in your (big) data
Unlocking value in your (big) dataOscar Renalias
 
Software Design
Software DesignSoftware Design
Software DesignHa Ninh
 
Introduction to question answering for linked data & big data
Introduction to question answering for linked data & big dataIntroduction to question answering for linked data & big data
Introduction to question answering for linked data & big dataAndre Freitas
 
GDG Cloud Southlake #17: Meg Dickey-Kurdziolek: Explainable AI is for Everyone
GDG Cloud Southlake #17: Meg Dickey-Kurdziolek: Explainable AI is for EveryoneGDG Cloud Southlake #17: Meg Dickey-Kurdziolek: Explainable AI is for Everyone
GDG Cloud Southlake #17: Meg Dickey-Kurdziolek: Explainable AI is for EveryoneJames Anderson
 
Key Lessons Learned Building Recommender Systems for Large-Scale Social Netw...
 Key Lessons Learned Building Recommender Systems for Large-Scale Social Netw... Key Lessons Learned Building Recommender Systems for Large-Scale Social Netw...
Key Lessons Learned Building Recommender Systems for Large-Scale Social Netw...Christian Posse
 
Workforce Insight - A Case Study (cfactor and DeVry)
Workforce Insight - A Case Study (cfactor and DeVry)Workforce Insight - A Case Study (cfactor and DeVry)
Workforce Insight - A Case Study (cfactor and DeVry)cfactor Works Inc.
 
Futuristic knowledge management ppt bec bagalkot mba
Futuristic knowledge management ppt bec bagalkot mbaFuturistic knowledge management ppt bec bagalkot mba
Futuristic knowledge management ppt bec bagalkot mbaBabasab Patil
 
Meetup 22/2/2018 - Artificiële Intelligentie & Human Resources
Meetup 22/2/2018 - Artificiële Intelligentie & Human ResourcesMeetup 22/2/2018 - Artificiële Intelligentie & Human Resources
Meetup 22/2/2018 - Artificiële Intelligentie & Human ResourcesDigipolis Antwerpen
 

Similaire à Scale, Structure, and Semantics (20)

Data By The People, For The People
Data By The People, For The PeopleData By The People, For The People
Data By The People, For The People
 
Big Data and Data Standardization at LinkedIn
Big Data and Data Standardization at LinkedInBig Data and Data Standardization at LinkedIn
Big Data and Data Standardization at LinkedIn
 
Connecting Talent to Opportunity.. at scale @ LinkedIn
Connecting Talent to Opportunity.. at scale @ LinkedInConnecting Talent to Opportunity.. at scale @ LinkedIn
Connecting Talent to Opportunity.. at scale @ LinkedIn
 
The Semantic Knowledge Graph
The Semantic Knowledge GraphThe Semantic Knowledge Graph
The Semantic Knowledge Graph
 
Middeware2012 crowd
Middeware2012 crowdMiddeware2012 crowd
Middeware2012 crowd
 
Machine Learning for Recommender Systems in the Job Market
Machine Learning for Recommender Systems in the Job MarketMachine Learning for Recommender Systems in the Job Market
Machine Learning for Recommender Systems in the Job Market
 
How to Build Data Science Teams
How to Build Data Science TeamsHow to Build Data Science Teams
How to Build Data Science Teams
 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
 
empirical software engineering, v2.0
empirical software engineering, v2.0empirical software engineering, v2.0
empirical software engineering, v2.0
 
Understanding the New World of Cognitive Computing
Understanding the New World of Cognitive ComputingUnderstanding the New World of Cognitive Computing
Understanding the New World of Cognitive Computing
 
Ml pluss ejan2013
Ml pluss ejan2013Ml pluss ejan2013
Ml pluss ejan2013
 
IIPGH Webinar 1: Getting Started With Data Science
IIPGH Webinar 1: Getting Started With Data ScienceIIPGH Webinar 1: Getting Started With Data Science
IIPGH Webinar 1: Getting Started With Data Science
 
Unlocking value in your (big) data
Unlocking value in your (big) dataUnlocking value in your (big) data
Unlocking value in your (big) data
 
Software Design
Software DesignSoftware Design
Software Design
 
Introduction to question answering for linked data & big data
Introduction to question answering for linked data & big dataIntroduction to question answering for linked data & big data
Introduction to question answering for linked data & big data
 
GDG Cloud Southlake #17: Meg Dickey-Kurdziolek: Explainable AI is for Everyone
GDG Cloud Southlake #17: Meg Dickey-Kurdziolek: Explainable AI is for EveryoneGDG Cloud Southlake #17: Meg Dickey-Kurdziolek: Explainable AI is for Everyone
GDG Cloud Southlake #17: Meg Dickey-Kurdziolek: Explainable AI is for Everyone
 
Key Lessons Learned Building Recommender Systems for Large-Scale Social Netw...
 Key Lessons Learned Building Recommender Systems for Large-Scale Social Netw... Key Lessons Learned Building Recommender Systems for Large-Scale Social Netw...
Key Lessons Learned Building Recommender Systems for Large-Scale Social Netw...
 
Workforce Insight - A Case Study (cfactor and DeVry)
Workforce Insight - A Case Study (cfactor and DeVry)Workforce Insight - A Case Study (cfactor and DeVry)
Workforce Insight - A Case Study (cfactor and DeVry)
 
Futuristic knowledge management ppt bec bagalkot mba
Futuristic knowledge management ppt bec bagalkot mbaFuturistic knowledge management ppt bec bagalkot mba
Futuristic knowledge management ppt bec bagalkot mba
 
Meetup 22/2/2018 - Artificiële Intelligentie & Human Resources
Meetup 22/2/2018 - Artificiële Intelligentie & Human ResourcesMeetup 22/2/2018 - Artificiële Intelligentie & Human Resources
Meetup 22/2/2018 - Artificiële Intelligentie & Human Resources
 

Plus de Daniel Tunkelang

Query Understanding and Ecommerce
Query Understanding and EcommerceQuery Understanding and Ecommerce
Query Understanding and EcommerceDaniel Tunkelang
 
Semantic Equivalence of e-Commerce Queries
Semantic Equivalence of e-Commerce QueriesSemantic Equivalence of e-Commerce Queries
Semantic Equivalence of e-Commerce QueriesDaniel Tunkelang
 
Helping Searchers Satisfice through Query Understanding
Helping Searchers Satisfice through Query UnderstandingHelping Searchers Satisfice through Query Understanding
Helping Searchers Satisfice through Query UnderstandingDaniel Tunkelang
 
Query Understanding: A Manifesto
Query Understanding: A ManifestoQuery Understanding: A Manifesto
Query Understanding: A ManifestoDaniel Tunkelang
 
Where should you put your data scientists?
Where should you put your data scientists?Where should you put your data scientists?
Where should you put your data scientists?Daniel Tunkelang
 
Data Science: A Mindset for Productivity
Data Science: A Mindset for ProductivityData Science: A Mindset for Productivity
Data Science: A Mindset for ProductivityDaniel Tunkelang
 
My Three Ex’s: A Data Science Approach for Applied Machine Learning
My Three Ex’s: A Data Science Approach for Applied Machine LearningMy Three Ex’s: A Data Science Approach for Applied Machine Learning
My Three Ex’s: A Data Science Approach for Applied Machine LearningDaniel Tunkelang
 
Web science - How is it different?
Web science - How is it different?Web science - How is it different?
Web science - How is it different?Daniel Tunkelang
 
Better Search Through Query Understanding
Better Search Through Query UnderstandingBetter Search Through Query Understanding
Better Search Through Query UnderstandingDaniel Tunkelang
 
Social Search in a Professional Context
Social Search in a Professional ContextSocial Search in a Professional Context
Social Search in a Professional ContextDaniel Tunkelang
 
Find and be Found: Information Retrieval at LinkedIn
Find and be Found: Information Retrieval at LinkedInFind and be Found: Information Retrieval at LinkedIn
Find and be Found: Information Retrieval at LinkedInDaniel Tunkelang
 
Search as Communication: Lessons from a Personal Journey
Search as Communication: Lessons from a Personal JourneySearch as Communication: Lessons from a Personal Journey
Search as Communication: Lessons from a Personal JourneyDaniel Tunkelang
 
Enterprise Search: How do we get there from here?
Enterprise Search: How do we get there from here?Enterprise Search: How do we get there from here?
Enterprise Search: How do we get there from here?Daniel Tunkelang
 
Big Data, We Have a Communication Problem
Big Data, We Have a Communication Problem Big Data, We Have a Communication Problem
Big Data, We Have a Communication Problem Daniel Tunkelang
 
How to Interview a Data Scientist
How to Interview a Data ScientistHow to Interview a Data Scientist
How to Interview a Data ScientistDaniel Tunkelang
 
Information, Attention, and Trust: A Hierarchy of Needs
Information, Attention, and Trust: A Hierarchy of NeedsInformation, Attention, and Trust: A Hierarchy of Needs
Information, Attention, and Trust: A Hierarchy of NeedsDaniel Tunkelang
 
Strata 2012: Humans, Machines, and the Dimensions of Microwork
Strata 2012: Humans, Machines, and the Dimensions of MicroworkStrata 2012: Humans, Machines, and the Dimensions of Microwork
Strata 2012: Humans, Machines, and the Dimensions of MicroworkDaniel Tunkelang
 
Recommendations as a Conversation with the User
Recommendations as a Conversation with the UserRecommendations as a Conversation with the User
Recommendations as a Conversation with the UserDaniel Tunkelang
 

Plus de Daniel Tunkelang (20)

Query Understanding and Ecommerce
Query Understanding and EcommerceQuery Understanding and Ecommerce
Query Understanding and Ecommerce
 
Semantic Equivalence of e-Commerce Queries
Semantic Equivalence of e-Commerce QueriesSemantic Equivalence of e-Commerce Queries
Semantic Equivalence of e-Commerce Queries
 
Helping Searchers Satisfice through Query Understanding
Helping Searchers Satisfice through Query UnderstandingHelping Searchers Satisfice through Query Understanding
Helping Searchers Satisfice through Query Understanding
 
MMM, Search!
MMM, Search!MMM, Search!
MMM, Search!
 
Enterprise Intelligence
Enterprise IntelligenceEnterprise Intelligence
Enterprise Intelligence
 
Query Understanding: A Manifesto
Query Understanding: A ManifestoQuery Understanding: A Manifesto
Query Understanding: A Manifesto
 
Where should you put your data scientists?
Where should you put your data scientists?Where should you put your data scientists?
Where should you put your data scientists?
 
Data Science: A Mindset for Productivity
Data Science: A Mindset for ProductivityData Science: A Mindset for Productivity
Data Science: A Mindset for Productivity
 
My Three Ex’s: A Data Science Approach for Applied Machine Learning
My Three Ex’s: A Data Science Approach for Applied Machine LearningMy Three Ex’s: A Data Science Approach for Applied Machine Learning
My Three Ex’s: A Data Science Approach for Applied Machine Learning
 
Web science - How is it different?
Web science - How is it different?Web science - How is it different?
Web science - How is it different?
 
Better Search Through Query Understanding
Better Search Through Query UnderstandingBetter Search Through Query Understanding
Better Search Through Query Understanding
 
Social Search in a Professional Context
Social Search in a Professional ContextSocial Search in a Professional Context
Social Search in a Professional Context
 
Find and be Found: Information Retrieval at LinkedIn
Find and be Found: Information Retrieval at LinkedInFind and be Found: Information Retrieval at LinkedIn
Find and be Found: Information Retrieval at LinkedIn
 
Search as Communication: Lessons from a Personal Journey
Search as Communication: Lessons from a Personal JourneySearch as Communication: Lessons from a Personal Journey
Search as Communication: Lessons from a Personal Journey
 
Enterprise Search: How do we get there from here?
Enterprise Search: How do we get there from here?Enterprise Search: How do we get there from here?
Enterprise Search: How do we get there from here?
 
Big Data, We Have a Communication Problem
Big Data, We Have a Communication Problem Big Data, We Have a Communication Problem
Big Data, We Have a Communication Problem
 
How to Interview a Data Scientist
How to Interview a Data ScientistHow to Interview a Data Scientist
How to Interview a Data Scientist
 
Information, Attention, and Trust: A Hierarchy of Needs
Information, Attention, and Trust: A Hierarchy of NeedsInformation, Attention, and Trust: A Hierarchy of Needs
Information, Attention, and Trust: A Hierarchy of Needs
 
Strata 2012: Humans, Machines, and the Dimensions of Microwork
Strata 2012: Humans, Machines, and the Dimensions of MicroworkStrata 2012: Humans, Machines, and the Dimensions of Microwork
Strata 2012: Humans, Machines, and the Dimensions of Microwork
 
Recommendations as a Conversation with the User
Recommendations as a Conversation with the UserRecommendations as a Conversation with the User
Recommendations as a Conversation with the User
 

Dernier

Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsSafe Software
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URLRuncy Oommen
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopBachir Benyammi
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXTarek Kalaji
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDELiveplex
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarPrecisely
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAshyamraj55
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfDaniel Santiago Silva Capera
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024SkyPlanner
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding TeamAdam Moalla
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 

Dernier (20)

Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBX
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity Webinar
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
20230104 - machine vision
20230104 - machine vision20230104 - machine vision
20230104 - machine vision
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
20150722 - AGV
20150722 - AGV20150722 - AGV
20150722 - AGV
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 

Scale, Structure, and Semantics

  • 1. Daniel Scale, Structure, and Semantics Daniel Tunkelang Principal Data Scientist at LinkedIn Recruiting Solutions 1
  • 2. Take-Aways Communication trumps knowledge representation. Communication is the problem and the solution. 2
  • 3. Overview 1.  Knowledge representation is overrated. 2.  Computation is underrated. 3.  We have a communication problem. 3
  • 4. The Bad News 1.  Knowledge representation is overrated. 2.  Computation is underrated. 3.  We have a communication problem. 4
  • 5. AI: a dream deferred. 5
  • 6. Memex: the Computer Science Version 6
  • 7. Cyc 7
  • 10. Knowledge representation is overrated. Today’s knowledge repositories are: §  incomplete §  inconsistent §  inscrutable §  and not sustained by economic incentives. 1986 estimate of effort to complete Cyc: §  250,000 rules + 350 person-years 10
  • 11. The Good News 1.  Knowledge representation is overrated. 2.  Computation is underrated. 3.  We have a communication problem. 11
  • 12. Deep Blue vs. 12
  • 13. Watson 13
  • 14. Plain Old Search Engines are Pretty Good Too http://blog.stephenwolfram.com/2011/01/jeopardy-ibm-and-wolframalpha/ 14
  • 15. The Unreasonable Effectiveness of Data §  simple models + lots of data >> elaborate models + less data §  machine translation: parallel corpora >> elaborate rules for syntactic and semantic patterns §  semantic web formalism just means semantic interpretation on shorter strings between angle brackets Alon Halevy, Peter Norvig, and Fernando Pereira (2009) 15
  • 16. Today’s Challenge 1.  Knowledge representation is overrated. 2.  Computation is underrated. 3.  We have a communication problem. 16
  • 17. Semi-structured Data Michael K. Bergman, http://www.mkbergman.com/ 17
  • 18. Semi-structured Data at LinkedIn Summary <person> I lead a data science <id> team at LinkedIn, which <first-name /> analyzes terabytes of <last-name /> data to produce products <location> and insights that serve <name> LinkedIn’s members. <country> Prior to LinkedIn, I led a <code> local search quality team </country> at Google and was a </location> founding employee of <industry> faceted search pioneer … Endeca (acquired by </person> Oracle in 2010), where…
  • 19. Semi-structured Search is a Killer App 19
  • 20. Another Example: Helping a Friend Dear Daniel, I'm attaching the resume of an old friend who just moved up to the Bay Area. He has a very strong background in: §  mobile / wireless applications §  start-ups and new product launches §  international expansion Best regards, XXX 20
  • 24. Data-Driven Computation Serves Communication for i in [1..n]! s ← w 1 w 2 … w i! if Pc(s) > 0! a ← new Segment()! a.segs ← {s}! a.prob ← Pc(s)! B[i] ← {a}! for j in [1..i-1]! for b in B[j]! s ← wj wj+1 … wi! if Pc(s) > 0! a ← new Segment()! a.segs ← b.segs U {s}! a.prob ← b.prob * Pc(s)! B[i] ← B[i] U {a}! sort B[i] by prob! truncate B[i] to size k! 24
  • 25. Recommendations Leverage Semi-structured Data Job Corpus Stats Matching Transition probabilities Connectivity Binary yrs of experience to reach title title industry … Exact matches: education needed for this title geo description … company functional area geo, industry, … User Base Soft Similarity (candidate expertise, job description) transition Filtered 0.56 probabilities, Similarity Candidate similarity, (candidate specialties, job description) … 0.2 Transition probability Text (candidate industry, job industry) General Current Position 0.43 expertise title specialties summary Title Similarity education tenure length 0.8 headline industry Similarity (headline, title) geo functional area experience … 0.7 . derive d . . 25
  • 26. Skills: A Practical Knowledge Representation 26
  • 28. Data-Driven Query Refinement for Precision 28
  • 29. There is no perfect schema or vocabulary. §  And even if there were, not everyone would use it. §  Knowledge representation has only succeeded within narrow scope. §  Brute force is surprisingly effective but does not leverage the user as an intelligent partner. 29
  • 30. Communication is the problem and the solution. §  Rich communication channel fills gaps in system’s knowledge representation and in user’s knowledge. §  Use data science to make the system smart, but be humble and empower the human user. You've got the brawn I've got the brains Let's make lots of money Pet Shop Boys, “Opportunities” 30
  • 31. The Future is Upon Us 31
  • 32. One More Thing “More data beats clever algorithms but better data beats more data.” Monica Rogati @ Strata 2012 32
  • 33. Thank You! Questions? Contact: dtunkelang@linkedin.com We’re Hiring! 33

Notes de l'éditeur

  1. Two icons of artificial intelligence from science fiction: the HAL 9000 computer from 2001: A Space Odyssey and the android Data from Star Trek: The Next Generation. Both exceed human beings in their ability to assimilate knowledge and to reason using that knowledge. Both interact with human beings in natural language.Despite all of our technological advances, the closest we have come to this vision is talking to Siri. An improvement on the 1960s ELIZA program for sure, but still a baby step.
  2. In 1945, Vannevar Bush put forth his vision of amemex (a portmanteau of &quot;memory&quot; and &quot;index”) as a device in which individuals would compress and store all of their books, records, and communications, &quot;mechanized so that it may be consulted with exceeding speed and flexibility&quot;. The memex would provide an &quot;enlarged intimate supplement to one&apos;s memory”. The concept of the memex influenced the development of hypertext systems,eventually leading to the creation of the World Wide Web and personal knowledge base software.
  3. A pure embodiment of AI vision: Cyc was started in 1984 as an artificial intelligence project that attempts to assemble a comprehensive ontology and knowledge base of everyday common sense knowledge, with the goal of enabling AI applications to perform human-like reasoning.Typical pieces of knowledge represented in the database are &quot;Every tree is a plant&quot; and &quot;Plants die eventually&quot;. When asked whether trees die, the inference engine can draw the obvious conclusion and answer the question correctly. The knowledge base contains over one million human-defined assertions, rules or common sense ideas. These are formulated in a language based on predicate calculus.
  4. Freebase is a large collaborative knowledge base consisting of metadata composed mainly by its community members. It is an online collection of structured data harvested from many sources, including individual &apos;wiki&apos; contributions. Freebase aims to create a global resource which allows people (and machines) to access common information more effectively.Freebase is a wonderful resource, and search engines are starting to use it as a structured data resource. But using Freebase for structured queries is a lot trickier than using Google for free-text queries, largely because Freebase is incomplete in unpredictable ways. In particular, Freebase has difficulties with Null, nothing, unknown or N/A values. For example, in the results for &quot;fires of unknown cause&quot;, there is no way to tell whether the cause of the fire is really unknown or the data is missing.
  5. Wolfram Alpha is an answer engine developed by Wolfram Research. It is an online service that answers factual queries directly by computing the answer from structured data.Wolfram Alpha is impressive. It’s no wonder that Wolfram Alpha serves as the back end for many Siri queries.Unfortunately, its natural-language interface is brittle. As we can see from these two queries, it can roughly report the number of software engineers in the San Francisco Bay Area, but not the number of software companies. Nobody is perfect. But what is disconcerting is that the system does nothing to suggest that the latter answer is less reliable than the former. Does the system know how to answer the second question? There is no way for the user to be sure, other than perhaps by trial and error eventually leading to resolution or frustrated resignation. This is a communication problem.
  6. Deep Blue was a chess-playing computer developed by IBM. In 1997, the machine defeated world champion Garry Kasparov in a match.What was its secret sauce? Could it think? Did it learn to play chess and represent that wisdom in a knowledge base? Not really – to borrow a line from Toy Story, it won by using brute force with style. It was a massively parallel system (by 1997 standards) made with special-purpose chips.
  7. A decade later, IBM did it again. IBM researchers decided to build a system to beat humans at a more modern game than chess – namely the Jeopardy! television quiz show featuring trivia in history, pop culture, sports,, etc. Moreover, many Jeopardy questions (or “answers”, since the gimmick of the game is that the question-answer process is inverted) involve word play, which would seem particularly challenging for a machine.Like Deep Blue, Watson is all about computation. Its knowledge base is mined from 200 million pages of structured and unstructured content consuming four terabytes of disk storage, including the full text of Wikipedia. It uses a server cluster with 720 cores and relies on parallel processing to parse questions and search its knowledge base for candidate responses.In February 2011, Watson defeated former Jeopardy! champions Ken Jennings and Brad Rutter in a televised match.
  8. Watson’s achievement was impressive. But let’s put things in perspective. Even a plain old search engines do pretty well at Jeopardy. The comparison isn’t entirely fair: in judging the search engines, we are only requiring that they return pages on which the answer should appear, not giving specific actual answers. One can try various simple strategies for going further. Like getting the answer from the title of the first hit—which with the top search engines actually does succeed about 20% of the time.Still, the point should be clear. None of these strategies are using sophisticated semantic representations. Computation – brute force with style -- is the big winner.
  9. In 2009, Google researchersAlon Halevy, Peter Norvig, and Fernando Pereira wrote a popular article entitled “The Unreasonable Effectiveness of Data”. It has often been paraphrased as better date beats clever algorithms. But for our purposed we can interpret it as celebrating the triumph of computation over knowledge representation as a means to produce semantic or intelligent behavior.
  10. Let’s take stock of what kind of data we have. Most of our data is semi-structured data -- the broad space that lives in between structured data (the rigid schemas we associated with relational database systems) and unstructured data (e.g., the free text indexed by search engines). The structure in semi-structured data takes the forms of tags and structural elements without a rigid schema (e.g., XML).
  11. LinkedIn has one of the largest and richest collections of semi-structured data on the consumer internet. Here you can see how our people data combines free text, a connection network, and a collection of structured tags. And these aren’t the only entities – we have companies, jobs, etc.
  12. Here I’m searching for people I know in the Bay Area who have “data” anywhere in their profiles and currently work at Google, Yahoo!, or Twitter. Maybe I should look at my Facebook connections too. Did I mention that I’m hiring? The power of such a search is incredible, and the experience is highly intuitive even for a user who has no idea that either the data or the search query is “semi-structured”. The interaction revolves around facets that are well represented in both the data and the user’s mental model.
  13. True story, redacted only the protect my friend’s privacy.
  14. Of course I know that the first place to research companies is LinkedIn. So I started with a generic company search for “mobile”. The results are reasonable, given the query lack of specificity. But clearly I need edto be more specific.
  15. Here is my revised query: small mobile companies headquartered in the Bay Area in software-related industries. This may not have been exactly what my friend was looking for, but it was a great starting point. Specifically, the systems helped me map his information need to a query that captured its spirit.
  16. Computation is powerful – especially at our scale of data and users. Applying machine learning allows us to produce recommendations for job matching, content, community, etc. And of course it drives the feature LinkedIn is most famous for: People You May Know.
  17. One of the steps in processing search queries is to parse them and establish query interpretations – in this case, that “linkedin” refers to a company and “ceo” refers to a job title. We do so using a hidden Markov model (HMM) trained on our corpus statistics and search logs. This allows us to handle word-sense ambiguity, e.g., “dell” as a first name, last name, or company name.
  18. In order to evaluate a job-candidate pair, we first use common-sense filtering to determine if the candidate is even plausible, e.g., we don&apos;t need fancy algorithms to determine that a sales executive in Turkey isn&apos;t a good match for a software engineering job in Mountain View. After this filtering, we take the two bags of features and create a single set of features for the pair to represent the matching. The matching features can be binary features (e.g., is the candidate in the same industry as the job?), softer (e.g., based on the transition probability between the the candidate&apos;s current job and the potential new one), and textual (we can use standard information retrieval methods to compare documents). Combining all of these using weights learned through regression, we can assign scores to matches. Note again that scale matters -- our corpus statistics are essential to computing the above features without falling victim to sparsity.
  19. If the value of your network reflects the saying that &quot;you are who you know&quot;, Skills offers the complementary &quot;you are what you know&quot;. Skills are diverse -- ranging from Ballet to Hadoop. In order to identify the set of skills, we turn again to the unreasonable effectiveness of data. Many of our 160M+ users have a Specialties section where they list their skills as free text. By mining these sections and other profile elements, we generated a set of potential skills for our entire corpus. Bootstrapping on that list, we implemented a suggested skills feature that is leading to increasing adoption of our controlled vocabulary.
  20. Skills is still in beta. But here you see how related skills – which are derived by mining our corpus – can increase recall on a search for people who have expertise in WordNet, a lexical database developed at Princeton. We can’t rely on people to mention WordNet in their profiles. But we can expand our search to include related skills like ontologies and semantic search. Of course it’s a precision / recall tradeoff – but one that is completely transparent to the user.
  21. The same technique can be used to disambiguate a query like [owl]. If you’re looking for OWL specialists rather than ornithologists, then it’s helpful to require some supporting evidence, such as expertise in the semantic web or RDF.
  22. Knowledge representation isn’t the answer. Computation is great. But with semi-structured data and data-driven computation, we can get even further.
  23. To achieve the best results, we have to exploit the strengths of both people and machines. That means using computation to support communication.
  24. Web search is beginning to embrace semi-structured data – using the unreasonable effectiveness of data to exploit the structure it has and derive latent structure where possible. The result is more user control and a more intuitive communication between the user and the system. What was once exotic is rapidly becoming mainstream.
  25. At this year’s Strata conference, my colleague Monica Rogati one-upped Norvig etal’s argument about the unreasonable effectiveness of data. Not all data is created equal, and quality trumps quality. This is a teaser – I recommend you watch her talk on “The Model and the Train Wreck”.