Scale, Structure, and Semantics
Daniel Tunkelang
Principal Data Scientist at LinkedIn


Keynote at 2012 Semantic Technology and Business Conference

Scale, Structure, and Semantics
Daniel Tunkelang, LinkedIn

Science fiction has a mixed track record when it comes to anticipating technological innovations. While Jules Verne fared well with his predictions of submarine and space technology, artificial intelligence hasn't produced anything like Arthur C. Clarke's HAL 9000.

Instead, we've managed to elicit intelligence from machines through unexpected means. Search engines have achieved remarkable success in organizing the world's information by crawling the web, indexing documents, and exploiting link structure to establish authoritativeness. At LinkedIn, we apply large-scale analytics to terabytes of semistructured data to deliver products and insights that serve our 150M+ members. Semantics emerge when we apply the right analytical techniques to a sufficient quality and quantity of data.

In this talk, I will describe how LinkedIn's huge and rich graph of relationship data powers the products our users love. I believe that the lessons we have learned apply broadly to other semantic applications. While quantity and quality of data are the key challenges to delivering a semantically rich experience, the solution is to create the right ecosystem that incents people to give you good data, which then forms the basis for great data products.



  1. Scale, Structure, and Semantics
     Daniel Tunkelang
     Principal Data Scientist at LinkedIn
     Recruiting Solutions
  2. Take-Aways
     - Communication trumps knowledge representation.
     - Communication is the problem and the solution.
  3. Overview
     1. Knowledge representation is overrated.
     2. Computation is underrated.
     3. We have a communication problem.
  4. The Bad News
     1. Knowledge representation is overrated.
     2. Computation is underrated.
     3. We have a communication problem.
  5. AI: a dream deferred.
  6. Memex: the Computer Science Version
  7. Cyc
  8. Freebase
  9. Wolfram Alpha
  10. Knowledge representation is overrated.
     Today’s knowledge repositories are:
     - incomplete
     - inconsistent
     - inscrutable
     - and not sustained by economic incentives.
     1986 estimate of effort to complete Cyc:
     - 250,000 rules + 350 person-years
  11. The Good News
     1. Knowledge representation is overrated.
     2. Computation is underrated.
     3. We have a communication problem.
  12. Deep Blue vs.
  13. Watson
  14. Plain Old Search Engines are Pretty Good Too
     http://blog.stephenwolfram.com/2011/01/jeopardy-ibm-and-wolframalpha/
  15. The Unreasonable Effectiveness of Data
     - simple models + lots of data >> elaborate models + less data
     - machine translation: parallel corpora >> elaborate rules for syntactic and semantic patterns
     - semantic web formalism just means semantic interpretation on shorter strings between angle brackets
     Alon Halevy, Peter Norvig, and Fernando Pereira (2009)
  16. Today’s Challenge
     1. Knowledge representation is overrated.
     2. Computation is underrated.
     3. We have a communication problem.
  17. Semi-structured Data
     Michael K. Bergman, http://www.mkbergman.com/
  18. Semi-structured Data at LinkedIn
     Summary: I lead a data science team at LinkedIn, which analyzes terabytes of data to produce products and insights that serve LinkedIn’s members. Prior to LinkedIn, I led a local search quality team at Google and was a founding employee of faceted search pioneer Endeca (acquired by Oracle in 2010), where…
     Profile markup shown alongside the summary:
       <person>
         <id>
         <first-name />
         <last-name />
         <location>
           <name>
           <country>
             <code>
           </country>
         </location>
         <industry>
         …
       </person>
  19. Semi-structured Search is a Killer App
  20. Another Example: Helping a Friend
     Dear Daniel,
     I'm attaching the resume of an old friend who just moved up to the Bay Area. He has a very strong background in:
     - mobile / wireless applications
     - start-ups and new product launches
     - international expansion
     Best regards,
     XXX
  21. Company Search
  22. Semi-structured Data Empowers Users
  23. Data-Driven Recommendations
  24. Data-Driven Computation Serves Communication
       for i in [1..n]
         s ← w_1 w_2 … w_i
         if Pc(s) > 0
           a ← new Segment()
           a.segs ← {s}
           a.prob ← Pc(s)
           B[i] ← {a}
         for j in [1..i-1]
           for b in B[j]
             s ← w_j w_j+1 … w_i
             if Pc(s) > 0
               a ← new Segment()
               a.segs ← b.segs ∪ {s}
               a.prob ← b.prob * Pc(s)
               B[i] ← B[i] ∪ {a}
         sort B[i] by prob
         truncate B[i] to size k
  25. Recommendations Leverage Semi-structured Data
     [Diagram; only its labels survive in the transcript]
     - Job: title, geo, company, industry, description, functional area, …
     - Candidate, General: expertise, specialties, education, headline, geo, experience
     - Candidate, Current Position: title, summary, tenure length, industry, functional area, …
     - Corpus Stats: transition probabilities, connectivity, yrs of experience to reach title, education needed for this title, …
     - Matching: Binary (exact matches: geo, industry, …), Soft (transition probabilities, similarity, …), Text
     - Other labels: User Base, Filtered, derived
     - Example scores: Similarity (candidate expertise, job description) 0.56; Similarity (candidate specialties, job description) 0.2; Transition probability (candidate industry, job industry) 0.43; Title Similarity 0.8; Similarity (headline, title) 0.7
  26. Skills: A Practical Knowledge Representation
  27. Data-Driven Query Expansion for Recall
  28. Data-Driven Query Refinement for Precision
  29. There is no perfect schema or vocabulary.
     - And even if there were, not everyone would use it.
     - Knowledge representation has only succeeded within narrow scope.
     - Brute force is surprisingly effective but does not leverage the user as an intelligent partner.
  30. Communication is the problem and the solution.
     - Rich communication channel fills gaps in system’s knowledge representation and in user’s knowledge.
     - Use data science to make the system smart, but be humble and empower the human user.
     “You've got the brawn / I've got the brains / Let's make lots of money”
     Pet Shop Boys, “Opportunities”
  31. The Future is Upon Us
  32. One More Thing
     “More data beats clever algorithms but better data beats more data.”
     Monica Rogati @ Strata 2012
  33. Thank You!
     Questions?
     Contact: dtunkelang@linkedin.com
     We’re Hiring!
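
Slide 18 pairs a free-text summary with tagged profile fields. As a rough illustration of why that mix is useful for search, the sketch below parses one such record and answers a query that filters on a structured field while matching keywords in the free text. The element names `person`, `id`, `first-name`, `last-name`, `location`, `name`, `country`, `code`, and `industry` come from the slide. The `summary` element, every field value, and the helpers `parse_profile` and `matches` are assumptions made for this example, not LinkedIn's actual schema or API.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical record: tag names follow slide 18, values are invented.
PROFILE_XML = """
<person>
  <id>example-id</id>
  <first-name>Ada</first-name>
  <last-name>Example</last-name>
  <location>
    <name>San Francisco Bay Area</name>
    <country><code>us</code></country>
  </location>
  <industry>Internet</industry>
  <summary>I lead a data science team that analyzes terabytes of data.</summary>
</person>
"""


def parse_profile(xml_text):
    """Split a profile into structured fields and a free-text summary."""
    root = ET.fromstring(xml_text.strip())
    structured = {
        "id": root.findtext("id"),
        "name": f'{root.findtext("first-name")} {root.findtext("last-name")}',
        "location": root.findtext("location/name"),
        "country": root.findtext("location/country/code"),
        "industry": root.findtext("industry"),
    }
    free_text = root.findtext("summary", default="")
    return structured, free_text


def matches(profile, filters, keywords):
    """True if every structured filter matches exactly and every keyword
    appears as a token in the free-text summary."""
    structured, free_text = profile
    tokens = set(re.findall(r"[a-z0-9]+", free_text.lower()))
    fields_ok = all(structured.get(k) == v for k, v in filters.items())
    text_ok = all(kw.lower() in tokens for kw in keywords)
    return fields_ok and text_ok


profile = parse_profile(PROFILE_XML)
print(matches(profile, {"industry": "Internet"}, ["data", "science"]))
```

The structured fields support exact filtering, while the summary is only tokenized, which is the sense in which the record is semi-structured.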
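
The pseudocode on slide 24 can be read as a beam search over segmentations of a word sequence: for each prefix it keeps the k most probable ways to split the words into segments that have nonzero probability under Pc. The following Python sketch is one possible reading, not the author's implementation. It makes several assumptions: Pc is passed in as a function (`concept_prob`), the probabilities in `TOY_PROBS` are invented, each new segment starts at the word after position j so that segments do not overlap (the slide writes w_j), and a prefix with no valid segmentation simply gets an empty beam.

```python
from typing import Callable, List, Sequence, Tuple

# A segmentation is (probability, segments), e.g. (0.02, ("new york", "times")).
Segmentation = Tuple[float, Tuple[str, ...]]


def segment_query(
    words: Sequence[str],
    concept_prob: Callable[[str], float],
    k: int = 3,
) -> List[Segmentation]:
    """Return up to k highest-probability segmentations of `words`."""
    n = len(words)
    # beams[i] holds the best segmentations of the first i words.
    beams: List[List[Segmentation]] = [[] for _ in range(n + 1)]
    beams[0] = [(1.0, ())]  # empty prefix: one empty segmentation
    for i in range(1, n + 1):
        candidates: List[Segmentation] = []
        for j in range(i):
            segment = " ".join(words[j:i])  # words j+1..i in 1-based terms
            p = concept_prob(segment)
            if p <= 0:
                continue
            # Extend every kept segmentation of the first j words.
            for prob, segs in beams[j]:
                candidates.append((prob * p, segs + (segment,)))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams[i] = candidates[:k]  # truncate the beam to size k
    return beams[n]


# Invented probabilities, for illustration only.
TOY_PROBS = {
    "new": 0.1,
    "york": 0.05,
    "times": 0.1,
    "new york": 0.2,
    "york times": 0.01,
    "new york times": 0.15,
}


def toy_concept_prob(segment: str) -> float:
    return TOY_PROBS.get(segment, 0.0)


for prob, segs in segment_query("new york times".split(), toy_concept_prob):
    print(f"{prob:.4f}", " | ".join(segs))
```

Seeding `beams[0]` with a single empty segmentation folds the slide's separate "whole prefix as one segment" case into the main loop.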
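
Slide 25 lists similarity scores between pairs of candidate and job fields, such as (headline, title). The transcript does not say how those scores are computed, so the sketch below substitutes a simple Jaccard token overlap purely as a stand-in. The field names are taken from the slide's labels; the `FIELD_PAIRS` selection, the sample candidate and job, and the function names are assumptions for illustration.

```python
import re


def tokens(text):
    """Lowercased alphanumeric tokens of a text field."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Token-overlap similarity in [0, 1]; 0.0 if either field is empty."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# (candidate field, job field) pairs, loosely following slide 25.
FIELD_PAIRS = [
    ("expertise", "description"),
    ("specialties", "description"),
    ("headline", "title"),
    ("title", "title"),
]


def match_features(candidate, job):
    """One similarity feature per field pair; missing fields score 0.0."""
    return {
        f"sim({c},{j})": jaccard(candidate.get(c, ""), job.get(j, ""))
        for c, j in FIELD_PAIRS
    }


# Invented sample data.
candidate = {
    "headline": "Data Scientist",
    "title": "Data Scientist",
    "expertise": "machine learning search",
}
job = {
    "title": "Senior Data Scientist",
    "description": "search and machine learning for recruiting",
}

for name, score in match_features(candidate, job).items():
    print(name, round(score, 2))
```

A real recommender would likely feed features like these, together with the corpus statistics named on the slide, into a downstream ranking model; that step is outside this sketch.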