Daniel Tunkelang argues that knowledge representation is overrated for AI systems and computation is underrated. He discusses past attempts at knowledge representation like Cyc and Freebase, and how today's data-driven approaches using large datasets have proven more effective than rule-based systems for tasks like machine translation and question answering. Tunkelang advocates for semi-structured data and data-driven recommendations and queries to empower users and fill gaps in systems' knowledge. He concludes that communication is both the problem and solution, and systems should leverage users as intelligent partners rather than relying solely on perfect schemas or vocabularies.
14. Plain Old Search Engines are Pretty Good Too
http://blog.stephenwolfram.com/2011/01/jeopardy-ibm-and-wolframalpha/
15. The Unreasonable Effectiveness of Data
§ simple models + lots of data >>
elaborate models + less data
§ machine translation: parallel corpora >>
elaborate rules for syntactic and semantic patterns
§ semantic web formalism just means semantic
interpretation on shorter strings between angle brackets
Alon Halevy, Peter Norvig, and Fernando Pereira (2009)
16. Today’s Challenge
1. Knowledge representation is overrated.
2. Computation is underrated.
3. We have a communication problem.
18. Semi-structured Data at LinkedIn
Summary (free text): "I lead a data science team at LinkedIn, which analyzes terabytes of data to produce products and insights that serve LinkedIn's members. Prior to LinkedIn, I led a local search quality team at Google and was a founding employee of faceted search pioneer Endeca (acquired by Oracle in 2010), where…"
Alongside the free text, the profile's structured fields:
<person>
  <id />
  <first-name />
  <last-name />
  <location>
    <name />
    <country>
      <code />
    </country>
  </location>
  <industry />
  …
</person>
20. Another Example: Helping a Friend
Dear Daniel,
I'm attaching the resume of an old friend who just moved up
to the Bay Area.
He has a very strong background in:
§ mobile / wireless applications
§ start-ups and new product launches
§ international expansion
Best regards,
XXX
24. Data-Driven Computation Serves Communication
for i in [1..n]
    s ← w1 w2 … wi
    if Pc(s) > 0
        a ← new Segment()
        a.segs ← {s}
        a.prob ← Pc(s)
        B[i] ← {a}
    for j in [1..i-1]
        for b in B[j]
            s ← wj+1 … wi
            if Pc(s) > 0
                a ← new Segment()
                a.segs ← b.segs ∪ {s}
                a.prob ← b.prob * Pc(s)
                B[i] ← B[i] ∪ {a}
    sort B[i] by prob
    truncate B[i] to size k
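The pseudocode above translates directly into a short Python sketch. The phrase probabilities Pc below are made-up values for illustration (the real ones come from corpus statistics), and k is the beam width.

```python
from dataclasses import dataclass

# Hypothetical phrase probabilities: Pc(s) > 0 only for known phrases.
PC = {
    "new": 0.3, "york": 0.2, "times": 0.2,
    "new york": 0.4, "york times": 0.1, "new york times": 0.25,
}

def Pc(s):
    return PC.get(s, 0.0)

@dataclass
class Segment:
    segs: tuple = ()
    prob: float = 1.0

def segment(words, k=3):
    """Beam search over segmentations, keeping the top-k per prefix."""
    n = len(words)
    B = [[] for _ in range(n + 1)]  # B[i]: best segmentations of words[0:i]
    for i in range(1, n + 1):
        s = " ".join(words[0:i])    # the whole prefix as one segment
        if Pc(s) > 0:
            B[i].append(Segment(segs=(s,), prob=Pc(s)))
        for j in range(1, i):       # extend each kept segmentation of words[0:j]
            for b in B[j]:
                s = " ".join(words[j:i])
                if Pc(s) > 0:
                    B[i].append(Segment(segs=b.segs + (s,), prob=b.prob * Pc(s)))
        B[i].sort(key=lambda a: a.prob, reverse=True)
        del B[i][k:]                # truncate the beam to size k
    return B[n]

best = segment(["new", "york", "times"])  # best[0].segs == ("new york times",)
```

Truncating each B[i] to size k keeps the computation linear in k rather than exponential in the number of segmentations.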
25. Recommendations Leverage Semi-structured Data
[Diagram: the recommendation pipeline matches job fields against user fields, using corpus statistics and the user base.
§ Job fields: title, industry, geo, description, company, functional area, …
§ User fields: general profile (expertise, specialties, education, headline, geo, experience, …) and current position (title, summary, tenure length, industry, functional area)
§ Corpus stats: transition probabilities, years of experience to reach a title, education needed for a title
§ User base: connectivity, filtered transition probabilities, candidate similarity
§ Derived matching features: binary exact matches (geo, industry, …); soft similarity, e.g. (candidate expertise, job description) 0.56, (candidate specialties, job description) 0.2; transition probability (candidate industry, job industry) 0.43; title similarity 0.8; similarity (headline, title) 0.7]
29. There is no perfect schema or vocabulary.
§ And even if there were, not everyone would use it.
§ Knowledge representation has only succeeded within
narrow scope.
§ Brute force is surprisingly effective but does not leverage
the user as an intelligent partner.
30. Communication is the problem and the solution.
§ Rich communication channel fills gaps in system’s
knowledge representation and in user’s knowledge.
§ Use data science to make the system smart, but be
humble and empower the human user.
You've got the brawn
I've got the brains
Let's make lots of money
Pet Shop Boys, “Opportunities”
Two icons of artificial intelligence from science fiction: the HAL 9000 computer from 2001: A Space Odyssey and the android Data from Star Trek: The Next Generation. Both exceed human beings in their ability to assimilate knowledge and to reason using that knowledge. Both interact with human beings in natural language. Despite all of our technological advances, the closest we have come to this vision is talking to Siri. An improvement on the 1960s ELIZA program, for sure, but still a baby step.
In 1945, Vannevar Bush put forth his vision of a memex (a portmanteau of "memory" and "index") as a device in which individuals would compress and store all of their books, records, and communications, "mechanized so that it may be consulted with exceeding speed and flexibility". The memex would provide an "enlarged intimate supplement to one's memory". The concept of the memex influenced the development of hypertext systems, eventually leading to the creation of the World Wide Web and personal knowledge base software.
A pure embodiment of the AI vision: Cyc was started in 1984 as an artificial intelligence project that attempts to assemble a comprehensive ontology and knowledge base of everyday common-sense knowledge, with the goal of enabling AI applications to perform human-like reasoning. Typical pieces of knowledge represented in the database are "Every tree is a plant" and "Plants die eventually". When asked whether trees die, the inference engine can draw the obvious conclusion and answer the question correctly. The knowledge base contains over one million human-defined assertions, rules, or common-sense ideas, formulated in a language based on predicate calculus.
Freebase is a large collaborative knowledge base consisting of metadata composed mainly by its community members. It is an online collection of structured data harvested from many sources, including individual 'wiki' contributions. Freebase aims to create a global resource that allows people (and machines) to access common information more effectively. Freebase is a wonderful resource, and search engines are starting to use it as a structured data source. But using Freebase for structured queries is a lot trickier than using Google for free-text queries, largely because Freebase is incomplete in unpredictable ways. In particular, Freebase has difficulties with null, unknown, or N/A values. For example, in the results for "fires of unknown cause", there is no way to tell whether the cause of a fire is really unknown or the data is simply missing.
Wolfram Alpha is an answer engine developed by Wolfram Research. It is an online service that answers factual queries directly by computing the answer from structured data. Wolfram Alpha is impressive. It's no wonder that it serves as the back end for many Siri queries. Unfortunately, its natural-language interface is brittle. As we can see from these two queries, it can roughly report the number of software engineers in the San Francisco Bay Area, but not the number of software companies. Nobody is perfect. But what is disconcerting is that the system does nothing to suggest that the latter answer is less reliable than the former. Does the system know how to answer the second question? There is no way for the user to be sure, other than perhaps by trial and error eventually leading to resolution or frustrated resignation. This is a communication problem.
Deep Blue was a chess-playing computer developed by IBM. In 1997, the machine defeated world champion Garry Kasparov in a match. What was its secret sauce? Could it think? Did it learn to play chess and represent that wisdom in a knowledge base? Not really – to borrow a line from Toy Story, it won by using brute force with style. It was a massively parallel system (by 1997 standards) made with special-purpose chips.
A decade later, IBM did it again. IBM researchers decided to build a system to beat humans at a more modern game than chess – namely the Jeopardy! television quiz show, featuring trivia in history, pop culture, sports, etc. Moreover, many Jeopardy! questions (or "answers", since the gimmick of the game is that the question-answer process is inverted) involve word play, which would seem particularly challenging for a machine. Like Deep Blue, Watson is all about computation. Its knowledge base is mined from 200 million pages of structured and unstructured content consuming four terabytes of disk storage, including the full text of Wikipedia. It uses a server cluster with 720 cores and relies on parallel processing to parse questions and search its knowledge base for candidate responses. In February 2011, Watson defeated former Jeopardy! champions Ken Jennings and Brad Rutter in a televised match.
Watson's achievement was impressive. But let's put things in perspective. Even plain old search engines do pretty well at Jeopardy. The comparison isn't entirely fair: in judging the search engines, we only require that they return pages on which the answer should appear, not that they give specific actual answers. One can try various simple strategies for going further, like taking the answer from the title of the first hit – which with the top search engines actually succeeds about 20% of the time. Still, the point should be clear. None of these strategies uses sophisticated semantic representations. Computation – brute force with style – is the big winner.
In 2009, Google researchers Alon Halevy, Peter Norvig, and Fernando Pereira wrote a popular article entitled "The Unreasonable Effectiveness of Data". It has often been paraphrased as "better data beats clever algorithms". But for our purposes we can interpret it as celebrating the triumph of computation over knowledge representation as a means to produce semantic or intelligent behavior.
Let's take stock of what kind of data we have. Most of our data is semi-structured data – the broad space that lives in between structured data (the rigid schemas we associate with relational database systems) and unstructured data (e.g., the free text indexed by search engines). The structure in semi-structured data takes the form of tags and structural elements without a rigid schema (e.g., XML).
LinkedIn has one of the largest and richest collections of semi-structured data on the consumer internet. Here you can see how our people data combines free text, a connection network, and a collection of structured tags. And these aren’t the only entities – we have companies, jobs, etc.
Here I’m searching for people I know in the Bay Area who have “data” anywhere in their profiles and currently work at Google, Yahoo!, or Twitter. Maybe I should look at my Facebook connections too. Did I mention that I’m hiring? The power of such a search is incredible, and the experience is highly intuitive even for a user who has no idea that either the data or the search query is “semi-structured”. The interaction revolves around facets that are well represented in both the data and the user’s mental model.
True story, redacted only to protect my friend's privacy.
Of course I know that the first place to research companies is LinkedIn. So I started with a generic company search for "mobile". The results are reasonable, given the query's lack of specificity. But clearly I needed to be more specific.
Here is my revised query: small mobile companies headquartered in the Bay Area in software-related industries. This may not have been exactly what my friend was looking for, but it was a great starting point. Specifically, the system helped me map his information need to a query that captured its spirit.
Computation is powerful – especially at our scale of data and users. Applying machine learning allows us to produce recommendations for job matching, content, community, etc. And of course it drives the feature LinkedIn is most famous for: People You May Know.
One of the steps in processing search queries is to parse them and establish query interpretations – in this case, that “linkedin” refers to a company and “ceo” refers to a job title. We do so using a hidden Markov model (HMM) trained on our corpus statistics and search logs. This allows us to handle word-sense ambiguity, e.g., “dell” as a first name, last name, or company name.
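A toy version of such a tagger can be sketched with Viterbi decoding over a hand-built HMM. The tag set, transition, and emission probabilities below are invented for illustration; LinkedIn's actual model is trained on corpus statistics and search logs.

```python
import math

# Hypothetical tag set and hand-set probabilities, for illustration only.
TAGS = ["COMPANY", "TITLE", "NAME"]
START = {"COMPANY": 0.4, "TITLE": 0.3, "NAME": 0.3}
TRANS = {                       # P(tag_i | tag_{i-1})
    "COMPANY": {"COMPANY": 0.2, "TITLE": 0.6, "NAME": 0.2},
    "TITLE":   {"COMPANY": 0.5, "TITLE": 0.2, "NAME": 0.3},
    "NAME":    {"COMPANY": 0.3, "TITLE": 0.3, "NAME": 0.4},
}
EMIT = {                        # P(word | tag); note "dell" is ambiguous
    "COMPANY": {"linkedin": 0.5, "dell": 0.3, "ceo": 0.01},
    "TITLE":   {"ceo": 0.6, "dell": 0.01, "linkedin": 0.01},
    "NAME":    {"dell": 0.4, "linkedin": 0.01, "ceo": 0.01},
}

def viterbi(words):
    """Return the most likely tag sequence for the query words."""
    # V[i][t] = (best log-probability of tagging words[0:i+1] ending in t, path)
    V = [{t: (math.log(START[t]) + math.log(EMIT[t].get(words[0], 1e-6)), [t])
          for t in TAGS}]
    for w in words[1:]:
        row = {}
        for t in TAGS:
            prev = max(TAGS, key=lambda p: V[-1][p][0] + math.log(TRANS[p][t]))
            score = (V[-1][prev][0] + math.log(TRANS[prev][t])
                     + math.log(EMIT[t].get(w, 1e-6)))
            row[t] = (score, V[-1][prev][1] + [t])
        V.append(row)
    return max(V[-1].values())[1]

tags = viterbi(["linkedin", "ceo"])  # ["COMPANY", "TITLE"]
```

The same decoder, given "dell" in different contexts, would pick COMPANY, NAME, or TITLE-adjacent readings depending on the surrounding words – which is exactly the word-sense ambiguity the note describes.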
In order to evaluate a job-candidate pair, we first use common-sense filtering to determine if the candidate is even plausible, e.g., we don't need fancy algorithms to determine that a sales executive in Turkey isn't a good match for a software engineering job in Mountain View. After this filtering, we take the two bags of features and create a single set of features for the pair to represent the matching. The matching features can be binary (e.g., is the candidate in the same industry as the job?), softer (e.g., based on the transition probability between the candidate's current job and the potential new one), and textual (we can use standard information retrieval methods to compare documents). Combining all of these using weights learned through regression, we can assign scores to matches. Note again that scale matters -- our corpus statistics are essential to computing the above features without falling victim to sparsity.
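A minimal sketch of that scoring step: the feature names, helper inputs, and weights below are hypothetical stand-ins for the regression-learned model, combined here with a logistic function.

```python
import math

def match_features(candidate, job, transition_prob, text_sim):
    """Build pairwise matching features (names and inputs are illustrative)."""
    return {
        "same_industry": 1.0 if candidate["industry"] == job["industry"] else 0.0,
        "same_geo": 1.0 if candidate["geo"] == job["geo"] else 0.0,
        "title_transition": transition_prob,  # P(candidate title -> job title)
        "text_similarity": text_sim,          # e.g. cosine over tf-idf vectors
    }

# Weights of the kind a (logistic) regression would learn; made up here.
WEIGHTS = {"same_industry": 1.2, "same_geo": 0.8,
           "title_transition": 2.0, "text_similarity": 1.5}
BIAS = -2.0

def score(features):
    """Weighted sum of matching features squashed to a match probability."""
    z = BIAS + sum(WEIGHTS[k] * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))

cand = {"industry": "software", "geo": "bay area"}
job = {"industry": "software", "geo": "bay area"}
feats = match_features(cand, job, transition_prob=0.43, text_sim=0.56)
p = score(feats)
```

The common-sense filtering happens before this step, so the scorer only ever sees plausible pairs.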
If the value of your network reflects the saying that "you are who you know", Skills offers the complementary "you are what you know". Skills are diverse -- ranging from Ballet to Hadoop. In order to identify the set of skills, we turn again to the unreasonable effectiveness of data. Many of our 160M+ users have a Specialties section where they list their skills as free text. By mining these sections and other profile elements, we generated a set of potential skills for our entire corpus. Bootstrapping on that list, we implemented a suggested skills feature that is leading to increasing adoption of our controlled vocabulary.
Skills is still in beta. But here you see how related skills – which are derived by mining our corpus – can increase recall on a search for people who have expertise in WordNet, a lexical database developed at Princeton. We can’t rely on people to mention WordNet in their profiles. But we can expand our search to include related skills like ontologies and semantic search. Of course it’s a precision / recall tradeoff – but one that is completely transparent to the user.
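The expansion step can be sketched as follows; the related-skills table and the profiles are hypothetical stand-ins for what is actually mined from the corpus.

```python
# Hypothetical co-occurrence-derived related-skills table; the real one
# is mined from LinkedIn's corpus.
RELATED = {
    "wordnet": ["ontologies", "semantic search", "nlp"],
}

# Toy profile corpus for illustration.
PROFILES = [
    {"name": "A", "skills": {"wordnet", "java"}},
    {"name": "B", "skills": {"ontologies", "semantic search"}},
    {"name": "C", "skills": {"sales"}},
]

def search(skill, expand=False):
    """Return profiles matching the skill, optionally expanded to related skills."""
    wanted = {skill} | (set(RELATED.get(skill, ())) if expand else set())
    return [p["name"] for p in PROFILES if p["skills"] & wanted]

exact = search("wordnet")             # precision-oriented: ["A"]
expanded = search("wordnet", True)    # recall-oriented: ["A", "B"]
```

This makes the precision/recall tradeoff concrete: expansion recovers profile B, which never mentions WordNet but lists related skills.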
The same technique can be used to disambiguate a query like [owl]. If you’re looking for OWL specialists rather than ornithologists, then it’s helpful to require some supporting evidence, such as expertise in the semantic web or RDF.
Knowledge representation isn’t the answer. Computation is great. But with semi-structured data and data-driven computation, we can get even further.
To achieve the best results, we have to exploit the strengths of both people and machines. That means using computation to support communication.
Web search is beginning to embrace semi-structured data – using the unreasonable effectiveness of data to exploit the structure it has and derive latent structure where possible. The result is more user control and a more intuitive communication between the user and the system. What was once exotic is rapidly becoming mainstream.
At this year's Strata conference, my colleague Monica Rogati one-upped Norvig et al.'s argument about the unreasonable effectiveness of data: not all data is created equal, and quality trumps quantity. This is a teaser – I recommend you watch her talk, "The Model and the Train Wreck".