Natural Language Understanding and Word Sense Disambiguation remain among the prevailing challenges for both spoken and written language. Natural language understanding attempts to untangle the 'hot mess' of words in content into more structured data, but the challenge is far from trivial because language is full of polysemy. Recent developments in machine learning have produced significant leaps forward in understanding context more clearly (and therefore user intent and informational need at the time of a query). Here we will explore these developments and some of their implementations, and seek to understand what this means for search strategists and the brands they support, both now and into the future.
12. #pubcon
Word’s Context
• ”The meaning of a word is its use in a language” (Ludwig Wittgenstein, Philosopher, 1953)
• Image attribution: Moritz Nähr [Public domain]
13. #pubcon
Word’s Context Changes As A Sentence Evolves
• The meaning of a word changes (literally) as a sentence develops
• This is due to the multiple parts of speech a word could have in a given piece of content
14. #pubcon
Like “like”
We can see, in just this short sentence alone, using the Stanford Part of Speech Tagger online, that the word “like” is considered to be two separate parts of speech.
http://nlp.stanford.edu:8080/parser/index.jsp
15. #pubcon
Like “like”
For example: the word ”like” has several possible parts of speech (including ‘verb’, ‘noun’, and ‘adjective’).
POS = Part of Speech
21. #pubcon
Example of Part of Speech (POS) Tagging (sketched in code below)
• Pubcon → NNP (proper noun, singular)
• is → VBZ (verb, 3rd person singular, present)
• a → DT (determiner)
• great → JJ (adjective)
• conference → NN (noun)
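As an illustration of the tagging above, here is a minimal sketch using NLTK's default Penn Treebank tagger, an assumed alternative to the Stanford tagger mentioned earlier (the resource name may vary slightly by NLTK version).

```python
# Minimal POS-tagging sketch with NLTK (assumed setup: nltk installed,
# tagger resource downloaded; resource name may differ in newer versions).
import nltk

nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = "Pubcon is a great conference".split()
print(nltk.pos_tag(tokens))
# Typically yields Penn Treebank tags:
# [('Pubcon', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('great', 'JJ'), ('conference', 'NN')]
```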
22. #pubcon
Popular POS (Part of Speech) Tagsets
• Penn Treebank Tagset -> 36 different part-of-speech tags
• CLAWS 7 (C7) Tagset -> 146 different part-of-speech tags
• Brown Corpus Tagset -> 81 different part-of-speech tags
39. #pubcon
They Need ‘Text Cohesion’
Cohesion is the grammatical and lexical linking within a text or sentence that holds a text together and gives it meaning. Without surrounding words, the word “bucket” could mean anything in a sentence.
40. #pubcon
Word’s Company
“You shall know a word by the company it keeps” (John Rupert Firth, Linguist, 1957)
Image attribution: Wikimedia Commons, Public Domain
41. #pubcon
Words That Live Together Are Strongly Connected (see the sketch below)
• Co-occurrence
• Co-occurrence provides context
• Co-occurrence changes a word’s meaning
• Words that share similar neighbours are also strongly connected
• Similarity & relatedness
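To make the co-occurrence idea concrete, here is a minimal sketch that counts which words appear within a small window of one another; the toy corpus and window size are illustrative assumptions, not from the slides.

```python
# Count co-occurrences of words within a fixed window around each target word.
from collections import defaultdict

def cooccurrence_counts(sentences, window=2):
    counts = defaultdict(int)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i in range(len(tokens)):
            # Look at neighbours up to `window` positions left and right.
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    counts[(tokens[i], tokens[j])] += 1
    return counts

corpus = [
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox sleeps",
]
for pair, n in sorted(cooccurrence_counts(corpus).items()):
    print(pair, n)
```

Words that share similar neighbours (here, "jumps" and "sleeps" both co-occur with "fox") end up connected in exactly the sense the slide describes.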
43. #pubcon
Natural Language Recognition is NOT Understanding
• Natural language understanding requires understanding of context and common-sense reasoning. This is VERY challenging for machines, but largely straightforward for humans.
44. #pubcon
Language models are trained on very large text corpora or collections (loads of words) to learn distributional similarity.
48. #pubcon
Typical window size might be 5
Source Text: “Writing a list of random sentences is harder than I initially thought it would be”
11 words (5 left and 5 right of the moving target word)
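A minimal sketch of that moving 11-word window (5 words to the left and 5 to the right of the target); the sentence is the slide's example, everything else is illustrative.

```python
# Slide the context window across the sentence, one target word at a time.
sentence = ("Writing a list of random sentences is harder "
            "than I initially thought it would be").split()

window = 5
for i, target in enumerate(sentence):
    left = sentence[max(0, i - window):i]     # up to 5 words to the left
    right = sentence[i + 1:i + 1 + window]    # up to 5 words to the right
    print(f"{target}: {left + right}")
```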
49. #pubcon
Example context window size 3
Source Text | Training Samples
The quick brown fox jumps over the lazy dog | (the, quick), (the, brown), (the, fox)
The quick brown fox jumps over the lazy dog | (quick, the), (quick, brown), (quick, fox), (quick, jumps)
The quick brown fox jumps over the lazy dog | Etcetera
The quick brown fox jumps over the lazy dog | Etcetera
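The table above can be reproduced with a few lines of Python. This is a sketch of how (target, context) training samples are generated for a window of size 3, not any particular library's implementation.

```python
# Generate (target, context) pairs for a context window of size 3.
sentence = "the quick brown fox jumps over the lazy dog".split()

window = 3
samples = []
for i, target in enumerate(sentence):
    for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
        if j != i:
            samples.append((target, sentence[j]))

print(samples[:7])
# [('the', 'quick'), ('the', 'brown'), ('the', 'fox'),
#  ('quick', 'the'), ('quick', 'brown'), ('quick', 'fox'), ('quick', 'jumps')]
```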
52. #pubcon
Continuous Bag of Words (CBoW) Method or Skip-gram (the Opposite of CBoW)
Continuous Bag of Words: take a bag of words with no inherent order and use a context window of size n (an n-gram) to ascertain which words are similar or related, using Euclidean distances between vectors to create vector models and word embeddings. Skip-gram inverts this: CBoW predicts a target word from its surrounding context, while Skip-gram predicts the surrounding context words from the target word.
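For a hands-on view, here is a minimal sketch using gensim's Word2Vec, which implements both methods; the toy corpus, vector size, and window are illustrative assumptions.

```python
# Train tiny CBoW and Skip-gram embeddings with gensim (assumes gensim is installed).
from gensim.models import Word2Vec

corpus = [
    "the quick brown fox jumps over the lazy dog".split(),
    "writing a list of random sentences is harder than i thought".split(),
]

cbow = Word2Vec(corpus, vector_size=50, window=5, min_count=1, sg=0)      # sg=0 -> CBoW
skipgram = Word2Vec(corpus, vector_size=50, window=5, min_count=1, sg=1)  # sg=1 -> Skip-gram

# Nearest neighbours in the learned embedding space (meaningless on a toy
# corpus this small, but it shows the API).
print(cbow.wv.most_similar("fox", topn=3))
```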
59. #pubcon
Most language modellers are uni-directional
Source Text: “Writing a list of random sentences is harder than I initially thought it would be”
They can traverse a word’s context window only from left to right or from right to left: in one direction, but not both at the same time.
60. #pubcon
They can only look at the words in the context window before the target word, not at the words in the rest of the sentence, nor at the sentence that follows.
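A toy sketch of the difference: a uni-directional (left-to-right) model only sees the words before the target, while a bi-directional model can use both sides. Purely illustrative.

```python
tokens = "writing a list of random sentences".split()
i = 3  # target word: "of"

left_to_right_context = tokens[:i]                   # what a uni-directional model sees
bidirectional_context = tokens[:i] + tokens[i + 1:]  # what a bi-directional model sees

print(left_to_right_context)   # ['writing', 'a', 'list']
print(bidirectional_context)   # ['writing', 'a', 'list', 'random', 'sentences']
```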
65. #pubcon
NER Example
• E.g. Sentence: “Taylor Swift will launch her new album in Apple Music.”
• NER result: “Taylor[B-PER] Swift[I-PER] will[O] launch[O] her[O] new[O] album[O] in[O] Apple[B-ORG] Music[I-ORG].[O]”
• PS:
[O] means the token is outside any named entity
[B-PER]/[I-PER] mean the beginning/inside of a person name
[B-ORG]/[I-ORG] mean the beginning/inside of an organization name
Source: https://medium.com/@yingbiao/ner-with-bert-in-action-936ff275bc73
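For comparison, here is a minimal NER sketch using spaCy rather than BERT (it assumes spaCy and its small English model 'en_core_web_sm' are installed); the entity labels differ from the BIO tags above, but the idea is the same.

```python
# Run named entity recognition over the example sentence with spaCy.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Taylor Swift will launch her new album in Apple Music.")
for ent in doc.ents:
    print(ent.text, ent.label_)   # e.g. spans labelled PERSON / ORG
```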
71. #pubcon
BERT is different. BERT uses bi-directional language modelling, and was the FIRST to do this.
Source Text: “Writing a list of random sentences is harder than I initially thought it would be”
BERT can see both the left-hand and the right-hand side of the target word.
76. #pubcon
BERT can see the WHOLE sentence on either side of a word (contextual language modelling), and all of the words almost at once.
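One way to see this whole-sentence, bi-directional view in practice is BERT's masked-word prediction. Here is a minimal sketch with the Hugging Face transformers fill-mask pipeline (it assumes transformers is installed and the bert-base-uncased model can be downloaded).

```python
# BERT predicts the [MASK] token using the words on BOTH sides of it.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for prediction in fill_mask("Writing a list of random [MASK] is harder than I thought."):
    print(prediction["token_str"], round(prediction["score"], 3))
```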
77. #pubcon
BERT has been pre-trained on a lot of words … on the whole of the English Wikipedia (2,500 million words).
78. #pubcon
Previously Uni-Directional
Previously, all language models were uni-directional, so they could only move the context window in one direction.
A moving window of ‘n’ words (either left or right of a target word) is used to understand a word’s context.
79. #pubcon
Google BERT Paper
• Devlin, J., Chang, M.W., Lee, K. and Toutanova, K., 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
97. #pubcon
MS MARCO: A Human Generated MAchine Reading Comprehension Dataset
• Rajpurkar, P., Zhang, J., Lopyrev, K. and Liang, P., 2016. SQuAD: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250.
107. #pubcon
Algorithmic Bias Concerns
• Ricardo Baeza-Yates' work, “Bias on the Web”
• NoBIAS Project
• IBM initiatives to prevent bias
• BERT does not know why it makes decisions
• BERT is considered a ‘black box’ algorithm
• Programmatic bias is a concern
• The Algorithmic Justice League is active
111. #pubcon
References
• Rajpurkar, P., Zhang, J., Lopyrev, K. and Liang, P., 2016. SQuAD: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250.
• Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł. and Polosukhin, I., 2017. Attention is all you need. In Advances in Neural Information Processing Systems (pp. 5998-6008).