Nike Tech Talk, Portland, 2017-08-10
https://niketechtalks-aug2017.splashthat.com/
O'Reilly Media gets to see the forefront of trends in artificial intelligence: what the leading teams are working on, which use cases are getting the most traction, previews of advances before they get announced on stage. Through conferences, publishing, and training programs, we've been assembling resources for anyone who wants to learn. An excellent recent example: Generative Adversarial Networks for Beginners, by Jon Bruner.
This talk covers current trends in AI, industry use cases, and recent highlights from the AI Conf series presented by O'Reilly and Intel, plus related materials from Safari learning platform, Strata Data, Data Show, and the upcoming JupyterCon.
Along with reporting, we're leveraging AI in Media. This talk dives into O'Reilly's uses of deep learning, combined with ontology, graph algorithms, probabilistic data structures, and even some evolutionary software, to help editors and customers alike accomplish more of what they need to do.
In particular, we'll show two open source projects in Python from O'Reilly's AI team:
• pytextrank, built atop spaCy, NetworkX, and datasketch, providing graph algorithms for advanced NLP and text analytics
• nbtransom, leveraging Project Jupyter for a human-in-the-loop design-pattern approach to AI work: people and machines collaborating on content annotation
3. Research questions:
▪ How do we personalize learning experiences across ebooks, videos, conferences, computable content, live online courses, case studies, expert AMAs, etc.?
▪ How do we help experts (by definition, really busy people) share knowledge with their peers in industry?
▪ How do we manage the role of editors at human scale, while technology and delivery media evolve rapidly?
▪ How do we help organizations learn and transform continuously?
▪ Can we accomplish these goals by leveraging AI in Media?
7. AI is real, but why now?
▪ Big Data: machine data (1997-ish)
▪ Big Compute: cloud computing (2006-ish)
▪ Big Models: deep learning (2009-ish)
The confluence of these factors created a business environment where AI could become mainstream. AR/VR combined with embedded computing and reinforcement learning may take it to the next level.
8. Benchmark: achieving human parity
2016-10-12: Microsoft researchers reach human parity in conversational speech recognition
"Achieving Human Parity in Conversational Speech Recognition" W. Xiong, et al., Microsoft
9. Big picture
▪ The current state of machine intelligence 3.0, Shivon Zilis and James Cham, Bloomberg Beta (annual landscape)
▪ The Future of Machine Intelligence, David Beyer, Amplify Partners (report)
▪ Artificial Intelligence: Teaching Machines to Think Like People, Jack Clark, OpenAI (report)
▪ The AI Conf, an O'Reilly Media and Intel partnership (industry conference)
11. “Consider the shift from steam to electric power: it took a generation before factory managers understood they could reconfigure the physical arrangement. AI adoption may be quicker, but it faces similar extremes of cognitive embrace.”
– David Beyer, Amplify Partners
12. Immediate impact of AI
personal op-ed: the combination of advances in UX, DevOps, and AI, specifically, is taking off the table some previous needs for what we'd called “software engineering”, a discipline which must now undergo major changes
14. 2017 highlights from leading teams
▪ TensorFlow: Machine learning for everyone, Rajat Monga, Google
▪ Distributed deep learning on AWS using MXNet, Anima Anandkumar, Amazon
▪ Squeezing deep learning onto mobile phones, Anirudh Koul, Microsoft
22. Current themes among leading AI teams:
▪ scale up to solve complex problems (big models)
▪ optimize to deploy consumer products (low power)
Trending strategy…
23. Most popular content among thousands of enterprise organizations:
Hands-On Machine Learning with scikit-learn and TensorFlow, Aurélien Géron
Python FTW, along with Keras, PyTorch, Caffe, etc.
Trending methods…
24. UC Berkeley RISELab
▪ https://rise.cs.berkeley.edu/
▪ enable machines to take rapid, intelligent actions based on real-time data and context from the world around them
▪ shift away from the prior emphasis on JVM-based frameworks during the AMPLab period (Spark)
▪ major focus on reinforcement learning
Ray: a distributed execution framework for emerging AI applications
25. Increasing role of the hardware interface
▪ earlier generations of virtualization abstracted away hardware; however, containers allow direct access
▪ with DL, application software must access the latest hardware features directly to be competitive
▪ vendors anticipate advanced math needs for low-level hardware, looking beyond DL, e.g., multi-linear algebra libraries
▪ Scaling machine learning (O'Reilly Data Show, 21:43), Reza Zadeh, Stanford / Matroid
26. Emerging themes: transfer learning
▪ transfer learning: when you can solve a task well, transfer that understanding to solve related problems
▪ remove the final classification layer, then extract the next-to-last layer of a CNN: tensorflow.org/tutorials/image_recognition
▪ leverage a network pre-trained on a large dataset: blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
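The recipe on this slide (drop the final classification layer, reuse the penultimate layer as features) can be sketched with a toy stand-in for a pre-trained network. The weights below are random placeholders, not a real pre-trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a network pre-trained on a large dataset:
# one hidden ("backbone") layer plus a final classification layer.
W_backbone = rng.normal(size=(8, 16))   # kept and frozen for the new task
W_classify = rng.normal(size=(16, 10))  # original final layer, discarded

def extract_features(x):
    """Remove the final classification layer; return next-to-last activations."""
    return np.maximum(x @ W_backbone, 0.0)  # ReLU features from frozen backbone

# New, smaller task: train only a fresh classifier head on these features.
X_new = rng.normal(size=(4, 8))
features = extract_features(X_new)
print(features.shape)  # (4, 16)
```

In a real CNN the same move is one line: instantiate the pre-trained network without its top layer, then feed images through it to get feature vectors for a small downstream classifier.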
27. Emerging themes: GANs
▪ generative adversarial networks: neural networks compete against each other in a zero-sum game
▪ example: CycleGAN (see AI NY 2017)
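The zero-sum framing can be written down directly. Below is a minimal numpy sketch of the GAN value function, with single-parameter linear players as toy placeholders for real neural networks: the discriminator plays to maximize the value, the generator to minimize it.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

# Toy one-parameter players (placeholders for real neural networks).
g_w, d_w = 0.5, 0.8

def generator(z):        # maps noise to a fake sample
    return g_w * z

def discriminator(x):    # estimated probability that x is real
    return sigmoid(d_w * x)

z = rng.normal()              # noise input for the generator
x_real = rng.normal(loc=3.0)  # a "real" data sample

# Zero-sum value: D plays to maximize it, G plays to minimize it.
value = np.log(discriminator(x_real)) + np.log(1.0 - discriminator(generator(z)))
print(value)  # negative: both terms are logs of probabilities
```

Training alternates gradient steps on this one objective, ascent for the discriminator and descent for the generator, which is what makes the game zero-sum.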
29. LSTM used to generate content
Long short-term memory (LSTM) allows recurrent neural networks to learn sequences of data, such as in streams of voice or text. Imagine feeding scripts (semi-structured data) from a film genre through an LSTM, then generating new output…
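To make the sequence-learning claim concrete, here is a minimal numpy LSTM cell, a toy illustration of the gating mechanics rather than a trained generator: the input, forget, and output gates control what the cell state retains as it steps through a sequence.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step: gates decide what to write, keep, and expose."""
    z = x @ W + h @ U + b                          # all four gate pre-activations
    i, f, o, g = np.split(z, 4)                    # input, forget, output, candidate
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)   # update the cell state
    h = sigmoid(o) * np.tanh(c)                    # expose the hidden state
    return h, c

rng = np.random.default_rng(42)
d, n = 3, 4                        # input and hidden sizes (arbitrary toy values)
W = rng.normal(size=(d, 4 * n))
U = rng.normal(size=(n, 4 * n))
b = np.zeros(4 * n)

h, c = np.zeros(n), np.zeros(n)
for x in rng.normal(size=(5, d)):  # run over a short sequence
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)  # (4,)
```

Generation then works by sampling the next token from a softmax over `h`, feeding it back in as the next input, and repeating.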
30. LSTM used to generate content
http://benjamin.wtf/
Sunspring
It’s No Game
31. LSTM in music composition / performance
https://github.com/IraKorshunova/folk-rnn
35. Peer Teaching through a range of Media
▪ books, videos
▪ live online courses
▪ conferences
▪ AMAs
▪ computable content
▪ case studies
▪ articles
▪ podcast interviews
▪ chat forums
39. Key insight for AI in Media:
▪ any content which can be represented as text can be parsed by NLP, then manipulated by available AI tooling
▪ labeled images get really interesting
▪ text or images within a context have inherent structure
▪ representation of that kind of structure is rare in the Media vertical, so far
41. Ontology
▪ provides context which Deep Learning lacks
▪ aka a “knowledge graph”: a computable thesaurus
▪ maps the semantics of business relationships
▪ S/V/O: “nouns”, some “verbs”, a few “adjectives”
▪ conversational interfaces (e.g., Google Assistant) improve UX by importing ontologies
▪ the hard part: a relatively expensive investment
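The S/V/O structure of a knowledge graph reduces to triples, which already supports simple inference. A minimal sketch (the entries below are made-up examples, not O'Reilly's actual ontology):

```python
# Toy knowledge graph as subject/verb/object triples (a "computable thesaurus").
triples = [
    ("TensorFlow", "is_a", "deep learning framework"),
    ("Keras", "runs_on", "TensorFlow"),
    ("deep learning framework", "is_a", "software library"),
]

def objects(subject, verb):
    """Look up all objects related to a subject by a given verb."""
    return [o for s, v, o in triples if s == subject and v == verb]

def is_a_transitive(subject):
    """Simple inference across the graph: follow is_a edges upward."""
    found = []
    frontier = [subject]
    while frontier:
        node = frontier.pop()
        for parent in objects(node, "is_a"):
            found.append(parent)
            frontier.append(parent)
    return found

print(is_a_transitive("TensorFlow"))
# ['deep learning framework', 'software library']
```

Even this toy shows why the ontology is the expensive part: the inference is trivial, but someone has to curate the triples.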
42. Which parts do people or machines do best?
team goal: maintain structural correspondence between the layers
big win for AI: inferences across the graph
▪ human scale: primary structure, control points, testability
▪ machine generated: data products, ~80% of the graph
46. PyTextRank
TextRank (R. Mihalcea, P. Tarau, 2004): a graph algorithm that extracts key phrases and summarizes texts for NLP, improving over the use of keywords, n-grams, etc.
▪ construct a graph from a paragraph of text
▪ run PageRank on that graph
▪ extract the highly ranked phrases
Python implementation atop spaCy, NetworkX, datasketch:
▪ https://pypi.python.org/pypi/pytextrank/
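The three steps above can be sketched directly with NetworkX. This is a bare-bones word-level illustration, not the actual pytextrank implementation (which ranks noun phrases taken from spaCy's parse):

```python
import networkx as nx

def textrank_keywords(tokens, window=2, top_n=3):
    """Construct a co-occurrence graph, run PageRank, return top-ranked words."""
    graph = nx.Graph()
    for i, word in enumerate(tokens):
        # link each word to its neighbors within a sliding window
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            graph.add_edge(word, tokens[j])
    ranks = nx.pagerank(graph)
    return sorted(ranks, key=ranks.get, reverse=True)[:top_n]

tokens = ("graph algorithms extract key phrases from text "
          "using graph ranking").split()
print(textrank_keywords(tokens))
```

Words that co-occur with many distinct neighbors accumulate rank, which is why repeated, well-connected terms surface as key phrases.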
48. Working with text and NLP
▪ parsing
▪ named entity recognition
▪ vector embedding
▪ smarter indexing
▪ summarization (especially video)
▪ semantic similarity to suggest curriculum
▪ speed development of assessments
▪ query expansion
▪ amending ontology
49. A plug for InnerSource…
We thought the introduction of data science had run headlong into enterprise silos and lingering tech debt. As if!! The introduction of AI exacerbates that problem even more. Suggested responses:
▪ InnerSourceCommons.org: open source practices within the enterprise
▪ design patterns for working across silos
▪ think “good house rules for guests” as other teams submit PRs on your code repos
51. A generational shift?
▪ We’re 12 years beyond the introduction of YouTube… anyone raising tweens now probably knows about YouTubers
▪ Below a certain age demographic, people tend to rely more on video and audio sources for information, while perhaps print is gaining more for entertainment. Mobile certainly has a huge impact there.
56. Active learning
▪ special case of semi-supervised machine learning
▪ send difficult calls / edge cases to experts; let algorithms handle routine decisions
▪ works well in use cases which have lots of inexpensive, unlabeled data
▪ e.g., an abundance of content to be classified, where the cost of labeling is the expense
▪ https://en.wikipedia.org/wiki/Active_learning_(machine_learning)
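The routing rule on this slide (experts get the hard calls, algorithms the routine ones) amounts to a confidence threshold. A minimal sketch, with made-up classifier output:

```python
def route(predictions, threshold=0.8):
    """Split classifier output: confident calls are automated,
    low-confidence calls are escalated to human experts."""
    auto, escalate = [], []
    for item, label, prob in predictions:
        if prob >= threshold:
            auto.append((item, label))
        else:
            escalate.append((item, label))
    return auto, escalate

# Hypothetical classifier output: (content item, predicted topic, probability)
preds = [("doc1", "python", 0.97),
         ("doc2", "devops", 0.55),   # uncertain: goes to an expert
         ("doc3", "ml", 0.91)]
auto, escalate = route(preds)
print(len(auto), len(escalate))  # 2 1
```

The expert's corrections on the escalated items then become new labeled training data, which is the "active" part of active learning.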
58. Active learning
Data preparation in the age of deep learning
oreilly.com/ideas/data-preparation-in-the-age-of-deep-learning
Lukas Biewald, CrowdFlower
O’Reilly Data Show, 2017-05-04
send human workers cases where machine learning algorithms signal uncertainty (low probability scores) or when your ensemble of machine learning algorithms signals disagreement
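The second trigger, ensemble disagreement, can be checked by vote agreement. A toy sketch:

```python
from collections import Counter

def needs_review(votes, min_agreement=0.75):
    """Flag an item for human review when ensemble members disagree
    on its label more than the agreement threshold allows."""
    _, top_count = Counter(votes).most_common(1)[0]
    return top_count / len(votes) < min_agreement

print(needs_review(["ml", "ml", "ml", "ml"]))       # False: unanimous
print(needs_review(["ml", "devops", "ml", "web"]))  # True: only 50% agree
```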
59. Human-in-the-loop design pattern
Building a business that combines human experts and data science
oreilly.com/ideas/building-a-business-that-combines-human-experts-and-data-science-2
Eric Colson, StitchFix
O’Reilly Data Show, 2016-01-28
“what machines can’t do are things around cognition, things that have to do with ambient information, or appreciation of aesthetics, or even the ability to relate to another human”
60. Weak supervision
Creating large training data sets quickly
oreilly.com/ideas/creating-large-training-data-sets-quickly
Alex Ratner, Stanford
O’Reilly Data Show, 2017-06-08
Snorkel: “data programming” as another instance of human-in-the-loop
github.com/HazyResearch/snorkel
conferences.oreilly.com/strata/strata-ny/public/schedule/detail/61849
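The “data programming” idea can be sketched without Snorkel's actual API: several noisy, heuristic labeling functions vote on unlabeled items, and the aggregated votes stand in for hand labels. The functions below are invented examples:

```python
# Noisy heuristic labeling functions for "is this ML-related content?"
# Each returns +1 (positive signal), -1 (negative signal), or 0 (abstain).
def lf_mentions_gpu(text):
    return 1 if "gpu" in text.lower() else 0

def lf_mentions_training(text):
    return 1 if "training" in text.lower() else 0

def lf_mentions_cooking(text):
    return -1 if "recipe" in text.lower() else 0

LABELING_FUNCTIONS = [lf_mentions_gpu, lf_mentions_training, lf_mentions_cooking]

def weak_label(text):
    """Aggregate noisy votes into a crude probabilistic label in [0, 1]."""
    score = sum(lf(text) for lf in LABELING_FUNCTIONS)
    return max(0.0, min(1.0, 0.5 + score / (2 * len(LABELING_FUNCTIONS))))

print(weak_label("Training deep nets on GPU clusters"))  # > 0.5: likely ML
```

Snorkel's contribution beyond this sketch is learning how much to trust each labeling function from their observed agreements and conflicts, rather than weighting them equally.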
61. Collaboration through Jupyter
Notebooks get used to manage ML pipelines, where machines + people collaborate on docs
▪ “Human-in-the-loop design pattern” talk @ JupyterCon NY 2017
▪ experts adjust parameters in ML pipelines
▪ machines write structured “logs” of ML modeling and evaluation
▪ experts run `jupyter notebook` via SSH tunnel for remote monitoring and updates
▪ https://pypi.python.org/pypi/nbtransom
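The SSH-tunnel setup mentioned above follows the standard pattern; the hostname, username, and ports here are placeholders:

```shell
# On the remote machine: start the notebook server, listening only locally.
jupyter notebook --no-browser --port=8888

# On your laptop: forward local port 8888 to the remote notebook server,
# then browse to http://localhost:8888/
ssh -N -L 8888:localhost:8888 user@remote-host
```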
63. Collaboration through Jupyter
▪ running notebooks via SSH tunnel removes the need for dedicated UIs
▪ this work anticipates upcoming collaborative document features in JupyterLab: Realtime collaboration for JupyterLab using Google Drive, Ian Rose, UC Berkeley
64. Expert review
▪ ML pipelines report results: recognizing content, adding annotations, requesting more examples when “confused”
▪ Human-in-the-loop experts (potentially Customer Service) review decisions, especially edge cases, then train through examples
▪ The system iterates
65. What’s the point of using AI in Media?
▪ more work, done more quickly, than could be performed by editors, who are already super-busy people
▪ exceeding human parity, as a benchmark
▪ helps relieve pressure on organizations as learning curves accelerate
▪ augments some of our most valuable experts, so they can get more done
66. Human-in-the-loop as a management strategy
personal op-ed: the “game” isn’t to replace people; instead, it’s about leveraging AI to augment staff, so organizations can retain people with valuable domain expertise, making their contributions even more vital
70. Learn Alongside Innovators
Just Enough Math
Building Data Science Teams
Hylbert-Speys
How Do You Learn?
updates, reviews, conference summaries…
liber118.com/pxn/
@pacoid