Humans in the loop
AI in open source and industry
Paco Nathan @pacoid
Dir, Learning Group @ O’Reilly Media
#NikeTechTalks Portland 2017-08-10
Research questions:
▪ How do we personalize learning experiences across ebooks, videos, conferences, computable content, live online courses, case studies, expert AMAs, etc.?
▪ How do we help experts (by definition, really busy people) share knowledge with their peers in industry?
▪ How do we manage the role of editors at human scale, while technology and delivery media evolve rapidly?
▪ How do we help organizations learn and transform continuously?
▪ Can we accomplish these goals by leveraging AI in Media?
UX for content discovery:
• partly generated + curated by humans
• partly generated + curated by AI apps

AI: why now?
AI is real, but why now?
▪ Big Data: machine data (1997-ish)
▪ Big Compute: cloud computing (2006-ish)
▪ Big Models: deep learning (2009-ish)
The confluence of factors created a business environment where AI could become mainstream.
AR/VR combined with embedded computing and reinforcement learning may bring it to a next level.
Benchmark: achieving human parity
2016-10-12: Microsoft researchers reach human parity in conversational speech recognition
Achieving Human Parity in Conversational Speech Recognition
W. Xiong, et al., Microsoft
Big picture
▪ The current state of machine intelligence 3.0
  Shivon Zilis, James Cham, Bloomberg Beta (annual landscape)
▪ The Future of Machine Intelligence
  David Beyer, Amplify Partners (report)
▪ Artificial Intelligence: Teaching Machines to Think Like People
  Jack Clark, OpenAI (report)
▪ The AI Conf
  O’Reilly Media and Intel partnership (industry conference)
“Consider the shift from steam to electric power: it took a generation before factory managers understood they could reconfigure the physical arrangement. AI may be quicker adoption, but faces similar extremes of cognitive embrace.”
David Beyer, Amplify Partners
Immediate impact of AI

personal op-ed: the combination of advances in UX, DevOps, and AI together is, specifically, taking off the table some previous needs for what we’d called “software engineering”, which must now undergo major changes

The leaderboard
2017 highlights from leading teams
▪ TensorFlow: Machine learning for everyone
  Rajat Monga, Google
▪ Distributed deep learning on AWS using MXNet
  Anima Anandkumar, Amazon
▪ Squeezing deep learning onto mobile phones
  Anirudh Koul, Microsoft
Artificial intelligence in the software engineering workflow
Peter Norvig, Google

Can machines spot diseases faster than expert humans?
Suchi Saria, Johns Hopkins U

Cars that coordinate with people
Anca Dragan, UC Berkeley

Strategies for integrating people and machine learning in online systems
Jason Laska, Clara Labs

AI for manufacturing: Today and tomorrow
David Rogers, Sight Machine

Harnessing the power of artificial intelligence to diagnose diseases
Kavya Kopparapu, GirlsComputingLeague

Now trending
Current themes among leading AI teams:
▪ scale up to solve complex problems (big models)
▪ optimize to deploy consumer products (low power)

Trending strategy…
Most popular content, among thousands of enterprise organizations:
Hands-On Machine Learning with scikit-learn and TensorFlow
Aurélien Géron
Python FTW.
Along with Keras, PyTorch, Caffe, etc.

Trending methods…
UC Berkeley RISELab
▪ https://rise.cs.berkeley.edu/
▪ enable machines to take rapid, intelligent actions based on real-time data and context from the world around them
▪ shift away from prior emphasis on JVM-based frameworks during AMPLab period (Spark)
▪ major focus on reinforcement learning
Ray: a distributed execution framework for emerging AI applications
Increasing role of the hardware interface
▪ earlier generations of virtualization abstracted away hardware; however, containers allow direct access
▪ with DL, application software must access the latest hardware features directly to be competitive
▪ vendors anticipate advanced math needs for low-level hardware, looking beyond DL, e.g., multi-linear algebra libraries
▪ Scaling machine learning (O’Reilly Data Show, 21:43)
  Reza Zadeh, Stanford / Matroid
Emerging themes: transfer learning
▪ transfer learning: when you can solve a task well, transfer understanding to solve related problems
▪ remove final classification layer, then extract next-to-last layer of a CNN:
  tensorflow.org/tutorials/image_recognition
▪ leverage a network pre-trained on a large dataset:
  blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
Emerging themes: GANs
▪ generative adversarial networks: neural networks compete against each other in a zero-sum game
▪ example: CycleGAN (see AI NY 2017)

“Generative Adversarial Networks for Beginners”
LSTM used to generate content
Long short-term memory (LSTM) allows recurrent neural networks to learn sequences of data, such as in streams of voice or text.
Imagine feeding scripts (semi-structured data) from a film genre through an LSTM, then generating new output…
http://benjamin.wtf/
Sunspring
It’s No Game
LSTM in music composition / performance
https://github.com/IraKorshunova/folk-rnn

Even romance novels…

How do people learn?
White paper: "How do you learn?"
Peer Teaching through a range of Media
▪ books, videos
▪ live online courses
▪ conferences
▪ AMAs
▪ computable content
▪ case studies
▪ articles
▪ podcast interviews
▪ chat forums
Example: "Learn alongside innovators, thought-by-thought"
Example: "How great companies make change happen"
Example: "Why self assessments improve learning"
Key insight for AI in Media:
▪ any content which can be represented as text can be parsed by NLP, then manipulated by available AI tooling
▪ labeled images get really interesting
▪ text or images within a context have inherent structure
▪ representation of that kind of structure is rare in the Media vertical, so far

Beyond deep learning…
Ontology
▪ provides context which Deep Learning lacks
▪ aka, “knowledge graph”: a computable thesaurus
▪ maps the semantics of business relationships
▪ S/V/O: “nouns”, some “verbs”, a few “adjectives”
▪ conversational interfaces (e.g., Google Assistant) improve UX by importing ontologies
▪ the hard part, a relatively expensive investment
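To make the S/V/O idea concrete, here is a minimal sketch of an ontology held as subject / verb / object triples, with one simple inference that follows a relationship transitively across the graph. The triples, the `is_a` and `orchestrates` verbs, and the function names are all hypothetical, chosen only for illustration; a production ontology would more likely live in an RDF store.

```python
from collections import defaultdict

# Hypothetical triples in subject / verb / object form. The names are
# illustrative and not taken from any real ontology.
TRIPLES = [
    ("Docker", "is_a", "Container"),
    ("Container", "is_a", "Virtualization"),
    ("Virtualization", "is_a", "Infrastructure"),
    ("Kubernetes", "orchestrates", "Container"),
]

def index_by_subject(triples):
    """Group (verb, object) pairs under their subject for fast lookup."""
    index = defaultdict(list)
    for subject, verb, obj in triples:
        index[subject].append((verb, obj))
    return index

def infer_reachable(index, subject, verb="is_a"):
    """Follow `verb` edges transitively and return every object reached."""
    found, seen, stack = [], set(), [subject]
    while stack:
        node = stack.pop()
        # .get() avoids inserting empty entries into the defaultdict
        for edge_verb, obj in index.get(node, []):
            if edge_verb == verb and obj not in seen:
                seen.add(obj)
                found.append(obj)
                stack.append(obj)
    return found

if __name__ == "__main__":
    index = index_by_subject(TRIPLES)
    print(infer_reachable(index, "Docker"))
```

The point of the sketch is that a small, human-curated set of relationships lets a machine derive facts that were never stated directly, here that Docker falls under Infrastructure.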
Which parts do people or machines do best?
team goal: maintain structural correspondence between the layers
big win for AI: inferences across the graph
[Diagram labels: Ontology (human scale, primary structure, control points, testability); machine generated data products (~80% of the graph)]

Open source tooling
Components
▪ rdflib + NetworkX: ontology graph represented as N3 “turtle”
▪ PyTextRank: NLP parsing, feature vectors, summarization
▪ Jupyter + nbtransom: human-in-the-loop ML pipelines
▪ Apache Spark: sort, partitioning, task management
▪ scikit-learn: machine learning models
▪ gensim: vector embedding / deep learning
▪ datasketch: approximation algorithms
▪ Flask, React, Node.js: microservices, UI web components
▪ Redis: in-memory indexing, full-text search

PyTextRank
TextRank (R Mihalcea, P Tarau, 2004): a graph algorithm that extracts key phrases and summarizes texts, an improvement for NLP over the use of keywords, n-grams, etc.
▪ construct a graph from a paragraph of text
▪ run PageRank on that graph
▪ extract the highly ranked phrases
Python implementation atop spaCy, NetworkX, datasketch:
▪ https://pypi.python.org/pypi/pytextrank/
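The three TextRank steps can be sketched in plain Python. This is a simplified illustration under stated assumptions, not PyTextRank's API: it ranks single words rather than noun phrases, uses a tiny hand-picked stopword list instead of a spaCy parse, and links words that fall within a small co-occurrence window. The function names and the `window` parameter are made up for the example.

```python
import re
from collections import defaultdict

# Small illustrative stopword list (an assumption; real pipelines use
# part-of-speech tags from a parser to choose candidate words).
STOPWORDS = {"a", "an", "and", "are", "as", "at", "for", "in", "is", "it",
             "of", "on", "or", "the", "to", "which", "these", "they", "like"}

def build_graph(text, window=3):
    """Step 1: link each kept word to the next `window - 1` kept words."""
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in STOPWORDS]
    graph = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph

def pagerank(graph, damping=0.85, iterations=50):
    """Step 2: PageRank by power iteration over the undirected graph."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        rank = {
            n: (1 - damping) / len(nodes)
               + damping * sum(rank[m] / len(graph[m]) for m in graph[n])
            for n in nodes
        }
    return rank

def top_terms(text, k=5):
    """Step 3: return the k highest-ranked words."""
    rank = pagerank(build_graph(text))
    return sorted(rank, key=rank.get, reverse=True)[:k]

if __name__ == "__main__":
    sample = ("sometimes people think docker is a virtualization tool like "
              "vmware or virtualbox these are tools emulating hardware")
    print(top_terms(sample))
```

Every node in the graph has at least one neighbor by construction, so the division by `len(graph[m])` is safe and the ranks keep summing to one across iterations.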
Working with text and NLP
▪ parsing
▪ named entity recognition
▪ vector embedding
▪ smarter indexing
▪ summarization (especially video)
▪ semantic similarity to suggest curriculum
▪ speed development of assessments
▪ query expansion
▪ amending ontology

A plug for InnerSource…
We thought the introduction of data science had run headlong into enterprise silos and lingering tech debt. As if!!
Introduction of AI exacerbates that problem even more. Suggested responses:
▪ InnerSourceCommons.org open source practices within enterprise
▪ design patterns for working across silos
▪ think: “good house rules for guests” as other teams submit PRs on your code repos

Beyond text…
A generational shift?
▪ We’re 12 years beyond the introduction of YouTube … anyone raising tweens now probably knows about YouTubers
▪ Below a certain age demographic, people tend to rely more on video and audio sources for information, while perhaps print is gaining more for entertainment. Mobile certainly has huge impact there.
{"graf": [[21, "let", "let", "VB", 1, 48], [0, "'s", "'s", "PRP", 0, 49],
"take", "take", "VB", 1, 50], [0, "a", "a", "DT", 0, 51], [23, "look", "l
"NN", 1, 52], [0, "at", "at", "IN", 0, 53], [0, "a", "a", "DT", 0, 54], [
"few", "few", "JJ", 1, 55], [25, "examples", "example", "NNS", 1, 56], [0
"often", "often", "RB", 0, 57], [0, "when", "when", "WRB", 0, 58], [11,
"people", "people", "NNS", 1, 59], [2, "are", "be", "VBP", 1, 60], [26, "
"first", "JJ", 1, 61], [27, "learning", "learn", "VBG", 1, 62], [0, "abou
"about", "IN", 0, 63], [28, "Docker", "docker", "NNP", 1, 64], [0, "they"
"they", "PRP", 0, 65], [29, "try", "try", "VBP", 1, 66], [0, "and", "and"
0, 67], [30, "put", "put", "VBP", 1, 68], [0, "it", "it", "PRP", 0, 69],
"in", "in", "IN", 0, 70], [0, "one", "one", "CD", 0, 71], [0, "of", "of",
0, 72], [0, "a", "a", "DT", 0, 73], [24, "few", "few", "JJ", 1, 74], [31,
"existing", "existing", "JJ", 1, 75], [18, "categories", "category", "NNS
76], [0, "sometimes", "sometimes", "RB", 0, 77], [11, "people", "people",
1, 78], [9, "think", "think", "VBP", 1, 79], [0, "it", "it", "PRP", 0, 80
"'s", "be", "VBZ", 1, 81], [0, "a", "a", "DT", 0, 82], [32, "virtualizati
"virtualization", "NN", 1, 83], [19, "tool", "tool", "NN", 1, 84], [0, "l
"like", "IN", 0, 85], [33, "VMware", "vmware", "NNP", 1, 86], [0, "or", "
"CC", 0, 87], [34, "virtualbox", "virtualbox", "NNP", 1, 88], [0, "also",
"also", "RB", 0, 89], [35, "known", "know", "VBN", 1, 90], [0, "as", "as"
0, 91], [0, "a", "a", "DT", 0, 92], [36, "hypervisor", "hypervisor", "NN"
93], [0, "these", "these", "DT", 0, 94], [2, "are", "be", "VBP", 1, 95],
"tools", "tool", "NNS", 1, 96], [0, "which", "which", "WDT", 0, 97], [2,
"be", "VBP", 1, 98], [37, "emulating", "emulate", "VBG", 1, 99], [38,
"hardware", "hardware", "NN", 1, 100], [0, "for", "for", "IN", 0, 101], [
"virtual", "virtual", "JJ", 1, 102], [40, "software", "software", "NN", 1
103]], "id": "001.video197359", "sha1":
"4b69cf60f0497887e3776619b922514f2e5b70a8"}
Video transcription
{"count": 2, "ids": [32, 19], "pos": "np", "rank": 0.0194, "text": "virtualization tool"}
{"count": 2, "ids": [40, 69], "pos": "np", "rank": 0.0117, "text": "software applications"}
{"count": 4, "ids": [38], "pos": "np", "rank": 0.0114, "text": "hardware"}
{"count": 2, "ids": [33, 36], "pos": "np", "rank": 0.0099, "text": "vmware hypervisor"}
{"count": 4, "ids": [28], "pos": "np", "rank": 0.0096, "text": "docker"}
{"count": 4, "ids": [34], "pos": "np", "rank": 0.0094, "text": "virtualbox"}
{"count": 10, "ids": [11], "pos": "np", "rank": 0.0049, "text": "people"}
{"count": 4, "ids": [37], "pos": "vbg", "rank": 0.0026, "text": "emulating"}
{"count": 2, "ids": [27], "pos": "vbg", "rank": 0.0016, "text": "learning"}
Transcript: let's take a look at a few examples often when
people are first learning about Docker they try and put it in
one of a few existing categories sometimes people think it's
a virtualization tool like VMware or virtualbox also known as
a hypervisor these are tools which are emulating hardware for
virtual software
Confidence: 0.973419129848
39 KUBERNETES
0.8747 coreos
0.8624 etcd
0.8478 DOCKER CONTAINERS
0.8458 mesos
0.8406 DOCKER
0.8354 DOCKER CONTAINER
0.8260 KUBERNETES CLUSTER
0.8258 docker image
0.8252 EC2
0.8210 docker hub
0.8138 OPENSTACK
orm:Docker a orm:Vendor;
a orm:Container;
a orm:Open_Source;
a orm:Commercial_Software;
owl:sameAs dbr:Docker_%28software%29;
skos:prefLabel "Docker"@en;
Fave mediated learning experience: audio+

How will a next generation learn?

Humans in the loop
Active learning
▪ special case of semi-supervised machine learning
▪ send difficult calls / edge cases to experts; let algorithms handle routine decisions
▪ works well in use cases which have lots of inexpensive, unlabeled data
▪ e.g., abundance of content to be classified, where the cost of labeling is the expense
▪ https://en.wikipedia.org/wiki/Active_learning_(machine_learning)
Active learning
Data preparation in the age of deep learning
oreilly.com/ideas/data-preparation-in-the-age-of-deep-learning
Luke Biewald, CrowdFlower
O’Reilly Data Show, 2017-05-04
send human workers cases where machine learning algorithms signal uncertainty (low probability scores) or when your ensemble of machine learning algorithms signals disagreement
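The routing rule described above can be sketched as a small function: an item goes to a human when the ensemble disagrees or when any model reports a low probability score, and is handled by the machine otherwise. The function name, the `(label, probability)` input shape, and the 0.8 threshold are assumptions made for illustration, not part of any cited system.

```python
from collections import Counter

def route(predictions, min_confidence=0.8):
    """Decide whether an item is handled automatically or sent to an expert.

    `predictions` holds one (label, probability) pair per model in the
    ensemble. Returns ("machine", label) or ("human", reason).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    labels = [label for label, _ in predictions]
    top_label, votes = Counter(labels).most_common(1)[0]
    if votes < len(labels):
        # at least one model voted for a different label
        return ("human", "disagreement")
    if min(prob for _, prob in predictions) < min_confidence:
        # the models agree, but at least one is unsure
        return ("human", "low confidence")
    return ("machine", top_label)

if __name__ == "__main__":
    print(route([("python", 0.95), ("python", 0.91)]))
    print(route([("python", 0.95), ("java", 0.90)]))
    print(route([("python", 0.95), ("python", 0.55)]))
```

Labels that experts assign to the routed items can then be fed back as training examples, which is what makes the loop a form of active learning rather than plain escalation.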
Human-in-the-loop design pattern
Building a business that combines human experts and data science
oreilly.com/ideas/building-a-business-that-combines-human-experts-and-data-science-2
Eric Colson, StitchFix
O’Reilly Data Show, 2016-01-28
“what machines can’t do are things around cognition, things that have to do with ambient information, or appreciation of aesthetics, or even the ability to relate to another human”
Weak supervision
Creating large training data sets quickly
oreilly.com/ideas/creating-large-training-data-sets-quickly
Alex Ratner, Stanford
O’Reilly Data Show, 2017-06-08
Snorkel: “data programming” as another instance of human-in-the-loop
github.com/HazyResearch/snorkel
conferences.oreilly.com/strata/strata-ny/public/schedule/detail/61849
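As a rough illustration of the “data programming” idea, experts write small heuristic labeling functions instead of labeling items one by one, and the noisy votes are combined into training labels. The sketch below is not Snorkel's API: the labeling functions and category names are hypothetical, and a simple majority vote stands in for the statistical model that Snorkel fits over labeling-function outputs.

```python
from collections import Counter

ABSTAIN = None  # a labeling function returns this when it has no opinion

# Hypothetical labeling functions: cheap heuristics from a domain expert.
def lf_mentions_docker(text):
    return "containers" if "docker" in text.lower() else ABSTAIN

def lf_mentions_kubernetes(text):
    return "containers" if "kubernetes" in text.lower() else ABSTAIN

def lf_mentions_lstm(text):
    return "deep_learning" if "lstm" in text.lower() else ABSTAIN

LABELING_FUNCTIONS = [lf_mentions_docker, lf_mentions_kubernetes,
                      lf_mentions_lstm]

def weak_label(text, labeling_functions=LABELING_FUNCTIONS):
    """Majority vote over the labeling functions that did not abstain."""
    votes = [lf(text) for lf in labeling_functions]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # tie between labels: leave the item for a human
    return ranked[0][0]

if __name__ == "__main__":
    for title in ["Deploying Docker on Kubernetes", "An LSTM primer",
                  "Docker for LSTM training", "Sourdough basics"]:
        print(title, "->", weak_label(title))
```

Items where the functions abstain or tie are natural candidates for expert review, which is how weak supervision fits the human-in-the-loop pattern.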
Collaboration through Jupyter
Notebooks get used to manage ML pipelines, where machines + people collaborate on docs
▪ “Human-in-the-loop design pattern” talk @ JupyterCon NY 2017
▪ experts adjust parameters in ML pipelines
▪ machines write structured “logs” of ML modeling and evaluation
▪ experts run `jupyter notebook` via SSH tunnel for remote monitoring and updates
▪ https://pypi.python.org/pypi/nbtransom
[Diagram labels: ML Pipelines, Jupyter kernel, Browser, SSH tunnel]
Collaboration through Jupyter
▪ running notebooks via SSH tunnel removes the need for dedicated UIs
▪ this work anticipates upcoming collaborative document features in JupyterLab:
  Realtime collaboration for JupyterLab using Google Drive
  Ian Rose, UC Berkeley
Expert review
▪ ML pipelines report results: recognizing content, adding annotations, requesting more examples when “confused”
▪ Human-in-the-loop experts (potentially, Customer Service) review decisions, especially edge cases, then train through examples
▪ The system iterates
What’s the point of using AI in Media?
▪ more work, quicker, than could be performed by editors, who are already super-busy people
▪ exceeding human parity, as a benchmark
▪ helps relieve pressure on organizations, as learning curves accelerate
▪ augments some of our most valuable experts, so they can get more done

Human-in-the-loop as a management strategy
personal op-ed: the “game” isn’t to replace people; instead it’s about leveraging AI to augment staff, so organizations can retain people with valuable domain expertise, making their contributions and expertise even more vital

Why we’ll never run out of jobs
Strata Data
  NY, Sep 25-28
  SG, Dec 4-7
  SJ, Mar 5-8, 2018
  UK, May 21-24, 2018
The AI Conf
  SF, Sep 17-20
  NY, Apr 29-May 2, 2018
JupyterCon
  NY, Aug 22-25
OSCON (returns!)
  PDX, Jul 16-19, 2018
contact the speaker for conf discount coupons
Learn Alongside Innovators
Just Enough Math
Building Data Science Teams
Hylbert-Speys
How Do You Learn?
updates, reviews, conference summaries…
liber118.com/pxn/
@pacoid
Humans in the loop: AI in open source and industry

More Related Content

What's hot

Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender SystemsMarcel Kurovski
 
Three Tools for "Human-in-the-loop" Data Science
Three Tools for "Human-in-the-loop" Data ScienceThree Tools for "Human-in-the-loop" Data Science
Three Tools for "Human-in-the-loop" Data ScienceAditya Parameswaran
 
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your DataCloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your DataCloudera, Inc.
 
Big Data and Data Intensive Computing: Use Cases
Big Data and Data Intensive Computing: Use CasesBig Data and Data Intensive Computing: Use Cases
Big Data and Data Intensive Computing: Use CasesJongwook Woo
 
Big Data and Data Intensive Computing on Networks
Big Data and Data Intensive Computing on NetworksBig Data and Data Intensive Computing on Networks
Big Data and Data Intensive Computing on NetworksJongwook Woo
 
Crowdsourced Data Processing: Industry and Academic Perspectives
Crowdsourced Data Processing: Industry and Academic PerspectivesCrowdsourced Data Processing: Industry and Academic Perspectives
Crowdsourced Data Processing: Industry and Academic PerspectivesAditya Parameswaran
 
Sql saturday el salvador 2016 - Me, A Data Scientist?
Sql saturday el salvador 2016 - Me, A Data Scientist?Sql saturday el salvador 2016 - Me, A Data Scientist?
Sql saturday el salvador 2016 - Me, A Data Scientist?Fabricio Quintanilla
 
Big Data and Predictive Analysis
Big Data and Predictive AnalysisBig Data and Predictive Analysis
Big Data and Predictive AnalysisJongwook Woo
 
How to Interview a Data Scientist
How to Interview a Data ScientistHow to Interview a Data Scientist
How to Interview a Data ScientistDaniel Tunkelang
 
Introduction to Machine Learning - WeCloudData
Introduction to Machine Learning - WeCloudDataIntroduction to Machine Learning - WeCloudData
Introduction to Machine Learning - WeCloudDataWeCloudData
 
Collections Databases; Making the system work for you
Collections Databases; Making the system work for youCollections Databases; Making the system work for you
Collections Databases; Making the system work for youirowson
 
Introduction to Deep Learning and AI at Scale for Managers
Introduction to Deep Learning and AI at Scale for ManagersIntroduction to Deep Learning and AI at Scale for Managers
Introduction to Deep Learning and AI at Scale for ManagersDataWorks Summit
 
Democratizing Advanced Analytics Propels Instant Analysis Results to the Ubiq...
Democratizing Advanced Analytics Propels Instant Analysis Results to the Ubiq...Democratizing Advanced Analytics Propels Instant Analysis Results to the Ubiq...
Democratizing Advanced Analytics Propels Instant Analysis Results to the Ubiq...Dana Gardner
 
Full-Stack Data Science: How to be a One-person Data Team
Full-Stack Data Science: How to be a One-person Data TeamFull-Stack Data Science: How to be a One-person Data Team
Full-Stack Data Science: How to be a One-person Data TeamGreg Goltsov
 
Python's Role in the Future of Data Analysis
Python's Role in the Future of Data AnalysisPython's Role in the Future of Data Analysis
Python's Role in the Future of Data AnalysisPeter Wang
 
Intro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSIntro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSSri Ambati
 
MIT Deep Learning Basics: Introduction and Overview by Lex Fridman
MIT Deep Learning Basics: Introduction and Overview by Lex FridmanMIT Deep Learning Basics: Introduction and Overview by Lex Fridman
MIT Deep Learning Basics: Introduction and Overview by Lex FridmanPeerasak C.
 

What's hot (20)

Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Web Scale Named Entity Mining
Web Scale Named Entity MiningWeb Scale Named Entity Mining
Web Scale Named Entity Mining
 
Three Tools for "Human-in-the-loop" Data Science
Three Tools for "Human-in-the-loop" Data ScienceThree Tools for "Human-in-the-loop" Data Science
Three Tools for "Human-in-the-loop" Data Science
 
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your DataCloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
 
Data Scientist Enablement roadmap 1.0
Data Scientist Enablement roadmap 1.0Data Scientist Enablement roadmap 1.0
Data Scientist Enablement roadmap 1.0
 
AI on Big Data
AI on Big DataAI on Big Data
AI on Big Data
 
Big Data and Data Intensive Computing: Use Cases
Big Data and Data Intensive Computing: Use CasesBig Data and Data Intensive Computing: Use Cases
Big Data and Data Intensive Computing: Use Cases
 
Big Data and Data Intensive Computing on Networks
Big Data and Data Intensive Computing on NetworksBig Data and Data Intensive Computing on Networks
Big Data and Data Intensive Computing on Networks
 
Crowdsourced Data Processing: Industry and Academic Perspectives
Crowdsourced Data Processing: Industry and Academic PerspectivesCrowdsourced Data Processing: Industry and Academic Perspectives
Crowdsourced Data Processing: Industry and Academic Perspectives
 
Sql saturday el salvador 2016 - Me, A Data Scientist?
Sql saturday el salvador 2016 - Me, A Data Scientist?Sql saturday el salvador 2016 - Me, A Data Scientist?
Sql saturday el salvador 2016 - Me, A Data Scientist?
 
Big Data and Predictive Analysis
Big Data and Predictive AnalysisBig Data and Predictive Analysis
Big Data and Predictive Analysis
 
How to Interview a Data Scientist
How to Interview a Data ScientistHow to Interview a Data Scientist
How to Interview a Data Scientist
 
Introduction to Machine Learning - WeCloudData
Introduction to Machine Learning - WeCloudDataIntroduction to Machine Learning - WeCloudData
Introduction to Machine Learning - WeCloudData
 
Collections Databases; Making the system work for you
Collections Databases; Making the system work for youCollections Databases; Making the system work for you
Collections Databases; Making the system work for you
 
Introduction to Deep Learning and AI at Scale for Managers
Introduction to Deep Learning and AI at Scale for ManagersIntroduction to Deep Learning and AI at Scale for Managers
Introduction to Deep Learning and AI at Scale for Managers
 
Democratizing Advanced Analytics Propels Instant Analysis Results to the Ubiq...
Democratizing Advanced Analytics Propels Instant Analysis Results to the Ubiq...Democratizing Advanced Analytics Propels Instant Analysis Results to the Ubiq...
Democratizing Advanced Analytics Propels Instant Analysis Results to the Ubiq...
 
Full-Stack Data Science: How to be a One-person Data Team
Full-Stack Data Science: How to be a One-person Data TeamFull-Stack Data Science: How to be a One-person Data Team
Full-Stack Data Science: How to be a One-person Data Team
 
Python's Role in the Future of Data Analysis
Python's Role in the Future of Data AnalysisPython's Role in the Future of Data Analysis
Python's Role in the Future of Data Analysis
 
Intro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSIntro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWS
 
MIT Deep Learning Basics: Introduction and Overview by Lex Fridman
MIT Deep Learning Basics: Introduction and Overview by Lex FridmanMIT Deep Learning Basics: Introduction and Overview by Lex Fridman
MIT Deep Learning Basics: Introduction and Overview by Lex Fridman
 

Similar to Humans in the loop: AI in open source and industry

Using Algorithmia to leverage AI and Machine Learning APIs
Using Algorithmia to leverage AI and Machine Learning APIsUsing Algorithmia to leverage AI and Machine Learning APIs
Using Algorithmia to leverage AI and Machine Learning APIsRakuten Group, Inc.
 
What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for...
What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for...What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for...
What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for...Matthew Lease
 
Agile.. and then? – Enterprise DevOps: the digital transformation of the IT...
Agile..  and then? – Enterprise DevOps:  the digital transformation of the IT...Agile..  and then? – Enterprise DevOps:  the digital transformation of the IT...
Agile.. and then? – Enterprise DevOps: the digital transformation of the IT...Peter Muryshkin
 
Computer Applications and Systems - Workshop V
Computer Applications and Systems - Workshop VComputer Applications and Systems - Workshop V
Computer Applications and Systems - Workshop VRaji Gogulapati
 
NASSCOM Tech Series - Machine Intelligence: Emerging Trends, Technologies an...
 NASSCOM Tech Series - Machine Intelligence: Emerging Trends, Technologies an... NASSCOM Tech Series - Machine Intelligence: Emerging Trends, Technologies an...
NASSCOM Tech Series - Machine Intelligence: Emerging Trends, Technologies an...Bijilash Babu
 
Building Your Dream Machine Learning Team with Python Expertise
Building Your Dream Machine Learning Team with Python ExpertiseBuilding Your Dream Machine Learning Team with Python Expertise
Building Your Dream Machine Learning Team with Python Expertiseriyak40
 
The Strategic Developer
The Strategic DeveloperThe Strategic Developer
The Strategic DeveloperIWMW
 
Azure Machine Learning 101
Azure Machine Learning 101Azure Machine Learning 101
Azure Machine Learning 101Renato Jovic
 
Building successful data science teams
Building successful data science teamsBuilding successful data science teams
Building successful data science teamsVenkatesh Umaashankar
 
Bootstrap Alliance Google Call to Action
Bootstrap Alliance Google Call to ActionBootstrap Alliance Google Call to Action
Bootstrap Alliance Google Call to Actionyesheng
 
Instructional Design for the Semantic Web
Instructional Design for the Semantic WebInstructional Design for the Semantic Web
Instructional Design for the Semantic Webguest649a93
 
Deep Learning for AI - Yoshua Bengio, Mila
Deep Learning for AI - Yoshua Bengio, MilaDeep Learning for AI - Yoshua Bengio, Mila
Deep Learning for AI - Yoshua Bengio, MilaLucidworks
 
Moving from Social Technology towards an Operating System for the Organisation
Moving from Social Technology towards an Operating System for the OrganisationMoving from Social Technology towards an Operating System for the Organisation
Moving from Social Technology towards an Operating System for the OrganisationLee Bryant
 

Humans in the loop: AI in open source and industry

  • 1. Humans in the loop: AI in open source and industry
       Paco Nathan @pacoid
       Dir, Learning Group @ O’Reilly Media
       #NikeTechTalks Portland 2017-08-10
  • 3. Research questions:
       ▪ How do we personalize learning experiences across ebooks, videos, conferences, computable content, live online courses, case studies, expert AMAs, etc.?
       ▪ How do we help experts, by definition really busy people, share knowledge with their peers in industry?
       ▪ How do we manage the role of editors at human scale, while technology and delivery media evolve rapidly?
       ▪ How do we help organizations learn and transform continuously?
       ▪ Can we accomplish these goals by leveraging AI in Media?
  • 5. UX for content discovery:
       • partly generated + curated by humans
       • partly generated + curated by AI apps
  • 7. AI is real, but why now?
       ▪ Big Data: machine data (1997-ish)
       ▪ Big Compute: cloud computing (2006-ish)
       ▪ Big Models: deep learning (2009-ish)
       The confluence of factors created a business environment where AI could become mainstream.
       AR/VR combined with embedded computing and reinforcement learning may bring it to a next level.
  • 8. Benchmark: achieving human parity
       2016-10-12: Microsoft researchers reach human parity in conversational speech recognition
       Achieving Human Parity in Conversational Speech Recognition
       W. Xiong, et al., Microsoft
  • 9. Big picture
       ▪ The current state of machine intelligence 3.0
         Shivon Zilis, James Cham, Bloomberg Beta (annual landscape)
       ▪ The Future of Machine Intelligence
         David Beyer, Amplify Partners (report)
       ▪ Artificial Intelligence: Teaching Machines to Think Like People
         Jack Clark, OpenAI (report)
       ▪ The AI Conf
         O’Reilly Media and Intel partnership (industry conference)
  • 11. “Consider the shift from steam to electric power: it took a generation before factory managers understood they could reconfigure the physical arrangement. AI may be quicker adoption, but faces similar extremes of cognitive embrace.”
        - David Beyer, Amplify Partners
  • 12. Immediate impact of AI
        Personal op-ed: the combination of advances in UX, DevOps, and AI, specifically, is taking off the table some previous needs for what we’d called “software engineering”, which must now undergo major changes.
  • 14. 2017 highlights from leading teams
        ▪ TensorFlow: Machine learning for everyone
          Rajat Monga, Google
        ▪ Distributed deep learning on AWS using MXNet
          Anima Anandkumar, Amazon
        ▪ Squeezing deep learning onto mobile phones
          Anirudh Koul, Microsoft
  • 15. Artificial intelligence in the software engineering workflow
        Peter Norvig, Google
  • 16. Can machines spot diseases faster than expert humans?
        Suchi Saria, Johns Hopkins U
  • 17. Cars that coordinate with people
        Anca Dragan, UC Berkeley
  • 18. Strategies for integrating people and machine learning in online systems
        Jason Laska, Clara Labs
  • 19. AI for manufacturing: Today and tomorrow
        David Rogers, Sight Machine
  • 20. Harnessing the power of artificial intelligence to diagnose diseases
        Kavya Kopparapu, GirlsComputingLeague
  • 22. Current themes among leading AI teams:
        ▪ scale up to solve complex problems (big models)
        ▪ optimize to deploy consumer products (low power)
        Trending strategy…
  • 23. Most popular content, among thousands of enterprise organizations:
        Hands-On Machine Learning with scikit-learn and TensorFlow
        Aurélien Géron
        Python FTW.
        Along with Keras, PyTorch, Caffe, etc.
        Trending methods…
  • 24. UC Berkeley RISELab
        ▪ https://rise.cs.berkeley.edu/
        ▪ enable machines to take rapid, intelligent actions based on real-time data and context from the world around them
        ▪ shift away from prior emphasis on JVM-based frameworks during AMPLab period (Spark)
        ▪ major focus on reinforcement learning
        Ray: a distributed execution framework for emerging AI applications
  • 25. Increasing role of the hardware interface
        ▪ earlier generations of virtualization abstracted away hardware; however, containers allow direct access
        ▪ with DL, application software must access the latest hardware features directly to be competitive
        ▪ vendors anticipate advanced math needs for low-level hardware, looking beyond DL, e.g., multi-linear algebra libraries
        ▪ Scaling machine learning (O’Reilly Data Show, 21:43)
          Reza Zadeh, Stanford / Matroid
  • 26. Emerging themes: transfer learning
        ▪ transfer learning: when you can solve a task well, transfer understanding to solve related problems
        ▪ remove final classification layer, then extract next-to-last layer of a CNN:
          tensorflow.org/tutorials/image_recognition
        ▪ leverage a network pre-trained on a large dataset:
          blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
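The recipe on this slide (keep a pre-trained body, drop the final classification layer, train a new one on the extracted features) can be sketched without any deep learning framework. This is a toy illustration only: the "pre-trained" weights, the feature function, and the dataset are all made-up stand-ins, not anything from the talk or from the linked tutorials. In practice the frozen body would be a CNN loaded through a framework such as Keras or TensorFlow.

```python
import math

# Hypothetical "pre-trained" layer: in practice these weights would come from
# a network trained on a large dataset; here they are made-up constants.
FROZEN_WEIGHTS = [(1.0, 0.0), (0.0, 1.0), (1.0, -1.0)]

def extract_features(x):
    """Frozen body: plays the role of the next-to-last layer of a CNN."""
    return [math.tanh(w0 * x[0] + w1 * x[1]) for w0, w1 in FROZEN_WEIGHTS]

def predict(head, feats):
    """New final layer: logistic regression on the extracted features."""
    z = head["b"] + sum(w * f for w, f in zip(head["w"], feats))
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(head, data):
    """Mean cross-entropy of the head on (input, label) pairs."""
    total = 0.0
    for x, y in data:
        p = predict(head, extract_features(x))
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)

def train_head(data, steps=200, lr=0.5):
    """Fit only the new head; FROZEN_WEIGHTS are never updated."""
    head = {"w": [0.0] * len(FROZEN_WEIGHTS), "b": 0.0}
    cached = [(extract_features(x), y) for x, y in data]  # features computed once
    for _ in range(steps):
        grad_w = [0.0] * len(head["w"])
        grad_b = 0.0
        for feats, y in cached:
            err = predict(head, feats) - y
            grad_b += err
            for j, f in enumerate(feats):
                grad_w[j] += err * f
        head["b"] -= lr * grad_b / len(cached)
        head["w"] = [w - lr * g / len(cached) for w, g in zip(head["w"], grad_w)]
    return head

# Small labeled set for the *new* task (made-up numbers).
NEW_TASK = [((2.0, 0.5), 1), ((1.5, -0.5), 1), ((-1.0, 0.2), 0), ((-2.0, -0.3), 0)]
```

Because the body is frozen, its features can be computed once and cached, which is one reason this approach works with very little data and compute.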
  • 27. Emerging themes: GANs
        ▪ generative adversarial networks: neural networks compete against each other in a zero-sum game
        ▪ example: CycleGAN (see AI NY 2017)
  • 28. “Generative Adversarial Networks for Beginners”
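The zero-sum game mentioned above is usually written as a minimax objective. The form below is the one from the original GAN formulation (Goodfellow et al., 2014), which the slides do not cite. The symbols are notation assumed here: D is the discriminator, G the generator, p_data the data distribution, and p_z the noise prior.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator tries to increase V by telling real samples from generated ones, while the generator tries to decrease the same quantity, so one network's gain is the other's loss.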
  • 29. LSTM used to generate content
        Long short-term memory (LSTM) allows recurrent neural networks to learn sequences of data, such as in streams of voice or text.
        Imagine feeding scripts (semi-structured data) from a film genre through an LSTM, then generating new output…
  • 30. LSTM used to generate content
        http://benjamin.wtf/
        Sunspring
        It’s No Game
  • 31. LSTM in music composition / performance
        https://github.com/IraKorshunova/folk-rnn
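To make the LSTM slides above a little more concrete, here is a minimal sketch of one step of the standard LSTM cell, using scalars instead of vectors so the gating is easy to follow. The weight names (`wf`, `uf`, `bf`, and so on) are hypothetical labels chosen for this illustration. A real model would use a framework, learned weight matrices, and an output layer that predicts the next character or note.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One LSTM time step with scalar input, hidden state, and cell state."""
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate value
    c = f * c_prev + i * g   # keep part of the old memory, add some new
    h = o * math.tanh(c)     # expose a gated view of the memory
    return h, c

def run_sequence(xs, w):
    """Feed a sequence through the cell, carrying state from step to step."""
    h, c = 0.0, 0.0
    for x in xs:
        h, c = lstm_step(x, h, c, w)
    return h, c
```

The cell state `c` is what lets the network carry information across many time steps, which is why LSTMs suit scripts, speech, and music.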
  • 33. How do people learn?
  • 34. White paper: "How do you learn?"
  • 35. Peer Teaching through a range of Media
        ▪ books, videos
        ▪ live online courses
        ▪ conferences
        ▪ AMAs
        ▪ computable content
        ▪ case studies
        ▪ articles
        ▪ podcast interviews
        ▪ chat forums
  • 36. Example: "Learn alongside innovators, thought-by-thought"
  • 37. Example: "How great companies make change happen"
  • 38. Example: "Why self assessments improve learning"
  • 39. Key insight for AI in Media:
        ▪ any content which can be represented as text can be parsed by NLP, then manipulated by available AI tooling
        ▪ labeled images get really interesting
        ▪ text or images within a context have inherent structure
        ▪ representation of that kind of structure is rare in the Media vertical, so far
  • 41. Ontology
        ▪ provides context which Deep Learning lacks
        ▪ aka, “knowledge graph”: a computable thesaurus
        ▪ maps the semantics of business relationships
        ▪ S/V/O: “nouns”, some “verbs”, a few “adjectives”
        ▪ conversational interfaces (e.g., Google Assistant) improve UX by importing ontologies
        ▪ the hard part, a relatively expensive investment
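The S/V/O idea on the ontology slide can be sketched as a set of subject/verb/object triples plus one simple inference that follows links across the graph. The topics and the verb names (`is_a`, `broader`, `orchestrates`) are hypothetical examples, not the vocabulary of the ontology described in the talk. A production system would more likely use an RDF library than plain tuples.

```python
# Each fact is a (subject, verb, object) triple; all names are illustrative.
TRIPLES = {
    ("Docker", "is_a", "Container"),
    ("Container", "broader", "Virtualization"),
    ("Virtualization", "broader", "Infrastructure"),
    ("Kubernetes", "orchestrates", "Container"),
}

def objects(subject, verb):
    """All objects linked to a subject by a given verb."""
    return {o for s, v, o in TRIPLES if s == subject and v == verb}

def ancestors(topic):
    """Follow 'broader' links transitively to find every wider topic."""
    seen, stack = set(), [topic]
    while stack:
        for parent in objects(stack.pop(), "broader"):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

Here `ancestors("Container")` walks two hops of `broader` links, the kind of graph traversal that a flat keyword index cannot express.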
  • 42. Which parts do people or machines do best?
        team goal: maintain structural correspondence between the layers
        big win for AI: inferences across the graph
        human scale: primary structure, control points, testability
        machine generated: data products, ~80% of the graph
  • 45. Components
        ▪ rdflib + NetworkX: ontology graph represented as N3 “turtle”
        ▪ PyTextRank: NLP parsing, feature vectors, summarization
        ▪ Jupyter + nbtransom: human-in-the-loop ML pipelines
        ▪ Apache Spark: sort, partitioning, task management
        ▪ scikit-learn: machine learning models
        ▪ gensim: vector embedding / deep learning
        ▪ datasketch: approximation algorithms
        ▪ Flask, React, Node.js: microservices, UI web components
        ▪ Redis: in-memory indexing, full-text search
  • 46. PyTextRank
        TextRank (R Mihalcea, P Tarau, 2004): a graph algorithm that extracts key phrases and summarizes texts, an improvement for NLP over the use of keywords, n-grams, etc.
        ▪ construct a graph from a paragraph of text
        ▪ run PageRank on that graph
        ▪ extract the highly ranked phrases
        Python implementation atop spaCy, NetworkX, datasketch:
        ▪ https://pypi.python.org/pypi/pytextrank/
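The three steps listed on the PyTextRank slide can be sketched in plain Python. This is a simplified illustration and not the PyTextRank API. It assumes a small hand-written stopword list instead of part-of-speech tagging and lemmatization, links words that co-occur within a short window, and ranks single words rather than merged phrases. All function names and parameters here are hypothetical.

```python
import re
from collections import defaultdict

STOPWORDS = {"a", "an", "the", "of", "and", "or", "to", "in", "it", "is", "are",
             "as", "for", "which", "that", "they", "these", "at", "when"}

def build_graph(text, window=3):
    """Step 1: link each word to the words that follow it within a small window."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    graph = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1:i + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph

def pagerank(graph, damping=0.85, iterations=50):
    """Step 2: plain power-iteration PageRank over the undirected word graph."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        rank = {
            node: (1 - damping) / n
            + damping * sum(rank[nb] / len(graph[nb]) for nb in graph[node])
            for node in graph
        }
    return rank

def top_terms(text, k=5):
    """Step 3: return the k highest-ranked words."""
    rank = pagerank(build_graph(text))
    return sorted(rank, key=rank.get, reverse=True)[:k]
```

Words that co-occur with many other words collect rank from their neighbors, which is why central terms rise to the top without any labeled training data.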
  • 48. Working with text and NLP
        ▪ parsing
        ▪ named entity recognition
        ▪ vector embedding
        ▪ smarter indexing
        ▪ summarization (especially video)
        ▪ semantic similarity to suggest curriculum
        ▪ speed development of assessments
        ▪ query expansion
        ▪ amending ontology
  • 49. A plug for InnerSource…
        We thought the introduction of data science had run headlong into enterprise silos and lingering tech debt. As if!!
        Introduction of AI exacerbates that problem even more. Suggested responses:
        ▪ InnerSourceCommons.org open source practices within enterprise
        ▪ design patterns for working across silos
        ▪ think: “good house rules for guests” as other teams submit PRs on your code repos
  • 51. A generational shift?
        ▪ We’re 12 years beyond the introduction of YouTube … anyone raising tweens now probably knows about YouTubers
        ▪ Below a certain age demographic, people tend to rely more on video and audio sources for information, while perhaps print is gaining more for entertainment. Mobile certainly has huge impact there.
  • 52. Video transcription
{"graf": [[21, "let", "let", "VB", 1, 48], [0, "'s", "'s", "PRP", 0, 49], "take", "take", "VB", 1, 50], [0, "a", "a", "DT", 0, 51], [23, "look", "l "NN", 1, 52], [0, "at", "at", "IN", 0, 53], [0, "a", "a", "DT", 0, 54], [ "few", "few", "JJ", 1, 55], [25, "examples", "example", "NNS", 1, 56], [0 "often", "often", "RB", 0, 57], [0, "when", "when", "WRB", 0, 58], [11, "people", "people", "NNS", 1, 59], [2, "are", "be", "VBP", 1, 60], [26, " "first", "JJ", 1, 61], [27, "learning", "learn", "VBG", 1, 62], [0, "abou "about", "IN", 0, 63], [28, "Docker", "docker", "NNP", 1, 64], [0, "they" "they", "PRP", 0, 65], [29, "try", "try", "VBP", 1, 66], [0, "and", "and" 0, 67], [30, "put", "put", "VBP", 1, 68], [0, "it", "it", "PRP", 0, 69], "in", "in", "IN", 0, 70], [0, "one", "one", "CD", 0, 71], [0, "of", "of", 0, 72], [0, "a", "a", "DT", 0, 73], [24, "few", "few", "JJ", 1, 74], [31, "existing", "existing", "JJ", 1, 75], [18, "categories", "category", "NNS 76], [0, "sometimes", "sometimes", "RB", 0, 77], [11, "people", "people", 1, 78], [9, "think", "think", "VBP", 1, 79], [0, "it", "it", "PRP", 0, 80 "'s", "be", "VBZ", 1, 81], [0, "a", "a", "DT", 0, 82], [32, "virtualizati "virtualization", "NN", 1, 83], [19, "tool", "tool", "NN", 1, 84], [0, "l "like", "IN", 0, 85], [33, "VMware", "vmware", "NNP", 1, 86], [0, "or", " "CC", 0, 87], [34, "virtualbox", "virtualbox", "NNP", 1, 88], [0, "also", "also", "RB", 0, 89], [35, "known", "know", "VBN", 1, 90], [0, "as", "as" 0, 91], [0, "a", "a", "DT", 0, 92], [36, "hypervisor", "hypervisor", "NN" 93], [0, "these", "these", "DT", 0, 94], [2, "are", "be", "VBP", 1, 95], "tools", "tool", "NNS", 1, 96], [0, "which", "which", "WDT", 0, 97], [2, "be", "VBP", 1, 98], [37, "emulating", "emulate", "VBG", 1, 99], [38, "hardware", "hardware", "NN", 1, 100], [0, "for", "for", "IN", 0, 101], [ "virtual", "virtual", "JJ", 1, 102], [40, "software", "software", "NN", 1 103]], "id": "001.video197359", "sha1": "4b69cf60f0497887e3776619b922514f2e5b70a8"}
{"count": 2, "ids": [32, 19], "pos": "np", "rank": 0.0194, "text": "virtualization tool"}
{"count": 2, "ids": [40, 69], "pos": "np", "rank": 0.0117, "text": "software applications"}
{"count": 4, "ids": [38], "pos": "np", "rank": 0.0114, "text": "hardware"}
{"count": 2, "ids": [33, 36], "pos": "np", "rank": 0.0099, "text": "vmware hypervisor"}
{"count": 4, "ids": [28], "pos": "np", "rank": 0.0096, "text": "docker"}
{"count": 4, "ids": [34], "pos": "np", "rank": 0.0094, "text": "virtualbox"}
{"count": 10, "ids": [11], "pos": "np", "rank": 0.0049, "text": "people"}
{"count": 4, "ids": [37], "pos": "vbg", "rank": 0.0026, "text": "emulating"}
{"count": 2, "ids": [27], "pos": "vbg", "rank": 0.0016, "text": "learning"}
Transcript: let's take a look at a few examples often when people are first learning about Docker they try and put it in one of a few existing categories sometimes people think it's a virtualization tool like VMware or virtualbox also known as a hypervisor these are tools which are emulating hardware for virtual software
Confidence: 0.973419129848
39 KUBERNETES 0.8747 coreos 0.8624 etcd 0.8478 DOCKER CONTAINERS 0.8458 mesos 0.8406 DOCKER 0.8354 DOCKER CONTAINER 0.8260 KUBERNETES CLUSTER 0.8258 docker image 0.8252 EC2 0.8210 docker hub 0.8138 OPENSTACK
orm:Docker
  a orm:Vendor;
  a orm:Container;
  a orm:Open_Source;
  a orm:Commercial_Software;
  owl:sameAs dbr:Docker_%28software%29;
  skos:prefLabel "Docker"@en;
  • 53. Fave mediated learning experience: audio+
  • 54. How  will  a  next  generation  learn?
  • 55. Humans in the loop
  • 56. Active learning
▪ special case of semi-supervised machine learning
▪ send difficult calls / edge cases to experts; let algorithms handle routine decisions
▪ works well in use cases which have lots of inexpensive, unlabeled data
▪ e.g., abundance of content to be classified, where the cost of labeling is the expense
▪ https://en.wikipedia.org/wiki/Active_learning_(machine_learning)
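A minimal sketch of that idea, assuming each unlabeled item already comes with class-probability scores from some model: rank the pool by the model's top score and send only the least confident items to experts, within a labeling budget. The function name, the `budget` parameter, and the sample pool are hypothetical illustrations, not taken from the talk.

```python
def select_for_labeling(pool, budget):
    """Pick the `budget` items whose top predicted probability is lowest.

    pool: list of (item, probabilities) pairs, where probabilities
          maps each candidate label to a score.
    """
    # Least-confidence sampling: the lower the best score, the harder the call.
    by_confidence = sorted(pool, key=lambda pair: max(pair[1].values()))
    return [item for item, _ in by_confidence[:budget]]


# Hypothetical pool of unlabeled content with model scores.
pool = [
    ("intro to docker", {"containers": 0.90, "virtualization": 0.10}),
    ("vmware vs docker", {"containers": 0.55, "virtualization": 0.45}),
    ("hypervisor basics", {"containers": 0.30, "virtualization": 0.70}),
]

# Only two expert labels can be afforded, so ask about the two hardest items.
print(select_for_labeling(pool, budget=2))
```

Everything not selected stays with the algorithm, which keeps the expensive part (expert labeling) small.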
  • 57.
  • 58. Active learning
Data preparation in the age of deep learning
oreilly.com/ideas/data-preparation-in-the-age-of-deep-learning
Lukas Biewald, CrowdFlower
O’Reilly Data Show, 2017-05-04
send human workers cases where machine learning algorithms signal uncertainty (low probability scores) or when your ensemble of machine learning algorithms signals disagreement
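The routing rule described here could look roughly like the sketch below: an item goes to a human worker when a model's best probability score is low, or when the models in an ensemble do not agree. The function name and both thresholds (0.8 and 0.75) are assumptions chosen for illustration; real values would need tuning per use case.

```python
from collections import Counter


def needs_human(probabilities, votes, min_confidence=0.8, min_agreement=0.75):
    """Decide whether one item should be sent to a human worker.

    probabilities: label -> score from a single model
    votes: non-empty list of labels, one per model in the ensemble
    """
    # Signal 1: uncertainty, i.e. a low top probability score.
    if max(probabilities.values()) < min_confidence:
        return True
    # Signal 2: disagreement, i.e. too few models back the most common label.
    top_votes = Counter(votes).most_common(1)[0][1]
    return top_votes / len(votes) < min_agreement


print(needs_human({"spam": 0.55, "ham": 0.45}, ["spam", "spam", "spam"]))
print(needs_human({"spam": 0.95, "ham": 0.05}, ["spam", "ham", "spam", "ham"]))
print(needs_human({"spam": 0.95, "ham": 0.05}, ["spam", "spam", "spam", "spam"]))
```

The first call is flagged for low confidence, the second for ensemble disagreement, and the third is left to the algorithms.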
  • 59. Human-in-the-loop design pattern
Building a business that combines human experts and data science
oreilly.com/ideas/building-a-business-that-combines-human-experts-and-data-science-2
Eric Colson, StitchFix
O’Reilly Data Show, 2016-01-28
“what machines can’t do are things around cognition, things that have to do with ambient information, or appreciation of aesthetics, or even the ability to relate to another human”
  • 60. Weak supervision
Creating large training data sets quickly
oreilly.com/ideas/creating-large-training-data-sets-quickly
Alex Ratner, Stanford
O’Reilly Data Show, 2017-06-08
Snorkel: “data programming” as another instance of human-in-the-loop
github.com/HazyResearch/snorkel
conferences.oreilly.com/strata/strata-ny/public/schedule/detail/61849
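To give a feel for “data programming”, here is a rough sketch in plain Python: experts write small heuristic labeling functions, and their votes are combined into noisy training labels. This is not Snorkel's API. Snorkel models the accuracy of the labeling functions, whereas this sketch swaps in a simple majority vote for brevity. All function names, labels, and keywords are hypothetical.

```python
from collections import Counter

ABSTAIN = None  # a labeling function may decline to vote


# Hypothetical labeling functions, each encoding one expert heuristic.
def lf_mentions_docker(text):
    return "containers" if "docker" in text.lower() else ABSTAIN


def lf_mentions_kubernetes(text):
    return "containers" if "kubernetes" in text.lower() else ABSTAIN


def lf_mentions_hypervisor(text):
    return "virtualization" if "hypervisor" in text.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_mentions_docker, lf_mentions_kubernetes, lf_mentions_hypervisor]


def weak_label(text):
    """Combine labeling-function votes by majority; abstain on no votes or a tie."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    (top, top_n), *rest = Counter(votes).most_common()
    if rest and rest[0][1] == top_n:
        return ABSTAIN  # conflicting heuristics: leave the item unlabeled
    return top


print(weak_label("Running Docker on a Kubernetes cluster"))
print(weak_label("Docker is not a hypervisor"))
```

The human stays in the loop by writing and revising heuristics rather than labeling items one at a time.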
  • 61. Collaboration through Jupyter
Notebooks get used to manage ML pipelines, where machines + people collaborate on docs
▪ “Human-in-the-loop design pattern” talk @ JupyterCon NY 2017
▪ experts adjust parameters in ML pipelines
▪ machines write structured “logs” of ML modeling and evaluation
▪ experts run `jupyter notebook` via SSH tunnel for remote monitoring and updates
▪ https://pypi.python.org/pypi/nbtransom
  • 63. Collaboration through Jupyter
▪ running notebooks via SSH tunnel removes the need for dedicated UIs
▪ this work anticipates upcoming collaborative document features in JupyterLab:
Realtime collaboration for JupyterLab using Google Drive
Ian Rose, UC Berkeley
  • 64. Expert review
▪ ML pipelines report results: recognizing content, adding annotations, requesting more examples when “confused”
▪ Human-in-the-loop experts (potentially Customer Service) review decisions, especially edge cases, then train through examples
▪ The system iterates
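The cycle on this slide can be sketched as a small loop, assuming a toy keyword-count model in place of a real ML pipeline. The model reports a label and a confidence, asks an expert whenever it is “confused”, and trains on the expert's answer before moving on. The class, the function names, and the 0.7 threshold are hypothetical.

```python
from collections import Counter, defaultdict


class KeywordClassifier:
    """Toy model: counts how often each word has appeared under each label."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)

    def train(self, text, label):
        for word in text.lower().split():
            self.word_counts[word][label] += 1

    def predict(self, text):
        """Return (label, confidence); label is None when there is no evidence."""
        scores = Counter()
        for word in text.lower().split():
            scores.update(self.word_counts.get(word, {}))
        if not scores:
            return None, 0.0
        label, hits = scores.most_common(1)[0]
        return label, hits / sum(scores.values())


def review_loop(model, items, ask_expert, min_confidence=0.7):
    """Label items; route "confused" cases to the expert and learn from the answer."""
    results = []
    for text in items:
        label, confidence = model.predict(text)
        if label is None or confidence < min_confidence:
            label = ask_expert(text)   # expert reviews the edge case
            model.train(text, label)   # train through examples
            source = "expert"
        else:
            source = "model"
        results.append((text, label, source))
    return results


# A dictionary lookup stands in for the human expert in this sketch.
answers = {
    "docker image build": "containers",
    "vmware hypervisor setup": "virtualization",
}
results = review_loop(
    KeywordClassifier(),
    ["docker image build", "vmware hypervisor setup", "docker image"],
    answers.get,
)
for row in results:
    print(row)
```

The first two items go to the expert because the model has seen nothing yet; the third is handled by the model using what it just learned.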
  • 65. What’s the point of using AI in Media?
▪ more work, quicker, than could be performed by editors, who are already super-busy people
▪ exceeding human parity, as a benchmark
▪ helps relieve pressure on organizations, as learning curves accelerate
▪ augments some of our most valuable experts, so they can get more done
  • 66. Human-in-the-loop as a management strategy
personal op-ed: the “game” isn’t to replace people; instead it’s about leveraging AI to augment staff, so organizations can retain people with valuable domain expertise, making their contributions and expertise even more vital
  • 67. Why we’ll never run out of jobs
  • 68. Strata Data
NY, Sep 25-28
SG, Dec 4-7
SJ, Mar 5-8, 2018
UK, May 21-24, 2018
The AI Conf
SF, Sep 17-20
NY, Apr 29-May 2, 2018
JupyterCon
NY, Aug 22-25
OSCON (returns!)
PDX, Jul 16-19, 2018
  • 69. contact the speaker for conf discount coupons
  • 70. Learn Alongside Innovators
Just Enough Math
Building Data Science Teams
Hylbert-Speys
How Do You Learn?
updates, reviews, conference summaries…
liber118.com/pxn/
@pacoid