Human-in-a-loop: design pattern for managing teams that leverage ML
Paco Nathan @pacoid
Director, Learning Group @ O’Reilly Media
Big Data Spain, Madrid 2017-11-16
Framing
Imagine having a mostly-automated system where people and machines collaborate together…
May sound a bit Sci-Fi, though arguably commonplace.
One challenge is whether we can advance beyond just handling rote tasks.
Instead of simply running code libraries, can machines make difficult decisions, exercise judgement in complex situations?
Can we build systems in which people who aren’t AI experts can “teach” machines to perform complex work, based on examples, not code?
Research questions
▪ How do we personalize learning experiences, across ebooks, videos, conferences, computable content, live online courses, case studies, expert AMAs, etc.
▪ How do we help experts (by definition, really busy people) share their knowledge with peers in industry?
▪ How do we manage the role of editors at human scale, while technology and delivery media evolve rapidly?
▪ How do we help organizations learn and transform continuously?
UX for content discovery:
▪ partly generated + curated by people
▪ partly generated + curated by AI apps
AI in Media
▪ content which can be represented as text can be parsed by NLP, then manipulated by available AI tooling
▪ labeled images get really interesting
▪ assumption: text or images, within a context, have inherent structure
▪ representation of that kind of structure is rare in the Media vertical, so far
{"graf": [[21, "let", "let", "VB", 1, 48], [0, "'s", "'s", "PRP", 0, 49],
"take", "take", "VB", 1, 50], [0, "a", "a", "DT", 0, 51], [23, "look", "l
"NN", 1, 52], [0, "at", "at", "IN", 0, 53], [0, "a", "a", "DT", 0, 54], [
"few", "few", "JJ", 1, 55], [25, "examples", "example", "NNS", 1, 56], [0
"often", "often", "RB", 0, 57], [0, "when", "when", "WRB", 0, 58], [11,
"people", "people", "NNS", 1, 59], [2, "are", "be", "VBP", 1, 60], [26, "
"first", "JJ", 1, 61], [27, "learning", "learn", "VBG", 1, 62], [0, "abou
"about", "IN", 0, 63], [28, "Docker", "docker", "NNP", 1, 64], [0, "they"
"they", "PRP", 0, 65], [29, "try", "try", "VBP", 1, 66], [0, "and", "and"
0, 67], [30, "put", "put", "VBP", 1, 68], [0, "it", "it", "PRP", 0, 69],
"in", "in", "IN", 0, 70], [0, "one", "one", "CD", 0, 71], [0, "of", "of",
0, 72], [0, "a", "a", "DT", 0, 73], [24, "few", "few", "JJ", 1, 74], [31,
"existing", "existing", "JJ", 1, 75], [18, "categories", "category", "NNS
76], [0, "sometimes", "sometimes", "RB", 0, 77], [11, "people", "people",
1, 78], [9, "think", "think", "VBP", 1, 79], [0, "it", "it", "PRP", 0, 80
"'s", "be", "VBZ", 1, 81], [0, "a", "a", "DT", 0, 82], [32, "virtualizati
"virtualization", "NN", 1, 83], [19, "tool", "tool", "NN", 1, 84], [0, "l
"like", "IN", 0, 85], [33, "VMware", "vmware", "NNP", 1, 86], [0, "or", "
"CC", 0, 87], [34, "virtualbox", "virtualbox", "NNP", 1, 88], [0, "also",
"also", "RB", 0, 89], [35, "known", "know", "VBN", 1, 90], [0, "as", "as"
0, 91], [0, "a", "a", "DT", 0, 92], [36, "hypervisor", "hypervisor", "NN"
93], [0, "these", "these", "DT", 0, 94], [2, "are", "be", "VBP", 1, 95],
"tools", "tool", "NNS", 1, 96], [0, "which", "which", "WDT", 0, 97], [2,
"be", "VBP", 1, 98], [37, "emulating", "emulate", "VBG", 1, 99], [38,
"hardware", "hardware", "NN", 1, 100], [0, "for", "for", "IN", 0, 101], [
"virtual", "virtual", "JJ", 1, 102], [40, "software", "software", "NN", 1
103]], "id": "001.video197359", "sha1":
"4b69cf60f0497887e3776619b922514f2e5b70a8"}
AI in Media
{"count": 2, "ids": [32, 19], "pos": "np", "rank": 0.0194, "text": "virtualization tool"}
{"count": 2, "ids": [40, 69], "pos": "np", "rank": 0.0117, "text": "software applications"}
{"count": 4, "ids": [38], "pos": "np", "rank": 0.0114, "text": "hardware"}
{"count": 2, "ids": [33, 36], "pos": "np", "rank": 0.0099, "text": "vmware hypervisor"}
{"count": 4, "ids": [28], "pos": "np", "rank": 0.0096, "text": "docker"}
{"count": 4, "ids": [34], "pos": "np", "rank": 0.0094, "text": "virtualbox"}
{"count": 10, "ids": [11], "pos": "np", "rank": 0.0049, "text": "people"}
{"count": 4, "ids": [37], "pos": "vbg", "rank": 0.0026, "text": "emulating"}
{"count": 2, "ids": [27], "pos": "vbg", "rank": 0.0016, "text": "learning"}
Transcript: let's take a look at a few examples often when
people are first learning about Docker they try and put it in
one of a few existing categories sometimes people think it's
a virtualization tool like VMware or virtualbox also known as
a hypervisor these are tools which are emulating hardware for
virtual software
Confidence: 0.973419129848
39 KUBERNETES
0.8747 coreos
0.8624 etcd
0.8478 DOCKER CONTAINERS
0.8458 mesos
0.8406 DOCKER
0.8354 DOCKER CONTAINER
0.8260 KUBERNETES CLUSTER
0.8258 docker image
0.8252 EC2
0.8210 docker hub
0.8138 OPENSTACK
orm:Docker a orm:Vendor;
a orm:Container;
a orm:Open_Source;
a orm:Commercial_Software;
owl:sameAs dbr:Docker_%28software%29;
skos:prefLabel "Docker"@en;
Knowledge Graph
▪ used to construct an ontology about technology, based on learning materials from 200+ publishers
▪ uses SKOS as a foundation, ties into US Library of Congress and DBpedia as upper ontologies
▪ primary structure is “human scale”, used as control points
▪ majority (>90%) of the graph comes from machine generated data products
AI is real, but why now?
▪ Big Data: machine data (1997-ish)
▪ Big Compute: cloud computing (2006-ish)
▪ Big Models: deep learning (2009-ish)
The confluence of three factors created a business environment where AI could become mainstream
What else is needed?
Background: helping machines learn
Machine learning
supervised ML:
▪ take a dataset where each element has a label
▪ train models on a portion of the data to predict the labels, then evaluate on the holdout
▪ deep learning is a popular example, but only if you have lots of labeled training data available
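A minimal sketch of the train-then-evaluate-on-holdout workflow listed above, in plain Python. The toy data, the nearest-centroid classifier, and the helper names (`split_holdout`, `train_centroids`, `predict`, `accuracy`) are illustrative assumptions, not something shown in the talk.

```python
import random

def split_holdout(rows, holdout_fraction=0.25, seed=0):
    """Shuffle labeled rows and split them into (train, holdout)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]

def train_centroids(train):
    """Learn one mean feature vector (centroid) per label."""
    sums, counts = {}, {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Predict the label whose centroid is nearest (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(features, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

def accuracy(centroids, rows):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(1 for f, y in rows if predict(centroids, f) == y)
    return hits / len(rows)

# Hypothetical toy dataset: two well-separated clusters, each element labeled.
rng = random.Random(42)
data = [([rng.gauss(0, 0.5), rng.gauss(0, 0.5)], "a") for _ in range(40)]
data += [([rng.gauss(5, 0.5), rng.gauss(5, 0.5)], "b") for _ in range(40)]

train, holdout = split_holdout(data)          # train on a portion...
model = train_centroids(train)
holdout_accuracy = accuracy(model, holdout)   # ...then evaluate on the holdout
print(f"holdout accuracy: {holdout_accuracy:.2f}")
```

The point of the split is that the accuracy is measured only on rows the model never saw during training.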
Machine learning
unsupervised ML:
▪ run lots of unlabeled data through an algorithm to detect “structure” or embedding
▪ for example, clustering algorithms such as K-means
▪ unsupervised approaches for AI are an open research question
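As a rough illustration of the K-means bullet above, here is a small sketch of Lloyd's algorithm on 2-D points in plain Python. The naive deterministic initialization and the sample points are assumptions made for the example; production code would normally use a library implementation with better seeding.

```python
def kmeans(points, k, iterations=20):
    """Lloyd's algorithm on 2-D points; returns (centroids, assignments)."""
    # Naive deterministic init (an assumption): pick evenly spaced input points.
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, assignments

# Unlabeled toy data with two visible groups.
points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.2), (7.9, 8.1), (8.3, 7.7)]
centroids, assignments = kmeans(points, k=2)
print(assignments)
```

No labels are supplied anywhere; the grouping emerges from distances alone, which is the "structure" the slide refers to.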
Active learning
special case of semi-supervised ML:
▪ send difficult decisions/edge cases to experts; let algorithms handle routine decisions (automation)
▪ works well in use cases which have lots of inexpensive, unlabeled data
▪ e.g., abundance of content to be classified, where the cost of labeling is the expense
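One way to picture the routing described above is a confidence threshold plus a fixed expert budget. The sketch below assumes a model that already emits per-label probabilities; the `route` function, the threshold, the budget, and the document ids are all hypothetical.

```python
def route(predictions, threshold=0.8, expert_budget=2):
    """Split items into automated decisions and a queue for experts.

    predictions: dict mapping item id -> dict of label -> probability.
    Items at or above the threshold are decided automatically; the least
    confident of the rest go to experts, up to the budget.
    """
    automated, uncertain = {}, []
    for item, probs in predictions.items():
        label, confidence = max(probs.items(), key=lambda kv: kv[1])
        if confidence >= threshold:
            automated[item] = label
        else:
            uncertain.append((confidence, item))
    uncertain.sort()  # least confident first
    expert_queue = [item for _, item in uncertain[:expert_budget]]
    return automated, expert_queue

# Hypothetical model output for five pieces of content.
predictions = {
    "doc1": {"apple_ios": 0.97, "cisco_ios": 0.03},
    "doc2": {"apple_ios": 0.55, "cisco_ios": 0.45},
    "doc3": {"apple_ios": 0.10, "cisco_ios": 0.90},
    "doc4": {"apple_ios": 0.62, "cisco_ios": 0.38},
    "doc5": {"apple_ios": 0.51, "cisco_ios": 0.49},
}
automated, expert_queue = route(predictions)
print(automated)
print(expert_queue)
```

Items that are uncertain but fall outside the budget simply stay unlabeled until a later round, which keeps the cost of labeling bounded.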
The reality of data rates
“If you only have 10 examples of something, it’s going to be hard to make deep learning work. If you have 100,000 things you care about, records or whatever, that’s the kind of scale where you should really start thinking about these kinds of techniques.”
Jeff Dean, Google
VB Summit 2017-10-23
venturebeat.com/2017/10/23/google-brain-chief-says-100000-examples-is-enough-data-for-deep-learning/
The reality of data rates
Use cases for deep learning must have large, carefully labeled data sets, while reinforcement learning needs much more data than that.
Active learning can yield good results with substantially smaller data rates, while leveraging an organization’s expertise to bootstrap toward larger labeled data sets, e.g., as preparation for deep learning, etc.
[Figure: reinforcement learning, supervised learning, active learning, and deep learning compared by data rates (log scale)]
Case studies: practices in industry
On-demand humans
Active learning
Real-World Active Learning: Applications and Strategies for Human-in-the-Loop Machine Learning
radar.oreilly.com/2015/02/human-in-the-loop-machine-learning.html
Ted Cuzzillo
O’Reilly Media, 2015-02-05
Develop a policy for how human experts select exemplars:
▪ bias toward labels most likely to influence the classifier
▪ bias toward ensemble disagreement
▪ bias toward denser regions of training data
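The "bias toward ensemble disagreement" policy above can be sketched with vote entropy, a common query-by-committee measure. This is an illustrative assumption about how disagreement might be scored, not the method from the cited report; the item ids and votes are made up, and the other two policies are not covered.

```python
import math
from collections import Counter

def vote_entropy(votes):
    """Entropy (in bits) of the labels an ensemble voted for one item."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def rank_by_disagreement(committee_votes):
    """Order item ids so the most contested items reach experts first."""
    return sorted(committee_votes,
                  key=lambda item: vote_entropy(committee_votes[item]),
                  reverse=True)

# Hypothetical votes from a three-model ensemble.
committee_votes = {
    "a": ["x", "x", "x"],   # full agreement
    "b": ["x", "y", "x"],   # one dissenter
    "c": ["x", "y", "z"],   # every model disagrees
}
ranking = rank_by_disagreement(committee_votes)
print(ranking)
```

Unanimous items score zero and fall to the bottom of the queue, so expert time goes to the examples the models cannot settle among themselves.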
Active learning
Active learning and transfer learning
safaribooksonline.com/library/view/oreilly-artificial-intelligence/9781491985250/video314919.html
Luke Biewald, CrowdFlower
The AI Conf, 2017-09-17
breakthroughs lag algorithm invention, waiting for “killer data set” to emerge, often decade+
Design pattern: Human-in-the-loop
Building a business that combines human experts and data science
oreilly.com/ideas/building-a-business-that-combines-human-experts-and-data-science-2
Eric Colson, StitchFix
O’Reilly Data Show, 2016-01-28
“what machines can’t do are things around cognition, things that have to do with ambient information, or appreciation of aesthetics, or even the ability to relate to another human”
Design pattern: Human-in-the-loop
Strategies for integrating people and machine learning in online systems
safaribooksonline.com/library/view/oreilly-artificial-intelligence/9781491976289/video311857.html
Jason Laska, Clara Labs
The AI Conf, 2017-06-29
how to create a two-sided marketplace where machines and people compete on a spectrum of relative expertise and capabilities
Design pattern: Human-in-the-loop
Building human-assisted AI applications
oreilly.com/ideas/building-human-assisted-ai-applications
Adam Marcus, B12
O’Reilly Data Show, 2016-08-25
Orchestra: a platform for building human-assisted AI applications, e.g., to create business websites
https://github.com/b12io/orchestra
example http://www.coloradopicked.com/
Design pattern: Flash teams
Expert Crowdsourcing with Flash Teams
hci.stanford.edu/publications/2014/flashteams/flashteams-uist2014.pdf
Daniela Retelny, et al.
Stanford HCI
“A flash team is a linked set of modular tasks that draw upon paid experts from the crowd, often three to six at a time, on demand”
http://stanfordhci.github.io/flash-teams/
Weak supervision / Data programming
Creating large training data sets quickly
oreilly.com/ideas/creating-large-training-data-sets-quickly
Alex Ratner, Stanford
O’Reilly Data Show, 2017-06-08
Snorkel: “weak supervision” and “data programming” as another instance of human-in-the-loop
github.com/HazyResearch/snorkel
conferences.oreilly.com/strata/strata-ny/public/schedule/detail/61849
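To give a feel for data programming, the sketch below has experts write small heuristic "labeling functions" whose votes are combined into noisy training labels. It deliberately does not use the Snorkel API: the function names are hypothetical, and a plain majority vote stands in for the statistical model a real weak-supervision system would use to weigh the heuristics.

```python
from collections import Counter

ABSTAIN = None  # a labeling function may decline to vote

# Hypothetical heuristics an expert might write instead of hand-labeling.
def lf_mentions_iphone(text):
    return "apple_ios" if "iphone" in text.lower() else ABSTAIN

def lf_mentions_router(text):
    return "cisco_ios" if "router" in text.lower() else ABSTAIN

def lf_mentions_swift(text):
    return "apple_ios" if "swift" in text.lower() else ABSTAIN

LABELING_FUNCTIONS = [lf_mentions_iphone, lf_mentions_router, lf_mentions_swift]

def weak_label(text):
    """Combine labeling-function votes by simple majority (an assumption)."""
    votes = [v for v in (lf(text) for lf in LABELING_FUNCTIONS) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting heuristics: leave for a human
    return ranked[0][0]

docs = [
    "Building IOS apps for the iPhone with Swift",
    "Configuring IOS on a Cisco router",
    "IOS release notes",
    "Managing a router from an iPhone",
]
labels = [weak_label(d) for d in docs]
print(labels)
```

The human contribution here is the heuristics rather than individual labels, which is why the slide treats it as another instance of human-in-the-loop.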
Prodigy by Explosion.ai
https://explosion.ai/blog/prodigy-annotation-tool-active-learning
Problem: disambiguating contexts
Disambiguating contexts
Overlapping contexts pose hard problems in natural language understanding.
That runs counter to the correlation emphasis of big data.
NLP libraries lack features for disambiguation.
Disambiguating contexts
Suppose someone publishes a book which uses the term `IOS`: are they talking about an operating system for an Apple iPhone, or about an operating system for a Cisco router?
We handle lots of content about both. Disambiguating those contexts is important for good UX in personalized learning.
In other words, how do machines help people distinguish that content within search?
Potentially a good case for deep learning, except for the lack of labeled data at scale.
Active learning through Jupyter
Jupyter notebooks are used to manage ML pipelines for disambiguation, where machines and people collaborate:
▪ ML based on examples: almost all of the feature engineering, model parameters, etc., has been automated
▪ https://github.com/ceteri/nbtransom
▪ based on use of nbformat, pandas, scikit-learn
Active learning through Jupyter
Jupyter notebook as…
▪ one part configuration file
▪ one part data sample
▪ one part structured log
▪ one part data visualization tool
plus, subsequent data mining of these notebooks helps augment our ontology
Active learning through Jupyter
[Figure: architecture diagram with components labeled ML Pipelines, Jupyter kernel, Browser, SSH tunnel]
Active learning through Jupyter
▪ Notebooks allow the human experts to access the internals of a mostly automated ML pipeline, rapidly
▪ Stated another way, both the machines and the people become collaborators on shared documents
▪ Anticipates upcoming collaborative document features in JupyterLab
Active learning through Jupyter
1. Experts use notebooks to provide examples of book chapters, video segments, etc., for each key phrase that has overlapping contexts
2. Machines build ensemble ML models based on those examples, updating notebooks with model evaluation
3. Machines attempt to annotate labels for millions of pieces of content, e.g., `AlphaGo`, `Golang`, versus a mundane use of the verb `go`
4. Disambiguation can run mostly automated, in parallel at scale, through integration with Apache Spark
5. In cases where ensembles disagree, ML pipelines defer to human experts who make judgement calls, providing further examples
6. New examples go into training ML pipelines to build better models
7. Rinse, lather, repeat
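The seven steps above can be sketched as a small loop. Everything here is a simplified stand-in: the two ensemble members (`word_vote`, `nearest_neighbor`), the bag-of-words content, and the `ask_expert` callback are hypothetical, and the notebook and Apache Spark integration from the talk are not modeled.

```python
from collections import Counter

def bag(text):
    """Represent a piece of content as a set of context words."""
    return set(text.split())

def word_vote(examples, words):
    """Member A: label whose examples share the most words; None on a tie."""
    scores = Counter()
    for ex_words, label in examples:
        scores[label] += len(words & ex_words)
    ranked = scores.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0] if ranked and ranked[0][1] > 0 else None

def nearest_neighbor(examples, words):
    """Member B: label of the most similar example (Jaccard); None if unclear."""
    best = {}
    for ex_words, label in examples:
        sim = len(words & ex_words) / len(words | ex_words)
        best[label] = max(best.get(label, 0.0), sim)
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] == 0.0:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]

def active_learning_loop(examples, unlabeled, ask_expert, budget_per_round=1, max_rounds=5):
    """Label by ensemble consensus; defer disputed items to an expert, then retrain."""
    examples = list(examples)
    pending = dict(unlabeled)
    expert_calls = 0
    machine_labels = {}
    for _ in range(max_rounds):
        machine_labels, disputed = {}, []
        for item, words in pending.items():
            votes = {word_vote(examples, words), nearest_neighbor(examples, words)}
            if len(votes) == 1 and None not in votes:
                machine_labels[item] = votes.pop()   # consensus: automate
            else:
                disputed.append(item)                # disagreement: defer
        if not disputed:
            break
        for item in disputed[:budget_per_round]:
            label = ask_expert(item)                 # expert judgement call
            examples.append((pending.pop(item), label))  # becomes a new example
            expert_calls += 1
    return machine_labels, examples, expert_calls

# Step 1: experts seed one example per context of the key phrase "go".
seed = [(bag("compile package goroutine"), "golang"),
        (bag("board game deepmind"), "alphago")]
unlabeled = {
    "u1": bag("goroutine channel compile"),
    "u2": bag("deepmind board match"),
    "u3": bag("package game engine"),
    "u4": bag("channel engine server"),
}
truth = {"u3": "golang", "u4": "golang"}  # simulated expert answers
machine_labels, examples, expert_calls = active_learning_loop(
    seed, unlabeled, ask_expert=lambda item: truth[item])
print(machine_labels, expert_calls)
```

In this toy run a single expert answer for `u3` is enough for the ensemble to label `u4` on its own in the next round, which is the bootstrapping effect the loop is meant to produce.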
Nuances
▪ No Free Lunch theorem: it is better to err on the side of fewer false positives / more false negatives in use cases about learning materials
▪ Employ a bias toward exemplars policy, i.e., those most likely to influence the classifier
▪ Potentially, “AI experts” may be Customer Service staff who review edge cases within search results or recommended content, as an integral part of our UX, then re-train the ML pipelines through examples
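Erring toward fewer false positives usually comes down to where the confidence threshold is set. A minimal sketch, assuming a validation set of (confidence, is_relevant) pairs and a hypothetical precision target; the function names and numbers are illustrative only.

```python
def precision_recall(scored, threshold):
    """scored: list of (confidence, is_relevant) pairs from a validation set."""
    tp = sum(1 for s, y in scored if s >= threshold and y)
    fp = sum(1 for s, y in scored if s >= threshold and not y)
    fn = sum(1 for s, y in scored if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def pick_threshold(scored, min_precision):
    """Lowest threshold (hence most recall) whose precision meets the target."""
    for t in sorted({s for s, _ in scored}):
        p, r = precision_recall(scored, t)
        if p >= min_precision:
            return t, p, r
    return None

# Hypothetical validation scores for a "this content is about Golang" label.
scored = [(0.95, True), (0.90, True), (0.80, False),
          (0.70, True), (0.60, False), (0.40, False)]
threshold, precision, recall = pick_threshold(scored, min_precision=0.9)
print(threshold, precision, recall)
```

Raising the threshold trades recall for precision: one relevant item is missed here (a false negative), but nothing irrelevant is shown to the learner.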
  
Management strategy: before
Generally with Big Data, we are considering:
▪ DAG workflow execution, which is linear
▪ data-driven organizations
▪ ML based on optimizing for objective functions
▪ questions of correlation versus causation
▪ avoiding “garbage in, garbage out”
[Figure: word-count workflow drawn as a DAG, with nodes labeled Document Collection, Tokenize, Scrub token, Regex token, Stop Word List, HashJoin Left (RHS), GroupBy token, Count, Word Count, plus M and R markers]
Management strategy: after
HITL introduces circularities:
▪ aka, second-order cybernetics
▪ leverage feedback loops as conversations
▪ focus on human scale, design thinking
▪ people and machines work together on teams
▪ budget experts’ time on handling the exceptions
[Figure: feedback-loop diagram involving the AI team, content, and ontology, with boxes labeled “ML models attempt to label the data automatically”, “ML models have consensus, confidence”, “labels”, “Expert judgement about edge cases, provides examples”, “ML models trained using examples”, and “Expert decisions to extend vocabulary”]
Essential takeaway idea:
Depending on the organization, key ingredients needed to enable effective AI apps may come from non-traditional “tech” sources …
In other words, based on human-in-the-loop design pattern, AI expertise may emerge from your Sales, Marketing, and Customer Service teams, which have crucial insights about your customers’ needs.
Looking ahead: some trends at work
Looking ahead 2018: hardware trends
Indications: progressively more advanced mathematics moves into hardware and low-level software, as use cases and ROI become established over time, optimizing for the speed of calculations and capacity of data storage
Contra: programming languages which use abstraction layers that obscure access to hardware features, aka Java
Looking ahead 2018: hardware trends
Realistically, current use of math in ML suffers from some “legacy software” aspects: underlying libraries generally focus on linear algebra, optimizing for 1-2 variables, etc.
Meanwhile our use cases require graphs, multivariate problems, and other compelling cases for more advanced math. We will see these eventually move into hardware and low-level libraries: tensor decomposition, homology, hypervolume optimization, etc.
Looking ahead 2018: software trends
Indications: cognitive subsystems progressively becoming automated, e.g., sensory perception, pattern recognition, decisions, gaming, mimicry, optimization, knowledge representation, language, complex movements, planning, scheduling, etc.
Contra: merely incremental changes for practices in software engineering and product management, within the context of AI apps, which have suffered from being too “linear”
Looking ahead 2018: software trends
Enormous upside from AI, across verticals; however, to be in the game, an organization must already have Big Data infrastructure and related practices in place: (1) cloud and SRE; (2) eliminating data silos; (3) cleaning data / repairing metadata; (4) embracing contemporary data science.
Those are prerequisites; there are no short cuts in AI.
Plus, there’s an ongoing talent crunch.
(consensus among major consulting firms, Strata 2017 Exec Briefings)
Looking ahead 2018: people trends
Indications: organizations embracing circularities, focused on optimizing for fitness functions (populations of priorities, longer-term ROI) in lieu of optimizing for objective functions (singular goals, linear cognition, short-term ROI)
Contra: conflict defined by “confident personalities vs. confidence intervals”, see goo.gl/GPYZ6v
Looking ahead 2018: people trends
Peter Norvig: disruptions in software process for uncertain domains: the workflow of the AI researcher has been quite different from the workflow of the software developer
goo.gl/XcDCZ2
François Chollet: “casting the end goal of intelligence as the optimization of an extrinsic, scalar reward function”
goo.gl/q7Je7D
Summary
Ahead in AI: hardware advances force abrupt changes in software practices, which have lagged due to lack of infrastructure, data quality, outdated process, etc.
HITL (active learning) as management strategy for AI addresses broad needs across industry, especially for enterprise organizations.
Big Team begins to take its place in the formula Big Data + Big Compute + Big Models.
Summary
The “game” is not to replace people; instead it is about leveraging AI to augment staff, so that organizations can retain people with valuable domain expertise, making their contributions and experience even more vital.
This is a personal opinion, which does not necessarily reflect the views of my employer.
However, the views of my employer…
Why we’ll never run out of jobs
Strata Data
SG, Dec 4-7
SJ, Mar 5-8
UK, May 21-24
CN, Jul 12-15
The AI Conf
CN, Apr 10-13
NY, Apr 29-May 2
SF, Sep 4-7
UK, Oct 8-11
JupyterCon
NY, Aug 21-24
OSCON
PDX, Jul 16-19, 2018
Get Started with NLP in Python
Just Enough Math
Building Data Science Teams
Hylbert-Speys
How Do You Learn?
updates, reviews, conference summaries…
liber118.com/pxn/
@pacoid
Human-in-a-loop: a design pattern for managing teams which leverage ML

Contenu connexe

Tendances

Intro to Data Science for Enterprise Big Data
Intro to Data Science for Enterprise Big DataIntro to Data Science for Enterprise Big Data
Intro to Data Science for Enterprise Big DataPaco Nathan
 
Mapping Life Science Informatics to the Cloud
Mapping Life Science Informatics to the CloudMapping Life Science Informatics to the Cloud
Mapping Life Science Informatics to the CloudChris Dagdigian
 
Dealing with uncertainty in fintech using AI
Dealing with uncertainty in fintech using AIDealing with uncertainty in fintech using AI
Dealing with uncertainty in fintech using AIData Products Meetup
 
Data Scientist 101 BI Dutch
Data Scientist 101 BI DutchData Scientist 101 BI Dutch
Data Scientist 101 BI DutchJos van Dongen
 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Data Science London
 
Multi-Tenant Pharma HPC Clusters
Multi-Tenant Pharma HPC ClustersMulti-Tenant Pharma HPC Clusters
Multi-Tenant Pharma HPC ClustersChris Dagdigian
 
Data By The People, For The People
Data By The People, For The PeopleData By The People, For The People
Data By The People, For The PeopleDaniel Tunkelang
 
Big Data and Data Intensive Computing: Use Cases
Big Data and Data Intensive Computing: Use CasesBig Data and Data Intensive Computing: Use Cases
Big Data and Data Intensive Computing: Use CasesJongwook Woo
 
Yahoo Microstrategy 2008
Yahoo Microstrategy 2008Yahoo Microstrategy 2008
Yahoo Microstrategy 2008Amr Awadallah
 
Big Data and Data Intensive Computing on Networks
Big Data and Data Intensive Computing on NetworksBig Data and Data Intensive Computing on Networks
Big Data and Data Intensive Computing on NetworksJongwook Woo
 
Cloudera/Stanford EE203 (Entrepreneurial Engineer)
Cloudera/Stanford EE203 (Entrepreneurial Engineer)Cloudera/Stanford EE203 (Entrepreneurial Engineer)
Cloudera/Stanford EE203 (Entrepreneurial Engineer)Amr Awadallah
 
Findability Day 2016 - Augmented intelligence
Findability Day 2016 - Augmented intelligenceFindability Day 2016 - Augmented intelligence
Findability Day 2016 - Augmented intelligenceFindwise
 
Big Data and Predictive Analysis
Big Data and Predictive AnalysisBig Data and Predictive Analysis
Big Data and Predictive AnalysisJongwook Woo
 
Did you mean crowdsourcing for recommender systems?
Did you mean crowdsourcing for recommender systems?Did you mean crowdsourcing for recommender systems?
Did you mean crowdsourcing for recommender systems?oralonso
 
Three Tools for "Human-in-the-loop" Data Science
Three Tools for "Human-in-the-loop" Data ScienceThree Tools for "Human-in-the-loop" Data Science
Three Tools for "Human-in-the-loop" Data ScienceAditya Parameswaran
 
The Importance of Open Innovation in AI era
The Importance of Open Innovation in AI eraThe Importance of Open Innovation in AI era
The Importance of Open Innovation in AI eraJongwook Woo
 
Full-Stack Data Science: How to be a One-person Data Team
Full-Stack Data Science: How to be a One-person Data TeamFull-Stack Data Science: How to be a One-person Data Team
Full-Stack Data Science: How to be a One-person Data TeamGreg Goltsov
 

Tendances (20)

Intro to Data Science for Enterprise Big Data
Intro to Data Science for Enterprise Big DataIntro to Data Science for Enterprise Big Data
Intro to Data Science for Enterprise Big Data
 
Mapping Life Science Informatics to the Cloud
Mapping Life Science Informatics to the CloudMapping Life Science Informatics to the Cloud
Mapping Life Science Informatics to the Cloud
 
Dealing with uncertainty in fintech using AI
Dealing with uncertainty in fintech using AIDealing with uncertainty in fintech using AI
Dealing with uncertainty in fintech using AI
 
Data Scientist 101 BI Dutch
Data Scientist 101 BI DutchData Scientist 101 BI Dutch
Data Scientist 101 BI Dutch
 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
 
Multi-Tenant Pharma HPC Clusters
Multi-Tenant Pharma HPC ClustersMulti-Tenant Pharma HPC Clusters
Multi-Tenant Pharma HPC Clusters
 
Data By The People, For The People
Data By The People, For The PeopleData By The People, For The People
Human-in-a-loop: a design pattern for managing teams which leverage ML

  • 1. Human-in-a-loop: design pattern for managing teams that leverage ML
    Paco Nathan @pacoid
    Director, Learning Group @ O’Reilly Media
    Big Data Spain, Madrid 2017-11-16
  • 2. Framing
    Imagine having a mostly-automated system where people and machines collaborate together…
    May sound a bit Sci-Fi, though arguably commonplace.
    One challenge is whether we can advance beyond just handling rote tasks.
    Instead of simply running code libraries, can machines make difficult decisions, exercise judgement in complex situations?
    Can we build systems in which people who aren’t AI experts can “teach” machines to perform complex work – based on examples, not code?
  • 3. Research questions
    ▪ How do we personalize learning experiences, across ebooks, videos, conferences, computable content, live online courses, case studies, expert AMAs, etc.
    ▪ How do we help experts (by definition, really busy people) share their knowledge with peers in industry?
    ▪ How do we manage the role of editors at human scale, while technology and delivery media evolve rapidly?
    ▪ How do we help organizations learn and transform continuously?
  • 4.
  • 5. UX for content discovery:
    ▪ partly generated + curated by people
    ▪ partly generated + curated by AI apps
  • 6. AI in Media
    ▪ content which can be represented as text can be parsed by NLP, then manipulated by available AI tooling
    ▪ labeled images get really interesting
    ▪ assumption: text or images – within a context – have inherent structure
    ▪ representation of that kind of structure is rare in the Media vertical – so far
  • 7. AI in Media
    {"graf": [[21, "let", "let", "VB", 1, 48], [0, "'s", "'s", "PRP", 0, 49], "take", "take", "VB", 1, 50], [0, "a", "a", "DT", 0, 51], [23, "look", "l "NN", 1, 52], [0, "at", "at", "IN", 0, 53], [0, "a", "a", "DT", 0, 54], [ "few", "few", "JJ", 1, 55], [25, "examples", "example", "NNS", 1, 56], [0 "often", "often", "RB", 0, 57], [0, "when", "when", "WRB", 0, 58], [11, "people", "people", "NNS", 1, 59], [2, "are", "be", "VBP", 1, 60], [26, " "first", "JJ", 1, 61], [27, "learning", "learn", "VBG", 1, 62], [0, "abou "about", "IN", 0, 63], [28, "Docker", "docker", "NNP", 1, 64], [0, "they" "they", "PRP", 0, 65], [29, "try", "try", "VBP", 1, 66], [0, "and", "and" 0, 67], [30, "put", "put", "VBP", 1, 68], [0, "it", "it", "PRP", 0, 69], "in", "in", "IN", 0, 70], [0, "one", "one", "CD", 0, 71], [0, "of", "of", 0, 72], [0, "a", "a", "DT", 0, 73], [24, "few", "few", "JJ", 1, 74], [31, "existing", "existing", "JJ", 1, 75], [18, "categories", "category", "NNS 76], [0, "sometimes", "sometimes", "RB", 0, 77], [11, "people", "people", 1, 78], [9, "think", "think", "VBP", 1, 79], [0, "it", "it", "PRP", 0, 80 "'s", "be", "VBZ", 1, 81], [0, "a", "a", "DT", 0, 82], [32, "virtualizati "virtualization", "NN", 1, 83], [19, "tool", "tool", "NN", 1, 84], [0, "l "like", "IN", 0, 85], [33, "VMware", "vmware", "NNP", 1, 86], [0, "or", " "CC", 0, 87], [34, "virtualbox", "virtualbox", "NNP", 1, 88], [0, "also", "also", "RB", 0, 89], [35, "known", "know", "VBN", 1, 90], [0, "as", "as" 0, 91], [0, "a", "a", "DT", 0, 92], [36, "hypervisor", "hypervisor", "NN" 93], [0, "these", "these", "DT", 0, 94], [2, "are", "be", "VBP", 1, 95], "tools", "tool", "NNS", 1, 96], [0, "which", "which", "WDT", 0, 97], [2, "be", "VBP", 1, 98], [37, "emulating", "emulate", "VBG", 1, 99], [38, "hardware", "hardware", "NN", 1, 100], [0, "for", "for", "IN", 0, 101], [ "virtual", "virtual", "JJ", 1, 102], [40, "software", "software", "NN", 1 103]], "id": "001.video197359", "sha1": "4b69cf60f0497887e3776619b922514f2e5b70a8"}

    {"count": 2, "ids": [32, 19], "pos": "np", "rank": 0.0194, "text": "virtualization tool"}
    {"count": 2, "ids": [40, 69], "pos": "np", "rank": 0.0117, "text": "software applications"}
    {"count": 4, "ids": [38], "pos": "np", "rank": 0.0114, "text": "hardware"}
    {"count": 2, "ids": [33, 36], "pos": "np", "rank": 0.0099, "text": "vmware hypervisor"}
    {"count": 4, "ids": [28], "pos": "np", "rank": 0.0096, "text": "docker"}
    {"count": 4, "ids": [34], "pos": "np", "rank": 0.0094, "text": "virtualbox"}
    {"count": 10, "ids": [11], "pos": "np", "rank": 0.0049, "text": "people"}
    {"count": 4, "ids": [37], "pos": "vbg", "rank": 0.0026, "text": "emulating"}
    {"count": 2, "ids": [27], "pos": "vbg", "rank": 0.0016, "text": "learning"}

    Transcript: let's take a look at a few examples often when people are first learning about Docker they try and put it in one of a few existing categories sometimes people think it's a virtualization tool like VMware or virtualbox also known as a hypervisor these are tools which are emulating hardware for virtual software
    Confidence: 0.973419129848

    39 KUBERNETES
    0.8747 coreos
    0.8624 etcd
    0.8478 DOCKER CONTAINERS
    0.8458 mesos
    0.8406 DOCKER
    0.8354 DOCKER CONTAINER
    0.8260 KUBERNETES CLUSTER
    0.8258 docker image
    0.8252 EC2
    0.8210 docker hub
    0.8138 OPENSTACK

    orm:Docker
      a orm:Vendor;
      a orm:Container;
      a orm:Open_Source;
      a orm:Commercial_Software;
      owl:sameAs dbr:Docker_%28software%29;
      skos:prefLabel "Docker"@en;
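The key-phrase records on this slide are plain JSON objects with a `rank` field, so they can be filtered and sorted with very little code. The following is a minimal sketch, not the pipeline used in the talk: the function name `top_phrases` and the `min_rank` cutoff are assumptions made for illustration, and only three of the slide's records are reproduced.

```python
import json

# Three of the key-phrase records shown on the slide, one JSON object per line.
records = """\
{"count": 2, "ids": [32, 19], "pos": "np", "rank": 0.0194, "text": "virtualization tool"}
{"count": 4, "ids": [28], "pos": "np", "rank": 0.0096, "text": "docker"}
{"count": 10, "ids": [11], "pos": "np", "rank": 0.0049, "text": "people"}
"""


def top_phrases(lines, min_rank):
    """Return phrase texts whose rank is at least min_rank, highest rank first.

    min_rank is a hypothetical cutoff chosen for this example.
    """
    phrases = [json.loads(line) for line in lines.splitlines() if line.strip()]
    kept = [p for p in phrases if p["rank"] >= min_rank]
    kept.sort(key=lambda p: p["rank"], reverse=True)
    return [p["text"] for p in kept]


top = top_phrases(records, min_rank=0.005)
print(top)
```

Note that a frequent but generic phrase such as "people" (count 10) falls below the cutoff, which is the kind of distinction a rank score makes and a raw count does not.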
  • 8. Knowledge Graph
    ▪ used to construct an ontology about technology, based on learning materials from 200+ publishers
    ▪ uses SKOS as a foundation, ties into US Library of Congress and DBpedia as upper ontologies
    ▪ primary structure is “human scale”, used as control points
    ▪ majority (>90%) of the graph comes from machine generated data products
  • 9. AI is real, but why now?
    ▪ Big Data: machine data (1997-ish)
    ▪ Big Compute: cloud computing (2006-ish)
    ▪ Big Models: deep learning (2009-ish)
    The confluence of three factors created a business environment where AI could become mainstream.
    What else is needed?
  • 11. Machine learning
    supervised ML:
    ▪ take a dataset where each element has a label
    ▪ train models on a portion of the data to predict the labels, then evaluate on the holdout
    ▪ deep learning is a popular example, but only if you have lots of labeled training data available
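The train-then-evaluate-on-holdout loop described above can be sketched in a few lines. This is an illustration under stated assumptions, not code from the talk: the toy one-dimensional dataset, the nearest-centroid model, and every function name here are hypothetical and were chosen only to keep the example dependency-free.

```python
import random


def split(data, holdout_fraction=0.25, seed=0):
    """Shuffle labeled examples and split them into (train, holdout)."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]


def train_centroids(train):
    """Fit a nearest-centroid model: the mean feature value per label."""
    sums, counts = {}, {}
    for x, label in train:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, x):
    """Predict the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def accuracy(centroids, holdout):
    """Fraction of holdout examples whose label is predicted correctly."""
    hits = sum(predict(centroids, x) == label for x, label in holdout)
    return hits / len(holdout)


# Toy labeled dataset: each element is (feature, label).
data = [(float(i), "low") for i in range(0, 20)] + \
       [(float(i), "high") for i in range(30, 50)]

train, holdout = split(data)
model = train_centroids(train)
score = accuracy(model, holdout)
print(len(train), len(holdout), score)
```

The point of the holdout is that `score` is measured only on examples the model never saw during training.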
  • 12. Machine learning
    unsupervised ML:
    ▪ run lots of unlabeled data through an algorithm to detect “structure” or embedding
    ▪ for example, clustering algorithms such as K-means
    ▪ unsupervised approaches for AI are an open research question
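As a concrete instance of the K-means example mentioned above, here is a minimal one-dimensional sketch. It assumes a simple deterministic initialization (centroids spread across the sorted data) rather than the random restarts a library implementation would typically use; the function name `kmeans_1d` and the sample points are hypothetical.

```python
def kmeans_1d(points, k, iterations=20):
    """Cluster unlabeled 1-D points into k groups with Lloyd's algorithm."""
    ordered = sorted(points)
    # Assumed initialization: pick k values spread evenly over the sorted data.
    if k == 1:
        centroids = [ordered[0]]
    else:
        centroids = [ordered[(i * (len(ordered) - 1)) // (k - 1)] for i in range(k)]

    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break  # converged: assignments can no longer change
        centroids = updated
    return centroids, clusters


points = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
centroids, clusters = kmeans_1d(points, k=2)
print(centroids, clusters)
```

No labels are supplied anywhere; the two groups emerge from the distances alone.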
  • 13. Active  learning special  case  of  semi-­‐supervised  ML:   ▪ send  difficult  decisions/edge  cases  
 to  experts;  let  algorithms  handle   routine  decisions  (automation)   ▪ works  well  in  use  cases  which  have   lots  of  inexpensive,  unlabeled  data   ▪ e.g.,  abundance  of  content  to  be   classified,  where  the  cost  of   labeling  is  the  expense
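One simple way to picture "send difficult decisions to experts, let algorithms handle routine decisions" is a confidence threshold. The sketch below is an assumption about how such routing could look, not the pipeline described in the talk; the item ids, label names, and the 0.8 threshold are all hypothetical.

```python
def route(predictions, threshold=0.8):
    """Split items into automated decisions and a review queue for experts.

    predictions maps an item id to a dict of {label: probability}.
    """
    automated, for_experts = [], []
    for item_id, probs in predictions.items():
        label = max(probs, key=probs.get)  # most probable label
        if probs[label] >= threshold:
            automated.append((item_id, label))  # routine: accept the model's label
        else:
            for_experts.append(item_id)  # edge case: defer to a person
    return automated, for_experts

# Hypothetical model outputs for three pieces of content.
predictions = {
    "book-1": {"apple_ios": 0.95, "cisco_ios": 0.05},
    "book-2": {"apple_ios": 0.55, "cisco_ios": 0.45},
    "video-3": {"apple_ios": 0.10, "cisco_ios": 0.90},
}
automated, for_experts = route(predictions)
print(automated)
print(for_experts)
```

Raising the threshold sends more items to experts; lowering it automates more but accepts more risk of wrong labels.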
  • 14. The  reality  of  data  rates “If  you  only  have  10  examples  of  something,  it’s  going
    to  be  hard  to  make  deep  learning  work.  If  you  have
    100,000  things  you  care  about,  records  or  whatever,
    that’s  the  kind  of  scale  where  you  should  really  start
    thinking  about  these  kinds  of  techniques.”   Jeff  Dean    Google
 VB  Summit  2017-­‐10-­‐23   venturebeat.com/2017/10/23/google-­‐brain-­‐chief-­‐says-­‐100000-­‐ examples-­‐is-­‐enough-­‐data-­‐for-­‐deep-­‐learning/
  • 15. The reality of data rates Use cases for deep learning must have large, carefully labeled data sets, while reinforcement learning needs much more data than that. Active learning can yield good results with substantially smaller data rates, while leveraging an organization’s expertise to bootstrap toward larger labeled data sets, e.g., as preparation for deep learning. [Figure: reinforcement learning, supervised learning, active learning, and deep learning compared by data rates (log scale)]
  • 18. Active  learning Real-­‐World  Active  Learning:  Applications  and   Strategies  for  Human-­‐in-­‐the-­‐Loop  Machine  Learning
 radar.oreilly.com/2015/02/human-­‐in-­‐the-­‐loop-­‐ machine-­‐learning.html
 Ted  Cuzzillo
 O’Reilly  Media,  2015-­‐02-­‐05   Develop  a  policy  for  how  human  experts  select  exemplars:   ▪ bias  toward  labels  most  likely  to  influence  the  classifier   ▪ bias  toward  ensemble  disagreement   ▪ bias  toward  denser  regions  of  training  data 18
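The "bias toward ensemble disagreement" policy can be made concrete with vote entropy, a common disagreement measure in the active learning literature. This is a sketch under the assumption that each ensemble member casts one vote per item; the document ids and labels are hypothetical.

```python
import math
from collections import Counter

def vote_entropy(votes):
    """Entropy (in bits) of the label votes cast by an ensemble for one item."""
    counts = Counter(votes)
    n = len(votes)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def rank_by_disagreement(ensemble_votes):
    """Order item ids so the most contested items come first."""
    return sorted(ensemble_votes,
                  key=lambda item: vote_entropy(ensemble_votes[item]),
                  reverse=True)

# Hypothetical votes from a three-model ensemble.
ensemble_votes = {
    "doc-1": ["golang", "golang", "golang"],   # full agreement
    "doc-2": ["golang", "alphago", "golang"],  # mild disagreement
    "doc-3": ["golang", "alphago", "verb"],    # maximal disagreement
}
print(rank_by_disagreement(ensemble_votes))
```

Experts would then be shown items from the top of this ranking first, since labels for those are the most likely to change what the ensemble learns.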
  • 19. Active  learning Active  learning  and  transfer  learning
 safaribooksonline.com/library/view/oreilly-­‐ artificial-­‐intelligence/9781491985250/ video314919.html
 Luke  Biewald    CrowdFlower
 The  AI  Conf,  2017-­‐09-­‐17   breakthroughs  lag  algorithm  invention,  waiting  for   “killer  data  set”  to  emerge,  often  decade+ 19
  • 20. Design  pattern:  Human-­‐in-­‐the-­‐loop Building  a  business  that  combines  human  experts   and  data  science
 oreilly.com/ideas/building-­‐a-­‐business-­‐that-­‐ combines-­‐human-­‐experts-­‐and-­‐data-­‐science-­‐2
 Eric  Colson    StitchFix
 O’Reilly  Data  Show,  2016-­‐01-­‐28   “what  machines  can’t  do  are  things  around  cognition,
    things  that  have  to  do  with  ambient  information,  or
    appreciation  of  aesthetics,  or  even  the  ability  to
    relate  to  another  human”
 
 20
  • 21. Design  pattern:  Human-­‐in-­‐the-­‐loop Strategies  for  integrating  people  and  machine   learning  in  online  systems
 safaribooksonline.com/library/view/oreilly-­‐ artificial-­‐intelligence/9781491976289/ video311857.html
 Jason  Laska    Clara  Labs
 The  AI  Conf,  2017-­‐06-­‐29   how  to  create  a  two-­‐sided  marketplace  where  machines   and  people  compete  on  a  spectrum  of  relative  expertise   and  capabilities
 
 21
  • 22. Design  pattern:  Human-­‐in-­‐the-­‐loop Building  human-­‐assisted  AI  applications
 oreilly.com/ideas/building-­‐human-­‐ assisted-­‐ai-­‐applications
 Adam  Marcus    B12
 O’Reilly  Data  Show,  2016-­‐08-­‐25   Orchestra:  a  platform  for  building  human-­‐ assisted  AI  applications,  e.g.,  to  create   business  websites
 https://github.com/b12io/orchestra   example  http://www.coloradopicked.com/ 22
  • 23. Design  pattern:  Flash  teams Expert  Crowdsourcing  with  Flash  Teams
 hci.stanford.edu/publications/2014/ flashteams/flashteams-­‐uist2014.pdf
 Daniela  Retelny,  et  al.  
 Stanford  HCI   “A  flash  team  is  a  linked  set  of  modular  tasks  
    that  draw  upon  paid  experts  from  the  crowd,  
    often  three  to  six  at  a  time,  on  demand”   http://stanfordhci.github.io/flash-­‐teams/ 23
  • 24. Weak  supervision  /  Data  programming Creating  large  training  data  sets  quickly
 oreilly.com/ideas/creating-­‐large-­‐training-­‐ data-­‐sets-­‐quickly
 Alex  Ratner    Stanford
 O’Reilly  Data  Show,  2017-­‐06-­‐08   Snorkel:  “weak  supervision”  and  “data   programming”  as  another  instance  of  
 human-­‐in-­‐the-­‐loop
 github.com/HazyResearch/snorkel   conferences.oreilly.com/strata/strata-­‐ny/public/ schedule/detail/61849 24
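To give a feel for "data programming", the sketch below writes a few labeling functions as plain Python and combines them by majority vote. This is deliberately not Snorkel's API: Snorkel learns a model of how accurate the labeling functions are, while this simplified stand-in only counts votes. The function names, keywords, and labels are hypothetical.

```python
from collections import Counter

ABSTAIN = None  # a labeling function returns this when it has no opinion

def lf_mentions_iphone(text):
    return "apple" if "iphone" in text.lower() else ABSTAIN

def lf_mentions_swift(text):
    return "apple" if "swift" in text.lower() else ABSTAIN

def lf_mentions_router(text):
    return "cisco" if "router" in text.lower() else ABSTAIN

LABELING_FUNCTIONS = [lf_mentions_iphone, lf_mentions_swift, lf_mentions_router]

def weak_label(text, lfs=LABELING_FUNCTIONS):
    """Combine noisy labeling functions by majority vote; abstain on ties."""
    votes = [v for v in (lf(text) for lf in lfs) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    top = Counter(votes).most_common(2)
    if len(top) == 2 and top[0][1] == top[1][1]:
        return ABSTAIN  # tie between two labels: leave for a person
    return top[0][0]

print(weak_label("Building IOS apps for iPhone with Swift"))
print(weak_label("Configuring IOS on a Cisco router"))
```

The human contribution here is writing heuristics rather than labeling items one by one, which is why the talk files it under human-in-the-loop.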
  • 27. Disambiguating  contexts Overlapping  contexts  pose  hard  problems  in  natural  language  understanding.   That  runs  counter  to  the  correlation  emphasis  of  big  data.
 NLP  libraries  lack  features  for  disambiguation.
  • 28. Disambiguating contexts Suppose someone publishes a book which uses the term `IOS`: are they talking about an operating system for an Apple iPhone, or about an operating system for a Cisco router? We handle lots of content about both. Disambiguating those contexts is important for good UX in personalized learning. In other words, how do machines help people distinguish that content within search? Potentially a good case for deep learning, except for the lack of labeled data at scale.
  • 29. Active learning through Jupyter Jupyter notebooks are used to manage ML pipelines for disambiguation, where machines and people collaborate: ▪ ML based on examples – nearly all of the feature engineering, model parameters, etc., have been automated ▪ https://github.com/ceteri/nbtransom ▪ based on use of nbformat, pandas, scikit-learn
  • 30. Active learning through Jupyter Jupyter notebook as… ▪ one part configuration file ▪ one part data sample ▪ one part structured log ▪ one part data visualization tool plus, subsequent data mining of these notebooks helps augment our ontology
  • 31. Active learning through Jupyter [Diagram: ML pipelines, Jupyter kernel, browser, SSH tunnel]
  • 32. Active  learning  through  Jupyter ▪ Notebooks  allow  the  human  experts  to  access  the   internals  of  a  mostly  automated  ML  pipeline,  rapidly   ▪ Stated  another  way,  both  the  machines  and  the  people   become  collaborators  on  shared  documents   ▪ Anticipates  upcoming  collaborative  document  features   in  JupyterLab
  • 33. Active  learning  through  Jupyter 1. Experts  use  notebooks  to  provide  examples  of  book  chapters,  video   segments,  etc.,  for  each  key  phrase  that  has  overlapping  contexts   2. Machines  build  ensemble  ML  models  based  on  those  examples,   updating  notebooks  with  model  evaluation   3. Machines  attempt  to  annotate  labels  for  millions  of  pieces  of  content,  
 e.g.,  `AlphaGo`,  `Golang`,  versus  a  mundane  use  of  the  verb  `go`   4. Disambiguation  can  run  mostly  automated,  in  parallel  at  scale  –  
 through  integration  with  Apache  Spark   5. In  cases  where  ensembles  disagree,  ML  pipelines  defer  to  human   experts  who  make  judgement  calls,  providing  further  examples   6. New  examples  go  into  training  ML  pipelines  to  build  better  models   7. Rinse,  lather,  repeat
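The seven steps above can be sketched as one round of a loop. Everything here is a toy stand-in: the "models" are word-count classifiers, each trained on a different slice of the expert examples to get some variety, and `ask_expert` is a hypothetical callback simulating a person's judgement. The production pipeline described in the talk (nbtransom, scikit-learn, Spark) is not reproduced.

```python
from collections import Counter, defaultdict

def train_member(examples):
    """Build one simple model: per-label word counts from labeled examples."""
    vocab = defaultdict(Counter)
    for text, label in examples:
        vocab[label].update(text.lower().split())
    return vocab

def predict(member, text):
    """Pick the label sharing the most words with the text; None if tied or unknown."""
    words = text.lower().split()
    scores = {label: sum(counts[w] for w in words) for label, counts in member.items()}
    best = max(scores.values(), default=0)
    winners = [label for label, s in scores.items() if s == best]
    return winners[0] if best > 0 and len(winners) == 1 else None

def hitl_round(examples, unlabeled, ask_expert, n_members=2):
    """Steps 2-6: train an ensemble, auto-label on consensus, defer on disagreement."""
    ensemble = [train_member(examples[i::n_members]) for i in range(n_members)]
    auto_labeled, new_examples = {}, []
    for text in unlabeled:
        votes = {predict(member, text) for member in ensemble}
        if len(votes) == 1 and None not in votes:
            auto_labeled[text] = votes.pop()  # ensemble agrees: label automatically
        else:
            new_examples.append((text, ask_expert(text)))  # defer to a person
    return auto_labeled, examples + new_examples  # new examples feed the next round

# Step 1: expert-provided examples (hypothetical) for the ambiguous term "go".
examples = [
    ("go compiler goroutines channels", "golang"),
    ("go modules compiler toolchain", "golang"),
    ("go board game deepmind", "alphago"),
    ("go deepmind neural network game", "alphago"),
]
unlabeled = [
    "the go compiler is fast",
    "deepmind wins at go",
    "go channels versus a neural network",
]
expert_answers = {"go channels versus a neural network": "golang"}  # simulated expert
auto_labeled, examples = hitl_round(examples, unlabeled, expert_answers.get)
print(auto_labeled)
print(len(examples))
```

Calling `hitl_round` again with the enlarged example set is the "repeat" in step 7: each round, expert time is spent only where the ensemble could not agree.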
  • 34. Nuances ▪ No Free Lunch theorem: it is better to err on the side of fewer false positives / more false negatives in use cases about learning materials ▪ Employ a policy biased toward exemplars, i.e., those examples most likely to influence the classifier ▪ Potentially, “AI experts” may be Customer Service staff who review edge cases within search results or recommended content – as an integral part of our UX – then re-train the ML pipelines through examples
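Erring toward fewer false positives at the cost of more false negatives usually comes down to where the decision threshold sits. The sketch below picks the lowest score threshold that meets a target precision; the scored examples and the 0.9 target are hypothetical, and this is one possible reading of the slide rather than the talk's actual procedure.

```python
def precision_recall(scored, threshold):
    """Precision and recall when items scoring >= threshold are accepted.

    scored is a list of (model_score, is_truly_positive) pairs.
    """
    tp = sum(1 for s, y in scored if s >= threshold and y)
    fp = sum(1 for s, y in scored if s >= threshold and not y)
    fn = sum(1 for s, y in scored if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def pick_threshold(scored, min_precision):
    """Return (threshold, precision, recall) for the lowest qualifying threshold."""
    for t in sorted({s for s, _ in scored}):
        p, r = precision_recall(scored, t)
        if p >= min_precision:
            return t, p, r
    return None  # no threshold reaches the target precision

# Hypothetical validation scores with ground-truth labels.
scored = [(0.95, True), (0.9, True), (0.8, False), (0.7, True),
          (0.6, False), (0.4, True), (0.2, False)]
print(pick_threshold(scored, min_precision=0.9))
```

In this toy data the chosen threshold accepts no false positives but misses half of the true positives, which is the trade-off the slide argues for.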
  • 35. Management strategy – before Generally with Big Data, we are considering: ▪ DAG workflow execution – which is linear ▪ data-driven organizations ▪ ML based on optimizing for objective functions ▪ questions of correlation versus causation ▪ avoiding “garbage in, garbage out” [Diagram: DAG workflow with nodes labeled Document Collection, Tokenize, Scrub token, Regex token, Stop Word List, HashJoin (Left, RHS), GroupBy token, Count, Word Count, M, R]
  • 36. Management strategy – after HITL introduces circularities: ▪ aka, second-order cybernetics ▪ leverage feedback loops as conversations ▪ focus on human scale, design thinking ▪ people and machines work together on teams ▪ budget experts’ time on handling the exceptions [Diagram: feedback loop among AI team, content, and ontology: ML models attempt to label the data automatically; expert judgement about edge cases provides examples; ML models trained using examples; expert decisions to extend vocabulary; ML models have consensus, confidence labels]
  • 37. Essential  takeaway  idea:   Depending  on  the  organization,  key  ingredients   needed  to  enable  effective  AI  apps  may  come   from  non-­‐traditional  “tech”  sources  …   In  other  words,  based  on  human-­‐in-­‐the-­‐loop   design  pattern,  AI  expertise  may  emerge  from   your  Sales,  Marketing,  and  Customer  Service   teams  –  which  have  crucial  insights  about  your   customers’  needs.
  • 38. Looking  ahead:   some  trends  at  work
  • 39. Looking ahead 2018: hardware trends Indications: progressively more advanced mathematics moves into hardware and low-level software, as use cases and ROI become established over time – optimizing for the speed of calculations and capacity of data storage Contra: programming languages which use abstraction layers that obscure access to hardware features, aka Java
  • 40. Looking ahead 2018: hardware trends Realistically, current use of math in ML suffers from some “legacy software” aspects: underlying libraries generally focus on linear algebra, optimizing for 1-2 variables, etc. Meanwhile our use cases require graphs, multivariate problems, and other compelling cases for more advanced math. We will see these eventually move into hardware and low-level libraries: tensor decomposition, homology, hypervolume optimization, etc.
  • 41. Looking ahead 2018: software trends Indications: cognitive subsystems progressively becoming automated, e.g., sensory perception, pattern recognition, decisions, gaming, mimicry, optimization, knowledge representation, language, complex movements, planning, scheduling, etc. Contra: merely incremental changes for practices in software engineering and product management – within the context of AI apps – which have suffered from being too “linear”
  • 42. Looking ahead 2018: software trends Enormous upside from AI, across verticals; however, to be in the game, an organization must already have Big Data infrastructure and related practices in place: (1) cloud and SRE; (2) eliminating data silos; (3) cleaning data / repairing metadata; (4) embracing contemporary data science. Those are prerequisites; there are no shortcuts in AI. Plus, there’s an ongoing talent crunch. – consensus among major consulting firms, Strata 2017 Exec Briefings
  • 43. Looking  ahead  2018:  people  trends Indications:    organizations  embracing  circularities,  focused   on  optimizing  for  fitness  functions  (populations  of  priorities,   longer-­‐term  ROI)  in  lieu  of  optimizing  for  objective  functions   (singular  goals,  linear  cognition,  short-­‐term  ROI)   Contra:    conflict  defined  by  “confident  personalities  vs.   confidence  intervals”,  see  goo.gl/GPYZ6v 43
  • 44. Looking ahead 2018: people trends Peter Norvig: disruptions in software process for uncertain domains – the workflow of the AI researcher has been quite different from the workflow of the software developer goo.gl/XcDCZ2 François Chollet: “casting the end goal of intelligence as the optimization of an extrinsic, scalar reward function” goo.gl/q7Je7D
  • 45. Summary Ahead in AI: hardware advances force abrupt changes in software practices – which have lagged due to lack of infrastructure and data quality, outdated process, etc. HITL (active learning) as management strategy for AI addresses broad needs across industry, especially for enterprise organizations. Big Team begins to take its place in the formula Big Data + Big Compute + Big Models.
  • 46. Summary The  “game”  is  not  to  replace  people  –  instead  it   is  about  leveraging  AI  to  augment  staff,  so  that   organizations  can  retain  people  with  valuable   domain  expertise,  making  their  contributions   and  experience  even  more  vital.   This  is  a  personal  opinion,  which  does  not   necessarily  reflect  the  views  of  my  employer.   However,  the  views  of  my  employer…
  • 47. Why  we’ll  never  run  out  of  jobs 47
  • 48. Strata  Data   SG,  Dec  4-­‐7
 SJ,  Mar  5-­‐8
 UK,  May  21-­‐24
 CN, Jul 12-15 The AI Conf CN, Apr 10-13
 NY,  Apr  29-­‐May  2
 SF,  Sep  4-­‐7
 UK,  Oct  8-­‐11   JupyterCon   NY,  Aug  21-­‐24   OSCON   PDX,  Jul  16-­‐19,  2018 48
  • 49. 49 Get  Started  with   NLP  in  Python Just  Enough  Math Building  Data   Science  Teams Hylbert-­‐Speys How  Do  You  Learn? updates,  reviews,  conference  summaries…   liber118.com/pxn/
 @pacoid