Human in the loop: a design pattern for managing teams working with ML
Paco Nathan @pacoid
R&D Group @ O’Reilly Media
Strata CA, San Jose, 2018-03-08
The reality of data rates
“If you only have 10 examples of something, it’s going to be hard to make deep learning work. If you have 100,000 things you care about, records or whatever, that’s the kind of scale where you should really start thinking about these kinds of techniques.”
Jeff Dean, Google
VB Summit (2017-10-23)
venturebeat.com/2017/10/23/google-brain-chief-says-100000-examples-is-enough-data-for-deep-learning/
The reality of data rates
Transfer learning aside, most DL use cases require large, carefully labeled data sets, while RL requires much more data than that.
Active learning can yield good results with substantially smaller data rates, while leveraging an organization’s expertise to bootstrap toward larger labeled data sets, e.g., as preparation for deep learning, etc.
[Figure: reinforcement learning, supervised learning, active learning, and deep learning compared by data rates (log scale)]
active learning: indicated for many enterprise use cases
Why are AI programs different?
AI in the software engineering workflow
Peter Norvig, Google
TheAIConf (2017-06-28)
▪ Content: models not programs
▪ Process: training not debugging
▪ Release: retraining not patching
▪ Uncertainty: of objective
▪ Uncertainty: of action/recommendation
▪ Uncertainty: propagates through model
Active Learning: case studies and patterns
Machine learning
supervised ML:
▪ take a dataset where each element has a label
▪ train models on a portion of the data to predict the labels, then evaluate on the holdout
▪ deep learning is a popular example, but only if you have lots of labeled training data available
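A minimal sketch of the train/holdout idea, using only the Python standard library. The data, the one-feature threshold "model", and every function name here are assumptions made for illustration; they are not from the talk.

```python
import random

def train_holdout_split(examples, holdout_fraction=0.25, seed=0):
    """Shuffle labeled examples and split them into train and holdout sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]

def fit_threshold(train):
    """Toy 'model': midpoint between the two class means of a 1-D feature."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def accuracy(threshold, data):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(1 for x, y in data if (1 if x > threshold else 0) == y)
    return correct / len(data)

# Hypothetical labeled data: (feature, label) pairs, well separated by class.
data = [(float(i), 0) for i in range(20)] + [(float(i), 1) for i in range(30, 50)]
train, holdout = train_holdout_split(data)
model = fit_threshold(train)
holdout_accuracy = accuracy(model, holdout)
```

The point of the split is that `holdout_accuracy` is measured on examples the model never saw during fitting.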
Machine learning
unsupervised ML:
▪ run lots of unlabeled data through an algorithm to detect “structure” or embedding
▪ for example, clustering algorithms such as K-means
▪ unsupervised approaches for AI are an open research question
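As a rough illustration of clustering unlabeled data, here is a one-dimensional K-means sketch (Lloyd's algorithm) in plain Python. The points and starting centers are made up, and a real pipeline would more likely use a library implementation.

```python
def kmeans_1d(points, centers, iterations=10):
    """Lloyd's algorithm on 1-D points, starting from the given centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical unlabeled measurements with two obvious groups.
points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
```

No labels are involved: the two groups emerge only from distances between points.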
Active learning
special case of semi-supervised ML:
▪ send difficult decisions/edge cases to experts; let algorithms handle routine decisions (automation)
▪ works well in use cases which have lots of inexpensive, unlabeled data
▪ e.g., abundance of content to be classified, where cost of labeling is a major expense
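The routing idea in the bullets above might be sketched as follows. The confidence threshold of 0.9, the document ids, and the `route` function are assumptions chosen for the example, not values from the talk.

```python
def route(predictions, threshold=0.9):
    """Split (item, label, confidence) triples into automated and expert queues."""
    automated, for_experts = [], []
    for item, label, confidence in predictions:
        if confidence >= threshold:
            automated.append((item, label))   # routine decision: accept the label
        else:
            for_experts.append(item)          # edge case: ask a human expert
    return automated, for_experts

# Hypothetical model outputs for three pieces of content.
predictions = [
    ("doc-1", "sports", 0.98),
    ("doc-2", "finance", 0.55),
    ("doc-3", "sports", 0.91),
]
automated, for_experts = route(predictions)
```

Labels that experts supply for the `for_experts` queue become new training examples, which is what lets labeling cost stay low.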
Who’s doing this?
Design pattern: Active learning
Real-World Active Learning: Applications and Strategies for Human-in-the-Loop ML
Ted Cuzzillo
O’Reilly Media (2015-02-05)
Active learning and transfer learning
Luke Biewald, CrowdFlower
The AI Conf, SF (2017-09-17)
breakthroughs lag invention of methods; must wait for “killer data set” to emerge, often a decade or more
Design pattern: Weak supervision
Creating large training data sets quickly
Alex Ratner, Stanford
O’Reilly Data Show (2017-06-08)
Snorkel: using weak supervision and data programming as another instance of human-in-the-loop
github.com/HazyResearch/snorkel
conferences.oreilly.com/strata/strata-ny/public/schedule/detail/61849
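To make "data programming" concrete, here is a hedged sketch of labeling functions combined by a simple majority vote. This is not Snorkel's API, and the majority vote is a deliberately simpler stand-in for how Snorkel combines noisy sources; the heuristics and label names are hypothetical.

```python
from collections import Counter

ABSTAIN = None

# Hypothetical labeling functions: cheap heuristics written by domain experts.
def lf_mentions_javascript(text):
    return "library" if "javascript" in text.lower() else ABSTAIN

def lf_mentions_interview(text):
    return "behavior" if "interview" in text.lower() else ABSTAIN

def lf_mentions_component(text):
    return "library" if "component" in text.lower() else ABSTAIN

LABELING_FUNCTIONS = [lf_mentions_javascript, lf_mentions_interview, lf_mentions_component]

def weak_label(text):
    """Majority vote over non-abstaining functions; None when no votes or a tie."""
    votes = Counter(lf(text) for lf in LABELING_FUNCTIONS)
    votes.pop(ABSTAIN, None)
    if not votes:
        return None
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # conflicting evidence: leave unlabeled
    return ranked[0][0]
```

For instance, `weak_label("How candidates react during an interview")` yields `"behavior"`, while a text that triggers one vote for each label is left unlabeled.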
Design pattern: Human-in-the-loop
Paul English on Lola's Debut for Business Travelers
Elizabeth West
Business Travel News (2017-10-04)
founded 2015 by Paul English and other Kayak execs: on-demand, personal travel service; uses expert travel agents for HITL
initially criticized by travel industry as “competing against Siri”; currently displacing OTAs in a reversal of “AI vs. jobs”
can book on Airbnb, Southwest, etc., which aren’t available via OTA, because of the human delegation
“The first time you use Lola it’s going to be great because it’s a conversation. We’re not making you think like a computer”
“Instead of showing you 300 choices or 1,000 choices, we think we can show you three choices, kind of good, better, best”
Design pattern: Human-in-the-loop
Anand Kulkarni, Crowdbotics
HITL for code+test gen, trained from GitHub, StackOverflow, etc., with JIRA tickets as the granular object in the system
parse specs from JIRA history, reuse what’s been done before; generate PRs for popular web stacks: React, Flask, Ruby, etc.
resolve specs into the approach needed and time required, where product managers get cost estimates, then on-demand expert programmers implement for you
have the in-house engineers handle “radically novel” projects
results: 1.5x software dev throughput
Design pattern: Human-in-the-loop
Building a business that combines human experts and data science
Eric Colson, StitchFix
O’Reilly Data Show (2016-01-28)
“what machines can’t do are things around cognition, things that have to do with ambient information, or appreciation of aesthetics, or even the ability to relate to another human”
Design pattern: Human-in-the-loop
EY, Deloitte And PwC Embrace Artificial Intelligence For Tax And Accounting
Adelyn Zhou
Forbes (2017-11-14)
compliance use cases in reviewing lease accounting standards
3x more consistent and 2x more efficient than the previous humans-only teams
break-even ROI in less than a year
Design pattern: Human-in-the-loop
Unsupervised fuzzy labeling using deep learning to improve anomaly detection
Adam Gibson, Skymind
Strata Data Conf, Singapore (2017-12-07)
large-scale use case for telecom in Asia
method: overfit variational autoencoders, then send outliers to human analysts
Design pattern: Human-in-the-loop
Strategies for integrating people and machine learning in online systems
Jason Laska, Clara Labs
The AI Conf, NY (2017-06-29)
establishing a two-sided marketplace where machines and people compete on a spectrum of relative expertise and capabilities
“the trick is to design systems from Day 1 which learn implicitly from the intelligence which is already there”
Michael Akilian, Clara Labs
Design pattern: Human-in-the-loop
Building human-assisted AI applications
Adam Marcus, B12
O’Reilly Data Show (2016-08-25)
“Humans where they’re best, machines for the rest.”
Orchestra: a platform for building human-assisted AI applications, e.g., create/update business websites
https://github.com/b12io/orchestra
example: http://www.coloradopicked.com/
Design pattern: Flash teams
Expert Crowdsourcing with Flash Teams
Daniela Retelny, et al., Stanford HCI
UIST (2014-10-05)
computationally-guided teams of crowd experts supported by lightweight, reproducible, scalable team structures
“elastic recruiting”: grow and shrink teams on demand, combine teams into larger organizations
http://stanfordhci.github.io/flash-teams/
Problem: disambiguating contexts
AI in Media
▪ content which can be represented as text can be parsed by NLP, then manipulated by available AI tooling
▪ labeled images get really interesting
▪ text or images within a context have inherent structure
▪ representation of that kind of structure is rare in the Media vertical – so far
Disambiguating contexts
Overlapping contexts pose hard problems in natural language understanding. That runs counter to the correlation emphasis of big data.
NLP libraries lack features for disambiguation.
Disambiguating contexts
Suppose someone publishes a book which uses the term `react`: are they talking about a JavaScript library, or about human behavior during interviews? Our customers ask for both.
We handle lots of content about both. Disambiguating those contexts is important for good UX in personalized learning.
In other words, how do machines help people distinguish that content within search?
Potentially a good case for deep learning, except for the lack of labeled data at scale.
Active learning through Jupyter
Jupyter notebooks are used to manage ML pipelines for disambiguation, where machines and people collaborate:
▪ ML based on examples – almost all of the feature engineering, model parameters, etc., have been automated
▪ https://github.com/ceteri/nbtransom
▪ based on use of nbformat, pandas, scikit-learn
Jupyter notebook as…
▪ one part configuration file
▪ one part data sample
▪ one part structured log
▪ one part data visualization tool
plus, subsequent data mining of these notebooks helps augment our ontology
Active learning through Jupyter
[Figure: ML pipelines, Jupyter kernel, browser, SSH tunnel]
Active learning through Jupyter
▪ Notebooks allow the human experts to access the internals of a mostly automated ML pipeline, rapidly
▪ Stated another way, both the machines and the people become collaborators on shared documents
▪ Anticipates upcoming collaborative document features in JupyterLab
Active learning through Jupyter
1. Experts use notebooks to provide examples of book chapters, video segments, etc., for each key phrase that has overlapping contexts
2. Machines build ensemble ML models based on those examples, updating notebooks with model evaluation
3. Machines attempt to annotate labels for millions of pieces of content, e.g., `AlphaGo`, `Golang`, versus a mundane use of the verb `go`
4. Disambiguation can run mostly automated, in parallel at scale – through integration with Apache Spark
5. In cases where ensembles disagree, ML pipelines defer to human experts who make judgement calls, providing further examples
6. New examples go into training ML pipelines to build better models
7. Rinse, lather, repeat
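Steps 3 and 5 above can be sketched as a committee vote that defers to experts on disagreement. The three lambda "models" are hypothetical keyword rules standing in for trained ensemble members; the actual pipeline (nbtransom, scikit-learn, Spark) is not reproduced here.

```python
from collections import Counter

def annotate(item, committee, min_agreement=1.0):
    """Return (label, None) if enough models agree, else (None, item) to defer."""
    votes = Counter(model(item) for model in committee)
    label, count = votes.most_common(1)[0]
    if count / len(committee) >= min_agreement:
        return label, None
    return None, item

# Hypothetical stand-ins for trained models: each maps text to a label.
committee = [
    lambda text: "golang" if "goroutine" in text or "golang" in text else "verb",
    lambda text: "golang" if "compile" in text or "golang" in text else "verb",
    lambda text: "golang" if "golang" in text else "verb",
]

labeled, expert_queue = [], []
for text in ["golang channels", "go compile the report", "go home"]:
    label, deferred = annotate(text, committee)
    if deferred is None:
        labeled.append((text, label))      # ensemble agrees: annotate automatically
    else:
        expert_queue.append(deferred)      # ensemble disagrees: defer to an expert
```

Items in `expert_queue` are the ones where a human judgement call yields the most useful new training example for the next iteration.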
Social Systems: collaboration with machines
Product management
The History and Evolution of Product Management
Martin Eriksson
Mind the Product (2015-10-28)
From PM’s origins as “Brand Men”, on through the success arc of Hewlett-Packard, on to Agile Manifesto, Lean Enterprise, etc.
Formerly part of Engineering or Marketing, PM now “taking a seat at the table” under CEOs
Conway’s Law
How Do Committees Invent?
Melvin Conway
Datamation (1968-04)
Organizations that create systems produce designs which copy their own communication structures.
For each level of delegation, someone’s scope of inquiry narrows, design alternatives also narrow – until a system is simple enough to be understood in human terms.
Conway’s Law illustrated
Organizational Charts
Manu Cornet, Bonkers World
Cognitive biases:
▪ anthropocentrism
▪ system justification
In retrospect, Agile Manifesto contains examples
See related descriptions:
Destruction and Creation
John R. Boyd, USAF
(1976-09-03)
First-order cybernetics
Cybernetics: Or Control and Communication in the Animal and the Machine
Norbert Wiener, MIT
MIT Press (1948)
early work had been about closed-loop control systems: homeostasis, habituation, adaptation, and other regulatory processes
given a system which has input and output, a controller leveraging a negative feedback loop, and one or more observers outside of the system
related to the early Macy Conferences
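A negative feedback loop of the kind described above can be sketched as a proportional controller. The thermostat-style numbers, the gain of 0.5, and the function name are illustrative assumptions.

```python
def run_feedback_loop(setpoint, state, gain=0.5, steps=20):
    """Proportional controller: each step corrects a fraction of the observed error."""
    history = [state]
    for _ in range(steps):
        error = setpoint - state        # observe the output, compare with the goal
        state = state + gain * error    # negative feedback: act to shrink the error
        history.append(state)
    return history

# Hypothetical example: drive a temperature from 10.0 toward a setpoint of 20.0.
history = run_feedback_loop(setpoint=20.0, state=10.0)
```

With a gain between 0 and 1 the remaining error shrinks at every step, which is the regulatory behavior (homeostasis) that first-order cybernetics studied. Note that the observer running this code sits outside the loop.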
“the organism was no longer an input/output machine; rather it was part of a loop from perception to action and back again to perception”
Paul Pangaro describing Jerry Lettvin @ MIT cybernetics
Second-order cybernetics
1. von Foerster: one can apply the understandings developed in cybernetics to the subject matter itself
2. presence of the observer is inevitable and may be desirable: “What is said is said to an observer”
3. eigen functions: stable, dynamically self-perpetuating states that are self-referential: “We construct our realities” per constructivism
4. autopoiesis: a living entity exists as a network of components, recursively producing itself, realizing its boundaries; it grows and maintains itself by reference to itself
5. feedback loops represent conversations, from which the participants cannot be detached
6. an essentially ethical understanding
7. a productive interaction between theory and practice, in which each supports the other
second-order cybernetics lays a foundation for AI – it’s about the semantic relations of conversations within a system; quite apt for leveraging NLP, active learning, etc., when you have semi-structured dialog
Second-order cybernetics
Autopoiesis and Cognition: The Realization of the Living
Humberto Maturana, Francisco Varela
Kluwer (1980 / original 1972)
Understanding Computers and Cognition: A New Foundation for Design
Terry Winograd, Fernando Flores
Intellect Books (1986)
Conversations for Action and Collected Essays
Fernando Flores
Createspace (2013)
Second-order cybernetics
▪ biology informing computer science
▪ historical context of Project Cybersyn
▪ autopoiesis and cognition
▪ organizational closure: “self-making means stability”
▪ speech acts (e.g., social analysis of open source)
▪ IMO, blueprints for AI systems
Also, the focus on “information as a collection of facts” is yet another form of cognitive bias – instilled through 30+ years of data warehouse practices, where data must fit into dimensions, facts, schema
Active Learning: theory, practices, community
HITL theory: choosing what to learn
Active Learning Literature Survey
Burr Settles, UW Madison
(2010-01-26)
Can machines learn more economically if they ask human “oracles” questions? e.g., task in-house experts with the edge cases?
▪ uncertainty sampling: query about instances which ML is least certain how to label - least confidence / margin / entropy
▪ query-by-committee: ensemble of ML models votes; query the instance about which they disagree most
▪ expected error reduction: maximize the expected information gain of the query
▪ variance reduction: minimize future generalization error of the model (e.g., loss function)
▪ density-weighted methods: instances which are both uncertain and “representative” of the underlying distribution
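The three uncertainty sampling measures named above (least confidence, margin, entropy) can be written in a few lines. This is a sketch over made-up class probabilities; the `pool` dictionary and `pick_query` helper are assumptions for the example.

```python
import math

def least_confidence(probs):
    """1 minus the probability of the most likely label."""
    return 1.0 - max(probs)

def margin(probs):
    """Gap between the two most likely labels (a smaller gap is more uncertain)."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second

def entropy(probs):
    """Shannon entropy of the predicted label distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def pick_query(pool):
    """Return the id of the unlabeled instance with the highest entropy."""
    return max(pool, key=lambda item_id: entropy(pool[item_id]))

# Hypothetical model outputs: class probabilities per unlabeled instance.
pool = {
    "a": [0.9, 0.05, 0.05],
    "b": [0.4, 0.35, 0.25],
    "c": [0.6, 0.3, 0.1],
}
query = pick_query(pool)
```

Here instance `b` is selected, since its probabilities are closest to uniform. The three measures agree for binary problems but can rank instances differently when there are more than two classes.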
HITL practices: emerging themes
while ML was mostly about generalization, now we can borrow from Frank Knight (1921): using ML models to explore uncertainty in relationship to profit vs. risk
▪ distinguish forms of uncertainty: aleatoric (noise) vs. epistemic (incomplete model)
▪ see also: meta-learning [1] and [2]
▪ people who aren’t ML experts should be able to train and iterate robust models using examples
▪ emphasize use of fitness functions to make decisions, in lieu of objective functions which tend to rely on overly simplified KPIs
HITL practices: model interpretation
explicability of ML models becomes essential, must be intuitive for the human experts involved: Skater, and also Anchors, SHAP, STREAK, LIME, etc.
The Building Blocks of Interpretability
Chris Olah, et al., Google Brain
Distill (2018-03-06)
Challenges for Transparency
Adrian Weller
WHI (2017-07-29)
The Mythos of Model Interpretability
Zachary Lipton
WHI (2016-03-06)
Interpreting Machine Learning Models
Wed Mar 28 | 10-11 am Pacific
datascience.com/resources/webinars/interpreting-machine-learning-models
live webinar: we’ll discuss the need for methods which make the process of explaining machine learning models more intuitive, and also evaluate myths about model interpretability, from both research and business perspectives.
Pramit Choudhary, Lead Data Scientist, DataScience.com
Sameer Singh, CS Assistant Professor, UC Irvine
Paco Nathan, Dir, Learning Group, O'Reilly Media
HITL resources: conferences, journals, etc.
HILDA 2018
Workshop on Human-In-the-Loop Data Analytics
Co-located with SIGMOD 2018
June in Houston
Collective Intelligence 2018
University of Zurich, Switzerland
collocated with AAAI HCOMP 2018
July in Zurich
HCOMP in Slack
https://hcomp.slack.com/
Human Computation journal
http://hcjournal.org/ojs/index.php?journal=jhc
HITL tooling: active learning
Agnostic Active Learning Without Constraints
Alina Beygelzimer, Daniel Hsu, John Langford, Tong Zhang
NIPS (2010-06-14)
The End of the Beginning of Active Learning
Daniel Hsu, John Langford
Hunch.net (2011-04-20)
https://github.com/JohnLangford/vowpal_wabbit/wiki
focused on cases where labeling is expensive; uses importance weighted active learning; handles “adversarial label noise” as good or better than supervised ML, wherever supervised ML works
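The core bookkeeping behind importance weighting can be sketched as: request a label with probability p, and when a label is requested, weight that example by 1/p. This is not Vowpal Wabbit's implementation, and the fixed rule used for p below is a hypothetical placeholder; choosing p well is the substance of the papers cited above.

```python
import random

def importance_weighted_sample(stream, query_probability, seed=0):
    """Query each item's label with probability p; record weight 1/p when queried."""
    rng = random.Random(seed)
    queried = []
    for item in stream:
        p = query_probability(item)
        if rng.random() < p:
            queried.append((item, 1.0 / p))
    return queried

# Hypothetical rule: always query items near a decision boundary at 0, rarely others.
def query_probability(x):
    return 1.0 if abs(x) < 0.2 else 0.1

stream = [i / 1000.0 - 1.0 for i in range(2000)]
queried = importance_weighted_sample(stream, query_probability)
estimated_size = sum(weight for _, weight in queried)
```

Far fewer labels are requested than there are items, yet the 1/p weights keep weighted statistics unbiased: `estimated_size` approximates the full stream length of 2000.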
HITL tooling: machine teaching
Prodigy: a new tool for radically efficient machine teaching
Matthew Honnibal, Ines Montani
Explosion.ai (2017)
Management strategy: before
In general with Big Data, we were considering:
▪ DAG workflow execution – those are typically linear
▪ data-driven organizations
▪ ML based on optimizing for objective functions
▪ general considerations about correlation vs. causation
▪ avoid “garbage in, garbage out”
Jarvis workflow
Management strategy: after
HITL introduces circularities:
▪ deprecate linear input/output systems as the “conventional wisdom”
▪ analogous to an OODA loop which incorporates automation/augmentation
▪ recognize multiple feedback loops as conversations for action
▪ recognize opportunity: loops from perception (e.g., DL) to action (e.g., HITL) and back again to perception
▪ design systems to learn implicitly from the intelligence already there
▪ hint: recognize the “verbs” being used, rather than over-emphasizing “nouns”
[Figure: feedback loops among Customers, Human Experts, and ML Models (Organizational Learning; Customer Use Cases; Examples, Actions)]
▪ Customers request Sales, Marketing, Service, Training
▪ Experts learn through Customer interactions
▪ Experts decide about edge cases, providing examples
▪ Experts gain insights via Model explanations
▪ Models focus Experts (e.g., weak supervision)
▪ Models act on decisions when possible
▪ Models explore uncertainty when needed
Management strategy: no-collar workforce
No-collar workforce: Humans and machines in one loop
Anthony Abbatiello, Tim Boehm, Jeff Schwartz, Sharon Chand
Deloitte Insights (2017-12-05)
▪ near-future: human workers and machines complement the other’s efforts in a single loop of productivity
▪ 2018-20: expect firms to embrace a “no-collar workforce” trend by redesigning jobs
▪ yet only ~17% ready to manage a workforce in which people, robots, and AI work side by side – largely due to cultural, tech fluency, regulatory issues
▪ e.g., what about onboarding or retiring non-human workers? these are no longer theoretical questions
▪ HR orgs must develop strategies and tools for recruiting, managing, and training a hybrid workforce
Summary: how this matters
Conference summaries, Oct 2017 part 1
PN (2017-10-10)
Themes emerging in AI conferences about the impact of ML on software process, i.e., something’s afoot:
2009-ish, data science ran headlong into prod mgmt
2012-ish, data sci leaders moved into prod exec roles
2018-ish, AI apps disrupting prod mgmt
…
Extrapolating trends
Flywheel Effect, circa 2018
AI drives features in products and services … which in turn drives cloud consumption … which in turn acquires even more data … particularly for mobile or embedded products
Incumbents now lead in AI + cloud + mobile/embed: Google, Amazon, Microsoft, IBM, Apple, Baidu, etc.
Challenge: adoption by industry segment
segment: Google, Amazon, Microsoft, IBM, Apple, Baidu, etc.
assets:
▪ AI + cloud + mobile/embed, leveraging a flywheel effect
▪ had focused business lines well in advance to prepare large-scale labeled data sets
▪ uses AI to explore uncertainty, focusing their core expertise
liabilities:
▪ high capital expenses, long-term R&D as hardware evolves rapidly
▪ potential vulnerabilities by automating too much
▪ potential vulnerabilities by mistaking first-order cybernetics for second-order
segment: < 50%
assets:
▪ HITL provides a vector to compete against top incumbents, with many unexplored areas of opportunity
liabilities:
▪ facing barriers: talent gap, competing investment priorities, security concerns
▪ verticals eroded by horizontal business lines from top incumbents
segment: > 50% ??
liabilities:
▪ struggling to recognize business use cases
▪ buried in tech debt from digital infrastructure
▪ lacks management support
What is changing and why?
Second-order cybernetics began partly as a study of how complex systems fail, and also about what social systems and physical systems had in common
It provides foundations for AI systems of people + machines
Feedback loops represent structured conversations for action, from which the participants cannot be detached
The organization is no longer viewed as an input/output machine; rather it’s a pluralistic network of loops from perception to action and back again to perception – e.g., DL augments perception and RL augments actions
In other words, as the flywheel effect itself is evolving, to stay ahead we must recognize the emerging “verbs”, which are entry points into the business use cases
What do organizations carry into AI?
Assess the cognitive biases we bring into AI systems of people + machines:
▪ anthropocentrism and system justification, as shown by Conway’s Law
▪ DW + BI cultural lens overemphasizes “information as a collection of facts”, missing the conversations for action
▪ digitalization sequence “Product”, “Service”, “Data”: overreacting to the nouns (facts), while ignoring the verbs (relations)
▪ delegation + committee: narrowing the scope of inquiry and design alternatives until a system is simple enough to understand in human terms
▪ some incumbents hold tenaciously to ML apps within first-order cybernetics, i.e., bias toward mostly top-down command and control
Instead, we must design systems that learn implicitly from the intelligence already within an organization and its relationships with the customers, channels, etc.: Sales, Customer Support, Professional Services, Marketing
59
Could we be encountering early stages of not-only-human cognition attempting to optimize beyond human predispositions and cognitive biases?
[ed note: say at least one strange thing]
“The future belongs to those who understand at a very deep level how to combine their unique expertise with what algorithms do best.”
Pedro Domingos, The Master Algorithm
The AI Conf: CN, Apr 10-13; NY, Apr 29-May 2; SF, Sep 4-7; UK, Oct 8-11
Strata Data: UK, May 21-24; NY, Sep 11-14; SF, Mar 26-28
JupyterCon + events: BOS, Mar 21; ATL, Mar 31; DC, May 15; NY, Aug 21-25
OSCON: PDX, Jul 16-19
Get Started with NLP in Python
Just Enough Math
Building Data Science Teams
Hylbert-Speys
How Do You Learn?
articles, online courses, conference summaries…
liber118.com/pxn/
@pacoid

React Native vs Ionic - The Best Mobile App Framework
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 

Human in the loop: a design pattern for managing teams working with ML

  • 1. Human in the loop: a design pattern for managing teams working with ML
    Paco Nathan, @pacoid
    R&D Group @ O’Reilly Media
    Strata CA, San Jose, 2018-03-08
  • 2. The reality of data rates
    “If you only have 10 examples of something, it’s going to be hard to make deep learning work. If you have 100,000 things you care about, records or whatever, that’s the kind of scale where you should really start thinking about these kinds of techniques.”
    Jeff Dean, Google
    VB Summit (2017-10-23)
    venturebeat.com/2017/10/23/google-brain-chief-says-100000-examples-is-enough-data-for-deep-learning/
  • 3. The reality of data rates
    Transfer learning aside, most DL use cases require large, carefully labeled data sets, while RL requires much more data than that.
    Active learning can yield good results with substantially smaller data rates, while leveraging an organization’s expertise to bootstrap toward larger labeled data sets, e.g., as preparation for deep learning.
    [Figure: reinforcement learning, supervised learning, active learning, and deep learning placed along a data-rates axis (log scale)]
  • 4. The reality of data rates
    [Figure callout] active learning: indicated for many enterprise use cases
  • 5. Why are AI programs different?
    AI in the software engineering workflow
    Peter Norvig, Google
    TheAIConf (2017-06-28)
    ▪ Content: models not programs
    ▪ Process: training not debugging
    ▪ Release: retraining not patching
    ▪ Uncertainty: of objective
    ▪ Uncertainty: of action/recommendation
    ▪ Uncertainty: propagates through model
  • 6. Active Learning: case studies and patterns
  • 7. Machine learning
    supervised ML:
    ▪ take a dataset where each element has a label
    ▪ train models on a portion of the data to predict the labels, then evaluate on the holdout
    ▪ deep learning is a popular example, but only if you have lots of labeled training data available
  • 8. Machine learning
    unsupervised ML:
    ▪ run lots of unlabeled data through an algorithm to detect “structure” or embedding
    ▪ for example, clustering algorithms such as K-means
    ▪ unsupervised approaches for AI are an open research question
  • 9. Active learning
    special case of semi-supervised ML:
    ▪ send difficult decisions/edge cases to experts; let algorithms handle routine decisions (automation)
    ▪ works well in use cases which have lots of inexpensive, unlabeled data
    ▪ e.g., abundance of content to be classified, where cost of labeling is a major expense
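The split between routine decisions and edge cases can be sketched as a confidence threshold. This is only an illustration, not the method used in the talk: `route_by_confidence`, `toy_predict`, and the 0.8 threshold are hypothetical names and values, and the toy predictor stands in for a trained classifier.

```python
from typing import Callable, List, Tuple


def route_by_confidence(
    items: List[str],
    predict: Callable[[str], Tuple[str, float]],
    threshold: float = 0.8,
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split items into machine-labeled pairs and a queue for human experts."""
    automated: List[Tuple[str, str]] = []
    expert_queue: List[str] = []
    for item in items:
        label, confidence = predict(item)
        if confidence >= threshold:
            automated.append((item, label))  # routine decision: automate
        else:
            expert_queue.append(item)  # edge case: send to an expert
    return automated, expert_queue


def toy_predict(text: str) -> Tuple[str, float]:
    """Hypothetical stand-in for a trained classifier (label, confidence)."""
    if "invoice" in text:
        return "finance", 0.95
    if "react" in text:
        return "programming", 0.55  # ambiguous term, low confidence
    return "other", 0.9


automated, expert_queue = route_by_confidence(
    ["invoice for March", "how to react in interviews", "lunch menu"],
    toy_predict,
)
```

Only the ambiguous item lands in `expert_queue`; labeling cost is spent where the model is least sure.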
  • 11. Design pattern: Active learning
    Real-World Active Learning: Applications and Strategies for Human-in-the-Loop ML
    Ted Cuzzillo
    O’Reilly Media (2015-02-05)
    Active learning and transfer learning
    Luke Biewald, CrowdFlower
    The AI Conf, SF (2017-09-17)
    breakthroughs lag invention of methods; must wait for “killer data set” to emerge, often a decade or more
  • 12. Design pattern: Weak supervision
    Creating large training data sets quickly
    Alex Ratner, Stanford
    O’Reilly Data Show (2017-06-08)
    Snorkel: using weak supervision and data programming as another instance of human-in-the-loop
    github.com/HazyResearch/snorkel
    conferences.oreilly.com/strata/strata-ny/public/schedule/detail/61849
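The idea behind data programming can be sketched with a few hand-written labeling functions whose votes are combined. This sketch does not use Snorkel's API, and it swaps in a plain majority vote where Snorkel learns a model of labeling-function accuracy. The functions, keywords, and labels are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Optional

ABSTAIN = None  # a labeling function may decline to vote


def lf_mentions_javascript(text: str) -> Optional[str]:
    return "library" if "javascript" in text.lower() else ABSTAIN


def lf_mentions_component(text: str) -> Optional[str]:
    return "library" if "component" in text.lower() else ABSTAIN


def lf_mentions_interview(text: str) -> Optional[str]:
    return "behavior" if "interview" in text.lower() else ABSTAIN


LABELING_FUNCTIONS: List[Callable[[str], Optional[str]]] = [
    lf_mentions_javascript,
    lf_mentions_component,
    lf_mentions_interview,
]


def weak_label(text: str, lfs=LABELING_FUNCTIONS) -> Optional[str]:
    """Majority vote over non-abstaining labeling functions; None if no winner."""
    votes = [vote for vote in (lf(text) for lf in lfs) if vote is not ABSTAIN]
    if not votes:
        return None  # nobody voted: leave unlabeled
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: a candidate for human review
    return ranked[0][0]
```

Items that come back as `None` (no votes, or a tie) are natural candidates to hand to a human expert.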
  • 13. Design pattern: Human-in-the-loop
    Paul English on Lola's Debut for Business Travelers
    Elizabeth West
    Business Travel News (2017-10-04)
    founded 2015 by Paul English and other Kayak execs: on-demand, personal travel service; uses expert travel agents for HITL
    initially criticized by travel industry as “competing against Siri”; currently displacing OTAs in a reversal of “AI vs. jobs”
    can book on Airbnb, Southwest, etc., which aren’t available via OTA, because of the human delegation
    “The first time you use Lola it’s going to be great because it’s a conversation. We’re not making you think like a computer”
    “Instead of showing you 300 choices or 1,000 choices, we think we can show you three choices, kind of good, better, best”
  • 14. Design pattern: Human-in-the-loop
    Anand Kulkarni, Crowdbotics
    HITL for code+test gen, trained from GitHub, StackOverflow, etc., with JIRA tickets as the granular object in the system
    parse specs from JIRA history, reuse what’s been done before; generate PRs for popular web stacks: React, Flask, Ruby, etc.
    resolve specs into the approach needed and time required, where product managers get cost estimates, then on-demand expert programmers implement for you
    have the in-house engineers handle “radically novel” projects
    results: 1.5x software dev throughput
  • 15. Design pattern: Human-in-the-loop
    Building a business that combines human experts and data science
    Eric Colson, StitchFix
    O’Reilly Data Show (2016-01-28)
    “what machines can’t do are things around cognition, things that have to do with ambient information, or appreciation of aesthetics, or even the ability to relate to another human”
  • 16. Design pattern: Human-in-the-loop
    EY, Deloitte And PwC Embrace Artificial Intelligence For Tax And Accounting
    Adelyn Zhou
    Forbes (2017-11-14)
    compliance use cases in reviewing lease accounting standards
    3x more consistent and 2x more efficient than the previous humans-only teams
    break-even ROI in less than a year
  • 17. Design pattern: Human-in-the-loop
    Unsupervised fuzzy labeling using deep learning to improve anomaly detection
    Adam Gibson, Skymind
    Strata Data Conf, Singapore (2017-12-07)
    large-scale use case for telecom in Asia
    method: overfit variational autoencoders, then send outliers to human analysts
  • 18. Design pattern: Human-in-the-loop
    Strategies for integrating people and machine learning in online systems
    Jason Laska, Clara Labs
    The AI Conf, NY (2017-06-29)
    establishing a two-sided marketplace where machines and people compete on a spectrum of relative expertise and capabilities
  • 19. Design pattern: Human-in-the-loop
    “the trick is to design systems from Day 1 which learn implicitly from the intelligence which is already there”
    Michael Akilian, Clara Labs
  • 20. Design pattern: Human-in-the-loop
    Building human-assisted AI applications
    Adam Marcus, B12
    O’Reilly Data Show (2016-08-25)
    “Humans where they’re best, machines for the rest.”
    Orchestra: a platform for building human-assisted AI applications, e.g., create/update business websites
    https://github.com/b12io/orchestra
    example: http://www.coloradopicked.com/
  • 21. Design pattern: Flash teams
    Expert Crowdsourcing with Flash Teams
    Daniela Retelny, et al.
    Stanford HCI
    UIST (2014-10-05)
    computationally-guided teams of crowd experts supported by lightweight, reproducible, scalable team structures
    “elastic recruiting”: grow and shrink teams on demand, combine teams into larger organizations
    http://stanfordhci.github.io/flash-teams/
  • 23. AI in Media
    ▪ content which can be represented as text can be parsed by NLP, then manipulated by available AI tooling
    ▪ labeled images get really interesting
    ▪ text or images within a context have inherent structure
    ▪ representation of that kind of structure is rare in the Media vertical, so far
  • 24. Disambiguating contexts
    Overlapping contexts pose hard problems in natural language understanding.
    That runs counter to the correlation emphasis of big data.
    NLP libraries lack features for disambiguation.
  • 25. Disambiguating contexts
    Suppose someone publishes a book which uses the term `react`: are they talking about a JavaScript library, or about human behavior during interviews? Our customers ask for both.
    We handle lots of content about both. Disambiguating those contexts is important for good UX in personalized learning.
    In other words, how do machines help people distinguish that content within search?
    Potentially a good case for deep learning, except for the lack of labeled data at scale.
  • 26. Active learning through Jupyter
    Jupyter notebooks are used to manage ML pipelines for disambiguation, where machines and people collaborate:
    ▪ ML based on examples: almost all of the feature engineering, model parameters, etc., has been automated
    ▪ https://github.com/ceteri/nbtransom
    ▪ based on use of nbformat, pandas, scikit-learn
  • 27. Active learning through Jupyter
    Jupyter notebook as…
    ▪ one part configuration file
    ▪ one part data sample
    ▪ one part structured log
    ▪ one part data visualization tool
    plus, subsequent data mining of these notebooks helps augment our ontology
  • 28. Active learning through Jupyter
    [Diagram: ML pipelines, Jupyter kernel, browser, SSH tunnel]
  • 29. Active learning through Jupyter
    ▪ Notebooks allow the human experts to access the internals of a mostly automated ML pipeline, rapidly
    ▪ Stated another way, both the machines and the people become collaborators on shared documents
    ▪ Anticipates upcoming collaborative document features in JupyterLab
  • 30. Active learning through Jupyter
    1. Experts use notebooks to provide examples of book chapters, video segments, etc., for each key phrase that has overlapping contexts
    2. Machines build ensemble ML models based on those examples, updating notebooks with model evaluation
    3. Machines attempt to annotate labels for millions of pieces of content, e.g., `AlphaGo`, `Golang`, versus a mundane use of the verb `go`
    4. Disambiguation can run mostly automated, in parallel at scale, through integration with Apache Spark
    5. In cases where ensembles disagree, ML pipelines defer to human experts who make judgement calls, providing further examples
    6. New examples go into training ML pipelines to build better models
    7. Rinse, lather, repeat
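One round of steps 3 to 6 above can be sketched as follows. This is a minimal illustration under stated assumptions, not the nbtransom implementation: the three keyword "models", `ask_expert`, and the `min_agreement` parameter are hypothetical stand-ins for trained ensemble members and a notebook-based expert review.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Model = Callable[[str], str]


def ensemble_vote(models: Sequence[Model], text: str) -> Tuple[str, float]:
    """Return the most common label and the fraction of models voting for it."""
    votes = Counter(model(text) for model in models)
    label, count = votes.most_common(1)[0]
    return label, count / len(models)


def annotate(
    models: Sequence[Model],
    corpus: List[str],
    ask_expert: Callable[[str], str],
    min_agreement: float = 1.0,
) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Label a corpus; defer to the expert whenever the ensemble disagrees."""
    labels: Dict[str, str] = {}
    new_examples: List[Tuple[str, str]] = []
    for text in corpus:
        label, agreement = ensemble_vote(models, text)
        if agreement < min_agreement:
            label = ask_expert(text)  # human judgement call
            new_examples.append((text, label))  # feeds the next training round
        labels[text] = label
    return labels, new_examples


# Hypothetical keyword "models" standing in for trained ensemble members.
def model_a(t: str) -> str:
    return "golang" if "compile" in t or "goroutine" in t else "verb"


def model_b(t: str) -> str:
    return "golang" if "goroutine" in t or "package" in t else "verb"


def model_c(t: str) -> str:
    if "board" in t:
        return "alphago"
    return "golang" if "goroutine" in t else "verb"


labels, new_examples = annotate(
    [model_a, model_b, model_c],
    ["goroutines make go fast", "let's go home", "go package for board games"],
    ask_expert=lambda text: "golang",  # stand-in for a human expert
)
```

In a real pipeline, `new_examples` would be appended to the training set before the ensemble is rebuilt, which closes the loop described in steps 6 and 7.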
  • 32. Product management
    The History and Evolution of Product Management
    Martin Eriksson
    Mind the Product (2015-10-28)
    From PM’s origins as “Brand Men”, on through the success arc of Hewlett-Packard, on to Agile Manifesto, Lean Enterprise, etc.
    Formerly part of Engineering or Marketing, PM now “taking a seat at the table” under CEOs
  • 33. Conway’s Law
    How Do Committees Invent?
    Melvin Conway
    Datamation (1968-04)
    Organizations that create systems produce designs which copy their own communication structures.
    For each level of delegation, someone’s scope of inquiry narrows, design alternatives also narrow, until a system is simple enough to be understood in human terms.
  • 34. Conway’s Law illustrated
    Organizational Charts
    Manu Cornet, Bonkers World
    Cognitive biases:
    ▪ anthropocentrism
    ▪ system justification
    In retrospect, Agile Manifesto contains examples
    See related descriptions:
    Destruction and Creation
    John R. Boyd, USAF
    (1976-09-03)
  • 35. First-order cybernetics
    Cybernetics: Or Control and Communication in the Animal and the Machine
    Norbert Wiener, MIT
    MIT Press (1948)
    early work had been about closed-loop control systems: homeostasis, habituation, adaptation, and other regulatory processes
    given a system which has input and output, a controller leveraging a negative feedback loop, and one or more observers outside of the system
    related to the early Macy Conferences
  • 36. “the organism was no longer an input/output machine; rather it was part of a loop from perception to action and back again to perception”
    Paul Pangaro describing Jerry Lettvin @ MIT cybernetics
  • 37. Second-order cybernetics
    1. von Foerster: one can apply the understandings developed in cybernetics to the subject matter itself
    2. presence of the observer is inevitable and may be desirable: “What is said is said to an observer”
    3. eigen functions: stable, dynamically self-perpetuating states that are self-referential: “We construct our realities” per constructivism
    4. autopoiesis: a living entity exists as a network of components, recursively producing itself, realizing its boundaries; it grows and maintains itself by reference to itself
    5. feedback loops represent conversations, from which the participants cannot be detached
    6. an essentially ethical understanding
    7. a productive interaction between theory and practice, in which each supports the other
  • 38. Second-order cybernetics
    second-order cybernetics lays a foundation for AI: it’s about the semantic relations of conversations within a system; quite apt for leveraging NLP, active learning, etc., when you have semi-structured dialog
  • 39. Second-order cybernetics
    Autopoiesis and Cognition: The Realization of the Living
    Humberto Maturana, Francisco Varela
    Kluwer (1980 / original 1972)
    Understanding Computers and Cognition: A New Foundation for Design
    Terry Winograd, Fernando Flores
    Intellect Books (1986)
    Conversations for Action and Collected Essays
    Fernando Flores
    Createspace (2013)
  • 40. Second-order cybernetics
    ▪ biology informing computer science
    ▪ historical context of Project Cybersyn
    ▪ autopoiesis and cognition
    ▪ organizational closure: “self-making means stability”
    ▪ speech acts (e.g., social analysis of open source)
    ▪ IMO, blueprints for AI systems
    Also, the focus on “information as a collection of facts” is yet another form of cognitive bias, instilled through 30+ years of data warehouse practices, where data must fit into dimensions, facts, schema
  • 41. Active Learning: theory, practices, community
  • 42. HITL theory: choosing what to learn
    Active Learning Literature Survey
    Burr Settles, UW Madison
    (2010-01-26)
    Can machines learn more economically if they ask human “oracles” questions? e.g., task in-house experts with the edge cases?
    ▪ uncertainty sampling: query about instances which ML is least certain how to label (least confidence / margin / entropy)
    ▪ query-by-committee: ensemble of ML models votes; query the instance about which they disagree most
    ▪ expected error reduction: maximize the expected information gain of the query
    ▪ variance reduction: minimize future generalization error of the model (e.g., loss function)
    ▪ density-weighted methods: instances which are both uncertain and “representative” of the underlying distribution
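The three uncertainty-sampling measures named above can be written down directly from a model's predicted class probabilities. A minimal sketch follows; the function names, the pool of documents, and their probabilities are invented for illustration. Each score is oriented so that a higher value means a more uncertain instance.

```python
import math
from typing import Callable, Dict, Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """1 minus the probability of the most likely label."""
    return 1.0 - max(probs)


def small_margin(probs: Sequence[float]) -> float:
    """1 minus the gap between the two most likely labels."""
    top = sorted(probs, reverse=True)
    return 1.0 - (top[0] - top[1])


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of the predicted distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def most_uncertain(
    pool: Dict[str, Sequence[float]],
    score: Callable[[Sequence[float]], float],
) -> str:
    """Pick the unlabeled instance to send to the human oracle."""
    return max(pool, key=lambda name: score(pool[name]))


# Hypothetical predicted probabilities over three classes.
pool = {
    "doc_a": [0.90, 0.05, 0.05],
    "doc_b": [0.50, 0.45, 0.05],
    "doc_c": [0.40, 0.30, 0.30],
}

query_lc = most_uncertain(pool, least_confidence)
query_margin = most_uncertain(pool, small_margin)
query_entropy = most_uncertain(pool, entropy)
```

On this pool the measures do not agree: least confidence and entropy select `doc_c`, while margin selects `doc_b`, whose top two labels are nearly tied. The choice of measure changes which questions the experts get asked.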
  • 43. HITL practices: emerging themes
    while ML was mostly about generalization, now we can borrow from Frank Knight (1921): using ML models to explore uncertainty in relationship to profit vs. risk
    ▪ distinguish forms of uncertainty: aleatoric (noise) vs. epistemic (incomplete model)
    ▪ see also: meta-learning [1] and [2]
    ▪ people who aren’t ML experts should be able to train and iterate robust models using examples
    ▪ emphasize use of fitness functions to make decisions, in lieu of objective functions which tend to rely on overly simplified KPIs
  • 44. HITL practices: model interpretation
    explicability of ML models becomes essential, must be intuitive for the human experts involved: Skater, and also Anchors, SHAP, STREAK, LIME, etc.
    The Building Blocks of Interpretability
    Chris Olah, et al., Google Brain
    Distill (2018-03-06)
    Challenges for Transparency
    Adrian Weller
    WHI (2017-07-29)
    The Mythos of Model Interpretability
    Zachary Lipton
    WHI (2016-03-06)
  • 45. Interpreting Machine Learning Models
    Wed Mar 28 | 10-11 am Pacific
    datascience.com/resources/webinars/interpreting-machine-learning-models
    live webinar: we’ll discuss the need for methods which make the process of explaining machine learning models more intuitive, and also evaluate myths about model interpretability, from both research and business perspectives.
    Pramit Choudhary, Lead Data Scientist, DataScience.com
    Sameer Singh, CS Assistant Professor, UC Irvine
    Paco Nathan, Dir, Learning Group, O'Reilly Media
  • 46. HITL resources: conferences, journals, etc.
    HILDA 2018
    Workshop on Human-In-the-Loop Data Analytics
    Co-located with SIGMOD 2018
    June in Houston
    Collective Intelligence 2018
    University of Zurich, Switzerland
    co-located with AAAI HCOMP 2018
    July in Zurich
    HCOMP in Slack
    https://hcomp.slack.com/
    Human Computation journal
    http://hcjournal.org/ojs/index.php?journal=jhc
  • 47. HITL tooling: active learning
    Agnostic Active Learning Without Constraints
    Alina Beygelzimer, Daniel Hsu, John Langford, Tong Zhang
    NIPS (2010-06-14)
    The End of the Beginning of Active Learning
    Daniel Hsu, John Langford
    Hunch.net (2011-04-20)
    https://github.com/JohnLangford/vowpal_wabbit/wiki
    focused on cases where labeling is expensive; uses importance weighted active learning; handles “adversarial label noise”
    as good as or better than supervised ML, wherever supervised ML works
  • 48. HITL tooling: machine teaching
    Prodigy: a new tool for radically efficient machine teaching
    Matthew Honnibal, Ines Montani
    Explosion.ai (2017)
  • 49. Management strategy: before
    In general with Big Data, we were considering:
    ▪ DAG workflow execution, typically linear
    ▪ data-driven organizations
    ▪ ML based on optimizing for objective functions
    ▪ general considerations about correlation vs. causation
    ▪ avoid “garbage in, garbage out”
    [Figure: Jarvis workflow]
  • 50. Management strategy: after
    HITL introduces circularities:
    ▪ deprecate linear input/output systems as the “conventional wisdom”
    ▪ analogous to an OODA loop which incorporates automation/augmentation
    ▪ recognize multiple feedback loops as conversations for action
    ▪ recognize opportunity: loops from perception (e.g., DL) to action (e.g., HITL) and back again to perception
    ▪ design systems to learn implicitly from the intelligence already there
    ▪ hint: recognize the “verbs” being used, rather than over-emphasizing “nouns”
    [Diagram: Organizational Learning, connecting Customers, Human Experts, and ML Models. Edge labels: Customers request Sales, Marketing, Service, Training; Experts learn through Customer interactions; Experts decide about edge cases, providing examples; Experts gain insights via Model explanations; Models focus Experts (e.g., weak supervision); Models act on decisions when possible; Models explore uncertainty when needed. Other labels: Examples, Actions; Customer Use Cases]
  • 51. Management strategy: no-collar workforce
    No-collar workforce: Humans and machines in one loop
    Anthony Abbatiello, Tim Boehm, Jeff Schwartz, Sharon Chand
    Deloitte Insights (2017-12-05)
    ▪ near-future: human workers and machines complement each other’s efforts in a single loop of productivity
    ▪ 2018-20: expect firms to embrace a “no-collar workforce” trend by redesigning jobs
    ▪ yet only ~17% ready to manage a workforce in which people, robots, and AI work side by side, largely due to cultural, tech fluency, regulatory issues
    ▪ e.g., what about onboarding or retiring non-human workers? these are no longer theoretical questions
    ▪ HR orgs must develop strategies and tools for recruiting, managing, and training a hybrid workforce
  • 53. Extrapolating trends
    Conference summaries, Oct 2017 part 1
    PN (2017-10-10)
    Themes emerging in AI conferences about the impact of ML on software process, i.e., something’s afoot:
    2009-ish, data science ran headlong into prod mgmt
    2012-ish, data sci leaders moved into prod exec roles
    2018-ish, AI apps disrupting prod mgmt
    …
  • 54. Flywheel Effect, circa 2018
    AI drives features in products and services …
    which in turn drives cloud consumption …
    which in turn acquires even more data …
    particularly for mobile or embedded products
    Incumbents now lead in AI + cloud + mobile/embed: Google, Amazon, Microsoft, IBM, Apple, Baidu, etc.
  • 55. Challenge: adoption by industry segment
    segment: Google, Amazon, Microsoft, IBM, Apple, Baidu, etc.
      assets:
      ▪ AI + cloud + mobile/embed, leveraging a flywheel effect
      ▪ had focused business lines well in advance to prepare large-scale labeled data sets
      ▪ uses AI to explore uncertainty, focusing their core expertise
      liabilities:
      ▪ high capital expenses, long-term R&D as hardware evolves rapidly
      ▪ potential vulnerabilities by automating too much
      ▪ potential vulnerabilities by mistaking first-order cybernetics for second-order
    segment: < 50%
      assets:
      ▪ HITL provides a vector to compete against top incumbents, with many unexplored areas of opportunity
      liabilities:
      ▪ facing barriers: talent gap, competing investment priorities, security concerns
      ▪ verticals eroded by horizontal business lines from top incumbents
    segment: > 50%
      assets:
      ▪ ??
      liabilities:
      ▪ struggling to recognize business use cases
      ▪ buried in tech debt from digital infrastructure
      ▪ lacks management support
  • 56. What is changing and why?
    Second-order cybernetics began partly as a study of how complex systems fail, and also about what social systems and physical systems had in common
    It provides foundations for AI systems of people + machines
    Feedback loops represent structured conversations for action, from which the participants cannot be detached
    The organization is no longer viewed as an input/output machine; rather it’s a pluralistic network of loops from perception to action and back again to perception, e.g., DL augments perception and RL augments actions
  • 57. What is changing and why?
    In other words, as the flywheel effect itself is evolving, to stay ahead we must recognize the emerging “verbs”, which are entry points into the business use cases
  • 58. What do organizations carry into AI?
    Assess the cognitive biases we bring into AI systems of people + machines:
    ▪ anthropocentrism and system justification, as shown by Conway’s Law
    ▪ DW + BI cultural lens overemphasizes “information as a collection of facts”, missing the conversations for action
    ▪ digitalization sequence “Product”, “Service”, “Data”: overreacting to the nouns (facts), while ignoring the verbs (relations)
    ▪ delegation + committee: narrowing the scope of inquiry and design alternatives until a system is simple enough to understand in human terms
    ▪ some incumbents hold tenaciously to ML apps within first-order cybernetics, i.e., bias toward mostly top-down command and control
    Instead, we must design systems that learn implicitly from the intelligence already within an organization and its relationships with the customers, channels, etc.: Sales, Customer Support, Professional Services, Marketing
  • 59. What do organizations carry into AI?
    Could we be encountering early stages of not-only-human cognition attempting to optimize beyond human predispositions and cognitive biases?
    [ed note: say at least one strange thing]
  • 60. “The future belongs to those who understand at a very deep level how to combine their unique expertise with what algorithms do best.”
    Pedro Domingos, The Master Algorithm
  • 61. The AI Conf: CN, Apr 10-13; NY, Apr 29-May 2; SF, Sep 4-7; UK, Oct 8-11
    Strata Data: UK, May 21-24; NY, Sep 11-14; SF, Mar 26-28
    JupyterCon + events: BOS, Mar 21; ATL, Mar 31; DC, May 15; NY, Aug 21-25
    OSCON: PDX, Jul 16-19
  • 62. Get Started with NLP in Python
    Just Enough Math
    Building Data Science Teams
    Hylbert-Speys
    How Do You Learn?
    articles, online courses, conference summaries…
    liber118.com/pxn/
    @pacoid