Ellen Friedman, Principal Technologist
Big Data London, 14 November 2018

7 Successful Habits for Data-Intensive Applications in Production
2 © 2018 MapR Technologies, Inc.
Contact Information
Ellen Friedman, PhD
Principal Technologist, MapR Technologies
Committer, Apache Drill & Apache Mahout projects
O’Reilly author
Email: efriedman@mapr.com, ellenf@apache.org
Twitter: @Ellen_Friedman

Big Value from Big Data in Production across Diverse Businesses
•  Telecommunications
•  Smart-Metered Utilities
•  Heavy Industry
•  Financial
•  Transportation
•  Agritech
Images © Friedman & Dunning. Image courtesy Mtell, used with permission.

“The future is already here – it’s just not evenly distributed.”
- William Gibson

Value from Data-Intensive Applications in Production
Some organizations are more successful than others at getting large-scale systems into production:
•  A 2018 Gartner report stated that only 17% of Hadoop-based systems were in production
vs
•  Over 90% of MapR customers have large-scale production systems
Why?

1. Data may be in production long before code is complete

Real World Example: Biological Samples Contain Data
Labs in Canada froze blood samples for years in case they contained valuable information.
•  Modern genetic techniques revealed key disease data
•  Correlated with outcomes for the donor patients
•  The data was preserved before the analysis was even
Image © 2003 Ellen Friedman

Are you asking the right question?
Which marketing email is most effective?
•  Looks like C (blue) is best when you base the question on data for the highest hourly response rate at the current point in time, tnow
•  But this is misleading. Why?
[Figure: hourly clicks at tnow for emails A, B and C]

Better way to frame the question:
Which marketing email is most effective based on response rate at time t after launch?
•  Collect and retain data across the same time interval relative to launch
•  Looks like B (green) is the best performer and C (blue) is least effective for response rate
[Figure: click rate vs. time after launch for emails A, B and C, measured at time t after each launch, compared with click rate at tnow]
Measuring performance at a constant time after launch gives a consistent comparison.

Better way to frame the question:
Which marketing email is most effective based on response rate at time t after launch?
But that’s hard after we have overwritten the old data with current data.
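
The difference between the two ways of asking the question can be sketched in a few lines of Python. This is only an illustration, assuming click events are retained as (email, hour) pairs instead of being overwritten by a running "current" count. The launch times, the click counts and the function names `clicks_in_clock_hour` and `clicks_in_hour_after_launch` are all hypothetical.

```python
from collections import Counter

# Hypothetical launch times, in hours on a shared clock.
LAUNCH = {"A": 0, "B": 10, "C": 20}

# Hypothetical retained click events: (email, hour on the shared clock).
# Each email gets most of its clicks soon after launch, then tails off.
CLICKS = (
    [("A", 0)] * 50 + [("A", 1)] * 30 + [("A", 2)] * 20 + [("A", 22)] * 2
    + [("B", 10)] * 80 + [("B", 11)] * 60 + [("B", 12)] * 40 + [("B", 22)] * 5
    + [("C", 20)] * 30 + [("C", 21)] * 20 + [("C", 22)] * 10
)


def clicks_in_clock_hour(clicks, hour):
    """Clicks per email during one hour of the shared clock (the tnow view)."""
    return Counter(email for email, t in clicks if t == hour)


def clicks_in_hour_after_launch(clicks, launch, offset):
    """Clicks per email during the hour `offset` hours after that email's own launch."""
    return Counter(email for email, t in clicks if t - launch[email] == offset)


at_tnow = clicks_in_clock_hour(CLICKS, 22)
at_t = clicks_in_hour_after_launch(CLICKS, LAUNCH, 2)

print("at tnow = 22:", dict(at_tnow))
print("2 hours after launch:", dict(at_t))
```

With these made-up numbers, C leads at tnow simply because it launched most recently, while B leads and C trails when every email is measured two hours after its own launch. The second query is only possible because the older events were kept.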

Spot the Difference?

GET photo.jpg HTTP/1.1
Host: lh4.googleusercontent
User-agent: Mozilla/5.0 (Ma
Accept: image/png,image/*
Accept-language: en-US,en
Accept-encoding: gzip, defl
Referer: https://www.google
Connection: keep-alive
If-none-match: "v9"
Cache-control: max-age=0

GET cc/borken.json HTTP/1.1
host: c.qrs.my
user-agent: Mozilla/4.0 (co
accept: application/json, t
accept-language: en-US,en
accept-encoding: gzip, defl
referer: none
connection: keep-alive
if-none-match: "v9"
cache-control: max-age=0

[Panel labels: Attacker request, Real request]

Domain Knowledge Matters: Detecting Security Attacks
A security expert at a bank preserved the headers of web site requests.
This revealed an anomaly in the headers of the attackers’ requests vs normal (real) requests.
But how would you know what data to preserve?
•  The attackers’ pattern of behavior was allowable for headers
•  It was not predictable, but it was different
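
One way to look for this kind of difference in preserved headers is sketched below in Python. The slides do not say which header feature gave the attackers away, so this sketch assumes, purely for illustration, that the order and capitalisation of header names is the signal. HTTP header names are case-insensitive, so an unusual capitalisation is allowable and would not be rejected. The function names, the threshold `max_share` and the sample requests are hypothetical.

```python
from collections import Counter


def header_signature(raw_request):
    """Return the header names of a raw HTTP request, in order and exactly as written."""
    lines = raw_request.strip().splitlines()[1:]  # skip the request line
    return tuple(line.split(":", 1)[0] for line in lines if ":" in line)


def rare_signatures(raw_requests, max_share=0.05):
    """Return signatures that account for at most `max_share` of the retained requests."""
    counts = Counter(header_signature(r) for r in raw_requests)
    total = sum(counts.values())
    return {sig for sig, n in counts.items() if n / total <= max_share}


# Hypothetical retained traffic: 99 requests in one header style, 1 in another.
COMMON = "GET /a HTTP/1.1\nHost: example.test\nUser-Agent: X\nAccept: */*\n"
ODD = "GET /b HTTP/1.1\nhost: example.test\nuser-agent: X\naccept: */*\n"
retained = [COMMON] * 99 + [ODD]

flagged = rare_signatures(retained)
print(flagged)
```

Nothing here predicts what an attack looks like. It only surfaces requests that differ from the bulk of traffic, and it can only do that if the raw headers were kept in the first place.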
  	
  

2. End-to-end design for production

The missing bits usually aren’t in the application itself
Do you have a good fit between the application, its SLAs and the practical business goal?
Is there a way to take action on the output of the application?
•  Note: a report is not an action
Do you have real advance planning for production?
•  Note: slapping an SLA onto a complex application & tossing it to IT with hazily defined critical tasks is not production planning

Remember: IT doesn’t have a magic wand…

Build a Data Fabric
Flexibility & agility to respond as life changes

A Global Data Fabric: Edge to On-Premises to Cloud
Data where you want it, compute power where you need it.

Comprehensive View of Data vs Isolated Services
Commonality
Isolation
Both are needed, for different purposes. Best practice achieves a balance.

3. Orchestration of applications and orchestration of data

Containerization: flexibility, convenience, predictability
Containerized applications run in different environments on the same cluster at the same time.
Kubernetes is emerging as the leader in orchestration of containerized applications:
•  You specify what needs to be done; Kubernetes arranges it by running containers
•  Kubernetes allows access to services by name

Biggest Challenge with Kubernetes is Data Persistence
CNCF reported:
•  nearly 70% of organizations surveyed use Kubernetes to manage containers
•  but the #1 issue for Kubernetes users is storage
You don’t want to store state in containers – it defeats flexibility.
How do you get the benefits of containerization without being limited to stateless applications?

[Figure: App 1, App 2 and App 3 on Kubernetes, with rpc, stream and LogFile connections, above a Data Platform layer]
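
The idea of keeping state out of the container can be sketched as follows. This is a minimal Python illustration, assuming the data platform is exposed to the container as an ordinary directory path. The function `process_batch`, the file name `total.json` and the temporary directory that stands in for a platform volume are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def process_batch(state_dir, batch):
    """Add a batch to a running total stored under state_dir, not in the process."""
    state_file = Path(state_dir) / "total.json"
    total = json.loads(state_file.read_text())["total"] if state_file.exists() else 0
    total += sum(batch)
    state_file.write_text(json.dumps({"total": total}))
    return total


# A temporary directory stands in for a volume provided by the data platform.
with tempfile.TemporaryDirectory() as platform_volume:
    first = process_batch(platform_volume, [1, 2, 3])  # one "container"
    second = process_batch(platform_volume, [4, 5])    # a replacement "container"
    print(first, second)
```

Each call behaves like a freshly started container: it holds no memory of earlier work, yet the second call continues from the first because the total lives outside the process.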

A Powerful Combination
Kubernetes for orchestration of applications
+
Dataware for orchestration of data

4. Simplicity is golden.
You should not need an army to administer a cluster.

If your system has lots of work-arounds, think again…

Data Platform Should Simplify
Much of the logistics and many processes should be handled by the platform, not by developers at the application level:
•  Separation of concerns for system administrators vs developers/data scientists
•  More efficient, less risk of error
•  Ease of administration

Orchestrate Data with Dataware
[Figure: Legacy Applications, Big Data 1.0 Applications and Next-Gen Applications on the MapR Converged Data Platform. Capabilities shown: High Availability, Real Time, Unified Security, Multi-tenancy, Disaster Recovery, Global Namespace. Components shown: Real-Time NoSQL Database, Stream Transport, Web-Scale Storage]
MapR is more than just data storage – it’s like Kubernetes for data.
MapR is dataware.

MapR Volume: Directory with Special Management Capabilities
[Figure: a cluster with a volume mount point containing directories, files, streams and a table]
Volumes are used for easy control of access, multitenancy, data locality & DR.

Is your system easy to administer?
MapR can run huge numbers of applications on the same cluster:
•  Great performance, optimized resources
•  Comprehensive view of data & better collaboration
•  Ease of administration
We’ve seen a large retail customer manage a cluster of >1000 nodes with just 3 administrators.

Multicloud & Hybrid Cloud Strategy with MapR:
•  Unified Security Model
•  Data access decoupled from physical storage location. Globally.
•  No lock-in to proprietary APIs
•  Full openness
•  Data made portable
[Figure: an application with open APIs and an API connector on top of global data management spanning edge, private cloud, on premise and public clouds]
Uniform computing environment everywhere

5. Build real multitenancy

MapR Volumes: Easy Data Management
•  Multiple volumes span a cluster
•  Files, tables, streams in same volume
•  Fine-grained control of who has access
•  Basis for mirroring, snapshots
•  Great advantage for multitenancy

MapR Volumes: Control Data Locality
•  Place data on specialized hardware (such as GPUs)
•  Meet compliance requirements
•  Better optimization of resources
•  Great advantage for AI/machine learning

Does your system avoid unnecessary sprawl?
A simpler system is more cost effective, with better performance.
•  Remember: every HBase commit requires a round trip to the namenode if run on HDFS. MapR has no namenode, which avoids the problem.
•  It isn’t magic. It’s just real multitenancy.
We’ve seen a customer collapse 17 HBase clusters in AWS to 1 five-node cluster running HBase on MapR, with great performance.

6. Streaming architecture provides flexibility

Stream Transport to Decouple Producers & Consumers
[Figure: producers (P) write to a transport layer, Kafka / MapR Streams, and consumers (C) read from it for processing]
“Streaming Microservices” by Ted Dunning & Ellen Friedman, in Encyclopedia of Big Data Technologies, Sherif Sakr and Albert Zomaya, editors, © 2018 (Springer International Publishing)
ebook Streaming Architecture by Ted Dunning & Ellen Friedman © 2016 (O’Reilly Media), chapter 3:
https://mapr.com/ebooks/streaming-architecture/chapter-03-streaming-platform-for-microservices.html
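
The decoupling that a stream transport provides can be illustrated with a toy in-memory log. This Python sketch is not the Kafka or MapR Streams API. The class `Stream`, its methods and the consumer names are hypothetical, and it assumes only that messages persist in order and that each consumer tracks its own position.

```python
class Stream:
    """A toy append-only log; each consumer tracks its own read offset."""

    def __init__(self):
        self._log = []
        self._offsets = {}

    def publish(self, message):
        """Append a message; the producer knows nothing about consumers."""
        self._log.append(message)

    def poll(self, consumer):
        """Return messages this consumer has not yet seen and advance its offset."""
        start = self._offsets.get(consumer, 0)
        self._offsets[consumer] = len(self._log)
        return self._log[start:]


stream = Stream()
stream.publish("reading-1")
stream.publish("reading-2")

analytics = stream.poll("analytics")       # sees both messages so far
stream.publish("reading-3")
audit = stream.poll("audit")               # added later, still sees all three
analytics_next = stream.poll("analytics")  # sees only the new message
```

Because the producer only appends to the log, a new consumer such as the hypothetical `audit` can be added later and still read everything, without any change to the producer or to existing consumers.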
  
	
  	
  	
  

Stream-first Architecture Supports Microservices
[Figure: medical tests feed a stream of medical test results to consumers A, B and C; labels include real-time analytics, EMR, patient facilities management and insurance audit]
With the right messaging tool you support multiple classes of use cases.

Rendezvous Architecture is based on streaming microservices
[Figure: Input and Raw data, a Decoy writing to an Archive, models m1, m2 and m3 with Features / profiles, Scores, and a Rendezvous step that produces Results]
Rendezvous Architecture described in:
- Machine Learning Logistics book by Ted Dunning & Ellen Friedman, 2018 (O’Reilly)
- “Rendezvous Architecture” by Ted Dunning & Ellen Friedman, chapter in Encyclopedia of Big Data Technologies. Sherif Sakr and Albert Zomaya, editors. Springer International Publishing, 2018.
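
A much simplified Python sketch of the flow in the diagram follows. The slide shows only the component names, so the behaviour here is an assumption: every model scores the same input, a decoy records the input without scoring it, and the rendezvous step returns the score from the most preferred model that has answered. The functions, the three toy models and the preference order are hypothetical.

```python
def rendezvous(scores, preference):
    """Pick the score from the most preferred model that produced a result."""
    for model in preference:
        if model in scores:
            return model, scores[model]
    raise LookupError("no model returned a score")


def decoy(archive, request):
    """Record the input exactly as the models saw it; returns no score."""
    archive.append(request)


# Hypothetical models: every model reads the same input.
models = {
    "m1": lambda x: x * 2,
    "m2": lambda x: x + 100,
    "m3": lambda x: -x,
}

archive = []
request = 21
decoy(archive, request)

# Pretend m1 missed the deadline, so only m2 and m3 report scores.
scores = {name: fn(request) for name, fn in models.items() if name != "m1"}

chosen = rendezvous(scores, preference=["m1", "m2", "m3"])
print(chosen)
```

In this sketch the caller still gets an answer when the preferred model is slow, and the archived input can be replayed later to evaluate a new model against exactly what production saw.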
  	
  
	
  	
  

7. Build a data-aware culture.
That also gives you a production-ready culture.

DataOps: Brings Flexibility & Focus
•  Cross-functional teams cut across skill guilds
•  Better communication & shared goal keep effort focused and flexible
[Figure: Project 1, Project 2 and Project 3 mapped against skill guilds (Developer, Data Scientist / Data Engineer, Ops, Network, Site Reliability, QA, Security) and layers (Kubernetes, Data platform, Storage); legend: required team skill, occasional help]
Refs:
“DevOps, DataOps & Machine Learning” by Terry McCann
http://www.hyperbi.co.uk/devops-dataops-and-machine-learning/
“How to Manage a DataOps Team” by E. Friedman in RTInsights
https://www.rtinsights.com/how-manage-dataops-team/

AI & Analytics in Production: How to Make It Work
By Ted Dunning & Ellen Friedman
Download FREE pdf of this book courtesy MapR:
https://mapr.com/ebook/ai-and-analytics-in-production/

Please support women in tech – help build girls’ dreams of what they can accomplish
#womenintech #datawomen
© Ellen Friedman 2015

Thank you!

 
NRB MAINFRAME DAY 02 - Gamal Khaldi - NRB Mainframe YtD recap and outlook 201...
NRB MAINFRAME DAY 02 - Gamal Khaldi - NRB Mainframe YtD recap and outlook 201...NRB MAINFRAME DAY 02 - Gamal Khaldi - NRB Mainframe YtD recap and outlook 201...
NRB MAINFRAME DAY 02 - Gamal Khaldi - NRB Mainframe YtD recap and outlook 201...
 
IoT & Data Analytics Sharing Session - Telkomsigma
IoT & Data Analytics Sharing Session - TelkomsigmaIoT & Data Analytics Sharing Session - Telkomsigma
IoT & Data Analytics Sharing Session - Telkomsigma
 
Network Centric Cloud: Competing in a IT World with a Telecom Approach
Network Centric Cloud: Competing in a IT World with a Telecom ApproachNetwork Centric Cloud: Competing in a IT World with a Telecom Approach
Network Centric Cloud: Competing in a IT World with a Telecom Approach
 
DevOps as a competitive advantage
DevOps as a competitive advantageDevOps as a competitive advantage
DevOps as a competitive advantage
 

Dernier

Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfblazblazml
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...Jack Cole
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxSimranPal17
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxTasha Penwell
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelBoston Institute of Analytics
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaManalVerma4
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data VisualizationKianJazayeri1
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxHimangsuNath
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 

Dernier (20)

Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptx
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptx
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 

7 Habits for Big Data in Production - keynote Big Data London Nov 2018

  • 1. ELLEN FRIEDMAN, Principal Technologist. BIG DATA LONDON, 14 November 2018. 7 Successful Habits for Data-Intensive Applications in Production
  • 2. © 2018 MapR Technologies, Inc. Contact Information: Ellen Friedman, PhD, Principal Technologist, MapR Technologies; Committer, Apache Drill & Apache Mahout projects; O’Reilly author. Email: efriedman@mapr.com, ellenf@apache.org. Twitter: @Ellen_Friedman
  • 3. Big Value from Big Data in Production across Diverse Businesses: Telecommunications; Smart-Metered Utilities; Heavy Industry; Financial; Transportation; Agritech. Images © Friedman & Dunning. Image courtesy Mtell, used with permission.
  • 4. “The future is already here – it’s just not evenly distributed.” - William Gibson
  • 5. Value from Data-Intensive Applications in Production. Some organizations more successful at getting large scale systems into production • 2018 Gartner report stated only 17% of Hadoop-based systems were in production, vs • Over 90% of MapR customers have large scale production systems. Why?
  • 6. Habit 1: Data may be in production long before code is complete
  • 7. Real World Example: Biological Samples Contain Data. Labs in Canada froze blood samples for years in case they contain valuable information. • Modern genetic techniques revealed key disease data • Correlated with outcomes for the donor patients • The data was preserved before the analysis was even. Image © 2003 Ellen Friedman
  • 8. Are you asking the right question? Which marketing email is most effective? • Looks like C – Blue is best when you base the question on data for highest hourly response rate at a current point in time tnow • But this is misleading. Why? [Chart: hourly clicks at tnow for emails A, B, C]
  • 9. Better way to frame the question: Which marketing email is most effective based on response rate at time t after launch? • Collect and retain data across same time interval relative to launch • Looks like B – Green is best performer and C – Blue is least effective for response rate. [Chart: click rate vs. time after launch for emails A, B, C, measured at time t after launch rather than at tnow] Measuring performance at a constant time after launch gives consistent comparison
  • 10. Better way to frame the question: Which marketing email is most effective based on response rate at time t after launch? • Collect and retain data across same time interval relative to launch • Looks like B – Green is best performer and C – Blue is least effective for response rate. [Chart: click rate vs. time after launch for emails A, B, C, measured at time t after launch rather than at tnow] But that’s hard after we have overwritten the old data with current data
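The comparison on slides 8 to 10 can be illustrated with a minimal Python sketch. It assumes the raw click events were retained rather than overwritten; the launch times, the click events, and all function names below are hypothetical and chosen only to reproduce the pattern the slides describe (C looks best at tnow, B is best at a fixed time after launch).

```python
# Hypothetical launch times (hours since an arbitrary start) per email campaign.
LAUNCH = {"A": 0, "B": 10, "C": 20}

# Hypothetical retained click events: (campaign, click time in hours).
CLICKS = [
    ("A", 1), ("A", 1), ("A", 22),
    ("B", 11), ("B", 11), ("B", 11), ("B", 22),
    ("C", 21), ("C", 22), ("C", 22),
]


def clicks_in_hour(clicks, campaign, hour_start):
    """Count clicks for one campaign in the window [hour_start, hour_start + 1)."""
    return sum(1 for c, t in clicks if c == campaign and hour_start <= t < hour_start + 1)


def rate_at_now(clicks, launch, t_now):
    """Naive question: hourly clicks for every campaign at the same wall-clock time."""
    return {c: clicks_in_hour(clicks, c, t_now) for c in launch}


def rate_after_launch(clicks, launch, offset):
    """Better question: hourly clicks at the same offset after each campaign's own launch."""
    return {c: clicks_in_hour(clicks, c, launch[c] + offset) for c in launch}


now = rate_at_now(CLICKS, LAUNCH, 22)
aligned = rate_after_launch(CLICKS, LAUNCH, 1)
print("at tnow:", now)
print("1 hour after launch:", aligned)
```

The second function only works because every event keeps its own timestamp. If the store held just the latest hourly count per campaign, the aligned comparison could not be computed after the fact.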
  • 11. Spot the Difference?
      Real request:
        GET photo.jpg HTTP/1.1
        Host: lh4.googleusercontent
        User-agent: Mozilla/5.0 (Ma
        Accept: image/png,image/*
        Accept-language: en-US,en
        Accept-encoding: gzip, defl
        Referer: https://www.google
        Connection: keep-alive
        If-none-match: "v9"
        Cache-control: max-age=0
      Attacker request:
        GET cc/borken.json HTTP/1.1
        host: c.qrs.my
        user-agent: Mozilla/4.0 (co
        accept: application/json, t
        accept-language: en-US,en
        accept-encoding: gzip, defl
        referer: none
        connection: keep-alive
        if-none-match: "v9"
        cache-control: max-age=0
  • 13. Domain Knowledge Matters: Detecting Security Attacks. Security expert at a bank preserved headers for web site requests. Detected anomaly in headers for the attackers vs normal (real) requests. But how would you know what data to preserve? • Pattern of behavior for attackers was allowable for headers • It was not predictable: but it was different
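One way to picture the header comparison on slides 11 to 13 is the Python sketch below. The slides do not say which header feature the bank's security expert actually used, so this is an assumption for illustration: it profiles the exact spelling of header names in preserved known-good requests and flags spellings never seen before. HTTP header names are case-insensitive, so the lowercase variant is allowable yet still different. The sample requests and the `example.invalid` host are placeholders.

```python
def header_names(raw_request):
    """Return header names exactly as sent (case preserved), skipping the request line."""
    names = []
    for line in raw_request.strip().splitlines()[1:]:
        name, sep, _ = line.partition(":")
        if sep:
            names.append(name.strip())
    return names


def build_profile(normal_requests):
    """Collect every header-name spelling seen in preserved known-good traffic."""
    profile = set()
    for raw in normal_requests:
        profile.update(header_names(raw))
    return profile


def unusual_headers(raw_request, profile):
    """Header names whose exact spelling never appeared in the profile."""
    return [n for n in header_names(raw_request) if n not in profile]


# Placeholder requests modeled loosely on the slide; values are hypothetical.
REAL = """GET photo.jpg HTTP/1.1
Host: example.invalid
User-agent: Mozilla/5.0
Accept: image/png
Connection: keep-alive"""

SUSPECT = """GET cc/data.json HTTP/1.1
host: example.invalid
user-agent: Mozilla/4.0
accept: application/json
connection: keep-alive"""

profile = build_profile([REAL])
real_flags = unusual_headers(REAL, profile)
suspect_flags = unusual_headers(SUSPECT, profile)
print(real_flags)
print(suspect_flags)
```

A profile like this can only be built if the raw headers were preserved in the first place, which is the point of habit 1.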
  • 14. Habit 2: End-to-end design for production
  • 15. The missing bits usually aren’t in the application itself. Do you have a good fit between application, its SLAs and practical business goal? Is there a way to take action on the output of the application? Note: a report is not an action. Do you have real advance planning for production? Note: slapping an SLA onto a complex application & tossing it to IT with hazily defined critical tasks is not production planning.
  • 16. Remember: IT doesn’t have a magic wand…
  • 18. Build a Data Fabric. Flexibility & agility to respond as life changes
  • 19. A Global Data Fabric: Edge to On-Premises to Cloud. Data where you want it, compute power where you need it.
  • 20. Comprehensive View of Data vs Isolated Services. Commonality; Isolation. Both are needed, for different purposes. Best practice achieves a balance.
  • 21. Habit 3: Orchestration of applications and orchestration of data
  • 22. Containerization: flexibility, convenience, predictability. Containerized applications run in different environments on same cluster at same time. Kubernetes is emerging as the leader in orchestration of containerized applications • You specify what needs to be done, Kubernetes arranges it by running containers • Kubernetes allows access to services by name
  • 23. Biggest Challenge with Kubernetes is Data Persistence. CNCF reported • nearly 70% of organizations surveyed use Kubernetes to manage containers • but the #1 issue for Kubernetes users is storage. You don’t want to store state in containers – defeats flexibility
  • 24. Biggest Challenge with Kubernetes is Data Persistence. CNCF reported • nearly 70% of organizations surveyed use Kubernetes to manage containers • but the #1 issue for Kubernetes users is storage. You don’t want to store state in containers – defeats flexibility. How do you get the benefits of containerization without being limited to stateless applications?
  • 25. [Diagram labels: App 1; App 2; App 3; Kubernetes]
  • 26. [Diagram labels: App 1; App 2; App 3; Kubernetes; rpc; stream; LogFile]
  • 27. [Diagram labels: App 1; App 2; App 3; Kubernetes; rpc; stream; LogFile; Data Platform]
  • 28. A Powerful Combination: Kubernetes for orchestration of applications + Dataware for orchestration of data
  • 29. Habit 4: Simplicity is golden. You should not need an army to administer a cluster.
  • 30. If your system has lots of work-arounds, think again…
  • 32. Data Platform Should Simplify. Much of logistics and many processes should be handled by the platform, not by developers at the application level • Separation of concerns for system administrators vs developers/data scientists • More efficient, less risk of error • Ease of administration
  • 33. Orchestrate Data with Dataware. Legacy Applications; Big Data 1.0 Applications; Next-Gen Applications. MapR Converged Data Platform: High Availability; Real Time; Unified Security; Multi-tenancy; Disaster Recovery; Global Namespace; Real-Time NoSQL Database; Stream Transport; Web-Scale Storage. MapR is more than just data storage – it’s like Kubernetes for data. MapR is dataware.
  • 34. MapR Volume: Directory with Special Management Capabilities. [Diagram labels: Cluster; Volume mount point; Directories; Files; Streams; Table] Volumes used for easy control of access, multitenancy, data locality & DR
  • 35. Is your system easy to administer? MapR can run huge numbers of applications on same cluster • Great performance, optimized resources • Comprehensive view of data & better collaboration • Ease of administration. We’ve seen a large retail customer manage a cluster of >1000 nodes with just 3 administrators
  • 36. Multicloud & Hybrid Cloud Strategy with MapR. Application; Open APIs; API Connector; GLOBAL DATA MANAGEMENT • Unified Security Model • Data access decoupled from physical storage location. Globally. • No lock-in to proprietary APIs • Full openness • Data made portable. [Diagram labels: Edge; Private Cloud; On Premise; Public Cloud] Uniform computing environment everywhere
  • 37. Habit 5: Build real multitenancy
  • 38. MapR Volumes: Easy Data Management • Multiple volumes span a cluster • Files, tables, streams in same volume • Fine-grained control of who has access • Basis for mirroring, snapshots • Great advantage for multitenancy
  • 39. MapR Volumes: Control Data Locality • Place data on specialized hardware (such as GPUs) • Meet compliance requirements • Better optimization of resources • Great advantage for AI/machine learning
  • 40. Does your system avoid unnecessary sprawl? A simpler system is more cost effective, with better performance • Remember: Every HBase commit requires a round trip to the namenode if run on HDFS. MapR has no namenode – avoids the problem • It isn’t magic. It’s just real multitenancy. We’ve seen a customer collapse 17 HBase clusters in AWS to 1 cluster running five nodes HBase on MapR with great performance
  • 41. Habit 6: Streaming architecture provides flexibility
  • 42. Stream Transport to Decouple Producers & Consumers. [Diagram labels: P, P, P; C, C, C; Transport; Processing; Kafka / MapR Streams] Refs: “Streaming Microservices” by Ted Dunning & Ellen Friedman, in Encyclopedia of Big Data Technologies, Sherif Sakr and Albert Zomaya, editors, © 2018 (Springer International Publishing); ebook Streaming Architecture by Ted Dunning & Ellen Friedman © 2016 (O’Reilly Media), chapter 3: https://mapr.com/ebooks/streaming-architecture/chapter-03-streaming-platform-for-microservices.html
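The decoupling named on slide 42 can be sketched with a toy in-memory log in Python. This is not the Kafka or MapR Streams API; the `Topic` and `Consumer` classes are hypothetical stand-ins that show only the core idea: producers append to a shared log, and each consumer tracks its own read position, so neither side needs to know about the other.

```python
class Topic:
    """Toy append-only log standing in for a stream topic (not a real client API)."""

    def __init__(self):
        self._log = []

    def publish(self, message):
        """Append a message and return its offset in the log."""
        self._log.append(message)
        return len(self._log) - 1

    def read(self, offset, max_messages=10):
        """Return up to max_messages starting at offset, without removing anything."""
        return self._log[offset:offset + max_messages]


class Consumer:
    """Each consumer keeps its own offset, so consumers do not affect one another."""

    def __init__(self, topic):
        self.topic = topic
        self.offset = 0

    def poll(self, max_messages=10):
        batch = self.topic.read(self.offset, max_messages)
        self.offset += len(batch)
        return batch


topic = Topic()
for reading in ["test-1", "test-2", "test-3"]:
    topic.publish(reading)  # the producer never references any consumer

analytics = Consumer(topic)
audit = Consumer(topic)  # a consumer added later still starts from offset 0

first = analytics.poll(max_messages=2)
everything = audit.poll()
rest = analytics.poll()
print(first, everything, rest)
```

Because messages stay in the log after being read, a new consumer can be added without changing the producer, which is the flexibility the slide title refers to.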
  • 43. Stream-first Architecture Supports Microservices. [Diagram labels: Medical tests; Medical test results; A, B, C; Real-time analytics; EMR; Patient Facilities management; Insurance audit] With the right messaging tool you support multiple classes of use cases
  • 44. Rendezvous Architecture is based on streaming microservices. [Diagram labels: Input; Raw; Decoy; Archive; Features / profiles; m1, m2, m3; Scores; Rendezvous; Results] Rendezvous Architecture described in: - Machine Learning Logistics book by Ted Dunning & Ellen Friedman, 2018 (O’Reilly) - “Rendezvous Architecture” by Ted Dunning & Ellen Friedman, chapter in Encyclopedia of Big Data Technologies. Sherif Sakr and Albert Zomaya, editors. Springer International Publishing, 2018.
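The slide 44 diagram shows several models (m1, m2, m3) scoring the same input, with a rendezvous step producing the result and a decoy feeding an archive. The Python sketch below is a heavily simplified, in-process illustration of that shape. The selection policy (take the score from the most-preferred model that answered) and all names are assumptions for illustration, and the streams and timing that the real architecture relies on are omitted.

```python
def rendezvous(scores, preference):
    """Return (model name, score) from the most-preferred model that produced a score."""
    for model in preference:
        if model in scores:
            return model, scores[model]
    return None


def handle_request(request, models, preference, archive):
    """Send one request to every model, archive the input, and pick one result."""
    archive.append(request)  # decoy role: record the exact input, return no score
    scores = {}
    for name, model in models.items():
        try:
            scores[name] = model(request)
        except Exception:
            continue  # a failed model simply contributes no score
    return rendezvous(scores, preference)


# Hypothetical models; m1 fails to show the fallback to the next preference.
def m1(request):
    raise TimeoutError("m1 did not answer")


def m2(request):
    return request["amount"] * 2


def m3(request):
    return request["amount"] * 3


archive = []
models = {"m1": m1, "m2": m2, "m3": m3}
result = handle_request({"amount": 10}, models, ["m1", "m2", "m3"], archive)
print(result)
```

Since every model sees every request, a new model can be evaluated on live inputs before it is preferred, and the archived inputs make later comparison possible.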
  • 45. Habit 7: Build a data-aware culture. That also gives you a production-ready culture.
  • 46. DataOps: Brings Flexibility & Focus • Cross-functional teams cut across skill guilds • Better communication & shared goal keep effort focused and flexible. [Diagram labels: Project 1, Project 2, Project 3; Developer; Data Scientist / Data Engineer; Ops; Network; Site Reliability; QA; Security; Kubernetes; Data platform; Storage; legend: Required team skill, Occasional help] Refs: “DevOps, DataOps & Machine Learning” by Terry McCann, http://www.hyperbi.co.uk/devops-dataops-and-machine-learning/; “How to Manage a DataOps Team” by E. Friedman in RTInsights, https://www.rtinsights.com/how-manage-dataops-team/
  • 47. AI & Analytics in Production: How to Make It Work, by Ted Dunning & Ellen Friedman. Download FREE pdf of this book courtesy MapR: https://mapr.com/ebook/ai-and-analytics-in-production/
  • 48. Please support women in tech – help build girls’ dreams of what they can accomplish. © Ellen Friedman 2015. #womenintech #datawomen
  • 49. Thank you!
  • 50. Contact Information: Ellen Friedman, PhD, Principal Technologist, MapR Technologies; Committer, Apache Drill & Apache Mahout projects; O’Reilly author. Email: efriedman@mapr.com, ellenf@apache.org. Twitter: @Ellen_Friedman