Apache Hadoop and the Big Data Opportunity in Banking
A webcast from Hortonworks & Tresata

© Hortonworks Inc. 2011 © tresata all rights reserved
  
Presenters
•  Arun C. Murthy
   − Co-founder of Hortonworks
   − Lead of NextGen MapReduce in Apache Hadoop
   − Long-time contributor and committer to Apache Hadoop

•  Abhishek (Abhi) Mehta
   − Co-founder of Tresata
   − Creator of first Hadoop-powered big data & analytics platform for financial industry data
  
Agenda
•  What is Apache Hadoop?
•  Creating Value from Big Data
•  Tresata and Hadoop for Banking
•  Future of Hadoop
  
What is Apache Hadoop?

A set of open source projects owned by the Apache Software Foundation that transforms commodity computers and network into a distributed service

•  HDFS – Stores petabytes of data reliably
•  MapReduce – Allows huge distributed computations

Key Attributes
•  Reliable and redundant – Doesn’t slow down or lose data even as hardware fails
•  Very powerful – Harnesses huge clusters, supports best of breed analytics
•  Scalable – scales linearly to handle “big data” volumes
•  Cost-effective – runs on commodity machines & network
•  Simple and flexible APIs – enabling a large ecosystem of solution providers
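
The MapReduce idea named on this slide can be illustrated with a minimal, single-process sketch. This is not the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample lines are assumptions made for illustration, and a real cluster would run the map and reduce steps in parallel across many machines with the input held in HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group emitted values by key (done by the framework on a cluster)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over an in-memory input."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical input; on a cluster each line could come from a different block.
    sample = ["big data in banking", "big data on commodity machines"]
    print(word_count(sample))
```

Because each map call sees only one line and each reduce call sees only one key, the work can in principle be split across as many machines as there are input blocks or keys, which is the property behind the "scales linearly" claim above.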
  
Core Apache Hadoop Projects

[Diagram: the Hadoop stack by layer]
•  Programming Languages: Pig (Data Flow), Hive (SQL)
•  Computation: MapReduce (Distributed Programming Framework)
•  Table Storage: HCatalog (Meta Data), HBase (Columnar Storage)
•  Object Storage: HDFS (Hadoop Distributed File System)
•  Alongside the layers: Zookeeper (Coordination), HMS (Management)
•  Legend: Core Apache Hadoop, Related Apache Projects
  
Apache Hadoop: Why is it Transformational?

Data Deluge (growth faster than Moore’s law)
•  Economist: Only 5% of generated data is structured
•  Gartner: Data growth is the biggest data center hardware infrastructure challenge for large enterprises
•  Forrester: Four Vs - Volume, Velocity, Variety, Variability
•  Hundreds of exabytes of data per year!

Source: IDC Digital Universe Study, May 2011
  
Apache Hadoop: Why is it Transformational?

New way of thinking about your data:

•  Current
   − What data do I keep?
   − What reports do I run?
   − Sample – Store – Extrapolate → What-If scenarios → Variable insights

•  New dawn
   − Store everything (viable and economical) – Process whenever!
   − Test every what-if situation, prove every hypothesis… do it in a timely manner!
   − No more down-sampled data
  
5 Ways to Create Value from Big Data

Source: McKinsey & Company report. Big data: The next frontier for innovation, competition, and productivity. May 2011.
  
Crossing the Chasm
Disruption: Data

Geoffrey A. Moore*
  
Typical Applications & Early Adopters

[Word cloud of applications]
•  data analytics
•  analyzing web logs
•  advertising optimization
•  machine learning
•  mail anti-spam
•  text mining
•  web search
•  content optimization
•  customer trend analysis
•  ad selection
•  video & audio processing
•  data mining
•  user interest prediction
•  social media
  
Big Data Has Reached Every Market Sector
Digital data is personal, everywhere, increasingly accessible, and will continue to grow exponentially

Source: McKinsey & Company report. Big data: The next frontier for innovation, competition, and productivity. May 2011.
  
Big Data Value Creation Opportunities

Financial Services
•  Detect fraud
•  Model and manage risk
•  Improve debt recovery rates
•  Personalize banking/insurance products

Healthcare
•  Optimal treatment pathways
•  Remote patient monitoring
•  Predictive modeling for new drugs
•  Personalized medicine

Retail
•  In-store behavior analysis
•  Cross selling
•  Optimize pricing, placement, design
•  Optimize inventory and distribution

Web / Social / Mobile
•  Location-based marketing
•  Social segmentation
•  Sentiment analysis
•  Price comparison services

Manufacturing
•  Design to value
•  Crowd-sourcing
•  “Digital factory” for lean manufacturing
•  Improve service via product sensor data

Government
•  Reduce fraud
•  Segment populations, customize action
•  Support open data initiatives
•  Automate decision making
  
Why can you bank on Hortonworks?
•  Architects of Hadoop since big bang, circa 2006
•  Real world experience supporting the largest Hadoop install in the world:
   − 50,000 node footprint
   − Over 200PB of data
   − 24x7 service
   − Billions of ad dollars
•  We’ve taken the 3am calls to fix stuff when it breaks!
  
Tresata = Hadoop for Banking
  
What We Believe
1.  Data will reboot the financial service industry
2.  Data growth is viral…existing tools can’t keep up
3.  The economics to store, process, analyze and visualize all of your data makes it a ‘no-brainer’ to do so
4.  Big Data capabilities are needed to address business problems
  
and What McKinsey Said

[Exhibit, Q4 2011: Big data sidebar on sector productivity, Exhibit 1 of 1]
The ease of capturing big data’s value, and the magnitude of its potential, vary across sectors.
Example: US economy. Size of bubble indicates relative contribution to GDP.
•  Vertical axis: Big data: ease-of-capture index¹ (Low to High)
•  Horizontal axis: Big data: value potential index¹ (Low to High)
•  Sectors plotted: Utilities; Health care providers; Computers and other electronic products; Natural resources; Information; Manufacturing; Finance and insurance; Professional services; Transportation and warehousing; Real estate; Accommodation and food; Management of companies; Construction; Wholesale trade; Administrative services; Retail trade; Other services; Educational services; Government; Arts and entertainment

Excerpts from the accompanying text:
•  “… sharpen their data skills, even low-ranking sectors (by our gauges of value potential and data capture), such as construction and education, could see their fortunes change.”
•  “… the accessibility of data sources, and the degree to which managers make data-driven decisions.”

¹ For detailed explication of metrics, see appendix in McKinsey Global Institute full report Big data: The next frontier for innovation, competition, and productivity, available free of charge online at mckinsey.com/mgi.
Source: US Bureau of Labor Statistics; McKinsey Global Institute analysis

Source: McKinsey Global Institute, October 2011
  
The Opportunity
1.  1-5% of data in a financial institution is analyzed
2.  Not all data is stored
3.  Top-down macros cannot be implemented or acted on
4.  New approaches to data & analytics are essential

Source: tresata research
  
The Business Problems
1.  Storage & retrieval – archive data on disk
2.  Consumer behavior analysis – as it happens
3.  Modeling & Analytics – model entire data sets…
4.  Single View Of Customer – structured & unstructured data
  
Hadoop in Banking Today
1.  Early adopter, like any other technology trend
 a.  Interest is global, not just limited to the US
 b.  Approved for use by most technology teams & architects
 c.  Proofs of concept ongoing at most financial services institutions

2.  Broad agreement that:
 a.  Hadoop will be the Big Data operating system
 b.  A Hadoop-powered data processing & analytics platform will spur rapid adoption
 c.  Need to apply to business problems with revenue/profit impact
  
Elephants in the Room
1.  Resources
 a.  Talent – how many MapReduce developers can we get
 b.  Applications – data, analytics and business problems are the same, why should each institution build their own Hadoop applications to run the same processes
 c.  Essentials – how to manage security, provisioning, & performance

2.  Implementation
 a.  Integration – fit with existing technology infrastructures & business processes
 b.  Support – experience with managing thousands of nodes/servers
 c.  Open Source – ‘out of the box’ applicability
  
How Tresata Changes the Game…
1.  Our Application
    a.  FS for Hadoop – fully built on Hadoop. Store, process, analyze and visualize leveraging the full power of Hadoop
    b.  Processing Pipeline – automated ingestion, cleaning, de-duping, matching engine built for financial data
    c.  Analytics – massively parallel scoring and algorithm containers codified by business problem

2.  Tame the elephants (making it work at scale)
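
The cleaning, de-duping and matching stages named above can be sketched in a few lines. This is a toy illustration only, not Tresata's implementation: the record fields (`name`, `account`, `segment`), the normalisation rules and the exact-match join are all assumptions, and a production matching engine for financial data would need far more careful rules.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, str]  # hypothetical flat record: field name -> text value


def clean(record: Record) -> Record:
    """Cleaning: trim, collapse runs of whitespace and lowercase every value."""
    return {k: " ".join(v.split()).lower() for k, v in record.items()}


def dedupe(records: Iterable[Record], key_fields: Tuple[str, ...]) -> List[Record]:
    """De-duping: keep only the first record seen for each key."""
    seen = set()
    unique: List[Record] = []
    for rec in records:
        key = tuple(rec.get(f, "") for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def match(left: Iterable[Record], right: Iterable[Record],
          field: str) -> List[Tuple[Record, Record]]:
    """Matching: exact join of two sources on one cleaned, non-empty field."""
    index: Dict[str, List[Record]] = {}
    for rec in right:
        value = rec.get(field)
        if value:
            index.setdefault(value, []).append(rec)
    return [(lrec, rrec)
            for lrec in left if lrec.get(field)
            for rrec in index.get(lrec[field], [])]


def run_pipeline(source_a: Iterable[Record],
                 source_b: Iterable[Record]) -> List[Tuple[Record, Record]]:
    """Clean and de-dupe each source, then match the two on the name field."""
    a = dedupe((clean(r) for r in source_a), ("name", "account"))
    b = dedupe((clean(r) for r in source_b), ("name",))
    return match(a, b, "name")
```

Each stage works record by record or key by key, so the same steps could be expressed as map and reduce jobs when the inputs no longer fit on one machine.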
  
Tresata Case Study
 A.  Client Business Problem
     i.  Problem – Process data and score for >30 MM client applications
     ii.  Data Sources – 23 separate data sources, multiple time series
     iii.  Raw Variables – 100 variables per client per data source
     iv.  Current state - expensive legacy platform, algorithms developed on sub-samples, unable to scale algorithms to full data set

 B.  Tresata Solution
     i.  Data Engine – automated data import, cleaning, matching, scoring
     ii.  Compute Engine – algorithms process & score >30MM in minutes
     iii.  Integration – work with existing tools and processes
     iv.  Scalable deployment – Big Data as a Service delivery
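
A quick back-of-the-envelope calculation shows why sub-sampling was tempting on the legacy platform. The client, source and variable counts come from the slide; the 8 bytes per value is an assumption for illustration, and the result is a lower bound because the slide says ">30 MM" and mentions multiple time series, which would multiply the total further.

```python
CLIENTS = 30_000_000        # ">30 MM" on the slide, so this is a lower bound
DATA_SOURCES = 23           # separate data sources
VARIABLES_PER_SOURCE = 100  # raw variables per client per data source
BYTES_PER_VALUE = 8         # assumption: one 64-bit number per raw value


def raw_values(clients: int, sources: int, variables: int) -> int:
    """Number of raw values for a single snapshot of every client."""
    return clients * sources * variables


def raw_bytes(values: int, bytes_per_value: int = BYTES_PER_VALUE) -> int:
    """Storage for those values under the assumed fixed width per value."""
    return values * bytes_per_value


if __name__ == "__main__":
    per_client = DATA_SOURCES * VARIABLES_PER_SOURCE
    total = raw_values(CLIENTS, DATA_SOURCES, VARIABLES_PER_SOURCE)
    print(f"{per_client:,} raw variables per client")
    print(f"{total:,} raw values per snapshot")
    print(f"{raw_bytes(total) / 1e9:,.0f} GB per snapshot at {BYTES_PER_VALUE} bytes each")
```

Under these assumptions a single snapshot is 2,300 variables per client and 69 billion raw values, before any time-series history is added.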
  
Tresata Big Data Application
  
Tresata Analytics
  
Tresata + Hortonworks
1.  Commitment to train
 a.  Series of webinars
 b.  Hadoop training tailored for financial services
 c.  Dedicated training programs (tailored for client needs)

2.  Commitment to support
 a.  Production scale support model (Tresata certified for Financial Svcs)
 b.  Meet and/or exceed industry standards on distro
  
Where Hadoop is Going
  
The Future
•  Hadoop is going mainstream
  − Amazon, Microsoft, Oracle, EMC, NetApp, etc.

•  Apache Hadoop is covering new ground (Hadoop .Next)
  − Much of the development led by Hortonworks
  − More scale, more performance
  − Other paradigms than MapReduce for data processing
  − Enhanced operability and management (Ambari)
  − Metadata management (HCatalog)
  
Technology Roadmap

Hadoop.Now – Making Apache Hadoop Accessible (Q4 2011)
•  Release the most widely deployed version of Hadoop ever (0.20.205)
•  Release directly usable code via Apache (RPMs, .debs…)
•  Frequent sustaining releases off of the stable branches

Hadoop.Next – Next Generation Apache Hadoop (2012, Alphas starting late 2011)
•  Address key product gaps (HBase support, HA, Management…)
•  Enable community & partner innovation via modular architecture & open APIs
•  Work with community to define integrated stack
  
Next Steps
•  Engage with Hortonworks & Tresata
  − www.hortonworks.com
  − www.tresata.com
•  Additional Webcast Series
  − Reference Architecture for Hadoop in Financial Services - Nov 15
  − Hadoop in Financial Services DEEP DIVE - Dec 6
  − www.hortonworks.com/webcasts
•  Hortonworks party @ Hadoop World
  − Tuesday Nov 8 at 7pm
  − Inc Lounge at the Time Hotel
  − RSVP: hortonworks.eventbrite.com
  
Questions?
  

Contenu connexe

Tendances

Essential Metadata Strategies
Essential Metadata StrategiesEssential Metadata Strategies
Essential Metadata StrategiesDATAVERSITY
 
RWE & Patient Analytics Leveraging Databricks – A Use Case
RWE & Patient Analytics Leveraging Databricks – A Use CaseRWE & Patient Analytics Leveraging Databricks – A Use Case
RWE & Patient Analytics Leveraging Databricks – A Use CaseDatabricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
DAMA Australia: How to Choose a Data Management Tool
DAMA Australia: How to Choose a Data Management ToolDAMA Australia: How to Choose a Data Management Tool
DAMA Australia: How to Choose a Data Management ToolPrecisely
 
Security Enhanced PostgreSQL - System-wide consistency in access control
Security Enhanced PostgreSQL - System-wide consistency in access controlSecurity Enhanced PostgreSQL - System-wide consistency in access control
Security Enhanced PostgreSQL - System-wide consistency in access controlKohei KaiGai
 
Modern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemModern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemJames Serra
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake OverviewJames Serra
 
Dell Technologies - The Complete ISG Hardware Portfolio
Dell Technologies - The Complete ISG Hardware PortfolioDell Technologies - The Complete ISG Hardware Portfolio
Dell Technologies - The Complete ISG Hardware PortfolioDell Technologies
 
Data center design standards for cabinet and floor loading
Data center design standards for cabinet and floor loadingData center design standards for cabinet and floor loading
Data center design standards for cabinet and floor loadingkotatsu
 
VDI Performance Assessment
VDI Performance AssessmentVDI Performance Assessment
VDI Performance AssessmenteG Innovations
 
How to Take Advantage of an Enterprise Data Warehouse in the Cloud
How to Take Advantage of an Enterprise Data Warehouse in the CloudHow to Take Advantage of an Enterprise Data Warehouse in the Cloud
How to Take Advantage of an Enterprise Data Warehouse in the CloudDenodo
 
Building Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta LakeBuilding Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta LakeDatabricks
 
Data Warehouse - Incremental Migration to the Cloud
Data Warehouse - Incremental Migration to the CloudData Warehouse - Incremental Migration to the Cloud
Data Warehouse - Incremental Migration to the CloudMichael Rainey
 
High Performance Computing
High Performance ComputingHigh Performance Computing
High Performance ComputingDell World
 
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...Amazon Web Services
 
Enterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshEnterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshSion Smith
 
High Performance Computing - The Future is Here
High Performance Computing - The Future is HereHigh Performance Computing - The Future is Here
High Performance Computing - The Future is HereMartin Hamilton
 
Technical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdfTechnical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdfIlham31574
 

Tendances (20)

Essential Metadata Strategies
Essential Metadata StrategiesEssential Metadata Strategies
Essential Metadata Strategies
 
RWE & Patient Analytics Leveraging Databricks – A Use Case
RWE & Patient Analytics Leveraging Databricks – A Use CaseRWE & Patient Analytics Leveraging Databricks – A Use Case
RWE & Patient Analytics Leveraging Databricks – A Use Case
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
DAMA Australia: How to Choose a Data Management Tool
DAMA Australia: How to Choose a Data Management ToolDAMA Australia: How to Choose a Data Management Tool
DAMA Australia: How to Choose a Data Management Tool
 
Security Enhanced PostgreSQL - System-wide consistency in access control
Security Enhanced PostgreSQL - System-wide consistency in access controlSecurity Enhanced PostgreSQL - System-wide consistency in access control
Security Enhanced PostgreSQL - System-wide consistency in access control
 
Modern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemModern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform System
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
Dell Technologies - The Complete ISG Hardware Portfolio
Dell Technologies - The Complete ISG Hardware PortfolioDell Technologies - The Complete ISG Hardware Portfolio
Dell Technologies - The Complete ISG Hardware Portfolio
 
Data center design standards for cabinet and floor loading
Data center design standards for cabinet and floor loadingData center design standards for cabinet and floor loading
Data center design standards for cabinet and floor loading
 
VDI Performance Assessment
VDI Performance AssessmentVDI Performance Assessment
VDI Performance Assessment
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
 
How to Take Advantage of an Enterprise Data Warehouse in the Cloud
How to Take Advantage of an Enterprise Data Warehouse in the CloudHow to Take Advantage of an Enterprise Data Warehouse in the Cloud
How to Take Advantage of an Enterprise Data Warehouse in the Cloud
 
Building Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta LakeBuilding Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta Lake
 
Data Warehouse - Incremental Migration to the Cloud
Data Warehouse - Incremental Migration to the CloudData Warehouse - Incremental Migration to the Cloud
Data Warehouse - Incremental Migration to the Cloud
 
High Performance Computing
High Performance ComputingHigh Performance Computing
High Performance Computing
 
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...
 
Enterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshEnterprise guide to building a Data Mesh
Enterprise guide to building a Data Mesh
 
High Performance Computing - The Future is Here
High Performance Computing - The Future is HereHigh Performance Computing - The Future is Here
High Performance Computing - The Future is Here
 
Technical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdfTechnical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdf
 

Similaire à Apache Hadoop and Big Data Opportunity in Banking

Keynote from ApacheCon NA 2011
Keynote from ApacheCon NA 2011Keynote from ApacheCon NA 2011
Keynote from ApacheCon NA 2011Hortonworks
 
Hadoop for shanghai dev meetup
Hadoop for shanghai dev meetupHadoop for shanghai dev meetup
Hadoop for shanghai dev meetupRoby Chen
 
Présentation on radoop
Présentation on radoop   Présentation on radoop
Présentation on radoop siliconsudipt
 
Hadoop as data refinery
Hadoop as data refineryHadoop as data refinery
Hadoop as data refinerySteve Loughran
 
Hadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve LoughranHadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve LoughranJAX London
 
Big Data = Big Decisions
Big Data = Big DecisionsBig Data = Big Decisions
Big Data = Big DecisionsInnoTech
 
Big Data/Hadoop Infrastructure Considerations
Big Data/Hadoop Infrastructure ConsiderationsBig Data/Hadoop Infrastructure Considerations
Big Data/Hadoop Infrastructure ConsiderationsRichard McDougall
 
Big Data and Implications on Platform Architecture
Big Data and Implications on Platform ArchitectureBig Data and Implications on Platform Architecture
Big Data and Implications on Platform ArchitectureOdinot Stanislas
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 
Introducing the Big Data Ecosystem with Caserta Concepts & Talend
Introducing the Big Data Ecosystem with Caserta Concepts & TalendIntroducing the Big Data Ecosystem with Caserta Concepts & Talend
Introducing the Big Data Ecosystem with Caserta Concepts & TalendCaserta
 
Anexinet Big Data Solutions
Anexinet Big Data SolutionsAnexinet Big Data Solutions
Anexinet Big Data SolutionsMark Kromer
 
Hadoop - Now, Next and Beyond
Hadoop - Now, Next and BeyondHadoop - Now, Next and Beyond
Hadoop - Now, Next and BeyondTeradata Aster
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopPOSSCON
 
Introduction To Big Data & Hadoop
Introduction To Big Data & HadoopIntroduction To Big Data & Hadoop
Introduction To Big Data & HadoopBlackvard
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptalmaraniabwmalk
 
Ibm big data ibm marriage of hadoop and data warehousing
Ibm big dataibm marriage of hadoop and data warehousingIbm big dataibm marriage of hadoop and data warehousing
Ibm big data ibm marriage of hadoop and data warehousing DataWorks Summit
 

Apache Hadoop and Big Data Opportunity in Banking

  • 1. Apache Hadoop and the Big Data Opportunity in Banking
      A webcast from Hortonworks & Tresata
      © Hortonworks Inc. 2011   © tresata all rights reserved
  • 2. Presenters
      •  Arun C. Murthy
         − Co-founder of Hortonworks
         − Lead of NextGen MapReduce in Apache Hadoop
         − Long-time contributor and committer to Apache Hadoop
      •  Abhishek (Abhi) Mehta
         − Co-founder of Tresata
         − Creator of first Hadoop-powered big data & analytics platform for financial industry data
  • 3. Agenda
      •  What is Apache Hadoop?
      •  Creating Value from Big Data
      •  Tresata and Hadoop for Banking
      •  Future of Hadoop
  • 4. What is Apache Hadoop?
      A set of open source projects owned by the Apache Foundation that transforms commodity computers and network into a distributed service
      •  HDFS – Stores petabytes of data reliably
      •  MapReduce – Allows huge distributed computations
      Key Attributes
      •  Reliable and redundant – Doesn’t slow down or lose data even as hardware fails
      •  Very powerful – Harnesses huge clusters, supports best of breed analytics
      •  Scalable – scales linearly to handle “big data” volumes
      •  Cost-effective – runs on commodity machines & network
      •  Simple and flexible APIs – enabling a large ecosystem of solution providers
  • 5. Core Apache Hadoop Projects
      [Diagram: stack of core Apache Hadoop and related Apache projects, by layer]
      •  Programming Languages – Pig (Data Flow), Hive (SQL)
      •  Computation – MapReduce (Distributed Programming Framework)
      •  Table Storage – HCatalog (Meta Data), HBase (Columnar Storage)
      •  Object Storage – HDFS (Hadoop Distributed File System)
      •  Alongside the stack – HMS (Management), Zookeeper (Coordination)
      Legend: Core Apache Hadoop / Related Apache Projects
  • 6. Apache Hadoop: Why is it Transformational?
      Data Deluge (growth faster than Moore’s law)
      •  Economist: Only 5% of generated data is structured
      •  Gartner: Data growth is the biggest data center hardware infrastructure challenge for large enterprises
      •  Forrester: Four Vs - Volume, Velocity, Variety, Variability
      •  Hundreds of exabytes of data per year!
      Source: IDC Digital Universe Study, May 2011
  • 7. Apache Hadoop: Why is it Transformational?
      New way of thinking about your data:
      •  Current
         − What data do I keep?
         − What reports do I run?
         − Sample – Store – Extrapolate → What-If scenarios → Variable insights
      •  New dawn
         − Store everything (viable and economical) – Process whenever!
         − Test every what-if situation, prove every hypothesis… do it in a timely manner!
         − No more down-sampled data
  • 8. 5 Ways to Create Value from Big Data
      Source: McKinsey & Company report. Big data: The next frontier for innovation, competition, and productivity. May 2011.
  • 9. Crossing the Chasm
      Disruption: Data
      Geoffrey A. Moore*
  • 10. Typical Applications & Early Adopters
      [Word cloud] data analytics, analyzing web logs, advertising optimization, machine learning, mail anti-spam, text mining, web search, content optimization, customer trend analysis, ad selection, video & audio processing, data mining, user interest prediction, social media
  • 11. Big Data Has Reached Every Market Sector
      Digital data is personal, everywhere, increasingly accessible, and will continue to grow exponentially
      Source: McKinsey & Company report. Big data: The next frontier for innovation, competition, and productivity. May 2011.
  • 12. Big Data Value Creation Opportunities
      Financial Services
      •  Detect fraud
      •  Model and manage risk
      •  Improve debt recovery rates
      •  Personalize banking/insurance products
      Healthcare
      •  Optimal treatment pathways
      •  Remote patient monitoring
      •  Predictive modeling for new drugs
      •  Personalized medicine
      Retail
      •  In-store behavior analysis
      •  Cross selling
      •  Optimize pricing, placement, design
      •  Optimize inventory and distribution
      Web / Social / Mobile
      •  Location-based marketing
      •  Social segmentation
      •  Sentiment analysis
      •  Price comparison services
      Manufacturing
      •  Design to value
      •  Crowd-sourcing
      •  “Digital factory” for lean manufacturing
      •  Improve service via product sensor data
      Government
      •  Reduce fraud
      •  Segment populations, customize action
      •  Support open data initiatives
      •  Automate decision making
  • 13. Why can you bank on Hortonworks?
      •  Architects of Hadoop since big bang, circa 2006
      •  Real world experience supporting the largest Hadoop install in the world:
         − 50,000 node footprint
         − Over 200PB of data
         − 24x7 service
         − Billions of ad dollars
      •  We’ve taken the 3am calls to fix stuff when it breaks!
  • 14. Tresata = Hadoop for Banking
  • 15. What We Believe
      1.  Data will reboot the financial service industry
      2.  Data growth is viral… existing tools can’t keep up
      3.  The economics to store, process, analyze and visualize all of your data makes it a ‘no-brainer’ to do so
      4.  Big Data capabilities are needed to address business problems
  • 16. What McKinsey Said
      [Figure: clipping from a McKinsey publication, Q4 2011, “Big data sidebar on sector productivity”, Exhibit 1 of 1]
      Partially legible article text: “… accessibility of data sources, and the degree to which managers make data-driven decisions.” and “… sharpen their data skills, even the low-ranking sectors (by our gauges of value potential and data capture), such as construction and education, could see their fortunes change.”
      Exhibit caption: The ease of capturing big data’s value, and the magnitude of its potential, vary across sectors. Example: US economy. Size of bubble indicates relative contribution to GDP.
      Axes: Big data: ease-of-capture index (Low to High) versus Big data: value potential index (Low to High)
      Sectors plotted: Utilities; Health care providers; Computers and other electronic products; Natural resources; Information; Manufacturing; Finance and insurance; Professional services; Transportation and warehousing; Real estate; Accommodation and food; Management of companies; Construction; Wholesale trade; Administrative services; Retail trade; Other services; Educational services; Government; Arts and entertainment
      Exhibit footnote: For detailed explication of metrics, see appendix in McKinsey Global Institute full report Big data: The next frontier for innovation, competition, and productivity, available free of charge online at mckinsey.com/mgi. Source: US Bureau of Labor Statistics; McKinsey Global Institute analysis
      Source: McKinsey Global Institute, October 2011
  • 17. The Opportunity
      1.  1-5% of data in a financial institution is analyzed
      2.  Not all data is stored
      3.  Top-down macros cannot be implemented or acted on
      4.  New approaches to data & analytics are essential
      Source: tresata research
  • 18. The Business Problems
      1.  Storage & retrieval – archive data on disk
      2.  Consumer behavior analysis – as it happens
      3.  Modeling & Analytics – model entire data sets…
      4.  Single View Of Customer – structured & unstructured data
  • 19. Hadoop in Banking Today
      1.  Early adopter, like any other technology trend
          a.  Interest is global, not just limited to the US
          b.  Approved for use by most technology teams & architects
          c.  Proofs of concept ongoing at most financial services institutions
      2.  Broad agreement that:
          a.  Hadoop will be the Big Data operating system
          b.  A Hadoop-powered data processing & analytics platform will spur rapid adoption
          c.  Need to apply to business problems with revenue/profit impact
  • 20. Elephants in the Room
      1.  Resources
          a.  Talent – how many MapReduce developers can we get?
          b.  Applications – data, analytics and business problems are the same, so why should each institution build its own Hadoop applications to run the same processes?
          c.  Essentials – how to manage security, provisioning, & performance
      2.  Implementation
          a.  Integration – fit with existing technology infrastructures & business processes
          b.  Support – experience with managing thousands of nodes/servers
          c.  Open Source – ‘out of the box’ applicability
  • 21. How Tresata Changes the Game…
      1.  Our Application
          a.  FS for Hadoop – fully built on Hadoop. Store, process, analyze and visualize leveraging the full power of Hadoop
          b.  Processing Pipeline – automated ingestion, cleaning, de-duping, matching engine built for financial data
          c.  Analytics – massively parallel scoring and algorithm containers codified by business problem
      2.  Tame the elephants (making it work at scale)
  • 22. Tresata Case Study
      A.  Client Business Problem
          i.  Problem – Process data and score for >30 MM client applications
          ii.  Data Sources – 23 separate data sources, multiple time series
          iii.  Raw Variables – 100 variables per client per data source
          iv.  Current state – expensive legacy platform, algorithms developed on sub-samples, unable to scale algorithms to full data set
      B.  Tresata Solution
          i.  Data Engine – automated data import, cleaning, matching, scoring
          ii.  Compute Engine – algorithms process & score >30 MM in minutes
          iii.  Integration – work with existing tools and processes
          iv.  Scalable deployment – Big Data as a Service delivery
  • 23. Tresata Big Data Application
  • 25. Tresata + Hortonworks
      1.  Commitment to train
          a.  Series of webinars
          b.  Hadoop training tailored for financial services
          c.  Dedicated training programs (tailored for client needs)
      2.  Commitment to support
          a.  Production scale support model (Tresata certified for Financial Svcs)
          b.  Meet and/or exceed industry standards on distro
  • 26. Where Hadoop is Going
  • 27. The Future
      •  Hadoop is going mainstream
         − Amazon, Microsoft, Oracle, EMC, NetApp, etc.
      •  Apache Hadoop is covering new ground (Hadoop .Next)
         − Much of the development led by Hortonworks
         − More scale, more performance
         − Other paradigms than MapReduce for data processing
         − Enhanced operability and management (Ambari)
         − Metadata management (HCatalog)
  • 28. Technology Roadmap
      Hadoop.Now – Making Apache Hadoop Accessible (Q4 2011)
      •  Release the most widely deployed version of Hadoop ever (0.20.205)
      •  Release directly usable code via Apache (RPMs, .debs…)
      •  Frequent sustaining releases off of the stable branches
      Hadoop.Next – Next Generation Apache Hadoop (2012; alphas starting late 2011)
      •  Address key product gaps (HBase support, HA, Management…)
      •  Enable community & partner innovation via modular architecture & open APIs
      •  Work with community to define integrated stack
  • 29. Next Steps
      •  Engage with Hortonworks & Tresata
         − www.hortonworks.com
         − www.tresata.com
      •  Additional Webcast Series
         − Reference Architecture for Hadoop in Financial Services - Nov 15
         − Hadoop in Financial Services DEEP DIVE - Dec 6
         − www.hortonworks.com/webcasts
      •  Hortonworks party @ Hadoop World
         − Tuesday Nov 8 at 7pm
         − Inc Lounge at the Time Hotel
         − RSVP: hortonworks.eventbrite.com
  • 30. Questions?