© Hortonworks Inc. 2015
Apache Tez - Next-Generation Execution Engine on Hadoop
Jeff Zhang (@zjffdu)
Who's this guy
• Started using Pig in 2009; became a Pig committer in Nov 2009
• Joined Hortonworks in 2014
• Tez committer since Oct 2014
Agenda
• Tez Introduction
• Tez Feature Deep Dive
• Tez Status & Roadmap
Example of MR versus Tez
[Figure: Hive on MR runs the query as four jobs separated by I/O synchronization barriers: Job 1 (Join a & b), Job 2 (Group by of a Join b), Job 3 (Group by of c), Job 4 (Join of S & R). Hive on Tez runs the same steps (Join a & b, Group by of a Join b, Group by of c, Join of S & R) as a single job.]
Tez – Introduction
• Distributed execution framework targeted towards data-processing applications.
• Based on expressing a computation as a dataflow graph (DAG).
• Highly customizable to meet a broad spectrum of use cases.
• Built on top of YARN, the resource management framework for Hadoop.
• Open source Apache project and Apache licensed.
What is DAG & Why DAG
• Directed Acyclic Graph
• Any complicated DAG can be composed of the following 3 basic paradigms
– Sequential (Projection, Filter, GroupBy, …)
– Merge (Join, Union, Intersect, …)
– Divide (Split, …)
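The three paradigms can be illustrated with a small sketch. This is not the Tez API: the vertex names are hypothetical, and Python's standard `graphlib` module is used only to show that a graph built from sequential, merge, and divide steps still has a valid execution order.

```python
from graphlib import TopologicalSorter

# Hypothetical vertex names. Each key maps a vertex to the vertices it reads from.
dag = {
    "filter": {"load_a"},           # sequential: load_a -> filter
    "join": {"filter", "load_b"},   # merge: two inputs feed one vertex
    "group_by": {"join"},           # divide: join feeds two consumers
    "project": {"join"},
}

def execution_order(graph):
    """Return one valid order in which the vertices can run.

    TopologicalSorter raises graphlib.CycleError if the graph is not acyclic.
    """
    return list(TopologicalSorter(graph).static_order())

order = execution_order(dag)
print(order)
```

Every vertex appears after all of the vertices it reads from, which is the property a scheduler relies on.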
Expressing DAG in Tez API
• DAG API (Logic View)
– Allows the user to build a DAG
– Topological structure of the data computation flow
• Runtime API (Runtime View)
– Application logic of each computation unit (vertex)
– How to move/read/write data between vertices
DAG API (Logic View)
• Vertex (Processor, Parallelism, Resource, etc…)
• Edge (EdgeProperty)
– DataMovement
– Scatter Gather (Join, GroupBy, …)
– Broadcast (Pig Replicated Join / Hive Broadcast Join)
– One-to-One (Pig Order by)
– Custom
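A rough sketch of how the three built-in data movement types differ in routing. The function names and the CRC-based partitioner are assumptions made for illustration; they are not Tez classes, and Tez's own partitioning may work differently.

```python
import zlib

def partition_of(key, num_consumers):
    """Deterministic stand-in partitioner (an assumption, not Tez's)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_consumers

def scatter_gather(producer_outputs, num_consumers):
    """Each consumer gets the records whose key maps to it, from all producers."""
    consumers = [[] for _ in range(num_consumers)]
    for records in producer_outputs:
        for key, value in records:
            consumers[partition_of(key, num_consumers)].append((key, value))
    return consumers

def broadcast(producer_outputs, num_consumers):
    """Each consumer gets a full copy of every producer's output."""
    everything = [rec for records in producer_outputs for rec in records]
    return [list(everything) for _ in range(num_consumers)]

def one_to_one(producer_outputs):
    """Consumer i gets exactly the output of producer i."""
    return [list(records) for records in producer_outputs]

producers = [[("a", 1), ("b", 2)], [("a", 3)]]
print(scatter_gather(producers, 2))
print(broadcast(producers, 2))
print(one_to_one(producers))
```

Scatter gather keeps all records of one key together, which is what a join or group-by needs. Broadcast suits a small table that every task must see in full.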
Runtime API (Runtime View)
[Figure: Input → Processor → Output]
• Input
– Through which the processor receives data on an edge
– One vertex can have multiple inputs
• Processor
– Application logic (one vertex, one processor)
– Consumes the inputs and produces the outputs
• Output
– Through which the processor writes data to an edge
– One vertex can have multiple outputs
• Example of Input/Output/Processor
– MRInput & MROutput (InputFormat/OutputFormat)
– OrderedGroupedKVInput & OrderedPartitionedKVOutput (Scatter Gather)
– UnorderedKVInput & UnorderedKVOutput (Broadcast & One-to-One)
– PigProcessor/HiveProcessor
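The Input / Processor / Output split can be sketched as below. `ListInput`, `ListOutput`, and `WordCountProcessor` are hypothetical stand-ins written for this example; they only mirror the roles named on the slide and do not reproduce the Tez runtime interfaces.

```python
class ListInput:
    """Hypothetical input: hands the processor records that arrived on an edge."""

    def __init__(self, records):
        self._records = list(records)

    def read(self):
        return iter(self._records)


class ListOutput:
    """Hypothetical output: collects records the processor writes to an edge."""

    def __init__(self):
        self.records = []

    def write(self, record):
        self.records.append(record)


class WordCountProcessor:
    """Application logic of one vertex: consumes all inputs, writes to all outputs."""

    def run(self, inputs, outputs):
        counts = {}
        for source in inputs.values():
            for word in source.read():
                counts[word] = counts.get(word, 0) + 1
        for word in sorted(counts):
            for sink in outputs.values():
                sink.write((word, counts[word]))


inputs = {"edge_a": ListInput(["x", "y"]), "edge_b": ListInput(["x"])}
out = ListOutput()
WordCountProcessor().run(inputs, {"edge_out": out})
print(out.records)
```

The processor never knows how records reach it or where they go next, so the same logic can sit behind different edge types.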
Benefit of DAG
• Easier to express computation in DAG
• No intermediate data written to HDFS
• Less pressure on NameNode
• No resource queuing effort & less resource contention
• More optimization opportunity with more global context
Agenda
• Tez Introduction
• Tez Feature Deep Dive
• Tez Improvement & Debuggability
• Tez Status & Roadmap
Container-Reuse
• Reuse the same container across DAGs/Vertices/Tasks
• Benefit of Container-Reuse
– Fewer resources consumed
– Reduce overhead of launching JVM
– Reduce overhead of negotiating with the Resource Manager
– Reduce overhead of resource localization
– Reduce network IO
– Object Caching (Object Sharing)
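A back-of-the-envelope sketch of why reuse cuts launch overhead. The model is an assumption made for illustration (a fixed number of concurrent slots, one container launch per slot when reuse is on); it is not how Tez schedules containers internally.

```python
def count_container_launches(num_tasks, slots, reuse):
    """Count container (JVM) launches needed to run num_tasks tasks.

    slots is how many containers may run at the same time (hypothetical model).
    """
    if num_tasks == 0:
        return 0
    if reuse:
        # Each slot launches one container and keeps it for later tasks.
        return min(num_tasks, slots)
    # Without reuse every task pays for a fresh container.
    return num_tasks


def launch_overhead_seconds(num_tasks, slots, reuse, seconds_per_launch):
    """Total time spent on launches under the same simplified model."""
    return count_container_launches(num_tasks, slots, reuse) * seconds_per_launch


print(count_container_launches(100, 10, reuse=False))
print(count_container_launches(100, 10, reuse=True))
```

Under this model the saving grows with the number of tasks per slot, which is why short tasks benefit most.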
Tez Session
• Multiple Jobs/DAGs in one AM
• Container-reuse across Jobs/DAGs
• Data sharing between Jobs/DAGs
Dynamic Parallelism Estimation
• VertexManager
– Listens to the status of the other vertices
– Coordinates and schedules its tasks
– Communication between vertices
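One way to picture the estimation step: once upstream tasks report how much data they produced, a downstream vertex can size itself. The function, its parameters, and the bytes-per-task heuristic are assumptions for illustration; the slides do not spell out the rule Tez applies.

```python
import math

def estimate_parallelism(upstream_output_bytes, bytes_per_task, max_tasks):
    """Pick a task count for a vertex from the observed output of upstream tasks.

    upstream_output_bytes: sizes reported by finished upstream tasks.
    bytes_per_task: how much data one downstream task should handle (assumed knob).
    max_tasks: the parallelism configured up front, used as an upper bound.
    """
    total = sum(upstream_output_bytes)
    wanted = math.ceil(total / bytes_per_task)
    return max(1, min(max_tasks, wanted))

print(estimate_parallelism([100, 200, 300], bytes_per_task=256, max_tasks=10))
```

The point is that the decision is made at runtime from real sizes rather than from a guess made when the DAG was built.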
ATS Integration
• Tez is fully integrated with YARN ATS (Application Timeline Server)
– DAG Status, DAG Metrics, Task Status, Task Metrics are captured
• Diagnostics & Performance analysis
– Data Source for monitoring & diagnostics
– Data Source for performance analysis
Recovery
• AM can crash in corner cases
– OOM
– Node failure
– …
• Continue from the last checkpoint
• Transparent to end users
[Figure: AM Crash]
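"Continue from the last checkpoint" can be sketched as replaying a log of finished work. The `journal` list below stands in for durable storage, and `run_dag` with its `crash_after` switch is a hypothetical helper for this example; Tez's actual recovery format is not described on the slide.

```python
def run_dag(tasks, journal, crash_after=None):
    """Run tasks in order, recording each finished task in the journal.

    journal is a list standing in for a durable log. crash_after simulates an
    AM crash once that many tasks have been executed in this attempt.
    Returns the tasks executed by this attempt.
    """
    executed = []
    done = set(journal)
    for task in tasks:
        if task in done:
            continue  # recovered from the journal, not re-run
        if crash_after is not None and len(executed) == crash_after:
            raise RuntimeError("simulated AM crash")
        executed.append(task)
        journal.append(task)
    return executed


journal = []
tasks = ["t1", "t2", "t3", "t4"]
try:
    run_dag(tasks, journal, crash_after=2)
except RuntimeError:
    pass  # a new AM attempt would start here

print(run_dag(tasks, journal))
```

The second attempt only executes what the journal does not already list, so the caller sees one completed DAG.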
Order By of Pig
f = Load 'foo' as (x, y);
o = Order f by x;
[Figure: two plans for the Order By. One: Load, Sample (Calculate Histogram), HDFS, Partition, Sort. The other: Load, Sample (Calculate Histogram), Partition, Sort, with edges labeled Broadcast, One-to-One, Scatter Gather, Scatter Gather.]
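The Sample, Partition, Sort pipeline can be sketched as a sample-based range partitioner. This is a generic illustration of the technique with hypothetical function names, not Pig's implementation; the sample size and seed are arbitrary.

```python
import bisect
import random

def range_boundaries(sample, num_partitions):
    """Pick num_partitions - 1 split points from a sample of the keys."""
    ordered = sorted(sample)
    if not ordered:
        return []
    step = len(ordered) / num_partitions
    return [ordered[int(step * i)] for i in range(1, num_partitions)]

def total_order_sort(keys, num_partitions, sample_size=100, seed=0):
    """Sample keys, range-partition by the sampled split points, sort each part."""
    rng = random.Random(seed)
    sample = rng.sample(keys, min(sample_size, len(keys)))
    bounds = range_boundaries(sample, num_partitions)  # would be broadcast
    partitions = [[] for _ in range(num_partitions)]
    for key in keys:
        partitions[bisect.bisect_right(bounds, key)].append(key)
    # Each partition is sorted independently; concatenating them in order
    # yields a globally sorted result.
    return [sorted(p) for p in partitions]

keys = [5, 3, 9, 1, 7, 2, 8, 6, 4, 0]
parts = total_order_sort(keys, 3)
print(parts)
```

The sample only affects how evenly the partitions are balanced. Correctness of the global order comes from the range boundaries being applied consistently.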
Tez UI
Download data from ATS
Roadmap
• Shared output edges
– Same output to multiple vertices
• Local mode stabilization
• Optimizing (include/exclude) vertex at runtime
• Partial completion VertexManager
• Co-Scheduling
• Framework stats for better runtime decisions
Tez – Adoption
• Apache Hive
– Starting from Hive 0.13
– set hive.execution.engine=tez;
• Apache Pig
– Starting from Pig 0.14
– pig -x tez
• Cascading
• Flink
Tez Community
• Useful Links
– http://tez.apache.org/
– JIRA: https://issues.apache.org/jira/browse/TEZ
– Code Repository: https://git-wip-us.apache.org/repos/asf/tez.git
– Mailing Lists
– Dev List: dev@tez.apache.org
– User List: user@tez.apache.org
– Issues List: issues@tez.apache.org
• Tez Meetup
– http://www.meetup.com/Apache-Tez-User-Group
Thank You!
Questions & Answers

 


3. Apache Tez Introduction - Apache Kylin Meetup @Shanghai

  • 1. Apache Tez - Next generation execution engine on Hadoop
      Jeff Zhang (@zjffdu)
      © Hortonworks Inc. 2015
  • 2. Who's this guy
      • Started using Pig in 2009; became a Pig committer in Nov 2009
      • Joined Hortonworks in 2014
      • Tez committer since Oct 2014
  • 3. Agenda
      • Tez Introduction
      • Tez Feature Deep Dive
      • Tez Status & Roadmap
  • 4. Example of MR versus Tez
      [Figure: Hive on MR runs four separate jobs (Job 1: join a & b; Job 2: group by of a join b; Job 3: group by of c; Job 4: join of S & R), with I/O synchronization barriers between jobs. Hive on Tez runs the same steps (join a & b; group by of a join b; group by of c; join of S & R) as a single job.]
  • 5. Tez – Introduction
      • Distributed execution framework targeted towards data-processing applications.
      • Based on expressing a computation as a dataflow graph (DAG).
      • Highly customizable to meet a broad spectrum of use cases.
      • Built on top of YARN, the resource management framework for Hadoop.
      • Open source Apache project and Apache licensed.
  • 6. What is DAG & Why DAG
      • Directed Acyclic Graph
      • Any complicated DAG can be composed from the following three basic paradigms:
          – Sequential (Projection, Filter, GroupBy, …)
          – Merge (Join, Union, Intersect, …)
          – Divide (Split, …)
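
The three paradigms on this slide can be written down as edge lists. The following is a minimal sketch in plain Python, not Tez code: the vertex names are hypothetical, and the helper is a generic topological sort (Kahn's algorithm) that shows one valid execution order and rejects graphs with cycles.

```python
from collections import deque

def topological_order(edges):
    """Return the vertices so that every edge points forward (Kahn's algorithm)."""
    successors = {}
    in_degree = {}
    vertices = []  # remembers first-seen order so the result is deterministic
    for src, dst in edges:
        for v in (src, dst):
            if v not in in_degree:
                in_degree[v] = 0
                successors[v] = []
                vertices.append(v)
        successors[src].append(dst)
        in_degree[dst] += 1

    ready = deque(v for v in vertices if in_degree[v] == 0)
    order = []
    while ready:
        v = ready.popleft()
        order.append(v)
        for nxt in successors[v]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(vertices):
        raise ValueError("graph contains a cycle, so it is not a DAG")
    return order

# Hypothetical vertex names, one edge list per paradigm.
sequential = [("load", "filter"), ("filter", "group_by")]
merge = [("load_a", "join"), ("load_b", "join")]
divide = [("load", "split_small"), ("load", "split_large")]

for name, edges in [("sequential", sequential), ("merge", merge), ("divide", divide)]:
    print(name, topological_order(edges))
```

Larger plans are obtained by concatenating such edge lists; the same function then yields an order in which every vertex runs after all of its inputs.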
  • 7. Expressing DAG in Tez API
      • DAG API (Logic View)
          – Allows the user to build a DAG
          – Topological structure of the data computation flow
      • Runtime API (Runtime View)
          – Application logic of each computation unit (vertex)
          – How to move/read/write data between vertices
  • 8. DAG API (Logic View)
      • Vertex (Processor, Parallelism, Resource, etc.)
      • Edge (EdgeProperty)
          – DataMovement
              · Scatter Gather (Join, GroupBy, …)
              · Broadcast (Pig Replicated Join / Hive Broadcast Join)
              · One-to-One (Pig Order by)
              · Custom
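
To make the three built-in data movement types concrete, here is a small sketch in plain Python. It does not use the Tez API; the function name `plan_reads` and the `(source_task, partition)` tuples are assumptions made for illustration. It computes which upstream outputs each downstream task would read.

```python
def plan_reads(movement, num_src, num_dst):
    """Map each destination task to the (source_task, partition) pairs it reads.

    Illustrative model only:
    - scatter_gather: every source task writes one partition per destination
      task, and destination task d fetches partition d from every source.
    - broadcast: every source task writes a single output (partition 0) that
      every destination task reads in full.
    - one_to_one: destination task i reads only the output of source task i.
    """
    if movement == "scatter_gather":
        return {d: [(s, d) for s in range(num_src)] for d in range(num_dst)}
    if movement == "broadcast":
        return {d: [(s, 0) for s in range(num_src)] for d in range(num_dst)}
    if movement == "one_to_one":
        if num_src != num_dst:
            raise ValueError("one_to_one needs equal task counts on both sides")
        return {d: [(d, 0)] for d in range(num_dst)}
    raise ValueError(f"unknown data movement: {movement}")

print(plan_reads("scatter_gather", 2, 3))
print(plan_reads("broadcast", 2, 2))
print(plan_reads("one_to_one", 3, 3))
```

Scatter-gather is the shape of a classic shuffle, which is why the slide associates it with Join and GroupBy.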
  • 9. Runtime API (Runtime View)
      [Figure: Input → Processor → Output]
      • Input
          – Through which the processor receives data on an edge
          – One vertex can have multiple inputs
      • Processor
          – Application logic (one processor per vertex)
          – Consumes the inputs and produces the outputs
      • Output
          – Through which the processor writes data to an edge
          – One vertex can have multiple outputs
      • Examples of Input/Output/Processor
          – MRInput & MROutput (InputFormat/OutputFormat)
          – OrderedGroupedKVInput & OrderedPartitionedKVOutput (Scatter Gather)
          – UnorderedKVInput & UnorderedKVOutput (Broadcast & One-to-One)
          – PigProcessor/HiveProcessor
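
The Input/Processor/Output split can be illustrated with a toy model. The classes below are hypothetical stand-ins written in plain Python, not the Tez runtime classes named on the slide; they only show a processor consuming several named inputs and writing to a named output.

```python
class ListInput:
    """Toy input: hands the processor the records that arrived on one edge."""
    def __init__(self, records):
        self._records = list(records)

    def read(self):
        return iter(self._records)


class ListOutput:
    """Toy output: collects the records the processor writes to one edge."""
    def __init__(self):
        self.records = []

    def write(self, key, value):
        self.records.append((key, value))


class SumByKeyProcessor:
    """Toy processor: sums values per key across all inputs."""
    def run(self, inputs, outputs):
        totals = {}
        for source in inputs.values():
            for key, value in source.read():
                totals[key] = totals.get(key, 0) + value
        # Write the same result to every output, in key order.
        for sink in outputs.values():
            for key in sorted(totals):
                sink.write(key, totals[key])


inputs = {
    "edge_from_a": ListInput([("a", 1), ("b", 2)]),
    "edge_from_b": ListInput([("a", 3)]),
}
outputs = {"edge_to_c": ListOutput()}
SumByKeyProcessor().run(inputs, outputs)
print(outputs["edge_to_c"].records)
```

Keeping the processor unaware of how records travel is the point of the split: the same logic could sit behind a scatter-gather or a broadcast edge by swapping the input and output objects.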
  • 10. Benefit of DAG
      • Easier to express computation in DAG
      • No intermediate data written to HDFS
      • Less pressure on NameNode
      • No resource queuing effort & less resource contention
      • More optimization opportunity with more global context
  • 11. Agenda
      • Tez Introduction
      • Tez Feature Deep Dive
      • Tez Improvement & Debuggability
      • Tez Status & Roadmap
  • 12. Container-Reuse
      • Reuse the same container across DAGs/Vertices/Tasks
      • Benefits of Container-Reuse
          – Fewer resources consumed
          – Reduces the overhead of launching JVMs
          – Reduces the overhead of negotiating with the ResourceManager
          – Reduces the overhead of resource localization
          – Reduces network IO
          – Object caching (object sharing)
  • 13. Tez Session
      • Multiple Jobs/DAGs in one AM
      • Container-reuse across Jobs/DAGs
      • Data sharing between Jobs/DAGs
  • 14. Dynamic Parallelism Estimation
      • VertexManager
          – Listens to the status of other vertices
          – Coordinates and schedules its tasks
          – Handles communication between vertices
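
The slide does not spell out how a VertexManager picks the parallelism, so the following is only one plausible policy, sketched in plain Python. The class name, its parameters, and the extrapolation rule are all assumptions for illustration: the manager waits until a fraction of upstream tasks have reported their output size, extrapolates the total, and sizes its own vertex accordingly.

```python
import math

class ToyVertexManager:
    """Hypothetical manager that shrinks a vertex's parallelism at runtime."""

    def __init__(self, num_upstream_tasks, initial_parallelism,
                 bytes_per_task, min_fraction=0.25):
        self.num_upstream_tasks = num_upstream_tasks
        self.initial_parallelism = initial_parallelism
        self.bytes_per_task = bytes_per_task  # desired input size per task
        self.min_fraction = min_fraction      # share of upstream tasks to wait for
        self.completed_sizes = []
        self.parallelism = None               # undecided until enough data is seen

    def on_upstream_task_completed(self, output_bytes):
        """Record one upstream completion; decide once enough tasks have reported."""
        self.completed_sizes.append(output_bytes)
        done = len(self.completed_sizes)
        enough = done >= self.min_fraction * self.num_upstream_tasks
        if self.parallelism is None and enough:
            # Extrapolate total upstream output from the tasks seen so far.
            estimated_total = sum(self.completed_sizes) / done * self.num_upstream_tasks
            wanted = max(1, math.ceil(estimated_total / self.bytes_per_task))
            # Never exceed the parallelism the vertex was planned with.
            self.parallelism = min(self.initial_parallelism, wanted)
        return self.parallelism


manager = ToyVertexManager(num_upstream_tasks=4, initial_parallelism=10,
                           bytes_per_task=100, min_fraction=0.5)
first = manager.on_upstream_task_completed(50)    # too early to decide
second = manager.on_upstream_task_completed(150)  # half of upstream has reported
print(first, second)
```

Deciding early trades accuracy for latency: a lower `min_fraction` lets downstream tasks be scheduled sooner but bases the estimate on fewer observations.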
  • 15. ATS Integration
      • Tez is fully integrated with YARN ATS (Application Timeline Service)
          – DAG status, DAG metrics, task status, and task metrics are captured
      • Diagnostics & performance analysis
          – Data source for monitoring & diagnostics
          – Data source for performance analysis
  • 16. Recovery
      • AM can crash in corner cases
          – OOM
          – Node failure
          – …
      • Continue from the last checkpoint
      • Transparent to end users
      [Figure: AM crash]
  • 17. Order By of Pig
      f = Load 'foo' as (x, y);
      o = Order f by x;
      [Figure: two plans for this query. Stages: Load, Sample (calculate histogram), Partition, Sort. In one plan the stages hand data over through HDFS; in the other they are connected by Broadcast, One-to-One, and Scatter Gather edges.]
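
The stages named on this slide (sample, calculate histogram, partition, sort) amount to a sample-based range partitioning sort. The sketch below shows that technique in plain Python on in-memory `(x, y)` tuples. It is not Pig or Tez code, and the function names and the fixed random seed are assumptions for illustration.

```python
import bisect
import random

def choose_boundaries(sample_keys, num_partitions):
    """Pick num_partitions - 1 split points from the sorted sample (the 'histogram')."""
    ordered = sorted(sample_keys)
    if not ordered:
        return []
    return [ordered[len(ordered) * i // num_partitions]
            for i in range(1, num_partitions)]

def partition_of(key, boundaries):
    """Range partition: index of the first boundary strictly greater than key."""
    return bisect.bisect_right(boundaries, key)

def order_by_x(records, num_partitions, sample_size, seed=0):
    """Sort (x, y) records by x using sample, partition, then per-partition sort."""
    rng = random.Random(seed)
    sample = rng.sample(records, min(sample_size, len(records)))
    boundaries = choose_boundaries([x for x, _ in sample], num_partitions)

    partitions = [[] for _ in range(num_partitions)]
    for record in records:
        partitions[partition_of(record[0], boundaries)].append(record)

    # Each partition is sorted independently; concatenating them in partition
    # order gives a global order because the key ranges do not overlap.
    result = []
    for part in partitions:
        result.extend(sorted(part, key=lambda r: r[0]))
    return result

data = [(5, "e"), (3, "c"), (9, "i"), (1, "a"), (7, "g"), (3, "k"), (8, "h"), (2, "b")]
print(order_by_x(data, num_partitions=3, sample_size=4))
```

Sampling only affects how evenly the partitions are balanced, not correctness: any set of boundaries yields a globally sorted result.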
  • 18. Tez UI [screenshot]
  • 19. Tez UI [screenshot]
  • 21. Roadmap
      • Shared output edges
          – Same output to multiple vertices
      • Local mode stabilization
      • Optimizing (include/exclude) vertex at runtime
      • Partial completion VertexManager
      • Co-scheduling
      • Framework stats for better runtime decisions
  • 22. Tez – Adoption
      • Apache Hive
          – Supported from Hive 0.13
          – set hive.execution.engine=tez;
      • Apache Pig
          – Supported from Pig 0.14
          – pig -x tez
      • Cascading
      • Flink
  • 23. Tez Community
      • Useful Links
          – http://tez.apache.org/
          – JIRA: https://issues.apache.org/jira/browse/TEZ
          – Code Repository: https://git-wip-us.apache.org/repos/asf/tez.git
          – Mailing Lists
              · Dev List: dev@tez.apache.org
              · User List: user@tez.apache.org
              · Issues List: issues@tez.apache.org
      • Tez Meetup
          – http://www.meetup.com/Apache-Tez-User-Group
  • 24. Thank You! Questions & Answers