SlideShare une entreprise Scribd logo
1  sur  26
An Introduction to
                        MapReduce 2 and
                              YARN

                            Tom White, Cloudera
                              @tom_e_white
                               June 4, 2012
                               Chicago HUG
Tuesday, June 5, 2012
Road Trip




Tuesday, June 5, 2012
About me
                    •   Apache Hadoop Committer,
                        PMC Member, Apache
                        Member

                    •   Engineer at Cloudera
                        working on core Hadoop

                    •   Founder of Apache Whirr

                    •   Author of “Hadoop: The
                        Definitive Guide”

                        •   http://hadoopbook.com

Tuesday, June 5, 2012
First, whatʼs
                        MapReduce 1?


Tuesday, June 5, 2012
Tuesday, June 5, 2012
Whatʼs wrong with
                             MR1?


Tuesday, June 5, 2012
Motivation 1


                    •   Scaling >4000 nodes
                    • Fewer, larger clusters



Tuesday, June 5, 2012
Motivation 2


                    •   HA of Job Tracker
                    • Large, complex state



Tuesday, June 5, 2012
Motivation 3

                    •   Poor resource utilization
                    • Slots in MR1 are for either
                        map or reduce


Tuesday, June 5, 2012
Yet Another Resource Negotiator




Tuesday, June 5, 2012
Tuesday, June 5, 2012
Tuesday, June 5, 2012
Node Manager
               is a generalized Task Tracker
                    • Task Tracker
                     • fixed number of map or reduce
                        slots
                    • Node Manager
                     • containers with variable resource
                        limits

Tuesday, June 5, 2012
Tuesday, June 5, 2012
Tuesday, June 5, 2012
MR is user space
                         YARN is kernel


Tuesday, June 5, 2012
Bonus Apps

                    •   Distributed shell
                    • MPI   (MAPREDUCE-2911)



                    • Master-worker            (MAPREDUCE-3315)



                    • Apache Giraph, Hama

Tuesday, June 5, 2012
Tuesday, June 5, 2012
Tuesday, June 5, 2012
Old API ≠ MR1
                        New API ≠ MR2


Tuesday, June 5, 2012
Old API         New API
                              o.a.h.mapred   o.a.h.mapreduce



                        MR1       ✓                ✓

                        MR2       ✓                ✓

Tuesday, June 5, 2012
Tuesday, June 5, 2012
Try out MR2

                    • Apache Hadoop 2.0.0-alpha
                     • hadoop.apache.org
                    • CDH4 and Cloudera Manager
                     • cloudera.com
                    • Cloud - Apache Whirr
Tuesday, June 5, 2012
MR1
   <dependency>
       <groupId>org.apache.hadoop</groupId>
       <artifactId>hadoop-client</artifactId>
       <version>1.0.3</version>
   </dependency>


     MR2
   <dependency>
       <groupId>org.apache.hadoop</groupId>
       <artifactId>hadoop-client</artifactId>
       <version>2.0.0-alpha</version>
   </dependency>

Tuesday, June 5, 2012
TODO

                    • Still alpha status
                    • Performance tuning
                    • Usability bug fixes
                    • RM recovery
                    • Security in MR2 not complete

Tuesday, June 5, 2012
Questions?



Tuesday, June 5, 2012

Contenu connexe

En vedette

An Introduction to MapReduce 2 and YARN
An Introduction to MapReduce 2 and YARNAn Introduction to MapReduce 2 and YARN
An Introduction to MapReduce 2 and YARNtomwhite
 
Hadoop for the Data Scientist: Spark in Cloudera 5.5
Hadoop for the Data Scientist: Spark in Cloudera 5.5Hadoop for the Data Scientist: Spark in Cloudera 5.5
Hadoop for the Data Scientist: Spark in Cloudera 5.5Cloudera, Inc.
 
Apache Spark Operations
Apache Spark OperationsApache Spark Operations
Apache Spark OperationsCloudera, Inc.
 
Realtime BigData Step by Step mit Lambda, Kafka, Storm und Hadoop
Realtime BigData Step by Step mit Lambda, Kafka, Storm und HadoopRealtime BigData Step by Step mit Lambda, Kafka, Storm und Hadoop
Realtime BigData Step by Step mit Lambda, Kafka, Storm und HadoopValentin Zacharias
 
Spark on YARN: The Road Ahead
Spark on YARN: The Road AheadSpark on YARN: The Road Ahead
Spark on YARN: The Road AheadCloudera, Inc.
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSHortonworks
 
The Vortex of Change - Digital Transformation (Presented by Intel)
The Vortex of Change - Digital Transformation (Presented by Intel)The Vortex of Change - Digital Transformation (Presented by Intel)
The Vortex of Change - Digital Transformation (Presented by Intel)Cloudera, Inc.
 
Using Big Data to Transform Your Customer’s Experience - Part 1

Using Big Data to Transform Your Customer’s Experience - Part 1
Using Big Data to Transform Your Customer’s Experience - Part 1

Using Big Data to Transform Your Customer’s Experience - Part 1
Cloudera, Inc.
 
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...StampedeCon
 
[225]yarn 기반의 deep learning application cluster 구축 김제민
[225]yarn 기반의 deep learning application cluster 구축 김제민[225]yarn 기반의 deep learning application cluster 구축 김제민
[225]yarn 기반의 deep learning application cluster 구축 김제민NAVER D2
 
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the CloudPart 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the CloudCloudera, Inc.
 
Install Apache Hadoop for Development/Production
Install Apache Hadoop for  Development/ProductionInstall Apache Hadoop for  Development/Production
Install Apache Hadoop for Development/ProductionIMC Institute
 
Apache Spark & Hadoop : Train-the-trainer
Apache Spark & Hadoop : Train-the-trainerApache Spark & Hadoop : Train-the-trainer
Apache Spark & Hadoop : Train-the-trainerIMC Institute
 
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Cloudera, Inc.
 
Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)Emilio Coppa
 
Using Big Data to Drive Customer 360
Using Big Data to Drive Customer 360Using Big Data to Drive Customer 360
Using Big Data to Drive Customer 360Cloudera, Inc.
 

En vedette (19)

An Introduction to MapReduce 2 and YARN
An Introduction to MapReduce 2 and YARNAn Introduction to MapReduce 2 and YARN
An Introduction to MapReduce 2 and YARN
 
Introduction to MapReduce
Introduction to MapReduceIntroduction to MapReduce
Introduction to MapReduce
 
Hadoop for the Data Scientist: Spark in Cloudera 5.5
Hadoop for the Data Scientist: Spark in Cloudera 5.5Hadoop for the Data Scientist: Spark in Cloudera 5.5
Hadoop for the Data Scientist: Spark in Cloudera 5.5
 
Apache Spark Operations
Apache Spark OperationsApache Spark Operations
Apache Spark Operations
 
Realtime BigData Step by Step mit Lambda, Kafka, Storm und Hadoop
Realtime BigData Step by Step mit Lambda, Kafka, Storm und HadoopRealtime BigData Step by Step mit Lambda, Kafka, Storm und Hadoop
Realtime BigData Step by Step mit Lambda, Kafka, Storm und Hadoop
 
Spark on YARN: The Road Ahead
Spark on YARN: The Road AheadSpark on YARN: The Road Ahead
Spark on YARN: The Road Ahead
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
 
The Vortex of Change - Digital Transformation (Presented by Intel)
The Vortex of Change - Digital Transformation (Presented by Intel)The Vortex of Change - Digital Transformation (Presented by Intel)
The Vortex of Change - Digital Transformation (Presented by Intel)
 
Using Big Data to Transform Your Customer’s Experience - Part 1

Using Big Data to Transform Your Customer’s Experience - Part 1
Using Big Data to Transform Your Customer’s Experience - Part 1

Using Big Data to Transform Your Customer’s Experience - Part 1

 
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
 
[225]yarn 기반의 deep learning application cluster 구축 김제민
[225]yarn 기반의 deep learning application cluster 구축 김제민[225]yarn 기반의 deep learning application cluster 구축 김제민
[225]yarn 기반의 deep learning application cluster 구축 김제민
 
Top 5 IoT Use Cases
Top 5 IoT Use CasesTop 5 IoT Use Cases
Top 5 IoT Use Cases
 
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the CloudPart 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
 
Install Apache Hadoop for Development/Production
Install Apache Hadoop for  Development/ProductionInstall Apache Hadoop for  Development/Production
Install Apache Hadoop for Development/Production
 
Apache Spark & Hadoop : Train-the-trainer
Apache Spark & Hadoop : Train-the-trainerApache Spark & Hadoop : Train-the-trainer
Apache Spark & Hadoop : Train-the-trainer
 
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
 
HDFS Internals
HDFS InternalsHDFS Internals
HDFS Internals
 
Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)
 
Using Big Data to Drive Customer 360
Using Big Data to Drive Customer 360Using Big Data to Drive Customer 360
Using Big Data to Drive Customer 360
 

Similaire à Map Reduce v2 and YARN - CHUG - 20120604

Ruby CI with Jenkins
Ruby CI with JenkinsRuby CI with Jenkins
Ruby CI with Jenkinscowboyd
 
Michael mahlberg exploratory-testing-the_missing_half_of_bdd
Michael mahlberg exploratory-testing-the_missing_half_of_bddMichael mahlberg exploratory-testing-the_missing_half_of_bdd
Michael mahlberg exploratory-testing-the_missing_half_of_bddMichael Mahlberg
 
Introduction to Apache Pig
Introduction to Apache PigIntroduction to Apache Pig
Introduction to Apache PigTapan Avasthi
 
The Future of Apache CloudStack (Not So Cloudy) (Collab 2012)
The Future of Apache CloudStack (Not So Cloudy) (Collab 2012)The Future of Apache CloudStack (Not So Cloudy) (Collab 2012)
The Future of Apache CloudStack (Not So Cloudy) (Collab 2012)Chiradeep Vittal
 
High Availability in YARN
High Availability in YARNHigh Availability in YARN
High Availability in YARNArinto Murdopo
 
Real-Time Queries in Hadoop w/ Cloudera Impala
Real-Time Queries in Hadoop w/ Cloudera ImpalaReal-Time Queries in Hadoop w/ Cloudera Impala
Real-Time Queries in Hadoop w/ Cloudera ImpalaData Science London
 

Similaire à Map Reduce v2 and YARN - CHUG - 20120604 (7)

Ruby CI with Jenkins
Ruby CI with JenkinsRuby CI with Jenkins
Ruby CI with Jenkins
 
Michael mahlberg exploratory-testing-the_missing_half_of_bdd
Michael mahlberg exploratory-testing-the_missing_half_of_bddMichael mahlberg exploratory-testing-the_missing_half_of_bdd
Michael mahlberg exploratory-testing-the_missing_half_of_bdd
 
Introduction to Apache Pig
Introduction to Apache PigIntroduction to Apache Pig
Introduction to Apache Pig
 
The Future of Apache CloudStack (Not So Cloudy) (Collab 2012)
The Future of Apache CloudStack (Not So Cloudy) (Collab 2012)The Future of Apache CloudStack (Not So Cloudy) (Collab 2012)
The Future of Apache CloudStack (Not So Cloudy) (Collab 2012)
 
High Availability in YARN
High Availability in YARNHigh Availability in YARN
High Availability in YARN
 
Real-Time Queries in Hadoop w/ Cloudera Impala
Real-Time Queries in Hadoop w/ Cloudera ImpalaReal-Time Queries in Hadoop w/ Cloudera Impala
Real-Time Queries in Hadoop w/ Cloudera Impala
 
Hadoop-2 @ eBay
Hadoop-2 @ eBayHadoop-2 @ eBay
Hadoop-2 @ eBay
 

Plus de Chicago Hadoop Users Group

Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...Chicago Hadoop Users Group
 
Choosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your BusinessChoosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your BusinessChicago Hadoop Users Group
 
Everything you wanted to know, but were afraid to ask about Oozie
Everything you wanted to know, but were afraid to ask about OozieEverything you wanted to know, but were afraid to ask about Oozie
Everything you wanted to know, but were afraid to ask about OozieChicago Hadoop Users Group
 
An Introduction to Impala – Low Latency Queries for Apache Hadoop
An Introduction to Impala – Low Latency Queries for Apache HadoopAn Introduction to Impala – Low Latency Queries for Apache Hadoop
An Introduction to Impala – Low Latency Queries for Apache HadoopChicago Hadoop Users Group
 
HCatalog: Table Management for Hadoop - CHUG - 20120917
HCatalog: Table Management for Hadoop - CHUG - 20120917HCatalog: Table Management for Hadoop - CHUG - 20120917
HCatalog: Table Management for Hadoop - CHUG - 20120917Chicago Hadoop Users Group
 
Avro - More Than Just a Serialization Framework - CHUG - 20120416
Avro - More Than Just a Serialization Framework - CHUG - 20120416Avro - More Than Just a Serialization Framework - CHUG - 20120416
Avro - More Than Just a Serialization Framework - CHUG - 20120416Chicago Hadoop Users Group
 

Plus de Chicago Hadoop Users Group (18)

Kinetica master chug_9.12
Kinetica master chug_9.12Kinetica master chug_9.12
Kinetica master chug_9.12
 
Chug dl presentation
Chug dl presentationChug dl presentation
Chug dl presentation
 
Yahoo compares Storm and Spark
Yahoo compares Storm and SparkYahoo compares Storm and Spark
Yahoo compares Storm and Spark
 
Using Apache Drill
Using Apache DrillUsing Apache Drill
Using Apache Drill
 
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
 
Meet Spark
Meet SparkMeet Spark
Meet Spark
 
Choosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your BusinessChoosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your Business
 
An Overview of Ambari
An Overview of AmbariAn Overview of Ambari
An Overview of Ambari
 
Hadoop and Big Data Security
Hadoop and Big Data SecurityHadoop and Big Data Security
Hadoop and Big Data Security
 
Advanced Oozie
Advanced OozieAdvanced Oozie
Advanced Oozie
 
Scalding for Hadoop
Scalding for HadoopScalding for Hadoop
Scalding for Hadoop
 
Financial Data Analytics with Hadoop
Financial Data Analytics with HadoopFinancial Data Analytics with Hadoop
Financial Data Analytics with Hadoop
 
Everything you wanted to know, but were afraid to ask about Oozie
Everything you wanted to know, but were afraid to ask about OozieEverything you wanted to know, but were afraid to ask about Oozie
Everything you wanted to know, but were afraid to ask about Oozie
 
An Introduction to Impala – Low Latency Queries for Apache Hadoop
An Introduction to Impala – Low Latency Queries for Apache HadoopAn Introduction to Impala – Low Latency Queries for Apache Hadoop
An Introduction to Impala – Low Latency Queries for Apache Hadoop
 
HCatalog: Table Management for Hadoop - CHUG - 20120917
HCatalog: Table Management for Hadoop - CHUG - 20120917HCatalog: Table Management for Hadoop - CHUG - 20120917
HCatalog: Table Management for Hadoop - CHUG - 20120917
 
Hadoop in a Windows Shop - CHUG - 20120416
Hadoop in a Windows Shop - CHUG - 20120416Hadoop in a Windows Shop - CHUG - 20120416
Hadoop in a Windows Shop - CHUG - 20120416
 
Running R on Hadoop - CHUG - 20120815
Running R on Hadoop - CHUG - 20120815Running R on Hadoop - CHUG - 20120815
Running R on Hadoop - CHUG - 20120815
 
Avro - More Than Just a Serialization Framework - CHUG - 20120416
Avro - More Than Just a Serialization Framework - CHUG - 20120416Avro - More Than Just a Serialization Framework - CHUG - 20120416
Avro - More Than Just a Serialization Framework - CHUG - 20120416
 

Dernier

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 

Dernier (20)

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 

Map Reduce v2 and YARN - CHUG - 20120604

  • 1. An Introduction to MapReduce 2 and YARN Tom White, Cloudera @tom_e_white June 4, 2012 Chicago HUG Tuesday, June 5, 2012
  • 3. About me • Apache Hadoop Committer, PMC Member, Apache Member • Engineer at Cloudera working on core Hadoop • Founder of Apache Whirr • Author of “Hadoop: The Definitive Guide” • http://hadoopbook.com Tuesday, June 5, 2012
  • 4. First, whatʼs MapReduce 1? Tuesday, June 5, 2012
  • 6. Whatʼs wrong with MR1? Tuesday, June 5, 2012
  • 7. Motivation 1 • Scaling >4000 nodes • Fewer, larger clusters Tuesday, June 5, 2012
  • 8. Motivation 2 • HA of Job Tracker • Large, complex state Tuesday, June 5, 2012
  • 9. Motivation 3 • Poor resource utilization • Slots in MR1 are for either map or reduce Tuesday, June 5, 2012
  • 10. Yet Another Resource Negotiator Tuesday, June 5, 2012
  • 13. Node Manager is a generalized Task Tracker • Task Tracker • fixed number of map or reduce slots • Node Manager • containers with variable resource limits Tuesday, June 5, 2012
  • 16. MR is user space YARN is kernel Tuesday, June 5, 2012
  • 17. Bonus Apps • Distributed shell • MPI (MAPREDUCE-2911) • Master-worker (MAPREDUCE-3315) • Apache Giraph, Hama Tuesday, June 5, 2012
  • 20. Old API ≠ MR1 New API ≠ MR2 Tuesday, June 5, 2012
  • 21. Old API New API o.a.h.mapred o.a.h.mapreduce MR1 ✓ ✓ MR2 ✓ ✓ Tuesday, June 5, 2012
  • 23. Try out MR2 • Apache Hadoop 2.0.0-alpha • hadoop.apache.org • CDH4 and Cloudera Manager • cloudera.com • Cloud - Apache Whirr Tuesday, June 5, 2012
  • 24. MR1 <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-client</artifactId> <version>1.0.3</version> </dependency> MR2 <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-client</artifactId> <version>2.0.0-alpha</version> </dependency> Tuesday, June 5, 2012
  • 25. TODO • Still alpha status • Performance tuning • Usability bug fixes • RM recovery • Security in MR2 not complete Tuesday, June 5, 2012