The document discusses YARN (Yet Another Resource Negotiator), the cluster resource management layer of Hadoop. It describes the limitations of the earlier Hadoop 1.0 architecture, in which MapReduce was responsible for both data processing and resource management, and explains how YARN addresses them by separating resource management from data processing. It covers the components of YARN (Resource Manager, Node Manager, Containers, and Application Master), gives examples of workloads that can run on YARN beyond MapReduce, and describes the YARN architecture and how applications run on the YARN framework.
3–9. What’s in it for you?
Hadoop 1.0 (MR 1)
Limitations of Hadoop 1.0 (MR 1)
Need for YARN
What is YARN?
Workloads running on YARN
YARN Components
YARN Architecture
Demo on YARN
11. Hadoop 1.0 (MR 1)
The Hadoop 1.0 stack had two layers: HDFS (data storage) and MapReduce (data processing).
In Hadoop 1.0, MapReduce performed both data processing and resource management.
12. Hadoop 1.0 (MR 1)
MapReduce consisted of a Job Tracker and Task Trackers.
Job Tracker: allocated resources, performed scheduling, monitored jobs, and assigned map and reduce tasks to the Task Trackers.
Task Trackers: processed the tasks and reported their progress to the Job Tracker.
19–22. Limitations of Hadoop 1.0 (MR 1)
1. Scalability: with a single JobTracker, scalability became a bottleneck. Maximum cluster size: 4,000 nodes; maximum concurrent tasks: 40,000.
2. Availability: the JobTracker was a single point of failure. Any failure killed all queued and running jobs, which users then had to resubmit.
3. Resource utilization: each TaskTracker had a predefined number of map and reduce slots, so cluster resources were often underused.
4. Non-MapReduce applications: because MapReduce is batch driven, performing real-time analysis and running ad-hoc queries was a problem, and the cluster was limited to MapReduce applications.
25. Need for YARN
Before YARN (Hadoop 1.0): HDFS (data storage) with MapReduce (data processing) on top. The stack was designed to run MapReduce jobs only and had issues in scalability, resource utilization, etc.
After YARN (Hadoop 2.0): HDFS (data storage), YARN (cluster resource management) above it, and MapReduce (data processing) alongside other processing frameworks on top. YARN solved those issues, and users could work with multiple processing models along with MapReduce.
27–30. Solution - Hadoop 2.0 (YARN)
Scalability: a cluster can exceed 10,000 nodes and run more than 100,000 concurrent tasks.
Compatibility: applications developed for Hadoop 1 run on YARN without any disruption or availability issues.
Resource utilization: YARN allows dynamic allocation of cluster resources, improving resource utilization.
Multitenancy: both open-source and proprietary data access engines can share the cluster, enabling real-time analysis and ad-hoc queries.
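Multitenancy in YARN is typically configured through scheduler queues. The sketch below is illustrative only: the queue names analytics and etl are hypothetical, and in a real cluster these CapacityScheduler properties live in capacity-scheduler.xml rather than being set in code.

```java
import org.apache.hadoop.conf.Configuration;

public class QueueSetup {
    public static void main(String[] args) {
        // These CapacityScheduler properties normally live in capacity-scheduler.xml;
        // setting them on a Configuration object here just illustrates the keys.
        Configuration conf = new Configuration();
        // Two hypothetical tenant queues under the root queue
        conf.set("yarn.scheduler.capacity.root.queues", "analytics,etl");
        // Each queue gets a guaranteed share of cluster capacity (percent)
        conf.set("yarn.scheduler.capacity.root.analytics.capacity", "60");
        conf.set("yarn.scheduler.capacity.root.etl.capacity", "40");
        System.out.println("Queues: " + conf.get("yarn.scheduler.capacity.root.queues"));
    }
}
```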
32–34. What is YARN?
YARN – Yet Another Resource Negotiator
YARN is the cluster resource management layer of the Apache Hadoop ecosystem, which schedules jobs and assigns resources.
An application such as MapReduce tells YARN, “I want resources to run my applications,” and YARN provides the desired resources: memory, CPU, and network.
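To make this concrete, here is a minimal sketch that asks the ResourceManager what resources each node offers, using the Hadoop YarnClient API (assuming Hadoop 2.8+ client libraries on the classpath and a reachable ResourceManager configured in yarn-site.xml):

```java
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ClusterResources {
    public static void main(String[] args) throws Exception {
        // Connect to the ResourceManager using the settings in yarn-site.xml
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // Each NodeReport describes the resources one node offers to YARN
        for (NodeReport node : yarnClient.getNodeReports()) {
            System.out.println(node.getNodeId()
                + " memory=" + node.getCapability().getMemorySize() + "MB"
                + " vcores=" + node.getCapability().getVirtualCores());
        }
        yarnClient.stop();
    }
}
```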
35. Workloads running on YARN
Frameworks that run on top of YARN (cluster resource management) and the Hadoop Distributed File System (data storage):
- BATCH (MapReduce)
- INTERACTIVE (Tez)
- Column-oriented database (HBase)
- STREAMING (Storm)
- GRAPH (Giraph)
- IN-MEMORY (Spark)
- OTHERS (Weave)
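As one example of a non-MapReduce workload, the sketch below submits a Spark application with YARN as its cluster manager, using Spark's SparkLauncher API. The jar path and main class are hypothetical placeholders, and the spark-launcher library is assumed to be on the classpath:

```java
import org.apache.spark.launcher.SparkLauncher;

public class SubmitSparkOnYarn {
    public static void main(String[] args) throws Exception {
        // Launch a Spark application with YARN as the cluster manager.
        // The jar path and main class below are hypothetical placeholders.
        Process spark = new SparkLauncher()
            .setAppResource("/path/to/my-spark-app.jar") // hypothetical application jar
            .setMainClass("com.example.MySparkJob")      // hypothetical main class
            .setMaster("yarn")                           // hand resource management to YARN
            .setConf(SparkLauncher.EXECUTOR_MEMORY, "2g")
            .launch();
        spark.waitFor(); // block until the application finishes
    }
}
```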
40–42. YARN Components – Resource Manager
The Resource Manager is the ultimate authority that decides the allocation of resources among all the applications in the system. It has two components: the Scheduler and the Applications Manager.
Scheduler:
- Responsible for allocating resources to the various running applications
- Does not perform monitoring or tracking of application status
- Offers no guarantee about restarting tasks that fail due to hardware or application failures
Applications Manager:
- Responsible for accepting job submissions
- Negotiates the first container for executing the application-specific ApplicationMaster
- Provides the service for restarting the ApplicationMaster container on failure
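The Scheduler is pluggable: the property yarn.resourcemanager.scheduler.class selects which implementation the ResourceManager runs. The sketch below only illustrates the key and one common value; in a real cluster this is set in yarn-site.xml, not in application code:

```java
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SchedulerConfig {
    public static void main(String[] args) {
        // In a real cluster this property lives in yarn-site.xml; setting it on a
        // YarnConfiguration here just shows which knob selects the Scheduler.
        YarnConfiguration conf = new YarnConfiguration();
        conf.set("yarn.resourcemanager.scheduler.class",
            "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler");
        System.out.println(conf.get("yarn.resourcemanager.scheduler.class"));
    }
}
```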
44–46. YARN Components – Node Manager
Node Managers are the slave daemons: they track processes and running jobs and monitor each container’s resource usage (CPU, memory, etc.).
Container:
- Holds a collection of resources such as CPU, memory, disk, and network
- Grants an application the right (authenticated by the Node Manager) to use a specific amount of those resources
Application Master:
- Manages the resource needs of an individual application
- Interacts with the Scheduler to acquire the required resources and with the Node Manager to execute and monitor tasks
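The Application Master’s interaction with the Scheduler can be sketched with Hadoop’s AMRMClient API. Caveats: this code is only meaningful when it runs inside the container that the ResourceManager launched as the ApplicationMaster, and the 1 GB / 1 vcore request is an arbitrary example value:

```java
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AmResourceRequest {
    public static void main(String[] args) throws Exception {
        // Only works when run inside a container that the ResourceManager
        // launched as an ApplicationMaster.
        AMRMClient<ContainerRequest> rm = AMRMClient.createAMRMClient();
        rm.init(new YarnConfiguration());
        rm.start();

        // Register this ApplicationMaster with the ResourceManager
        rm.registerApplicationMaster("", 0, "");

        // Ask the Scheduler for one container: 1 GB of memory, 1 vcore
        rm.addContainerRequest(new ContainerRequest(
            Resource.newInstance(1024, 1), null, null, Priority.newInstance(0)));

        // Heartbeat: the Scheduler answers with any containers it has granted
        AllocateResponse response = rm.allocate(0.0f);
        for (Container c : response.getAllocatedContainers()) {
            System.out.println("Granted container " + c.getId() + " on " + c.getNodeId());
        }
    }
}
```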
55–59. Running an application in YARN
1. The Client submits an application to the ResourceManager.
2. The ResourceManager allocates a container.
3. The ApplicationMaster contacts the related NodeManager.
4. The NodeManager launches the container.
5. The container executes the ApplicationMaster.
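From the client’s side, step 1 can be sketched with the YarnClient API. This is a minimal, hedged example: the application name and the "sleep 30" launch command are stand-ins (a real ApplicationMaster would launch a Java process), and it assumes the Hadoop YARN client libraries and a configured yarn-site.xml:

```java
import java.util.Collections;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SubmitToYarn {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // Step 1: ask the ResourceManager for a new application id
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
        ctx.setApplicationName("yarn-demo"); // hypothetical name

        // Describe the command the ApplicationMaster container will run (steps 2-5).
        // A real AM would launch a Java process; "sleep 30" is a stand-in.
        ContainerLaunchContext amContainer = ContainerLaunchContext.newInstance(
            Collections.emptyMap(), Collections.emptyMap(),
            Collections.singletonList("sleep 30"), null, null, null);
        ctx.setAMContainerSpec(amContainer);
        ctx.setResource(Resource.newInstance(512, 1)); // 512 MB, 1 vcore for the AM

        ApplicationId appId = yarnClient.submitApplication(ctx);
        System.out.println("Submitted application " + appId);
    }
}
```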