Storage Systems for
Big Data
Sameer Tiwari
Hadoop Storage Architect, Pivotal Inc., @sameertech
Storage Hierarchy
- In-memory KV Store (Redis)
  - Extremely fast access
- Other KV stores
- Large indexed Tables (HBase)
  - Fast Random access
  - Consistent
- Large Distributed Storage (HDFS)
  - High aggregate throughput
- General purpose FS
  - POSIX filesystem (*nix)
Hadoop Distributed File System (HDFS)
● History
○ Based on Google File System Paper (2003)
○ Built at Yahoo by a small team
● Goals
○ Tolerance to Hardware failure
○ Sequential access as opposed to Random
○ High aggregated throughput for Large Data Sets
○ “Write Once Read Many” paradigm
HDFS - Key Components
[Diagram: NameNode metadata, plus DataNodes 1-4 holding Data Blocks across Rack 1 and Rack 2, with Replication Pipelining between DataNodes]
FileA: Metadata e.g. Size, Owner...
AB1:D1, AB1:D3, AB1:D4
AB2:D1, AB2:D3, AB2:D4
FileB: Metadata e.g. Size, Owner...
BB1:D1, BB1:D2, BB1:D4
HDFS - Communication
[Diagram: HDFS client, NameNode, DataNode 1 and DataNode 2]
HDFS Client API
- DataNodeProtocol
- Non-RPC, Streaming
- Heavy Buffering
DataNode to NameNode
- DN registration: At init time
- Heart Beat: Stats about Activity and Capacity
- Block Report: List of blocks (hourly)
- Block Received: Triggered by Client upload
HDFS - NameNode 1 of 4
● Heart of HDFS. Typically has lots of memory (~128 GB)
● Hosts two important tables
○ The HDFS Namespace: File->Block mapping
■ Persisted for backup
○ The iNode table: Block->DataNode mapping
■ Not persisted
■ Re-built from block reports
● HDFS is a journaled file system
○ Maintains a WAL called the edit log
○ The edit log is merged into the fsimage at a preset log size
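The two tables can be pictured as a pair of in-memory maps. The following is only a sketch under stated assumptions: `NameNodeTables` and its methods are hypothetical names for illustration and are not Hadoop classes. The sample data reuses the FileA/FileB block layout from the Key Components slide.

```java
import java.util.*;

// Hypothetical sketch of the two NameNode tables; not Hadoop's real classes.
class NameNodeTables {
    // Namespace: file -> ordered block ids (the part that is persisted).
    final Map<String, List<String>> fileToBlocks = new HashMap<>();
    // Block locations: block id -> DataNodes holding a replica (memory only).
    final Map<String, Set<String>> blockToDataNodes = new HashMap<>();

    void addFile(String file, List<String> blocks) {
        fileToBlocks.put(file, new ArrayList<>(blocks));
    }

    // A block report replaces everything previously known about one DataNode.
    void applyBlockReport(String dataNode, Collection<String> reportedBlocks) {
        for (Set<String> holders : blockToDataNodes.values()) {
            holders.remove(dataNode);
        }
        for (String block : reportedBlocks) {
            blockToDataNodes.computeIfAbsent(block, b -> new TreeSet<String>()).add(dataNode);
        }
    }

    Set<String> locate(String block) {
        return blockToDataNodes.getOrDefault(block, Collections.<String>emptySet());
    }

    // Rebuilds the location table purely from block reports.
    static NameNodeTables demo() {
        NameNodeTables nn = new NameNodeTables();
        nn.addFile("FileA", Arrays.asList("AB1", "AB2"));
        nn.addFile("FileB", Arrays.asList("BB1"));
        nn.applyBlockReport("D1", Arrays.asList("AB1", "AB2", "BB1"));
        nn.applyBlockReport("D2", Arrays.asList("BB1"));
        nn.applyBlockReport("D3", Arrays.asList("AB1", "AB2"));
        nn.applyBlockReport("D4", Arrays.asList("AB1", "AB2", "BB1"));
        return nn;
    }
}
```

Because the location table is derived entirely from reports, losing it on restart costs only the time needed for DataNodes to report in again.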
HDFS - NameNode 2 of 4
● Can take on 3 roles
○ Regular mode: Hosts the HDFS Namespace
○ Backup mode: Secondary NN
■ Downloads fsimage regularly
■ Merges changes to namespace
■ It's a misnomer; it is more of a checkpointing server
○ Safemode: Startup time
■ It's a read-only mode
■ Collects data from active DNs
HDFS - NameNode 3 of 4
HA using Quorum Journal Manager (Hadoop 2.0+)
[Diagram: Active NN and Standby NN]
HDFS - NameNode 4 of 4
● Replication Monitor: Fix over/under replicated blocks
○ Replica Modes: Corrupt, Current, Out-of-date,
● Lease Management: During file creation
○ Ensures single writer (multiple readers are ok)
○ Synchronously checks active lease
○ Asynchronously checks the entire Tree of leases
● Heartbeat monitor: Collects DN stats and marks a DN down if no heartbeat is received for ~10 mins.
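The heartbeat rule above can be sketched as a timestamp map with a timeout check. This is an illustrative sketch only: `HeartbeatMonitor` is a hypothetical class, and the 10-minute constant simply mirrors the approximate figure on the slide.

```java
import java.util.*;

// Hypothetical sketch of a heartbeat monitor; not the NameNode's real code.
class HeartbeatMonitor {
    // ~10 minutes, taken from the slide; treat the exact value as an assumption.
    static final long TIMEOUT_MS = 10 * 60 * 1000L;
    private final Map<String, Long> lastSeen = new HashMap<>();

    // Record the time of the latest heartbeat from a DataNode.
    void heartbeat(String dataNode, long nowMs) {
        lastSeen.put(dataNode, nowMs);
    }

    // DataNodes whose last heartbeat is older than the timeout, sorted by name.
    List<String> deadNodes(long nowMs) {
        List<String> dead = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            if (nowMs - e.getValue() > TIMEOUT_MS) {
                dead.add(e.getKey());
            }
        }
        Collections.sort(dead);
        return dead;
    }
}
```

Passing the clock in as a parameter keeps the sketch deterministic; a real monitor would read the system clock and run periodically.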
HDFS - DataNode
● Typical Machine: ~ 4TB X 12 disks JBOD
● Has no idea about HDFS, only knows about blocks
● Serves 2 types of requests
○ NN requests for Block create/delete/replicate
○ Serves Block R/W requests from Clients
● Maintains only one table
○ Block->Real Bytes on the local FS
○ Stored locally and not backed up
○ DN can re-build this table by scanning its local dir
HDFS - DataNode
● Creates a checksum file for each block
● Runs blockScanner() to find corrupt blocks
● DataNode to NameNode communication
○ Init - registration
○ Sends HeartBeat to NN every few secs
○ Block completion: blockReceived()
○ Lets NN respond with block commands
○ Sends full Block Report every hour
HDFS - Typical Deployment
[Diagram: a Master Switch connected to Aggregator Switch 1, Aggregator Switch 2 and Aggregator Switch 3]
HDFS - Limitations
● NN holds the Namespace in a single Java process
● A 64 GB heap holds ~250 million files + blocks
○ Federation sort of solves the problem
○ Moving Namespace to a KV Store is one solution
● Enterprise features slowly being added
○ Snapshots
○ NFS access
○ Geo replication
○ Erasure coding (e.g. Reed-Solomon) to reduce 3X copies to 1.3X
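The heap figure on this slide implies a per-object memory cost, which is worth working out. The sketch below is plain arithmetic on the slide's two numbers (64 GB, ~250 million files + blocks); `NamespaceHeap` and its method names are hypothetical, and the linear scaling is an assumption, not a measured property of the NameNode.

```java
// Back-of-the-envelope arithmetic on the slide's figures; names are hypothetical.
class NamespaceHeap {
    static final long GIB = 1024L * 1024L * 1024L;

    // Average heap bytes consumed per namespace object (file or block).
    static long bytesPerObject(long heapBytes, long objects) {
        return heapBytes / objects;
    }

    // Objects that fit in a heap, assuming the cost per object stays constant.
    static long maxObjects(long heapBytes, long bytesPerObject) {
        return heapBytes / bytesPerObject;
    }

    public static void main(String[] args) {
        long perObject = bytesPerObject(64 * GIB, 250_000_000L);
        System.out.println("bytes per object: " + perObject);
        System.out.println("objects in 128 GiB: " + maxObjects(128 * GIB, perObject));
    }
}
```

At roughly 275 bytes per object, doubling the heap only doubles the namespace, which is why the slide points to Federation or an external KV store rather than ever-larger heaps.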
HDFS - Advanced Concepts
● Support for fadvise readahead and drop-behind
● HDFS takes advantage of multiple disks
○ Individual failures do not cause DN failures
○ Spills are parallelized
● Replica and Task placement
○ Done by DNSToSwitchMapping.resolve()
○ User supplied rack topology
○ IP address -> Rack id mapping
○ net.topology.* settings in core-site.xml
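The IP address -> rack id mapping can be sketched as a lookup table with a fallback rack. This is a standalone illustration and does not implement Hadoop's interface: `StaticRackResolver`, the sample addresses and the rack names are all hypothetical, and the fallback name is an assumption chosen for the example.

```java
import java.util.*;

// Hypothetical table-driven rack resolver; illustrative only.
class StaticRackResolver {
    // Fallback for hosts missing from the table (an assumption of this sketch).
    static final String DEFAULT_RACK = "/default-rack";
    private final Map<String, String> ipToRack = new HashMap<>();

    StaticRackResolver add(String ip, String rack) {
        ipToRack.put(ip, rack);
        return this;
    }

    // Returns one rack id per input address, in the same order.
    List<String> resolve(List<String> ips) {
        List<String> racks = new ArrayList<>(ips.size());
        for (String ip : ips) {
            racks.add(ipToRack.getOrDefault(ip, DEFAULT_RACK));
        }
        return racks;
    }

    // Placement code can use this to spread replicas across racks.
    boolean sameRack(String ipA, String ipB) {
        List<String> racks = resolve(Arrays.asList(ipA, ipB));
        return racks.get(0).equals(racks.get(1));
    }
}
```

With such a mapping, a placement policy can keep one replica close to the writer and at least one on a different rack.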
HDFS - Advanced Concepts
● Couple of tools for Perf monitoring
○ Ganglia for HDFS
○ Nagios for general health of the machine.
HBase
● History
○ Based on Google’s Bigtable (2006)
○ Built at Powerset (later acquired by Microsoft)
○ Facebook and Yahoo use it extensively (~1000 machines)
● Goals
○ Random R/W access
○ Tables with Billions of Rows X Millions of Columns
○ Often referred to as a “NoSQL” Data store
○ High speed ingest rate. FB == ~Billion msgs+chat per day.
○ Good consistency model
HBase - Key Components
[Diagram; the only surviving label is “Active and Backup”]
HBase - Data Model
● Section 2 of the Google Bigtable paper says:
“A Bigtable is a sparse, distributed, persistent multidimensional sorted map. The
map is indexed by a row key, column key, and a timestamp; each value in the
map is an uninterpreted array of bytes.”

Let’s break that down over the next few slides...
HBase - Data Model
● Data is stored in Tables
● Tables have Rows and Columns
● That's where the similarity ends
○ Columns are grouped into Column Families
● Rows are stored in sorted (increasing) order
○ Implies there is only one primary key
● Rows can be sparsely populated
○ Variable length rows are common
● The same row can be updated multiple times
○ Each update is stored as a new version
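The "sparse, sorted, versioned map" idea can be sketched with nested sorted maps. This is only a toy model: `SparseSortedTable` is a hypothetical class, and it uses `String` keys for readability where HBase compares raw bytes.

```java
import java.util.*;

// Toy model of the data model: row -> column -> timestamp -> value.
// Hypothetical class; String keys stand in for HBase's byte-ordered keys.
class SparseSortedTable {
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>> rows =
            new TreeMap<>();

    // Each put adds a version; existing versions of the cell are kept.
    void put(String row, String column, long timestamp, byte[] value) {
        rows.computeIfAbsent(row, r -> new TreeMap<String, NavigableMap<Long, byte[]>>())
            .computeIfAbsent(column, c -> new TreeMap<Long, byte[]>(Comparator.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    // Returns the newest version of a cell, or null if the cell was never written.
    byte[] get(String row, String column) {
        NavigableMap<String, NavigableMap<Long, byte[]>> columns = rows.get(row);
        if (columns == null) {
            return null;
        }
        NavigableMap<Long, byte[]> versions = columns.get(column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    // Row keys always come back in sorted order, whatever the insert order.
    List<String> rowKeys() {
        return new ArrayList<>(rows.keySet());
    }
}
```

Sparseness falls out naturally: a row stores only the columns that were actually written, so rows in the same table can differ widely in width.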
HBase - Data Model
Conceptual View
- Row Key: byte-array, sorted by byte order
- Time Stamp
- ColumnFamily contents: single column in “contents”
  contents:html = "<html>..."
  contents:html = "<html>..."
- ColumnFamily anchor: Column => Column Family:Qualifier, e.g. two columns in “anchor”
  t9 = "CNN"
  t8 = ""
HBase - Data Model
Physical View
- Row Key, Time Stamp, ColumnFamily contents
  contents:html = "<html>..."
  contents:html = "<html>..."
- Row Key, Time Stamp, ColumnFamily anchor
  t9 = "CNN"
  t8 = ""
HBase - Table Objects
[Diagram: Region Servers holding table data, stored as Blocks in the DFS]
Region Server: ~200 Regions per Server
Data: R1 - R40
HBase - Data Model Operations

HTable class offers 4 operations: get, put, delete and scan.
The first 3 are available in single or batch mode.
// Scan example
public static final byte[] CF1 = "empData1".getBytes();
public static final byte[] ATTR1 = "empId".getBytes();
// The original elides the constructor arguments; "myTable" is a placeholder table name
Configuration conf = HBaseConfiguration.create();
HTable htable = new HTable(conf, "myTable"); // create an instance of HTable
Scan scan = new Scan();
scan.addColumn(CF1, ATTR1);
ResultScanner rs = htable.getScanner(scan);
try {
  for (Result r = rs.next(); r != null; r = rs.next()) {
    // do something with it...
  }
} finally {
  rs.close(); // always close the ResultScanner
}
HBase - Data Versioning


● By default a put() uses the current timestamp, but you can override it
● By default a get() returns the latest version, but you can ask for any
■ Get.setMaxVersions() or Get.setTimeRange()
● All Data model operations return data in sorted order: Row:CF:Col:Version
● Delete flavors: delete col+ver, delete col, delete col family, delete row
● Deletes work by creating tombstone markers
■ delete() masks a put() till a major compaction takes place
■ Major compactions can change get() results
● All operations are ATOMIC within a row
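The tombstone behaviour is easier to see in a small model of a single cell. This is a simplified sketch and not HBase's implementation: `VersionedCell` is a hypothetical class that models only the "delete column" flavor, where a marker at timestamp T hides every version with a timestamp at or below T.

```java
import java.util.*;

// Simplified model of one cell with a "delete column" tombstone.
// Hypothetical class for illustration; not HBase code.
class VersionedCell {
    private final NavigableMap<Long, String> puts = new TreeMap<>();
    private long tombstoneTs = Long.MIN_VALUE; // no delete marker yet

    void put(long timestamp, String value) {
        puts.put(timestamp, value);
    }

    // Writes a marker instead of removing data.
    void delete(long timestamp) {
        tombstoneTs = Math.max(tombstoneTs, timestamp);
    }

    // Latest version, unless a tombstone at an equal or newer timestamp masks it.
    String get() {
        Map.Entry<Long, String> latest = puts.lastEntry();
        if (latest == null || latest.getKey() <= tombstoneTs) {
            return null;
        }
        return latest.getValue();
    }

    // Physically drops masked versions, then drops the marker itself.
    void majorCompact() {
        puts.headMap(tombstoneTs, true).clear();
        tombstoneTs = Long.MIN_VALUE;
    }
}
```

In this model a put with an older timestamp stays invisible while the marker exists, but the same put issued after `majorCompact()` is visible, which is one way a major compaction can change get() results.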
HBase - Read Path
[Diagram: client lookup sequence across Region Server1 and Region Server2]
Q: Where is -ROOT-? A: RegionServer1
-ROOT- Table for keeping track of .META. table (regionInfo, Server)
Q: Where is .META.? A: RegionServer2
.META. Table for all regions in the system, never splits (table, startKey, id:: regionInfo, Server)
Q: HTable.get() A: Row
Region Server2: HFile - 1, HFile - 2
HBase - Write Path
[Diagram: client lookup sequence across the Region Servers, followed by the write]
Q: Where is -ROOT-? A: RegionServer1
-ROOT- Table for keeping track of .META. table (regionInfo, Server)
Q: Where is .META.? A: RegionServer2
.META. Table for all regions in the system, never splits (table, startKey, id:: regionInfo, Server)
return Code
Offline flush
HBase - Shell

Table MetaData: e.g. create/alter/drop/describe table
Table Data: e.g. put/scan/delete/count row(s)
Admin: e.g. flush/rebalance/compact regions, split tables
Replication Tools: e.g. add/enable/list/start/stop replication
Security: e.g. grant/revoke/list user permissions

Shell interaction example:
hbase(main):001:0> create 'myTable', 'myColFam1'
0 row(s) in 3.8890 seconds
hbase(main):002:0> put 'myTable', 'row-1', 'myColFam1:col1', 'value-1'
0 row(s) in 0.1840 seconds
hbase(main):003:0> scan 'myTable'
ROW COLUMN+CELL
row-1 column=myColFam1:col1, timestamp=1457381922312, value=value-1
1 row(s) in 0.1160 seconds
HBase - Advanced Topics

Bulk Loading
Cluster Replication
Merging and Splitting of regions
Predicate pushdown using Server side Filters
Bloom filters
Performance Tuning
HBase - What it's not
● HBase is not for everyone
● Has no support for
■ Joins
■ Secondary indexes
■ Transactions
■ JDBC driver
● Works well with large deployments
● Requires good working knowledge of the Hadoop eco-system.
HBase - What it's good at
● Strongly consistent reads/writes
● Automatic sharding
● Automatic RegionServer failover
● HBase supports MapReduce for using HBase as both source and sink
● Works on top of HDFS
● HBase provides a Java Client API and a REST/Thrift API
● Block Cache and Bloom Filters support
● Web UI and JMX support, for operational management

Redis
● Redis is an open source, in-memory key-value store, with disk persistence
● Originally written at LLOOGG by Salvatore Sanfilippo ~2009
● Written in ANSI C and works on most Linux systems
● No external dependencies
● Very small: ~1MB memory per instance
● Datatypes can be data-structures: String, Hash, Set, Sorted Set.
● Compressed in-memory representation of data
● Clients are available in lots of languages: C, C#, Clojure, Scala, Lua...
Redis Key Components
[Diagram: one Single Threaded Server per CPU (CPU - 1, CPU - 2, ...), each with its own Highly Optimized Memory Storage and Highly Optimized Network Layer]
Redis Network Layer
[Diagram: TCP Server with a Response Queue]
● Bypasses the OS socket layer abstraction
○ Uses low level epoll(), kqueue(), select() calls
● Low overhead of waiting threads
● Allows handling of close to 10K concurrent clients
● Typical request/response system
○ For 10K requests, 20K network calls
○ If each call takes 1 ms, 20 secs are lost
● Use batching, called Pipelining
○ Send one response for 10K requests
○ Saving ~10 seconds for 10K calls
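The pipelining arithmetic on this slide can be checked with a few lines. This sketch follows the slide's own cost model (one network call per request and per response, 1 ms per call); `PipelineCost` and its method names are hypothetical.

```java
// Arithmetic for the slide's cost model; class and method names are hypothetical.
class PipelineCost {
    // Without pipelining: one call for each request plus one for each response.
    static long unpipelinedMillis(long requests, long millisPerCall) {
        return 2 * requests * millisPerCall;
    }

    // With pipelining as described on the slide: every request is still sent,
    // but a single batched response comes back.
    static long pipelinedMillis(long requests, long millisPerCall) {
        return (requests + 1) * millisPerCall;
    }

    public static void main(String[] args) {
        long before = unpipelinedMillis(10_000, 1);
        long after = pipelinedMillis(10_000, 1);
        System.out.println("unpipelined ms: " + before);
        System.out.println("pipelined ms:   " + after);
        System.out.println("saved ms:       " + (before - after));
    }
}
```

Under this model the saving is 9,999 ms, which is the "10 seconds" quoted on the slide.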
Redis Memory Optimizations
● Integer encoding for small values
● Small hashes are converted to arrays
○ Leverage CPU caching
● Uses 32 bit version when possible
● Leads to 5X to 10X memory saving
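The "small hashes are converted to arrays" idea can be sketched as a hash that stores its entries in one flat array until it grows past a threshold. This is an illustration of the general technique, not Redis's actual encoding: `SmallHash` is a hypothetical class, and the threshold of 4 entries is an arbitrary value chosen for the example.

```java
import java.util.*;

// Illustrative small-hash encoding; hypothetical class, not Redis internals.
class SmallHash {
    static final int MAX_FLAT_ENTRIES = 4; // arbitrary threshold for this sketch
    private String[] flat = new String[0]; // k0, v0, k1, v1, ... in one array
    private Map<String, String> table = null; // used only once the hash grows

    boolean isFlat() {
        return table == null;
    }

    String get(String key) {
        if (table != null) {
            return table.get(key);
        }
        // A linear scan over a small contiguous array is cache friendly.
        for (int i = 0; i < flat.length; i += 2) {
            if (flat[i].equals(key)) {
                return flat[i + 1];
            }
        }
        return null;
    }

    void put(String key, String value) {
        if (table != null) {
            table.put(key, value);
            return;
        }
        for (int i = 0; i < flat.length; i += 2) {
            if (flat[i].equals(key)) {
                flat[i + 1] = value; // overwrite in place
                return;
            }
        }
        if (flat.length / 2 >= MAX_FLAT_ENTRIES) {
            // Too big for the compact form: switch to a real hash table.
            table = new HashMap<>();
            for (int i = 0; i < flat.length; i += 2) {
                table.put(flat[i], flat[i + 1]);
            }
            table.put(key, value);
            flat = null;
            return;
        }
        flat = Arrays.copyOf(flat, flat.length + 2);
        flat[flat.length - 2] = key;
        flat[flat.length - 1] = value;
    }
}
```

The compact form avoids per-entry object overhead and pointer chasing, trading O(1) lookups for a short linear scan that stays within a few cache lines.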
Redis Enterprise Features
[Diagram: Cluster 1 and Cluster 2, each with a Redis Master and Async. replication; data split into Shard 1 and Shard 2]
Redis WrapUp
● Super fast in-memory KV store
● Provides a CLI
● Typical apps will require client side coding
● Spills to disk for large data-sets, with reduced performance
● Upcoming “cluster” feature will keep 3 copies for HA
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10 CEO/Founder: Sri Ambati Keynote at Wells Fargo Day CEO/Founder: Sri Ambati Keynote at Wells Fargo CEO/Founder: Sri Ambati Keynote at Wells Fargo Day CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey

Recently uploaded (20)

What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite CEO/Founder: Sri Ambati Keynote at Wells Fargo Day CEO/Founder: Sri Ambati Keynote at Wells Fargo CEO/Founder: Sri Ambati Keynote at Wells Fargo Day CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024

Storage Systems for big data - HDFS, HBase, and intro to KV Store - Redis

  • 1. Storage Systems for Big Data Sameer Tiwari Hadoop Storage Architect, Pivotal Inc., @sameertech
  • 3. Storage Hierarchy. Redis and other KV store(s): in-memory KV store, extremely fast access. HBase: large indexed tables, fast random access, consistent. HDFS: large distributed storage, high aggregate throughput. General purpose FS: POSIX filesystem, *nix.
  • 5. Hadoop Distributed File System(HDFS) ● History ○ Based on Google File System Paper (2003) ○ Built at Yahoo by a small team ● Goals ○ Tolerance to Hardware failure ○ Sequential access as opposed to Random ○ High aggregated throughput for Large Data Sets ○ “Write Once Read Many” paradigm
  • 6. HDFS - Key Components [diagram: NameNode; Client1 (FileA) and Client2 (FileB); DataNodes 1-4 across Rack 1 and Rack 2]
  • 7. HDFS - Key Components [diagram, continued: File.create() is a metadata operation ("NN OPs") against the NameNode, which records FileA: metadata (e.g. size, owner...) with blocks AB1:D1, AB1:D3, AB1:D4, AB2:D1, AB2:D3, AB2:D4, and FileB: metadata (e.g. size, owner...) with blocks BB1:D1, BB1:D2, BB1:D4]
  • 8. HDFS - Key Components [diagram, continued: File.write() sends data blocks ("DN OPs") to the DataNodes; blocks AB1 and BB1 appear on the DataNodes]
  • 9. HDFS - Key Components [diagram, continued: replication pipelining; the replicas of AB1, AB2 and BB1 appear on the DataNodes]
  • 10. HDFS - Communication [diagram: Client1 (FileA) talks to the NameNode through the HDFS Client API over RPC:ClientProtocol]
  • 11. HDFS - Communication [diagram, continued: the client talks to DataNode 1 (blocks AB1, AB2, BB1) through the HDFS Client API] - DataNodeProtocol - Non-RPC, streaming - Heavy buffering
  • 12. HDFS - Communication [diagram, continued: DataNode 1 and DataNode 2 talk to the NameNode over RPC:DataNodeProtocol] - DN registration: at init time - Heart Beat: stats about activity and capacity (secs) - Block Report: list of blocks (hour) - Block Received: triggered by client upload
  • 13. HDFS - Communication [diagram, continued: replication pipelining streams blocks from DataNode 1 to DataNode 2]
  • 14. HDFS - NameNode 1 of 4 ● Heart of HDFS. Typically has lots of memory (~128 GB) ● Hosts two important tables ● The HDFS Namespace: File->Block mapping ○ Persisted for backup ● The iNode table: Block->DataNode mapping ○ Not persisted ○ Rebuilt from block reports ● HDFS is a journaled file system ○ Maintains a WAL (write-ahead log) called the edit log ○ The edit log is merged into the fsimage at a preset log size
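The edit log and fsimage relationship on the NameNode slide can be sketched as a toy journaled namespace. This is a minimal illustration in plain Java, not Hadoop code: the class `JournaledNamespace`, its methods, and the "merge after N edits" trigger are all assumptions made for the example (the slide only says the merge happens at a preset log size).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of a journaled namespace: every change is appended to an edit log
// first, and the log is folded into a checkpoint image once it grows too long.
// All names here are hypothetical; none of this is a Hadoop API.
final class JournaledNamespace {
    private final TreeMap<String, Long> fsimage = new TreeMap<>(); // last checkpoint: path -> size
    private final TreeMap<String, Long> live = new TreeMap<>();    // current in-memory namespace
    private final List<String[]> editLog = new ArrayList<>();      // edits since the last checkpoint
    private final int maxEdits;                                    // assumed merge trigger

    JournaledNamespace(int maxEdits) { this.maxEdits = maxEdits; }

    void create(String path, long size) {
        editLog.add(new String[] {"CREATE", path, Long.toString(size)}); // log first
        live.put(path, size);
        if (editLog.size() >= maxEdits) checkpoint();
    }

    void delete(String path) {
        editLog.add(new String[] {"DELETE", path, "0"});
        live.remove(path);
        if (editLog.size() >= maxEdits) checkpoint();
    }

    // Replay the pending edits onto the image, then truncate the log.
    void checkpoint() {
        apply(fsimage);
        editLog.clear();
    }

    // Recovery after a restart: the image plus the replayed log gives the live state.
    TreeMap<String, Long> recover() {
        TreeMap<String, Long> state = new TreeMap<>(fsimage);
        apply(state);
        return state;
    }

    private void apply(TreeMap<String, Long> target) {
        for (String[] e : editLog) {
            if (e[0].equals("CREATE")) target.put(e[1], Long.parseLong(e[2]));
            else target.remove(e[1]);
        }
    }

    int pendingEdits() { return editLog.size(); }
    TreeMap<String, Long> liveView() { return live; }

    public static void main(String[] args) {
        JournaledNamespace ns = new JournaledNamespace(3);
        ns.create("/a", 10);
        ns.create("/b", 20);
        ns.delete("/a");     // third edit triggers a checkpoint
        ns.create("/c", 30); // one edit is still pending in the log
        System.out.println(ns.recover().equals(ns.liveView()));
        System.out.println(ns.pendingEdits());
    }
}
```

The point of the design is that recovery never needs the whole history: the image covers everything up to the last merge, and only the short tail of the log has to be replayed.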
  • 15. HDFS - NameNode 2 of 4 ● Can take on 3 roles ● Regular mode: Hosts the HDFS Namespace ● Backup mode: Secondary NN ○ Downloads fsimage regularly ○ Merges changes to namespace ○ The name is a misnomer; it is more of a checkpointing server ● Safemode: Startup time ○ It is a read-only mode ○ Collects data from active DNs
  • 16. HDFS - NameNode 3 of 4: HA using Quorum Journal Manager (Hadoop 2.0+) [diagram: ZK cluster, clients, Active NN, Standby NN, Journal Nodes, DataNodes]
  • 17. HDFS - NameNode 4 of 4 ● Replication Monitor: Fixes over/under replicated blocks ○ Replica Modes: Corrupt, Current, Out-of-date, Under-construction ● Lease Management: During file creation ○ Ensures a single writer (multiple readers are ok) ○ Synchronously checks the active lease ○ Asynchronously checks the entire tree of leases ● Heartbeat monitor: Collects DN stats and marks a DN down if no heartbeat is received for ~10 minutes.
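The heartbeat monitor described above can be sketched as a map of last-seen timestamps plus a timeout check. This is a minimal sketch under stated assumptions: `HeartbeatMonitor` and its methods are hypothetical names, time is passed in explicitly to keep the example deterministic, and the 10-minute timeout is the approximate figure from the slide.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical heartbeat tracker: remembers when each DataNode last reported
// and lists the nodes whose last heartbeat is older than the timeout.
final class HeartbeatMonitor {
    private final Map<String, Long> lastSeenMillis = new HashMap<>();
    private final long timeoutMillis;

    HeartbeatMonitor(long timeoutMillis) { this.timeoutMillis = timeoutMillis; }

    // Called whenever a DataNode heartbeat arrives.
    void heartbeat(String dataNode, long nowMillis) {
        lastSeenMillis.put(dataNode, nowMillis);
    }

    // Nodes considered down at the given time, in sorted order.
    Set<String> deadNodes(long nowMillis) {
        Set<String> dead = new TreeSet<>();
        for (Map.Entry<String, Long> e : lastSeenMillis.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) dead.add(e.getKey());
        }
        return dead;
    }

    public static void main(String[] args) {
        HeartbeatMonitor monitor = new HeartbeatMonitor(10 * 60 * 1000L); // ~10 minutes
        monitor.heartbeat("dn1", 0L);
        monitor.heartbeat("dn2", 0L);
        monitor.heartbeat("dn2", 9 * 60 * 1000L);                 // dn2 reports again at minute 9
        System.out.println(monitor.deadNodes(11 * 60 * 1000L));   // only dn1 is past the timeout
    }
}
```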
  • 18. HDFS - DataNode ● Typical Machine: ~ 4TB X 12 disks JBOD ● Has no idea about HDFS, only knows about blocks ● Serves 2 types of requests ○ NN requests for Block create/delete/replicate ○ Serves Block R/W requests from Clients ● Maintains only one table ○ Block->Real Bytes on the local FS ○ Stored locally and not backed up ○ DN can re-build this table by scanning its local dir
  • 19. HDFS - DataNode ● Creates a checksum file for each block ● Runs blockScanner() to find corrupt blocks ● DataNode to NameNode communication ○ Init - registration ○ Sends HeartBeat to NN every few secs ○ Block completion: blockReceived() ○ Lets NN respond with block commands ○ Sends full Block Report every hour
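The idea behind the per-block checksum and the block scanner can be illustrated with the JDK's `java.util.zip.CRC32`. This is only a sketch: `BlockChecksum` and its methods are hypothetical, and the real HDFS checksum file format and granularity are not reproduced here.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper: store a CRC32 when a block is written, recompute it
// later, and flag the block as corrupt if the two values differ.
final class BlockChecksum {
    static long checksum(byte[] block) {
        CRC32 crc = new CRC32();
        crc.update(block, 0, block.length);
        return crc.getValue();
    }

    static boolean isCorrupt(byte[] block, long storedChecksum) {
        return checksum(block) != storedChecksum;
    }

    public static void main(String[] args) {
        byte[] block = "block AB1 payload".getBytes(StandardCharsets.UTF_8);
        long stored = checksum(block);                 // written alongside the block
        System.out.println(isCorrupt(block, stored));  // false: block is intact
        block[3] ^= 0x01;                              // simulate a single flipped bit on disk
        System.out.println(isCorrupt(block, stored));  // true: the scan detects the change
    }
}
```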
  • 20. HDFS - Typical Deployment [diagram: a master switch connects Aggregator Switch 1, 2, 3, ...; each aggregator switch connects the top-of-rack (TOR) switches of RACK 1 ... RACK N (10-20)]
  • 21. HDFS - Limitations ● NN holds the Namespace in a single Java process ● 64 GB heap == ~250 million files + blocks ○ Federation sort of solves the problem ○ Moving the Namespace to a KV store is one solution ● Enterprise features are slowly being added ○ Snapshots ○ NFS access ○ Geo replication ○ Erasure coding (e.g. Reed-Solomon) to reduce 3X copies to 1.3X
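The heap figure on the limitations slide implies a per-object memory cost that can be worked out directly. The arithmetic below is a rough sketch: it assumes "64 Gig" means 64 GiB, that the whole heap is spent on namespace objects, and that the cost scales linearly, none of which the slide states. The class name is hypothetical.

```java
// Back-of-the-envelope arithmetic for the NameNode heap limit.
final class NameNodeHeapEstimate {
    static final long GIB = 1024L * 1024L * 1024L;

    // Average heap bytes consumed per namespace object (file or block).
    static long bytesPerObject(long heapBytes, long objects) {
        return heapBytes / objects;
    }

    // Objects that fit in a heap, assuming the same average cost per object.
    static long objectsForHeap(long heapBytes, long bytesPerObject) {
        return heapBytes / bytesPerObject;
    }

    public static void main(String[] args) {
        long perObject = bytesPerObject(64 * GIB, 250_000_000L);
        System.out.println(perObject);                            // 274 bytes per file or block
        System.out.println(objectsForHeap(128 * GIB, perObject)); // roughly twice as many objects
    }
}
```

At a few hundred bytes per object, doubling the heap only doubles the namespace, which is why the slide points to federation or an external KV store rather than bigger machines.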
  • 22. HDFS - Advanced Concepts ● Support for fadvise readahead and drop-behind ● HDFS takes advantage of multiple disks ○ Individual disk failures do not cause DN failures ○ Spills are parallelized ● Replica and Task placement ○ Done by DNSToSwitchMapping.resolve() ○ User supplied rack topology ○ IP address -> Rack id mapping ○ net.topology.* settings in core-site.xml
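The IP address to rack id mapping mentioned above can be sketched as a prefix lookup. This standalone class only mirrors the shape of a `resolve` call (a list of addresses in, a list of rack ids out); it does not implement the Hadoop interface, and `StaticRackResolver`, `addSubnet`, the subnet prefixes, and the fallback rack name are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical user-supplied topology: maps an IP prefix to a rack id and
// falls back to a default rack for addresses that match no prefix.
final class StaticRackResolver {
    private static final String DEFAULT_RACK = "/default-rack"; // assumed fallback name
    private final Map<String, String> prefixToRack = new LinkedHashMap<>();

    void addSubnet(String ipPrefix, String rack) {
        prefixToRack.put(ipPrefix, rack);
    }

    // Returns one rack id per input address, in the same order.
    List<String> resolve(List<String> addresses) {
        List<String> racks = new ArrayList<>();
        for (String address : addresses) {
            String rack = DEFAULT_RACK;
            for (Map.Entry<String, String> e : prefixToRack.entrySet()) {
                if (address.startsWith(e.getKey())) {
                    rack = e.getValue();
                    break;
                }
            }
            racks.add(rack);
        }
        return racks;
    }

    public static void main(String[] args) {
        StaticRackResolver resolver = new StaticRackResolver();
        resolver.addSubnet("10.0.1.", "/rack1"); // trailing dot avoids matching 10.0.10.x
        resolver.addSubnet("10.0.2.", "/rack2");
        System.out.println(resolver.resolve(Arrays.asList("10.0.1.5", "10.0.2.7", "192.168.0.1")));
    }
}
```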
  • 23. HDFS - Advanced Concepts ● Couple of tools for Perf monitoring ○ Ganglia for HDFS ○ Nagios for general health of the machine.
  • 26. HBase ● History ○ Based on Google's Bigtable (2006) ○ Built at Powerset (later acquired by Microsoft) ○ Facebook and Yahoo use it extensively (~1000 machines) ● Goals ○ Random R/W access ○ Tables with billions of rows x millions of columns ○ Often referred to as a "NoSQL" data store ○ High-speed ingest rate. FB == ~Billion msgs+chat per day ○ Good consistency model
  • 27. HBase - Key Components [diagram: client and ZK cluster; Master(s), active and backup: HMaster, JobTracker, NameNode; Slaves, many: HRegion Server, TaskTracker, DataNode]
  • 28. HBase - Data Model ● The Google Bigtable paper, on #2, says: "A Bigtable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes." Let's break that down over the next few slides...
  • 29. HBase - Data Model ● Data is stored in Tables ● Tables have Rows and Columns ● That's where the similarity with relational tables ends ○ Columns are grouped into Column Families ● Rows are stored in sorted (increasing) order ○ Implies there is only one primary key ● Rows can be sparsely populated ○ Variable-length rows are common ● The same row can be updated multiple times ○ Each update is stored as a new version
  • 30. HBase - Data Model: Conceptual View. Row key: byte array, sorted by byte order. Versions: timemillis(). Column => ColumnFamily:Qualifier, e.g. a single column in "contents" and two columns in "anchor"; values are byte arrays.
      Row Key       | Time Stamp | ColumnFamily contents       | ColumnFamily anchor
      "com.cnn.www" | t9         |                             | = "CNN"
      "com.cnn.www" | t8         |                             | = ""
      "com.cnn.www" | t5         | contents:html = "<html>..." |
      "com.cnn.www" | t3         | contents:html = "<html>..." |
  • 31. HBase - Data Model: Physical View (one table per column family)
      Row Key       | Time Stamp | ColumnFamily contents
      "com.cnn.www" | t5         | contents:html = "<html>..."
      "com.cnn.www" | t3         | contents:html = "<html>..."
      Row Key       | Time Stamp | ColumnFamily anchor
      "com.cnn.www" | t9         | = "CNN"
      "com.cnn.www" | t8         | = ""
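The "sparse, sorted, multidimensional map" of the data model slides can be sketched with nested `TreeMap`s: row key, then column, then timestamp, then value. This is an illustration only, not HBase code. `SparseSortedTable` and its methods are hypothetical, and it uses `String` keys (sorted by `String.compareTo`) where HBase uses byte arrays sorted by byte order.

```java
import java.util.Collections;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model: row key -> "family:qualifier" -> timestamp -> value.
// Rows are kept sorted, timestamps are kept newest-first, and absent cells
// simply have no entry (sparse rows).
final class SparseSortedTable {
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<String, NavigableMap<Long, String>>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Collections.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    // Latest version of a cell, or null if the cell was never written.
    String getLatest(String row, String column) {
        NavigableMap<Long, String> versions = versions(row, column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    // A specific version of a cell, or null if that version does not exist.
    String getAt(String row, String column, long timestamp) {
        NavigableMap<Long, String> versions = versions(row, column);
        return versions == null ? null : versions.get(timestamp);
    }

    String firstRowKey() {
        return rows.isEmpty() ? null : rows.firstKey();
    }

    private NavigableMap<Long, String> versions(String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        return columns == null ? null : columns.get(column);
    }

    public static void main(String[] args) {
        SparseSortedTable table = new SparseSortedTable();
        table.put("com.cnn.www", "contents:html", 3L, "<html>v3");
        table.put("com.cnn.www", "contents:html", 5L, "<html>v5");
        table.put("com.aaa.www", "contents:html", 1L, "<html>other");
        System.out.println(table.getLatest("com.cnn.www", "contents:html"));
        System.out.println(table.getAt("com.cnn.www", "contents:html", 3L));
        System.out.println(table.firstRowKey());
    }
}
```

Because the outer map is sorted by row key, a range scan is just a walk over a contiguous slice of it, which is why the slides stress that there is effectively one primary key.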
  • 32. HBase - Table Objects [diagram: a logical table (data R1-R40) is sharded into regions, e.g. Region1 (R1-R10) and Region2 (R11-R20), hosted on region servers (~200 regions per server); each region server is shown with an HLog/WAL, a MemStore and HFile blocks, backed by HDFS blocks]
  • 33. HBase - Data Model Operations ○ The HTable class offers 4 operations: get, put, delete and scan ○ The first 3 have a single or batch mode available
      // Scan example
      public static final byte[] CF1 = "empData1".getBytes();
      public static final byte[] ATTR1 = "empId".getBytes();
      HTable htable = new HTable(blah... // create an instance of HTable
      Scan scan = new Scan();
      scan.addColumn(CF1, ATTR1);
      scan.setStartRow(Bytes.toBytes("200")); // start row of the scan range
      scan.setStopRow(Bytes.toBytes("500"));  // stop row of the scan range
      ResultScanner rs = htable.getScanner(scan);
      try {
        for (Result r = rs.next(); r != null; r = rs.next()) {
          // do something with it...
        }
      } finally {
        rs.close(); // always close the scanner
      }
  • 34. HBase - Data Versioning ○ By default a put() uses the current timestamp, but you can override it ○ By default a get() returns the latest version, but you can ask for any: Get.setMaxVersions() or Get.setTimeRange() ○ All data model operations return data in sorted order: Row, CF, Col, Version ○ Delete flavors: delete col+ver, delete col, delete col family, delete row ○ Deletes work by creating tombstone markers ○ LIMITATIONS: ■ delete() masks a put() until a major compaction takes place ■ Major compactions can change get() results ○ All operations are ATOMIC within a row
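The two limitations listed above follow from how tombstones work, which a single-cell toy model can show. This is a simplified sketch, not HBase code: `VersionedCell` is hypothetical, it models only one kind of delete (a marker that hides every version at or below its timestamp), and `majorCompact()` stands in for a major compaction.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single cell with versions and one tombstone marker.
final class VersionedCell {
    private final TreeMap<Long, String> versions = new TreeMap<>();
    private long tombstoneTs = Long.MIN_VALUE; // hides all versions <= this timestamp

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // A delete removes nothing; it only records a marker.
    void delete(long timestamp) {
        tombstoneTs = Math.max(tombstoneTs, timestamp);
    }

    // Latest version that is not masked by the tombstone, or null.
    String get() {
        Map.Entry<Long, String> latest = versions.lastEntry();
        return latest != null && latest.getKey() > tombstoneTs ? latest.getValue() : null;
    }

    // Physically drops masked versions and the marker itself.
    void majorCompact() {
        versions.headMap(tombstoneTs, true).clear();
        tombstoneTs = Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell();
        cell.put(5L, "a");
        cell.delete(10L);
        cell.put(7L, "b");              // written after the delete, but with an older timestamp
        System.out.println(cell.get()); // null: the tombstone still masks the put
        cell.majorCompact();
        cell.put(7L, "c");              // the same kind of put, after the compaction
        System.out.println(cell.get()); // c: the marker is gone, so the result changes
    }
}
```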
  • 35. HBase - Read Path [diagram, steps 1-6: the client asks the ZK cluster "Where is -ROOT-?" (A: RegionServer1), asks "Where is .META.?" (A: RegionServer2), then issues HTable.get() to Region Server2, which reads from the MemStore and the HFiles (HFile-1, HFile-2) and answers with the row] -ROOT-: table for keeping track of the .META. table (.META.,region,key: regionInfo, Server). .META.: table for all regions in the system, never splits (table, startKey, id:: regionInfo, Server).
  • 36. HBase - Write Path [diagram, steps 1-6: the client asks the ZK cluster "Where is -ROOT-?" (A: RegionServer1), asks "Where is .META.?" (A: RegionServer2), then issues HTable.put() to Region Server2, which writes to the HLog/WAL and the MemStore and returns a return code; the MemStore is flushed offline to HDFS blocks] -ROOT-: table for keeping track of the .META. table (.META.,region,key: regionInfo, Server). .META.: table for all regions in the system, never splits (table, startKey, id:: regionInfo, Server).
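The region-server side of the read and write paths (log first, then MemStore, with flushes to immutable files) can be sketched in a few lines. This is a toy model under stated assumptions: `RegionSketch` is hypothetical, the "flush after N rows" threshold is invented for the example, and clearing the log on flush is a simplification of real WAL handling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical region: writes go to a log and a sorted in-memory buffer;
// full buffers are frozen into immutable sorted files; reads check the
// buffer first and then the files from newest to oldest.
final class RegionSketch {
    private final List<String> wal = new ArrayList<>();                     // stands in for the HLog/WAL
    private TreeMap<String, String> memStore = new TreeMap<>();             // sorted write buffer
    private final List<TreeMap<String, String>> hfiles = new ArrayList<>(); // oldest first
    private final int flushThreshold;                                       // assumed flush trigger

    RegionSketch(int flushThreshold) { this.flushThreshold = flushThreshold; }

    void put(String row, String value) {
        wal.add(row + "=" + value); // 1. record the edit in the log
        memStore.put(row, value);   // 2. apply it to the in-memory buffer
        if (memStore.size() >= flushThreshold) flush();
    }

    // Freeze the buffer as a new immutable file and start a fresh buffer.
    void flush() {
        hfiles.add(memStore);
        memStore = new TreeMap<>();
        wal.clear(); // flushed edits no longer need the log (simplification)
    }

    String get(String row) {
        String value = memStore.get(row);
        if (value != null) return value;
        for (int i = hfiles.size() - 1; i >= 0; i--) { // newest file wins
            value = hfiles.get(i).get(row);
            if (value != null) return value;
        }
        return null;
    }

    int fileCount() { return hfiles.size(); }
    int walSize() { return wal.size(); }

    public static void main(String[] args) {
        RegionSketch region = new RegionSketch(2);
        region.put("row-1", "a");
        region.put("row-2", "b"); // second row triggers a flush
        region.put("row-1", "c"); // newer value stays in the MemStore
        System.out.println(region.get("row-1"));
        System.out.println(region.get("row-2"));
    }
}
```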
  • 37. HBase - Shell ○ Table metadata: e.g. create/alter/drop/describe table ○ Table data: e.g. put/scan/delete/count row(s) ○ Admin: e.g. flush/rebalance/compact regions, split tables ○ Replication tools: e.g. add/enable/list/start/stop replication ○ Security: e.g. grant/revoke/list user permissions
      Shell interaction example:
      hbase(main):001:0> create 'myTable', 'myColFam1'
      0 row(s) in 3.8890 seconds
      hbase(main):002:0> put 'myTable', 'row-1', 'myColFam1:col1', 'value-1'
      0 row(s) in 0.1840 seconds
      hbase(main):003:0> scan 'myTable'
      ROW        COLUMN+CELL
      row-1      column=myColFam1:col1, timestamp=1457381922312, value=value-1
      1 row(s) in 0.1160 seconds
      hbase(main):004:0>
  • 38. HBase - Advanced Topics ○ Bulk Loading ○ Cluster Replication ○ Merging and Splitting of regions ○ Predicate pushdown using server-side Filters ○ Bloom filters ○ Co-Processors ○ Snapshots ○ Performance Tuning
  • 39. HBase - What it's not ○ HBase is not for everyone ○ Has no support for ■ SQL ■ Joins ■ Secondary indexes ■ Transactions ■ JDBC driver ○ Works well with large deployments ○ Requires good working knowledge of the Hadoop ecosystem
  • 40. HBase - What it's good at ● Strongly consistent reads/writes ● Automatic sharding ● Automatic RegionServer failover ● Supports MapReduce, with HBase as both source and sink ● Works on top of HDFS ● Provides a Java Client API and a REST/Thrift API ● Block Cache and Bloom Filters support ● Web UI and JMX support for operational management
  • 43. Redis ● Redis is an open source, in-memory key-value store, with disk persistence ● Originally written at LLOOGG by Salvatore Sanfilippo, ~2009 ● Written in ANSI C and works on most Linux systems ● No external dependencies ● Very small: ~1MB memory per instance ● Datatypes can be data structures: String, Hash, Set, Sorted Set ● Compressed in-memory representation of data ● Clients are available in lots of languages: C, C#, Clojure, Scala, Lua...
  • 44. Redis Key Components [diagram: one single-threaded server per CPU (CPU 1, CPU 2, ... CPU N), each with highly optimized memory storage on the memory side and a highly optimized network layer on the network side]
  • 46. Redis Network Layer [diagram: Client, TCP, Server] - Typical request/response system - 10K requests mean 20K network calls - If each call takes 1 ms, 20 seconds are lost - Use batching, called Pipelining - Send one response for 10K requests - Saves 10 seconds for 10K calls
  • 47. Redis Network Layer [diagram: the client sends requests 1,2,3,4…10000 over TCP; the server holds the replies in a response queue]
  • 48. Redis Network Layer ● Bypasses the OS socket layer abstraction ○ Uses low-level epoll(), kqueue(), select() calls ● Low overhead of waiting threads ● Allows handling of close to 10K concurrent clients
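The pipelining figures from the Redis Network Layer slides (20K network calls for 10K requests, about 10 seconds saved) can be reproduced with a small cost model. The model is an assumption made to match the slide's numbers: every network call costs a flat 1 ms, an unpipelined exchange is one call out and one call back per request, and a pipelined exchange still sends each request but receives a single batched response. `PipeliningCost` is a hypothetical name.

```java
// Worked arithmetic for request/response versus pipelined exchanges.
final class PipeliningCost {
    // One call for the request and one for the response, per request.
    static long unpipelinedMillis(long requests, long perCallMillis) {
        return 2 * requests * perCallMillis;
    }

    // Every request is still sent, but only one batched response comes back.
    static long pipelinedMillis(long requests, long perCallMillis) {
        return (requests + 1) * perCallMillis;
    }

    public static void main(String[] args) {
        long plain = unpipelinedMillis(10_000, 1);   // 20000 ms, the "20 secs lost"
        long batched = pipelinedMillis(10_000, 1);   // 10001 ms
        System.out.println(plain - batched);         // 9999 ms, the "~10 seconds saved"
    }
}
```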
  • 49. Redis Memory Optimizations ● Integer encoding for small values ● Small hashes are converted to arrays ○ Leverage CPU caching ● Uses 32 bit version when possible ● Leads to 5X to 10X memory saving
  • 50. Redis Enterprise Features [diagram: the client shards data across two clusters (Shard 1 to Cluster 1, Shard 2 to Cluster 2); in each cluster a Redis master replicates asynchronously to Slave1 and Slave2]
  • 51. Redis WrapUp ● Super fast in memory KV store ● Provides a CLI ● Typical apps will require client side coding ● Spills to disk for large data-sets, with reduced performance ● Upcoming “cluster” feature will keep 3 copies for HA