Apache HBase at Airbnb

Apache HBase at Airbnb
JINGWEI LU, LIYIN TANG, AND JASON ZHANG
1

Event
Logs
MySQL
Dumps
Gold Cluster
HDFS
Hive
Kafk
a
Sqoo
p
Silver Cluster Spark Cluster
Spark
ReAi
r
Airflow Scheduling
S3
Presto Cluster
AirPal
Caravel
Tableau
Batch Infrastructure
Yarn HDFS
Hive
Yarn
Jingwei Lu, Liyin Tang, Jason Zhang
3

Streaming at Airbnb
Event
Logging
MySQL
BINLOG
Cluster
HDFS
Hive
Spinal tap
Presto Cluster
Yarn
Kafk
a
HBase
Spark Streaming
Datadog
Druid
Kafka
4

Stateless
Computation SinkSource
DStream DF DF

Stateful
Computatio
n
Source
DStream DF DF
Sink1
Sink2
Sink N
State Storage
RDD

Multiple Streams
DataFrame
Sink1
Process
A
Sink2
Sink3
SinkN
…
DataFrame
Sink1
Process
N
Sink2
Sink3
SinkN
…
Source
DStream
Align by Time
DataFram
e
DataFram
e
State
Storage
Source
DStream
…

Streaming + Batch
DataFrame
Sink1
Process
A
Sink2
Sink3
SinkN
…
DataFram
e
State
Storage
Sourc
e
DStream
Sourc
e
…
Align by Time
…
DataFrame
Sink1
Process
A
Sink2
Sink3
SinkN
…

AirStream Architecture
Sources
Stream #1 Stream #N
Hive Tables
HBase
Tables
Virtual Table Views for Computation
Sinks
…
Customized ComputationSpark SQL
Simple Config
HBase Services
Streaming
Sources
Druid

AirStream Architecture
Sources
Stream #1 Stream #N
Hive Tables
HBase
Tables
Virtual Table Views for Computation
Sinks
…
Customized ComputationSpark SQL
HBase Services
Streaming
Sources
Druid
Same Computation
for Batch
processing

State Store
• Merge changes
• Provide fast lookup
• Fast persistent storage across
streaming and batch jobs
14

Why HBase
Rich Functionalities
Rich Integration with Hadoop EcoSystem
Easy Management
Strong Community
Reliable and Scalable

HBase State Store
Operators in Airstream
16
Full Table Scan
Simple Aggregation
Bulk Upload
Key/Prefix Lookup
Update

Computation DAG
17
Input Data
Left Outer Join Result
Key Lookup

Key Space Design
• Hash partition key space
for load balance
• Composite key for K -> V
• Support full key lookup
• Prefix lookup supported for
all keys used in hash
function
Hash key1 key2 key3
Hash based on key prefix
Hash key1 key2
Lookup based on key prefix
key1 = ‘value1’ and key2 = ‘value2’
18

• Partition based on key before write
• Use bulk upload for large volume update
Write Performance
19

Case Study
Experiment realtime feedback
Update
Experiment Assignment
Event
Lookup
HBase
with TTL
Booking Event
Druid
Datado
g
20
one airstream job

Realtime Ingestion on HBase
Data Infrastructure
MySQL
Analytica
l Events
Kafka
Spark
Streamin
g
HBase
HDFS
Presto/Hive/Spar
k
Source
Inges
t
Realtime
Query
Snapsh
ot
Batch
Query
22

Access Data in HBase
HBase
Hive Presto
Spark
SQL
Spark
Streaming
Batch Jobs Interactive Query Streaming
HDFS
Snapshot
Table Mapping/Unifed View on realtime data
23

Snapshot & Reseed
HBase HDFS
Snapshot（HFile Links)
Bulk Upload
24

Case Study 1: Events Ingestion
Kafka
topic
…
topic
topic
Spark
Executor
1
…
Executor
2
Executor
K
HBase
DeDup
HDFS
Region1…Region2R
egion M
Daily
Snapshot
Realtime
Query
Hive
Presto
Events
Partition
25

Case Study 2: Streaming DB Export
KafkaRDS
Table1
…
Spinalta
p.
Table1
…
Table2
TableN
Spinaltap.
Table2
Spinaltap.
TableN
Spark
Executor
1
…
Executor2
Executor K
HBase
Region1
…
Region2
Region M
HDFS
Region1…Region2R
egion M
Daily Snapshot
Realtime Query
26

Case Study: Streaming DB Export
Rows CF: Colums Version Value
<ShardKey><DB_TABLE_#1><PK_a=A> id Fri May 19 00:33:19 2016 101
<ShardKey><DB_TABLE_#1><PK_a=A> city Fri May 19 00:33:19 2016 San Francisco
<ShardKey><DB_TABLE_#1><PK_a=A> city Fri May 10 00:34:19 2016 New York
<ShardKey><DB_TABLE_#2><PK_a=A’> id Fri May 19 00:33:19 2016 1
27

TXN 1
Commit_TS:
101
…
TXN 2
Commit_TS:
102
TXN 3
Commit_TS:
103
TXN N
Commit_TS:
N’
Binlog Order
28

TXN 1
Commit_TS:
101
…
TXN 2
Commit_TS:
103
TXN 3
Commit_TS:
102
TXN N
Commit_TS:
N’
NTP
Binlog Order
29

TXN 1
Commit_TS:
101
…
Binlog Order
TXN 2
Commit_TS:
103
TXN 3
Commit_TS:
102
TXN N
Commit_TS:
N’
Point-in-Time Restore on TS 102
30

Rows CF: Colums Version Value
<ShardKey><DB_TABLE_#1><PK_a=A> id bin100 101
<ShardKey><DB_TABLE_#1><PK_a=A> city bin101 San Francisco
<ShardKey><DB_TABLE_#1><PK_a=A> city bin102 New York
<ShardKey><DB_TABLE_#2><PK_a=A’> id bin100 1
31

Rows Version (Logical Offset) Value
<ShardKey><DB_TABLE_#1><2016-05-23 23><100> 100 mysql-bin.00000:100
32

Rows Version (Logical Offset) Value
33

Summary
Scalable and Reliable
Rich Stateful Computation
Rich Integration with Hadoop EcoSystem
Easy Operation

Apache HBase at Airbnb

Recommandé

Recommandé

Contenu connexe

Tendances

Tendances (20)

Similaire à Apache HBase at Airbnb

Similaire à Apache HBase at Airbnb (20)

Plus de HBaseCon

Plus de HBaseCon (20)

Dernier

Dernier (20)

Apache HBase at Airbnb

Notes de l'éditeur