SlideShare a Scribd company logo
1 of 69
TRANSACTIONS OVER HBASE
Alex Baranau @abaranau	

Gary Helmling @gario	

!
Continuuity
WHO WE ARE
• We’ve built Continuuity Reactor: the world’s first scale-out
application server for Hadoop
• Fast, easy development, deployment and management of
Hadoop and HBase apps
• Continuuity team has years of experience in using and contributing
to Open Source, and we intend to continue doing so.
AGENDA
• Transactions over HBase: Why? What?
• Implementation: How?
• The approach
• Transaction Manager
• Transaction in batch jobs
THE REACTOR
• Continuuity Reactor is an app platform built on Hadoop and HBase
• Collect, Process, Store, and Query data.
• A Flow is a real-time processor with exactly-once guarantee
• A flow is composed of flowlets, connected via queues
• All processing happens with ACID guarantees in transactions
HBase
Table
PROCESSING IN A FLOW
...Queue ...
...
Flowlet
... ...
HBase
Table
PROCESSING IN A FLOW
...Queue ...
...
Flowlet
... ...
HBase
Table
WRAP IN TRANSACTION!
...Queue ...
...
Flowlet
TRANSACTIONS: WHAT?
• Atomic - Entire transaction is committed as one
• Consistent - No partial state change due to failure
• Isolated - No dirty reads, transaction is only visible
after commit
• Durable - Once committed, data is persisted reliably
HBASE: QUICK “WHAT?”
• Open-source non-relational distributed column-
oriented database modeled after Google’s BigTable
• Data is stored in named Tables of Rows
• Row consists of Cells:

(row key, column family, column, timestamp) -> value
• Table is split into Regions, each served by a single node
in HBase cluster
WHAT ABOUT HBASE?
• Atomic operations on cell value: 

checkAndPut, checkAndDelete, increment, append
• Atomic batch of operations on rows within region
• No cross region atomic operations support
• No cross table atomic operations support
• No multi-RPC atomic operations support
TRANSACTIONS: WHY ELSE?
• Complex data access patterns
• Secondary Indexes
• Design for greater performance
• Efficient batching of data operations
• Client-side buffer reliability fix
IMPLEMENTATION
• Multi-Version Concurrency Control
• Cell version (timestamp) = transaction ID
• All writes in the same transaction use the transaction ID as timestamp
• Reads exclude other, uncommitted transactions (for isolation)
• Optimistic Concurrency Control
• Conflict detection at commit of transaction
• Write Conflict: two overlapping transactions write the same row
• Rollback of one transaction in case of conflict (whichever commits later)
OPTIMISTIC CONCURRENCY
CONTROL
• Avoids cost of locking rows and tables
• No deadlocks or lock escalations
• Cost of conflict detection and possible rollback
is higher
• Good if conflicts are rare: short transaction,
disjoint partitioning of work
ZooKeeper
TRANSACTIONS IN CONTEXT
Tx Manager	

(standby)
HBase
Master 1
Master 2	

RS 1
RS 2 RS 4
RS 3
Client 1
Client 2
Client N
Tx Manager	

(active)
TRANSACTION LIFE CYCLE
time
out
try abort
failed
roll back
in HBase
write
to
HBase
do work
Client Tx Manager
none
complete V
abortsucceeded
in progress
start tx
start
start tx
commit
try commit check conflicts
RPC API
invalid X
invalidate
failed
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
Tx Manager
Client 1
Client 2
write = 1002
read = 1001
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
Tx Manager
Client 1
start
write = 1002
read = 1001
Client 2
write = 1002
read = 1001
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
Tx Manager
Client 1
start
write = 1002
read = 1001
Client 2
write = 1003
read = 1001
inprogress=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
Tx Manager
Client 1
increment
write = 1002
read = 1001
Client 2
write = 1003
read = 1001
inprogress=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
Tx Manager
Client 1
increment
write = 1002
read = 1001
Client 2
write = 1003
read = 1001
inprogress=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
Tx Manager
Client 1 start
write = 1002
read = 1001
Client 2
write = 1003
read = 1001
inprogress=[1002]
write = 1003
read = 1001
excluded=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
Tx Manager
Client 1 start
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002, 1003]
write = 1003
read = 1001
excluded=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1
increment
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002, 1003]
write = 1003
read = 1001
excluded=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1
commit
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002, 1003]
write = 1003
read = 1001
excluded=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1
commit
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002]
write = 1003
read = 1001
excluded=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1
commit
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1
conflict!
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1 rollback
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1003 11
Tx Manager
Client 1 rollback
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1003 11
Tx Manager
Client 1
abort
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1003 11
Tx Manager
Client 1
abort
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1003 11
Tx Manager
Client 1
abort
write = 1002
read = 1001
Client 2
write = 1004
read = 1003
inprogress=[]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1003 11
Tx Manager
Client 1 start
Client 2
write = 1005
read = 1003
inprogress=[]
write = 1004
read = 1003
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1003 11
Tx Manager
Client 1
read
Client 2
write = 1005
read = 1003
inprogress=[]
write = 1004
read = 1003
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1
conflict!
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1 rollback
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1 rollback failed
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1
invalidate
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1
invalidate
write = 1002
read = 1001
Client 2
write = 1004
read = 1003
inprogress=[]
invalid=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1 start
Client 2
write = 1005
read = 1003
inprogress=[]
invalid=[1002]
write = 1004
read = 1003
exclude = [1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1
read
Client 2
write = 1005
read = 1003
inprogress=[]
invalid=[1002]
write = 1004
read = 1003
exclude = [1002]
invisible!
TRANSACTION MANAGER
• Create new transactions
• Provides monotonically increasing write pointers
• Maintains all in-progress, committed, and invalid transactions
• Detect conflicts
• Transaction =
	 	 	 Write Pointer: Timestamp for HBase writes
	 	 	 Read pointer: Upper bound timestamp for reads
	 	 	 Excludes: List of timestamps to exclude from reads
TRANSACTION MANAGER
• Simple & Fast
• All required state is in-memory
• Single point of failure?
• Persist all state to a write-ahead log
• Secondary Tx Manager watches for failure of Primary
• Failover can happen quickly
TRANSACTION MANAGER
Tx Manager
Current State
in progress
committed
invalid
read point
write point
start()
TRANSACTION MANAGER
Tx Manager
Current State
in progress (+)
committed
invalid
read point
write point ++
start()
Tx Log
started, <write pt>
HDFS
TRANSACTION MANAGER
Tx Manager
Current State
in progress (-)
committed (+)
invalid
read point
write point
commit()
Tx Log
start, <write pt>
commit, <write pt>
HDFS
TRANSACTION SNAPSHOTS
• Write-ahead log provides persistence
• Guarantees point-in-time recovery
• Longer the log grows, longer recovery takes
• Periodically write snapshot of full transaction state
• Snapshot + all new logs provides full state
Tx Manager
Current State
TRANSACTION SNAPSHOTS
Tx Log A
in progress
committed
invalid
read point
write point
HDFS
Tx Manager
Current State
TRANSACTION SNAPSHOTS
Tx Log A
in progress
committed
invalid
read point
write point Tx Log B1
HDFS
TRANSACTION SNAPSHOTS
Tx Log ATx Manager
in progress
committed
invalid
read point
write point
Current State
State Snapshot
in progress
committed
invalid
read point
write point
Tx Log B2
HDFS
TRANSACTION SNAPSHOTS
Tx Log ATx Manager
in progress
committed
invalid
read point
write point
Current State
State Snapshot
in progress
committed
invalid
read point
write point
Tx Log B
Tx Snapshot
in progress
committed
invalid
read point
write point
3
HDFS
TRANSACTION SNAPSHOTS
Tx Log ATx Manager
in progress
committed
invalid
read point
write point
Current State
State Snapshot
in progress
committed
invalid
read point
write point
Tx Log B
Tx Snapshot
in progress
committed
invalid
read point
write point
4
HDFS
HBase
TRANSACTION CLEANUP
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1 rollback failed
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002]
TRANSACTION CLEANUP:
DATA JANITOR
• RegionObserver coprocessor
• Maintains in-memory snapshot of recent invalid &
in-progress sets
• Periodically updates from transaction snapshot in
HDFS
• Purges data from invalid transactions and older
versions on flush & compaction
HBase
TRANSACTION CLEANUP:
DATA JANITOR
Tx Snapshot
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
refresh
Data Janitor	

(RegionObserver)
MemStore
preFlush()
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
Cell TS Value
row1:col1 1004 12
1003 11
1002 11
Custom RegionScanner
HFile
Cell TS Value
HBase
TRANSACTION CLEANUP:
DATA JANITOR
Data Janitor	

(RegionObserver)
HFile
Cell TS Value
Custom RegionScanner
MemStore
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
Tx Snapshot
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
preFlush()
Cell TS Value
row1:col1 1004 12
1003 11
1002 11
HBase
TRANSACTION CLEANUP:
DATA JANITOR
Data Janitor	

(RegionObserver)
HFile
Cell TS Value
row1:col1 1004 12Custom RegionScanner
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
Tx Snapshot
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
MemStore
preFlush()
Cell TS Value
row1:col1 1004 12
1003 11
1002 11
HBase
TRANSACTION CLEANUP:
DATA JANITOR
Data Janitor	

(RegionObserver)
Custom RegionScanner
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
Tx Snapshot
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
MemStore
preFlush()
Cell TS Value
row1:col1 1004 12
1003 11
1002 11
HFile
Cell TS Value
row1:col1 1004 12
HBase
TRANSACTION CLEANUP:
DATA JANITOR
Data Janitor	

(RegionObserver)
Custom RegionScanner
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
Tx Snapshot
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
MemStore
preFlush()
Cell TS Value
row1:col1 1004 12
1003 11
1002 11
HFile
Cell TS Value
row1:col1 1004 12
1003 11
HBase
TRANSACTION CLEANUP:
DATA JANITOR
Data Janitor	

(RegionObserver)
Custom RegionScanner
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
Tx Snapshot
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
MemStore
preFlush()
Cell TS Value
row1:col1 1004 12
1003 11
1002 11
HFile
Cell TS Value
row1:col1 1004 12
1003 11
HBase
TRANSACTION CLEANUP:
DATA JANITOR
Data Janitor	

(RegionObserver)
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
Tx Snapshot
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
Custom RegionScanner
preFlush()
MemStore
Cell TS Value
row1:col1 1004 12
1003 11
1002 11
HFile
Cell TS Value
row1:col1 1004 12
1003 11
REDUNDANT VERSIONS CLEANUP
• Every write is new version of a cell

• Heavily updated values (e.g. counters) result
in lots of versions
• Cleanup: keep at most one version of a
cell visible to all transactions
• Plug-in cleanup into existing Data Janitor CP
TTL
• Timestamp of a value is overwritten
with tx id
• Breaks support for “native TTL”
feature in HBase
• Tx ids and timestamps cannot be
aligned: more than 1K tx/sec wanted
TTL
• Encode timestamp in tx id
• tx id = (current ts) * 1,000,000 + counter-
within-ms
• 1B txs/sec max
• long value overflow in ~200 years
• Plug-in cleanup into existing Data Janitor CP
• Apply TTL logic on reads
TRANSACTIONS IN BATCH
• Batch jobs are long running
• No changes should be visible before job is done
• Batch jobs are multi-step
• Individual step can be restarted
• All steps succeed or all fail semantics needed
• Every step retry should be isolated
• Uncommitted changes should be visible within same
step but not to other steps
TRANSACTIONS IN BATCH
• Batch jobs are long-running
• No fine-grained conflict detection: last started
among successfully committed wins
• Delayed rollback execution
• Batch jobs are multi-step
• For now: batch job uses single transaction, every
step re-uses it
• For future: nested transactions for steps
TX OVERHEAD
• 2 RPC calls, each faster than Put in HBase
• Can be improved with batching
• To avoid overhead of transactions where
appropriate the ACID guarantees can be relaxed
• No need to change data format
• When reading: read latest version of a cell
• When writing: use ts = (current ts) * 1M + counter
WHAT’S NEXT?
• Open Source
• Continue Scaling Tx Manager
• Transaction Groups?
• Integration across other transactional stores
QS?
Looking for the chance to work with a team that is defining
a new category within Big Data?
!
We are hiring!
http://continuuity.com/careers
careers[at]continuuity.com
!
!
Alex Baranau @abaranau alexb[at]continuuity.com

More Related Content

Similar to Transactions Over Apache HBase

HBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBaseHBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBaseHBaseCon
 
LeanXcale Presentation - Waterloo University
LeanXcale Presentation - Waterloo UniversityLeanXcale Presentation - Waterloo University
LeanXcale Presentation - Waterloo UniversityRicardo Jimenez-Peris
 
The End of a Myth: Ultra-Scalable Transactional Management
The End of a Myth: Ultra-Scalable Transactional ManagementThe End of a Myth: Ultra-Scalable Transactional Management
The End of a Myth: Ultra-Scalable Transactional ManagementRicardo Jimenez-Peris
 
Running Head CLIENT SERVER AUTHENTICATIONHANDLING CONCURRENT CL.docx
Running Head CLIENT SERVER AUTHENTICATIONHANDLING CONCURRENT CL.docxRunning Head CLIENT SERVER AUTHENTICATIONHANDLING CONCURRENT CL.docx
Running Head CLIENT SERVER AUTHENTICATIONHANDLING CONCURRENT CL.docxjoellemurphey
 
Successful Architectures for Fast Data
Successful Architectures for Fast DataSuccessful Architectures for Fast Data
Successful Architectures for Fast DataPatrick McFadin
 
Dbms ii mca-ch9-transaction-processing-2013
Dbms ii mca-ch9-transaction-processing-2013Dbms ii mca-ch9-transaction-processing-2013
Dbms ii mca-ch9-transaction-processing-2013Prosanta Ghosh
 
Massed Refresh: An Energy-Efficient Technique to Reduce Refresh Overhead in H...
Massed Refresh: An Energy-Efficient Technique to Reduce Refresh Overhead in H...Massed Refresh: An Energy-Efficient Technique to Reduce Refresh Overhead in H...
Massed Refresh: An Energy-Efficient Technique to Reduce Refresh Overhead in H...Ishan Thakkar
 
OpenTSDB 2.0
OpenTSDB 2.0OpenTSDB 2.0
OpenTSDB 2.0HBaseCon
 
Multi-service reactive streams using Spring, Reactor, RSocket
Multi-service reactive streams using Spring, Reactor, RSocketMulti-service reactive streams using Spring, Reactor, RSocket
Multi-service reactive streams using Spring, Reactor, RSocketStéphane Maldini
 
Service discovery like a pro (presented at reversimX)
Service discovery like a pro (presented at reversimX)Service discovery like a pro (presented at reversimX)
Service discovery like a pro (presented at reversimX)Eran Harel
 
Databases: Concurrency Control
Databases: Concurrency ControlDatabases: Concurrency Control
Databases: Concurrency ControlDamian T. Gordon
 
transactions-advanced for automatic payment.pptx
transactions-advanced for automatic payment.pptxtransactions-advanced for automatic payment.pptx
transactions-advanced for automatic payment.pptxssusereced02
 
Transaction Support in Pulsar 2.5.0
Transaction Support in Pulsar 2.5.0Transaction Support in Pulsar 2.5.0
Transaction Support in Pulsar 2.5.0StreamNative
 
5 application serversforproject
5 application serversforproject5 application serversforproject
5 application serversforprojectashish61_scs
 
VMworld 2013: Extreme Performance Series: vCenter of the Universe
VMworld 2013: Extreme Performance Series: vCenter of the UniverseVMworld 2013: Extreme Performance Series: vCenter of the Universe
VMworld 2013: Extreme Performance Series: vCenter of the UniverseVMworld
 

Similar to Transactions Over Apache HBase (20)

HBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBaseHBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBase
 
LeanXcale Presentation - Waterloo University
LeanXcale Presentation - Waterloo UniversityLeanXcale Presentation - Waterloo University
LeanXcale Presentation - Waterloo University
 
The End of a Myth: Ultra-Scalable Transactional Management
The End of a Myth: Ultra-Scalable Transactional ManagementThe End of a Myth: Ultra-Scalable Transactional Management
The End of a Myth: Ultra-Scalable Transactional Management
 
Transaction
TransactionTransaction
Transaction
 
basil.pptx
basil.pptxbasil.pptx
basil.pptx
 
Running Head CLIENT SERVER AUTHENTICATIONHANDLING CONCURRENT CL.docx
Running Head CLIENT SERVER AUTHENTICATIONHANDLING CONCURRENT CL.docxRunning Head CLIENT SERVER AUTHENTICATIONHANDLING CONCURRENT CL.docx
Running Head CLIENT SERVER AUTHENTICATIONHANDLING CONCURRENT CL.docx
 
Consul scale
Consul scaleConsul scale
Consul scale
 
Successful Architectures for Fast Data
Successful Architectures for Fast DataSuccessful Architectures for Fast Data
Successful Architectures for Fast Data
 
Dbms ii mca-ch9-transaction-processing-2013
Dbms ii mca-ch9-transaction-processing-2013Dbms ii mca-ch9-transaction-processing-2013
Dbms ii mca-ch9-transaction-processing-2013
 
Massed Refresh: An Energy-Efficient Technique to Reduce Refresh Overhead in H...
Massed Refresh: An Energy-Efficient Technique to Reduce Refresh Overhead in H...Massed Refresh: An Energy-Efficient Technique to Reduce Refresh Overhead in H...
Massed Refresh: An Energy-Efficient Technique to Reduce Refresh Overhead in H...
 
OpenTSDB 2.0
OpenTSDB 2.0OpenTSDB 2.0
OpenTSDB 2.0
 
Multi-service reactive streams using Spring, Reactor, RSocket
Multi-service reactive streams using Spring, Reactor, RSocketMulti-service reactive streams using Spring, Reactor, RSocket
Multi-service reactive streams using Spring, Reactor, RSocket
 
Service discovery like a pro (presented at reversimX)
Service discovery like a pro (presented at reversimX)Service discovery like a pro (presented at reversimX)
Service discovery like a pro (presented at reversimX)
 
Databases: Concurrency Control
Databases: Concurrency ControlDatabases: Concurrency Control
Databases: Concurrency Control
 
transactions-advanced for automatic payment.pptx
transactions-advanced for automatic payment.pptxtransactions-advanced for automatic payment.pptx
transactions-advanced for automatic payment.pptx
 
19.TRANSACTIONs.ppt
19.TRANSACTIONs.ppt19.TRANSACTIONs.ppt
19.TRANSACTIONs.ppt
 
Transaction Support in Pulsar 2.5.0
Transaction Support in Pulsar 2.5.0Transaction Support in Pulsar 2.5.0
Transaction Support in Pulsar 2.5.0
 
5 application serversforproject
5 application serversforproject5 application serversforproject
5 application serversforproject
 
Data (1)
Data (1)Data (1)
Data (1)
 
VMworld 2013: Extreme Performance Series: vCenter of the Universe
VMworld 2013: Extreme Performance Series: vCenter of the UniverseVMworld 2013: Extreme Performance Series: vCenter of the Universe
VMworld 2013: Extreme Performance Series: vCenter of the Universe
 

Recently uploaded

Internet of things -Arshdeep Bahga .pptx
Internet of things -Arshdeep Bahga .pptxInternet of things -Arshdeep Bahga .pptx
Internet of things -Arshdeep Bahga .pptxVelmuruganTECE
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionMebane Rash
 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substationstephanwindworld
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
Steel Structures - Building technology.pptx
Steel Structures - Building technology.pptxSteel Structures - Building technology.pptx
Steel Structures - Building technology.pptxNikhil Raut
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxRomil Mishra
 
Main Memory Management in Operating System
Main Memory Management in Operating SystemMain Memory Management in Operating System
Main Memory Management in Operating SystemRashmi Bhat
 
Transport layer issues and challenges - Guide
Transport layer issues and challenges - GuideTransport layer issues and challenges - Guide
Transport layer issues and challenges - GuideGOPINATHS437943
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catcherssdickerson1
 
Class 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm SystemClass 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm Systemirfanmechengr
 
Industrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptIndustrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptNarmatha D
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girlsssuser7cb4ff
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
Solving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptSolving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptJasonTagapanGulla
 
Energy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxEnergy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxsiddharthjain2303
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHC Sai Kiran
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfAsst.prof M.Gokilavani
 

Recently uploaded (20)

Internet of things -Arshdeep Bahga .pptx
Internet of things -Arshdeep Bahga .pptxInternet of things -Arshdeep Bahga .pptx
Internet of things -Arshdeep Bahga .pptx
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of Action
 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substation
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
Steel Structures - Building technology.pptx
Steel Structures - Building technology.pptxSteel Structures - Building technology.pptx
Steel Structures - Building technology.pptx
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptx
 
Main Memory Management in Operating System
Main Memory Management in Operating SystemMain Memory Management in Operating System
Main Memory Management in Operating System
 
Transport layer issues and challenges - Guide
Transport layer issues and challenges - GuideTransport layer issues and challenges - Guide
Transport layer issues and challenges - Guide
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
 
Class 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm SystemClass 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm System
 
Industrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptIndustrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.ppt
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 
Solving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptSolving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.ppt
 
Energy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxEnergy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptx
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECH
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
 

Transactions Over Apache HBase

  • 1. TRANSACTIONS OVER HBASE Alex Baranau @abaranau Gary Helmling @gario ! Continuuity
  • 2. WHO WE ARE • We’ve built Continuuity Reactor: the world’s first scale-out application server for Hadoop • Fast, easy development, deployment and management of Hadoop and HBase apps • Continuuity team has years of experience in using and contributing to Open Source, and we intend to continue doing so.
  • 3. AGENDA • Transactions over HBase: Why? What? • Implementation: How? • The approach • Transaction Manager • Transaction in batch jobs
  • 4. THE REACTOR • Continuuity Reactor is an app platform built on Hadoop and HBase • Collect, Process, Store, and Query data. • A Flow is a real-time processor with exactly-once guarantee • A flow is composed of flowlets, connected via queues • All processing happens with ACID guarantees in transactions
  • 5. HBase Table PROCESSING IN A FLOW ...Queue ... ... Flowlet ... ...
  • 6. HBase Table PROCESSING IN A FLOW ...Queue ... ... Flowlet ... ...
  • 8. TRANSACTIONS: WHAT? • Atomic - Entire transaction is committed as one • Consistent - No partial state change due to failure • Isolated - No dirty reads, transaction is only visible after commit • Durable - Once committed, data is persisted reliably
  • 9. HBASE: QUICK “WHAT?” • Open-source non-relational distributed column- oriented database modeled after Google’s BigTable • Data is stored in named Tables of Rows • Row consists of Cells:
 (row key, column family, column, timestamp) -> value • Table is split into Regions, each served by a single node in HBase cluster
  • 10. WHAT ABOUT HBASE? • Atomic operations on cell value: 
 checkAndPut, checkAndDelete, increment, append • Atomic batch of operations on rows within region • No cross region atomic operations support • No cross table atomic operations support • No multi-RPC atomic operations support
  • 11. TRANSACTIONS: WHY ELSE? • Complex data access patterns • Secondary Indexes • Design for greater performance • Efficient batching of data operations • Client-side buffer reliability fix
  • 12. IMPLEMENTATION • Multi-Version Concurrency Control • Cell version (timestamp) = transaction ID • All writes in the same transaction use the transaction ID as timestamp • Reads exclude other, uncommitted transactions (for isolation) • Optimistic Concurrency Control • Conflict detection at commit of transaction • Write Conflict: two overlapping transactions write the same row • Rollback of one transaction in case of conflict (whichever commits later)
  • 13. OPTIMISTIC CONCURRENCY CONTROL • Avoids cost of locking rows and tables • No deadlocks or lock escalations • Cost of conflict detection and possible rollback is higher • Good if conflicts are rare: short transaction, disjoint partitioning of work
  • 14. ZooKeeper TRANSACTIONS IN CONTEXT Tx Manager (standby) HBase Master 1 Master 2 RS 1 RS 2 RS 4 RS 3 Client 1 Client 2 Client N Tx Manager (active)
  • 15. TRANSACTION LIFE CYCLE time out try abort failed roll back in HBase write to HBase do work Client Tx Manager none complete V abortsucceeded in progress start tx start start tx commit try commit check conflicts RPC API invalid X invalidate failed
  • 16. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 Tx Manager Client 1 Client 2 write = 1002 read = 1001
  • 17. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 Tx Manager Client 1 start write = 1002 read = 1001 Client 2 write = 1002 read = 1001
  • 18. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 Tx Manager Client 1 start write = 1002 read = 1001 Client 2 write = 1003 read = 1001 inprogress=[1002]
  • 19. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 Tx Manager Client 1 increment write = 1002 read = 1001 Client 2 write = 1003 read = 1001 inprogress=[1002]
  • 20. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 Tx Manager Client 1 increment write = 1002 read = 1001 Client 2 write = 1003 read = 1001 inprogress=[1002]
  • 21. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 Tx Manager Client 1 start write = 1002 read = 1001 Client 2 write = 1003 read = 1001 inprogress=[1002] write = 1003 read = 1001 excluded=[1002]
  • 22. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 Tx Manager Client 1 start write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002, 1003] write = 1003 read = 1001 excluded=[1002]
  • 23. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 increment write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002, 1003] write = 1003 read = 1001 excluded=[1002]
  • 24. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 commit write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002, 1003] write = 1003 read = 1001 excluded=[1002]
  • 25. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 commit write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002] write = 1003 read = 1001 excluded=[1002]
  • 26. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 commit write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002]
  • 27. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 conflict! write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002]
  • 28. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 rollback write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002]
  • 29. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1003 11 Tx Manager Client 1 rollback write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002]
  • 30. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1003 11 Tx Manager Client 1 abort write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002]
  • 31. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1003 11 Tx Manager Client 1 abort write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[]
  • 32. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1003 11 Tx Manager Client 1 abort write = 1002 read = 1001 Client 2 write = 1004 read = 1003 inprogress=[]
  • 33. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1003 11 Tx Manager Client 1 start Client 2 write = 1005 read = 1003 inprogress=[] write = 1004 read = 1003
  • 34. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1003 11 Tx Manager Client 1 read Client 2 write = 1005 read = 1003 inprogress=[] write = 1004 read = 1003
  • 35. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 conflict! write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002]
  • 36. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 rollback write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002]
  • 37. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 rollback failed write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002]
  • 38. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 invalidate write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002]
  • 39. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 invalidate write = 1002 read = 1001 Client 2 write = 1004 read = 1003 inprogress=[] invalid=[1002]
  • 40. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 start Client 2 write = 1005 read = 1003 inprogress=[] invalid=[1002] write = 1004 read = 1003 exclude = [1002]
  • 41. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 read Client 2 write = 1005 read = 1003 inprogress=[] invalid=[1002] write = 1004 read = 1003 exclude = [1002] invisible!
  • 42. TRANSACTION MANAGER • Create new transactions • Provides monotonically increasing write pointers • Maintains all in-progress, committed, and invalid transactions • Detect conflicts • Transaction = Write Pointer: Timestamp for HBase writes Read pointer: Upper bound timestamp for reads Excludes: List of timestamps to exclude from reads
  • 43. TRANSACTION MANAGER • Simple & Fast • All required state is in-memory • Single point of failure? • Persist all state to a write-ahead log • Secondary Tx Manager watches for failure of Primary • Failover can happen quickly
  • 44. TRANSACTION MANAGER Tx Manager Current State in progress committed invalid read point write point start()
  • 45. TRANSACTION MANAGER Tx Manager Current State in progress (+) committed invalid read point write point ++ start() Tx Log started, <write pt> HDFS
  • 46. TRANSACTION MANAGER Tx Manager Current State in progress (-) committed (+) invalid read point write point commit() Tx Log start, <write pt> commit, <write pt> HDFS
  • 47. TRANSACTION SNAPSHOTS • Write-ahead log provides persistence • Guarantees point-in-time recovery • Longer the log grows, longer recovery takes • Periodically write snapshot of full transaction state • Snapshot + all new logs provides full state
  • 48. Tx Manager Current State TRANSACTION SNAPSHOTS Tx Log A in progress committed invalid read point write point HDFS
  • 49. Tx Manager Current State TRANSACTION SNAPSHOTS Tx Log A in progress committed invalid read point write point Tx Log B1 HDFS
  • 50. TRANSACTION SNAPSHOTS Tx Log ATx Manager in progress committed invalid read point write point Current State State Snapshot in progress committed invalid read point write point Tx Log B2 HDFS
  • 51. TRANSACTION SNAPSHOTS Tx Log ATx Manager in progress committed invalid read point write point Current State State Snapshot in progress committed invalid read point write point Tx Log B Tx Snapshot in progress committed invalid read point write point 3 HDFS
  • 52. TRANSACTION SNAPSHOTS Tx Log ATx Manager in progress committed invalid read point write point Current State State Snapshot in progress committed invalid read point write point Tx Log B Tx Snapshot in progress committed invalid read point write point 4 HDFS
  • 53. HBase TRANSACTION CLEANUP Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 rollback failed write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002]
  • 54. TRANSACTION CLEANUP: DATA JANITOR • RegionObserver coprocessor • Maintains in-memory snapshot of recent invalid & in-progress sets • Periodically updates from transaction snapshot in HDFS • Purges data from invalid transactions and older versions on flush & compaction
  • 55. HBase TRANSACTION CLEANUP: DATA JANITOR Tx Snapshot read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] refresh Data Janitor (RegionObserver) MemStore preFlush() read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] Cell TS Value row1:col1 1004 12 1003 11 1002 11 Custom RegionScanner HFile Cell TS Value
  • 56. HBase TRANSACTION CLEANUP: DATA JANITOR Data Janitor (RegionObserver) HFile Cell TS Value Custom RegionScanner MemStore read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] Tx Snapshot read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] preFlush() Cell TS Value row1:col1 1004 12 1003 11 1002 11
  • 57. HBase TRANSACTION CLEANUP: DATA JANITOR Data Janitor (RegionObserver) HFile Cell TS Value row1:col1 1004 12Custom RegionScanner read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] Tx Snapshot read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] MemStore preFlush() Cell TS Value row1:col1 1004 12 1003 11 1002 11
  • 58. HBase TRANSACTION CLEANUP: DATA JANITOR Data Janitor (RegionObserver) Custom RegionScanner read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] Tx Snapshot read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] MemStore preFlush() Cell TS Value row1:col1 1004 12 1003 11 1002 11 HFile Cell TS Value row1:col1 1004 12
  • 59. HBase TRANSACTION CLEANUP: DATA JANITOR Data Janitor (RegionObserver) Custom RegionScanner read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] Tx Snapshot read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] MemStore preFlush() Cell TS Value row1:col1 1004 12 1003 11 1002 11 HFile Cell TS Value row1:col1 1004 12 1003 11
  • 60. HBase TRANSACTION CLEANUP: DATA JANITOR Data Janitor (RegionObserver) Custom RegionScanner read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] Tx Snapshot read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] MemStore preFlush() Cell TS Value row1:col1 1004 12 1003 11 1002 11 HFile Cell TS Value row1:col1 1004 12 1003 11
  • 61. HBase TRANSACTION CLEANUP: DATA JANITOR Data Janitor (RegionObserver) read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] Tx Snapshot read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] Custom RegionScanner preFlush() MemStore Cell TS Value row1:col1 1004 12 1003 11 1002 11 HFile Cell TS Value row1:col1 1004 12 1003 11
  • 62. REDUNDANT VERSIONS CLEANUP • Every write is new version of a cell • Heavily updated values (e.g. counters) result in lots of versions • Cleanup: keep at most one version of a cell visible to all transactions • Plug-in cleanup into existing Data Janitor CP
  • 63. TTL • Timestamp of a value is overwritten with tx id • Breaks support for “native TTL” feature in HBase • Tx ids and timestamps cannot be aligned: more than 1K tx/sec wanted
  • 64. TTL • Encode timestamp in tx id • tx id = (current ts) * 1,000,000 + counter- within-ms • 1B txs/sec max • long value overflow in ~200 years • Plug-in cleanup into existing Data Janitor CP • Apply TTL logic on reads
  • 65. TRANSACTIONS IN BATCH • Batch jobs are long running • No changes should be visible before job is done • Batch jobs are multi-step • Individual step can be restarted • All steps succeed or all fail semantics needed • Every step retry should be isolated • Uncommitted changes should be visible within same step but not to other steps
  • 66. TRANSACTIONS IN BATCH • Batch jobs are long-running • No fine-grained conflict detection: last started among successfully committed wins • Delayed rollback execution • Batch jobs are multi-step • For now: batch job uses single transaction, every step re-uses it • For future: nested transactions for steps
  • 67. TX OVERHEAD • 2 RPC calls, each faster than Put in HBase • Can be improved with batching • To avoid overhead of transactions where appropriate the ACID guarantees can be relaxed • No need to change data format • When reading: read latest version of a cell • When writing: use ts = (current ts) * 1M + counter
  • 68. WHAT’S NEXT? • Open Source • Continue Scaling Tx Manager • Transaction Groups? • Integration across other transactional stores
  • 69. QS? Looking for the chance to work with a team that is defining a new category within Big Data? ! We are hiring! http://continuuity.com/careers careers[at]continuuity.com ! ! Alex Baranau @abaranau alexb[at]continuuity.com