Hive acid-updates-strata-sjc-feb-2015
alanfgates
Inserts, updates, and deletes with transactions in Hive 0.14
1. Hive 0.14 Does ACID
February 2015
Alan Gates
gates@hortonworks.com
@alanfgates
© Hortonworks Inc. 2015
2. History
• Hive only updated partitions
  – INSERT OVERWRITE rewrote an entire partition
  – Forced daily or even hourly partitions
  – Could add files to partition directory, but no file compaction
• What about concurrent readers?
  – OK for inserts, but overwrite caused races
  – There is a ZooKeeper lock manager, but…
• No way to delete or update rows
• No INSERT INTO T VALUES…
  – Breaks some tools
3. Why is ACID Critical?
• Hadoop and Hive have always…
  – Worked without ACID
  – Perceived as tradeoff for performance
• But, your data isn't static
  – It changes daily, hourly, or faster
  – Ad hoc solutions require a lot of work
  – Managing change makes the user's life better
• Do or Do Not, There is NO Try
4. Use Cases
• NOT OLTP!!!
• Updating a Dimension Table
  – Changing a customer's address
• Delete Old Records
  – Remove records for compliance
• Update/Restate Large Fact Tables
  – Fix problems after they are in the warehouse
• Streaming Data Ingest
  – A continual stream of data coming in
  – Typically from Flume or Storm
• NOT OLTP!!!
5. New SQL in Hive 0.14
• New DML
  – INSERT INTO T VALUES(1, 'fred', ...);
  – UPDATE T SET x = 5[, ...] WHERE ...
  – DELETE FROM T WHERE ...
  – Supports partitioned and non-partitioned tables; the WHERE clause can specify a partition, but this is not required
• Restrictions
  – Table must have a format that extends AcidInputFormat (currently ORC)
  – Table must be bucketed and not sorted (can use 1 bucket, but this will restrict write parallelism)
  – Table must be marked transactional
  – create table T(...) clustered by (a) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true');
6. Why Not HBase?
• Good
  – Handles compactions for us
  – Already has similar data model with LSM
• Bad
  – No cross-row transactions
  – Would require us to write a transaction manager over HBase; doable, but not less work
  – HFile is column-family based rather than columnar
  – HBase focused on point lookups and range scans
  – Warehousing requires full scans
7. Design
• HDFS Does Not Allow Arbitrary Writes
  – Store changes as delta files
  – Stitched together by client on read
• Writes get a Transaction ID
  – Sequentially assigned by Metastore
• Reads get Committed Transactions
  – Provides snapshot consistency
  – No locks required
  – Provide a snapshot of data from start of query
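The read path described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Hive's implementation: the Snapshot class, its field names, and the sample changes are hypothetical. The idea shown is that a reader records which transactions were committed when the query started and ignores changes from any other transaction, so no locks are needed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Transaction state captured at query start (hypothetical structure)."""
    high_water_mark: int        # largest transaction ID allocated at query start
    open_or_aborted: frozenset  # IDs at or below the mark that are not committed

    def is_visible(self, txn_id: int) -> bool:
        # Visible means: allocated before the query started and committed.
        return txn_id <= self.high_water_mark and txn_id not in self.open_or_aborted


def visible_changes(changes, snapshot):
    """Keep only the changes written by transactions the snapshot can see."""
    return [c for c in changes if snapshot.is_visible(c[0])]


# Each change is (transaction_id, description); IDs are assigned sequentially.
changes = [
    (101, "insert row A"),
    (102, "update row A"),   # transaction 102 is still open
    (103, "delete row B"),
    (104, "insert row C"),   # allocated after the query started
]
snap = Snapshot(high_water_mark=103, open_or_aborted=frozenset({102}))
seen = visible_changes(changes, snap)
```

With this sample data the reader sees only the changes from transactions 101 and 103, whatever happens to 102 and 104 while the query runs.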
8. Stitching Buckets Together
9. HDFS Layout
• Partition locations remain unchanged
  – Still warehouse/$db/$tbl/$part
• Bucket Files Structured By Transactions
  – Base files: $part/base_$tid/bucket_*
  – Delta files: $part/delta_$tid_$tid/bucket_*
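As a rough illustration of the naming scheme above, the following Python sketch classifies a directory name as a base or a delta and extracts its transaction IDs. The function name and the returned tuple shape are assumptions made for this example; the zero-padded sample names follow the pattern shown later in the deck.

```python
import re

# Patterns follow the slide: base_$tid and delta_$tid_$tid.
BASE_RE = re.compile(r"^base_(\d+)$")
DELTA_RE = re.compile(r"^delta_(\d+)_(\d+)$")


def parse_acid_dir(name):
    """Return (kind, first_txn, last_txn), or None for any other name.

    A base has no first transaction: it holds everything up to last_txn.
    """
    m = BASE_RE.match(name)
    if m:
        return ("base", None, int(m.group(1)))
    m = DELTA_RE.match(name)
    if m:
        return ("delta", int(m.group(1)), int(m.group(2)))
    return None


base = parse_acid_dir("base_0028000")
delta = parse_acid_dir("delta_0028001_0028100")
other = parse_acid_dir("bucket_00000")
```

A delta written by a single transaction would carry the same ID twice; a compacted delta carries the first and last IDs it covers.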
10. Input and Output Formats
• Created new AcidInput/OutputFormat
  – Unique key is transaction, bucket, row
• Reader returns correct version of row based on transaction state
• Also Added Raw API for Compactor
  – Provides previous events as well
• ORC implements new API
  – Extends records with change metadata
  – Add operation (d, u, i), transaction and key
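To make the record metadata concrete, here is a minimal Python sketch of how a reader could collapse a stream of insert, update, and delete events into the current version of each row. The Event type and its field names are hypothetical, and the sketch assumes every event shown comes from a committed transaction; it is not the ORC reader's actual logic.

```python
from typing import NamedTuple, Optional


class Event(NamedTuple):
    op: str               # "i" insert, "u" update, "d" delete
    txn: int              # transaction that wrote this event
    key: tuple            # (original transaction, bucket, row) identifying the row
    value: Optional[str]  # row contents; None for a delete


def current_rows(events):
    """Replay events in transaction order and keep the latest version per key."""
    latest = {}
    for ev in sorted(events, key=lambda e: e.txn):
        latest[ev.key] = ev
    # Rows whose latest event is a delete are dropped from the result.
    return {key: ev.value for key, ev in latest.items() if ev.op != "d"}


events = [
    Event("i", 5, (5, 0, 0), "fred"),
    Event("i", 5, (5, 0, 1), "wilma"),
    Event("u", 7, (5, 0, 0), "frederick"),
    Event("d", 8, (5, 0, 1), None),
]
rows = current_rows(events)
```

A raw reader for the compactor would instead return every event in `events`, including the superseded ones.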
11. Distributing the Work
• Need to split buckets for MapReduce
  – Need to split base and deltas the same way
  – Use key ranges
  – Use indexes
12. Transaction Manager
• Existing lock managers
  – In memory: not durable
  – ZooKeeper: requires additional components to install, administer, etc.
• Locks need to be integrated with transactions
  – commit/rollback must atomically release locks
• We sort of have this database lying around which has ACID characteristics (metastore)
• Transactions and locks stored in metastore
• Uses metastore DB to provide unique, ascending IDs for transactions and locks
13. Transaction Model
• In Hive 0.14 DML statements are auto-commit
  – Working on adding BEGIN, COMMIT, ROLLBACK
• Snapshot isolation
  – Reader will see consistent data for the duration of his/her query
  – May extend to other isolation levels in the future
• Current transactions can be displayed using new SHOW TRANSACTIONS statement
14. Locking Model
• Three types of locks
  – shared
  – semi-shared (can co-exist with shared, but not other semi-shared)
  – exclusive
• Operations require different locks
  – SELECT, INSERT: shared
  – UPDATE, DELETE: semi-shared
  – DROP, INSERT OVERWRITE: exclusive
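The lock rules on this slide amount to a small compatibility table, sketched below in Python. The enum and function names are hypothetical, and the sketch assumes that an exclusive lock conflicts with every other lock, which the slide implies but does not spell out.

```python
from enum import Enum


class Lock(Enum):
    SHARED = "shared"
    SEMI_SHARED = "semi-shared"
    EXCLUSIVE = "exclusive"


# Lock required by each operation, as listed on the slide.
REQUIRED = {
    "SELECT": Lock.SHARED,
    "INSERT": Lock.SHARED,
    "UPDATE": Lock.SEMI_SHARED,
    "DELETE": Lock.SEMI_SHARED,
    "DROP": Lock.EXCLUSIVE,
    "INSERT OVERWRITE": Lock.EXCLUSIVE,
}


def compatible(a, b):
    """Return True if two locks can be held on the same object at once."""
    if Lock.EXCLUSIVE in (a, b):
        return False  # assumption: exclusive conflicts with everything
    if a is Lock.SEMI_SHARED and b is Lock.SEMI_SHARED:
        return False  # only one semi-shared holder at a time
    return True       # shared/shared and shared/semi-shared co-exist


def can_run_together(op_a, op_b):
    return compatible(REQUIRED[op_a], REQUIRED[op_b])
```

For example, `can_run_together("SELECT", "UPDATE")` is true, while two concurrent UPDATE or DELETE statements on the same object would have to wait for each other.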
15. Compactor
• Each transaction (or batch of transactions in streaming ingest) creates a new delta file
• Too many small files put pressure on the NameNode
• Need a way to
  – Collect many deltas into one delta: minor compaction
  – Rewrite base and delta to new base: major compaction
16. Minor Compaction
• Run when there are 10 or more deltas (configurable)
• Results in base + 1 delta

Before:
/hive/warehouse/purchaselog/ds=201403311000/base_0028000
/hive/warehouse/purchaselog/ds=201403311000/delta_0028001_0028100
/hive/warehouse/purchaselog/ds=201403311000/delta_0028101_0028200
/hive/warehouse/purchaselog/ds=201403311000/delta_0028201_0028300
/hive/warehouse/purchaselog/ds=201403311000/delta_0028301_0028400
/hive/warehouse/purchaselog/ds=201403311000/delta_0028401_0028500

After:
/hive/warehouse/purchaselog/ds=201403311000/base_0028000
/hive/warehouse/purchaselog/ds=201403311000/delta_0028001_0028500
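The renaming in this example can be reproduced with a short Python sketch. It only computes the name of the merged delta and checks the "10 or more deltas" trigger; the function names are made up for illustration, and merging the actual bucket files is out of scope.

```python
def minor_compaction_due(delta_count, threshold=10):
    """Trigger from the slide: run when there are 10 or more deltas."""
    return delta_count >= threshold


def minor_compaction_target(paths):
    """Name of the single delta that replaces all deltas in the listing."""
    ranges = []
    width = None
    for path in paths:
        name = path.rsplit("/", 1)[-1]      # last path component
        if name.startswith("delta_"):
            _, first, last = name.split("_")
            ranges.append((int(first), int(last)))
            width = len(first)              # keep the zero padding
    if not ranges:
        return None
    first = min(r[0] for r in ranges)
    last = max(r[1] for r in ranges)
    return f"delta_{first:0{width}d}_{last:0{width}d}"


part = "/hive/warehouse/purchaselog/ds=201403311000"
listing = [
    f"{part}/base_0028000",
    f"{part}/delta_0028001_0028100",
    f"{part}/delta_0028101_0028200",
    f"{part}/delta_0028201_0028300",
    f"{part}/delta_0028301_0028400",
    f"{part}/delta_0028401_0028500",
]
target = minor_compaction_target(listing)
```

The base directory is left untouched, which is why the result is a base plus one delta.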
17. Major Compaction
• Run when deltas are 10% the size of base (configurable)
• Results in new base

Before:
/hive/warehouse/purchaselog/ds=201403311000/base_0028000
/hive/warehouse/purchaselog/ds=201403311000/delta_0028001_0028100
/hive/warehouse/purchaselog/ds=201403311000/delta_0028101_0028200
/hive/warehouse/purchaselog/ds=201403311000/delta_0028201_0028300
/hive/warehouse/purchaselog/ds=201403311000/delta_0028301_0028400
/hive/warehouse/purchaselog/ds=201403311000/delta_0028401_0028500

After:
/hive/warehouse/purchaselog/ds=201403311000/base_0028500
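A minimal Python sketch of the major compaction decision and the resulting base name follows. The function names and byte counts are hypothetical; the 10% ratio is the default quoted on the slide, and treating "10% the size of base" as "at least 10%" is an assumption.

```python
def major_compaction_due(base_bytes, delta_bytes, threshold=0.10):
    """Trigger from the slide: deltas reach 10% of the base size.

    Assumption: with no base data at all, any delta data triggers compaction.
    """
    if base_bytes == 0:
        return delta_bytes > 0
    return delta_bytes / base_bytes >= threshold


def major_compaction_target(dir_names):
    """Name of the new base: it ends at the last transaction in any delta."""
    last_ids = []
    width = None
    for name in dir_names:
        if name.startswith("delta_"):
            _, _, last = name.split("_")
            last_ids.append(int(last))
            width = len(last)  # keep the zero padding
    if not last_ids:
        return None
    return f"base_{max(last_ids):0{width}d}"


dirs = [
    "base_0028000",
    "delta_0028001_0028100",
    "delta_0028101_0028200",
    "delta_0028201_0028300",
    "delta_0028301_0028400",
    "delta_0028401_0028500",
]
new_base = major_compaction_target(dirs)
```

Unlike minor compaction, the old base and all deltas are replaced by the single new base directory.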
18. Compactor Continued
• Metastore thrift server will schedule and execute compactions
  – No need for user to schedule
  – User can initiate via new ALTER TABLE COMPACT statement
• No locking required, compactions run at same time as select and DML
  – Compactor aware of readers, does not remove old files until readers have finished with them
• Current compactions can be viewed via new SHOW COMPACTIONS statement
19. Application: Streaming Ingest
• Data is flowing in from generators in a stream
• Without this, you have to add it to Hive in batches, often every hour
  – Thus your users have to wait an hour before they can see their data
• New interface in hive.hcatalog.streaming lets applications write small batches of records and commit them
  – Users can now see data within a few seconds of it arriving from the data generators
• Available for Apache Flume in HDP 2.1 and Storm in HDP 2.2
20. Configuration
• On the client
  hive.support.concurrency=true
  hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
  hive.enforce.bucketing=true
• On the metastore server
  hive.compactor.initiator.on=true
  hive.compactor.worker.threads=1 # or more
21. Phases of Development
• Phase 1, Hive 0.13
  – Transaction and new lock manager
  – ORC file support
  – Automatic and manual compaction
  – Snapshot isolation
  – Streaming ingest via Flume
• Phase 2, Hive 0.14
  – INSERT … VALUES, UPDATE, DELETE
• Phase 3, Hive 1.2(?)
  – Add support for only some columns in insert, e.g. INSERT into T (a, b) select c, d from U;
  – BEGIN, COMMIT, ROLLBACK
• Future (all speculative based on user feedback)
  – Integration with HCatalog
  – Versioned or point in time queries
  – Streaming ingest of updates and deletes
  – Additional isolation levels such as dirty read or read committed
  – MERGE
22. Conclusion
• JIRA: https://issues.apache.org/jira/browse/HIVE-5317
• Adds ACID semantics to Hive
• Uses SQL standard commands
  – INSERT, UPDATE, DELETE
• Provides scalable read and write access
23. Thank You!
Questions & Answers