Hive Optimizations and New Features in 0.11-0.13
alanfgates
1.
© Hortonworks Inc. 2013.
Hive for Analytic Workloads
Alan Gates (@alanfgates)
2.
Stinger Project (announced February 2013)
Batch AND Interactive SQL-IN-Hadoop
Stinger Initiative: a broad, community-based effort to drive the next generation of Hive
Goals:
• Speed: Improve Hive query performance by 100X to allow for interactive query times (seconds)
• Scale: The only SQL interface to Hadoop designed for queries that scale from TB to PB
• SQL: Support the broadest range of SQL semantics for analytic applications running against Hadoop
…all IN Hadoop
Hive 0.11, May 2013:
• Base Optimizations
• SQL Analytic Functions
• ORCFile, Modern File Format
Hive 0.12, October 2013:
• VARCHAR, DATE Types
• ORCFile predicate pushdown
• Advanced Optimizations
• Performance Boosts via YARN
Hive 0.13, April 2014:
• Hive on Apache Tez
• SQL standard authorization
• Permanent UDFs
• Vectorized Processing
3.
Stinger Highlights
• 13 months
• 145 separate contributors from 44 separate entities
• 3 Hive releases: 0.11, 0.12, and 0.13
• 392,000 lines of new Java code
4.
"Now this is not the end. It is not even the beginning of the end. But it is, perhaps, the end of the beginning."
-Winston Churchill
5.
Hive 0.13 Performance
• The TPC Benchmark™ DS (TPC-DS) is a decision support benchmark that models queries and data maintenance. It evaluates decision support systems that examine large volumes of data to answer real-world business questions.
• Test: 50 SQL queries on Hive 0.13
• Test Environment
– Driven by the Hive Testbench: https://github.com/cartershanklin/hive-testbench
– Nodes: 20 nodes, 256 GB RAM per node; only 48 GB per node used for Hive
– Drives: 6x 4 TB WDC WD4000FYYZ-0 drives per node
– Interconnect: 10GB
– Processors: 2x Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz, for a total of 16 CPU cores per machine
– Scale: 30K (30 TB total data)
6.
Benchmark Results
Queries were modified to have a partition key that duplicates the join key, making it easier for the optimizer to choose which partitions to scan.
7.
Benchmark Results
8.
SQL Semantics
Hive 0.10 & before:
• SELECT, JOIN, WHERE, GROUP BY, HAVING, ORDER BY, UNION, ROLLUP/CUBE, subqueries in FROM
Hive 0.11:
• Windowing functions (RANK, ROW_NUMBER) and OVER clause
Hive 0.13:
• Subqueries with IN, EXISTS in WHERE and HAVING
• Common table expressions (WITH clause)
• Join condition in WHERE
• CREATE FUNCTION (stored on cluster)
Next Steps:
• Temporary tables
• Subqueries with equality and inequality operators
• Full UNION support
• Set operators, EXCEPT and INTERSECT
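The difference between the two windowing functions named for Hive 0.11 is easy to miss: both number rows within each partition, but RANK gives tied ORDER BY values the same number and then skips ahead, while ROW_NUMBER always counts 1, 2, 3. The sketch below reproduces `RANK() OVER (PARTITION BY store ORDER BY amount)` and the matching ROW_NUMBER in plain Python; the `store`/`amount` columns and the helper names are hypothetical, and this is an illustration of the semantics, not Hive's implementation.

```python
from itertools import groupby
from operator import itemgetter


def _partitions(rows, partition_key, order_key):
    """Sort rows by (partition, order) and yield one list of rows per partition."""
    ordered = sorted(rows, key=itemgetter(partition_key, order_key))
    for _, part in groupby(ordered, key=itemgetter(partition_key)):
        yield list(part)


def row_number(rows, partition_key, order_key):
    """ROW_NUMBER() OVER (PARTITION BY partition_key ORDER BY order_key)."""
    out = []
    for part in _partitions(rows, partition_key, order_key):
        for n, row in enumerate(part, start=1):
            out.append({**row, "row_number": n})  # ties still get distinct numbers
    return out


def rank(rows, partition_key, order_key):
    """RANK() OVER (PARTITION BY partition_key ORDER BY order_key)."""
    out = []
    for part in _partitions(rows, partition_key, order_key):
        current = 0
        for n, row in enumerate(part, start=1):
            # A new ORDER BY value gets a rank equal to its position;
            # tied values share a rank, leaving a gap after the tie.
            if n == 1 or row[order_key] != part[n - 2][order_key]:
                current = n
            out.append({**row, "rank": current})
    return out


sales = [
    {"store": "a", "amount": 10},
    {"store": "a", "amount": 30},
    {"store": "a", "amount": 10},
    {"store": "b", "amount": 5},
]
print([r["rank"] for r in rank(sales, "store", "amount")])              # [1, 1, 3, 1]
print([r["row_number"] for r in row_number(sales, "store", "amount")])  # [1, 2, 3, 1]
```

In SQL the numbering among tied rows for ROW_NUMBER is not guaranteed; here Python's stable sort keeps input order, which is just one valid outcome.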
9.
Security
Hive 0.12 & before:
• StorageBasedAuthorizationProvider, maps to file-level security
– secure, based on HDFS security
– coarse grained, no column or row level security
• Default authorization: all advisory
– everyone has grant permissions
Hive 0.13: SQL standard security for tables, views, and databases
• GRANT/REVOKE
• ROLEs
• Column and row level permissions via views
Next Steps:
• Integration with XA Secure
• Extend to cover execution of functions
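The SQL standard model on this slide boils down to: privileges are granted to roles, users are granted roles, and column-level restrictions come from granting access to a view that projects only the permitted columns rather than to the base table. The sketch below is a hypothetical in-memory model of that check; `ToyAuthorizer`, its methods, and the table/view names are invented for illustration, and Hive's real authorizer also handles ownership, admin roles, and WITH GRANT OPTION, which are omitted here.

```python
class ToyAuthorizer:
    """Hypothetical in-memory model of role-based GRANT/REVOKE; not Hive's code."""

    def __init__(self):
        self.user_roles = {}  # user -> set of role names
        self.role_privs = {}  # role -> set of (privilege, object) pairs

    def grant_role(self, role, user):
        # Mimics: GRANT role TO USER user
        self.user_roles.setdefault(user, set()).add(role)

    def grant(self, privilege, obj, role):
        # Mimics: GRANT privilege ON obj TO ROLE role
        self.role_privs.setdefault(role, set()).add((privilege, obj))

    def revoke(self, privilege, obj, role):
        # Mimics: REVOKE privilege ON obj FROM ROLE role
        self.role_privs.get(role, set()).discard((privilege, obj))

    def check(self, user, privilege, obj):
        # Allowed if any of the user's roles holds the privilege on that object.
        return any(
            (privilege, obj) in self.role_privs.get(role, set())
            for role in self.user_roles.get(user, set())
        )


auth = ToyAuthorizer()
auth.grant_role("analyst", "alice")
# Column-level restriction via a view: grant SELECT on a view that exposes only
# permitted columns, and nothing on the underlying table.
auth.grant("SELECT", "sales_no_pii_v", "analyst")
print(auth.check("alice", "SELECT", "sales_no_pii_v"))  # True
print(auth.check("alice", "SELECT", "sales"))           # False
```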
10.
Data Type Conformance
Release: Available Data Types
• Hive 0.10 & before: integer types, floating types, string, array, map, struct, timestamp, binary
• Hive 0.11: decimal (default precision and scale only)
• Hive 0.12: date, varchar
• Hive 0.13: char, user-defined precision and scale for decimal
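The new types differ mainly in how they treat values that do not fit. As we understand Hive's documented behavior, VARCHAR(n) silently truncates longer strings, CHAR(n) is fixed length and pads shorter values with spaces, and DECIMAL(p, s) rounds to s fractional digits and yields NULL when the integer part needs more than p - s digits. The sketch below models those rules with Python's `decimal` module; the function names are hypothetical and the HALF_UP rounding mode is our assumption about Hive's choice.

```python
from decimal import Decimal, ROUND_HALF_UP, localcontext


def to_varchar(value, max_length):
    """VARCHAR(n): strings longer than n are silently truncated."""
    return value[:max_length]


def to_char(value, length):
    """CHAR(n): fixed length; shorter values are padded with trailing spaces."""
    return value[:length].ljust(length)


def to_decimal(value, precision, scale):
    """DECIMAL(p, s): round to s fractional digits (HALF_UP assumed);
    return None (SQL NULL) if the integer part needs more than p - s digits."""
    with localcontext() as ctx:
        ctx.prec = 100  # headroom so quantize() cannot exceed the context precision
        d = Decimal(value).quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)
    integer_digits = max(d.adjusted() + 1, 0)
    return None if integer_digits > precision - scale else d


print(to_varchar("hello world", 5))   # hello
print(repr(to_char("ab", 4)))         # 'ab  '
print(to_decimal("123.456", 5, 2))    # 123.46
print(to_decimal("1234.5", 5, 2))     # None
```

Rounding happens before the range check, so a value such as 999.995 in DECIMAL(5, 2) rounds up to 1000.00 and then overflows.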
11.
Read and Write, ACID
Hive 0.12 & before:
• INSERT and INSERT OVERWRITE available
• Locking available, requires ZooKeeper for durability
• No ACID
Hive 0.13:
• ACID-compliant ingestion of data from streaming sources such as Flume and Storm
• Snapshot isolation for readers
Next Steps:
• Addition of INSERT … VALUES, UPDATE, DELETE
• Multi-statement transactions: BEGIN, COMMIT, ROLLBACK
• Integration with HCatalog
Owen and I have a talk on this at 5:30 today.
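"Snapshot isolation for readers" means a query sees a fixed set of committed transactions for its whole run, even while streaming ingest keeps committing new ones. As we understand Hive's design, each transaction's rows land in a separate delta, and a reader records at start time a high-watermark transaction id plus the ids that were still open or had aborted. The sketch below models only that visibility rule; `Delta`, `Snapshot`, and `read` are hypothetical names, and base files, compaction, and deletes are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delta:
    txn_id: int  # transaction that wrote this delta (simplified: one txn per delta)
    rows: tuple


@dataclass(frozen=True)
class Snapshot:
    high_watermark: int  # highest transaction id allocated when the reader started
    invalid: frozenset   # ids that were still open, or had aborted, at that moment

    def is_visible(self, txn_id):
        return txn_id <= self.high_watermark and txn_id not in self.invalid


def read(deltas, snapshot):
    """Return the rows a reader holding `snapshot` sees, in transaction order."""
    rows = []
    for delta in sorted(deltas, key=lambda d: d.txn_id):
        if snapshot.is_visible(delta.txn_id):
            rows.extend(delta.rows)
    return rows


deltas = [Delta(1, ("a",)), Delta(2, ("b",)), Delta(3, ("c",))]
# This reader started while txn 2 was still open and before txn 3 was allocated.
early = Snapshot(high_watermark=2, invalid=frozenset({2}))
print(read(deltas, early))                     # ['a']
print(read(deltas, Snapshot(3, frozenset())))  # ['a', 'b', 'c']
```

Because the snapshot is immutable, the early reader keeps returning `['a']` no matter how many later transactions commit.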
12.
Optimizer
Hive 0.11 & before: Rules-based optimizer
• Mostly simple rules, such as pushing filters below joins
Hive 0.12: Correlation optimizer
• Where possible, combine related execution into a single job
Next Steps:
• Use Optiq for cost-based optimization
• Join ordering and operator selection using statistics and cost estimates
• Expand the statistics calculated and used in planning
Julian has a talk on this at 4:35 today.
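To make "join ordering using statistics and cost estimates" concrete, the sketch below enumerates every left-deep join order, estimates each intermediate result as left rows × right rows × join selectivity, and keeps the order with the smallest total intermediate size. This is a deliberately tiny toy cost model with hypothetical table names and statistics; it is not Optiq's algorithm, which searches plans far more cleverly than brute force.

```python
from itertools import permutations


def best_left_deep_order(cardinality, selectivity):
    """cardinality: table -> estimated row count.
    selectivity: frozenset({t1, t2}) -> join selectivity; a missing pair is
    treated as a cross product (selectivity 1.0)."""
    best_order, best_cost = None, float("inf")
    for order in permutations(cardinality):
        rows = cardinality[order[0]]
        joined = [order[0]]
        cost = 0
        for table in order[1:]:
            sel = 1.0
            for prev in joined:  # every predicate linking the new table to the joined set
                sel *= selectivity.get(frozenset((prev, table)), 1.0)
            rows = rows * cardinality[table] * sel
            cost += rows  # cost = sum of intermediate result sizes
            joined.append(table)
        if cost < best_cost:
            best_order, best_cost = order, cost
    return list(best_order), best_cost


# Hypothetical star schema: one large fact table and two small dimensions.
cardinality = {"fact": 1_000_000, "dim_a": 100, "dim_b": 10}
selectivity = {frozenset(("fact", "dim_a")): 0.01, frozenset(("fact", "dim_b")): 0.1}
order, cost = best_left_deep_order(cardinality, selectivity)
print(order)  # ['dim_a', 'dim_b', 'fact']
```

In this toy model, combining the two tiny dimensions first (a 1,000-row cross product) and probing the fact table once beats two passes over a million-row intermediate result; brute force is only workable for a handful of tables.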
13.
MapReduce is dead, long live Hadoop
14.
MapReduce is dead, long live Hadoop
Tez Talks:
• A New Chapter in Hadoop Data Processing, today 12:05
• Hive on Apache Tez: Benchmarked at Yahoo! Scale, today 12:05
• Hive + Tez: A Performance Deep Dive, today 2:35
15.
ORC File Format
• Columnar format for complex data types
• Built into Hive from 0.11
• Support for Pig via OrcLoader/OrcStorer
• Support for MapReduce via HCat
• Two levels of compression
– Lightweight type-specific and generic
• Built-in indexes
– Every 10,000 rows, with position information
– Min, Max, Sum, Count of each column
– Supports seek to row number
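The built-in indexes listed above can be pictured as one small statistics record per 10,000-row group of each column. The sketch below computes Min, Max, Sum, and Count per group for an integer column and uses the group start positions to seek to a row number. It is a simplified model under stated assumptions: `RowGroupStats`, `build_row_index`, and `seek` are hypothetical names, only integers are handled, and we store a starting row number where real ORC records byte positions within its compressed streams.

```python
from bisect import bisect_right
from dataclasses import dataclass

ROW_INDEX_STRIDE = 10_000  # "every 10,000 rows", as on the slide


@dataclass
class RowGroupStats:
    first_row: int  # stand-in for ORC's stream position information
    count: int
    minimum: int
    maximum: int
    total: int


def build_row_index(column, stride=ROW_INDEX_STRIDE):
    """Compute Min, Max, Sum, and Count for each group of `stride` rows."""
    index = []
    for start in range(0, len(column), stride):
        group = column[start:start + stride]
        index.append(RowGroupStats(start, len(group), min(group), max(group), sum(group)))
    return index


def seek(index, row_number):
    """Return the row group holding `row_number`, so a reader can jump straight to it."""
    starts = [g.first_row for g in index]
    i = bisect_right(starts, row_number) - 1
    if i < 0 or row_number >= index[i].first_row + index[i].count:
        raise IndexError(row_number)
    return index[i]


index = build_row_index(list(range(25_000)))
print(len(index))                     # 3
print(seek(index, 12_345).first_row)  # 10000
```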
16.
ORC File Format
• Hive 0.12
– Predicate push down
– Improved run-length encoding
– Adaptive string dictionaries
– Padding stripes to HDFS block boundaries
• Hive 0.13
– Stripe-based input splits
– Input split elimination
– Vectorized reader
– Customized Pig load and store functions
– ACID support
• Next Steps
– Faster writes
– Integer dictionaries
– Better block buffering
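Predicate push down pays off by pairing a query's WHERE clause with per-group statistics written alongside the data: a group whose [min, max] range cannot contain the value is skipped without being read. The sketch below shows that idea for `WHERE col = value` on a single integer column, with the statistics built at "write time" and consulted at "read time". The function names and the 10,000-row stride are assumptions for illustration, not ORC's reader API.

```python
STRIDE = 10_000  # row-group size, assumed to match ORC's default index stride


def build_min_max(column, stride=STRIDE):
    """Write-time step: record (first_row, min, max) for each row group."""
    stats = []
    for start in range(0, len(column), stride):
        group = column[start:start + stride]
        stats.append((start, min(group), max(group)))
    return stats


def scan_equals(column, min_max, value, stride=STRIDE):
    """Read-time step for `WHERE col = value`: check the stored statistics first
    and only read row groups whose [min, max] range could contain `value`."""
    matches, groups_read = [], 0
    for start, lo, hi in min_max:
        if not lo <= value <= hi:
            continue  # statistics prove no row here can match; skip the group
        groups_read += 1
        group = column[start:start + stride]
        matches.extend(start + i for i, v in enumerate(group) if v == value)
    return matches, groups_read


column = list(range(30_000))  # sorted data, so group ranges do not overlap
stats = build_min_max(column)
print(scan_equals(column, stats, 15_000))  # ([15000], 1)
```

Skipping works best when data is sorted or clustered on the filtered column; with randomly ordered values, most groups' ranges overlap the predicate and little is skipped.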
17.
Vectorized Query Execution
• Designed for modern processor architectures
– Avoid branching in the inner loop
– Make the most use of L1 and L2 cache
• How it works
– Process records in batches of 1,000 rows
– Generate code from templates to minimize branching
• What it gives
– 30x improvement in rows processed per second
– Initial prototype: 100M rows/sec on a laptop
• In Hive 0.13, initial (map) tasks are vectorized
• Current work: vectorize shuffle and reduce tasks
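The control flow described above, operating on a batch of column values at a time instead of one row at a time, can be sketched as follows for `SELECT SUM(b) WHERE a > threshold`. A filter pass over a slice of column `a` produces a selection vector of qualifying positions, and an aggregate pass touches only those positions of column `b`. Python will not show the cache and branch-prediction gains, which come from Hive's template-generated Java code; this only illustrates the data layout, and all names are hypothetical.

```python
BATCH_SIZE = 1_000  # batch size from the slide


def column_batches(a, b, size=BATCH_SIZE):
    """Yield aligned slices of two columns, one batch of rows at a time."""
    for start in range(0, len(a), size):
        yield a[start:start + size], b[start:start + size]


def sum_b_where_a_greater(a, b, threshold):
    """SELECT SUM(b) WHERE a > threshold, evaluated batch by batch."""
    total = 0
    for a_batch, b_batch in column_batches(a, b):
        # Filter pass: one tight loop over a column slice, producing a
        # selection vector of the positions that passed.
        selected = [i for i, v in enumerate(a_batch) if v > threshold]
        # Aggregate pass: read only the selected positions of the other column.
        total += sum(b_batch[i] for i in selected)
    return total


a = list(range(2_500))
b = [2] * 2_500
print(sum_b_where_a_greater(a, b, 1_999))  # 1000
```

The result matches a row-at-a-time loop; the batch structure matters because it lets a compiled implementation keep each inner loop simple and its data in cache.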
18.
Try it Yourself
• Apache Hive 0.13
– http://hive.apache.org/downloads.html
• Download and play with HDP 2.1
– http://hortonworks.com/products/hortonworks-sandbox/ for use on your laptop
– http://hortonworks.com/hdp/ for use on your cluster
19.
Thank You!
@alanfgates @hortonworks
Speaker notes
Query 21: 29 sec, scans one day of the items table
Query 93: fact-to-fact left outer join over a year's data, finished in around an hour
Query 13: full-year 6-way star join