Big Data's Journey to ACID
Owen O'Malley
A comparison of different tools for change management in big data systems.
1.
BIG DATA’S JOURNEY TO ACID
Owen O’Malley
owen@cloudera.com
@owen_omalley
October 2019
2.
WHY IS ACID IMPORTANT?
3.
© 2019 Cloudera, Inc. All rights reserved.
BIG DATA HAS A LOT OF CONCURRENCY
• Your data changes continually
  • Daily, hourly, or faster
• Ad hoc solutions require a lot of work
  • Producers and consumers must agree
• Distributed systems have lots of actors
  • And no global clock
4.
USE CASES
• Updating dimension tables
  • Changing a user’s address
• Deleting old records
  • GDPR user removal
• Update/restate large fact tables
  • Fix problems after they are in the warehouse
• Streaming data ingest
• NOT OLTP
5.
THE SYSTEMS
6.
APACHE HADOOP MAP/REDUCE
• Only supported adding new directories
• Provided isolation via the output committer
  • Task isolation
  • Job isolation
• Used HDFS atomic renames
• Used a _SUCCESS file to mark available directories
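The commit pattern on this slide can be sketched in a few lines. This is a minimal illustration, not Hadoop's output committer: it uses the local filesystem as a stand-in for HDFS, and the names `commit_job`, `is_available`, and the `_temporary` and `part-NNNNN` layout are assumptions chosen for the example.

```python
import os
import tempfile

SUCCESS_MARKER = "_SUCCESS"


def commit_job(output_dir, task_outputs):
    """Write each task's rows to a temporary file, rename it into place,
    then create the marker file that tells readers the job finished."""
    tmp_dir = os.path.join(output_dir, "_temporary")
    os.makedirs(tmp_dir, exist_ok=True)
    for task_id, rows in task_outputs.items():
        attempt_path = os.path.join(tmp_dir, f"attempt_{task_id}")
        with open(attempt_path, "w") as f:
            f.write("\n".join(rows))
        # Task commit: a rename within one POSIX filesystem is atomic, so a
        # reader sees either no part file or the complete part file.
        os.rename(attempt_path, os.path.join(output_dir, f"part-{task_id:05d}"))
    os.rmdir(tmp_dir)
    # Job commit: the empty marker is written last.
    open(os.path.join(output_dir, SUCCESS_MARKER), "w").close()


def is_available(output_dir):
    """Consumers should ignore a directory until the marker exists."""
    return os.path.exists(os.path.join(output_dir, SUCCESS_MARKER))


if __name__ == "__main__":
    out = tempfile.mkdtemp()
    commit_job(out, {0: ["a", "b"], 1: ["c"]})
    print(sorted(os.listdir(out)), is_available(out))
```

Writing the marker last is what gives job-level isolation here: a consumer that checks `is_available` never reads a half-written directory.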
7.
APACHE HBASE
• Provided point lookup and edits
• Read & Write performance: low latency, low throughput
• Row level atomicity
• Tephra provided transactions, but lacks adoption
• Write-Ahead Log (WAL)
• Regular compactions
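How a write-ahead log and compaction fit together can be shown with a toy in-memory store. This is a sketch under simplifying assumptions, not HBase's API or storage format: `TinyStore` and all of its methods are hypothetical names, and "files" are plain dictionaries.

```python
class TinyStore:
    """Toy key-value store with a write-ahead log, flushes, and compaction."""

    def __init__(self):
        self.wal = []       # append-only log of edits not yet flushed
        self.memstore = {}  # recent edits held in memory
        self.files = []     # immutable flushed files, oldest first

    def put(self, row, value):
        self.wal.append((row, value))  # log first so the edit can be replayed
        self.memstore[row] = value

    def flush(self):
        if self.memstore:
            self.files.append(dict(self.memstore))
            self.memstore = {}
            self.wal = []  # flushed edits no longer need the log

    def get(self, row):
        if row in self.memstore:
            return self.memstore[row]
        for f in reversed(self.files):  # newest file wins
            if row in f:
                return f[row]
        return None

    def compact(self):
        merged = {}
        for f in self.files:  # oldest first, so later files overwrite
            merged.update(f)
        self.files = [merged] if merged else []

    @classmethod
    def recover(cls, wal, files):
        """Rebuild a store after a crash from its files plus the log."""
        store = cls()
        store.files = [dict(f) for f in files]
        for row, value in wal:
            store.put(row, value)
        return store
```

Compaction bounds the number of files a read has to check, and the log keeps unflushed edits recoverable.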
8.
TRADITIONAL APACHE HIVE
• Provided the Hive Metastore (HMS) to track tables
• Provided structure for table layout
  • Value partitioning
• Only add or remove partition operations were atomic
• Only add partition was isolated
• Provided simplistic locking
9.
APACHE HIVE ACID
• Supports streaming writes
• Integrated with SQL data manipulation commands
  • Insert, delete, update, merge
• Snapshot isolation
• Read & Write performance: high throughput, high latency
• Lockless compaction
• Writes delta directories
• Assumes HDFS consistent directory listings
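The combination of delta directories and snapshot isolation can be illustrated with a small model. This is only a sketch: `AcidTable` is a hypothetical class, each "delta" is a dictionary tagged with a write ID, and the real Hive implementation is considerably more involved.

```python
class AcidTable:
    """Toy table where every write lands in its own delta, tagged by write ID."""

    def __init__(self):
        self.deltas = []        # (write_id, {key: row, or None for a delete})
        self.next_write_id = 1

    def write(self, changes):
        """Record a batch of inserts, updates, and deletes as one new delta."""
        write_id = self.next_write_id
        self.next_write_id += 1
        self.deltas.append((write_id, dict(changes)))
        return write_id

    def snapshot(self):
        """A snapshot is just the highest write ID visible right now."""
        return self.next_write_id - 1

    def read(self, snapshot_id):
        """Merge deltas in write order, ignoring any newer than the snapshot."""
        rows = {}
        for write_id, changes in self.deltas:
            if write_id > snapshot_id:
                continue
            for key, row in changes.items():
                if row is None:
                    rows.pop(key, None)
                else:
                    rows[key] = row
        return rows


table = AcidTable()
table.write({1: "alice", 2: "bob"})
reader_snapshot = table.snapshot()
table.write({2: None, 3: "carol"})  # delete row 2, insert row 3
old_view = table.read(reader_snapshot)
new_view = table.read(table.snapshot())
```

A reader that took its snapshot before the second write keeps seeing the first version, because writers only ever add new deltas and never modify existing ones.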
10.
APACHE HUDI
• Designed for streaming data
• Row level updates
• WAL & compaction
• Assumes HDFS
• Provides three reading levels:
  • Compacted
  • Compacted + deltas
  • Deltas
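The three reading levels differ only in which data a reader consults. The sketch below makes that concrete with plain Python structures. The function names and the `(timestamp, key, value)` delta layout are assumptions for illustration, not Hudi's API.

```python
def read_compacted(base):
    """Fastest read: only the compacted data, so recent edits are missing."""
    return dict(base)


def read_merged(base, deltas):
    """Freshest read: compacted data with every delta applied in time order."""
    rows = dict(base)
    for _, key, value in sorted(deltas, key=lambda d: d[0]):
        rows[key] = value
    return rows


def read_deltas(deltas, since=0):
    """Incremental read: only the changes made after a given timestamp."""
    return [d for d in deltas if d[0] > since]


base = {"u1": "Paris", "u2": "Oslo"}
deltas = [(10, "u2", "Rome"), (11, "u3", "Lima")]

compacted_view = read_compacted(base)
merged_view = read_merged(base, deltas)
recent_changes = read_deltas(deltas, since=10)
```

The trade-off is read cost against freshness: the compacted view skips the merge work, while the merged view pays for it on every read.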
11.
APACHE ICEBERG
• Designed to support data in object stores (e.g., S3)
  • Avoids inconsistent & slow directory listing
• Tracks tables and partitions to file level
  • Supports column min, max, and count per file
• Snapshot isolation
• Writers automatically retry on conflict
• Manifest files use copy on write
• Supports time travel and rollback
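Retry on conflict, copy on write, and time travel all follow from one idea: a table is a pointer to an immutable snapshot, and a commit is an atomic swap of that pointer. The sketch below models this in memory. `Catalog`, `compare_and_swap`, and `append_files` are hypothetical names, and the file lists stand in for manifest files rather than reproducing Iceberg's format.

```python
import threading


class Catalog:
    """Holds every snapshot plus the pointer to the current one."""

    def __init__(self):
        self._lock = threading.Lock()
        self.snapshots = {0: ()}  # snapshot id -> immutable tuple of data files
        self.current = 0

    def compare_and_swap(self, expected, files):
        """Publish a new snapshot only if nobody committed since `expected`.
        Returns the new snapshot id, or None when the commit lost the race."""
        with self._lock:
            if self.current != expected:
                return None
            new_id = self.current + 1
            self.snapshots[new_id] = tuple(files)
            self.current = new_id
            return new_id


def append_files(catalog, new_files, max_retries=5):
    """Optimistic commit: rebuild the file list on the latest snapshot and retry."""
    for _ in range(max_retries):
        base = catalog.current
        # Copy on write: build a new list instead of editing the old snapshot.
        candidate = catalog.snapshots[base] + tuple(new_files)
        new_id = catalog.compare_and_swap(base, candidate)
        if new_id is not None:
            return new_id
    raise RuntimeError("too many conflicting commits")


catalog = Catalog()
first = append_files(catalog, ["a.orc"])
second = append_files(catalog, ["b.orc"])
as_of_first = catalog.snapshots[first]  # time travel: old snapshots stay readable
```

Because old snapshots are never modified, rollback in this model amounts to pointing `current` back at an earlier id.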
12.
DATABRICKS DELTA
• Open-source, but closed governance
  • Ignoring the proprietary version
• Designed for object stores
  • Avoids inconsistent & slow directory listings
• Snapshot isolation
• Add, replace, remove data files
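One way to get a consistent view without listing directories is to replay an ordered log of file-level actions. The sketch below assumes a log of commits, each a list of `("add", path)` or `("remove", path)` pairs. The function name and this in-memory layout are illustrative assumptions, not Delta's on-disk log format.

```python
def live_files(log, version=None):
    """Replay commits 0..version (all commits by default) and return the set
    of data files that make up the table at that point."""
    commits = log if version is None else log[: version + 1]
    files = set()
    for commit in commits:
        for action, path in commit:
            if action == "add":
                files.add(path)
            elif action == "remove":
                files.discard(path)
            else:
                raise ValueError(f"unknown action: {action}")
    return files


log = [
    [("add", "f1"), ("add", "f2")],     # commit 0: initial load
    [("remove", "f1"), ("add", "f3")],  # commit 1: replace f1 with f3
    [("remove", "f2")],                 # commit 2: remove f2
]

latest = live_files(log)
as_of_commit_0 = live_files(log, version=0)
```

A replace is a remove and an add in the same commit, so readers see either the old file or the new one and never both.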
13.
CONCLUSIONS
14.
CONCLUSIONS
• GDPR is huge and is leading to a redesign of the data warehouse
• Support for object stores like S3 is critical
• Streaming ingest and processing is growing quickly
• This area is under active development
  • Will change over the next 6 months
  • Hive ACID is adding Presto & Impala support
  • Iceberg is adding delta files and Hive support
15.
OVERVIEW OF HIGH THROUGHPUT SYSTEMS
System    | SQL data ops | Open   | Write Amp | Object Store | Stream ingest | Engines
Hive ACID | Yes          | Govern | Low       | Poor         | Good          | RW: Hive; R: Spark, Impala
Hudi      | No           | Govern | Low       | Poor         | Good          | RW: Spark; R: Hive, Presto
Iceberg   | No           | Govern | High      | Good         | Poor          | RW: Spark, Presto; R: Pig
Delta     | No           | Source | High      | Good         | Poor          | RW: Spark; R: Presto
16.
THANK YOU
Owen O’Malley
owen@cloudera.com
@owen_omalley