SlideShare une entreprise Scribd logo
1  sur  26
Télécharger pour lire hors ligne
Confidential - Do Not Share or Distribute
Apache Arrow Flight SQL
A universal standard for high performance
data transfers from databases
Alex Merced
Developer Advocate @ Dremio
Confidential - Do Not Share or Distribute
Agenda
● What if we could have for our data transfers…
● What options do we have?
● What is Apache Arrow? Is it enough?
● What is Arrow Flight? Is it enough?
● What is Arrow Flight SQL? Is it enough?
● What are the Arrow Flight SQL ODBC & JDBC
Drivers?
Confidential - Do Not Share or Distribute
What if we could have for our data transfers…
✓ Standardized APIs
✓ No need for many different drivers
✓ High performance transfers
✓ Parallel data transfers (1:many, many:many)
✓ Easy to implement on server side, easy to use on client side
Confidential - Do Not Share or Distribute
● Built in 1992 & 1997, respectively
○ Inherently row-based
○ Prevented the many:many of
client-to-database-APIs
○ But resulted in one:many of
API-to-database-protocols
Why don’t ODBC & JDBC fit in today’s analytics world?
Today
Yesterday
● Today: heterogeneous big data workloads
● Most tools involved with OLAP workloads
today are columnar
○ But there is no columnar transfer
standard
Confidential - Do Not Share or Distribute
Increasingly, columnar → columnar is the standard
Yesterday
Today
Confidential - Do Not Share or Distribute
● Custom protocols
○ One per server
○ Implemented individually by each client
What options do we have to keep things columnar?
Confidential - Do Not Share or Distribute
Pick your poison
Enter: Apache Arrow Flight
JDBC/ODBC Custom protocol
Standardize client interface ✅ ❌
Standardize client database access
interface
✅ ❌
Standardize server interface ❌ ❌
Prevent unnecessary serialization ❌ ✅
Confidential - Do Not Share or Distribute
What if?
Enter: Arrow Flight
Confidential - Do Not Share or Distribute
Before we go there:
Apache Arrow
Confidential - Do Not Share or Distribute
Introduction to Apache Arrow
● An in-memory columnar format
○ Includes libraries for working with the format
■ E.g., computation engine, IPC, serialization /
deserialization from file formats.
○ Support for many languages (12 currently)
● Co-created by Dremio and Wes McKinney
● Rapid adoption, popularity, and capability growth
Confidential - Do Not Share or Distribute
● Arrow is primarily geared towards operations on data in a single-node
○ No specification for cross-network transfers
■ Within a cluster
■ Client-server
11
Isn’t Arrow enough?
Enter: Arrow Flight
Confidential - Do Not Share or Distribute
12
What is Arrow Flight?
● A general-purpose client-server framework to simplify high performance transport of
large datasets over network interfaces
● Designed for data in the modern world
○ Column-oriented, rather than row-oriented
○ Arrow’s columnar format is designed for high compressibility and large numbers of
rows
● Makes including the client-side in distributed computing easier
● Enables parallel data transfers
Confidential - Do Not Share or Distribute
13
Parallel data transfers with Arrow Flight
Client
Node 2
Node N
Node 1
CPU
memory
CPU
memory
CPU
memory
CPU
memory
DoGet(<ticket>)
DoGet(<ticket>)
DoGet(<ticket>)
Confidential - Do Not Share or Distribute
Isn’t Arrow Flight Enough?
JDBC/ODBC Custom protocols Arrow Flight
Standardize client interface ✅ ❌ ✅
Standardize client database access
interface
✅ ❌ ❌
Standardize server interface ❌ ❌ ✅
Prevent unnecessary serialization ❌ ✅ ✅
Confidential - Do Not Share or Distribute
Isn’t Arrow Flight Enough?
● Client sends an arbitrary byte stream, server sends a result
○ The byte stream only has meaning for a particular server that’s chosen to
support it
○ Example – Dremio interprets the byte stream to be a UTF-8 encoded SQL
query string
● Flight is meant to serve any tabular data, not databases in particular
○ Catalog information is not part of Arrow Flight’s design
● Haven’t we seen these problems before?
○ ODBC and JDBC standardize database activities (e.g., query execution,
catalog access)
Enter: Arrow Flight SQL
Confidential - Do Not Share or Distribute
16
● Client-server protocol for interacting with SQL databases built on Arrow Flight
○ Allows databases to use Arrow Flight as the transport protocol
○ Leverage the performance of Arrow and Flight for database access
● Set of commands to standardize a SQL interface on Flight:
○ Query execution
○ Prepared statements
○ Database catalog metadata (tables, columns, data types).
○ SQL syntax capabilities
● Language and database-agnostic
○ A single set of client libraries to connect to any Flight SQL server
○ Clients can talk to any database implementing the protocol
○ No need for a client-side driver per database
What is Arrow Flight SQL?
Confidential - Do Not Share or Distribute
Client Database
2 - FlightInfo<Schema, Endpoints>
1 - GetFlightInfo(GetTables)
4 - Arrow record batches
3 - DoGet(<ticket>)
6 - FlightInfo<Schema, Endpoints>
5 - GetFlightInfo(CommandStatementQuery)
7 - DoGet(<ticket>)
Memory
Listing Tables:
Querying a
table:
Memory
8 - Arrow record batches
Confidential - Do Not Share or Distribute
Queries:
● CommandStatementQuery
● CommandStatementUpdate
Prepared statements:
● ActionClosePreparedStatementRequest
● ActionCreatePreparedStatementRequest
● CommandPreparedStatementQuery
● CommandPreparedStatementUpdate
Metadata requests:
● CommandGetCatalogs
● CommandGetCrossReference
● CommandGetDbSchemas
● CommandGetExportedKeys
● CommandGetImportedKeys
● CommandGetPrimaryKeys
● CommandGetSqlInfo
● CommandGetTables
● CommandGetTableTypes
Arrow Flight SQL Commands
Confidential - Do Not Share or Distribute
Isn’t Arrow Flight SQL Enough?
JDBC/ODBC Custom protocols Arrow Flight Arrow Flight SQL
Standardize client interface ✅ ❌ ✅ ✅
Standardize client database
access interface
✅ ❌ ❌ ✅
Standardize server interface ❌ ❌ ✅ ✅
Prevent unnecessary serialization ❌ ✅ ✅ ✅
Confidential - Do Not Share or Distribute
● Arrow Flight SQL is fairly new, will take time to be widely adopted by existing
client tools (especially the enterprise BI ones)
○ Most BI tools already support ODBC or JDBC
Isn’t Arrow Flight SQL Enough?
Enter: Arrow Flight SQL ODBC & JDBC Drivers
Confidential - Do Not Share or Distribute
21
Arrow Flight SQL ODBC & JDBC Drivers
● ODBC & JDBC Drivers built on top of Flight SQL client libraries
○ A single driver to connect to any Flight SQL server
○ Supports arbitrary server-side options as connection properties
● The drivers will work with any ODBC/JDBC client tool without any code changes
● Completely open source
Confidential - Do Not Share or Distribute
22
Traditional ODBC/JDBC vs Arrow Flight SQL ODBC/JDBC
Traditional ODBC/JDBC Arrow Flight SQL ODBC/JDBC
Driver Management
Performance
install and manage a driver
for each database
row-based transfer, so not
great for large amounts of
data
column-based transfer, so
benefit from compression of
data over the wire
install and manage
a single driver for all
databases
Confidential - Do Not Share or Distribute
When wouldn’t I use Arrow Flight ODBC/JDBC Drivers?
● Generally preferable to have native Flight SQL applications
○ Easier to harness future features like multiple endpoints
○ Better performance if your client is column-based
Client is row-based
Client is column-based
Network
gRPC
Arrow
Arrow Flight
Arrow Flight SQL
JDBC
Legacy
Client
SQL Native
Client
Non-SQL
Client
Confidential - Do Not Share or Distribute
What if we could have for our data transfers…
✓ Standardized APIs
○ Arrow Flight for arbitrary tabular transfers, Arrow Flight SQL for transfers from databases
✓ No need for many different drivers
○ Both client and server side APIs are standardized
✓ High performance transfers
○ Transfers via highly optimized Arrow format
✓ Parallel data transfers (1:many, many:many)
○ Lays the foundation for parallel data transfers
✓ Easy to implement on server side, easy to use on client side
○ Server-side implementation is as easy as implementing a few RPC request handlers
○ A single set of client libraries to connect to any Flight SQL server
Confidential - Do Not Share or Distribute
Q&A
Confidential - Do Not Share or Distribute
Thanks!
jason@dremio.com

Contenu connexe

Similaire à Data Engineer's Lunch #77: Apache Arrow Flight SQL: A Universal Standard for High-Performance Data Transfers from Data

Presto: Query Anything - Data Engineer’s perspective
Presto: Query Anything - Data Engineer’s perspectivePresto: Query Anything - Data Engineer’s perspective
Presto: Query Anything - Data Engineer’s perspectiveAlluxio, Inc.
 
Introduction to ClustrixDB
Introduction to ClustrixDBIntroduction to ClustrixDB
Introduction to ClustrixDBI Goo Lee
 
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksDatabricks
 
Serverless Data Platform
Serverless Data PlatformServerless Data Platform
Serverless Data PlatformShu-Jeng Hsieh
 
introduction to micro services
introduction to micro servicesintroduction to micro services
introduction to micro servicesSpyros Lambrinidis
 
Bootify Yyour App from Zero to Hero
Bootify Yyour App from Zero to HeroBootify Yyour App from Zero to Hero
Bootify Yyour App from Zero to HeroEPAM
 
Introduction to Datasource V2 API
Introduction to Datasource V2 APIIntroduction to Datasource V2 API
Introduction to Datasource V2 APIdatamantra
 
Introdcution to Azure
Introdcution to AzureIntrodcution to Azure
Introdcution to AzureOmid Vahdaty
 
Defrag 2014 - Blend Web IDEs, Open Source and PaaS to Create and Deploy APIs
Defrag 2014 - Blend Web IDEs, Open Source and PaaS to Create and Deploy APIsDefrag 2014 - Blend Web IDEs, Open Source and PaaS to Create and Deploy APIs
Defrag 2014 - Blend Web IDEs, Open Source and PaaS to Create and Deploy APIsRestlet
 
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache CassandraApache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache CassandraAnant Corporation
 
Normalizing x pages web development
Normalizing x pages web development Normalizing x pages web development
Normalizing x pages web development Shean McManus
 
Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...DataWorks Summit
 
Introduction to Apache Flink
Introduction to Apache FlinkIntroduction to Apache Flink
Introduction to Apache Flinkdatamantra
 
MongoDB 4.0 새로운 기능 소개
MongoDB 4.0 새로운 기능 소개MongoDB 4.0 새로운 기능 소개
MongoDB 4.0 새로운 기능 소개Ha-Yang(White) Moon
 
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauBig Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauSam Palani
 
Ghost Environment
Ghost EnvironmentGhost Environment
Ghost EnvironmentPratipD
 
Provisioning infrastructure to AWS using Terraform – Exove
Provisioning infrastructure to AWS using Terraform – ExoveProvisioning infrastructure to AWS using Terraform – Exove
Provisioning infrastructure to AWS using Terraform – ExoveExove
 

Similaire à Data Engineer's Lunch #77: Apache Arrow Flight SQL: A Universal Standard for High-Performance Data Transfers from Data (20)

Presto: Query Anything - Data Engineer’s perspective
Presto: Query Anything - Data Engineer’s perspectivePresto: Query Anything - Data Engineer’s perspective
Presto: Query Anything - Data Engineer’s perspective
 
Introduction to ClustrixDB
Introduction to ClustrixDBIntroduction to ClustrixDB
Introduction to ClustrixDB
 
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
 
Serverless Data Platform
Serverless Data PlatformServerless Data Platform
Serverless Data Platform
 
introduction to micro services
introduction to micro servicesintroduction to micro services
introduction to micro services
 
Vue d'ensemble Dremio
Vue d'ensemble DremioVue d'ensemble Dremio
Vue d'ensemble Dremio
 
Bootify Yyour App from Zero to Hero
Bootify Yyour App from Zero to HeroBootify Yyour App from Zero to Hero
Bootify Yyour App from Zero to Hero
 
Introduction to Datasource V2 API
Introduction to Datasource V2 APIIntroduction to Datasource V2 API
Introduction to Datasource V2 API
 
Introdcution to Azure
Introdcution to AzureIntrodcution to Azure
Introdcution to Azure
 
Defrag 2014 - Blend Web IDEs, Open Source and PaaS to Create and Deploy APIs
Defrag 2014 - Blend Web IDEs, Open Source and PaaS to Create and Deploy APIsDefrag 2014 - Blend Web IDEs, Open Source and PaaS to Create and Deploy APIs
Defrag 2014 - Blend Web IDEs, Open Source and PaaS to Create and Deploy APIs
 
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache CassandraApache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
 
Normalizing x pages web development
Normalizing x pages web development Normalizing x pages web development
Normalizing x pages web development
 
Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...
 
Introduction to Apache Flink
Introduction to Apache FlinkIntroduction to Apache Flink
Introduction to Apache Flink
 
Red Hat Storage Roadmap
Red Hat Storage RoadmapRed Hat Storage Roadmap
Red Hat Storage Roadmap
 
Red Hat Storage Roadmap
Red Hat Storage RoadmapRed Hat Storage Roadmap
Red Hat Storage Roadmap
 
MongoDB 4.0 새로운 기능 소개
MongoDB 4.0 새로운 기능 소개MongoDB 4.0 새로운 기능 소개
MongoDB 4.0 새로운 기능 소개
 
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauBig Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
 
Ghost Environment
Ghost EnvironmentGhost Environment
Ghost Environment
 
Provisioning infrastructure to AWS using Terraform – Exove
Provisioning infrastructure to AWS using Terraform – ExoveProvisioning infrastructure to AWS using Terraform – Exove
Provisioning infrastructure to AWS using Terraform – Exove
 

Plus de Anant Corporation

QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137
QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137
QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137Anant Corporation
 
Kono.IntelCraft.Weekly.AI.LLM.Landscape.2024.02.28.pdf
Kono.IntelCraft.Weekly.AI.LLM.Landscape.2024.02.28.pdfKono.IntelCraft.Weekly.AI.LLM.Landscape.2024.02.28.pdf
Kono.IntelCraft.Weekly.AI.LLM.Landscape.2024.02.28.pdfAnant Corporation
 
Data Engineer's Lunch 96: Intro to Real Time Analytics Using Apache Pinot
Data Engineer's Lunch 96: Intro to Real Time Analytics Using Apache PinotData Engineer's Lunch 96: Intro to Real Time Analytics Using Apache Pinot
Data Engineer's Lunch 96: Intro to Real Time Analytics Using Apache PinotAnant Corporation
 
NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...
NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...
NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...Anant Corporation
 
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPT
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPTAutomate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPT
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPTAnant Corporation
 
Episode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
Episode 2: The LLM / GPT / AI Prompt / Data Engineer RoadmapEpisode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
Episode 2: The LLM / GPT / AI Prompt / Data Engineer RoadmapAnant Corporation
 
Machine Learning Orchestration with Airflow
Machine Learning Orchestration with AirflowMachine Learning Orchestration with Airflow
Machine Learning Orchestration with AirflowAnant Corporation
 
Cassandra Lunch 130: Recap of Cassandra Forward Talks
Cassandra Lunch 130: Recap of Cassandra Forward TalksCassandra Lunch 130: Recap of Cassandra Forward Talks
Cassandra Lunch 130: Recap of Cassandra Forward TalksAnant Corporation
 
Data Engineer's Lunch 90: Migrating SQL Data with Arcion
Data Engineer's Lunch 90: Migrating SQL Data with ArcionData Engineer's Lunch 90: Migrating SQL Data with Arcion
Data Engineer's Lunch 90: Migrating SQL Data with ArcionAnant Corporation
 
Data Engineer's Lunch 89: Machine Learning Orchestration with AirflowMachine ...
Data Engineer's Lunch 89: Machine Learning Orchestration with AirflowMachine ...Data Engineer's Lunch 89: Machine Learning Orchestration with AirflowMachine ...
Data Engineer's Lunch 89: Machine Learning Orchestration with AirflowMachine ...Anant Corporation
 
Cassandra Lunch 129: What’s New: Apache Cassandra 4.1+ Features & Future
Cassandra Lunch 129: What’s New:  Apache Cassandra 4.1+ Features & FutureCassandra Lunch 129: What’s New:  Apache Cassandra 4.1+ Features & Future
Cassandra Lunch 129: What’s New: Apache Cassandra 4.1+ Features & FutureAnant Corporation
 
Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case S...
Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case S...Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case S...
Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case S...Anant Corporation
 
Data Engineer's Lunch #85: Designing a Modern Data Stack
Data Engineer's Lunch #85: Designing a Modern Data StackData Engineer's Lunch #85: Designing a Modern Data Stack
Data Engineer's Lunch #85: Designing a Modern Data StackAnant Corporation
 
Data Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache IcebergData Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache IcebergAnant Corporation
 
Apache Cassandra Lunch 120: Apache Cassandra Monitoring Made Easy with AxonOps
Apache Cassandra Lunch 120: Apache Cassandra Monitoring Made Easy with AxonOpsApache Cassandra Lunch 120: Apache Cassandra Monitoring Made Easy with AxonOps
Apache Cassandra Lunch 120: Apache Cassandra Monitoring Made Easy with AxonOpsAnant Corporation
 
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Anant Corporation
 
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise ConsciousnessData Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise ConsciousnessAnant Corporation
 
Data Engineer's Lunch #81: Reverse ETL Tools for Modern Data Platforms
Data Engineer's Lunch #81: Reverse ETL Tools for Modern Data PlatformsData Engineer's Lunch #81: Reverse ETL Tools for Modern Data Platforms
Data Engineer's Lunch #81: Reverse ETL Tools for Modern Data PlatformsAnant Corporation
 

Plus de Anant Corporation (20)

QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137
QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137
QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137
 
Kono.IntelCraft.Weekly.AI.LLM.Landscape.2024.02.28.pdf
Kono.IntelCraft.Weekly.AI.LLM.Landscape.2024.02.28.pdfKono.IntelCraft.Weekly.AI.LLM.Landscape.2024.02.28.pdf
Kono.IntelCraft.Weekly.AI.LLM.Landscape.2024.02.28.pdf
 
Data Engineer's Lunch 96: Intro to Real Time Analytics Using Apache Pinot
Data Engineer's Lunch 96: Intro to Real Time Analytics Using Apache PinotData Engineer's Lunch 96: Intro to Real Time Analytics Using Apache Pinot
Data Engineer's Lunch 96: Intro to Real Time Analytics Using Apache Pinot
 
NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...
NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...
NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...
 
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPT
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPTAutomate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPT
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPT
 
YugabyteDB Developer Tools
YugabyteDB Developer ToolsYugabyteDB Developer Tools
YugabyteDB Developer Tools
 
Episode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
Episode 2: The LLM / GPT / AI Prompt / Data Engineer RoadmapEpisode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
Episode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
 
Machine Learning Orchestration with Airflow
Machine Learning Orchestration with AirflowMachine Learning Orchestration with Airflow
Machine Learning Orchestration with Airflow
 
Cassandra Lunch 130: Recap of Cassandra Forward Talks
Cassandra Lunch 130: Recap of Cassandra Forward TalksCassandra Lunch 130: Recap of Cassandra Forward Talks
Cassandra Lunch 130: Recap of Cassandra Forward Talks
 
Data Engineer's Lunch 90: Migrating SQL Data with Arcion
Data Engineer's Lunch 90: Migrating SQL Data with ArcionData Engineer's Lunch 90: Migrating SQL Data with Arcion
Data Engineer's Lunch 90: Migrating SQL Data with Arcion
 
Data Engineer's Lunch 89: Machine Learning Orchestration with AirflowMachine ...
Data Engineer's Lunch 89: Machine Learning Orchestration with AirflowMachine ...Data Engineer's Lunch 89: Machine Learning Orchestration with AirflowMachine ...
Data Engineer's Lunch 89: Machine Learning Orchestration with AirflowMachine ...
 
Cassandra Lunch 129: What’s New: Apache Cassandra 4.1+ Features & Future
Cassandra Lunch 129: What’s New:  Apache Cassandra 4.1+ Features & FutureCassandra Lunch 129: What’s New:  Apache Cassandra 4.1+ Features & Future
Cassandra Lunch 129: What’s New: Apache Cassandra 4.1+ Features & Future
 
Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case S...
Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case S...Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case S...
Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case S...
 
Data Engineer's Lunch #85: Designing a Modern Data Stack
Data Engineer's Lunch #85: Designing a Modern Data StackData Engineer's Lunch #85: Designing a Modern Data Stack
Data Engineer's Lunch #85: Designing a Modern Data Stack
 
CL 121
CL 121CL 121
CL 121
 
Data Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache IcebergData Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
 
Apache Cassandra Lunch 120: Apache Cassandra Monitoring Made Easy with AxonOps
Apache Cassandra Lunch 120: Apache Cassandra Monitoring Made Easy with AxonOpsApache Cassandra Lunch 120: Apache Cassandra Monitoring Made Easy with AxonOps
Apache Cassandra Lunch 120: Apache Cassandra Monitoring Made Easy with AxonOps
 
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
 
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise ConsciousnessData Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
 
Data Engineer's Lunch #81: Reverse ETL Tools for Modern Data Platforms
Data Engineer's Lunch #81: Reverse ETL Tools for Modern Data PlatformsData Engineer's Lunch #81: Reverse ETL Tools for Modern Data Platforms
Data Engineer's Lunch #81: Reverse ETL Tools for Modern Data Platforms
 

Dernier

Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknowmakika9823
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationBoston Institute of Analytics
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...Suhani Kapoor
 

Dernier (20)

Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
 

Data Engineer's Lunch #77: Apache Arrow Flight SQL: A Universal Standard for High-Performance Data Transfers from Data

  • 1. Confidential - Do Not Share or Distribute Apache Arrow Flight SQL A universal standard for high performance data transfers from databases Alex Merced Developer Advocate @ Dremio
  • 2. Confidential - Do Not Share or Distribute Agenda ● What if we could have for our data transfers… ● What options do we have? ● What is Apache Arrow? Is it enough? ● What is Arrow Flight? Is it enough? ● What is Arrow Flight SQL? Is it enough? ● What are the Arrow Flight SQL ODBC & JDBC Drivers?
  • 3. Confidential - Do Not Share or Distribute What if we could have for our data transfers… ✓ Standardized APIs ✓ No need for many different drivers ✓ High performance transfers ✓ Parallel data transfers (1:many, many:many) ✓ Easy to implement on server side, easy to use on client side
  • 4. Confidential - Do Not Share or Distribute ● Built in 1992 & 1997, respectively ○ Inherently row-based ○ Prevented the many:many of client-to-database-APIs ○ But resulted in one:many of API-to-database-protocols Why don’t ODBC & JDBC fit in today’s analytics world? Today Yesterday ● Today: heterogeneous big data workloads ● Most tools involved with OLAP workloads today are columnar ○ But there is no columnar transfer standard
  • 5. Confidential - Do Not Share or Distribute Increasingly, columnar → columnar is the standard Yesterday Today
  • 6. Confidential - Do Not Share or Distribute ● Custom protocols ○ One per server ○ Implemented individually by each client What options do we have to keep things columnar?
  • 7. Confidential - Do Not Share or Distribute Pick your poison Enter: Apache Arrow Flight JDBC/ODBC Custom protocol Standardize client interface ✅ ❌ Standardize client database access interface ✅ ❌ Standardize server interface ❌ ❌ Prevent unnecessary serialization ❌ ✅
  • 8. Confidential - Do Not Share or Distribute What if? Enter: Arrow Flight
  • 9. Confidential - Do Not Share or Distribute Before we go there: Apache Arrow
  • 10. Confidential - Do Not Share or Distribute Introduction to Apache Arrow ● An in-memory columnar format ○ Includes libraries for working with the format ■ E.g., computation engine, IPC, serialization / deserialization from file formats. ○ Support for many languages (12 currently) ● Co-created by Dremio and Wes McKinney ● Rapid adoption, popularity, and capability growth
  • 11. Confidential - Do Not Share or Distribute ● Arrow is primarily geared towards operations on data in a single-node ○ No specification for cross-network transfers ■ Within a cluster ■ Client-server 11 Isn’t Arrow enough? Enter: Arrow Flight
  • 12. Confidential - Do Not Share or Distribute 12 What is Arrow Flight? ● A general-purpose client-server framework to simplify high performance transport of large datasets over network interfaces ● Designed for data in the modern world ○ Column-oriented, rather than row-oriented ○ Arrow’s columnar format is designed for high compressibility and large numbers of rows ● Makes including the client-side in distributed computing easier ● Enables parallel data transfers
  • 13. Confidential - Do Not Share or Distribute 13 Parallel data transfers with Arrow Flight Client Node 2 Node N Node 1 CPU memory CPU memory CPU memory CPU memory DoGet(<ticket>) DoGet(<ticket>) DoGet(<ticket>)
  • 14. Confidential - Do Not Share or Distribute Isn’t Arrow Flight Enough? JDBC/ODBC Custom protocols Arrow Flight Standardize client interface ✅ ❌ ✅ Standardize client database access interface ✅ ❌ ❌ Standardize server interface ❌ ❌ ✅ Prevent unnecessary serialization ❌ ✅ ✅
  • 15. Confidential - Do Not Share or Distribute Isn’t Arrow Flight Enough? ● Client sends an arbitrary byte stream, server sends a result ○ The byte stream only has meaning for a particular server that’s chosen to support it ○ Example – Dremio interprets the byte stream to be a UTF-8 encoded SQL query string ● Flight is meant to serve any tabular data, not databases in particular ○ Catalog information is not part of Arrow Flight’s design ● Haven’t we seen these problems before? ○ ODBC and JDBC standardize database activities (e.g., query execution, catalog access) Enter: Arrow Flight SQL
  • 16. Confidential - Do Not Share or Distribute 16 ● Client-server protocol for interacting with SQL databases built on Arrow Flight ○ Allows databases to use Arrow Flight as the transport protocol ○ Leverage the performance of Arrow and Flight for database access ● Set of commands to standardize a SQL interface on Flight: ○ Query execution ○ Prepared statements ○ Database catalog metadata (tables, columns, data types). ○ SQL syntax capabilities ● Language and database-agnostic ○ A single set of client libraries to connect to any Flight SQL server ○ Clients can talk to any database implementing the protocol ○ No need for a client-side driver per database What is Arrow Flight SQL?
  • 17. Confidential - Do Not Share or Distribute Client Database 2 - FlightInfo<Schema, Endpoints> 1 - GetFlightInfo(GetTables) 4 - Arrow record batches 3 - DoGet(<ticket>) 6 - FlightInfo<Schema, Endpoints> 5 - GetFlightInfo(CommandStatementQuery) 7 - DoGet(<ticket>) Memory Listing Tables: Querying a table: Memory 8 - Arrow record batches
  • 18. Confidential - Do Not Share or Distribute Queries: ● CommandStatementQuery ● CommandStatementUpdate Prepared statements: ● ActionClosePreparedStatementRequest ● ActionCreatePreparedStatementRequest ● CommandPreparedStatementQuery ● CommandPreparedStatementUpdate Metadata requests: ● CommandGetCatalogs ● CommandGetCrossReference ● CommandGetDbSchemas ● CommandGetExportedKeys ● CommandGetImportedKeys ● CommandGetPrimaryKeys ● CommandGetSqlInfo ● CommandGetTables ● CommandGetTableTypes Arrow Flight SQL Commands
  • 19. Confidential - Do Not Share or Distribute Isn’t Arrow Flight SQL Enough? JDBC/ODBC Custom protocols Arrow Flight Arrow Flight SQL Standardize client interface ✅ ❌ ✅ ✅ Standardize client database access interface ✅ ❌ ❌ ✅ Standardize server interface ❌ ❌ ✅ ✅ Prevent unnecessary serialization ❌ ✅ ✅ ✅
  • 20. Confidential - Do Not Share or Distribute ● Arrow Flight SQL is fairly new, will take time to be widely adopted by existing client tools (especially the enterprise BI ones) ○ Most BI tools already support ODBC or JDBC Isn’t Arrow Flight SQL Enough? Enter: Arrow Flight SQL ODBC & JDBC Drivers
  • 21. Confidential - Do Not Share or Distribute 21 Arrow Flight SQL ODBC & JDBC Drivers ● ODBC & JDBC Drivers built on top of Flight SQL client libraries ○ A single driver to connect to any Flight SQL server ○ Supports arbitrary server-side options as connection properties ● The drivers will work with any ODBC/JDBC client tool without any code changes ● Completely open source
  • 22. Confidential - Do Not Share or Distribute 22 Traditional ODBC/JDBC vs Arrow Flight SQL ODBC/JDBC Traditional ODBC/JDBC Arrow Flight SQL ODBC/JDBC Driver Management Performance install and manage a driver for each database row-based transfer, so not great for large amounts of data column-based transfer, so benefit from compression of data over the wire install and manage a single driver for all databases
  • 23. Confidential - Do Not Share or Distribute When wouldn’t I use Arrow Flight ODBC/JDBC Drivers? ● Generally preferable to have native Flight SQL applications ○ Easier to harness future features like multiple endpoints ○ Better performance if your client is column-based Client is row-based Client is column-based Network gRPC Arrow Arrow Flight Arrow Flight SQL JDBC Legacy Client SQL Native Client Non-SQL Client
  • 24. Confidential - Do Not Share or Distribute What if we could have for our data transfers… ✓ Standardized APIs ○ Arrow Flight for arbitrary tabular transfers, Arrow Flight SQL for transfers from databases ✓ No need for many different drivers ○ Both client and server side APIs are standardized ✓ High performance transfers ○ Transfers via highly optimized Arrow format ✓ Parallel data transfers (1:many, many:many) ○ Lays the foundation for parallel data transfers ✓ Easy to implement on server side, easy to use on client side ○ Server-side implementation is as easy as implementing a few RPC request handlers ○ A single set of client libraries to connect to any Flight SQL server
  • 25. Confidential - Do Not Share or Distribute Q&A
  • 26. Confidential - Do Not Share or Distribute Thanks! jason@dremio.com