Why is Azure Data Explorer fast in petabyte-scale analytics?
Understand how its data storage architecture makes this possible
www.linkedin.com/in/sheik-uduman-ali-m-54b5ab8
https://technicallysheik.com
sheikudumanali@gmail.com
Sheik (technicallysheik.com)
Azure Data Explorer (ADX)
• A managed, large-scale big data analytics platform
• Suited to use cases with high-volume, high-variety data ingested at high velocity, such as:
  • Internet of Things (IoT): device telemetry data
  • Time series data
  • Log analytics
  • Geospatial data
  • Big data analytics
• A variety of connectors to ingest data from many sources, including streaming data
• A simple query language, Kusto Query Language (KQL), even for complex data analytics
• Built-in data visualization and native support for Power BI and Grafana
• Outperforms other industry-leading big data analytics services on performance and pricing
Ingest → Analyze (Query) → Visualize
StormEvents - Sample table
Let us take the StormEvents table as a sample. It contains 22 columns of information on US storm events.
"TableName": StormEvents,
"Schema": StartTime:datetime,EndTime:datetime,EpisodeId:int,EventId:int,
State:string,EventType:string,InjuriesDirect:int,InjuriesIndirect:int,
DeathsDirect:int,DeathsIndirect:int,DamageProperty:int,DamageCrops:int,
Source:string,BeginLocation:string,EndLocation:string,BeginLat:real,BeginLon:real,
EndLat:real,EndLon:real,EpisodeNarrative:string,EventNarrative:string,
StormSummary:dynamic,
"DatabaseName": Samples,
"Folder": Storm_Events,
"DocString": US storm events. Data source: https://www.ncdc.noaa.gov/stormevents
"StartTime": 2007-09-18T20:00:00Z,
"EndTime": 2007-09-19T18:00:00Z,
"EpisodeId": 11074,
"EventId": 60904,
"State": FLORIDA,
"EventType": Heavy Rain,
"InjuriesDirect": 0,
"InjuriesIndirect": 0,
"DeathsDirect": 0,
"DeathsIndirect": 0,
"DamageProperty": 0,
"DamageCrops": 0,
"Source": Trained Spotter,
"BeginLocation": ORMOND BEACH,
"EndLocation": NEW SMYRNA BEACH,
"BeginLat": 29.28,
"BeginLon": -81.05,
"EndLat": 29.02,
"EndLon": -80.93,
"EpisodeNarrative": Thunderstorms lingered over Volusia County.,
"EventNarrative": As much as 9 inches of rain fell in a 24-hour period across parts of coastal Volusia County.,
"StormSummary": {
"TotalDamages": 0,
"StartTime": "2007-09-18T20:00:00.0000000Z",
"EndTime": "2007-09-19T18:00:00.0000000Z",
"Details": {
"Description": "As much as 9 inches of rain fell in a 24-hour period across parts of coastal Volusia County.",
"Location": "FLORIDA"
}
}
Sample record
Key tenets of the ADX data store
• Columnar store
• Text inverted index
• Data shard (extent)
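Columnar Store
Stores the values of each column together, in a separate file per column, instead of storing all the values of a row together.
To return a row as a query result, the engine fetches the value at the corresponding position from each column's storage file.
ADX's append-only write model suits this storage format: new data is only ever added, and existing values are never updated in place.
For example, in the StormEvents table, all StartTime values are stored together, all State values together, and so on for each of the 22 columns.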
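To make the columnar layout concrete, here is a minimal Python sketch, assuming a toy in-memory model rather than ADX's real on-disk format: each column is a plain list (standing in for one file per column), rows are only ever appended, and a row is rebuilt by reading the same position from every column. The helper names (`to_columnar`, `append_rows`, `get_row`) and the sample values are hypothetical.

```python
from typing import Any

Row = dict[str, Any]
Columns = dict[str, list[Any]]


def to_columnar(rows: list[Row]) -> Columns:
    """Store each column's values together; every list keeps the same row order."""
    columns: Columns = {name: [] for name in rows[0]}
    for row in rows:
        for name in columns:
            columns[name].append(row[name])
    return columns


def append_rows(columns: Columns, new_rows: list[Row]) -> None:
    """Append-only write: new values go to the end of each column; nothing is updated in place."""
    for row in new_rows:
        for name in columns:
            columns[name].append(row[name])


def get_row(columns: Columns, position: int) -> Row:
    """Rebuild one row by reading the same position from every column."""
    return {name: values[position] for name, values in columns.items()}


# Made-up StormEvents-like rows (only three of the 22 columns).
rows = [
    {"StartTime": "2007-09-18T20:00:00Z", "State": "FLORIDA", "EventType": "Heavy Rain"},
    {"StartTime": "2007-10-02T14:00:00Z", "State": "TEXAS", "EventType": "Flood"},
]
columns = to_columnar(rows)
append_rows(columns, [{"StartTime": "2007-11-05T09:30:00Z", "State": "IOWA", "EventType": "Hail"}])
print(columns["State"])  # ['FLORIDA', 'TEXAS', 'IOWA']
print(get_row(columns, 1))
```

Because every column list keeps the same row order, position 1 in `State` and position 1 in `EventType` always belong to the same record, which is what makes the row reconstruction work.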
Advantages of Columnar Store - 1
StormEvents
| take 5
| project StartTime, EndTime, EventType, State
High query performance
When a table has many columns, projecting only a few of them means scanning only those columns on disk, instead of reading every column of every row in the table.
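A rough way to see the projection benefit: the sketch below (an illustration under stated assumptions, not how ADX measures I/O) serializes made-up StormEvents-like rows once as a single row-oriented blob and once as one blob per column, then counts the bytes each layout must read for `project StartTime, EndTime, EventType, State`. JSON serialization and the sample values are assumptions chosen only to keep the example self-contained.

```python
import json

# Made-up rows: four narrow columns plus one wide free-text column.
rows = [
    {
        "StartTime": f"2007-01-{day:02d}T00:00:00Z",
        "EndTime": f"2007-01-{day:02d}T06:00:00Z",
        "EventType": "Flood",
        "State": "TEXAS",
        "EventNarrative": "Heavy rain caused flooding. " * 8,
    }
    for day in range(1, 29)
]

# Row store: one blob holding whole rows. Column store: one blob per column.
row_file = json.dumps(rows).encode("utf-8")
column_files = {
    name: json.dumps([r[name] for r in rows]).encode("utf-8") for name in rows[0]
}

projected = ["StartTime", "EndTime", "EventType", "State"]
bytes_row_store = len(row_file)  # a row store reads every row in full
bytes_column_store = sum(len(column_files[name]) for name in projected)  # only projected columns
print(bytes_row_store, bytes_column_store)
```

The wide `EventNarrative` column is never touched by the columnar read, which is where most of the savings come from in this example.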
StormEvents
| summarize StormCount = count(),
TypeOfStorms = dcount(EventType) by State
| top 5 by StormCount desc
High-performance aggregation queries
An aggregation reads only the columns it needs (here, State and EventType). Because the data is immutable, results, particularly aggregations, can be cached.
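The aggregation query above can be sketched in Python as per-shard partial aggregates that are merged afterwards. Because an extent never changes after it is written, its partial result can be cached and reused; here `functools.lru_cache` stands in for a result cache. The `EXTENTS` data and the function names are hypothetical, and the sketch counts distinct event types exactly with sets, whereas KQL's `dcount()` returns an estimate.

```python
from collections import Counter, defaultdict
from functools import lru_cache

# Hypothetical immutable extents: extent id -> (State column, EventType column).
EXTENTS: dict[int, tuple[tuple[str, ...], tuple[str, ...]]] = {
    1: (("TEXAS", "TEXAS", "KANSAS"), ("Flood", "Hail", "Hail")),
    2: (("TEXAS", "IOWA"), ("Flood", "Tornado")),
}


@lru_cache(maxsize=None)
def partial_summary(extent_id: int) -> tuple[tuple[str, int, frozenset[str]], ...]:
    """Per-extent (State, count, distinct EventTypes); cacheable because extents never change."""
    states, event_types = EXTENTS[extent_id]
    counts = Counter(states)
    kinds: defaultdict[str, set[str]] = defaultdict(set)
    for state, event_type in zip(states, event_types):
        kinds[state].add(event_type)
    # Return immutable values so callers cannot alter a cached result.
    return tuple((s, counts[s], frozenset(kinds[s])) for s in sorted(counts))


def summarize_by_state(extent_ids: list[int], top: int = 5) -> list[tuple[str, int, int]]:
    """StormCount = count(), TypeOfStorms = distinct EventTypes, by State; top N by count."""
    storm_count: Counter[str] = Counter()
    storm_types: defaultdict[str, set[str]] = defaultdict(set)
    for extent_id in extent_ids:
        for state, n, kinds in partial_summary(extent_id):
            storm_count[state] += n
            storm_types[state] |= kinds
    # Sort by count descending, then by State name to break ties deterministically.
    ranked = sorted(storm_count, key=lambda s: (-storm_count[s], s))[:top]
    return [(s, storm_count[s], len(storm_types[s])) for s in ranked]
```

Running `summarize_by_state([1, 2])` twice computes each extent's partial result only once; the second call is served from the cache.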
Advantages of Columnar Store - 2
Column compression: compressed column storage on disk improves throughput.
By default, ADX uses LZ4 compression.
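A quick illustration of why column compression pays off: values within a single column tend to be similar and repetitive, so they compress well. LZ4 is not in the Python standard library, so this sketch uses `zlib` as a stand-in; the column contents are made up, and the exact sizes will differ from what ADX stores.

```python
import zlib


def compressed_size(values: list[str]) -> tuple[int, int]:
    """Return (raw bytes, zlib-compressed bytes) for one column serialized as text."""
    raw = "\n".join(values).encode("utf-8")
    return len(raw), len(zlib.compress(raw))


# Made-up EventType column: a handful of distinct values repeated many times.
event_type_column = ["Flood", "Hail", "Thunderstorm Wind", "Flood"] * 2500
raw, packed = compressed_size(event_type_column)
print(f"raw={raw} bytes, compressed={packed} bytes, ratio={raw / packed:.1f}x")
```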
StormEvents
| where EventType == "Flood"
| summarize EventCount = count() by State
| where EventCount > 100
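Queries with a where predicate perform well: the filter scans only the EventType column, and because all columns store the rows in the same order, the matching positions lead directly to the corresponding State values. Compression further reduces disk I/O.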
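The filter-then-aggregate query above can be sketched over two plain Python lists standing in for the EventType and State columns. The sketch assumes the columns are already decompressed and share the same row order; the function name and the sample data are hypothetical.

```python
from collections import Counter


def count_matches_by_group(filter_column: list[str], group_column: list[str],
                           value: str, min_count: int) -> dict[str, int]:
    """where filter == value | summarize count() by group | where count > min_count."""
    # 1. Scan only the filter column and collect the matching row positions.
    positions = [i for i, v in enumerate(filter_column) if v == value]
    # 2. Columns share row order, so the same positions index into the group column.
    counts = Counter(group_column[i] for i in positions)
    # 3. Apply the second predicate to the aggregated counts.
    return {group: n for group, n in counts.items() if n > min_count}


# Made-up EventType and State columns of equal length.
event_type = ["Flood"] * 150 + ["Hail"] * 50 + ["Flood"] * 20
state = ["TEXAS"] * 200 + ["IOWA"] * 20
print(count_matches_by_group(event_type, state, "Flood", 100))  # {'TEXAS': 150}
```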
Vectorized processing
The engine operates on batches of column values at a time. Because the columns are compressed, the data a query fetches from disk to apply projections or predicates may fit into the CPU's L1 cache, avoiding unnecessary memory and disk I/O.
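Vectorized engines process a column a batch of values at a time, with a tight loop per batch, rather than pushing one row at a time through the whole query. Python cannot show the cache effects, but the sketch below shows the batch-at-a-time structure with a selection vector per batch. The batch size of 1024, the `array` typecode, and the function names are assumptions for illustration.

```python
from array import array
from typing import Iterator

BATCH_SIZE = 1024  # hypothetical batch size, chosen only for illustration


def batches(column: array, size: int = BATCH_SIZE) -> Iterator[tuple[int, array]]:
    """Yield (start offset, batch) pairs; slicing an array returns a new array."""
    for start in range(0, len(column), size):
        yield start, column[start:start + size]


def select_greater_than(column: array, threshold: int) -> list[int]:
    """Evaluate `value > threshold` one batch at a time and return matching row positions."""
    selected: list[int] = []
    for start, batch in batches(column):
        # Selection vector for this batch: offsets of the values that pass the predicate.
        selection = [i for i, v in enumerate(batch) if v > threshold]
        selected.extend(start + i for i in selection)
    return selected


damage_property = array("q", range(3000))  # made-up DamageProperty values
print(select_greater_than(damage_property, 2995))  # [2996, 2997, 2998, 2999]
```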
Extent or Shard
[Figure: a table divided into Shard 1, Shard 2 and Shard 3, showing the StartTime, EndTime, EpisodeId, EventId, State and EventType columns, each with its own index]
An extent (or shard) holds a collection of records that are physically arranged in columns.
For example, Shard 1 holds its own subset of the table's records, with each column (StartTime, EndTime, and so on) stored separately.
A shard contains data, metadata, and indexes.
All columns are indexed.
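A shard's three parts (data, metadata, index) can be pictured with a small Python sketch. This is a simplified stand-in, not ADX's actual extent format: the data is one list per column, the metadata is just the row count, and every column gets a term-to-row-positions inverted index built with a naive lowercase tokenizer. All names here are hypothetical.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


def tokenize(text: str) -> list[str]:
    """Naive tokenizer: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


@dataclass(frozen=True)
class Extent:
    columns: dict[str, list]                     # data: one list per column, same row order
    row_count: int                               # metadata
    index: dict[str, dict[str, frozenset[int]]]  # column -> term -> row positions


def build_extent(columns: dict[str, list]) -> Extent:
    """Build the inverted index once, at write time; the extent is never modified afterwards."""
    row_count = len(next(iter(columns.values())))
    index: dict[str, dict[str, frozenset[int]]] = {}
    for name, values in columns.items():
        postings: defaultdict[str, set[int]] = defaultdict(set)
        for position, value in enumerate(values):
            for term in tokenize(str(value)):
                postings[term].add(position)
        index[name] = {term: frozenset(rows) for term, rows in postings.items()}
    return Extent(columns, row_count, index)


def search(extent: Extent, column: str, term: str) -> list[int]:
    """Look up a single term in one column's index; no column scan is needed."""
    return sorted(extent.index[column].get(term.lower(), frozenset()))


extent = build_extent({
    "State": ["FLORIDA", "FLORIDA"],
    "EventNarrative": [
        "As much as 9 inches of rain fell in a 24-hour period across parts of coastal Volusia County.",
        "Thunderstorms lingered over Volusia County.",
    ],
})
print(search(extent, "EventNarrative", "volusia"))  # [0, 1]
print(search(extent, "EventNarrative", "rain"))     # [0]
```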
Shards in Both Ingestion and Queries
[Figure: Data Ingestion writes a record r1 := (c1, c2, c3, …, cn) into the table's shards (Shard 1, Shard 2, Shard 3) spread across Cluster Node 1 and Cluster Node 2; the Distributed Query Engine turns a query into a Distributed Query Plan and assembles the query result for r1 := (c1, c8) from more than one shard]
Shards are spread evenly across the cluster nodes based on the partition key. By default, ingestion time is the partition key.
Because shards are immutable, their data can be cached in both memory and SSD.
A query is distributed across the nodes according to a distributed query plan, and its parts run concurrently.
Append-only writes make effective use of free-text inverted indexing: a shard's index is built once, when the shard is written, and never has to be updated.
A query result may be fetched from more than one shard.
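The ingestion and query flow can be sketched as scatter-gather in Python, under loudly simplified assumptions: records are grouped into shards by the hour of their ingestion time (a hypothetical granularity), shards are assigned to nodes round-robin, threads stand in for cluster nodes, and each "node" computes a partial count by State that a coordinator then merges. None of these names are ADX APIs.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime


def shard_by_ingestion_time(records: list[dict]) -> list[list[dict]]:
    """Group records into shards keyed by the hour they were ingested (hypothetical rule)."""
    shards: dict[datetime, list[dict]] = {}
    for record in records:
        key = record["ingestion_time"].replace(minute=0, second=0, microsecond=0)
        shards.setdefault(key, []).append(record)
    return list(shards.values())


def node_count_by_state(node_shards: list[list[dict]]) -> Counter:
    """Work done on one node: a partial count over only the shards that node holds."""
    partial: Counter = Counter()
    for shard in node_shards:
        partial.update(record["State"] for record in shard)
    return partial


def distributed_count_by_state(records: list[dict], num_nodes: int = 2) -> dict[str, int]:
    shards = shard_by_ingestion_time(records)
    # Spread shards across nodes round-robin.
    nodes = [shards[i::num_nodes] for i in range(num_nodes)]
    # Scatter: each node's partial aggregate runs concurrently (threads stand in for nodes).
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(node_count_by_state, nodes))
    # Gather: the coordinator merges the partial results.
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


records = [
    {"ingestion_time": datetime(2024, 1, 1, hour, 5), "State": state}
    for hour, state in [(0, "TEXAS"), (0, "IOWA"), (1, "TEXAS"), (2, "TEXAS"), (3, "IOWA")]
]
print(distributed_count_by_state(records))  # {'TEXAS': 3, 'IOWA': 2}
```

Merging partial counts works because counting is decomposable: the total per State is the sum of the per-shard counts, so each node can work on its own shards without coordination until the final merge.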
Advantages of Shards
• The scale-out nature of sharding lets a query use the compute on all nodes, which improves query performance
• Ingestion and storage stay fast and reliable at petabyte scale
Closing Note
• The columnar store, column compression, the inverted text index, and data shards are the key tenets that let ADX perform well on petabyte-scale analytics queries
• Immutable records, and the caching they make possible, make your data analytics faster
• Materialized views and the query results cache are other ADX features that improve data analytics performance
