Columnstore indexes are one of the most needed and most welcome features implemented in SQL Server. They have changed how we think about how fast and efficient data reads can be: operations on really big data sets can complete in under a second. To get this right, though, you have to take care of your index, especially during the data loading phase. That is what I would like to show you: how to shape the ETL process so that the columnstore index can do its job properly.
Columnstore indexes - best practices for the ETL process - Damian Widera
1. @ITCAMPRO #ITCAMP17Community Conference for IT Professionals
Columnstore indexes - best practices for the ETL process
Damian Widera
Microsoft Data Platform MVP
EUVIC
@damianwidera
http://sqlblog.com/blogs/damian_widera/default.aspx
EUVIC
PALO ALTO
NEW YORK
WARSAW
KATOWICE
GLIWICE
BIELSKO BIAŁA
WROCŁAW
CZĘSTOCHOWA
GDYNIA
KRAKÓW
BYDGOSZCZ
VIENNA
BIAŁYSTOK
What and how?
• Introduction to columnstore indexes (CI)
• Three important views of the clustered columnstore index (CCI):
– How to load data efficiently
– How to use the index efficiently
– How to maintain it efficiently
• Internals
Anatomy of a columnstore index
• Traditional (rowstore) clustered index
Saledate    Product       Amt  GrossPrice  SalesTax  NetPrice  ...
2012-03-08  Candy bar      50       75.00     14.25     89.25  ...
2012-03-10  Smart phone     1      349.50     66.41    415.91  ...
2012-03-11  Apple (bag)     7       31.57      1.89     33.46  ...
2012-03-12  Smart phone     1      349.50     66.41    415.91  ...
2012-03-19  Chair           1      599.50    113.91    713.41  ...
2012-03-20  Chair           3    1,798.50    341.72  2,140.22  ...
2012-03-20  Laptop          2    2,860.00    543.40  3,403.40  ...
2012-03-20  Toy car         3       29.97      5.69     35.66  ...
2012-03-21  Apple (bag)    14       63.14      3.79     66.93  ...
2012-03-24  Pocket knife    1       12.95      2.46     15.41  ...
2012-03-27  Apple (bag)     2        9.02      0.54      9.56  ...
2012-03-28  Candy bar       5        7.50      1.43      8.93  ...
How Row Mode Works
• Each operator calls its child once per row to “pull” the next row
• Works fine for smaller queries
• Each operator transition often causes L2 cache misses to load instructions/data
• When databases were new, the cost of I/O was MUCH larger than the cost of CPU work, so this never mattered
• Now the equation has changed
[Diagram: operator tree Project → Filter → Table Scan; each GetRow() call returns one row]
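The pull model described on this slide can be sketched in a few lines of Python. This is only an illustration of the calling pattern, not SQL Server's implementation: the class names mirror the operators in the diagram, while `get_row`, `run`, and the three sample rows are assumptions made for the example.

```python
class TableScan:
    """Leaf operator: hands out the stored rows one at a time."""

    def __init__(self, rows):
        self._rows = iter(rows)

    def get_row(self):
        # Return the next row, or None when the table is exhausted.
        return next(self._rows, None)


class Filter:
    """Passes on only the rows that satisfy the predicate."""

    def __init__(self, child, predicate):
        self._child = child
        self._predicate = predicate

    def get_row(self):
        # Keep pulling from the child until a row qualifies or input ends.
        while True:
            row = self._child.get_row()
            if row is None or self._predicate(row):
                return row


class Project:
    """Keeps only the requested columns of each row."""

    def __init__(self, child, columns):
        self._child = child
        self._columns = columns

    def get_row(self):
        row = self._child.get_row()
        if row is None:
            return None
        return {name: row[name] for name in self._columns}


def run(root):
    """Drive the plan: one call chain through every operator per row."""
    result = []
    row = root.get_row()
    while row is not None:
        result.append(row)
        row = root.get_row()
    return result


# Hypothetical sample rows, taken from the first lines of the slide's table.
SALES = [
    {"Saledate": "2012-03-08", "Product": "Candy bar", "Amt": 50},
    {"Saledate": "2012-03-10", "Product": "Smart phone", "Amt": 1},
    {"Saledate": "2012-03-11", "Product": "Apple (bag)", "Amt": 7},
]

plan = Project(Filter(TableScan(SALES), lambda r: r["Amt"] > 5), ["Product"])
ROW_MODE_RESULT = run(plan)
```

Every result row costs a full chain of `get_row` calls from the top operator down to the scan, which is the per-row overhead the slide points at.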
Anatomy of a columnstore index
• Columnstore index
Saledate
2012-03-08
2012-03-10
2012-03-11
2012-03-12
2012-03-19
2012-03-20
2012-03-20
2012-03-20
2012-03-21
2012-03-24
2012-03-27
2012-03-28
• 1 million row chunks
• Storage in LOB pages
Anatomy of a columnstore index
• Nonclustered columnstore index
Saledate    Product       Amt  GrossPrice  SalesTax  NetPrice
2012-03-08  Candy bar      50       75.00     14.25     89.25
2012-03-10  Smart phone     1      349.50     66.41    415.91
2012-03-11  Apple (bag)     7       31.57      1.89     33.46
2012-03-12  Smart phone     1      349.50     66.41    415.91
2012-03-19  Chair           1      599.50    113.91    713.41
2012-03-20  Chair           3    1,798.50    341.72  2,140.22
2012-03-20  Laptop          2    2,860.00    543.40  3,403.40
2012-03-20  Toy car         3       29.97      5.69     35.66
2012-03-21  Apple (bag)    14       63.14      3.79     66.93
2012-03-24  Pocket knife    1       12.95      2.46     15.41
2012-03-27  Apple (bag)     2        9.02      0.54      9.56
2012-03-28  Candy bar       5        7.50      1.43      8.93
An Aside… How CPUs Work
• Modern CPUs have multiple cores
• Cache hierarchies: L1, L2, L3
– Small L1 and L2 per core; L3 shared by all cores on die
– L1 is faster than L2, L2 faster than L3
– CPUs can stall waiting for caches to load
[Diagram: each CPU core has its own L1 instruction cache (32 KB), L1 data cache (32 KB) and Level 2 cache (100s of kilobytes); the Level 3 cache (megabytes) sits below them. Time to access increases with each level you need to touch!]
Batch Model
• Move from “pull” model to “push”
• Group rows into batches
– Re-use instructions while in cache
– Touch all “close” data in each operator
• This model reduces L2 cache misses
• It works best for queries with lots of rows being processed
[Diagram: operator tree Project → Filter → Table Scan; operators exchange batches through ProcessBatch()]
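A minimal Python sketch of the batch idea follows, assuming a batch is a dictionary of equally long column lists. The function names, the batch size of 4, and the sample data are all hypothetical; real batches are far larger and this says nothing about how the engine implements them. The point is only that each operator runs once per batch and loops over neighbouring values, not once per row.

```python
BATCH_SIZE = 4  # assumption: tiny on purpose so the example stays readable


def scan_batches(columns, batch_size=BATCH_SIZE):
    """Cut column lists into batches; each batch is a dict of column slices."""
    row_count = len(next(iter(columns.values()), []))
    for start in range(0, row_count, batch_size):
        yield {
            name: values[start:start + batch_size]
            for name, values in columns.items()
        }


def filter_batch(batch, column, predicate):
    """One call handles the whole batch: find qualifying positions, keep them."""
    keep = [i for i, value in enumerate(batch[column]) if predicate(value)]
    return {name: [values[i] for i in keep] for name, values in batch.items()}


def project_batch(batch, wanted):
    """Drop the columns the query does not need."""
    return {name: batch[name] for name in wanted}


def run_batch_plan(columns, batch_size=BATCH_SIZE):
    """Push each batch through Filter and then Project, collecting products."""
    products = []
    for batch in scan_batches(columns, batch_size):
        batch = filter_batch(batch, "Amt", lambda amt: amt > 5)
        batch = project_batch(batch, ["Product"])
        products.extend(batch["Product"])
    return products


# Hypothetical sample data in column form, taken from the slide's table.
SALES_COLUMNS = {
    "Product": ["Candy bar", "Smart phone", "Apple (bag)",
                "Smart phone", "Chair", "Chair"],
    "Amt": [50, 1, 7, 1, 1, 3],
}

BATCH_MODE_RESULT = run_batch_plan(SALES_COLUMNS)
```

With six rows and a batch size of 4, each operator is invoked twice where a row-at-a-time plan would invoke it six times.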
Columnstore Refresher => how is it different?
[Diagram: data stored as rows versus data stored as columns (C1 C2 C3 C4 C5 …)]
Benefits:
• Improved compression: data from the same domain compresses better
• Reduced I/O: fetch only the columns needed
• Improved performance: more data fits in memory
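Why values from the same domain compress better can be shown with a toy run-length encoder. Run-length encoding is used here purely as an easy-to-follow stand-in and is not a statement about the exact algorithms the engine applies; the function name and the three sample rows are assumptions for the example.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse each run of equal neighbouring values into (value, count)."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


# Three rows from the slide's table that share the same sale date.
ROWS = [
    ("2012-03-20", "Chair"),
    ("2012-03-20", "Laptop"),
    ("2012-03-20", "Toy car"),
]

# Row layout: values of different columns are interleaved on storage.
row_major = [value for row in ROWS for value in row]

# Column layout: all values of one column sit next to each other.
saledate_column = [row[0] for row in ROWS]

ROW_RUNS = run_length_encode(row_major)
COLUMN_RUNS = run_length_encode(saledate_column)
```

In the row layout no two neighbouring values are equal, so the encoder needs six runs for six values. The Saledate column on its own collapses to a single run of three.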
ColumnStore Terminology
[Diagram: columns C1 to C6, with one row group and one column segment highlighted]
• Column Segment
– contains values from one column for a set of rows
• Row Group
– Segments for the same set of rows comprise a row group
• Segments are compressed
• Each segment stored in a separate LOB
• Segment is unit of transfer between disk and memory
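The relationship between row groups and column segments can be sketched as a plain data structure. This is an illustration only: `build_row_groups`, `read_column`, and the group size of 2 are made up for the example (the slides speak of chunks of about 1 million rows), and compression and LOB storage are left out.

```python
def build_row_groups(rows, column_names, rows_per_group):
    """Split rows into row groups; each group holds one segment per column."""
    row_groups = []
    for start in range(0, len(rows), rows_per_group):
        chunk = rows[start:start + rows_per_group]
        # One segment = the values of a single column for this set of rows.
        segments = {
            name: [row[position] for row in chunk]
            for position, name in enumerate(column_names)
        }
        row_groups.append(segments)
    return row_groups


def read_column(row_groups, name):
    """Touch only the segments of one column, across all row groups."""
    return [value for segments in row_groups for value in segments[name]]


COLUMNS = ["Saledate", "Product", "Amt"]
ROWS = [
    ("2012-03-08", "Candy bar", 50),
    ("2012-03-10", "Smart phone", 1),
    ("2012-03-11", "Apple (bag)", 7),
    ("2012-03-12", "Smart phone", 1),
    ("2012-03-19", "Chair", 1),
]

ROW_GROUPS = build_row_groups(ROWS, COLUMNS, rows_per_group=2)
AMOUNTS = read_column(ROW_GROUPS, "Amt")
```

Five rows with two rows per group give three row groups, each with three segments. Reading `Amt` visits one segment per row group and never touches the `Saledate` or `Product` segments.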
How to load data into the CCI without getting into trouble
Initial situation: Table is a Heap
– (1) Use INSERT .... SELECT and then create CCI
– (2) Use BULK LOAD and then create CCI
– (3) Use SELECT * INTO and then create CCI
Initial situation: Table already has a CCI
– (1) Use INSERT .... SELECT
– (2) Use BULK LOAD
How to load data into the CCI - BONUS!!!
• The "Magic Number" described by Niko Neugebauer: 102400
• There is also another magic number: 1048576
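As background for the two numbers: in the documented bulk-load behaviour, 1,048,576 is the maximum number of rows in a rowgroup, and 102,400 is the minimum number of rows a bulk-loaded chunk needs to be compressed directly; smaller chunks land in the delta store. The sketch below applies just that rule to a single bulk-load batch. The function and constant names are invented for the example, and it deliberately ignores everything else that can influence rowgroup sizes, such as memory pressure or parallel loads.

```python
MAX_ROWGROUP_ROWS = 1_048_576         # upper limit of rows in one rowgroup
MIN_COMPRESSED_BATCH_ROWS = 102_400   # below this, rows go to the delta store


def plan_bulk_load(batch_rows):
    """Return (target, rows) pairs for one bulk-load batch.

    Simplified rule of thumb: the batch is cut into chunks of at most
    MAX_ROWGROUP_ROWS rows; a chunk with at least MIN_COMPRESSED_BATCH_ROWS
    rows becomes a compressed rowgroup, a smaller chunk goes to the delta store.
    """
    if batch_rows < 0:
        raise ValueError("batch_rows must not be negative")
    plan = []
    remaining = batch_rows
    while remaining > 0:
        chunk = min(remaining, MAX_ROWGROUP_ROWS)
        if chunk >= MIN_COMPRESSED_BATCH_ROWS:
            plan.append(("compressed", chunk))
        else:
            plan.append(("delta", chunk))
        remaining -= chunk
    return plan


SMALL_BATCH_PLAN = plan_bulk_load(102_399)
THRESHOLD_BATCH_PLAN = plan_bulk_load(102_400)
OVERFLOW_BATCH_PLAN = plan_bulk_load(1_048_577)
```

For an ETL process this suggests choosing batch sizes so that no small remainder is left over: a batch of 1,048,577 rows yields one full compressed rowgroup plus a single row in the delta store.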
How to use the index
• Don’t use it in an OLTP scenario - but WHY NOT?
• Update or Insert + Delete?
• What about transaction support?
• Partitioning
How to maintain the index
• Tuple mover revealed
• Reorganize or rebuild the index?
• Extended events - a great monitoring "tool"
Internals
• How to make use of the DBCC commands for the CCI?
• Where is my memory?
• What about memory grants?
• What about memory pressure?
• What about the transaction log usage?
Resources
• Niko Neugebauer: http://www.nikoport.com/columnstore/
• Benjamin Nevarez: http://www.benjaminnevarez.com/
• Paul White: http://sqlblog.com/blogs/paul_white/
• Remus Rusanu: http://rusanu.com/
• Hugo Kornelis: http://sqlblog.com/blogs/hugo_kornelis/
• Joe Sack: http://www.sqlskills.com/blogs/joe
• Sunil Agarwal: http://blogs.msdn.microsoft.com/sqlserverstorageengine