29. Query optimization using statistics (1/3)
Defining an Arrow_Fdw foreign table from the schema definition of an Arrow file
postgres=# IMPORT FOREIGN SCHEMA flineorder_sort
FROM SERVER arrow_fdw INTO public
OPTIONS (file '/dev/shm/flineorder_sort.arrow');
IMPORT FOREIGN SCHEMA
Example query that uses Arrow min/max statistics
postgres=# EXPLAIN ANALYZE
SELECT count(*) FROM flineorder_sort
WHERE lo_orderpriority = '2-HIGH'
AND lo_orderdate BETWEEN 19940101 AND 19940630;
QUERY PLAN
------------------------------------------------------------------------------------------
Aggregate (cost=33143.09..33143.10 rows=1 width=8) (actual time=115.591..115.593 rows=1 loops=1)
-> Custom Scan (GpuPreAgg) on flineorder_sort (cost=33139.52..33142.07 rows=204 width=8)
(actual time=115.580..115.585 rows=1 loops=1)
Reduction: NoGroup
Outer Scan: flineorder_sort (cost=4000.00..33139.42 rows=300 width=0)
(actual time=10.682..21.555 rows=2606170 loops=1)
Outer Scan Filter: ((lo_orderdate >= 19940101) AND
(lo_orderdate <= 19940630) AND (lo_orderpriority = '2-HIGH'::bpchar))
Rows Removed by Outer Scan Filter: 2425885
referenced: lo_orderdate, lo_orderpriority
Stats-Hint: (lo_orderdate >= 19940101), (lo_orderdate <= 19940630) [loaded: 2, skipped: 8]
files0: /dev/shm/flineorder_sort.arrow (read: 217.52MB, size: 2357.11MB)
Planning Time: 0.210 ms
Execution Time: 153.508 ms
(11 rows)
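The Stats-Hint and files0 lines of the plan above can be turned into ratios with a little arithmetic. The sketch below only reuses the numbers printed in the plan. It assumes that "loaded" and "skipped" count chunks (record batches) of the Arrow file, which the slide does not state explicitly, and the variable names are illustrative.

```python
# Figures copied from the EXPLAIN ANALYZE output above.
loaded_chunks, skipped_chunks = 2, 8          # Stats-Hint: [loaded: 2, skipped: 8]
read_mb, file_size_mb = 217.52, 2357.11       # files0: (read: ..., size: ...)

# Assumption: loaded/skipped count chunks (record batches) of the file.
chunk_fraction = loaded_chunks / (loaded_chunks + skipped_chunks)

# Share of the file that was actually read from storage.
read_fraction = read_mb / file_size_mb

print(f"chunks loaded : {chunk_fraction:.0%}")
print(f"bytes read    : {read_fraction:.1%} of the file")
```

Under that assumption only 20% of the chunks were loaded, and a bit over 9% of the file was read. The byte ratio being lower than the chunk ratio is consistent with reading only the two referenced columns, lo_orderdate and lo_orderpriority.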
OSC/Kyoto Online 2021 - New features of PG-Strom v3.0 that deliver "blazing speed!"
30. Query optimization using statistics (2/3)
In effect, the scan is narrowed down row-wise as well as column-wise.
[Figure: three scan patterns]
(1) Full Table Scan of Row-Data (PostgreSQL Heap)
(2) Full Table Scan of Column-Data (Arrow_Fdw; no Stats-Hint)
(3) Full Table Scan of Column-Data with Min/Max Stats Hint
Labels in the figure: "Only referenced columns"; ts BETWEEN '2021-04-01' AND '2021-06-30'
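The row-wise narrowing by a min/max stats hint can be sketched in a few lines of Python. This is a minimal illustration of the general technique, not PG-Strom's implementation: the RecordBatch class, make_batches and count_between are hypothetical names, and a plain integer list stands in for a sorted column such as a timestamp.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RecordBatch:
    """One chunk of a column together with its min/max statistics."""
    values: List[int]
    min_value: int
    max_value: int


def make_batches(values: List[int], batch_size: int) -> List[RecordBatch]:
    """Split a column into fixed-size chunks and record min/max for each."""
    batches = []
    for start in range(0, len(values), batch_size):
        chunk = values[start:start + batch_size]
        batches.append(RecordBatch(chunk, min(chunk), max(chunk)))
    return batches


def count_between(batches: List[RecordBatch], lo: int, hi: int) -> Tuple[int, int, int]:
    """Count values in [lo, hi]; return (matched, loaded, skipped)."""
    matched = loaded = skipped = 0
    for batch in batches:
        # The chunk's value range cannot overlap [lo, hi]: skip it unread.
        if batch.max_value < lo or batch.min_value > hi:
            skipped += 1
            continue
        loaded += 1
        matched += sum(1 for v in batch.values if lo <= v <= hi)
    return matched, loaded, skipped


# A sorted column of 100 keys in 10 chunks, queried for the range 25..34.
batches = make_batches(list(range(100)), batch_size=10)
result = count_between(batches, 25, 34)
print(result)  # (matched, loaded, skipped)
```

With sorted data, only the chunks whose range overlaps the predicate are loaded. If the same values were shuffled, most chunks would span the whole range and little could be skipped, which is why the hint works best on naturally ordered columns.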
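31. Query optimization using statistics (3/3)
Attaching statistics to the "timestamp" column of log data is recommended.
A tool is being prepared that embeds statistics after the fact into Arrow files generated by tools other than Pg2Arrow as well.
With narrowing on both columns and rows, queries become dramatically faster when the data fits this pattern.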
[Figure: Query response time of SSBM Q1_2 [mod]; bar chart, query response time in seconds, shorter is better]
PostgreSQL v13.3: 254.905
PG-Strom v3.1dev [Row-Data]: 99.587
PG-Strom v3.1dev [Apache Arrow; No Stats Hint]: 10.052
PG-Strom v3.1dev [Apache Arrow; with Stats Hint]: 1.191
select sum(lo_extendedprice*lo_discount) as revenue
from flineorder_sort, date1
where lo_orderdate = d_datekey
and d_yearmonthnum = 199401
and lo_discount between 4 and 6
and lo_quantity between 26 and 35;
select sum(lo_extendedprice*lo_discount) as revenue
from flineorder_sort, date1
where lo_orderdate = d_datekey
and d_yearmonthnum = 199401
and lo_discount between 4 and 6
and lo_quantity between 26 and 35
and lo_orderdate between 19940101 and 19940131;
Effect of GPU-Direct SQL
Effect of Apache Arrow
Effect of min/max statistics
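The size of each effect can be worked out from the four response times on this slide. The sketch below assumes the values appear in the same order as the chart legend (PostgreSQL, PG-Strom on row data, Arrow without the stats hint, Arrow with the stats hint), and that each of the three effect labels names one step between neighbouring bars. Both are readings of the chart, not statements from the slide.

```python
# Response times in seconds, paired with the legend entries in order (assumed).
timings_sec = [
    ("PostgreSQL v13.3", 254.905),
    ("PG-Strom v3.1dev [Row-Data]", 99.587),
    ("PG-Strom v3.1dev [Apache Arrow; No Stats Hint]", 10.052),
    ("PG-Strom v3.1dev [Apache Arrow; with Stats Hint]", 1.191),
]

# Speed-up of each configuration over the one before it.
steps = [
    (timings_sec[i + 1][0], timings_sec[i][1] / timings_sec[i + 1][1])
    for i in range(len(timings_sec) - 1)
]

# End-to-end speed-up from plain PostgreSQL to Arrow with the stats hint.
overall = timings_sec[0][1] / timings_sec[-1][1]

for name, ratio in steps:
    print(f"{name}: {ratio:.2f}x faster than the previous bar")
print(f"overall: {overall:.1f}x")
```

Under these assumptions the three steps come to roughly 2.6x, 9.9x and 8.4x, for about 214x overall on this modified query.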