This document discusses PostgreSQL statistics and how to use them effectively. It provides an overview of various PostgreSQL statistics sources like views, functions and third-party tools. It then demonstrates how to analyze specific statistics like those for databases, tables, indexes, replication and query activity to identify anomalies, optimize performance and troubleshoot issues.
5. Black box
$ ps hf -u postgres -o cmd
/usr/pgsql-9.4/bin/postgres -D /var/lib/pgsql/9.4/data
_ postgres: logger process
_ postgres: checkpointer process
_ postgres: writer process
_ postgres: wal writer process
_ postgres: autovacuum launcher process
_ postgres: stats collector process
_ postgres: postgres pgbench [local] idle in transaction
_ postgres: postgres pgbench [local] idle
_ postgres: postgres pgbench [local] UPDATE
_ postgres: postgres pgbench [local] UPDATE waiting
_ postgres: postgres pgbench [local] UPDATE
6. Where time is wasted
Write Ahead Log
Shared
Buffers
Buffers IO Autovacuum Workers
Autovacuum Launcher
Background Workers
Indexes IO
Query Execution
Query Planning
Client Backends Postmaster
Relations IO
Logger Process Stats Collector
Logical
Replication
WAL Sender
Process
Archiver
Process
Background
Writer
Checkpointer
Process
Network Storage
Recovery Process
WAL Receiver Process
Tables/Indexes Data Files
7. Problems
Information too much (more than 100 counters in 9.4).
Statistics provided as an online counters.
No history (but reset functions are available).
No native handy stat tools in PostgreSQL.
Too many 3rd party tools and programs.
8. Problems
Information too much (more than 100 counters in 9.4).
Statistics provided as an online counters.
No history (but reset functions are available).
No native handy stat tools in PostgreSQL.
Too many 3rd party tools and programs.
Important to use stats directly from Postgres.
Basic SQL skills are required.
9. Statistics have
Events happens in postgres cluster.
Objects properties (databases, tables, indexes).
Time spent on events.
14. Cache hit ratio
$ select * from pg_stat_database;
...
blks_read | 7978770895
blks_hit | 9683551077519
...
$ select
sum(blks_hit)*100/sum(blks_hit+blks_read) as hit_ratio
from pg_stat_database;
More = better, and not less than 90%
21. Replication lag
$ select * from pg_stat_replication;
...
sent_location | 1691/EEE65900
write_location | 1691/EEE65900
flush_location | 1691/EEE65900
replay_location | 1691/EEE658D0
...
1692/EEE65900 — location in transaction log (WAL)
All values are equals = ideal
22. Replication lag
What can cause lag:
Networking
Storage
CPU
How much bytes written in WAL
$ select
pg_xlog_location_diff(pg_current_xlog_location(),'0/00000000');
Replication lag in bytes
$ select
client_addr,
pg_xlog_location_diff(pg_current_xlog_location(), replay_location)
from pg_stat_replication;
Replication lag in seconds
$ select
extract(epoch from now() - pg_last_xact_replay_timestamp());
24. Sequential scans
$ select * from pg_stat_all_tables;
...
seq_scan | 192
seq_tup_read | 364544695 > 1000 (seq_tup_avg)
...
$ select
relname,
pg_size_pretty(pg_relation_size(relname::regclass)) as size,
seq_scan, seq_tup_read,
seq_scan / seq_tup_read as seq_tup_avg
from pg_stat_user_tables
where seq_tup_read > 0 order by 3,4 desc limit 5;
25. Tables size
$ select
relname,
pg_size_pretty(pg_total_relation_size(relname::regclass)) as
full_size,
pg_size_pretty(pg_relation_size(relname::regclass)) as
table_size,
pg_size_pretty(pg_total_relation_size(relname::regclass) -
pg_relation_size(relname::regclass)) as index_size
from pg_stat_user_tables
order by pg_total_relation_size(relname::regclass) desc limit 10;
psql meta-commands: dt+ and di+
27. Write activity
$ select
s.relname,
pg_size_pretty(pg_relation_size(relid)),
coalesce(n_tup_ins,0) + 2 * coalesce(n_tup_upd,0) -
coalesce(n_tup_hot_upd,0) + coalesce(n_tup_del,0) AS total_writes,
(coalesce(n_tup_hot_upd,0)::float * 100 / (case when n_tup_upd > 0
then n_tup_upd else 1 end)::float)::numeric(10,2) AS hot_rate,
(select v[1] FROM regexp_matches(reloptions::text,E'fillfactor=(d+)') as
r(v) limit 1) AS fillfactor
from pg_stat_all_tables s
join pg_class c ON c.oid=relid
order by total_writes desc limit 50;
What is Heap-Only Tuples?
HOT do not cause index update.
HOT only for non-indexed columns.
Big n_tup_hot_upd = good.
How increase n_tup_hot_upd?
31. pg_stat_all_indexes
$ select * from pg_stat_all_indexes where idx_scan = 0;
-[ RECORD 1 ]-+------------------------------------------
relid | 98242
indexrelid | 55732253
schemaname | public
relname | products
indexrelname | products_special2_idx
idx_scan | 0
idx_tup_read | 0
idx_tup_fetch | 0
32. Unused indexes
$ select * from pg_stat_all_indexes where idx_scan = 0;
...
indexrelname | products_special2_idx
idx_scan | 0 0 = bad
...
https://github.com/PostgreSQL-Consulting/pg-utils/blob/master/sql/lo
w_used_indexes.sql
http://www.databasesoup.com/2014/05/new-finding-unused-indexes-q
uery.html
33. Unused indexes
$ select * from pg_stat_all_indexes where idx_scan = 0;
...
indexrelname | products_special2_idx
idx_scan | 0 0 = bad
...
https://github.com/PostgreSQL-Consulting/pg-utils/blob/master/sql/lo
w_used_indexes.sql
http://www.databasesoup.com/2014/05/new-finding-unused-indexes-q
uery.html
Unused indexes are bad.
Use storage.
Slow down UPDATE, DELETE, INSERT operations.
Extra work for VACUUM.
38. Long queries
$ select * from pg_stat_activity;
...
backend_start | 2015-10-14 15:18:03.01039+00
xact_start | 2015-10-14 15:21:15.336325+00
query_start | 2015-10-14 15:21:30.336325+00
state_change | 2015-10-14 15:21:30.33635+00
...
$ select
client_addr, usename, datname,
clock_timestamp() - xact_start as xact_age,
clock_timestamp() - query_start as query_age,
query
from pg_stat_activity order by xact_start, query_start;
39. Long queries
$ select * from pg_stat_activity;
...
backend_start | 2015-10-14 15:18:03.01039+00
xact_start | 2015-10-14 15:21:15.336325+00
query_start | 2015-10-14 15:21:30.336325+00
state_change | 2015-10-14 15:21:30.33635+00
...
$ select
client_addr, usename, datname,
clock_timestamp() - xact_start as xact_age,
clock_timestamp() - query_start as query_age,
query
from pg_stat_activity order by xact_start, query_start;
clock_timestamp() for calculating query or transaction age.
Long queries: remember, terminate, optimize.
40. Bad xacts
$ select * from pg_stat_activity where state in
('idle in transaction', 'idle in transaction (aborted)';
...
xact_start | 2015-10-14 15:21:21.128192+00
query_start | 2015-10-14 15:21:30.336325+00
state_change | 2015-10-14 15:21:30.33635+00
state | idle in transaction
...
41. Bad xacts
$ select * from pg_stat_activity where state in
('idle in transaction', 'idle in transaction (aborted)';
...
xact_start | 2015-10-14 15:21:21.128192+00
query_start | 2015-10-14 15:21:30.336325+00
state_change | 2015-10-14 15:21:30.33635+00
state | idle in transaction
...
idle in transaction, idle in transaction (aborted) = bad
Warning value: > 5
clock_timestamp() for calculate xact age.
Bad xacts: remember, terminate, optimize app.
42. Waiting clients
$ select * from pg_stat_activity where waiting;
...
xact_start | 2015-10-14 15:21:21.128192+00
query_start | 2015-10-14 15:21:30.336325+00
state_change | 2015-10-14 15:21:30.33635+00
waiting | t
...
43. Waiting clients
$ select * from pg_stat_activity where waiting;
...
xact_start | 2015-10-14 15:21:21.128192+00
query_start | 2015-10-14 15:21:30.336325+00
state_change | 2015-10-14 15:21:30.33635+00
waiting | t
...
waiting = true = bad.
clock_timestamp() for calculating query or xact age.
Use pg_locks for searching blocking query or xact.
Waiting queries: remember, terminate, optimize app.
46. pg_stat_statements
$ select * from pg_stat_statements;
...
query | SELECT "id" FROM run_plan_xact(?)
calls | 11165832
total_time | 11743325.6880088
rows | 11165832
blk_read_time | 495425.535999976
blk_write_time | 0
Statements average time in ms
$ select (sum(total_time) / sum(calls))::numeric(6,3)
from pg_stat_statements;
Most writing (to shared_buffers) queries
$ select query, shared_blks_dirtied
from pg_stat_statements
where shared_blks_dirtied > 0 order by 2 desc;
47. Query reports
query total time: 15:43:07 (14.9%, CPU: 18.2%, IO: 9.0%)
сalls: 476 (0.00%) rows: 476,000
avg_time: 118881.54ms (IO: 21.2%)
user: app_user db: ustats
query: select
filepath, type, deviceuid
from imv5event
where
state = ?::eventstate
and servertime between $1 and $2
order by servertime desc LIMIT $3 OFFSET $4
48. Query reports
query total time: 15:43:07 (14.9%, CPU: 18.2%, IO: 9.0%)
сalls: 476 (0.00%) rows: 476,000
avg_time: 118881.54ms (IO: 21.2%)
user: app_user db: ustats
query: select
filepath, type, deviceuid
from imv5event
where
state = ?::eventstate
and servertime between $1 and $2
order by servertime desc LIMIT $3 OFFSET $4
Use sum() for calculating totals.
Calculate our query «contribution» in totals.
Resource usage (CPU, IO).
49. Behind the scene
pg_statio_all_tables, pg_statio_all_indexes.
pg_stat_user_functions.
Size functions - df *size*
pgstattuple (official contrib)
●
Bloat estimation for tables and indexes.
●
Estimation time depends on table (or index) size.
pg_buffercache (official contrib)
●
Shared buffers inspection.
●
Heavy performance impact (buffers lock).
50. Behind the scene
pgfincore (3rd party contrib)
●
Low-level operations with tables using mincore().
●
OS page cache inspection.
pg_stat_kcache (3rd party contrib)
●
Using getrusage() before and after query.
●
CPU usage and real filesystem operations stats.
●
Requires pg_stat_statements and postgresql-9.4.
●
No performance impact.
51. Resume
●
The ability to use statistics is useful.
●
Statistics in Postgres is not difficult.
●
Statistics will help answer the questions.
●
Do experiment.