SlideShare une entreprise Scribd logo
1  sur  29
Télécharger pour lire hors ligne
Maintaining Terabytes
   with Postgres

          Selena Deckelmann
     Emma, Inc - http://myemma.com
 PostgreSQL Global Development Group
http://tech.myemma.com
   @emmaemailtech
Environment at Emma

• 1.6 TB, 1 cluster,Version 8.2 (RAID10)
• 1.1 TB, 2 clusters,Version 8.3 (RAID10)
• 8.4, 9.0 Dev
• Putting 9.0 into production (May 2011)
• pgpool, Redis, RabbitMQ, NFS
Other stats

• daily peaks: ~3000 commits per second
• average writes: 4 MBps
• average reads: 8 MBps
• From benchmarks we’ve done, load is
  pushing the limits of our hardware.
I say all of this with
         love.
Huge catalogs

• 409,994 tables
• Minor mistake in parent table definitions
 • Parent table updates take 30+ minutes
not null default nextval('important_sequence'::regclass)

                            vs

not null default nextval('important_sequence'::text)
Huge catalogs

• Bloat in the catalog
 • User-provoked ALTER TABLE
 • VACUUM FULL of catalog takes 2+ hrs
Huge catalogs suck

• 9,019,868 total data points for table stats
• 4,550,770 total data points for index stats
• Stats collection is slow
Disk Management

• $PGDATA:
 • pg_tblspc (TABLESPACES)
 • pg_xlog
 • global/pg_stats
 • wal for warm standby
Problems we worked through
  with big schemas Postgres

• Bloat
• Backups
• System resource exhaustion
• Minor upgrades
• Major upgrades
• Transaction wraparound
Bloat Causes

• Frequent UPDATE patterns
• Frequent DELETEs without VACUUM
 • a terabyte of dead tuples
SELECT
                  BLOAT QUERY
   schemaname, tablename, reltuples::bigint, relpages::bigint, otta,
   ROUND(CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages/otta::numeric END,1) AS tbloat,
   CASE WHEN relpages < otta THEN 0 ELSE relpages::bigint - otta END AS wastedpages,
   CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::bigint END AS wastedbytes,
   CASE WHEN relpages < otta THEN '0 bytes'::text ELSE (bs*(relpages-otta))::bigint || ' bytes' END AS wastedsize,
   iname, ituples::bigint, ipages::bigint, iotta,
   ROUND(CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages/iotta::numeric END,1) AS ibloat,
   CASE WHEN ipages < iotta THEN 0 ELSE ipages::bigint - iotta END AS wastedipages,
   CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS wastedibytes,
   CASE WHEN ipages < iotta THEN '0 bytes' ELSE (bs*(ipages-iotta))::bigint || ' bytes' END AS wastedisize
 FROM (
   SELECT
     schemaname, tablename, cc.reltuples, cc.relpages, bs,
     CEIL((cc.reltuples*((datahdr+ma-
        (CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::float)) AS otta,
     COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples, COALESCE(c2.relpages,0) AS ipages,
     COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta
   FROM (
     SELECT
        ma,bs,schemaname,tablename,
        (datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr,
        (maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2
     FROM (
        SELECT
          schemaname, tablename, hdr, ma, bs,
          SUM((1-null_frac)*avg_width) AS datawidth,
          MAX(null_frac) AS maxfracsum,
          hdr+(
            SELECT 1+count(*)/8
            FROM pg_stats s2
            WHERE null_frac<>0 AND s2.schemaname = s.schemaname AND s2.tablename = s.tablename
          ) AS nullhdr
        FROM pg_stats s, (
          SELECT
            (SELECT current_setting('block_size')::numeric) AS bs,
            CASE WHEN substring(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23 END AS hdr,
            CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma
          FROM (SELECT version() AS v) AS foo
        ) AS constants
        GROUP BY 1,2,3,4,5
     ) AS foo
   ) AS rs
   JOIN pg_class cc ON cc.relname = rs.tablename
   JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = rs.schemaname AND nn.nspname <> 'information_schema'
   LEFT JOIN pg_index i ON indrelid = cc.oid
   LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid
 ) AS sml
 WHERE tablename = 'addr'
 ORDER BY wastedbytes DESC LIMIT 1;




            Use check_postgres.pl
https://github.com/bucardo/check_postgres/
Fixing bloat
• Wrote scripts to clean things up
 • VACUUM (for small amounts)
 • CLUSTER
 • TRUNCATE (data loss!)
 • Or most extreme: DROP/CREATE
• And then ran the scripts.
Backups


• pg_dump takes longer and longer
 	
  	
  backup	
  	
  	
  |	
  	
  	
  	
  	
  	
  	
  	
  duration	
  	
  	
  	
  	
  	
  	
  	
  
-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐
	
  2009-­‐11-­‐22	
  |	
  02:44:36.821475
	
  2009-­‐11-­‐23	
  |	
  02:46:20.003507
	
  2009-­‐11-­‐24	
  |	
  02:47:06.260705
	
  2009-­‐12-­‐06	
  |	
  07:13:04.174964
	
  2009-­‐12-­‐13	
  |	
  05:00:01.082676
	
  2009-­‐12-­‐20	
  |	
  06:24:49.433043
	
  2009-­‐12-­‐27	
  |	
  05:35:20.551477
	
  2010-­‐01-­‐03	
  |	
  07:36:49.651492
	
  2010-­‐01-­‐10	
  |	
  05:55:02.396163
	
  2010-­‐01-­‐17	
  |	
  07:32:33.277559
	
  2010-­‐01-­‐24	
  |	
  06:22:46.522319
	
  2010-­‐01-­‐31	
  |	
  10:48:13.060888
	
  2010-­‐02-­‐07	
  |	
  21:21:47.77618
	
  2010-­‐02-­‐14	
  |	
  14:32:04.638267
	
  2010-­‐02-­‐21	
  |	
  11:34:42.353244
	
  2010-­‐02-­‐28	
  |	
  11:13:02.102345
Backups

• pg_dump fails
 • patching pg_dump for SELECT ... LIMIT
 • Crank down shared_buffers
 • or...
http://seeifixedit.com/view/there-i-fixed-it/45
Install 32-bit Postgres and libraries on a 64-bit system.

    Install 64-bit Postgres/libs of the same version.

Copy “hot backup” from 32-bit sys over to 64-bit sys.

Run pg_dump from 64-bit version on 32-bit Postgres.
PSA

• Warm standby is not a backup
 • Hot backup instances
 • “You don’t have valid backups, you have
    valid restores.” (thanks @sarahnovotny)
 • Necessity is the mother of invention...
Ship WAL from Solaris x86 -> Linux
           It did work!
Running out of inodes
• UFS on Solaris
    “The only way to add more inodes to a UFS
    filesystem is:
    1. destroy the filesystem and create a new
    filesystem with a higher inode density
    2. enlarge the filesystem - growfs man page”
•   Solution 0: Delete files.

•   Solution 1: Sharding and bigger FS on Linux

•   Solution 2: ext4 (soon!)
Running out of
available file descriptors

• Too many open files by the database
• Pooling - pgpool-II or pgbouncer?
Minor upgrades


• Stop/start database
• CHECKPOINT() before shutdown
Major Version upgrades

• Too much downtime to dump/restore
 • Write tools to migrate data
 • Trigger-based replication
 •  pg_upgrade
Transaction
wraparound avoidance
• autovacuum triggers are too small
 • 200,000 transactions (2 days)
 • Watch age(datfrozenxid)
 • Increase autovacuum_freeze_max_age
Thanks!

• We’re hiring! - selena@myemma.com
• Emma’s Tech Blog: http://tech.myemma.com
• My blog: http://chesnok.com
• http://twitter.com/selenamarie

Contenu connexe

Tendances

Koichi Suzuki - Postgres-XC Dynamic Cluster Management @ Postgres Open
Koichi Suzuki - Postgres-XC Dynamic Cluster  Management @ Postgres OpenKoichi Suzuki - Postgres-XC Dynamic Cluster  Management @ Postgres Open
Koichi Suzuki - Postgres-XC Dynamic Cluster Management @ Postgres OpenPostgresOpen
 
Disperse xlator ramon_datalab
Disperse xlator ramon_datalabDisperse xlator ramon_datalab
Disperse xlator ramon_datalabGluster.org
 
Gluster.community.day.2013
Gluster.community.day.2013Gluster.community.day.2013
Gluster.community.day.2013Udo Seidel
 
Latest performance changes by Scylla - Project optimus / Nolimits
Latest performance changes by Scylla - Project optimus / Nolimits Latest performance changes by Scylla - Project optimus / Nolimits
Latest performance changes by Scylla - Project optimus / Nolimits ScyllaDB
 
Distributed Postgres
Distributed PostgresDistributed Postgres
Distributed PostgresStas Kelvich
 
Postgres-XC Write Scalable PostgreSQL Cluster
Postgres-XC Write Scalable PostgreSQL ClusterPostgres-XC Write Scalable PostgreSQL Cluster
Postgres-XC Write Scalable PostgreSQL ClusterMason Sharp
 
Cassandra: Open Source Bigtable + Dynamo
Cassandra: Open Source Bigtable + DynamoCassandra: Open Source Bigtable + Dynamo
Cassandra: Open Source Bigtable + Dynamojbellis
 
Bn 1016 demo postgre sql-online-training
Bn 1016 demo  postgre sql-online-trainingBn 1016 demo  postgre sql-online-training
Bn 1016 demo postgre sql-online-trainingconline training
 
Introduction to PostgreSQL
Introduction to PostgreSQLIntroduction to PostgreSQL
Introduction to PostgreSQLJim Mlodgenski
 
One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...
One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...
One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...DataStax
 
Performance bottlenecks for metadata workload in Gluster with Poornima Gurusi...
Performance bottlenecks for metadata workload in Gluster with Poornima Gurusi...Performance bottlenecks for metadata workload in Gluster with Poornima Gurusi...
Performance bottlenecks for metadata workload in Gluster with Poornima Gurusi...Gluster.org
 
RocksDB compaction
RocksDB compactionRocksDB compaction
RocksDB compactionMIJIN AN
 
CassieQ: The Distributed Message Queue Built on Cassandra (Anton Kropp, Cural...
CassieQ: The Distributed Message Queue Built on Cassandra (Anton Kropp, Cural...CassieQ: The Distributed Message Queue Built on Cassandra (Anton Kropp, Cural...
CassieQ: The Distributed Message Queue Built on Cassandra (Anton Kropp, Cural...DataStax
 
XSKY - ceph luminous update
XSKY - ceph luminous updateXSKY - ceph luminous update
XSKY - ceph luminous updateinwin stack
 
Tiering barcelona
Tiering barcelonaTiering barcelona
Tiering barcelonaGluster.org
 
Distribute Key Value Store
Distribute Key Value StoreDistribute Key Value Store
Distribute Key Value StoreSantal Li
 
Challenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan LambrightChallenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan LambrightGluster.org
 
Cassandra background-and-architecture
Cassandra background-and-architectureCassandra background-and-architecture
Cassandra background-and-architectureMarkus Klems
 
Gluster Storage
Gluster StorageGluster Storage
Gluster StorageRaz Tamir
 

Tendances (20)

Koichi Suzuki - Postgres-XC Dynamic Cluster Management @ Postgres Open
Koichi Suzuki - Postgres-XC Dynamic Cluster  Management @ Postgres OpenKoichi Suzuki - Postgres-XC Dynamic Cluster  Management @ Postgres Open
Koichi Suzuki - Postgres-XC Dynamic Cluster Management @ Postgres Open
 
Disperse xlator ramon_datalab
Disperse xlator ramon_datalabDisperse xlator ramon_datalab
Disperse xlator ramon_datalab
 
Gluster.community.day.2013
Gluster.community.day.2013Gluster.community.day.2013
Gluster.community.day.2013
 
Latest performance changes by Scylla - Project optimus / Nolimits
Latest performance changes by Scylla - Project optimus / Nolimits Latest performance changes by Scylla - Project optimus / Nolimits
Latest performance changes by Scylla - Project optimus / Nolimits
 
Write behind logging
Write behind loggingWrite behind logging
Write behind logging
 
Distributed Postgres
Distributed PostgresDistributed Postgres
Distributed Postgres
 
Postgres-XC Write Scalable PostgreSQL Cluster
Postgres-XC Write Scalable PostgreSQL ClusterPostgres-XC Write Scalable PostgreSQL Cluster
Postgres-XC Write Scalable PostgreSQL Cluster
 
Cassandra: Open Source Bigtable + Dynamo
Cassandra: Open Source Bigtable + DynamoCassandra: Open Source Bigtable + Dynamo
Cassandra: Open Source Bigtable + Dynamo
 
Bn 1016 demo postgre sql-online-training
Bn 1016 demo  postgre sql-online-trainingBn 1016 demo  postgre sql-online-training
Bn 1016 demo postgre sql-online-training
 
Introduction to PostgreSQL
Introduction to PostgreSQLIntroduction to PostgreSQL
Introduction to PostgreSQL
 
One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...
One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...
One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...
 
Performance bottlenecks for metadata workload in Gluster with Poornima Gurusi...
Performance bottlenecks for metadata workload in Gluster with Poornima Gurusi...Performance bottlenecks for metadata workload in Gluster with Poornima Gurusi...
Performance bottlenecks for metadata workload in Gluster with Poornima Gurusi...
 
RocksDB compaction
RocksDB compactionRocksDB compaction
RocksDB compaction
 
CassieQ: The Distributed Message Queue Built on Cassandra (Anton Kropp, Cural...
CassieQ: The Distributed Message Queue Built on Cassandra (Anton Kropp, Cural...CassieQ: The Distributed Message Queue Built on Cassandra (Anton Kropp, Cural...
CassieQ: The Distributed Message Queue Built on Cassandra (Anton Kropp, Cural...
 
XSKY - ceph luminous update
XSKY - ceph luminous updateXSKY - ceph luminous update
XSKY - ceph luminous update
 
Tiering barcelona
Tiering barcelonaTiering barcelona
Tiering barcelona
 
Distribute Key Value Store
Distribute Key Value StoreDistribute Key Value Store
Distribute Key Value Store
 
Challenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan LambrightChallenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan Lambright
 
Cassandra background-and-architecture
Cassandra background-and-architectureCassandra background-and-architecture
Cassandra background-and-architecture
 
Gluster Storage
Gluster StorageGluster Storage
Gluster Storage
 

En vedette

Product Management: Site & Situation
Product Management: Site & SituationProduct Management: Site & Situation
Product Management: Site & SituationJon Gatrell
 
Flex and the city in London - Keynote
Flex and the city in London - KeynoteFlex and the city in London - Keynote
Flex and the city in London - KeynoteMichael Chaize
 
Understand Risk in Communications and Data Breach
Understand Risk in Communications and Data BreachUnderstand Risk in Communications and Data Breach
Understand Risk in Communications and Data BreachJon Gatrell
 
Twitter Personas: Bot or Not?
Twitter Personas: Bot or Not?Twitter Personas: Bot or Not?
Twitter Personas: Bot or Not?Jon Gatrell
 
Irrational Loss Aversion
Irrational Loss AversionIrrational Loss Aversion
Irrational Loss AversionJon Gatrell
 
Social Media Class Baarn - 151112
Social Media Class Baarn - 151112Social Media Class Baarn - 151112
Social Media Class Baarn - 151112Peter Wiegman
 
Museum learning between physical and conceptual spaces
Museum learning between physical and conceptual spaces Museum learning between physical and conceptual spaces
Museum learning between physical and conceptual spaces Mike Sharples
 
Contributing to the WordPress Codex
Contributing to the WordPress CodexContributing to the WordPress Codex
Contributing to the WordPress CodexLorelle VanFossen
 
Maker Art: How to Create a Wonderbox
Maker Art: How to Create a WonderboxMaker Art: How to Create a Wonderbox
Maker Art: How to Create a WonderboxGreen Change
 
Ria2010 keynote développeurs
Ria2010 keynote développeursRia2010 keynote développeurs
Ria2010 keynote développeursMichael Chaize
 

En vedette (14)

Pitchtraining voor studievereniging WATT
Pitchtraining voor studievereniging WATTPitchtraining voor studievereniging WATT
Pitchtraining voor studievereniging WATT
 
Product Management: Site & Situation
Product Management: Site & SituationProduct Management: Site & Situation
Product Management: Site & Situation
 
Flex and the city in London - Keynote
Flex and the city in London - KeynoteFlex and the city in London - Keynote
Flex and the city in London - Keynote
 
Understand Risk in Communications and Data Breach
Understand Risk in Communications and Data BreachUnderstand Risk in Communications and Data Breach
Understand Risk in Communications and Data Breach
 
Twitter Personas: Bot or Not?
Twitter Personas: Bot or Not?Twitter Personas: Bot or Not?
Twitter Personas: Bot or Not?
 
Irrational Loss Aversion
Irrational Loss AversionIrrational Loss Aversion
Irrational Loss Aversion
 
Social Media Class Baarn - 151112
Social Media Class Baarn - 151112Social Media Class Baarn - 151112
Social Media Class Baarn - 151112
 
Museum learning between physical and conceptual spaces
Museum learning between physical and conceptual spaces Museum learning between physical and conceptual spaces
Museum learning between physical and conceptual spaces
 
Contributing to the WordPress Codex
Contributing to the WordPress CodexContributing to the WordPress Codex
Contributing to the WordPress Codex
 
Maker Art: How to Create a Wonderbox
Maker Art: How to Create a WonderboxMaker Art: How to Create a Wonderbox
Maker Art: How to Create a Wonderbox
 
Ria2010 keynote développeurs
Ria2010 keynote développeursRia2010 keynote développeurs
Ria2010 keynote développeurs
 
I M S Bocharov
I M S BocharovI M S Bocharov
I M S Bocharov
 
Foldervisie
FoldervisieFoldervisie
Foldervisie
 
Chebanova
ChebanovaChebanova
Chebanova
 

Similaire à Managing terabytes: When Postgres gets big

Postgresql Database Administration Basic - Day2
Postgresql  Database Administration Basic  - Day2Postgresql  Database Administration Basic  - Day2
Postgresql Database Administration Basic - Day2PoguttuezhiniVP
 
Infrastructure review - Shining a light on the Black Box
Infrastructure review - Shining a light on the Black BoxInfrastructure review - Shining a light on the Black Box
Infrastructure review - Shining a light on the Black BoxMiklos Szel
 
Riak add presentation
Riak add presentationRiak add presentation
Riak add presentationIlya Bogunov
 
High Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouseHigh Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouseAltinity Ltd
 
Designs, Lessons and Advice from Building Large Distributed Systems
Designs, Lessons and Advice from Building Large Distributed SystemsDesigns, Lessons and Advice from Building Large Distributed Systems
Designs, Lessons and Advice from Building Large Distributed SystemsDaehyeok Kim
 
Linux Performance Tools 2014
Linux Performance Tools 2014Linux Performance Tools 2014
Linux Performance Tools 2014Brendan Gregg
 
Managing Data and Operation Distribution In MongoDB
Managing Data and Operation Distribution In MongoDBManaging Data and Operation Distribution In MongoDB
Managing Data and Operation Distribution In MongoDBJason Terpko
 
Monitoreo de MySQL y PostgreSQL con SQL
Monitoreo de MySQL y PostgreSQL con SQLMonitoreo de MySQL y PostgreSQL con SQL
Monitoreo de MySQL y PostgreSQL con SQLEmanuel Calvo
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingDatabricks
 
Advanced Apache Cassandra Operations with JMX
Advanced Apache Cassandra Operations with JMXAdvanced Apache Cassandra Operations with JMX
Advanced Apache Cassandra Operations with JMXzznate
 
Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...
Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...
Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...DataStax
 
introduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pigintroduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and PigRicardo Varela
 
Всеволод Поляков (DevOps Team Lead в Grammarly)
Всеволод Поляков (DevOps Team Lead в Grammarly)Всеволод Поляков (DevOps Team Lead в Grammarly)
Всеволод Поляков (DevOps Team Lead в Grammarly)Provectus
 
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...MongoDB
 
dns.workshop.hsgr
dns.workshop.hsgrdns.workshop.hsgr
dns.workshop.hsgrebalaskas
 
Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014Michael Renner
 
Managing data and operation distribution in MongoDB
Managing data and operation distribution in MongoDBManaging data and operation distribution in MongoDB
Managing data and operation distribution in MongoDBAntonios Giannopoulos
 

Similaire à Managing terabytes: When Postgres gets big (20)

Postgresql Database Administration Basic - Day2
Postgresql  Database Administration Basic  - Day2Postgresql  Database Administration Basic  - Day2
Postgresql Database Administration Basic - Day2
 
Infrastructure review - Shining a light on the Black Box
Infrastructure review - Shining a light on the Black BoxInfrastructure review - Shining a light on the Black Box
Infrastructure review - Shining a light on the Black Box
 
Riak add presentation
Riak add presentationRiak add presentation
Riak add presentation
 
High Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouseHigh Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouse
 
Designs, Lessons and Advice from Building Large Distributed Systems
Designs, Lessons and Advice from Building Large Distributed SystemsDesigns, Lessons and Advice from Building Large Distributed Systems
Designs, Lessons and Advice from Building Large Distributed Systems
 
Linux Performance Tools 2014
Linux Performance Tools 2014Linux Performance Tools 2014
Linux Performance Tools 2014
 
Managing Data and Operation Distribution In MongoDB
Managing Data and Operation Distribution In MongoDBManaging Data and Operation Distribution In MongoDB
Managing Data and Operation Distribution In MongoDB
 
Data Collection and Storage
Data Collection and StorageData Collection and Storage
Data Collection and Storage
 
Monitoreo de MySQL y PostgreSQL con SQL
Monitoreo de MySQL y PostgreSQL con SQLMonitoreo de MySQL y PostgreSQL con SQL
Monitoreo de MySQL y PostgreSQL con SQL
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured Streaming
 
Advanced Apache Cassandra Operations with JMX
Advanced Apache Cassandra Operations with JMXAdvanced Apache Cassandra Operations with JMX
Advanced Apache Cassandra Operations with JMX
 
Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...
Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...
Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...
 
Quick Wins
Quick WinsQuick Wins
Quick Wins
 
introduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pigintroduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pig
 
Всеволод Поляков (DevOps Team Lead в Grammarly)
Всеволод Поляков (DevOps Team Lead в Grammarly)Всеволод Поляков (DevOps Team Lead в Grammarly)
Всеволод Поляков (DevOps Team Lead в Grammarly)
 
Metrics: where and how
Metrics: where and howMetrics: where and how
Metrics: where and how
 
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
 
dns.workshop.hsgr
dns.workshop.hsgrdns.workshop.hsgr
dns.workshop.hsgr
 
Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014
 
Managing data and operation distribution in MongoDB
Managing data and operation distribution in MongoDBManaging data and operation distribution in MongoDB
Managing data and operation distribution in MongoDB
 

Plus de Selena Deckelmann

While we're here, let's fix computer science education
While we're here, let's fix computer science educationWhile we're here, let's fix computer science education
While we're here, let's fix computer science educationSelena Deckelmann
 
Mistakes were made - LCA 2012
Mistakes were made - LCA 2012Mistakes were made - LCA 2012
Mistakes were made - LCA 2012Selena Deckelmann
 
Postgres needs an aircraft carrier
Postgres needs an aircraft carrierPostgres needs an aircraft carrier
Postgres needs an aircraft carrierSelena Deckelmann
 
Harder, better, faster, stronger: PostgreSQL 9.1
Harder, better, faster, stronger: PostgreSQL 9.1Harder, better, faster, stronger: PostgreSQL 9.1
Harder, better, faster, stronger: PostgreSQL 9.1Selena Deckelmann
 
Letters from the open source trenches - Postgres community
Letters from the open source trenches - Postgres communityLetters from the open source trenches - Postgres community
Letters from the open source trenches - Postgres communitySelena Deckelmann
 
Own it: working with a changing open source community
Own it: working with a changing open source communityOwn it: working with a changing open source community
Own it: working with a changing open source communitySelena Deckelmann
 
Managing terabytes: When PostgreSQL gets big
Managing terabytes: When PostgreSQL gets bigManaging terabytes: When PostgreSQL gets big
Managing terabytes: When PostgreSQL gets bigSelena Deckelmann
 
How a bunch of normal people Used Technology To Repair a Rigged Election
How a bunch of normal people Used Technology To Repair a Rigged ElectionHow a bunch of normal people Used Technology To Repair a Rigged Election
How a bunch of normal people Used Technology To Repair a Rigged ElectionSelena Deckelmann
 
Open Source Bridge Opening Day
Open Source Bridge Opening DayOpen Source Bridge Opening Day
Open Source Bridge Opening DaySelena Deckelmann
 

Plus de Selena Deckelmann (20)

While we're here, let's fix computer science education
While we're here, let's fix computer science educationWhile we're here, let's fix computer science education
While we're here, let's fix computer science education
 
Algorithms are Recipes
Algorithms are RecipesAlgorithms are Recipes
Algorithms are Recipes
 
Hire the right way
Hire the right wayHire the right way
Hire the right way
 
Mistakes were made - LCA 2012
Mistakes were made - LCA 2012Mistakes were made - LCA 2012
Mistakes were made - LCA 2012
 
Pg92 HA, LCA 2012, Ballarat
Pg92 HA, LCA 2012, BallaratPg92 HA, LCA 2012, Ballarat
Pg92 HA, LCA 2012, Ballarat
 
Managing terabytes
Managing terabytesManaging terabytes
Managing terabytes
 
Mistakes were made
Mistakes were madeMistakes were made
Mistakes were made
 
Postgres needs an aircraft carrier
Postgres needs an aircraft carrierPostgres needs an aircraft carrier
Postgres needs an aircraft carrier
 
Mistakes were made
Mistakes were madeMistakes were made
Mistakes were made
 
Harder, better, faster, stronger: PostgreSQL 9.1
Harder, better, faster, stronger: PostgreSQL 9.1Harder, better, faster, stronger: PostgreSQL 9.1
Harder, better, faster, stronger: PostgreSQL 9.1
 
How to ask for money
How to ask for moneyHow to ask for money
How to ask for money
 
Letters from the open source trenches - Postgres community
Letters from the open source trenches - Postgres communityLetters from the open source trenches - Postgres community
Letters from the open source trenches - Postgres community
 
Own it: working with a changing open source community
Own it: working with a changing open source communityOwn it: working with a changing open source community
Own it: working with a changing open source community
 
Managing terabytes: When PostgreSQL gets big
Managing terabytes: When PostgreSQL gets bigManaging terabytes: When PostgreSQL gets big
Managing terabytes: When PostgreSQL gets big
 
Pdxpugday2010 pg90
Pdxpugday2010 pg90Pdxpugday2010 pg90
Pdxpugday2010 pg90
 
Making Software Communities
Making Software CommunitiesMaking Software Communities
Making Software Communities
 
Illustrated buffer cache
Illustrated buffer cacheIllustrated buffer cache
Illustrated buffer cache
 
Bucardo
BucardoBucardo
Bucardo
 
How a bunch of normal people Used Technology To Repair a Rigged Election
How a bunch of normal people Used Technology To Repair a Rigged ElectionHow a bunch of normal people Used Technology To Repair a Rigged Election
How a bunch of normal people Used Technology To Repair a Rigged Election
 
Open Source Bridge Opening Day
Open Source Bridge Opening DayOpen Source Bridge Opening Day
Open Source Bridge Opening Day
 

Managing terabytes: When Postgres gets big

  • 1. Maintaining Terabytes with Postgres Selena Deckelmann Emma, Inc - http://myemma.com PostgreSQL Global Development Group
  • 2. http://tech.myemma.com @emmaemailtech
  • 3. Environment at Emma • 1.6 TB, 1 cluster,Version 8.2 (RAID10) • 1.1 TB, 2 clusters,Version 8.3 (RAID10) • 8.4, 9.0 Dev • Putting 9.0 into production (May 2011) • pgpool, Redis, RabbitMQ, NFS
  • 4. Other stats • daily peaks: ~3000 commits per second • average writes: 4 MBps • average reads: 8 MBps • From benchmarks we’ve done, load is pushing the limits of our hardware.
  • 5. I say all of this with love.
  • 6. Huge catalogs • 409,994 tables • Minor mistake in parent table definitions • Parent table updates take 30+ minutes
  • 7. not null default nextval('important_sequence'::regclass) vs not null default nextval('important_sequence'::text)
  • 8. Huge catalogs • Bloat in the catalog • User-provoked ALTER TABLE • VACUUM FULL of catalog takes 2+ hrs
  • 9. Huge catalogs suck • 9,019,868 total data points for table stats • 4,550,770 total data points for index stats • Stats collection is slow
  • 10. Disk Management • $PGDATA: • pg_tblspc (TABLESPACES) • pg_xlog • global/pg_stats • wal for warm standby
  • 11. Problems we worked through with big schemas Postgres • Bloat • Backups • System resource exhaustion • Minor upgrades • Major upgrades • Transaction wraparound
  • 12. Bloat Causes • Frequent UPDATE patterns • Frequent DELETEs without VACUUM • a terabyte of dead tuples
  • 13. SELECT BLOAT QUERY schemaname, tablename, reltuples::bigint, relpages::bigint, otta, ROUND(CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages/otta::numeric END,1) AS tbloat, CASE WHEN relpages < otta THEN 0 ELSE relpages::bigint - otta END AS wastedpages, CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::bigint END AS wastedbytes, CASE WHEN relpages < otta THEN '0 bytes'::text ELSE (bs*(relpages-otta))::bigint || ' bytes' END AS wastedsize, iname, ituples::bigint, ipages::bigint, iotta, ROUND(CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages/iotta::numeric END,1) AS ibloat, CASE WHEN ipages < iotta THEN 0 ELSE ipages::bigint - iotta END AS wastedipages, CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS wastedibytes, CASE WHEN ipages < iotta THEN '0 bytes' ELSE (bs*(ipages-iotta))::bigint || ' bytes' END AS wastedisize FROM ( SELECT schemaname, tablename, cc.reltuples, cc.relpages, bs, CEIL((cc.reltuples*((datahdr+ma- (CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::float)) AS otta, COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples, COALESCE(c2.relpages,0) AS ipages, COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta FROM ( SELECT ma,bs,schemaname,tablename, (datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr, (maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2 FROM ( SELECT schemaname, tablename, hdr, ma, bs, SUM((1-null_frac)*avg_width) AS datawidth, MAX(null_frac) AS maxfracsum, hdr+( SELECT 1+count(*)/8 FROM pg_stats s2 WHERE null_frac<>0 AND s2.schemaname = s.schemaname AND s2.tablename = s.tablename ) AS nullhdr FROM pg_stats s, ( SELECT (SELECT current_setting('block_size')::numeric) AS bs, CASE WHEN substring(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23 END AS hdr, CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma FROM (SELECT version() AS v) AS foo ) AS constants GROUP BY 1,2,3,4,5 ) AS foo ) AS rs JOIN pg_class cc ON cc.relname = rs.tablename JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = rs.schemaname AND nn.nspname <> 'information_schema' LEFT JOIN pg_index i ON indrelid = cc.oid LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid ) AS sml WHERE tablename = 'addr' ORDER BY wastedbytes DESC LIMIT 1; Use check_postgres.pl https://github.com/bucardo/check_postgres/
  • 14. Fixing bloat • Wrote scripts to clean things up • VACUUM (for small amounts) • CLUSTER • TRUNCATE (data loss!) • Or most extreme: DROP/CREATE • And then ran the scripts.
  • 15. Backups • pg_dump takes longer and longer
  • 16.      backup      |                duration                 -­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐  2009-­‐11-­‐22  |  02:44:36.821475  2009-­‐11-­‐23  |  02:46:20.003507  2009-­‐11-­‐24  |  02:47:06.260705  2009-­‐12-­‐06  |  07:13:04.174964  2009-­‐12-­‐13  |  05:00:01.082676  2009-­‐12-­‐20  |  06:24:49.433043  2009-­‐12-­‐27  |  05:35:20.551477  2010-­‐01-­‐03  |  07:36:49.651492  2010-­‐01-­‐10  |  05:55:02.396163  2010-­‐01-­‐17  |  07:32:33.277559  2010-­‐01-­‐24  |  06:22:46.522319  2010-­‐01-­‐31  |  10:48:13.060888  2010-­‐02-­‐07  |  21:21:47.77618  2010-­‐02-­‐14  |  14:32:04.638267  2010-­‐02-­‐21  |  11:34:42.353244  2010-­‐02-­‐28  |  11:13:02.102345
  • 17. Backups • pg_dump fails • patching pg_dump for SELECT ... LIMIT • Crank down shared_buffers • or...
  • 19. Install 32-bit Postgres and libraries on a 64-bit system. Install 64-bit Postgres/libs of the same version. Copy “hot backup” from 32-bit sys over to 64-bit sys. Run pg_dump from 64-bit version on 32-bit Postgres.
  • 20. PSA • Warm standby is not a backup • Hot backup instances • “You don’t have valid backups, you have valid restores.” (thanks @sarahnovotny) • Necessity is the mother of invention...
  • 21. Ship WAL from Solaris x86 -> Linux It did work!
  • 22. Running out of inodes • UFS on Solaris “The only way to add more inodes to a UFS filesystem is: 1. destroy the filesystem and create a new filesystem with a higher inode density 2. enlarge the filesystem - growfs man page” • Solution 0: Delete files. • Solution 1: Sharding and bigger FS on Linux • Solution 2: ext4 (soon!)
  • 23. Running out of available file descriptors • Too many open files by the database • Pooling - pgpool-II or pgbouncer?
  • 24. Minor upgrades • Stop/start database • CHECKPOINT() before shutdown
  • 25. Major Version upgrades • Too much downtime to dump/restore • Write tools to migrate data • Trigger-based replication • pg_upgrade
  • 26. Transaction wraparound avoidance • autovacuum triggers are too small • 200,000 transactions (2 days) • Watch age(datfrozenxid) • Increase autovacuum_freeze_max_age
  • 27.
  • 28.
  • 29. Thanks! • We’re hiring! - selena@myemma.com • Emma’s Tech Blog: http://tech.myemma.com • My blog: http://chesnok.com • http://twitter.com/selenamarie