SlideShare une entreprise Scribd logo
1  sur  46
Amazon Redshift –
Performance Tuning and
Optimization
Dario Rivera – AWS Solutions Architect
Agenda
• Service overview
• Top 10 Redshift Performance Optimizations
• What’s new?
• Q & A (~15 minutes)
Service overview
Relational data warehouse
Massively parallel; petabyte scale
Fully managed
HDD and SSD platforms
$1,000/TB/year; starts at $0.25/hour
Amazon
Redshift
a lot faster
a lot simpler
a lot cheaper
Selected Amazon Redshift customers
Amazon Redshift system architecture
Leader node
• SQL endpoint
• Stores metadata
• Coordinates query execution
Compute nodes
• Local, columnar storage
• Executes queries in parallel
• Load, backup, restore via
Amazon S3; load from
Amazon DynamoDB, Amazon EMR, or SSH
Two hardware platforms
• Optimized for data processing
• DS2: HDD; scale from 2 TB to 2 PB
• DC1: SSD; scale from 160 GB to 326 TB
10 GigE
(HPC)
Ingestion
Backup
Restore
JDBC/ODBC
A deeper look at compute node architecture
Each node contains multiple slices
• DS2 – 2 slices on XL, 16 on 8 XL
• DC1 – 2 slices on L, 32 on 8 XL
Each slice is allocated CPU and
table data
Each slice processes a piece of
the workload in parallel
Leader Node
Let’s get to It!! –
Top 10 Best Practices
Issue #1: Incorrect column encoding
• Amazon Redshift is a column-oriented database
• well suited to analytics queries on tables with a large
number of columns
• Only able to access those blocks on disk that are for
columns included in the SELECT or WHERE clause, and
doesn’t have to read all table data to evaluate a query
• Data stored by column should also be encoded
• Clusters without column encoding is not considered a
best practice
Solution #1: the v_extended_table_info view
• To determine if you are deviating from this best practice,
you can use the v_extended_table_info view from the
Amazon Redshift Utils GitHub repository
• Once view is created, run the following view:
• SELECT database, tablename, columns FROM
admin.v_extended_table_info ORDER BY database;
• Afterward, review the tables and columns which aren’t
encoded by running the following query:
SELECT trim(n.nspname || '.' || c.relname) AS "table",
trim(a.attname) AS "column",
format_type(a.atttypid, a.atttypmod) AS "type",
format_encoding(a.attencodingtype::integer) AS "encoding",
a.attsortkeyord AS "sortkey"
FROM pg_namespace n, pg_class c, pg_attribute a
WHERE n.oid = c.relnamespace
AND c.oid = a.attrelid
AND a.attnum > 0
AND NOT a.attisdropped and n.nspname NOT IN ('information_schema','pg_catalog','pg_toast')
AND format_encoding(a.attencodingtype::integer) = 'none'
AND c.relkind='r'
AND a.attsortkeyord != 1
ORDER BY n.nspname, c.relname, a.attnum;
Solution #1: the v_extended_table_info view
• If you find that you have tables without optimal column
encoding, then use the Amazon Redshift Column
Encoding Utility on AWS Labs GitHub to apply encoding.
• This utility uses the ANALYZE COMPRESSION
command on each table.
• If encoding is required, it generates a SQL script which
creates a new table with the correct encoding, copies all
the data into the new table, and then transactionally
renames the new table to the old name while retaining
the original data.
Solution #1: Redshift Column Encoding Utility
Issue #2 – Skewed table data
• Redshift nodes are managed by the number of slices
(CPUs) per node
• When a table is created, you decide whether to spread
the data evenly among slices (default), or assign data to
specific slices on the basis of one of the columns.
• By choosing columns for distribution that are commonly
joined together, you can minimize the amount of data
transferred over the network during the join.
• This can significantly increase performance on these
types of queries.
• The selection of a good distribution key is the topic of
many AWS articles, including Choose the Best
Distribution Style; see a definitive guide to distribution
and sorting of star schemas in the Optimizing for Star
Schemas and Interleaved Sorting on Amazon Redshift
blog post.
• A skewed distribution key results in slices not working
equally hard as each other during query execution,
requiring unbalanced CPU or memory, and ultimately
only running as fast as the slowest slice:
Issue #2 – Skewed table data
Issue #2 – Skewed table data
If skewing is an issue:
• Use one of the admin
scripts in the
Amazon Redshift
Utils GitHub
repository, such as
table_inspector.sql,
to see how data
blocks in a
distribution key map
to the slices and
nodes in the cluster.
Solution #2: Use Correct Distribution Key
• If you find that you have tables with skewed distribution
keys, then consider changing the distribution key to a
column that exhibits high cardinality and uniform
distribution.
• Evaluate a candidate column as a distribution key by
creating a new table using CTAS:
CREATE TABLE my_test_table DISTKEY (<column name>) AS SELECT <column name> FROM <table name>;
• Run the table_inspector.sql script against the table again
to analyze data skew.
• If there is no good distribution key in any of your records,
you may find that moving to EVEN distribution works
better.
• For small tables (for example, dimension tables with a
couple of million rows), you can also use DISTSTYLE
ALL to place table data onto every node in the cluster.
Solution #2: Use Correct Distribution Key
Issue #3 – Queries not benefiting from sort keys
• Sort keys acts like an index in other databases, but
which does not incur a storage cost as with other
platforms. (for more information, see Choosing Sort Keys)
• To determine which tables don’t have sort keys, run the
following query against the v_extended_table_info view
from the Amazon Redshift Utils repository:
SELECT * FROM admin.v_extended_table_info WHERE sortkey IS null
Solution #3: Use Sort Key on Columns used in
Where Clause
• A sort key should be created on those columns which
are most commonly used in WHERE clauses.
• If you have a known query pattern, then COMPOUND sort
keys give the best performance;
• If using compound sort keys, review your queries to ensure
that their WHERE clauses specify the sort columns in the
same order they were defined in the compound key.
• if end users query different columns equally, then use an
INTERLEAVED sort key.
Solution #3: Run Sort Key Recommendation
Query
• You can run a tutorial that walks you through how to address unsorted
tables in the Amazon Redshift Developer Guide.
• You can also run the following query to generate a list of recommended sort
keys based on query activity:
http://tinyurl.com/redshift-sortkey-recommender
• Queries evaluated against a sort key column must not apply a SQL function
to the sort key; instead, ensure that you apply the functions to the
compared values so that the sort key is used. This is commonly found on
TIMESTAMP columns that are used as sort keys.
Issue #4 – Tables without statistics or which
need vacuum
• Bad Stats: Amazon Redshift, like other databases, requires statistics
about tables and the composition of data blocks being stored in
order to make good decisions when planning a query (for more
information, see Analyzing Tables).
• Stale Data: When rows are DELETED or UPDATED, they are
simply logically deleted (flagged for deletion) but not physically
removed from disk. Updates result in a new block being written with
new data appended. As a result, table storage space is increased
and performance degraded
Solution #4: missing_table_stats.sql admin script
• The ANALYZE Command History topic in the Amazon
Redshift Developer Guide supplies queries to help you
address missing or stale statistics
• Or, you can also simply run the missing_table_stats.sql
admin script to determine which tables are missing stats,
or the statement below to determine tables that have
stale statistics:
SELECT database, schema || '.' || "table" AS "table", stats_off
FROM svv_table_info
WHERE stats_off > 5
ORDER BY 2;
• You can use the perf_alert.sql admin script to identify
tables for which alerts about scanning a large number of
deleted rows have been raised in the last seven days.
• To address issues with tables with missing or stale
statistics or where vacuum is required, run another AWS
Labs utility, Analyze & Vacuum Schema. This ensures
that you always keep up-to-date statistics, and only
vacuum tables that actually need re-organisation.
Solution #4: perf_alert.sql admin script
Issue #5 – Tables with very large VARCHAR columns
During processing of complex queries, intermediate query
results can be stored in temporary uncompressed blocks,
consuming excessive memory and temporary disk space,
affecting query performance. For more information, see
Use the Smallest Possible Column Size.
Solution #5 – Inspect and Deep Copy to Optimized Table
Use the following query to generate a list of tables that should have their
maximum column widths reviewed:
SELECT database, schema || '.' || "table" AS "table", max_varchar
FROM svv_table_info
WHERE max_varchar > 150
ORDER BY 2
After you have a list of tables, identify which table columns have wide varchar
columns and then determine the true maximum width for each wide column,
using the following query:
SELECT max(len(rtrim(column_name)))
FROM table_name;
If you find that the table has columns that are wider than necessary, then you
need to re-create a version of the table with appropriate column widths by
performing a deep copy.
Solution #5 – Working with JSON Columns
• In some cases, you may have large VARCHAR type
columns because you are storing JSON fragments in the
table, which you then query with JSON functions.
• Pay special attention to SELECT * queries which include
the JSON fragment column, from the top_queries.sql
admin script.
• If end users query these large JSON columns but don’t
execute JSON functions against them, consider moving them
into another table that only contains the primary key column
of the original table and the JSON column.
Issue #6 - Queries waiting on queue slots
Amazon Redshift runs queries using a queuing system
known as workload management (WLM), allowing up to 8
separate workloads.
In some cases, the queue to which a user or query has
been assigned is completely busy and a user’s query must
wait for a slot to be open. During this time, the system is
not executing the query at all, which is a sign that you may
need to increase concurrency.
Solution #6 – Assess Queuing History and configure
WLM
• determine if any queries are queuing, using the
queuing_queries.sql admin script.
• Review the maximum concurrency that your cluster has
needed in the past with wlm_apex.sql,
• down to an hour-by-hour historical analysis with
wlm_apex_hourly.sql.
• Modify WLM to optimal queuing configuration based on
history stats
Solution #6 – Configure WLM
-Note: while increasing concurrency allows more queries to
run, each query will get a smaller share of the memory
allocated to its queue (unless you increase it).
You may find that by increasing concurrency, some queries
must use temporary disk storage to complete, which is also
sub-optimal
Issue #7 - Queries that are disk-based
• If a query isn’t able to completely execute in memory, it
may need to use disk-based temporary storage for parts
of an explain plan.
• The additional disk I/O slows down the query, and can
be addressed by increasing the amount of memory
allocated to a session (for more information, see WLM
Dynamic Memory Allocation).
Solution #7 – Disk Write Check and Configure
• To determine if any queries have been writing to disk,
use the following query: http://tinyurl.com/redshift-
diskquery
• Based on the user or the queue assignment rules, you
can increase the amount of memory given to the
selected queue to prevent queries needing to spill to
disk to complete.
• You can also increase the
WLM_QUERY_SLOT_COUNT …use with care!
Issue #8 – Commit Queue Waits
• Amazon Redshift is designed for analytics queries,
rather than transaction processing.
• The cost of COMMIT is relatively high, and excessive
use of COMMIT can result in queries waiting for access
to a commit queue.
Solution 8: Analyze Commit Workloads
• If you are committing too often on your database, you
will start to see waits on the commit queue increase,
• Use the commit_stats.sql admin script to show the
largest queue length and queue time for queries run in
the past two days.
• If you have queries that are waiting on the commit
queue, then look for sessions that are committing
multiple times per session, such as ETL jobs that are
logging progress or inefficient data loads.
Issue #9 - Inefficient data loads
• Anti-Pattern: Insert data directly into Amazon Redshift,
with single record inserts or the use of a multi-value
INSERT statement,
• These INSERTs allow up to a 16 MB ingest of data at
one time.
• These are leader node–based operations, and can
create significant performance bottlenecks by maxing
out the leader node network as data is distributed by the
leader to the compute nodes.
Solution #9 – Use COPY cmd, and Compression
• Amazon Redshift best practices suggest the use of the
COPY command to perform data loads. This API
operation uses all compute nodes in the cluster to load
data in parallel.
• Compress the files to be loaded whenever possible;
Amazon Redshift supports both GZIP and LZO
compression.
Solution #9 – Use COPY cmd, and Compression
• It is more efficient to load a large number of small files
than one large one, and the ideal file count is a multiple
of the slice count.
• The number of slices per node depends on the node
size of the cluster.
• By ensuring you have an equal number of files per slice,
you can know that COPY execution will evenly use
cluster resources and complete as quickly as possible.
Solution #9 – Table Load Statistics Scripts
The following query calculates statistics for each load:
http://preview.tinyurl.com/redshift-ingest-load-stats
The following query shows the time taken to load a table,
and the time taken to update the table statistics, both in
seconds and as a percentage of the overall load process:
http://tinyurl.com/redshift-tableload-stat-time
Issue #10 - Inefficient use of Temporary Tables
Overview of Temp Tables:
• Amazon Redshift provides temporary tables, which are like normal
tables except that they are only visible within a single session.
• When the user disconnects the session, the tables are automatically
deleted.
• Temporary tables can be created using the CREATE TEMPORARY
TABLE syntax, or by issuing a SELECT … INTO #TEMP_TABLE
query.
• The CREATE TABLE statement gives you complete control over the
definition of the temporary table, while the SELECT … INTO and
C(T)TAS commands use the input data to determine column names,
sizes and data types, and use default storage properties.
• These default storage properties may cause issues if not
carefully considered.
• Amazon Redshift’s default table structure is to use
EVEN distribution with no column encoding.
• This is a sub-optimal data structure for many types of
queries, and if you are using SELECT…INTO syntax you
cannot set the column encoding or distribution and sort
keys.
• If you use the CREATE TABLE AS (CTAS) syntax, you
can specify a distribution style and sort keys, but you still
can’t set the column encodings.
Issue #10 - Inefficient use of Temporary Tables
Solution #10 – Apply Source Table configurations to Target Temp Table
If you are creating temporary tables, it is highly
recommended that you convert all SELECT…INTO syntax
to use the CREATE statement. This ensures that your
temporary tables have column encodings and are
distributed in a fashion that is sympathetic to the other
entities that are part of the workflow.
If you typically do this:
SELECT column_a, column_b INTO #my_temp_table FROM my_table;
You would analyze the temporary table for optimal column
encoding:
Solution #10 – Apply Source Table configurations to Target Temp Table
And then convert the select/into statement to:
BEGIN;
CREATE TEMPORARY TABLE my_temp_table(
column_a varchar(128) encode lzo,
column_b char(4) encode bytedict)
distkey (column_a) -- Assuming you intend to join this table on column_a
sortkey (column_b); -- Assuming you are sorting or grouping by column_b
INSERT INTO my_temp_table SELECT column_a, column_b FROM my_table;
COMMIT;
You may also wish to analyze statistics on the temporary
table, if it is used as a join table for subsequent queries:
ANALYZE my_temp_table;
Solution #10 – Apply Source Table configurations to Target Temp Table
Solution #11 - Using explain plan alerts
• The last tip is to use diagnostic information from the
cluster during query execution. This is stored in an
extremely useful view called STL_ALERT_EVENT_LOG.
• Use the perf_alert.sql admin script to diagnose issues
that the cluster has encountered over the last seven
days. This is an invaluable resource in understanding
how your cluster develops over time.
Open source tools
https://github.com/awslabs/amazon-redshift-utils
https://github.com/awslabs/amazon-redshift-monitoring
https://github.com/awslabs/amazon-redshift-udfs
Admin scripts
Collection of utilities for running diagnostics on your cluster
Admin views
Collection of utilities for managing your cluster, generating schema DDL, etc.
ColumnEncodingUtility
Gives you the ability to apply optimal column encoding to an established
schema with data already loaded
Remember to complete
your evaluations!
Thank you!

Contenu connexe

Tendances

Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesDatabricks
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowJulien Le Dem
 
(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon RedshiftAmazon Web Services
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesNishith Agarwal
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftAmazon Web Services
 
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Amazon Web Services
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon Web Services
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeDatabricks
 
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the HoodRadical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the HoodDatabricks
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm Chandler Huang
 

Tendances (20)

Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache Arrow
 
(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilities
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
The Impala Cookbook
The Impala CookbookThe Impala Cookbook
The Impala Cookbook
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
The delta architecture
The delta architectureThe delta architecture
The delta architecture
 
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best Practices
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta Lake
 
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the HoodRadical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
 
Introduction to Amazon Redshift
Introduction to Amazon RedshiftIntroduction to Amazon Redshift
Introduction to Amazon Redshift
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 

Similaire à Amazon Redshift: Performance Tuning and Optimization

How to Fine-Tune Performance Using Amazon Redshift
How to Fine-Tune Performance Using Amazon RedshiftHow to Fine-Tune Performance Using Amazon Redshift
How to Fine-Tune Performance Using Amazon RedshiftAWS Germany
 
MySQL: Know more about open Source Database
MySQL: Know more about open Source DatabaseMySQL: Know more about open Source Database
MySQL: Know more about open Source DatabaseMahesh Salaria
 
Deep Dive: Amazon Redshift (March 2017)
Deep Dive: Amazon Redshift (March 2017)Deep Dive: Amazon Redshift (March 2017)
Deep Dive: Amazon Redshift (March 2017)Julien SIMON
 
Deep Dive Redshift, with a focus on performance
Deep Dive Redshift, with a focus on performanceDeep Dive Redshift, with a focus on performance
Deep Dive Redshift, with a focus on performanceAmazon Web Services
 
Query Optimization in SQL Server
Query Optimization in SQL ServerQuery Optimization in SQL Server
Query Optimization in SQL ServerRajesh Gunasundaram
 
AWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
AWS June 2016 Webinar Series - Amazon Redshift or Big Data AnalyticsAWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
AWS June 2016 Webinar Series - Amazon Redshift or Big Data AnalyticsAmazon Web Services
 
Introduction to cassandra
Introduction to cassandraIntroduction to cassandra
Introduction to cassandraTarun Garg
 
MemSQL 201: Advanced Tips and Tricks Webcast
MemSQL 201: Advanced Tips and Tricks WebcastMemSQL 201: Advanced Tips and Tricks Webcast
MemSQL 201: Advanced Tips and Tricks WebcastSingleStore
 
02 database oprimization - improving sql performance - ent-db
02  database oprimization - improving sql performance - ent-db02  database oprimization - improving sql performance - ent-db
02 database oprimization - improving sql performance - ent-dbuncleRhyme
 
Geek Sync I Need for Speed: In-Memory Databases in Oracle and SQL Server
Geek Sync I Need for Speed: In-Memory Databases in Oracle and SQL ServerGeek Sync I Need for Speed: In-Memory Databases in Oracle and SQL Server
Geek Sync I Need for Speed: In-Memory Databases in Oracle and SQL ServerIDERA Software
 
Amazon Redshift For Data Analysts
Amazon Redshift For Data AnalystsAmazon Redshift For Data Analysts
Amazon Redshift For Data AnalystsCan Abacıgil
 
PostgreSQL Table Partitioning / Sharding
PostgreSQL Table Partitioning / ShardingPostgreSQL Table Partitioning / Sharding
PostgreSQL Table Partitioning / ShardingAmir Reza Hashemi
 
Postgres db performance improvements
Postgres db performance improvementsPostgres db performance improvements
Postgres db performance improvementsMahesh Chopker
 
Introduction To Maxtable
Introduction To MaxtableIntroduction To Maxtable
Introduction To Maxtablemaxtable
 
Managing user Online Training in IBM Netezza DBA Development by www.etraining...
Managing user Online Training in IBM Netezza DBA Development by www.etraining...Managing user Online Training in IBM Netezza DBA Development by www.etraining...
Managing user Online Training in IBM Netezza DBA Development by www.etraining...Ravikumar Nandigam
 

Similaire à Amazon Redshift: Performance Tuning and Optimization (20)

How to Fine-Tune Performance Using Amazon Redshift
How to Fine-Tune Performance Using Amazon RedshiftHow to Fine-Tune Performance Using Amazon Redshift
How to Fine-Tune Performance Using Amazon Redshift
 
MySQL: Know more about open Source Database
MySQL: Know more about open Source DatabaseMySQL: Know more about open Source Database
MySQL: Know more about open Source Database
 
Deep Dive: Amazon Redshift (March 2017)
Deep Dive: Amazon Redshift (March 2017)Deep Dive: Amazon Redshift (March 2017)
Deep Dive: Amazon Redshift (March 2017)
 
Deep Dive Redshift, with a focus on performance
Deep Dive Redshift, with a focus on performanceDeep Dive Redshift, with a focus on performance
Deep Dive Redshift, with a focus on performance
 
Query Optimization in SQL Server
Query Optimization in SQL ServerQuery Optimization in SQL Server
Query Optimization in SQL Server
 
AWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
AWS June 2016 Webinar Series - Amazon Redshift or Big Data AnalyticsAWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
AWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
 
Module 3 design and implementing tables
Module 3 design and implementing tablesModule 3 design and implementing tables
Module 3 design and implementing tables
 
Introduction to cassandra
Introduction to cassandraIntroduction to cassandra
Introduction to cassandra
 
MemSQL 201: Advanced Tips and Tricks Webcast
MemSQL 201: Advanced Tips and Tricks WebcastMemSQL 201: Advanced Tips and Tricks Webcast
MemSQL 201: Advanced Tips and Tricks Webcast
 
Mysql For Developers
Mysql For DevelopersMysql For Developers
Mysql For Developers
 
Introduction to Amazon Athena
Introduction to Amazon AthenaIntroduction to Amazon Athena
Introduction to Amazon Athena
 
02 database oprimization - improving sql performance - ent-db
02  database oprimization - improving sql performance - ent-db02  database oprimization - improving sql performance - ent-db
02 database oprimization - improving sql performance - ent-db
 
Geek Sync I Need for Speed: In-Memory Databases in Oracle and SQL Server
Geek Sync I Need for Speed: In-Memory Databases in Oracle and SQL ServerGeek Sync I Need for Speed: In-Memory Databases in Oracle and SQL Server
Geek Sync I Need for Speed: In-Memory Databases in Oracle and SQL Server
 
Amazon Redshift For Data Analysts
Amazon Redshift For Data AnalystsAmazon Redshift For Data Analysts
Amazon Redshift For Data Analysts
 
Tunning overview
Tunning overviewTunning overview
Tunning overview
 
PostgreSQL Table Partitioning / Sharding
PostgreSQL Table Partitioning / ShardingPostgreSQL Table Partitioning / Sharding
PostgreSQL Table Partitioning / Sharding
 
Postgres db performance improvements
Postgres db performance improvementsPostgres db performance improvements
Postgres db performance improvements
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Introduction To Maxtable
Introduction To MaxtableIntroduction To Maxtable
Introduction To Maxtable
 
Managing user Online Training in IBM Netezza DBA Development by www.etraining...
Managing user Online Training in IBM Netezza DBA Development by www.etraining...Managing user Online Training in IBM Netezza DBA Development by www.etraining...
Managing user Online Training in IBM Netezza DBA Development by www.etraining...
 

Plus de Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Plus de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Dernier

Introduction to Prompt Engineering (Focusing on ChatGPT)
Introduction to Prompt Engineering (Focusing on ChatGPT)Introduction to Prompt Engineering (Focusing on ChatGPT)
Introduction to Prompt Engineering (Focusing on ChatGPT)Chameera Dedduwage
 
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night EnjoyCall Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night EnjoyPooja Nehwal
 
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Delhi Call girls
 
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...Pooja Nehwal
 
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStrSaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStrsaastr
 
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...Salam Al-Karadaghi
 
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...NETWAYS
 
Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510Vipesco
 
Microsoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AIMicrosoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AITatiana Gurgel
 
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesVVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesPooja Nehwal
 
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdfCTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdfhenrik385807
 
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...henrik385807
 
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024eCommerce Institute
 
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...Sheetaleventcompany
 
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Kayode Fayemi
 
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )Pooja Nehwal
 
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Hasting Chen
 
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfOpen Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfhenrik385807
 
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptx
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptxMohammad_Alnahdi_Oral_Presentation_Assignment.pptx
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptxmohammadalnahdi22
 
Mathematics of Finance Presentation.pptx
Mathematics of Finance Presentation.pptxMathematics of Finance Presentation.pptx
Mathematics of Finance Presentation.pptxMoumonDas2
 

Dernier (20)

Introduction to Prompt Engineering (Focusing on ChatGPT)
Introduction to Prompt Engineering (Focusing on ChatGPT)Introduction to Prompt Engineering (Focusing on ChatGPT)
Introduction to Prompt Engineering (Focusing on ChatGPT)
 
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night EnjoyCall Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
 
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
 
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
 
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStrSaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
 
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
 
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
 
Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510
 
Microsoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AIMicrosoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AI
 
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesVVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
 
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdfCTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
 
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
 
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
 
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
 
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
 
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
 
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
 
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfOpen Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
 
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptx
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptxMohammad_Alnahdi_Oral_Presentation_Assignment.pptx
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptx
 
Mathematics of Finance Presentation.pptx
Mathematics of Finance Presentation.pptxMathematics of Finance Presentation.pptx
Mathematics of Finance Presentation.pptx
 

Amazon Redshift: Performance Tuning and Optimization

  • 1. Amazon Redshift – Performance Tuning and Optimization Dario Rivera – AWS Solutions Architect
  • 2. Agenda • Service overview • Top 10 Redshift Performance Optimizations • What’s new? • Q & A (~15 minutes)
  • 4. Relational data warehouse Massively parallel; petabyte scale Fully managed HDD and SSD platforms $1,000/TB/year; starts at $0.25/hour Amazon Redshift a lot faster a lot simpler a lot cheaper
  • 6. Amazon Redshift system architecture Leader node • SQL endpoint • Stores metadata • Coordinates query execution Compute nodes • Local, columnar storage • Executes queries in parallel • Load, backup, restore via Amazon S3; load from Amazon DynamoDB, Amazon EMR, or SSH Two hardware platforms • Optimized for data processing • DS2: HDD; scale from 2 TB to 2 PB • DC1: SSD; scale from 160 GB to 326 TB 10 GigE (HPC) Ingestion Backup Restore JDBC/ODBC
  • 7. A deeper look at compute node architecture Each node contains multiple slices • DS2 – 2 slices on XL, 16 on 8 XL • DC1 – 2 slices on L, 32 on 8 XL Each slice is allocated CPU and table data Each slice processes a piece of the workload in parallel Leader Node
  • 8. Let’s get to It!! – Top 10 Best Practices
  • 9. Issue #1: Incorrect column encoding • Amazon Redshift is a column-oriented database • well suited to analytics queries on tables with a large number of columns • Only able to access those blocks on disk that are for columns included in the SELECT or WHERE clause, and doesn’t have to read all table data to evaluate a query • Data stored by column should also be encoded • Clusters without column encoding is not considered a best practice
  • 10. Solution #1: the v_extended_table_info view • To determine if you are deviating from this best practice, you can use the v_extended_table_info view from the Amazon Redshift Utils GitHub repository • Once view is created, run the following view: • SELECT database, tablename, columns FROM admin.v_extended_table_info ORDER BY database;
  • 11. • Afterward, review the tables and columns which aren’t encoded by running the following query: SELECT trim(n.nspname || '.' || c.relname) AS "table", trim(a.attname) AS "column", format_type(a.atttypid, a.atttypmod) AS "type", format_encoding(a.attencodingtype::integer) AS "encoding", a.attsortkeyord AS "sortkey" FROM pg_namespace n, pg_class c, pg_attribute a WHERE n.oid = c.relnamespace AND c.oid = a.attrelid AND a.attnum > 0 AND NOT a.attisdropped and n.nspname NOT IN ('information_schema','pg_catalog','pg_toast') AND format_encoding(a.attencodingtype::integer) = 'none' AND c.relkind='r' AND a.attsortkeyord != 1 ORDER BY n.nspname, c.relname, a.attnum; Solution #1: the v_extended_table_info view
  • 12. • If you find that you have tables without optimal column encoding, then use the Amazon Redshift Column Encoding Utility on AWS Labs GitHub to apply encoding. • This utility uses the ANALYZE COMPRESSION command on each table. • If encoding is required, it generates a SQL script which creates a new table with the correct encoding, copies all the data into the new table, and then transactionally renames the new table to the old name while retaining the original data. Solution #1: Redshift Column Encoding Utility
  • 13. Issue #2 – Skewed table data • Redshift nodes are managed by the number of slices (CPUs) per node • When a table is created, you decide whether to spread the data evenly among slices (default), or assign data to specific slices on the basis of one of the columns. • By choosing columns for distribution that are commonly joined together, you can minimize the amount of data transferred over the network during the join. • This can significantly increase performance on these types of queries.
  • 14. • The selection of a good distribution key is the topic of many AWS articles, including Choose the Best Distribution Style; see a definitive guide to distribution and sorting of star schemas in the Optimizing for Star Schemas and Interleaved Sorting on Amazon Redshift blog post. • A skewed distribution key results in slices not working equally hard as each other during query execution, requiring unbalanced CPU or memory, and ultimately only running as fast as the slowest slice: Issue #2 – Skewed table data
  • 15. Issue #2 – Skewed table data If skewing is an issue: • Use one of the admin scripts in the Amazon Redshift Utils GitHub repository, such as table_inspector.sql, to see how data blocks in a distribution key map to the slices and nodes in the cluster.
  • 16. Solution #2: Use Correct Distribution Key • If you find that you have tables with skewed distribution keys, then consider changing the distribution key to a column that exhibits high cardinality and uniform distribution. • Evaluate a candidate column as a distribution key by creating a new table using CTAS: CREATE TABLE my_test_table DISTKEY (<column name>) AS SELECT <column name> FROM <table name>; • Run the table_inspector.sql script against the table again to analyze data skew.
  • 17. • If there is no good distribution key in any of your records, you may find that moving to EVEN distribution works better. • For small tables (for example, dimension tables with a couple of million rows), you can also use DISTSTYLE ALL to place table data onto every node in the cluster. Solution #2: Use Correct Distribution Key
  • 18. Issue #3 – Queries not benefiting from sort keys • Sort keys acts like an index in other databases, but which does not incur a storage cost as with other platforms. (for more information, see Choosing Sort Keys) • To determine which tables don’t have sort keys, run the following query against the v_extended_table_info view from the Amazon Redshift Utils repository: SELECT * FROM admin.v_extended_table_info WHERE sortkey IS null
  • 19. Solution #3: Use Sort Key on Columns used in Where Clause • A sort key should be created on those columns which are most commonly used in WHERE clauses. • If you have a known query pattern, then COMPOUND sort keys give the best performance; • If using compound sort keys, review your queries to ensure that their WHERE clauses specify the sort columns in the same order they were defined in the compound key. • if end users query different columns equally, then use an INTERLEAVED sort key.
  • 20. Solution #3: Run Sort Key Recommendation Query • You can run a tutorial that walks you through how to address unsorted tables in the Amazon Redshift Developer Guide. • You can also run the following query to generate a list of recommended sort keys based on query activity: http://tinyurl.com/redshift-sortkey-recommender • Queries evaluated against a sort key column must not apply a SQL function to the sort key; instead, ensure that you apply the functions to the compared values so that the sort key is used. This is commonly found on TIMESTAMP columns that are used as sort keys.
  • 21. Issue #4 – Tables without statistics or which need vacuum • Bad Stats: Amazon Redshift, like other databases, requires statistics about tables and the composition of data blocks being stored in order to make good decisions when planning a query (for more information, see Analyzing Tables). • Stale Data: When rows are DELETED or UPDATED, they are simply logically deleted (flagged for deletion) but not physically removed from disk. Updates result in a new block being written with new data appended. As a result, table storage space is increased and performance degraded
  • 22. Solution #4: missing_table_stats.sql admin script • The ANALYZE Command History topic in the Amazon Redshift Developer Guide supplies queries to help you address missing or stale statistics • Or, you can also simply run the missing_table_stats.sql admin script to determine which tables are missing stats, or the statement below to determine tables that have stale statistics: SELECT database, schema || '.' || "table" AS "table", stats_off FROM svv_table_info WHERE stats_off > 5 ORDER BY 2;
  • 23. • You can use the perf_alert.sql admin script to identify tables for which alerts about scanning a large number of deleted rows have been raised in the last seven days. • To address issues with tables with missing or stale statistics or where vacuum is required, run another AWS Labs utility, Analyze & Vacuum Schema. This ensures that you always keep up-to-date statistics, and only vacuum tables that actually need re-organisation. Solution #4: perf_alert.sql admin script
  • 24. Issue #5 – Tables with very large VARCHAR columns During processing of complex queries, intermediate query results can be stored in temporary uncompressed blocks, consuming excessive memory and temporary disk space, affecting query performance. For more information, see Use the Smallest Possible Column Size.
  • 25. Solution #5 – Inspect and Deep Copy to Optimized Table Use the following query to generate a list of tables that should have their maximum column widths reviewed: SELECT database, schema || '.' || "table" AS "table", max_varchar FROM svv_table_info WHERE max_varchar > 150 ORDER BY 2 After you have a list of tables, identify which table columns have wide varchar columns and then determine the true maximum width for each wide column, using the following query: SELECT max(len(rtrim(column_name))) FROM table_name; If you find that the table has columns that are wider than necessary, then you need to re-create a version of the table with appropriate column widths by performing a deep copy.
  • 26. Solution #5 – Working with JSON Columns • In some cases, you may have large VARCHAR type columns because you are storing JSON fragments in the table, which you then query with JSON functions. • Pay special attention to SELECT * queries which include the JSON fragment column, from the top_queries.sql admin script. • If end users query these large JSON columns but don’t execute JSON functions against them, consider moving them into another table that only contains the primary key column of the original table and the JSON column.
  • 27. Issue #6 - Queries waiting on queue slots Amazon Redshift runs queries using a queuing system known as workload management (WLM), allowing up to 8 separate workloads. In some cases, the queue to which a user or query has been assigned is completely busy and a user’s query must wait for a slot to be open. During this time, the system is not executing the query at all, which is a sign that you may need to increase concurrency.
  • 28. Solution #6 – Assess Queuing History and configure WLM • determine if any queries are queuing, using the queuing_queries.sql admin script. • Review the maximum concurrency that your cluster has needed in the past with wlm_apex.sql, • down to an hour-by-hour historical analysis with wlm_apex_hourly.sql. • Modify WLM to optimal queuing configuration based on history stats
  • 29. Solution #6 – Configure WLM -Note: while increasing concurrency allows more queries to run, each query will get a smaller share of the memory allocated to its queue (unless you increase it). You may find that by increasing concurrency, some queries must use temporary disk storage to complete, which is also sub-optimal
  • 30. Issue #7 - Queries that are disk-based • If a query isn’t able to completely execute in memory, it may need to use disk-based temporary storage for parts of an explain plan. • The additional disk I/O slows down the query, and can be addressed by increasing the amount of memory allocated to a session (for more information, see WLM Dynamic Memory Allocation).
  • 31. Solution #7 – Disk Write Check and Configure • To determine if any queries have been writing to disk, use the following query: http://tinyurl.com/redshift- diskquery • Based on the user or the queue assignment rules, you can increase the amount of memory given to the selected queue to prevent queries needing to spill to disk to complete. • You can also increase the WLM_QUERY_SLOT_COUNT …use with care!
  • 32. Issue #8 – Commit Queue Waits • Amazon Redshift is designed for analytics queries, rather than transaction processing. • The cost of COMMIT is relatively high, and excessive use of COMMIT can result in queries waiting for access to a commit queue.
  • 33. Solution 8: Analyze Commit Workloads • If you are committing too often on your database, you will start to see waits on the commit queue increase, • Use the commit_stats.sql admin script to show the largest queue length and queue time for queries run in the past two days. • If you have queries that are waiting on the commit queue, then look for sessions that are committing multiple times per session, such as ETL jobs that are logging progress or inefficient data loads.
  • 34. Issue #9 - Inefficient data loads • Anti-Pattern: Insert data directly into Amazon Redshift, with single record inserts or the use of a multi-value INSERT statement, • These INSERTs allow up to a 16 MB ingest of data at one time. • These are leader node–based operations, and can create significant performance bottlenecks by maxing out the leader node network as data is distributed by the leader to the compute nodes.
  • 35. Solution #9 – Use COPY cmd, and Compression • Amazon Redshift best practices suggest the use of the COPY command to perform data loads. This API operation uses all compute nodes in the cluster to load data in parallel. • Compress the files to be loaded whenever possible; Amazon Redshift supports both GZIP and LZO compression.
  • 36. Solution #9 – Use COPY cmd, and Compression • It is more efficient to load a large number of small files than one large one, and the ideal file count is a multiple of the slice count. • The number of slices per node depends on the node size of the cluster. • By ensuring you have an equal number of files per slice, you can know that COPY execution will evenly use cluster resources and complete as quickly as possible.
  • 37. Solution #9 – Table Load Statistics Scripts The following query calculates statistics for each load: http://preview.tinyurl.com/redshift-ingest-load-stats The following query shows the time taken to load a table, and the time taken to update the table statistics, both in seconds and as a percentage of the overall load process: http://tinyurl.com/redshift-tableload-stat-time
  • 38. Issue #10 - Inefficient use of Temporary Tables Overview of Temp Tables: • Amazon Redshift provides temporary tables, which are like normal tables except that they are only visible within a single session. • When the user disconnects the session, the tables are automatically deleted. • Temporary tables can be created using the CREATE TEMPORARY TABLE syntax, or by issuing a SELECT … INTO #TEMP_TABLE query. • The CREATE TABLE statement gives you complete control over the definition of the temporary table, while the SELECT … INTO and C(T)TAS commands use the input data to determine column names, sizes and data types, and use default storage properties.
  • 39. • These default storage properties may cause issues if not carefully considered. • Amazon Redshift’s default table structure is to use EVEN distribution with no column encoding. • This is a sub-optimal data structure for many types of queries, and if you are using SELECT…INTO syntax you cannot set the column encoding or distribution and sort keys. • If you use the CREATE TABLE AS (CTAS) syntax, you can specify a distribution style and sort keys, but you still can’t set the column encodings. Issue #10 - Inefficient use of Temporary Tables
  • 40. Solution #10 – Apply Source Table configurations to Target Temp Table If you are creating temporary tables, it is highly recommended that you convert all SELECT…INTO syntax to use the CREATE statement. This ensures that your temporary tables have column encodings and are distributed in a fashion that is sympathetic to the other entities that are part of the workflow.
  • 41. If you typically do this: SELECT column_a, column_b INTO #my_temp_table FROM my_table; You would analyze the temporary table for optimal column encoding: Solution #10 – Apply Source Table configurations to Target Temp Table
  • 42. And then convert the select/into statement to: BEGIN; CREATE TEMPORARY TABLE my_temp_table( column_a varchar(128) encode lzo, column_b char(4) encode bytedict) distkey (column_a) -- Assuming you intend to join this table on column_a sortkey (column_b); -- Assuming you are sorting or grouping by column_b INSERT INTO my_temp_table SELECT column_a, column_b FROM my_table; COMMIT; You may also wish to analyze statistics on the temporary table, if it is used as a join table for subsequent queries: ANALYZE my_temp_table; Solution #10 – Apply Source Table configurations to Target Temp Table
  • 43. Solution #11 - Using explain plan alerts • The last tip is to use diagnostic information from the cluster during query execution. This is stored in an extremely useful view called STL_ALERT_EVENT_LOG. • Use the perf_alert.sql admin script to diagnose issues that the cluster has encountered over the last seven days. This is an invaluable resource in understanding how your cluster develops over time.
  • 44. Open source tools https://github.com/awslabs/amazon-redshift-utils https://github.com/awslabs/amazon-redshift-monitoring https://github.com/awslabs/amazon-redshift-udfs Admin scripts Collection of utilities for running diagnostics on your cluster Admin views Collection of utilities for managing your cluster, generating schema DDL, etc. ColumnEncodingUtility Gives you the ability to apply optimal column encoding to an established schema with data already loaded